# Text analysis tasks with the ChatGPT API

## Text analysis tasks:

- Text summarization
- Extraction of topics, named entities, etc.
- Sentiment analysis
- Translation to other languages
- Rephrasing to correct errors or address a specific need

---

### *Imports and declarations*

In [None]:
import os
import openai
import wikipedia
import tiktoken
from langchain import OpenAI
from langchain.prompts import PromptTemplate
from langchain.callbacks import get_openai_callback

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

llm_model = OpenAI(temperature=0.0)

tokenizer = tiktoken.encoding_for_model(llm_model.model_name)

# Cost of executing ChatGPT calls is accumulated in 'total_cost'
# Summary is printed at the end of this notebook
total_cost = 0.0
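The global counter is enough for a notebook. As a minimal sketch of an alternative that avoids `global`, the running total can live in a small object. `CostTracker` is a hypothetical helper, not part of LangChain or the OpenAI library, and the dollar amounts below are made-up stand-ins for `cb.total_cost`.

```python
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Hypothetical helper that accumulates per-call costs in USD."""
    calls: list = field(default_factory=list)

    def add(self, label: str, cost: float) -> None:
        # In the notebook, `cost` would come from `cb.total_cost`.
        self.calls.append((label, cost))

    @property
    def total(self) -> float:
        return sum(cost for _, cost in self.calls)

    def report(self) -> str:
        lines = [f"{label}: ${cost:.4f}" for label, cost in self.calls]
        lines.append(f"Total cost: ${self.total:.4f}")
        return "\n".join(lines)


tracker = CostTracker()
tracker.add("summarize", 0.0125)  # made-up numbers for illustration
tracker.add("translate", 0.0025)
print(tracker.report())
```

Keeping a label per call also shows which task contributed most to the bill.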

---

### Summarize Wikipedia article on GPT-3 

Python Wikipedia library documentation: https://wikipedia.readthedocs.io/en/latest/

In [None]:
def summarize(text, length, llm=llm_model, print_full_prompt=False):
    # text and length must be valid strings, length should be a string representation of an integer
    global total_cost
    
    summarization_template_string = """
    Summarize the text delimited by triple backticks in {length} words.\
    text: ```{text}```
    """
    summarization_prompt_template = PromptTemplate(
        input_variables=["text", "length"],
        template=summarization_template_string
    )
    
    model_input = summarization_prompt_template.format(text=text, length=length)

    if print_full_prompt:
        print(f"Full prompt:\n{model_input}\n")
    
    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response
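To see what the model actually receives, the sketch below reproduces the template layout with plain `str.format`. This stands in for `PromptTemplate.format` under the assumption that the template uses simple f-string-style placeholders. The `{fence}` placeholder and the sample text are illustrative only.

```python
import textwrap

FENCE = "`" * 3  # triple backticks, spelled this way only to keep the example readable

# Same layout as the notebook's template: the trailing backslash joins two lines.
template = """
    Summarize the text delimited by triple backticks in {length} words.\
    text: {fence}{text}{fence}
    """
prompt = template.format(length="200", text="GPT-3 is a language model.", fence=FENCE)
print(repr(prompt))  # the instruction and "text:" end up on one line

# Alternative layout: no backslash, indentation removed with textwrap.dedent.
clean_template = textwrap.dedent("""
    Summarize the text delimited by triple backticks in {length} words.
    text: {fence}{text}{fence}
    """).strip()
clean_prompt = clean_template.format(length="200", text="GPT-3 is a language model.", fence=FENCE)
print(clean_prompt)
```

The backslash at the end of the first template line suppresses the newline, so the indentation of the next line becomes part of the same line. Either layout is likely to work; the second one just sends fewer stray spaces.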

In [None]:
# Wikipedia page on GPT-3: https://en.wikipedia.org/wiki/GPT-3

wikipedia.set_lang("en")
gpt3_article = wikipedia.page("GPT-3", auto_suggest=False).content

In [None]:
print(gpt3_article[:500])

Check the article length in tokens to ensure that it, together with the prompt template text, fits within the model's context limit (4096 tokens for GPT-3.5-Turbo). The limit covers the prompt and the generated completion combined.

In [None]:
len(tokenizer.encode(gpt3_article))
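The check can be turned into a small budget calculation. This is a sketch under stated assumptions: `fits_context` and `truncate_tokens` are hypothetical helpers, the 256-token completion reserve and the 40-token template overhead are illustrative values, and a list of integers stands in for the output of `tokenizer.encode`.

```python
def fits_context(n_prompt_tokens: int, max_completion_tokens: int,
                 context_limit: int = 4096) -> bool:
    """True if the prompt plus the reserved completion fits the context window."""
    return n_prompt_tokens + max_completion_tokens <= context_limit


def truncate_tokens(tokens: list, max_completion_tokens: int,
                    overhead_tokens: int, context_limit: int = 4096) -> list:
    """Keep only as many article tokens as the remaining budget allows."""
    budget = context_limit - max_completion_tokens - overhead_tokens
    return tokens[:max(budget, 0)]


article_tokens = list(range(5000))  # stand-in for tokenizer.encode(gpt3_article)
print(fits_context(len(article_tokens), max_completion_tokens=256))
shortened = truncate_tokens(article_tokens, max_completion_tokens=256, overhead_tokens=40)
print(len(shortened))
```

In the notebook, the truncated token list could be turned back into text with `tokenizer.decode(shortened)` before calling `summarize`.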

In [None]:
gpt3_summary = summarize(gpt3_article, length="200", print_full_prompt=True)

In [None]:
print(f"Summary:\n{gpt3_summary}")

In [None]:
# count words in the summary
import re

len(re.findall(r'\w+', gpt3_summary))

In [None]:
# count tokens in the summary

len(tokenizer.encode(gpt3_summary))
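The requested length is only a target, so it can help to measure how far a summary drifts from it. The sketch below uses the same `\w+` pattern as the cell above; `length_deviation` is a hypothetical helper and the sample sentence is hand-written, not model output.

```python
import re


def word_count(text: str) -> int:
    # Same counting rule as in the notebook: runs of word characters.
    return len(re.findall(r"\w+", text))


def length_deviation(text: str, target_words: int) -> float:
    """Relative deviation from the target, e.g. 0.2 means 20% too long."""
    return (word_count(text) - target_words) / target_words


sample = "GPT-3 is a large language model released by OpenAI in 2020."
print(word_count(sample), len(sample.split()))
print(length_deviation(sample, target_words=10))
```

Note that `\w+` splits hyphenated names such as "GPT-3" into two matches, so this count runs slightly higher than a whitespace split.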

### Summarize Wikipedia article on GPT-4

In [None]:
# Wikipedia page on GPT-4: https://en.wikipedia.org/wiki/GPT-4

gpt4_article = wikipedia.page("GPT-4", auto_suggest=False).content
len(tokenizer.encode(gpt4_article))

In [None]:
gpt4_summary = summarize(gpt4_article, length="200")
    
print(f"Summary:\n{gpt4_summary}")

In [None]:
short_gpt4_summary = summarize(gpt4_article, length="100")
    
print(f"Summary:\n{short_gpt4_summary}")

In [None]:
# count words in the short summary

len(re.findall(r'\w+', short_gpt4_summary))

---

## Extract topics, named entities, etc. from text

In [None]:
def extract(text, topic, llm=llm_model):
    # text and topic must be valid strings
    global total_cost
    
    extraction_template_string = """
    Extract {topic} from the text delimited by triple backticks.\
    text: ```{text}```
    """
    extraction_prompt_template = PromptTemplate.from_template(extraction_template_string)
    
    model_input = extraction_prompt_template.format(text=text, topic=topic)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [None]:
print(extract(gpt3_summary, "main topic"))

In [None]:
print(extract(gpt4_summary, "main topic"))

In [None]:
print(extract(gpt3_summary, "list of model names"))

In [None]:
print(extract(gpt4_summary, "list of applications"))
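The extraction results come back as free text. As a sketch of one way to turn a list-style answer into a Python list, the hypothetical `parse_list` helper below assumes that items are separated by commas or newlines and may carry bullets or numbering. The response format is not guaranteed, and the sample responses are hand-written rather than real model output.

```python
import re


def parse_list(response: str) -> list:
    """Split a list-like model response into clean items (format is assumed)."""
    items = []
    for part in re.split(r"[\n,]", response):
        # Drop a leading bullet ("-", "*", "•") or numbering ("1. ", "2) ").
        cleaned = re.sub(r"^\s*(?:[-*•]\s*|\d+[.)]\s+)", "", part).strip()
        if cleaned:
            items.append(cleaned)
    return items


print(parse_list("GPT-3, GPT-2, Codex"))
print(parse_list("1. GPT-3\n2. GPT-2\n- Codex"))
```

Items that contain commas themselves would be split incorrectly, so asking the model for one item per line is probably the safer prompt.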

---

## Sentiment analysis

In [None]:
def sentiment_analysis(text, llm=llm_model):
    global total_cost
    
    sentiment_template_string = """
    Classify the sentiment expressed in the review delimited by triple backticks.\
    review: ```{text}```
    """
    sentiment_prompt_template = PromptTemplate.from_template(sentiment_template_string)

    model_input = sentiment_prompt_template.format(text=text)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [None]:
review_1 = """
I purchased the PixelPioneer Quantum 60" and it's a game-changer.
The 4K resolution is stunning and the smart features are easy to use.
Worth every penny! - George, Liverpool"""

print(sentiment_analysis(review_1))

In [None]:
review_2 = """
I'm not happy with the VisionCast UltraView 43".
The picture quality is subpar and the TV arrived with a scratch on the screen.
I expected better quality control. - Sarah, Los Angeles"""

print(sentiment_analysis(review_2))

In [None]:
review_3 = """
I bought the PixelPioneer Quantum 70" and it's simply fantastic.
The voice control remote is a game-changer.
However, the delivery was delayed by a week which was quite frustrating. - Emma, London"""

print(sentiment_analysis(review_3))
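Because the prompt does not constrain the answer format, the response may be a single word or a full sentence. The sketch below maps such free-form answers onto a fixed label set with simple keyword matching. `normalize_sentiment` and the label set are assumptions for illustration, and the sample responses are hand-written.

```python
LABELS = ("positive", "negative", "neutral", "mixed")


def normalize_sentiment(response: str) -> str:
    """Map a free-form answer to one label using naive keyword matching."""
    text = response.lower()
    found = [label for label in LABELS if label in text]
    if len(found) == 1:
        return found[0]
    if found:
        return "mixed"  # several labels mentioned, treat as mixed
    return "unknown"


print(normalize_sentiment("\n\nPositive"))
print(normalize_sentiment("Mostly positive, with a negative note on delivery."))
print(normalize_sentiment("I cannot tell."))
```

Keyword matching ignores negation ("not positive"), so a more robust approach would be to ask the model to answer with exactly one of the allowed labels.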

---

## Translation to other languages

In [None]:
def translate(text, target_language, llm=llm_model):
    global total_cost
    
    translation_template_string = """
    Translate the text delimited by triple backticks into {language}.\
    text: ```{text}```
    """
    translation_prompt_template = PromptTemplate.from_template(translation_template_string)
    
    model_input = translation_prompt_template.format(text=text, language=target_language)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [None]:
english_text = "Some of the capabilities of GPT-4 include describing humor in images, \
summarizing text from screenshots, and answering exam questions with diagrams."

spanish_translation = translate(english_text, "Spanish")

print(spanish_translation)

In [None]:
italian_translation = translate(english_text, "Italian")

print(italian_translation)

In [None]:
print(translate(spanish_translation, "Italian"))
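To compare the direct Italian translation with the one produced via Spanish, a rough word-level similarity score can be computed with the standard library. This is only a sketch: `similarity` is a hypothetical helper, and the two Italian sentences are hand-written placeholders, not output from the model.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


direct = "Il gatto dorme sul divano."
via_spanish = "Il gatto dorme sul sofà."
print(similarity(direct, via_spanish))
```

In the notebook this could be called as `similarity(italian_translation, translate(spanish_translation, "Italian"))`. A low score signals different wording, not necessarily a worse translation.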

In [None]:
# quote from Wikipedia: https://el.wikipedia.org/wiki/GPT-4

greek_text = "Ως μετασχηματιστής, το GPT-4 ήταν προεκπαιδευμένο για την πρόβλεψη του επόμενου διακριτικού \
(χρησιμοποιώντας δημόσια δεδομένα και «δεδομένα με άδεια από τρίτους παρόχους») και στη συνέχεια βελτιστοποιήθηκε \
με ενισχυτική μάθηση από την ανάδραση ανθρώπου και τεχνητής νοημοσύνης για ανθρώπινη ευθυγράμμιση και πολιτική συμμόρφωση."

print(translate(greek_text, "English"))

---

## Rephrasing to correct errors or address a specific need

In [None]:
def correct_text(text, llm=llm_model):
    global total_cost
    
    correct_grammar_template_string = """
    Correct grammar, punctuation and spelling in the text delimited by triple backticks.\
    text: ```{text}```
    """
    correct_grammar_prompt_template = PromptTemplate.from_template(correct_grammar_template_string)
    
    model_input = correct_grammar_prompt_template.format(text=text)

    with get_openai_callback() as cb:
        response = llm(model_input)
        
    total_cost += cb.total_cost
    
    return response

In [None]:
original_text = """
The model has limitations, including the tendency to hallucinate and lack transparency
in its decision-making processes. It has also been found to have cognitive biases."""

altered_text = """
The mdel has limmitaions including, the tendency to halucinate and lsck trespacy
in its decision making processes. It has also been fond to hav cognitive biasses."""

print(correct_text(altered_text))
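Having both `original_text` and `altered_text` makes it possible to check which words the correction changed. The sketch below lists differing word spans with `difflib`. `changed_words` is a hypothetical helper, and the short strings stand in for the notebook's texts.

```python
import difflib


def changed_words(before: str, after: str) -> list:
    """Return (old, new) pairs for every word span that differs."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


print(changed_words("The mdel has limmitaions", "The model has limitations"))
```

In the notebook, `changed_words(altered_text, correct_text(altered_text))` would show the model's edits, and `changed_words(original_text, correct_text(altered_text))` would show where the correction still differs from the original.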

---

## Examples of other ChatGPT-based applications:

- Chatbots
- Question answering over documents
- Customer support agents
- Querying and analysis of structured data
- Personal assistants, etc.

---

# Get the total cost of running ChatGPT API calls in this notebook

In [None]:
print(f"Total cost: ${total_cost:.4f}")