# Solving Natural Language Processing tasks with GPT

As we have seen, LLMs can carry out downstream NLP tasks. Although they have not been fine-tuned for these specific tasks, they perform well on some of them.

Goal of the activity:
- Get familiar with the most common NLP tasks using an LLM.

Scope:
- We will use the GPT-3.5 Turbo model
- You can choose between:
  - OpenAI API (https://platform.openai.com/)
  - UI of OpenAI ChatGPT (https://chatgpt.com/)

If you use the UI, write your prompts in a text editor first and then copy and paste them into the UI. This will make the process easier.

---

## 1. Setup

**Only for those who will carry out this activity using the OpenAI API.**

Steps:
1. Set up secrets
1. Install libraries
1. Import libraries
1. Load your API key

In [11]:
!pip install openai==0.27.0
!pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [4]:
import openai
from google.colab import userdata

openai.api_key = userdata.get('OPENAI_API_KEY')

In [3]:
import openai

# @title Alternative code if secret does not work for you
openai.api_key = "COPY YOUR API KEY HERE"

**Helper function**

Next, we will create a function that we will use in today's activity and in the next NLP lecture as well. We will use the [chat completions endpoint](https://platform.openai.com/docs/guides/text-generation/chat-completions-api).

In [5]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

In [6]:
# @title Test API response
prompt = "Tell me a joke"
get_completion(prompt)

"Why couldn't the bicycle stand up by itself?\n\nBecause it was two tired!"

----

## 2. Summarization

In [7]:
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \
super cute, and its face has a friendly look. It's \
a bit small for what I paid though. I think there \
might be other options that are bigger for the \
same price. It arrived a day earlier than expected, \
so I got to play with it myself before I gave it \
to her.
"""

### 2.1. Summarize with a word/sentence/character limit

In [9]:
# with word limit
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.

Summarize the review below, delimited by triple
backticks, in at most 30 words.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

Summary: 
Soft and cute panda plush toy loved by daughter, but smaller than expected for the price. Arrived early, allowing for personal enjoyment before gifting.


Let's check whether the model's answer respects the word limit specified in the prompt.

In [12]:
import spacy

nlp = spacy.load("en_core_web_sm")

In [15]:
doc = nlp(response)
tokens = [token for token in doc]
for i, token in enumerate(tokens):
  print(f"{i}\t{token}")

0	Summary
1	:
2	

3	Soft
4	and
5	cute
6	panda
7	plush
8	toy
9	loved
10	by
11	daughter
12	,
13	but
14	smaller
15	than
16	expected
17	for
18	the
19	price
20	.
21	Arrived
22	early
23	,
24	allowing
25	for
26	personal
27	enjoyment
28	before
29	gifting
30	.
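
Note that the spaCy listing counts punctuation marks and the line break as tokens, so the last index overstates the number of words. Below is a minimal sketch of a stricter check that uses only the standard library. It assumes that the leading `Summary:` label should not count toward the limit; `count_words` is a hypothetical helper written for this illustration, not part of spaCy or the OpenAI library.

```python
import re

def count_words(text, ignore_label="Summary:"):
    """Count words in text, skipping punctuation and an optional leading label."""
    text = text.strip()
    if ignore_label and text.startswith(ignore_label):
        text = text[len(ignore_label):]
    # A word is a run of letters/digits, optionally joined by apostrophes or hyphens.
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))

# The reply printed above, copied here so the example runs without the API.
summary = (
    "Summary: \nSoft and cute panda plush toy loved by daughter, "
    "but smaller than expected for the price. Arrived early, "
    "allowing for personal enjoyment before gifting."
)
print(count_words(summary))                     # 24
print(count_words(summary, ignore_label=None))  # 25
```

Under this counting rule the summary stays within the 30-word limit, with or without the label.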


In [17]:
# with character limit
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.

Summarize the review below, delimited by triple
backticks, in at most 100 characters.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

Summary: 
Cute panda plush toy loved by daughter, but smaller than expected for the price. Arrived early.


Let's check whether the model's answer respects the character limit specified in the prompt.

In [18]:
for i, character in enumerate(response):
  print(f"{i}\t{character}")

0	S
1	u
2	m
3	m
4	a
5	r
6	y
7	:
8	 
9	

10	C
11	u
12	t
13	e
14	 
15	p
16	a
17	n
18	d
19	a
20	 
21	p
22	l
23	u
24	s
25	h
26	 
27	t
28	o
29	y
30	 
31	l
32	o
33	v
34	e
35	d
36	 
37	b
38	y
39	 
40	d
41	a
42	u
43	g
44	h
45	t
46	e
47	r
48	,
49	 
50	b
51	u
52	t
53	 
54	s
55	m
56	a
57	l
58	l
59	e
60	r
61	 
62	t
63	h
64	a
65	n
66	 
67	e
68	x
69	p
70	e
71	c
72	t
73	e
74	d
75	 
76	f
77	o
78	r
79	 
80	t
81	h
82	e
83	 
84	p
85	r
86	i
87	c
88	e
89	.
90	 
91	A
92	r
93	r
94	i
95	v
96	e
97	d
98	 
99	e
100	a
101	r
102	l
103	y
104	.
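
Printing every character works, but `len()` gives the count directly. The sketch below is one possible check, assuming that the `Summary:` label and surrounding whitespace may be excluded from the count; `check_char_limit` is a hypothetical helper introduced only for this example.

```python
def check_char_limit(text, limit=100, ignore_label="Summary:"):
    """Return (raw_length, content_length, within_limit) for a model reply."""
    raw_length = len(text)
    content = text.strip()
    if ignore_label and content.startswith(ignore_label):
        content = content[len(ignore_label):].strip()
    return raw_length, len(content), len(content) <= limit

# The reply printed above, copied here so the example runs without the API.
reply = (
    "Summary: \nCute panda plush toy loved by daughter, "
    "but smaller than expected for the price. Arrived early."
)
print(check_char_limit(reply))  # (105, 95, True)
```

The raw reply has 105 characters, which exceeds the limit, while the summary text alone has 95. Whether the answer counts as compliant therefore depends on how the label is treated.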


In [16]:
# with sentence limit
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.

Summarize the review below, delimited by triple
backticks, in one single sentence.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

The panda plush toy is soft, cute, and loved by the reviewer's daughter, but they feel it is a bit small for the price paid and suggest there may be larger options available for the same cost.
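
The three prompts in this subsection differ only in the length constraint. As an illustration, they could be generated from a single template. This is a sketch under that assumption, and `build_summary_prompt` is a hypothetical name, not a function from the OpenAI library.

```python
def build_summary_prompt(review, limit="at most 30 words"):
    """Build a summarization prompt with a configurable length constraint."""
    fence = "`" * 3  # triple backticks used as the delimiter around the review
    return (
        "Your task is to generate a short summary of a product "
        "review from an ecommerce site.\n\n"
        "Summarize the review below, delimited by triple "
        f"backticks, in {limit}.\n\n"
        f"Review: {fence}{review}{fence}\n"
    )

# Print one prompt per constraint; each could be passed to get_completion.
for limit in ("at most 30 words", "at most 100 characters", "one single sentence"):
    print(build_summary_prompt("Great toy, arrived early.", limit))
```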


### 2.2. Summarize focusing on specific topics of the text

#### 2.2.1. Focus on shipping and delivery

In [19]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping department.

Summarize the review below, delimited by triple
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

The product arrived a day earlier than expected, allowing the customer to enjoy it before gifting it, but they felt it was a bit small for the price.


#### 2.2.2. Focus on price and value

In [20]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing department, responsible for determining the \
price of the product.

Summarize the review below, delimited by triple
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

Summary: 
Customers find the panda plush toy cute and soft, but feel it's slightly overpriced for its size. Consider offering larger options for the same price.


### 2.3. Extract information instead of summarizing

In [21]:
prompt = f"""
Your task is to extract relevant information from \
a product review from an ecommerce site to give \
feedback to the Shipping department.

From the review below, delimited by triple backticks, \
extract the information relevant to shipping and \
delivery. Limit to 30 words.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

Feedback: The product arrived a day earlier than expected, allowing the customer to play with it before giving it as a gift.


### 2.4. Summarize multiple product reviews

In [22]:

review_1 = prod_review

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one \
had additional storage and not too high of a price \
point. Got it fast - arrived in 2 days. The string \
to the lamp broke during the transit and the company \
happily sent over a new one. Came within a few days \
as well. It was easy to put together. Then I had a \
missing part, so I contacted their support and they \
very quickly got me the missing piece! Seems to me \
to be a great company that cares about their customers \
and products.
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, \
which is why I got this. The battery life seems to be \
pretty impressive so far. After initial charging and \
leaving the charger plugged in for the first week to \
condition the battery, I've unplugged the charger and \
been using it for twice daily brushing for the last \
3 weeks all on the same charge. But the toothbrush head \
is too small. I’ve seen baby toothbrushes bigger than \
this one. I wish the head was bigger with different \
length bristles to get between teeth better because \
this one doesn’t.  Overall if you can get this one \
around the $50 mark, it's a good deal. The manufactuer's \
replacements heads are pretty expensive, but you can \
get generic ones that're more reasonably priced. This \
toothbrush makes me feel like I've been to the dentist \
every day. My teeth feel sparkly clean!
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal \
sale for around $49 in the month of November, about \
half off, but for some reason (call it price gouging) \
around the second week of December the prices all went \
up to about anywhere from between $70-$89 for the same \
system. And the 11 piece system went up around $10 or \
so in price also from the earlier sale price of $29. \
So it looks okay, but if you look at the base, the part \
where the blade locks into place doesn’t look as good \
as in previous editions from a few years ago, but I \
plan to be very gentle with it (example, I crush \
very hard items like beans, ice, rice, etc. in the \
blender first then pulverize them in the serving size \
I want in the blender then switch to the whipping \
blade for a finer flour, and use the cross cutting blade \
first when making smoothies, then use the flat blade \
if I need them finer/less pulpy). Special tip when making \
smoothies, finely cut and freeze the fruits and \
vegetables (if using spinach-lightly stew soften the \
spinach then freeze until ready for use-and if making \
sorbet, use a small to medium sized food processor) \
that you plan to use that way you can avoid adding so \
much ice if at all-when making your smoothie. \
After about a year, the motor was making a funny noise. \
I called customer service but the warranty expired \
already, so I had to buy another one. FYI: The overall \
quality has gone done in these types of products, so \
they are kind of counting on brand recognition and \
consumer loyalty to maintain sales. Got it in about \
two days.
"""

reviews = [review_1, review_2, review_3, review_4]

In [23]:
for i, review in enumerate(reviews):
    prompt = f"""
    Your task is to generate a short summary of a product \
    review from an ecommerce site.

    Summarize the review below, delimited by triple \
    backticks in at most 20 words.

    Review: ```{review}```
    """

    response = get_completion(prompt)
    print(i, response, "\n")

0 Summary: 
Adorable panda plush loved by daughter, soft and cute, but smaller than expected for the price. Arrived early. 

1 Summary: 
Lamp with storage, affordable, fast delivery, excellent customer service for missing parts. Great company. 

2 Impressive battery life, small brush head, good deal for $50, generic replacement heads available, leaves teeth feeling clean. 

3 Summary: Price fluctuations, quality concerns, motor issues after a year, and tips for usage shared in review. 



## Time to practice on your own (10 min)

---

## 3. Text classification

Examples:

This product is great - SENTIMENT POSITIVE

I would not buy this product again, it is useless - SENTIMENT NEGATIVE

In [24]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

### 3.1. Sentiment Analysis

In [25]:
# sentiment analysis verbose
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

The sentiment of the review is positive. The reviewer is satisfied with the lamp, the customer service, and the company in general.


In [27]:
# sentiment analysis just label
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Give your answer as a single word, either "positive" \
or "negative".

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

Positive
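
The model answered `Positive` with a capital letter even though the prompt asked for "positive". If the label feeds into downstream code, it may help to normalize it first. A minimal sketch, where `normalize_sentiment` is a hypothetical helper and the set of allowed labels is an assumption taken from the prompt:

```python
def normalize_sentiment(reply, allowed=("positive", "negative")):
    """Map a raw model reply to one of the allowed labels, or None if it does not match."""
    label = reply.strip().strip(".!\"'").lower()
    return label if label in allowed else None

print(normalize_sentiment("Positive"))                # positive
print(normalize_sentiment("negative."))               # negative
print(normalize_sentiment("The sentiment is mixed"))  # None
```

Returning `None` for anything unexpected makes it easy to spot replies that ignored the requested format.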


### 3.2. Emotion detection

In [28]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

happy, satisfied, grateful, impressed, pleased


### 3.3. Anger detection

In [29]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple backticks. \
Give your answer as either yes or no.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

No


### 3.4. Extract entities

Scope:
- Extraction of product names and company names

In [30]:
prompt = f"""
Identify the following items from the review text:
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

{
  "Item": "lamp",
  "Brand": "Lumina"
}


### 3.5. Solving multiple tasks at once

Scope:
- Sentiment analysis
- Anger detection
- Extraction of entities: product and company names

In [31]:
prompt = f"""
Identify the following items from the review text:
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
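
Asking for JSON pays off because the reply can be loaded straight into a Python dictionary with the standard `json` module. The sketch below assumes the reply contains only the JSON object; `parse_review_json` is a hypothetical helper, and the fallback to "unknown" mirrors the instruction given in the prompt.

```python
import json

# The reply printed above, copied here so the example runs without the API.
raw = """
{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}
"""

def parse_review_json(reply, required=("Sentiment", "Anger", "Item", "Brand")):
    """Parse the model reply as JSON and fill in any missing key with "unknown"."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None  # the reply was not valid JSON
    if not isinstance(data, dict):
        return None
    return {key: data.get(key, "unknown") for key in required}

result = parse_review_json(raw)
print(result["Brand"])           # Lumina
print(result["Anger"] is False)  # True
```

Note that the JSON literal `false` becomes the Python boolean `False`, which is why the prompt asks for the Anger value as a boolean.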


### 3.6. Topic detection (open)

In [32]:
story = """
In a recent survey conducted by the government,
public sector employees were asked to rate their level
of satisfaction with the department they work at.
The results revealed that NASA was the most popular
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings,
stating, "I'm not surprised that NASA came out on top.
It's a great place to work with amazing people and
incredible opportunities. I'm proud to be a part of
such an innovative organization."

The results were also welcomed by NASA's management team,
with Director Tom Johnson stating, "We are thrilled to
hear that our employees are satisfied with their work at NASA.
We have a talented and dedicated team who work tirelessly
to achieve our goals, and it's fantastic to see that their
hard work is paying off."

The survey also revealed that the
Social Security Administration had the lowest satisfaction
rating, with only 45% of employees indicating they were
satisfied with their job. The government has pledged to
address the concerns raised by employees in the survey and
work towards improving job satisfaction across all departments.
"""

In [33]:
# infer 5 topics
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: ```{story}```
"""
response = get_completion(prompt)
print(response)

1. Survey
2. Job satisfaction
3. NASA
4. Social Security Administration
5. Government pledge
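
The prompt asked for a comma-separated list, but the model returned a numbered list. A parser that tolerates both formats can absorb this kind of deviation. This is a minimal sketch using only the standard library; `parse_topics` is a hypothetical helper.

```python
import re

def parse_topics(reply):
    """Split a reply into topics, accepting comma-separated or numbered-line formats."""
    parts = re.split(r"[,\n]", reply)
    topics = []
    for part in parts:
        # Drop list markers such as "1.", "2)" or "-" at the start of an item.
        item = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", part).strip()
        if item:
            topics.append(item)
    return topics

# The reply printed above, copied here so the example runs without the API.
numbered = (
    "1. Survey\n2. Job satisfaction\n3. NASA\n"
    "4. Social Security Administration\n5. Government pledge"
)
print(parse_topics(numbered))
print(parse_topics("survey, job satisfaction, nasa"))
```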


### 3.7. Topic detection from a pre-defined list of topics

In [34]:
topic_list = [
    "nasa", "local government", "engineering",
    "employee satisfaction", "federal government"
]

In [35]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answer as a list with 0 or 1 for each topic.

List of topics: {", ".join(topic_list)}

Text sample: ```{story}```
"""
response = get_completion(prompt)
print(response)

[1, 0, 0, 1, 1]
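
A list of 0/1 flags is only useful once it is paired with the topic names again. The sketch below assumes the reply is a valid JSON list in the same order as `topic_list`; `topics_from_flags` is a hypothetical helper.

```python
import json

topic_list = [
    "nasa", "local government", "engineering",
    "employee satisfaction", "federal government"
]

def topics_from_flags(reply, topics):
    """Pair each topic with its 0/1 flag; return None if the reply has the wrong shape."""
    try:
        flags = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(flags, list) or len(flags) != len(topics):
        return None
    return {topic: bool(flag) for topic, flag in zip(topics, flags)}

detected = topics_from_flags("[1, 0, 0, 1, 1]", topic_list)
print([topic for topic, present in detected.items() if present])
# ['nasa', 'employee satisfaction', 'federal government']
```

Checking the length before zipping guards against replies that skip or add a topic, which would otherwise shift every flag silently.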


## Time to practice on your own (10 min)



---

## 4. Translation

ChatGPT has been trained on multilingual data. For this reason, the model is also capable of translating text.

In [37]:
text = """
Barcelona is a city on the northeastern coast of Spain. \
It is the capital and largest city of the autonomous community of Catalonia, \
as well as the second-most populous municipality of Spain. \
With a population of 1.6 million within city limits, \
its urban area extends to numerous neighbouring municipalities within the province \
of Barcelona and is home to around 4.8 million people, \
making it the fifth most populous urban area in the European Union after Paris, \
the Ruhr area, Madrid and Milan. \
It is one of the largest metropolises on the Mediterranean Sea, \
located on the coast between the mouths of the rivers Llobregat and Besòs, \
bounded to the west by the Serra de Collserola mountain range.
"""

In [41]:
prompt = f"""
Translate the following English text, delimited by <text> tags, to Spanish.

<text>{text}</text>
"""
response = get_completion(prompt)
print(response)

Barcelona es una ciudad en la costa noreste de España. Es la capital y la ciudad más grande de la comunidad autónoma de Cataluña, así como el segundo municipio más poblado de España. Con una población de 1.6 millones dentro de los límites de la ciudad, su área urbana se extiende a numerosos municipios vecinos dentro de la provincia de Barcelona y alberga alrededor de 4.8 millones de personas, lo que la convierte en la quinta área urbana más poblada de la Unión Europea después de París, el área del Ruhr, Madrid y Milán. Es una de las metrópolis más grandes en el Mar Mediterráneo, ubicada en la costa entre las desembocaduras de los ríos Llobregat y Besòs, limitada al oeste por la cordillera de Collserola.


### 4.1. Language detection

In [40]:
prompt = f"""
Tell me the language of the following text, delimited by <text> tags. \

<text>
Barcelone est une ville située sur la côte nord-est de l'Espagne. \
C'est la capitale et la plus grande ville de la communauté autonome de Catalogne, \
ainsi que la deuxième municipalité la plus peuplée d'Espagne. \
Avec une population de 1,6 million d'habitants dans les limites de la ville, \
son aire urbaine s'étend à de nombreuses municipalités voisines de la province \
de Barcelone et abrite environ 4,8 millions de personnes, \
ce qui en fait la cinquième aire urbaine la plus peuplée de l'Union européenne \
après Paris, la région de la Ruhr, Madrid et Milan. \
C'est l'une des plus grandes métropoles de la mer Méditerranée, \
située sur la côte entre les embouchures des rivières Llobregat et Besòs, \
bordée à l'ouest par la chaîne de montagnes de la Serra de Collserola.
</text>
"""
response = get_completion(prompt)
print(response)

The language of the text is French.


### 4.2. Multilingual translation

In [42]:
prompt = f"""
Translate the following English text, delimited by <text> tags, to Spanish and French.

<text>{text}</text>
"""
response = get_completion(prompt)
print(response)

Spanish:
Barcelona es una ciudad en la costa noreste de España. Es la capital y la ciudad más grande de la comunidad autónoma de Cataluña, así como el segundo municipio más poblado de España. Con una población de 1.6 millones dentro de los límites de la ciudad, su área urbana se extiende a numerosos municipios vecinos dentro de la provincia de Barcelona y alberga alrededor de 4.8 millones de personas, lo que la convierte en la quinta área urbana más poblada de la Unión Europea después de París, el área del Ruhr, Madrid y Milán. Es una de las metrópolis más grandes en el Mar Mediterráneo, ubicada en la costa entre las desembocaduras de los ríos Llobregat y Besòs, limitada al oeste por la cordillera de Collserola.

French:
Barcelone est une ville située sur la côte nord-est de l'Espagne. C'est la capitale et la plus grande ville de la communauté autonome de Catalogne, ainsi que le deuxième municipalité la plus peuplée d'Espagne. Avec une population de 1,6 million d'habitants dans les lim

### 4.3. Language detection + multilingual translation

In [43]:
user_messages = [
  "La performance du système est plus lente que d'habitude.",  # System performance is slower than normal
  "Mi monitor tiene píxeles que no se iluminan.",              # My monitor has pixels that are not lighting
  "Il mio mouse non funziona",                                 # My mouse is not working
  "Mój klawisz Ctrl jest zepsuty",                             # My keyboard has a broken control key
  "我的屏幕在闪烁"                                               # My screen is flashing
]

In [44]:
for issue in user_messages:
    prompt = f"Tell me what language this is: ```{issue}```"
    lang = get_completion(prompt)
    print(f"Original message ({lang}): {issue}")

    prompt = f"""
    Translate the following text to English \
    and Korean: ```{issue}```
    """
    response = get_completion(prompt)
    print(response, "\n")

Original message (French): La performance du système est plus lente que d'habitude.
English: "The system performance is slower than usual."
Korean: "시스템 성능이 평소보다 느립니다." 

Original message (This is Spanish.): Mi monitor tiene píxeles que no se iluminan.
English: "My monitor has pixels that do not light up."
Korean: "내 모니터에는 빛나지 않는 픽셀이 있습니다." 

Original message (Italian): Il mio mouse non funziona
English: My mouse is not working
Korean: 내 마우스가 작동하지 않습니다 

Original message (This is Polish.): Mój klawisz Ctrl jest zepsuty
English: My Ctrl key is broken
Korean: 제 Ctrl 키가 고장 났어요 

Original message (This is Chinese.): 我的屏幕在闪烁
English: My screen is flickering
Korean: 내 화면이 깜박거립니다 
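
The language detection replies are inconsistent: some are a bare name ("French") and others a full sentence ("This is Spanish."). One option is to tighten the prompt; another is to clean the reply afterwards. The sketch below takes the second route. `extract_language` is a hypothetical helper, and the list of prefixes it strips is an assumption based only on the replies seen in this notebook.

```python
import re

def extract_language(reply):
    """Reduce replies such as "This is Spanish." to the bare language name."""
    text = reply.strip().rstrip(".").strip()
    # Remove sentence openers observed in the replies above.
    text = re.sub(
        r"^(this is|the language of the text is|the language is)\s+",
        "",
        text,
        flags=re.IGNORECASE,
    )
    return text.strip("\"'` ")

for reply in ("French", "This is Spanish.", "The language of the text is French."):
    print(extract_language(reply))
```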



### 4.4. Adapt tone of the text

In [45]:
prompt = f"""
Translate the following from slang to a business letter:
'Dude, This is Joe, check out this spec on this standing lamp.'
"""
response = get_completion(prompt)
print(response)

Dear Sir/Madam,

I am writing to bring to your attention the specifications of a standing lamp that I believe may be of interest to you. 

Sincerely,
Joe


### 4.5. Spellcheck and grammar check

In [57]:
texts = [
  "The girl with the black and white puppies have a ball.",  # The girl has a ball.
  "Yolanda has her notebook.", # ok
  "Its going to be a long day. Does the car need it’s oil changed?",  # Homonyms
  "Their goes my freedom. There going to bring they’re suitcases.",  # Homonyms
  "Your going to need you’re notebook.",  # Homonyms
  "That medicine effects my ability to sleep. Have you heard of the butterfly affect?", # Homonyms
  "This phrase is to cherck chatGPT for speling abilitty"  # spelling
]

for text in texts:
  print(f"Input text:\t{text}")
  prompt = f"""Proofread and correct the following text
  and rewrite the corrected version. If you don't find
  any errors, just say "No errors found". Don't use
  any punctuation around the text:
  ```{text}```

  """

  response = get_completion(prompt)
  print(f"Answer:\t\t{response}")
  print("")

Input text:	The girl with the black and white puppies have a ball.
Answer:		The girl with the black and white puppies has a ball.

Input text:	Yolanda has her notebook.
Answer:		No errors found

Input text:	Its going to be a long day. Does the car need it’s oil changed?
Answer:		It's going to be a long day. Does the car need its oil changed?

Input text:	Their goes my freedom. There going to bring they’re suitcases.
Answer:		No errors found.

Input text:	Your going to need you’re notebook.
Answer:		You're going to need your notebook.

Input text:	That medicine effects my ability to sleep. Have you heard of the butterfly affect?
Answer:		No errors found.

Input text:	This phrase is to cherck chatGPT for speling abilitty
Answer:		No errors found



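As you can see, the generated answers are not fully reliable: the model reported "No errors found" for three inputs that do contain errors (the their/there/they're example, the effects/affect example, and the misspelled sentence).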
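
To see exactly what the model changed, the input and the answer can be compared word by word. This is a minimal sketch using `difflib` from the standard library; `word_changes` is a hypothetical helper, and splitting on whitespace is a simplifying assumption.

```python
import difflib

def word_changes(original, corrected):
    """Return (old, new) pairs for every span of words that differs."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

# First input/answer pair from the output above.
before = "The girl with the black and white puppies have a ball."
after = "The girl with the black and white puppies has a ball."
print(word_changes(before, after))  # [('have', 'has')]
```

An empty list means the answer is identical to the input, which makes it easy to separate texts the model left untouched from texts it rewrote.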