<a href="https://colab.research.google.com/github/martatolos/eae-dsaa-2025/blob/main/nlp_tasks_with_gpt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Solving Natural Language Processing tasks with GPT

> Goal of the session:
>
> - Become familiar with the most common NLP tasks using an LLM.
>
> Scope of the session:
>
> - Use LLMs for common NLP tasks.

As we have seen, LLMs are capable of carrying out downstream NLP tasks. Although they have not been fine-tuned for these specific tasks, they perform well on some of them.

You can use:
  - [OpenAI API](https://platform.openai.com)
  - [UI of OpenAI ChatGPT](https://chatgpt.com)

> [!Note]
> If you use the UI, please write your prompts in a text editor first. Once they are ready, you can copy and paste them into the UI. This will make the process easier for you.

## 1. Setup

**Only for those who will carry out this activity using the OpenAI API.**

### Dependencies

- ``ipython``
- ``openai`` 1.75.0
- ``python-dotenv``
- ``spacy`` 3.8.5

In [None]:
%pip install ipython openai==1.75.0 python-dotenv spacy==3.8.5
!python -m spacy download en_core_web_sm

### Imports

In [None]:
import os

import dotenv
import spacy
from IPython.display import Markdown, display
from openai import OpenAI

In [None]:
nlp = spacy.load("en_core_web_sm")

### API Key

Add your OpenAI API key in the cell below or create a `.env` file in the same directory as this notebook with the following content:

```
OPENAI_API_KEY=your_openai_api_key
```

> [!Warning]
> Make sure you do not save or commit the file without removing your API key. If that happens, reset the key so that it is not compromised.

In [None]:
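open_ai_key = None  # Add your OpenAI API key here
dotenv.load_dotenv()
api_key = os.getenv("OPENAI_API_KEY") or open_ai_key
if not api_key:  # os.environ only accepts strings, so fail early with a clear message
    raise ValueError("OpenAI API key not found: set it above or in a `.env` file.")
os.environ["OPENAI_API_KEY"] = api_key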

### Helper functions

Next, we will create the helper functions that we will use in today's activity and in the next NLP lecture as well. `get_completion` sends a prompt to the OpenAI API through the `responses` endpoint of the `openai` client and returns the generated text (see the [text generation guide](https://platform.openai.com/docs/guides/text-generation/chat-completions-api)).

Most LLMs also use Markdown formatting in their output. We will use the ``render_markdown`` function to show the outputs in a more readable format.

In [None]:
def get_completion(prompt: str, model_name: str = "gpt-4o-mini") -> str:
    """Get the completion from OpenAI API.

    :param prompt: Prompt to be sent to the model.
    :param model_name: Name of the model which will be used.
        Check https://platform.openai.com/docs/models to get an updated list.
        Defaults to "gpt-4o-mini"
    :return: Completion from the model.
    """
    return OpenAI().responses.create(model=model_name, input=prompt).output_text


def render_markdown(text: str) -> None:
    """Render the text as markdown.

    :param text: Text to be rendered.
    """
    display(Markdown(text))


def show_completion(prompt: str, model_name: str = "gpt-4o-mini") -> None:
    """Get the completion from OpenAI API and render it as markdown.

    :param prompt: Prompt to be sent to the model.
    :param model_name: Name of the model which will be used.
        Check https://platform.openai.com/docs/models to get an updated list.
        Defaults to "gpt-4o-mini"
    """
    completion = get_completion(prompt, model_name)
    render_markdown(completion)

In [None]:
prompt = "Tell me a joke"
show_completion(prompt)

## 2. Summarization

In [None]:
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \
super cute, and its face has a friendly look. It's \
a bit small for what I paid though. I think there \
might be other options that are bigger for the \
same price. It arrived a day earlier than expected, \
so I got to play with it myself before I gave it \
to her.
"""

### Summarize with a word/sentence/character limit

In [None]:
# with word limit
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.

Summarize the review below, delimited by triple
backticks, in at most 30 words.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
render_markdown(response)

Let's check whether the model generated an answer within the word limit specified in the prompt.

In [None]:
n_words = len(response.split())
print(f"The response has {n_words} words.")
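
The check above can be wrapped into a small reusable helper. This is a minimal sketch: the function name `within_word_limit` and the sample text are hypothetical, and words are counted by splitting on whitespace, exactly as in the cell above.

```python
def within_word_limit(text: str, max_words: int) -> bool:
    """Return True if `text` has at most `max_words` whitespace-separated words."""
    return len(text.split()) <= max_words


# Stand-in for a model response (made up for illustration);
# in the notebook you would pass `response` instead.
sample_response = (
    "Soft, cute panda plush loved by daughter; "
    "a bit small for the price, but arrived early."
)
print(within_word_limit(sample_response, 30))
```

In the notebook, `within_word_limit(response, 30)` would tell you whether the model respected the limit requested in the prompt.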

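Even if the model adheres to the number of words we ask for, it does not operate on words internally: it reads and generates text as tokens.

As an approximation, here you can see how spaCy splits the response into tokens. Keep in mind that spaCy's linguistic tokens are not the same as the subword tokens the model uses: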

In [None]:
doc = nlp(response)
tokens = list(doc)
for i, token in enumerate(tokens):
    print(f"{i}\t'{token}'")

In [None]:
# with character limit
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.

Summarize the review below, delimited by triple
backticks, in at most 100 characters.

Review: ```{prod_review}```
"""

response = get_completion(prompt)
render_markdown(response)

Let's check whether the model generated an answer within the character limit specified in the prompt.

In [None]:
for i, character in enumerate(response):
    print(f"{i}\t'{character}'")
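
Instead of counting the printed characters by eye, the limit can also be checked programmatically. The sketch below assumes that surrounding whitespace should not count towards the limit. The helper name `check_char_limit` and the sample text are hypothetical.

```python
def check_char_limit(text: str, max_chars: int) -> tuple[int, bool]:
    """Return the number of characters in `text` and whether it fits the limit.

    Leading and trailing whitespace is ignored (an assumption of this sketch).
    """
    n_chars = len(text.strip())
    return n_chars, n_chars <= max_chars


# Stand-in for a model response (made up for illustration).
sample_response = "Cute, soft panda plush; small for the price but arrived early.\n"
n_chars, ok = check_char_limit(sample_response, 100)
print(f"{n_chars} characters, within limit: {ok}")
```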

In [None]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.

Summarize the review below, delimited by triple
backticks, in one single sentence.

Review: ```{prod_review}```
"""

show_completion(prompt)

### Summarize focusing on specific topics of the text

#### Focus on shipping and delivery

In [None]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping department.

Summarize the review below, delimited by triple
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product.

Review: ```{prod_review}```
"""

show_completion(prompt)

#### Focus on price and value

In [None]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing department, responsible for determining the \
price of the product.

Summarize the review below, delimited by triple
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value.

Review: ```{prod_review}```
"""

show_completion(prompt)

### Extract information instead of summarizing

In [None]:
prompt = f"""
Your task is to extract relevant information from \
a product review from an ecommerce site to give \
feedback to the Shipping department.

From the review below, delimited by triple backticks, \
extract the information relevant to shipping and \
delivery. Limit to 30 words.

Review: ```{prod_review}```
"""

show_completion(prompt)

### Summarize multiple product reviews

In [None]:
review_1 = prod_review

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one \
had additional storage and not too high of a price \
point. Got it fast - arrived in 2 days. The string \
to the lamp broke during the transit and the company \
happily sent over a new one. Came within a few days \
as well. It was easy to put together. Then I had a \
missing part, so I contacted their support and they \
very quickly got me the missing piece! Seems to me \
to be a great company that cares about their customers \
and products.
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, \
which is why I got this. The battery life seems to be \
pretty impressive so far. After initial charging and \
leaving the charger plugged in for the first week to \
condition the battery, I've unplugged the charger and \
been using it for twice daily brushing for the last \
3 weeks all on the same charge. But the toothbrush head \
is too small. I've seen baby toothbrushes bigger than \
this one. I wish the head was bigger with different \
length bristles to get between teeth better because \
this one doesn't.  Overall if you can get this one \
around the $50 mark, it's a good deal. The manufactuer's \
replacements heads are pretty expensive, but you can \
get generic ones that're more reasonably priced. This \
toothbrush makes me feel like I've been to the dentist \
every day. My teeth feel sparkly clean!
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal \
sale for around $49 in the month of November, about \
half off, but for some reason (call it price gouging) \
around the second week of December the prices all went \
up to about anywhere from between $70-$89 for the same \
system. And the 11 piece system went up around $10 or \
so in price also from the earlier sale price of $29. \
So it looks okay, but if you look at the base, the part \
where the blade locks into place doesn't look as good \
as in previous editions from a few years ago, but I \
plan to be very gentle with it (example, I crush \
very hard items like beans, ice, rice, etc. in the \
blender first then pulverize them in the serving size \
I want in the blender then switch to the whipping \
blade for a finer flour, and use the cross cutting blade \
first when making smoothies, then use the flat blade \
if I need them finer/less pulpy). Special tip when making \
smoothies, finely cut and freeze the fruits and \
vegetables (if using spinach-lightly stew soften the \
spinach then freeze until ready for use-and if making \
sorbet, use a small to medium sized food processor) \
that you plan to use that way you can avoid adding so \
much ice if at all-when making your smoothie. \
After about a year, the motor was making a funny noise. \
I called customer service but the warranty expired \
already, so I had to buy another one. FYI: The overall \
quality has gone done in these types of products, so \
they are kind of counting on brand recognition and \
consumer loyalty to maintain sales. Got it in about \
two days.
"""

reviews = [review_1, review_2, review_3, review_4]

In [None]:
for i, review in enumerate(reviews, 1):
    prompt = f"""
    Your task is to generate a short summary of a product \
    review from an ecommerce site.

    Summarize the review below, delimited by triple \
    backticks in at most 20 words.

    Review: ```{review}```
    """
    response = get_completion(prompt)
    render_markdown(f"## Review {i}\n{response}\n")
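
When the same prompt is reused for many inputs, it can help to build it in one function so that the word limit becomes a parameter and the indentation of the loop body does not leak into the prompt. This is a minimal sketch; `build_summary_prompt` is a hypothetical helper, not part of any library.

```python
FENCE = "`" * 3  # triple backticks used as the delimiter


def build_summary_prompt(review: str, max_words: int = 20) -> str:
    """Build the summarization prompt for one review."""
    instructions = (
        "Your task is to generate a short summary of a product "
        "review from an ecommerce site.\n\n"
        "Summarize the review below, delimited by triple "
        f"backticks, in at most {max_words} words.\n\n"
    )
    return f"{instructions}Review: {FENCE}{review.strip()}{FENCE}"


print(build_summary_prompt("Got it fast. Easy to put together.", max_words=15))
```

In the notebook, the loop body would then reduce to `get_completion(build_summary_prompt(review))`.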

### Time to practice on your own (10 min)

## 3. Text classification

Examples:

This product is great - SENTIMENT POSITIVE

I would not buy this product again, it is useless - SENTIMENT NEGATIVE

In [None]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

### Sentiment Analysis

In [None]:
# sentiment analysis verbose
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Review text: ```{lamp_review}```
"""
show_completion(prompt)

In [None]:
# sentiment analysis just label
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Give your answer as a single word, either "positive" \
or "negative".

Review text: ```{lamp_review}```
"""
show_completion(prompt)
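
Even when asked for a single word, a model may answer with different casing or trailing punctuation (for example `Positive.`). If the label feeds into further code, it is worth normalizing and validating it first. This is a minimal sketch; `normalize_sentiment` and `ALLOWED_LABELS` are hypothetical names.

```python
import string

ALLOWED_LABELS = {"positive", "negative"}


def normalize_sentiment(raw: str) -> str:
    """Lower-case the label, drop surrounding punctuation, and validate it."""
    label = raw.strip(string.punctuation + string.whitespace).lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected sentiment label: {raw!r}")
    return label


print(normalize_sentiment("Positive."))
```

In the notebook you would call it as `normalize_sentiment(get_completion(prompt))`.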

### Emotion detection

In [None]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: ```{lamp_review}```
"""
show_completion(prompt)
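
A comma-separated answer like the one requested above is easy to turn into a Python list. This is a minimal sketch; `parse_comma_list` is a hypothetical helper and the sample answer is made up for illustration.

```python
def parse_comma_list(raw: str, max_items: int = 5) -> list[str]:
    """Split a comma-separated answer into clean, lower-case items."""
    items = [item.strip().lower() for item in raw.split(",")]
    items = [item for item in items if item]  # drop empty entries
    return items[:max_items]


# Stand-in for a model response (made up for illustration).
sample_response = "happy, satisfied, Grateful , "
print(parse_comma_list(sample_response))
```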

### Anger detection

In [None]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple backticks. \
Give your answer as either yes or no.

Review text: ```{lamp_review}```
"""
show_completion(prompt)

### Extract entities

Scope:

- Extraction of product names and company names

In [None]:
prompt = f"""
Identify the following items from the review text:
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: ```{lamp_review}```
"""
show_completion(prompt)

### Solving multiple tasks at once

Scope:

- Sentiment analysis
- Anger detection
- Extraction of entities: product and company names

In [None]:
prompt = f"""
Identify the following items from the review text:
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: ```{lamp_review}```
"""
show_completion(prompt)
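
Asking for JSON pays off when the answer is parsed into a Python object. Models sometimes wrap the JSON in a Markdown code fence, so the sketch below strips a multi-line fence before parsing. `parse_json_response` is a hypothetical helper, and the sample answer is made up for illustration rather than actual model output.

```python
import json

FENCE = "`" * 3  # Markdown code fence


def parse_json_response(raw: str) -> dict:
    """Parse a JSON answer, removing a surrounding Markdown code fence if present."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line (may carry "json")
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)


# Stand-in for a model response (made up for illustration).
sample_response = (
    FENCE + "json\n"
    '{"Sentiment": "positive", "Anger": false, "Item": "lamp", "Brand": "Lumina"}\n'
    + FENCE
)
result = parse_json_response(sample_response)
print(result["Sentiment"], result["Anger"])
```

With `get_completion(prompt)` in place of `sample_response`, the `Anger` value would arrive as a Python boolean, provided the model follows the formatting instructions.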

### Topic detection (open)

In [None]:
story = """
In a recent survey conducted by the government,
public sector employees were asked to rate their level
of satisfaction with the department they work at.
The results revealed that NASA was the most popular
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings,
stating, "I'm not surprised that NASA came out on top.
It's a great place to work with amazing people and
incredible opportunities. I'm proud to be a part of
such an innovative organization."

The results were also welcomed by NASA's management team,
with Director Tom Johnson stating, "We are thrilled to
hear that our employees are satisfied with their work at NASA.
We have a talented and dedicated team who work tirelessly
to achieve our goals, and it's fantastic to see that their
hard work is paying off."

The survey also revealed that the
Social Security Administration had the lowest satisfaction
rating, with only 45% of employees indicating they were
satisfied with their job. The government has pledged to
address the concerns raised by employees in the survey and
work towards improving job satisfaction across all departments.
"""

In [None]:
# infer 5 topics
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: ```{story}```
"""
show_completion(prompt)

### Topic detection from a pre-defined list of topics

In [None]:
topic_list = [
    "nasa",
    "local government",
    "engineering",
    "employee satisfaction",
    "federal government",
]

In [None]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answer as a list with 0 or 1 for each topic.

List of topics: {", ".join(topic_list)}

Text sample: ```{story}```
"""
show_completion(prompt)
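
The 0/1 answer can be mapped back to the topic names, for example to trigger an alert when a topic of interest appears. This is a minimal sketch that assumes the answer contains exactly one 0 or 1 per topic, in the same order as the list. `parse_topic_flags` is a hypothetical helper and the sample answer is made up for illustration.

```python
import re


def parse_topic_flags(raw: str, topics: list[str]) -> dict[str, bool]:
    """Map each topic to True/False based on the 0/1 flags found in `raw`."""
    flags = re.findall(r"\b[01]\b", raw)  # standalone 0 or 1 only
    if len(flags) != len(topics):
        raise ValueError(f"Expected {len(topics)} flags, found {len(flags)}: {raw!r}")
    return {topic: flag == "1" for topic, flag in zip(topics, flags)}


topics = [
    "nasa",
    "local government",
    "engineering",
    "employee satisfaction",
    "federal government",
]
# Stand-in for a model response (made up for illustration).
sample_response = "[1, 0, 0, 1, 1]"
print(parse_topic_flags(sample_response, topics))
```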

### Time to practice on your own (10 min)

## 4. Translation

ChatGPT has been trained on multilingual data, so the model is also capable of translating text.

In [None]:
text = """
Barcelona is a city on the northeastern coast of Spain. \
It is the capital and largest city of the autonomous community of Catalonia, \
as well as the second-most populous municipality of Spain. \
With a population of 1.6 million within city limits, \
its urban area extends to numerous neighbouring municipalities within the province \
of Barcelona and is home to around 4.8 million people, \
making it the fifth most populous urban area in the European Union after Paris, \
the Ruhr area, Madrid and Milan. \
It is one of the largest metropolises on the Mediterranean Sea, \
located on the coast between the mouths of the rivers Llobregat and Besòs, \
bounded to the west by the Serra de Collserola mountain range.
"""

In [None]:
prompt = f"""
Translate the following English text delimited by tags <text> to Spanish. \

<text>{text}</text>
"""
show_completion(prompt)
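
Tag delimiters are clearest when the closing tag differs from the opening one (`</text>` rather than a second `<text>`), so that the boundaries of the input are unambiguous. A small helper keeps the pair consistent. This is a minimal sketch; `wrap_in_tags` is a hypothetical name.

```python
def wrap_in_tags(text: str, tag: str = "text") -> str:
    """Wrap `text` in a matching pair of opening and closing tags."""
    return f"<{tag}>{text.strip()}</{tag}>"


sample = "Barcelona is a city on the northeastern coast of Spain."
prompt = (
    "Translate the following English text delimited by tags <text> to Spanish.\n\n"
    + wrap_in_tags(sample)
)
print(prompt)
```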

### Language detection

In [None]:
prompt = """
Tell me the language of the following text delimited by tags <text>. \

<text>
Barcelone est une ville située sur la côte nord-est de l'Espagne. \
C'est la capitale et la plus grande ville de la communauté autonome de Catalogne, \
ainsi que la deuxième municipalité la plus peuplée d'Espagne. \
Avec une population de 1,6 million d'habitants dans les limites de la ville, \
son aire urbaine s'étend à de nombreuses municipalités voisines de la province \
de Barcelone et abrite environ 4,8 millions de personnes, \
ce qui en fait la cinquième aire urbaine la plus peuplée de l'Union européenne \
après Paris, la région de la Ruhr, Madrid et Milan. \
C'est l'une des plus grandes métropoles de la mer Méditerranée, \
située sur la côte entre les embouchures des rivières Llobregat et Besòs, \
bordée à l'ouest par la chaîne de montagnes de la Serra de Collserola.
</text>
"""
show_completion(prompt)

### Multilingual translation

In [None]:
prompt = f"""
Translate the following English text delimited by tags <text> to Spanish and French. \

<text>{text}</text>
"""
show_completion(prompt)

### Language detection + multilingual translation

In [None]:
user_messages = [
    "La performance du système est plus lente que d'habitude.",  # System performance is slower than normal
    "Mi monitor tiene píxeles que no se iluminan.",  # My monitor has pixels that are not lighting
    "Il mio mouse non funziona",  # My mouse is not working
    "Mój klawisz Ctrl jest zepsuty",  # My keyboard has a broken control key
    "我的屏幕在闪烁",  # My screen is flashing
]

In [None]:
for issue in user_messages:
    prompt = f"Tell me what language this is: ```{issue}```"
    lang = get_completion(prompt)

    prompt = f"""
    Translate the following text to English \
    and Korean: ```{issue}```
    """
    response = get_completion(prompt)
    render_markdown(f"**Original message** ({lang}): {issue}\n{response}")

### Adapt tone of the text

In [None]:
prompt = """
Translate the following from slang to a business letter:
'Dude, This is Joe, check out this spec on this standing lamp.'
"""
show_completion(prompt)

### Spellcheck and grammar check

In [None]:
texts = [
    "The girl with the black and white puppies have a ball.",  # The girl has a ball.
    "Yolanda has her notebook.",  # ok
    "Its going to be a long day. Does the car need it's oil changed?",  # Homonyms
    "Their goes my freedom. There going to bring they're suitcases.",  # Homonyms
    "Your going to need you're notebook.",  # Homonyms
    "That medicine effects my ability to sleep. Have you heard of the butterfly affect?",  # Homonyms
    "This phrase is to cherck chatGPT for speling abilitty",  # spelling
]

for text in texts:
    prompt = f"""Proofread and correct the following text
  and rewrite the corrected version. If you don't find
  any errors, just say "No errors found". Don't use
  any punctuation around the text:
  ```{text}```

  """
    response = get_completion(prompt)
    render_markdown(f"## Original text\n{text}\n## Corrected text\n{response}\n")

As you can see, the generated answers are not always accurate.
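
To see exactly which words the model changed, the original and corrected texts can be compared with the standard-library `difflib` module. This is a minimal sketch that compares whitespace-separated words. `word_diff` is a hypothetical helper, and the corrected sentence is written by hand for illustration rather than taken from model output.

```python
import difflib


def word_diff(original: str, corrected: str) -> list[str]:
    """Return the removed ("- word") and added ("+ word") words."""
    diff = difflib.ndiff(original.split(), corrected.split())
    # Keep only removals and additions; skip unchanged words and "?" hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


original = "Your going to need you're notebook."
corrected = "You're going to need your notebook."  # hand-written correction
for change in word_diff(original, corrected):
    print(change)
```

An empty result means the model returned the text unchanged, which is a quick way to check the sentences that contained no errors.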