# Day 1 - Examples: Classification

This notebook works through examples of classification via prompting: given some text, we ask the model to assign it a label such as a topic or a sentiment.

References/Further Reading:

* This tutorial is partially based on https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/examples/prompt-design/text_classification.ipynb

In [1]:
# Load environment variables
from dotenv import load_dotenv

load_dotenv("../../.env")

True

# News Classification

## Setup template

We set up a template here that tells the model to classify news headlines into one of five categories. Notice that we provide the model with several examples (four here) of what the output should look like. This is called *few-shot prompting*, and it helps the model "understand" the kind of answer that we want from it.

Later in this notebook, when we cover sentiment analysis, we perform *zero-shot prompting*. That is, we do not give the model an example of what it should do; we simply ask it for an answer directly. Between zero-shot and few-shot prompting lies *one-shot prompting*, where we give the model a single example. We illustrate this towards the end of the notebook, where we cover a concept known as Chain of Thought (CoT).
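
To make the difference between zero-, one- and few-shot prompts concrete, here is a minimal sketch that assembles a prompt from a list of `(text, label)` examples. The helper name `build_prompt` and its arguments are illustrative assumptions, not part of any library; the only thing that changes between the three styles is how many examples are passed in.

```python
def build_prompt(instruction, examples, text):
    """Assemble a zero-, one-, or few-shot prompt from (text, label) pairs.

    Hypothetical helper for illustration: an empty `examples` list gives a
    zero-shot prompt, one pair gives one-shot, several pairs give few-shot.
    """
    parts = [instruction, ""]
    for example_text, label in examples:
        parts.append(f"Text: {example_text}")
        parts.append(f"The answer is: {label}")
        parts.append("")
    # The item to classify always comes last, with the answer left open.
    parts.append(f"Text: {text}")
    parts.append("The answer is:")
    return "\n".join(parts)


instruction = (
    "Classify the given news headlines into one of the following categories: "
    "[business, entertainment, health, sports, technology]"
)
examples = [
    ("Quit smoking?", "health"),
    ("Birdies or bogeys? Top 5 tips to hit under par", "sports"),
]

zero_shot = build_prompt(instruction, [], "<article>")
one_shot = build_prompt(instruction, examples[:1], "<article>")
few_shot = build_prompt(instruction, examples, "<article>")

print(few_shot)
```

Keeping the examples in a list makes it easy to experiment with how many examples a given model needs.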

In [2]:
selected_article = "<article>"
template_prompt = f"""
Classify the given news headlines into one of the following categories: [business, entertainment, health, sports, technology]

Text: Pixel 7 Pro Expert Hands On Review. 
The answer is: technology 

Text: Quit smoking? 
The answer is: health 

Text: Birdies or bogeys? Top 5 tips to hit under par 
The answer is: sports 

Text: Relief from local minimum-wage hike looking more remote 
The answer is: business 

Text: {selected_article} 
The answer is: 
"""

print(template_prompt)


Classify the given news headlines into one of the following categories: [business, entertainment, health, sports, technology]

Text: Pixel 7 Pro Expert Hands On Review. 
The answer is: technology 

Text: Quit smoking? 
The answer is: health 

Text: Birdies or bogeys? Top 5 tips to hit under par 
The answer is: sports 

Text: Relief from local minimum-wage hike looking more remote 
The answer is: business 

Text: <article> 
The answer is: 



#### User Input

Here we add in the document we want the model to read. Feel free to modify this to a document of your choice!

In [3]:
selected_article = "Introducing Apple Vision Pro: Apple’s first spatial computer"

## OpenAI

As you might expect by now, it's just a matter of sending your prompt to the model!

In [4]:
import openai

prompt = template_prompt.replace('<article>', selected_article)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=1,
  max_tokens=1500,
)

print(chatbot_response.choices[0].message["content"])

technology


## Google

You know the drill now!

In [5]:
from vertexai.preview.language_models import TextGenerationModel

prompt = template_prompt.replace('<article>', selected_article)

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(prompt, max_output_tokens=1024)
print(response.text)

technology


## Open Source LLM: Falcon-7B-Instruct

In [6]:
import requests
import os

prompt = template_prompt.replace('<article>', selected_article)

headers = {"Authorization": f"Bearer {os.environ.get('HUGGINGFACEHUB_API_TOKEN')}"}

def query(payload):
    response = requests.request("POST", os.environ.get("HUGGINGFACEHUB_ENDPOINT"), headers=headers, json=payload)
    return response.json()

data = query({"inputs": prompt, "parameters": {"max_new_tokens": 10, "return_full_text": False, "top_k": 10, "temperature": 1}})

print(data[0]['generated_text'])


Classification:
- Business
- Technology


Note how we had to adjust the generation parameters (`max_new_tokens`, `top_k`, `temperature`) for Falcon, and even then the output lists two categories instead of a single label. This is mostly because Falcon-7B is a much smaller model than the other two. A larger open source model such as Falcon-40B would likely do a lot better!
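
When a smaller model returns loosely formatted text like this, one option is to post-process the output and keep only the labels we allow. The sketch below is an assumption about how one might do that; `extract_labels` is a hypothetical helper, and the sample string simply mirrors the Falcon output shown above.

```python
import re

CATEGORIES = ["business", "entertainment", "health", "sports", "technology"]


def extract_labels(raw_output, categories=CATEGORIES):
    """Return the allowed labels found in raw_output, ordered by first appearance."""
    lowered = raw_output.lower()
    found = []
    for category in categories:
        # Whole-word match so that a label inside a longer word is not counted.
        match = re.search(rf"\b{re.escape(category)}\b", lowered)
        if match:
            found.append((match.start(), category))
    return [category for _, category in sorted(found)]


falcon_output = "\nClassification:\n- Business\n- Technology\n"
print(extract_labels(falcon_output))  # ['business', 'technology']
print(extract_labels("technology"))   # ['technology']
```

A result with more than one label (or none) can then be flagged as ambiguous instead of being silently counted as a prediction.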

# Sentiment Classification

Let's consider sentiment classification as another example. Given some text, we need to determine if the text is positive, negative or neutral. As mentioned above, we follow a *zero-shot prompting* technique here, so we don't give the model prior examples.

## Setup template

This time we set up a zero-shot prompting example to classify review sentiment.

Feel free to modify this to your own needs!

In [7]:
selected_article = "<article>"
template_prompt = f"""Classify the sentiment of the following review as "positive", "neutral" or "negative".

Text: {selected_article} 
The answer is: 
"""

print(template_prompt)

Classify the sentiment of the following review as "positive", "neutral" or "negative".

Text: <article> 
The answer is: 



#### User Input

Here we add in the document we want the model to read. Feel free to modify this to a document of your choice!

In [8]:
selected_article = "I loved the new Spider-Man movie!! The animation was really fluid"

## OpenAI

In [9]:
import openai

prompt = template_prompt.replace('<article>', selected_article)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=1,
  max_tokens=1500,
)

print(chatbot_response.choices[0].message["content"])

positive


## Google

In [10]:
from vertexai.preview.language_models import TextGenerationModel

prompt = template_prompt.replace('<article>', selected_article)

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(prompt, max_output_tokens=1024)
print(response.text)

positive


## Open Source LLM: Falcon-7B-Instruct

In [11]:
import requests
import os

prompt = template_prompt.replace('<article>', selected_article)

headers = {"Authorization": f"Bearer {os.environ.get('HUGGINGFACEHUB_API_TOKEN')}"}

def query(payload):
    response = requests.request("POST", os.environ.get("HUGGINGFACEHUB_ENDPOINT"), headers=headers, json=payload)
    return response.json()

data = query({"inputs": prompt, "parameters": {"max_new_tokens": 250, "return_full_text": False}})

print(data[0]['generated_text'])


Positive


# Chain of Thought (CoT) Classification

Chain of Thought is a prompting technique that can help to improve a model's output by forcing it to reason about its actions. In essence, it allows us to "see" the model's thought process. You can read more about Chain of Thought [here](https://www.promptingguide.ai/techniques/cot).

Let's perform CoT reasoning on the same sentiment analysis problem from above. Note that we only show this with a single API, but given that only the prompt changes, you should be able to use it with any of the other APIs!

### First, let's find a situation where the model fails.

In [12]:
selected_article = "<article>"
template_prompt = f"""Classify the sentiment of the following review as "positive", "neutral" or "negative".

Text: {selected_article} 
The answer is: 
"""

print(template_prompt)

Classify the sentiment of the following review as "positive", "neutral" or "negative".

Text: <article> 
The answer is: 



In [13]:
selected_article = "I really loved the new Spider-Man movie. The sub-par animation and acting really made it amazing!"
# Note: This is a lie. The movie is actually fantastic and highly recommended!

In [14]:
import openai

prompt = template_prompt.replace('<article>', selected_article)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=1,
  max_tokens=1500,
)

print(chatbot_response.choices[0].message["content"])

neutral


With a little bit of sarcasm, we seem to have tricked our model into thinking we were neutral about the new Spider-Man movie.

### Now, let's try to fix this using CoT reasoning!

Notice that we provide a single example (one-shot prompting) and explain our reasoning in the answer there (chain of thought).

In [15]:
selected_article = "<article>"
template_prompt = f"""Classify the sentiment of the following review as "positive", "neutral" or "negative".

Text: My new pillow is so soft and fluffy! I don't like it.
The answer is: The reviewer initially seemed to praise the pillow but then said that they didn't actually like it.
Therefore the sentiment is "negative".

Text: {selected_article} 
The answer is: 
"""

print(template_prompt)

Classify the sentiment of the following review as "positive", "neutral" or "negative".

Text: My new pillow is so soft and fluffy! I don't like it.
The answer is: The reviewer initially seemed to praise the pillow but then said that they didn't actually like it.
Therefore the sentiment is "negative".

Text: <article> 
The answer is: 



In [16]:
selected_article = "I really loved the new Spider-Man movie. The sub-par animation and acting really made it amazing!"

In [17]:
import openai

prompt = template_prompt.replace('<article>', selected_article)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=1,
  max_tokens=1500,
)

print(chatbot_response.choices[0].message["content"])

The reviewer used sarcasm to express their dislike for the animation and acting in the movie. Therefore, the sentiment is "negative".


That's a lot better! The model has recognized the sarcasm there and predicts the right sentiment!
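
A CoT answer contains the reasoning as well as the label, so the label has to be pulled out before it can be compared against a ground truth. Below is a minimal sketch, assuming the conclusion comes at the end of the response as in the one-shot example; `final_label` is a hypothetical helper, and it returns the last allowed label mentioned in the text.

```python
import re

LABELS = ("positive", "neutral", "negative")


def final_label(cot_response, labels=LABELS):
    """Return the last allowed label mentioned in a chain-of-thought answer, or None."""
    pattern = r"\b(" + "|".join(labels) + r")\b"
    matches = re.findall(pattern, cot_response.lower())
    return matches[-1] if matches else None


response = (
    "The reviewer used sarcasm to express their dislike for the animation "
    'and acting in the movie. Therefore, the sentiment is "negative".'
)
print(final_label(response))  # negative
```

Taking the last match is a heuristic: the reasoning may mention other labels along the way (as in "seemed positive at first"), and the conclusion is assumed to follow them.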

# Answer Spaces & Evaluation

A natural question to ask is: how do we evaluate these models?

For Natural Language Generation (NLG) tasks, the set of possible answers is unconstrained. That is, anything the model generates could potentially be an answer. There are a couple of methods to evaluate this, some of which we cover in more detail on Day 2.

For Natural Language Understanding (NLU), the set of possible answers is constrained. That is, there is a finite set of answers that we care about. For instance, negative, positive or neutral for sentiment analysis. We can then evaluate these models with tried and tested metrics like Precision and/or Recall. So how do we get these answers?

For open source models, or models that we have direct access to, we can restrict the prediction to the answers we care about by calculating the conditional probability of each one. That is, for each possible answer X, we calculate the probability `P(X|input)`. The answer with the highest probability is then the prediction.
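
As a rough illustration of this scoring step, the sketch below assumes we have already obtained, from a model we control, the per-token log-probabilities of each candidate answer given the input. The numbers are made up for illustration and `pick_answer` is a hypothetical helper. The log-probability of an answer is the sum of its token log-probabilities; we then renormalize over the candidate set and take the most likely answer.

```python
import math


def pick_answer(candidate_token_logprobs):
    """Pick the most probable answer from a constrained candidate set.

    candidate_token_logprobs maps each answer to the list of per-token
    log-probabilities of that answer given the input.
    """
    # log P(X|input) is the sum of the answer's token log-probabilities.
    scores = {answer: sum(lps) for answer, lps in candidate_token_logprobs.items()}

    # Renormalize over the candidates only (softmax, shifted for stability).
    max_score = max(scores.values())
    exp_scores = {answer: math.exp(s - max_score) for answer, s in scores.items()}
    total = sum(exp_scores.values())
    probs = {answer: value / total for answer, value in exp_scores.items()}

    prediction = max(probs, key=probs.get)
    return prediction, probs


# Made-up log-probabilities; "negative" is split into two tokens here.
token_logprobs = {
    "positive": [-0.2],
    "neutral": [-2.5],
    "negative": [-1.9, -0.4],
}

prediction, probs = pick_answer(token_logprobs)
print(prediction)
print({answer: round(p, 3) for answer, p in probs.items()})
```

Note that answers spanning more tokens tend to get lower summed log-probabilities, so some setups normalize by length; whether that is appropriate depends on the model and tokenizer.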

However, this approach is not possible for closed source models where we only receive direct predictions. While we unfortunately don't have access to the actual confidence of a prediction, we can still get the prediction itself and evaluate the model. Consider the following prompts:

In [18]:
selected_article = "<article>"
template_prompt = f"""Classify the sentiment of the following review: {selected_article} """

print(template_prompt)

Classify the sentiment of the following review: <article> 


In [19]:
selected_article = "I loved the new Spider-Man movie!! The animation was really fluid"

In [20]:
import openai

prompt = template_prompt.replace('<article>', selected_article)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=1,
  max_tokens=1500,
)

print(chatbot_response.choices[0].message["content"])

The sentiment of the review is positive.


We get the answer we wanted, but there's extra text that makes it hard to work with! If we change the prompt a little, we get a single word!

In [21]:
selected_article = "<article>"
template_prompt = f"""Classify the sentiment of the following review: {selected_article} 
The answer is: 
"""

print(template_prompt)

Classify the sentiment of the following review: <article> 
The answer is: 



In [22]:
selected_article = "I loved the new Spider-Man movie!! The animation was really fluid"

In [23]:
import openai

prompt = template_prompt.replace('<article>', selected_article)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

chatbot_response = openai.ChatCompletion.create(
  model="gpt-4",
  messages=messages,
  temperature=1,
  max_tokens=1500,
)

print(chatbot_response.choices[0].message["content"])

Positive


Thus, by adjusting our prompts a little, we can nudge the model to give the outputs we want in the format that we prefer. Now it's just a matter of collecting the model outputs for a number of different examples and computing overall statistics!
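
A minimal sketch of that last step is given here, using made-up model outputs and gold labels (none of these are real results). The helpers `normalize` and `precision_recall` are hypothetical; outputs that do not reduce to exactly one allowed label are counted as `"unparsed"`, which shows why a clean single-word answer matters.

```python
LABELS = ("positive", "neutral", "negative")


def normalize(raw_output):
    """Strip whitespace, quotes and full stops, lowercase, and check the label."""
    cleaned = raw_output.strip().strip('".').lower()
    return cleaned if cleaned in LABELS else "unparsed"


def precision_recall(gold, predicted, label):
    """Compute precision and recall for a single label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up raw model outputs and gold labels, for illustration only.
raw_outputs = [
    "Positive",
    "negative",
    "The sentiment of the review is positive.",
    "Neutral\n",
    "positive",
]
gold = ["positive", "negative", "positive", "negative", "neutral"]

predicted = [normalize(output) for output in raw_outputs]
print(predicted)

for label in LABELS:
    precision, recall = precision_recall(gold, predicted, label)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

The third output is the verbose style of answer from earlier; because it is not a bare label it ends up as `"unparsed"` and lowers recall for the positive class.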