# M-Shots Learning

In this notebook, we'll explore some simple prompt engineering techniques and recommendations that help us get responses from the models that better suit our needs.

In [8]:
! pip install python-dotenv
! pip install openai


In [11]:
from openai import OpenAI
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # read from the .env file loaded above

# Formatting the answer with Few Shot Samples.

To obtain the model's response in a specific format, we have various options, but one of the most convenient is to use Few-Shot Samples. This involves presenting the model with pairs of user queries and example responses.

Large models like GPT-3.5 respond well to the examples provided, adapting their response to the specified format.

Depending on the number of examples given, this technique can be referred to as:
* Zero-Shot.
* One-Shot.
* Few-Shots.

One shot is often enough, and it is recommended to use a maximum of six. Keep in mind that these examples are sent with every query and take up space in the input prompt.
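Because the shots travel with every query, each extra example makes every request larger. The sketch below is a hypothetical helper, not part of the original notebook. It embeds 0, 1 or 2 of the F1 examples used later in this notebook into a system prompt, the same way the following cells do, and prints the prompt length. Characters are only a rough stand-in here; the API actually counts tokens.

```python
# Hypothetical shot list, copied from the F1 examples used later in this notebook.
SHOTS = [
    "Who won the 2010 f1 championship?\nDriver: Sebastian Vettel.\nTeam: Red Bull Renault.",
    "Who won the 2009 f1 championship?\nDriver: Jenson Button.\nTeam: BrawnGP.",
]

def system_prompt_with_shots(k, instruction="You are an expert in F1."):
    """Embed the first k shots after the instruction, separated by blank lines."""
    return "\n\n".join([instruction] + SHOTS[:k])

for k, name in [(0, "Zero-Shot"), (1, "One-Shot"), (2, "Few-Shots")]:
    prompt = system_prompt_with_shots(k)
    # This whole string is resent on every call, so its size is paid each time.
    print(f"{name}: {len(prompt)} characters sent with every query")
```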



In [13]:
# Function to call the model.
def return_OAIResponse(user_message, context):
    # api_key defaults to the OPENAI_API_KEY environment variable,
    # so passing it explicitly is optional.
    client = OpenAI(api_key=OPENAI_API_KEY)

    # Copy the context so the caller's list is not modified,
    # then append the new user question.
    newcontext = context.copy()
    newcontext.append({'role': 'user', 'content': "question: " + user_message})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=newcontext,
        temperature=1,
    )

    # Return only the text of the first completion.
    return response.choices[0].message.content
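You can check the message-building half of `return_OAIResponse` without calling the API. This sketch moves it into a hypothetical helper, `build_messages`, that mirrors the function above. It copies the context, so the shared `context_user` list does not grow between calls, and it appends the question with the same `"question: "` prefix.

```python
def build_messages(user_message, context):
    """Mirror of the message construction in return_OAIResponse (no API call)."""
    newcontext = context.copy()  # shallow copy: the caller's list stays unchanged
    newcontext.append({'role': 'user', 'content': "question: " + user_message})
    return newcontext

context_user = [{'role': 'system', 'content': 'You are an expert in F1.'}]
messages = build_messages("Who won the F1 2010?", context_user)

print(len(context_user))        # the original context still holds one message
print(messages[-1]['content'])  # question: Who won the F1 2010?
```

Because `.copy()` is shallow, the message dictionaries themselves are shared. That is fine here, since none of them is ever mutated.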

In this zero-shot prompt we get a correct answer, but without any particular format: the model includes whatever information it chooses.

In [14]:
#zero-shot
context_user = [
    {'role':'system', 'content':'You are an expert in F1.'}
]
print(return_OAIResponse("Who won the F1 2010?", context_user))

Sebastian Vettel won the F1 World Championship in 2010 driving for the Red Bull Racing team.


For a model as large and capable as GPT-3.5, a single shot is enough for it to pick up the output format we expect.


In [15]:
#one-shot
context_user = [
    {'role':'system', 'content':
     """You are an expert in F1.

     Who won the 2000 f1 championship?
     Driver: Michael Schumacher.
     Team: Ferrari."""}
]
print(return_OAIResponse("Who won the F1 2011?", context_user))

Driver: Sebastian Vettel.
Team: Red Bull Racing.


Smaller models, or more complicated formats, may require more than one shot. Here is an example with two shots.

In [16]:
#Few shots
context_user = [
    {'role':'system', 'content':
     """You are an expert in F1.

     Who won the 2010 f1 championship?
     Driver: Sebastian Vettel.
     Team: Red Bull Renault.

     Who won the 2009 f1 championship?
     Driver: Jenson Button.
     Team: BrawnGP."""}
]
print(return_OAIResponse("Who won the F1 2006?", context_user))

Driver: Fernando Alonso.
Team: Renault.


In [17]:
print(return_OAIResponse("Who won the F1 2019?", context_user))

The 2019 Formula 1 World Championship was won by Lewis Hamilton from the Mercedes team.


We've been building the prompt without using OpenAI's roles and, as we've seen, it mostly worked, although for 2019 the model answered in free text instead of the requested format.

However, the proper way to do this is to build the prompt with these roles, which makes the examples even more effective.

Instead of feeding the model the whole prompt as if it were system instructions, we present the examples as a previous conversation of user questions and assistant answers. This is closer to how the model is normally used, so it can learn the pattern from the exchange.

In [18]:
# Recommended solution
context_user = [
    {'role':'system', 'content':'You are an expert in F1.\n\n'},
    {'role':'user', 'content':'Who won the 2010 f1 championship?'},
    {'role':'assistant', 'content':"""Driver: Sebastian Vettel. \nTeam: Red Bull. \nPoints: 256. """},
    {'role':'user', 'content':'Who won the 2009 f1 championship?'},
    {'role':'assistant', 'content':"""Driver: Jenson Button. \nTeam: BrawnGP. \nPoints: 95. """},
]

print(return_OAIResponse("Who won the F1 2019?", context_user))

Driver: Lewis Hamilton. 
Team: Mercedes. 
Points: 413.
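When the shots are kept as data, it is easy to add or drop examples. Here is a minimal sketch, assuming a hypothetical helper `few_shot_context` that is not part of the notebook. It turns (question, answer) pairs into the alternating user/assistant messages used in the recommended solution.

```python
def few_shot_context(system, examples):
    """Turn (question, answer) pairs into alternating user/assistant messages."""
    context = [{'role': 'system', 'content': system}]
    for question, answer in examples:
        context.append({'role': 'user', 'content': question})
        context.append({'role': 'assistant', 'content': answer})
    return context

context_user = few_shot_context(
    "You are an expert in F1.",
    [
        ("Who won the 2010 f1 championship?",
         "Driver: Sebastian Vettel.\nTeam: Red Bull.\nPoints: 256."),
        ("Who won the 2009 f1 championship?",
         "Driver: Jenson Button.\nTeam: BrawnGP.\nPoints: 95."),
    ],
)
print([m['role'] for m in context_user])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

The resulting list can be passed to `return_OAIResponse("Who won the F1 2019?", context_user)` exactly like the hand-written one.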


We could also solve this with a more conventional prompt that describes what we want and the format we expect.

However, it's important to understand that in this case the model is following instructions, whereas with shots it is learning from the examples in real time, during inference.

In [19]:
context_user = [
    {'role':'system', 'content':"""You are and expert in f1.
    You are going to answer the question of the user giving the name of the rider,
    the name of the team and the points of the champion, following the format:
    Drive:
    Team:
    Points: """
    }
]

print(return_OAIResponse("Who won the F1 2019?", context_user))

Drive: Lewis Hamilton
Team: Mercedes
Points: 413
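A fixed format is useful because the answer can then be processed by code. This sketch uses a hypothetical `parse_answer` helper that splits the "Key: value" lines returned above into a dictionary. It also drops a trailing period, as in the answers produced by the shots.

```python
def parse_answer(text):
    """Split 'Key: value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().rstrip(".")
    return fields

# The answer returned above, copied as a string.
answer = "Drive: Lewis Hamilton\nTeam: Mercedes\nPoints: 413"
fields = parse_answer(answer)
print(fields)  # {'Drive': 'Lewis Hamilton', 'Team': 'Mercedes', 'Points': '413'}
```

The key comes out as `Drive` because the model copied the label exactly as the prompt spelled it. A parser like this is only as reliable as the format the prompt specifies.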


In [20]:
context_user = [
    {'role':'system', 'content':
     """You are classifying .

     Who won the 2010 f1 championship?
     Driver: Sebastian Vettel.
     Team: Red Bull Renault.

     Who won the 2009 f1 championship?
     Driver: Jenson Button.
     Team: BrawnGP."""}
]
print(return_OAIResponse("Who won the F1 2006?", context_user))

Driver: Fernando Alonso.
Team: Renault.


Few shots can also be used for classification.


In [21]:
context_user = [
    {'role':'system', 'content':
     """You are an expert in reviewing product opinions and classifying them as positive or negative.

     It fulfilled its function perfectly, I think the price is fair, I would buy it again.
     Sentiment: Positive

     It didn't work bad, but I wouldn't buy it again, maybe it's a bit expensive for what it does.
     Sentiment: Negative.

     I wouldn't know what to say, my son uses it, but he doesn't love it.
     Sentiment: Neutral
     """}
]
print(return_OAIResponse("I'm not going to return it, but I don't plan to buy it again.", context_user))

Sentiment: Negative


# Exercise
 - Complete the prompts, similar to what we did in class.
     - Try at least 3 versions.
     - Be creative.
 - Write a one-page report summarizing your findings.
     - Were there variations that didn't work well, i.e., cases where GPT hallucinated or gave a wrong answer?
 - What did you learn?

In [22]:
# Version 1: Zero-Shot Sentiment Classification
# In this example, we provide no prior examples of sentiment classification. We simply prompt the model to classify the sentiment.
context_user = [
    {'role':'system', 'content':'You are an expert in reviewing product opinions and classifying them as positive, neutral, or negative.'}
]
print(return_OAIResponse("I'm not going to return it, but I don't plan to buy it again.", context_user))


Neutral


In [23]:
# Version 2: One-Shot Sentiment Classification
# Here, we provide the model with a single example of a sentiment classification. This will help the model understand the desired format.

context_user = [
    {'role':'system', 'content':
     """You are an expert in reviewing product opinions and classifying them as positive, neutral, or negative.

     It fulfilled its function perfectly, I think the price is fair, I would buy it again.
     Sentiment: Positive
     """}
]
print(return_OAIResponse("I'm not going to return it, but I don't plan to buy it again.", context_user))


Sentiment: Neutral


In [24]:
# Version 3: Few-Shot Sentiment Classification
# In this version, we provide multiple examples (few-shots) to guide the model in learning from the provided context 
# and classifying the new sentence correctly.
context_user = [
    {'role':'system', 'content':
     """You are an expert in reviewing product opinions and classifying them as positive, neutral, or negative.

     It fulfilled its function perfectly, I think the price is fair, I would buy it again.
     Sentiment: Positive

     It didn't work bad, but I wouldn't buy it again, maybe it's a bit expensive for what it does.
     Sentiment: Negative.

     I wouldn't know what to say, my son uses it, but he doesn't love it.
     Sentiment: Neutral
     """}
]
print(return_OAIResponse("I'm not going to return it, but I don't plan to buy it again.", context_user))


Sentiment: Neutral
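The three versions reply in slightly different forms ("Neutral" versus "Sentiment: Neutral"), so comparing them by eye is error-prone. A small hypothetical helper, `normalize_sentiment`, makes the comparison explicit. It assumes each reply is either a bare label or a single "Label: value" line.

```python
def normalize_sentiment(response):
    """Reduce a reply such as 'Sentiment: Negative.' to 'negative'."""
    text = response.strip()
    if ":" in text:
        text = text.split(":", 1)[1]  # keep only the part after the first colon
    return text.strip().rstrip(".").lower()

# Replies recorded above for Versions 1, 2 and 3.
outputs = {
    "zero-shot": "Neutral",
    "one-shot": "Sentiment: Neutral",
    "few-shot": "Sentiment: Neutral",
}
labels = {version: normalize_sentiment(reply) for version, reply in outputs.items()}
print(labels)  # {'zero-shot': 'neutral', 'one-shot': 'neutral', 'few-shot': 'neutral'}
```

With only this one test sentence, all three recorded replies normalize to the same label.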


Step 2: Summary Report
Prompting Versions and Observations:

Zero-Shot Learning:

The zero-shot version yielded inconsistent results. In some cases the model identified the sentiment correctly, while in others it misclassified it, especially when the sentiment was ambiguous.
Without any prior examples, the model struggled with nuanced language (e.g., "I'm not going to return it" could be read as neutral instead of negative).

One-Shot Learning:

One-shot provided a noticeable improvement over zero-shot. With a single example, the model picked up the expected "Sentiment: ..." output format and gained a better sense of what a labeled review should look like.
However, there were still a few misclassifications when the language became more complex or when the same sentence expressed more than one sentiment (e.g., "I don't plan to buy it again" reads as negative, while "I'm not going to return it" adds a neutral component).

Few-Shot Learning:

Few-shot learning performed best. By providing examples of positive, neutral, and negative reviews, the model was able to adapt to more nuanced language.
The output was consistent, and the model correctly identified the mixed sentiment in the test sentence, labeling it Neutral.

Findings and Learnings:

 - Few-shot learning outperformed the zero-shot and one-shot techniques, especially when the input was complex or nuanced. The model benefits from having more examples, which let it "learn" at inference time.
 - Hallucinations: in zero-shot, the model sometimes produced classifications that seemed unrelated to the input or invented details, which can be considered hallucinations.
 - Prompt design: the structure and quality of the few-shot examples significantly affected the output. With carefully designed examples, the model provided correct responses more reliably.
 - Recommendation: for complex tasks or nuanced language, few-shot learning is highly recommended because it tends to generalize better. One-shot can be sufficient for simpler use cases and saves prompt space.

In conclusion, I learned that the number of examples (shots) provided to the model plays a critical role in its performance. Providing multiple examples helps the model generalize better and reduces the chance of errors or hallucinations.