# M-Shots Learning

In this notebook, we'll explore a few simple prompt engineering techniques and recommendations that help us get responses from the models that are better suited to our needs.

In [80]:
from openai import OpenAI
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

# Formatting the answer with Few-Shot Samples

To obtain the model's response in a specific format, we have various options, but one of the most convenient is to use Few-Shot Samples. This involves presenting the model with pairs of user queries and example responses.

Large models like GPT-3.5 respond well to the examples provided, adapting their response to the specified format.

Depending on the number of examples given, this technique can be referred to as:
* Zero-Shot.
* One-Shot.
* Few-Shot.

One shot is often enough, and it is recommended to use at most six shots. Keep in mind that the examples are sent with every query, so they take up space in the input prompt.
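Because the shots are resent with every query, it can help to estimate how much prompt space they consume. The sketch below is only an approximation: it assumes the common rule of thumb of roughly 4 characters per token for English text, and the helpers `estimate_tokens` and `shot_overhead` are hypothetical names for illustration. For exact counts you would use a real tokenizer.

```python
# Rough sketch: estimate how many prompt tokens a set of shots adds to each query.
# ASSUMPTION: ~4 characters per token (a rule of thumb for English text);
# a real tokenizer gives exact numbers.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based only on character count."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def shot_overhead(shots: list[tuple[str, str]]) -> int:
    """Approximate tokens added by a list of (question, answer) example pairs."""
    return sum(estimate_tokens(q) + estimate_tokens(a) for q, a in shots)


shots = [
    ("Who won the 2010 f1 championship?", "Driver: Sebastian Vettel.\nTeam: Red Bull Renault."),
    ("Who won the 2009 f1 championship?", "Driver: Jenson Button.\nTeam: BrawnGP."),
]

# The overhead is paid on every call, so it grows with the number of queries.
print(shot_overhead(shots))
```

Doubling the number of shots roughly doubles this overhead, which is one reason to keep the number of shots small.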



In [81]:
# Function to call the model.
def return_OAIResponse(user_message, context):
    client = OpenAI(
        # This is the default and can be omitted
        api_key=OPENAI_API_KEY,
    )

    # Copy the context so the caller's list is not modified,
    # then append the user's question as the last message.
    newcontext = context.copy()
    newcontext.append({'role': 'user', 'content': "question: " + user_message})

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=newcontext,
        temperature=1,
    )

    return response.choices[0].message.content

In this zero-shot prompt we get a correct answer, but with no particular format: the model includes whatever information it chooses.

In [82]:
#zero-shot
context_user = [
    {'role':'system', 'content':'You are an expert in F1.'}
]
print(return_OAIResponse("Who won the F1 2010?", context_user))

Sebastian Vettel won the F1 World Championship in 2010 driving for Red Bull Racing.


For a model as large and capable as GPT-3.5, a single shot is enough for it to pick up the output format we expect.


In [83]:
#one-shot
context_user = [
    {'role':'system', 'content':
     """You are an expert in F1.

     Who won the 2000 f1 championship?
     Driver: Michael Schumacher.
     Team: Ferrari."""}
]
print(return_OAIResponse("Who won the F1 2011?", context_user))

Driver: Sebastian Vettel.
Team: Red Bull Racing.


Smaller models, or more complicated formats, may require more than one shot. Here is an example with two shots.

In [84]:
#Few shots
context_user = [
    {'role':'system', 'content':
     """You are an expert in F1.

     Who won the 2010 f1 championship?
     Driver: Sebastian Vettel.
     Team: Red Bull Renault.

     Who won the 2009 f1 championship?
     Driver: Jenson Button.
     Team: BrawnGP."""}
]
print(return_OAIResponse("Who won the F1 2006?", context_user))

Driver: Fernando Alonso.
Team: Renault.


In [85]:
print(return_OAIResponse("Who won the F1 2019?", context_user))

The 2019 F1 championship was won by Lewis Hamilton driving for Mercedes.


We've been creating the prompt without using OpenAI's roles, and as we've seen, it mostly worked, although the last answer (2019) ignored the requested format.

However, the recommended way is to use these roles to build the prompt, which makes the model's learning from the examples even more effective.

Instead of putting every example in the system message as if it were an instruction, we present the examples as an earlier conversation of user and assistant turns, which is more natural for a chat model.
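The role-based prompt described above can also be generated from a plain list of (question, answer) pairs instead of being written by hand. This is a minimal sketch; `build_few_shot_messages` is a hypothetical helper, not part of the OpenAI SDK, and it only produces the `messages` list in the `{'role': ..., 'content': ...}` shape that `return_OAIResponse` already expects.

```python
# Sketch: turn (question, answer) example pairs into role-based chat messages.
# build_few_shot_messages is a hypothetical helper for illustration.


def build_few_shot_messages(system_prompt, examples):
    """Return a system message followed by alternating user/assistant shots."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    return messages


context_user = build_few_shot_messages(
    "You are an expert in f1.",
    [
        ("Who won the 2010 f1 championship?",
         "Driver: Sebastian Vettel.\nTeam: Red Bull.\nPoints: 256."),
        ("Who won the 2009 f1 championship?",
         "Driver: Jenson Button.\nTeam: BrawnGP.\nPoints: 95."),
    ],
)

for message in context_user:
    print(message["role"])  # system, user, assistant, user, assistant
```

Keeping the examples as data makes it easy to add, remove, or reorder shots without touching the prompt text.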

In [86]:
# Recommended solution
context_user = [
    {'role':'system', 'content':'You are an expert in f1.\n\n'},
    {'role':'user', 'content':'Who won the 2010 f1 championship?'},
    {'role':'assistant', 'content':"""Driver: Sebastian Vettel. \nTeam: Red Bull. \nPoints: 256. """},
    {'role':'user', 'content':'Who won the 2009 f1 championship?'},
    {'role':'assistant', 'content':"""Driver: Jenson Button. \nTeam: BrawnGP. \nPoints: 95. """},
]

print(return_OAIResponse("Who won the F1 2019?", context_user))

Driver: Lewis Hamilton. 
Team: Mercedes. 
Points: 413.


Extra polish (optional):
- Use temperature=0.2 in return_OAIResponse for steadier answers.
- Keep team naming consistent (e.g., “Red Bull” vs “Red Bull Renault”); consistency improves pattern following.
- If you need machine-readable results later, ask for JSON in the system message instead of plain text.
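To follow the temperature advice without editing the function each time, `return_OAIResponse` can take the temperature as a parameter. The sketch below is a hypothetical variant (`return_response`); it also receives the client as an argument, so the same code works with a real `OpenAI(api_key=OPENAI_API_KEY)` client or, as here, with an offline stand-in that only mimics the `client.chat.completions.create(...)` call and the `response.choices[0].message.content` shape.

```python
from types import SimpleNamespace


def return_response(user_message, context, client, temperature=0.2, model="gpt-3.5-turbo"):
    """Hypothetical variant of return_OAIResponse with a configurable temperature.

    `client` is expected to behave like openai.OpenAI(); passing it in
    instead of creating it inside also makes the function easy to test.
    """
    # Build a new list so the caller's context is left unchanged.
    messages = context + [{"role": "user", "content": "question: " + user_message}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content


class FakeCompletions:
    """Offline stand-in for client.chat.completions that records the last call."""

    def __init__(self, reply):
        self.reply = reply
        self.last_kwargs = None

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_completions = FakeCompletions("Driver: Lewis Hamilton.\nTeam: Mercedes.\nPoints: 413.")
fake_client = SimpleNamespace(chat=SimpleNamespace(completions=fake_completions))

context = [{"role": "system", "content": "You are an expert in f1."}]
print(return_response("Who won the F1 2019?", context, fake_client))
print(fake_completions.last_kwargs["temperature"])  # 0.2
```

With a real key, the call would be `return_response("Who won the F1 2019?", context_user, OpenAI(api_key=OPENAI_API_KEY))`.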

We could also solve this with a more conventional prompt that describes what we want and the format we want it in. However, it's important to understand that in this case the model is following instructions, whereas with shots it is learning from the examples in real time, during inference.

In [87]:
context_user = [
    {'role':'system', 'content':
        """
        You are an expert in f1.
        You are going to answer the user's question giving the name of the driver,
        the name of the team and the points of the champion, following the format: Driver: Team: Points: """}
]

print(return_OAIResponse("Who won the F1 2019?", context_user))

Driver: Lewis Hamilton   Team: Mercedes   Points: 413


In the answer above, all three fields end up on a single line. In the next version we spell out the exact layout, one field per line.

In [88]:
context_user = [
    {
        "role": "system",
        "content": (
            "You are an expert in F1.\n"
            "Answer the user's question by giving the champion's details in EXACTLY three lines:\n"
            "Driver: <name>\n"
            "Team: <team>\n"
            "Points: <integer>\n"
            "Do not add extra text."
        ),
    }
]

print(return_OAIResponse("Who won the F1 2019?", context_user))

Driver: Lewis Hamilton
Team: Mercedes
Points: 413
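To check programmatically whether a reply follows the three-line format requested above, the reply can be matched against a regular expression. This is a sketch; `parse_champion` is a hypothetical helper, and the pattern tolerates an optional trailing period on each line, as in the few-shot answers earlier in the notebook.

```python
import re

# Sketch: validate the "Driver / Team / Points" three-line format and extract the fields.
# parse_champion is a hypothetical helper for illustration.
PATTERN = re.compile(
    r"^Driver:\s*(?P<driver>.+?)\.?\s*\n"
    r"Team:\s*(?P<team>.+?)\.?\s*\n"
    r"Points:\s*(?P<points>\d+)\.?\s*$"
)


def parse_champion(reply):
    """Return a dict with driver, team and points, or None if the format is wrong."""
    match = PATTERN.match(reply.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["points"] = int(fields["points"])
    return fields


print(parse_champion("Driver: Lewis Hamilton\nTeam: Mercedes\nPoints: 413"))
# {'driver': 'Lewis Hamilton', 'team': 'Mercedes', 'points': 413}
print(parse_champion("Driver: Lewis Hamilton   Team: Mercedes   Points: 413"))
# None (everything on one line)
```

A check like this makes format drift, such as the single-line answer from the looser instruction prompt, easy to detect automatically.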


In [89]:
context_user = [
    {'role':'system', 'content':
     """You are an expert in F1.

     Who won the 2010 f1 championship?
     Driver: Sebastian Vettel.
     Team: Red Bull Renault.

     Who won the 2009 f1 championship?
     Driver: Jenson Button.
     Team: BrawnGP."""}
]
print(return_OAIResponse("Who won the F1 2006?", context_user))

Driver: Fernando Alonso.
Team: Renault.


# Few-Shot samples for classification


In [90]:
context_user = [
    {'role':'system', 'content':
     """You are an expert in reviewing product opinions and classifying them as positive, negative, or neutral.

     It fulfilled its function perfectly, I think the price is fair, I would buy it again.
     Sentiment: Positive

     It didn't work bad, but I wouldn't buy it again, maybe it's a bit expensive for what it does.
     Sentiment: Negative

     I wouldn't know what to say, my son uses it, but he doesn't love it.
     Sentiment: Neutral
     """}
]
print(return_OAIResponse("I'm not going to return it, but I don't plan to buy it again.", context_user))

Sentiment: Negative
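Model replies do not always use exactly the same label format: in the exercise below, one prompt returns "Sentiment: Negative" and another returns just "Neutral". Before comparing results, it helps to normalize the reply to one of the allowed labels. This is a sketch; `normalize_sentiment` is a hypothetical helper, and the label set simply mirrors the prompt.

```python
# Sketch: map a free-text model reply to one of the allowed sentiment labels.
# normalize_sentiment is a hypothetical helper for illustration.
LABELS = ("Positive", "Negative", "Neutral")


def normalize_sentiment(reply):
    """Return 'Positive', 'Negative' or 'Neutral', or None if no label is found."""
    text = reply.strip()
    # Drop an optional "Sentiment:" prefix (case-insensitive).
    if text.lower().startswith("sentiment:"):
        text = text[len("sentiment:"):]
    # Drop surrounding spaces and a trailing period.
    text = text.strip().rstrip(".").strip()
    for label in LABELS:
        if text.lower() == label.lower():
            return label
    return None


print(normalize_sentiment("Sentiment: Negative"))   # Negative
print(normalize_sentiment("Sentiment: Negative."))  # Negative
print(normalize_sentiment("Neutral"))               # Neutral
```

Returning None for anything unrecognized keeps unexpected replies visible instead of silently forcing them into a class.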


# Exercise
 - Complete the prompts similar to what we did in class.
     - Try at least 3 versions.
     - Be creative.
 - Write a one-page report summarizing your findings.
     - Were there variations that didn't work well? e.g., where GPT hallucinated or gave a wrong answer.
 - What did you learn?

# Version 1 — Few-shot like example (baseline)

In [91]:
context_user = [
    {'role':'system','content':
    """You are an expert in reviewing product opinions and classifying them as positive, negative, or neutral.

    It fulfilled its function perfectly, I think the price is fair, I would buy it again.
    Sentiment: Positive

    It didn't work bad, but I wouldn't buy it again, maybe it's a bit expensive for what it does.
    Sentiment: Negative

    I wouldn't know what to say, my son uses it, but he doesn't love it.
    Sentiment: Neutral
    """}
]

print(return_OAIResponse(
    "I'm not going to return it, but I don't plan to buy it again.",
    context_user
))


Sentiment: Negative


# Version 2 — Structured system instruction + one example

In [92]:
context_user = [
    {'role':'system','content':
    """You classify product reviews into {Positive, Negative, Neutral}.
    Respond only with the sentiment label.

    Example:
    Review: "It works great, I will buy again."
    Answer: Positive
    """}
]

print(return_OAIResponse(
    "I'm not going to return it, but I don't plan to buy it again.",
    context_user
))


Neutral


# Version 3 — Output as JSON + explicit definitions


In [93]:
context_user = [
    {'role':'system','content':
    """You are a sentiment classifier.
    
    Rules:
    - Positive: reviewer intends to buy again or expresses satisfaction
    - Negative: reviewer would not buy again or expresses dissatisfaction
    - Neutral: unclear or mixed without a buying decision

    Respond in JSON format:
    {"sentiment": "Positive|Negative|Neutral"}
    """}
]

print(return_OAIResponse(
    "I'm not going to return it, but I don't plan to buy it again.",
    context_user
))


{"sentiment": "Neutral"}
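The advantage of the JSON output in Version 3 is that it can be parsed directly. A minimal sketch using the standard `json` module; `parse_sentiment_json` is a hypothetical helper, and since `json.JSONDecodeError` is a subclass of `ValueError`, both malformed JSON and an unexpected label raise `ValueError`.

```python
import json

# Sketch: parse and validate the JSON reply requested in Version 3.
ALLOWED = {"Positive", "Negative", "Neutral"}


def parse_sentiment_json(reply):
    """Return the sentiment label from a JSON reply, or raise ValueError."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    sentiment = data.get("sentiment") if isinstance(data, dict) else None
    if sentiment not in ALLOWED:
        raise ValueError(f"unexpected sentiment value: {sentiment!r}")
    return sentiment


print(parse_sentiment_json('{"sentiment": "Neutral"}'))  # Neutral
```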


Goal

Test different prompting strategies to classify product review sentiment (positive/negative/neutral) and evaluate consistency, quality, and hallucination.

Approach
Version 1:
Style: Few-shot examples
Notes: Teaches through examples, similar to training

Version 2:
Style: Instruction + 1 example
Notes: More compact - focus on task definition

Version 3:
Style: Structured JSON output
Notes: Most precise + machine-readable


Results
Version 1 labeled the test sentence Negative, while Versions 2 and 3 labeled it Neutral.
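Each version was run only once, and with temperature=1 a single run may not be representative. One way to measure consistency is to classify the same review several times and report how often the majority label appears. In this sketch, `consistency` is a hypothetical helper, and the two classifiers are offline stand-ins (ASSUMPTION: the noisy one just picks randomly between two labels to imitate high-temperature drift); in practice `classify` would wrap a call such as `return_OAIResponse(review, context_user)`.

```python
import random
from collections import Counter


def consistency(classify, review, runs=10):
    """Run `classify` several times; return (majority label, agreement rate)."""
    labels = [classify(review) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / runs


# Offline stand-ins for model calls (assumptions, not real model behavior).
rng = random.Random(0)


def noisy_classifier(review):
    return rng.choice(["Negative", "Neutral"])


def deterministic_classifier(review):
    return "Negative"


review = "I'm not going to return it, but I don't plan to buy it again."
print(consistency(deterministic_classifier, review))  # ('Negative', 1.0)
print(consistency(noisy_classifier, review, runs=50))
```

An agreement rate close to 1.0 means the prompt gives stable answers; comparing this rate across the three versions (and across temperatures) would back up the observations below with more than one sample.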

Learned:
1. Prompt design changes the model's behavior.
Even though the input sentence stays the same, the way you instruct the model dramatically affects the output.
Few-shot prompting (examples) → model imitates the pattern → more accurate
Pure instruction prompting → model reasons on its own → may misinterpret subtle cases
Lesson: The model isn't “learning facts” — it is responding based on framing.

2. Temperature matters for classification
This notebook uses temperature=1, which introduces randomness.
At higher temperature, GPT is more “creative,” which is bad for precise tasks like sentiment classification.
Low temperature = consistency
High temperature = creative drift / misclassification

3. Ambiguous input → model uncertainty
The test phrase: "I'm not going to return it, but I don't plan to buy it again."
It expresses two things: the product is not bad enough to return, but not good enough to buy again.
This is borderline between Neutral and Negative, so when the instructions changed, the model interpreted it differently.
Lesson: Clear definitions improve consistency, e.g.:
“Would not buy again” = Negative

4. Few-shot examples guide the model better
The version with sentiment examples (Version 1) returned Negative, the same label as the in-class example, likely because the model copied the pattern.
Versions 2 and 3 had only one generic example or none, so the model relied more on its own interpretation and produced Neutral.
Lesson: Few-shot examples reduce ambiguity.
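The temperature effect described in point 2 can be illustrated with the textbook temperature-scaled softmax, p_i = exp(z_i / T) / sum_j exp(z_j / T). This is a sketch of the standard formulation, not OpenAI's actual sampling code (which is not published), and the logits below are made-up scores for the labels Negative, Neutral and Positive. Lower T concentrates probability on the top label; higher T spreads it out, so a borderline label is sampled more often.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax; temperature must be > 0 in this formula."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the labels Negative, Neutral, Positive.
logits = [2.0, 1.5, -1.0]
for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```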

Conclusion:

The experiment showed:

- Prompt style strongly affects model output.
- Few-shot learning led to more consistent classification than plain instructions for this subtle sentiment task.
- High temperature causes variability in results; classification tasks should use low temperature (0–0.3).
- The model doesn't truly “learn”; it reacts to prompt framing in real time.

Final takeaway:
For sentiment analysis, it is best to use few-shot examples + low temperature for consistent and accurate results.