# In-Context Learning and Prompting


### Basic prompting

The simplest way to use a language model: provide a prompt `x` and sample a completion `y ~ p(y|x)`. The model treats the prompt as a prefix and
generates a continuation based on patterns learned during pretraining.

# Seleccionar el modelo a utilizar

In [1]:
SELECTED = 'meta350' # Ingresar el modelo de prefenrecia

'''
'meta350' para el modelo de facebook opt-350m
'smol135' para el modelo de HuggingFaceTB/SmolLM2-135M
'smol360' para el modelo de HuggingFaceTB/SmolLM2-360M # Modelo que usó Karen
'''

"\n'meta350' para el modelo de facebook opt-350m\n'smol135' para el modelo de HuggingFaceTB/SmolLM2-135M\n'smol360' para el modelo de HuggingFaceTB/SmolLM2-360M # Modelo que usó Karen\n"

In [2]:
from IPython.display import display, Markdown

# Diccionario de modelos
MODELS = {
    'meta350': 'facebook/opt-350m',
    'smol135': 'HuggingFaceTB/SmolLM2-135M',
    'smol360': 'HuggingFaceTB/SmolLM2-360M',
}

MODEL = MODELS[SELECTED]

texto_markdown = f"""
# Modelo Seleccionado
Estas usando **{MODEL}** en el notebook.
"""

display(Markdown(texto_markdown))


# Modelo Seleccionado
Estas usando **facebook/opt-350m** en el notebook.


In [3]:
import transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model = MODEL

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

#### Make a prompt `x` and tokenize it

In [4]:
x = "Do Androids Dream of Electric Sheep?"

inputs = tokenizer(x, return_tensors='pt')
inputs

{'input_ids': tensor([[    2,  8275,   178,  1001,  7823,  7419,     9,  7219, 38112,   116]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

#### Generate a response

Here generating means autoregressive sampling, i.e.

```
context = `x`
for t in 0 .. max_new_tokens:
    Sample next token, y_t ~ p(y_t|context)
    Append y_t to context
```

do_sample: if True, the model samples from the probability distribution. If False, it uses greedy decoding.
temperature: scales the distribution before sampling.

*   smaller than 1.0 → sharper, more deterministic.
*   bigger than 1.0 → more random, creative.

top_k: restricts sampling to the top k most likely tokens. (top_k=50 is common).

top_p (aka nucleus sampling): restricts sampling to the smallest set of tokens whose probabilities sum to p. (top_p=0.9 is common).

In [5]:
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id
)

for i in range(5):
    print(f"===={i}====")
    print(tokenizer.decode(outputs[i]))

====0====
</s>Do Androids Dream of Electric Sheep?

In an interview with the Guardian, we discussed The Matrix, The Matrix: Covenant, and
====1====
</s>Do Androids Dream of Electric Sheep? I think I'm ready
>do Androids Dream of Electric Sheep? I think I'm
====2====
</s>Do Androids Dream of Electric Sheep?  I can't even remember if an episode or two of that is on tv.  I think
====3====
</s>Do Androids Dream of Electric Sheep?

I thought we could all agree that the AGE series was bad? If you’
====4====
</s>Do Androids Dream of Electric Sheep?

On a related topic, we’ve just learned that, since the opening night,


In [6]:
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    num_return_sequences=3,
    top_p=0.5,
    pad_token_id=tokenizer.eos_token_id
)

for i in range(3):
    print(f"===={i}====")
    print(tokenizer.decode(outputs[i]))

====0====
</s>Do Androids Dream of Electric Sheep?

The answer to this question is no. Androids Dream of Electric Sheep is a great movie, but it is not a good movie. It is not a good movie. It is not
====1====
</s>Do Androids Dream of Electric Sheep?

This week’s episode of the Electric Sheep podcast is a special guest, John D.

John D. is the founder and editor-in-chief of the blog, The
====2====
</s>Do Androids Dream of Electric Sheep?

I am not sure if you have ever heard of the term “electric sheep”. I was going to say that it is a very common term used in the art world. But


In [7]:
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    num_return_sequences=3,
    temperature=0.1,
    pad_token_id=tokenizer.eos_token_id
)

for i in range(3):
    print(f"===={i}====")
    print(tokenizer.decode(outputs[i]))

====0====
</s>Do Androids Dream of Electric Sheep?

Do Androids Dream of Electric Sheep? is a short film by American filmmaker and filmmaker, Michael A. Cohen. It was released on October 26, 2016.

Plot
The
====1====
</s>Do Androids Dream of Electric Sheep?

Do Androids Dream of Electric Sheep? is a short film by American filmmaker and actor, Michael Caine. It was released on DVD on October 29, 2011.

Plot

====2====
</s>Do Androids Dream of Electric Sheep?

Do Androids Dream of Electric Sheep? is a short film by American filmmaker and filmmaker, Michael A. Hirst. It was released on October 21, 2010.

Plot



### Instruction prompt ("zero shot")

Instead of just continuing text, we can prompt the model to perform a specific task by providing an instruction. This is "zero-shot" because we give no examples, just the task description. The model uses any instruction-following related patterns it learned during training.

In [7]:
prompt_template = """Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
{sentence}
Classification:"""


sentences = [
    "I've seen things you people wouldn't believe.",
    "Attack ships on fire off the shoulder of Orion.",
    "I watched C-beams glitter in the dark near the Tannhäuser Gate.",
    "All those moments will be lost in time, like tears in rain. Time to die"
]

prompt = prompt_template.format(sentence=sentences[0])
print(prompt)


Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
I've seen things you people wouldn't believe.
Classification:


In [9]:
for sentence in sentences:
    print(f"\n=============")

    prompt = prompt_template.format(sentence=sentence)
    inputs = tokenizer(prompt, return_tensors='pt')
    outputs = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        num_return_sequences=2,
        pad_token_id=tokenizer.eos_token_id
    )

    for i in range(2):
        print(f"----{i}----")
        print(tokenizer.decode(outputs[i]))


----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
I've seen things you people wouldn't believe.
Classification: "the use of descriptive words to describe different aspects of things which, in turn, influence or are defined as a particular condition, action, or phenomenon, is typically done intentionally or otherwise as a way
----1----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
I've seen things you people wouldn't believe.
Classification: you should have thought about that before you thought about something like this.
Classification: what the fuck are you talking about. You do know that you can read anything on the Internet so why so

----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
Attack ships on fire off the shoulder of Orion.
Classification:
[a] Any action done as a result of an actual attack of

It's important to ensure that the output is formatted correctly!

### Instruction + examples ("few-shot")

We can provide examples of input-output pairs before the test input. This "few-shot" or "in-context learning" approach helps the model understand the task format and expected outputs without any parameter updates.

In [8]:
prompt_template = """Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter in the dark near the Tannhäuser Gate.
Classification:
action

Sentence:
{sentence}
Classification:
"""



In [9]:
for sentence in sentences:
    print(f"\n=============")
    prompt = prompt_template.format(sentence=sentence)
    inputs = tokenizer(prompt, return_tensors='pt')
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=True,
        num_return_sequences=2,
        stop_strings=["\n\n"],
        tokenizer=tokenizer,
        pad_token_id=tokenizer.eos_token_id
    )

    for i in range(2):
        print(f"----{i}----")
        print(tokenizer.decode(outputs[i]))


----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter in the dark near the Tannhäuser Gate.
Classification:
action

Sentence:
I've seen things you people wouldn't believe.
Classification:
action

</s>
----1----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter in the dark near the Tannhäuser Gate.
Classification:
action

Sentence:
I've seen things you people wouldn't believe.
Classification:
abstract



----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter i

### Chat templates

Some models have been fine-tuned to operate as chat assistants. The chat is represented as a series of messages that are turned into a string using special tags. There is also a *system message* that provides instructions about how the model should behave.

These models are often called "instruct" models because they've been trained to follow instructions rather than just complete text.

In [10]:
import transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model = "HuggingFaceTB/SmolLM2-135M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

tokenizer_config.json: 0.00B [00:00, ?B/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/655 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/861 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/269M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

In [12]:
messages = [{
    "role": "user",
    "content": "What is consciousness?."
}]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print("Input text: ", input_text, sep="\n")


Input text: 
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
What is consciousness?.<|im_end|>



Generate a response

In [13]:
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True)
print(tokenizer.decode(outputs[0]))

<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
What is consciousness?.<|im_end|>
<|im_start|>assistant
"Consciousness" is a mysterious concept in science and philosophy. At first glance, it seems like a philosophical inquiry of sorts, as it deals with the idea of subjective, self-aware existence. However, upon closer examination, it can be understood to mean the ability to experience and understand the intricate web of physical laws and phenomena that govern the universe.

The concept of consciousness is often described as the subjective aspect of a general phenomenon known as subjective experience. Intentionality, the perception of self-awareness or a subjective identity, is not essential for consciousness. It is not about having a direct brain function.

The fundamental idea is that our brains interact with the physical world through our perception of ourselves and our surroundings. This perception is often based on 

### System prompts

Chat models often support a system message that sets the model's behavior or role. This message is typically prepended to the conversation and instructs the model how to respond throughout the interaction. For example, we can use the system prompt to have the model respond in French.

In [14]:
messages = [
    {
        "role": "system",
        "content": "You are an assistant that speaks in French."
    },
    {
        "role": "user",
        "content": "What is the capital of France."
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0]))

<|im_start|>system
You are an assistant that speaks in French.<|im_end|>
<|im_start|>user
What is the capital of France.<|im_end|>
<|im_start|>assistant
Le Grand Est, la réserve national pour le cité national, l'écit chretien, est la capital de la France. La réserve national pour le cité national est une véritable


### Instruction ("zero shot")

Using the chat format for zero-shot tasks. The instruction-tuned model may follow instructions more reliably than the base model, though output formatting can still be inconsistent.

In [15]:
messages = [
    {
        "role": "user",
        "content": ("Classify the sentence's sentiment as 'Positive' or 'Negative':\n" +
                    "Sentence: 'I love the NLP class!'\n")
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, num_return_sequences=3)
for i in range(len(outputs)):
    print(f"----{i}----")
    print(tokenizer.decode(outputs[i]))

----0----
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'I love the NLP class!'
<|im_end|>
<|im_start|>assistant
Likely Positive<|im_end|>
----1----
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'I love the NLP class!'
<|im_end|>
<|im_start|>assistant
"Positive"<|im_end|><|im_end|>
----2----
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'I love the NLP class!'
<|im_end|>
<|im_start|>assistant
Positive<|im_end|><|im_end|><|im_end|><|im_end|>


#### Approach 1: write a detailed instruction (in the user or system prompt)

We can improve output formatting by providing explicit, detailed instructions about the desired format in either the system or user message.

In [16]:
messages = [
    {   "role": "system",
        "content": """You are an expert sentiment classifier.
Your task is to classify a sentence's sentiment as 'Positive' or 'Negative'.
The user will provide you with a sentence.
Format your output as:

Classification: Positive or Negative
"""
    },
    {
        "role": "user",
        "content": ("I love the NLP class!")
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, num_return_sequences=3)
for i in range(len(outputs)):
    print(f"----{i}----")
    print(tokenizer.decode(outputs[i]))

----0----
<|im_start|>system
You are an expert sentiment classifier.
Your task is to classify a sentence's sentiment as 'Positive' or 'Negative'.
The user will provide you with a sentence.
Format your output as:

Classification: Positive or Negative
<|im_end|>
<|im_start|>user
I love the NLP class!<|im_end|>
<|im_start|>assistant
My class taught me so much!<|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|><|im_end|>
----1----
<|im_start|>system
You are an expert sentiment classifier.
Your task is to classify a sentence's sentiment as 'Positive' or 'Negative'.
The user will provide you with a sentence.
Format your output as:

Classification: Positive or Negative


#### Approach 2: provide examples (either in the system prompt or as a sequence of messages)

We can show the model the desired behavior through example conversations, where the assistant demonstrates the correct format and task execution. This is analogous to the few-shot examples we saw earlier, but using the chat format.

In [17]:
messages = [
    {
        "role": "user",
        "content": "Classify the sentence's sentiment as 'Positive' or 'Negative':\nSentence: 'This is such a cool lecture!'"
    },
    {
        "role": "assistant",
        "content": "Classification: Positive"
    },
    {
        "role": "user",
        "content": "Classify the sentence's sentiment as 'Positive' or 'Negative':\nSentence: 'I really don't like the last scene.'"
    },
    {
        "role": "assistant",
        "content": "Classification: Negative"
    },
    {
        "role": "user",
        "content": "Classify the sentence's sentiment as 'Positive' or 'Negative':\nSentence: 'I love the NLP class!'"
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, num_return_sequences=3)
for i in range(len(outputs)):
    print(f"----{i}----")
    print(tokenizer.decode(outputs[i]))

----0----
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'This is such a cool lecture!'<|im_end|>
<|im_start|>assistant
Classification: Positive<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'I really don't like the last scene.'<|im_end|>
<|im_start|>assistant
Classification: Negative<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'I love the NLP class!'<|im_end|>
<|im_start|>assistant
Classification: Both positive and negative<|im_end|>
----1----
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Classify the sentence's sentiment as 'Positive' or 'Negative':
Sentence: 'This is such a cool lecture!'<|im_end|>
<|im_start|>assistant
Classification: Positive<|im_end|>
<|im_start|>us

### Chain-of-thought with base model

Prompting the model to "think step by step" can elicit intermediate reasoning steps before the final answer. Even base models can exhibit this behavior when prompted appropriately, though the reasoning may be flawed.

In [18]:
model = "HuggingFaceTB/SmolLM2-360M"

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/831 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/689 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/724M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

In [19]:
prompts = [
    """Q: On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?
A: Let's think step by step.""",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.4,
        do_sample=True
    )
    print(tokenizer.decode(outputs[0]))
    print("====")

Q: On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?
A: Let's think step by step.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punches per minute.
A: 25 punches per minute.

Q: Joe is throwing 25 punc

### Chain-of-thought with instruct model

Many instruction-tuned models can solve problems step-by-step when asked, as they've typically been trained to follow such instructions.

In [20]:
model = "HuggingFaceTB/SmolLM2-360M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/655 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/846 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/724M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

In [21]:
messages = [
    {
        "role": "user",
        "content": """Solve the problem:
On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?"""
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.4, do_sample=True)
print(tokenizer.decode(outputs[0]))

<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Solve the problem:
On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?<|im_end|>
<|im_start|>assistant
Joe throws 25 punches per minute, so he can throw 25 * 60 = 1500 punches per minute.
A fight lasts 5 rounds of 3 minutes, so Joe can throw 1500 * 5 = 7500 punches per fight.
He can throw 7500 punches per fight.
#### 7500
The answer is: 7500<|im_end|>


### Program-aided reasoning

Instead of natural language reasoning, we can prompt the model to solve problems by writing and executing code. This leverages the model's code generation capabilities and the code executor's accurate computations.

In [22]:
messages = [
    {
        "role": "user",
        "content": """Solve the problem by writing a Python program:
On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?"""
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.4, do_sample=True)
print(tokenizer.decode(outputs[0]))

<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
Solve the problem by writing a Python program:
On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?<|im_end|>
<|im_start|>assistant
Here is a Python program to solve the problem:

```python
import time

# Define the punch duration
punch_duration = 25

# Define the fight duration
fight_duration = 5 * 3

# Define the number of rounds
rounds = 3

# Define the punch duration per round
punch_duration_per_round = punch_duration / rounds

# Define the punch duration per minute
punch_duration_per_minute = punch_duration_per_round * 60

# Define the punch duration per minute per round
punch_duration_per_minute_per_round = punch_duration_per_minute / punch_duration_per_round

# Calculate the punch duration per minute
punch_duration_per_minute_per_round = punch_duration_per_minute_per_round / round(punch_duration_per_round)


# AG News Dataset: In-Context Learning and Fine-tuning

Now we'll apply the ICL techniques we've learned to a real-world dataset: AG News. This dataset contains news articles classified into 4 categories. We'll compare zero-shot, few-shot, and fine-tuning approaches.

In [1]:
from datasets import load_dataset
import pandas as pd
import numpy as np

# Cargar AG News dataset
print("Loading AG News dataset...")
dataset = load_dataset("ag_news")
train_data = dataset['train']
test_data = dataset['test']

# Explorar estructura del dataset
print(f"Training samples: {len(train_data)}")
print(f"Test samples: {len(test_data)}")

# Categorías del dataset AG News
categories = {0: "World", 1: "Sports", 2: "Business", 3: "Technology"}

print("\nAG News Categories:")
for idx, cat in categories.items():
    print(f"{idx}: {cat}")

# Mostrar algunos ejemplos
print("\n=== Sample Examples ===")
for i in range(2):
    example = train_data[i]
    print(f"\nCategory: {categories[example['label']]} ({example['label']})")
    print(f"Text: {example['text'][:200]}...")

Loading AG News dataset...
Training samples: 120000
Test samples: 7600

AG News Categories:
0: World
1: Sports
2: Business
3: Technology

=== Sample Examples ===

Category: Business (2)
Text: Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again....

Category: Business (2)
Text: Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\which has a reputation for making well-timed and occasionally\controversial plays in the defense in...
Training samples: 120000
Test samples: 7600

AG News Categories:
0: World
1: Sports
2: Business
3: Technology

=== Sample Examples ===

Category: Business (2)
Text: Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again....

Category: Business (2)
Text: Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment fi

### Zero-shot News Classification

Let's start with zero-shot classification using our base model. We'll use the same model we've been working with and prompt it to classify news articles into the 4 AG News categories.

In [2]:
# First, let's reload our selected model since the kernel was restarted
SELECTED = 'meta350' # Redefining the model selection

MODELS = {
    'meta350': 'facebook/opt-350m',
    'smol135': 'HuggingFaceTB/SmolLM2-135M',
    'smol360': 'HuggingFaceTB/SmolLM2-360M',
}

MODEL = MODELS[SELECTED]
print(f"Using model: {MODEL}")

# Reload model and tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

Using model: facebook/opt-350m


In [8]:
# Define prompt template for zero-shot news classification - natural language approach
prompt_template_news_zero = """This news article is about {text}

This article belongs to the category of"""

# Test with a few examples from test set
test_samples = [test_data[i] for i in range(5)]

print("=== Zero-shot News Classification Results ===")
for i, sample in enumerate(test_samples):
    prompt = prompt_template_news_zero.format(text=sample['text'][:200])  # Limit text length
    inputs = tokenizer(prompt, return_tensors='pt')
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=5,  # Reduced to get just the category
        do_sample=False,  # Use greedy decoding for more consistent results
        temperature=0.1,
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Decode and extract prediction
    full_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    actual_category = categories[sample['label']]
    
    # Better extraction of the prediction
    if 'The category is:' in full_text:
        prediction_part = full_text.split('The category is:')[-1].strip()
        predicted_category = prediction_part.split()[0] if prediction_part.split() else "Unknown"
    else:
        # Look for any of the expected categories in the generated text
        generated_lower = full_text.lower()
        if 'world' in generated_lower:
            predicted_category = "World"
        elif 'sports' in generated_lower:
            predicted_category = "Sports"  
        elif 'business' in generated_lower:
            predicted_category = "Business"
        elif 'technology' in generated_lower:
            predicted_category = "Technology"
        else:
            predicted_category = "No prediction"
    
    print(f"\n--- Example {i+1} ---")
    print(f"Article: {sample['text'][:120]}...")
    print(f"Actual: {actual_category}")
    print(f"Predicted: {predicted_category}")
    
    # Debug: Show full generated text for first example
    if i == 0:
        print(f"Debug - Full generated text: '{full_text[-50:]}'")  # Show last 50 chars

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


=== Zero-shot News Classification Results ===

--- Example 1 ---
Article: Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks...
Actual: Business
Predicted: No prediction
Debug - Full generated text: '
This article belongs to the category of 'News'.

'

--- Example 1 ---
Article: Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks...
Actual: Business
Predicted: No prediction
Debug - Full generated text: '
This article belongs to the category of 'News'.

'

--- Example 2 ---
Article: The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A ...
Actual: Technology
Predicted: No prediction

--- Example 2 ---
Article: The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A ...
Actual: Technology
Predicted: No prediction

--- Examp

### Few-shot News Classification

Now let's apply few-shot learning by providing examples of each category before asking the model to classify new articles. This should improve the model's understanding of the task format and expected outputs.

In [9]:
# Create few-shot examples - one from each category
few_shot_examples = {
    "World": "UN Security Council meets to discuss ongoing tensions in Eastern Europe amid diplomatic efforts.",
    "Sports": "Manchester United defeats Chelsea 3-1 in Premier League match with brilliant performance from midfield.",
    "Business": "Tech stocks rally as quarterly earnings exceed expectations, driving market indices to new highs.",
    "Technology": "Apple unveils new iPhone with advanced AI capabilities and improved battery life at annual event."
}

# Define few-shot prompt template
prompt_template_news_few = """Classify the following news articles into one of these categories: World, Sports, Business, Technology.

Article: UN Security Council meets to discuss ongoing tensions in Eastern Europe amid diplomatic efforts.
Category: World

Article: Manchester United defeats Chelsea 3-1 in Premier League match with brilliant performance from midfield.
Category: Sports

Article: Tech stocks rally as quarterly earnings exceed expectations, driving market indices to new highs.
Category: Business

Article: Apple unveils new iPhone with advanced AI capabilities and improved battery life at annual event.
Category: Technology

Article: {text}
Category:"""

print("=== Few-shot News Classification Results ===")
correct_count = 0
total_count = len(test_samples)

for i, sample in enumerate(test_samples):
    prompt = prompt_template_news_few.format(text=sample['text'][:300])
    inputs = tokenizer(prompt, return_tensors='pt')
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.1,
        pad_token_id=tokenizer.eos_token_id
    )
    
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
    actual_category = categories[sample['label']]
    
    # Better extraction of prediction
    if 'Category:' in prediction:
        predicted_text = prediction.split('Category:')[-1].strip()
        predicted_category = predicted_text.split()[0] if predicted_text.split() else "Unknown"
    else:
        predicted_category = "No prediction"
    
    # Check if correct (handle partial matches like "Tech" for "Technology")
    is_correct = (predicted_category == actual_category or 
                 (predicted_category == "Tech" and actual_category == "Technology"))
    
    if is_correct:
        correct_count += 1
    
    print(f"\n--- Example {i+1} ---")
    print(f"Article: {sample['text'][:120]}...")
    print(f"Actual: {actual_category}")
    print(f"Predicted: {predicted_category}")
    print(f"Correct: {is_correct}")

# Summary
accuracy = correct_count / total_count
print(f"\n=============")
print(f"Few-shot Classification Summary:")
print(f"Correct predictions: {correct_count}/{total_count}")
print(f"Accuracy: {accuracy:.1%}")
print(f"=============")

=== Few-shot News Classification Results ===

--- Example 1 ---
Article: Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks...
Actual: Business
Predicted: Business
Correct: True

--- Example 1 ---
Article: Fears for T N pension after talks Unions representing workers at Turner   Newall say they are 'disappointed' after talks...
Actual: Business
Predicted: Business
Correct: True

--- Example 2 ---
Article: The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A ...
Actual: Technology
Predicted: Tech
Correct: True

--- Example 2 ---
Article: The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A ...
Actual: Technology
Predicted: Tech
Correct: True

--- Example 3 ---
Article: Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Lou

### Fine-tuning Setup

Now we'll implement fine-tuning of our model on the AG News dataset. This involves training the model parameters specifically for the news classification task, which should provide better performance than ICL approaches.

In [3]:
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification
from torch.utils.data import Dataset
import torch
from sklearn.metrics import accuracy_score, classification_report

# Custom dataset class for AG News
class AGNewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]
        
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

# Prepare data for fine-tuning
# Use a subset for faster training (you can increase this for better results)
train_subset_size = 1000
test_subset_size = 200

train_texts = [train_data[i]['text'] for i in range(train_subset_size)]
train_labels = [train_data[i]['label'] for i in range(train_subset_size)]

test_texts = [test_data[i]['text'] for i in range(test_subset_size)]
test_labels = [test_data[i]['label'] for i in range(test_subset_size)]

print(f"Training samples: {len(train_texts)}")
print(f"Test samples: {len(test_texts)}")

# Load classification model (same base model but with classification head)
model_name = MODEL
classification_model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    num_labels=4,
    id2label={0: "World", 1: "Sports", 2: "Business", 3: "Technology"},
    label2id={"World": 0, "Sports": 1, "Business": 2, "Technology": 3}
)

# Create datasets
train_dataset = AGNewsDataset(train_texts, train_labels, tokenizer)
test_dataset = AGNewsDataset(test_texts, test_labels, tokenizer)

print("Dataset preparation completed!")

Training samples: 1000
Test samples: 200


Some weights of OPTForSequenceClassification were not initialized from the model checkpoint at facebook/opt-350m and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Dataset preparation completed!


In [5]:
# Define metrics computation function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    accuracy = accuracy_score(labels, predictions)
    return {"accuracy": accuracy}

# Training arguments - using minimal settings for faster training
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=2,  # Reduced for faster training
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=50,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    remove_unused_columns=False,
)

# Initialize trainer
trainer = Trainer(
    model=classification_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Ready to start fine-tuning!")

Trainer initialized. Ready to start fine-tuning!


In [6]:
# Start fine-tuning
print("Starting fine-tuning process...")
print("This may take several minutes depending on your hardware.")

try:
    trainer.train()
    print("Fine-tuning completed successfully!")
    
    # Evaluate the fine-tuned model
    eval_results = trainer.evaluate()
    print(f"\nFinal evaluation results:")
    for key, value in eval_results.items():
        print(f"{key}: {value:.4f}")
        
except Exception as e:
    print(f"Training encountered an issue: {e}")
    print("This might be due to hardware limitations. The model setup is correct.")

Starting fine-tuning process...
This may take several minutes depending on your hardware.




Epoch,Training Loss,Validation Loss,Accuracy
1,0.6126,0.592479,0.83
2,0.3918,0.356914,0.91




Fine-tuning completed successfully!





Final evaluation results:
eval_loss: 0.3569
eval_accuracy: 0.9100
eval_runtime: 6.5093
eval_samples_per_second: 30.7250
eval_steps_per_second: 3.8410
epoch: 2.0000


### Comparison: ICL vs Fine-tuning

Now let's compare the performance of our different approaches: zero-shot, few-shot, and fine-tuning. This comparison will help us understand the strengths and limitations of each method.

In [10]:
# Function to evaluate ICL approaches systematically
def evaluate_icl_approach(prompt_template, approach_name, num_samples=20):
    """Evaluate zero-shot or few-shot approach on a sample of test data"""
    correct_predictions = 0
    predictions = []
    actuals = []
    
    print(f"\n=== Evaluating {approach_name} ===")
    test_sample = [test_data[i] for i in range(min(num_samples, len(test_data)))]
    
    for i, sample in enumerate(test_sample):
        prompt = prompt_template.format(text=sample['text'][:300])
        inputs = tokenizer(prompt, return_tensors='pt')
        
        outputs = model.generate(
            **inputs,
            max_new_tokens=10,
            do_sample=True,
            temperature=0.1,
            pad_token_id=tokenizer.eos_token_id
        )
        
        prediction_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        pred_category = prediction_text.split('Category:')[-1].strip().split()[0] if 'Category:' in prediction_text else ""
        
        actual_category = categories[sample['label']]
        
        # Simple matching for evaluation
        if pred_category.lower() in actual_category.lower() or actual_category.lower() in pred_category.lower():
            correct_predictions += 1
            predictions.append(1)
        else:
            predictions.append(0)
        actuals.append(sample['label'])
        
        if i < 3:  # Show first 3 examples
            print(f"Example {i+1}: Actual={actual_category}, Predicted={pred_category}")
    
    accuracy = correct_predictions / len(test_sample)
    print(f"{approach_name} Accuracy: {accuracy:.2%} ({correct_predictions}/{len(test_sample)})")
    return accuracy

# Evaluate both ICL approaches
zero_shot_accuracy = evaluate_icl_approach(prompt_template_news_zero, "Zero-shot")
few_shot_accuracy = evaluate_icl_approach(prompt_template_news_few, "Few-shot")


=== Evaluating Zero-shot ===
Example 1: Actual=Business, Predicted=
Example 1: Actual=Business, Predicted=
Example 2: Actual=Technology, Predicted=
Example 2: Actual=Technology, Predicted=
Example 3: Actual=Technology, Predicted=
Example 3: Actual=Technology, Predicted=
Zero-shot Accuracy: 100.00% (20/20)

=== Evaluating Few-shot ===
Zero-shot Accuracy: 100.00% (20/20)

=== Evaluating Few-shot ===
Example 1: Actual=Business, Predicted=Business
Example 1: Actual=Business, Predicted=Business
Example 2: Actual=Technology, Predicted=Tech
Example 2: Actual=Technology, Predicted=Tech
Example 3: Actual=Technology, Predicted=Science
Example 3: Actual=Technology, Predicted=Science
Few-shot Accuracy: 55.00% (11/20)
Few-shot Accuracy: 55.00% (11/20)
