# In-Context Learning and Prompting


### Basic prompting

The simplest way to use a language model is to provide a prompt `x` and sample a completion `y ~ p(y|x)`. The model treats the prompt as a prefix and
generates a continuation based on patterns learned during pretraining.

#### Select the model to use

In [1]:
SELECTED = 'meta350'  # Enter your preferred model

# 'meta350' for the model facebook/opt-350m
# 'smol135' for the model HuggingFaceTB/SmolLM2-135M
# 'smol360' for the model HuggingFaceTB/SmolLM2-360M (the model Karen used)

In [2]:
from IPython.display import display, Markdown

# Mapping from short names to Hugging Face model IDs
MODELS = {
    'meta350': 'facebook/opt-350m',
    'smol135': 'HuggingFaceTB/SmolLM2-135M',
    'smol360': 'HuggingFaceTB/SmolLM2-360M',
}

MODEL = MODELS[SELECTED]

texto_markdown = f"""
# Selected model
You are using **{MODEL}** in this notebook.
"""

display(Markdown(texto_markdown))


# Selected model
You are using **facebook/opt-350m** in this notebook.


In [3]:
import transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model = MODEL

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)



#### Make a prompt `x` and tokenize it

In [4]:
x = "Do Androids Dream of Electric Sheep?"

inputs = tokenizer(x, return_tensors='pt')
inputs

{'input_ids': tensor([[    2,  8275,   178,  1001,  7823,  7419,     9,  7219, 38112,   116]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

#### Generate a response

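Here, generating means autoregressive sampling: the model repeatedly samples one token and appends it to the context, i.e.

```
context = x
for t in range(max_new_tokens):
    y_t ~ p(y_t | context)     # sample the next token
    context = context + [y_t]  # append it to the context
```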
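
The loop above can be made concrete with a toy example. The sketch below is not how `model.generate` is implemented. It replaces the neural network with a hypothetical hand-written table (`TOY_MODEL`) whose next-token probabilities depend only on the last token, so that it runs with the standard library alone.

```python
import random

# Hypothetical toy "model": next-token probabilities that depend only on the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "an": 0.4},
    "the": {"android": 0.5, "sheep": 0.5},
    "an": {"android": 1.0},
    "android": {"dreams": 0.8, "</s>": 0.2},
    "sheep": {"dreams": 0.4, "</s>": 0.6},
    "dreams": {"</s>": 1.0},
}

def next_token_distribution(context):
    """Return p(y_t | context) as a dict; the toy model only looks at the last token."""
    return TOY_MODEL[context[-1]]

def generate(prompt_tokens, max_new_tokens, rng):
    """Sample one token at a time and append it to the context."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(context)
        tokens, probs = zip(*dist.items())
        y_t = rng.choices(tokens, weights=probs, k=1)[0]  # y_t ~ p(y_t | context)
        context.append(y_t)
        if y_t == "</s>":  # stop early at the end-of-sequence token
            break
    return context

rng = random.Random(0)
sample = generate(["<s>"], max_new_tokens=5, rng=rng)
print(sample)
```

Calling `generate` again with the same `rng` object generally gives a different sequence, which is the same effect as asking for several return sequences with sampling enabled.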

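The main arguments to `generate` that control sampling are:

*   `do_sample`: if `True`, the model samples from the next-token probability distribution. If `False`, it uses greedy decoding (it always picks the most likely token).
*   `temperature`: divides the logits by this value before the softmax, which rescales the distribution before sampling. Values smaller than 1.0 make it sharper (more deterministic); values larger than 1.0 make it flatter (more random, "creative").
*   `top_k`: restricts sampling to the `k` most likely tokens (`top_k=50` is common).
*   `top_p` (also known as nucleus sampling): restricts sampling to the smallest set of most likely tokens whose cumulative probability is at least `p` (`top_p=0.9` is common).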
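
To see what these parameters do to a distribution, here is a minimal sketch in plain Python. The helper names (`softmax`, `top_k_filter`, `top_p_filter`) and the example logits are made up for illustration. This is not the code path used inside `transformers`, only the same idea on a four-token vocabulary.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def top_k_filter(probs, k):
    """Keep the k most likely token indices and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = {i: probs[i] for i in order[:k]}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {i: q / total for i, q in kept.items()}

logits = [2.0, 1.0, 0.5, -1.0]             # made-up logits for a 4-token vocabulary
probs = softmax(logits)                    # temperature 1.0
sharp = softmax(logits, temperature=0.1)   # almost all mass on the top token
flat = softmax(logits, temperature=2.0)    # mass spread more evenly

print(top_k_filter(probs, k=2))   # only token indices 0 and 1 survive
print(top_p_filter(probs, p=0.9)) # token indices 0, 1 and 2 survive
```

After filtering, sampling proceeds from the renormalized distribution, so low-probability tokens outside the kept set can never be chosen.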

In [5]:
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id
)

for i in range(5):
    print(f"===={i}====")
    print(tokenizer.decode(outputs[i]))

====0====
</s>Do Androids Dream of Electric Sheep?
I'm looking for a real one but also trying to find one that will fit a normal size
====1====
</s>Do Androids Dream of Electric Sheep?

Have you ever wished that you could go on a date with everyone in your life? This
====2====
</s>Do Androids Dream of Electric Sheep?

By
Jim Chaney

Feb 27, 2017

The first line in the
====3====
</s>Do Androids Dream of Electric Sheep?  That would be one hell of a game.
Yes, you've just given me a dream
====4====
</s>Do Androids Dream of Electric Sheep?  Because THAT'S MY MOVIE!
Do you not want to hear how good the new


In [6]:
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    num_return_sequences=3,
    top_p=0.5,
    pad_token_id=tokenizer.eos_token_id
)

for i in range(3):
    print(f"===={i}====")
    print(tokenizer.decode(outputs[i]))

====0====
</s>Do Androids Dream of Electric Sheep?

The answer to this question is no. Androids Dream of Electric Sheep is a great movie, but it is not a good movie. It is not a good movie. It is not
====1====
</s>Do Androids Dream of Electric Sheep?

This week’s episode of the Electric Sheep podcast is a special guest, John D.

John D. is the founder and editor-in-chief of the blog, The
====2====
</s>Do Androids Dream of Electric Sheep?

I am not sure if you have ever heard of the term “electric sheep”. I was going to say that it is a very common term used in the art world. But


In [7]:
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    num_return_sequences=3,
    temperature=0.1,
    pad_token_id=tokenizer.eos_token_id
)

for i in range(3):
    print(f"===={i}====")
    print(tokenizer.decode(outputs[i]))

====0====
</s>Do Androids Dream of Electric Sheep?

Do Androids Dream of Electric Sheep? is a short film by American filmmaker and filmmaker, Michael A. Cohen. It was released on October 26, 2016.

Plot
The
====1====
</s>Do Androids Dream of Electric Sheep?

Do Androids Dream of Electric Sheep? is a short film by American filmmaker and actor, Michael Caine. It was released on DVD on October 29, 2011.

Plot

====2====
</s>Do Androids Dream of Electric Sheep?

Do Androids Dream of Electric Sheep? is a short film by American filmmaker and filmmaker, Michael A. Hirst. It was released on October 21, 2010.

Plot



### Instruction prompt ("zero shot")

Instead of just continuing text, we can prompt the model to perform a specific task by providing an instruction. This is "zero-shot" because we give no examples, just the task description. The model relies on whatever instruction-following patterns it picked up during training.

In [8]:
prompt_template = """Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
{sentence}
Classification:"""


sentences = [
    "I've seen things you people wouldn't believe.",
    "Attack ships on fire off the shoulder of Orion.",
    "I watched C-beams glitter in the dark near the Tannhäuser Gate.",
    "All those moments will be lost in time, like tears in rain. Time to die"
]

prompt = prompt_template.format(sentence=sentences[0])
print(prompt)


Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
I've seen things you people wouldn't believe.
Classification:


In [9]:
for sentence in sentences:
    print(f"\n=============")

    prompt = prompt_template.format(sentence=sentence)
    inputs = tokenizer(prompt, return_tensors='pt')
    outputs = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        num_return_sequences=2,
        pad_token_id=tokenizer.eos_token_id
    )

    for i in range(2):
        print(f"----{i}----")
        print(tokenizer.decode(outputs[i]))


----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
I've seen things you people wouldn't believe.
Classification: "the use of descriptive words to describe different aspects of things which, in turn, influence or are defined as a particular condition, action, or phenomenon, is typically done intentionally or otherwise as a way
----1----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
I've seen things you people wouldn't believe.
Classification: you should have thought about that before you thought about something like this.
Classification: what the fuck are you talking about. You do know that you can read anything on the Internet so why so

----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:
Attack ships on fire off the shoulder of Orion.
Classification:
[a] Any action done as a result of an actual attack of

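None of the samples above returns one of the two requested labels. For the answer to be usable downstream, it's important to ensure that the output is formatted correctly!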
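
One way to make the formatting requirement explicit is to parse each completion and reject anything that is not a known label. The sketch below assumes the prompt has already been stripped from the decoded text, so `completion` holds only the newly generated part. `LABELS` and `parse_label` are hypothetical names introduced for illustration.

```python
LABELS = ("action", "rhetorical/abstract")

def parse_label(completion):
    """Map the first non-empty line of a completion to a known label, else None."""
    for line in completion.splitlines():
        # Drop surrounding whitespace, quotes and a trailing period, then lowercase.
        candidate = line.strip().strip("\"'.").lower()
        if candidate:
            return candidate if candidate in LABELS else None
    return None

print(parse_label(" action\n"))
print(parse_label(" you should have thought about that before"))
```

A completion that returns `None` can be counted as a formatting failure, which gives a simple way to compare prompting strategies.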

### Instruction + examples ("few-shot")

We can provide examples of input-output pairs before the test input. This "few-shot" or "in-context learning" approach helps the model understand the task format and expected outputs without any parameter updates.
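
A few-shot prompt is just a string, so it can be assembled from a list of labeled examples. The helper below is a sketch with a hypothetical name (`build_few_shot_prompt`); it follows the `Sentence:` / `Classification:` layout used in this section.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Concatenate an instruction, labeled examples, and the unlabeled query."""
    parts = [instruction, ""]
    for sentence, label in examples:
        parts += ["Sentence:", sentence, "Classification:", label, ""]
    # The prompt ends right after "Classification:" so the model fills in the label.
    parts += ["Sentence:", query, "Classification:", ""]
    return "\n".join(parts)

examples = [
    ("Attack ships on fire off the shoulder of Orion.", "action"),
    ("I watched C-beams glitter in the dark near the Tannhäuser Gate.", "action"),
]
few_shot_prompt = build_few_shot_prompt(
    'Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.',
    examples,
    "I've seen things you people wouldn't believe.",
)
print(few_shot_prompt)
```

Building the prompt this way makes it easy to vary the number or order of examples and observe how the outputs change.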

In [10]:
prompt_template = """Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter in the dark near the Tannhäuser Gate.
Classification:
action

Sentence:
{sentence}
Classification:
"""



In [11]:
for sentence in sentences:
    print(f"\n=============")
    prompt = prompt_template.format(sentence=sentence)
    inputs = tokenizer(prompt, return_tensors='pt')
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=True,
        num_return_sequences=2,
        stop_strings=["\n\n"],
        tokenizer=tokenizer,
        pad_token_id=tokenizer.eos_token_id
    )

    for i in range(2):
        print(f"----{i}----")
        print(tokenizer.decode(outputs[i]))


----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter in the dark near the Tannhäuser Gate.
Classification:
action

Sentence:
I've seen things you people wouldn't believe.
Classification:
rhetorical


----1----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-beams glitter in the dark near the Tannhäuser Gate.
Classification:
action

Sentence:
I've seen things you people wouldn't believe.
Classification:
a rhetorical statement



----0----
</s>Classify the sentence as describing an actual "action" or as a "rhetorical/abstract" illustration.:

Sentence:
"Attack ships on fire off the shoulder of Orion."
Classification:
action

Sentence:
I watched C-b

### Chat templates

Some models have been fine-tuned to operate as chat assistants. The chat is represented as a series of messages that are turned into a string using special tags. There is also a *system message* that provides instructions about how the model should behave.

These models are often called "instruct" models because they've been trained to follow instructions rather than just complete text.
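
To illustrate how a list of messages becomes one string, here is a hand-written imitation of a ChatML-style layout with `<|im_start|>` and `<|im_end|>` tags. It is only a sketch: `render_chat` is a hypothetical function, not the tokenizer's actual template, and unlike the real template it does not insert a default system message.

```python
def render_chat(messages, add_generation_prompt=False):
    """Render messages in a ChatML-like layout (an imitation for illustration)."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        text += "<|im_start|>assistant\n"
    return text

demo_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is consciousness?"},
]
rendered = render_chat(demo_messages, add_generation_prompt=True)
print(rendered)
```

In practice the tokenizer's `apply_chat_template` method should be used, since each model family defines its own tags and defaults.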

In [23]:
import transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

model = "HuggingFaceTB/SmolLM2-135M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)


In [24]:
messages = [{
    "role": "user",
    "content": "What is consciousness?."
}]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print("Input text: ", input_text, sep="\n")


Input text: 
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
What is consciousness?.<|im_end|>



#### Generate a response

In [25]:
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=True)
print(tokenizer.decode(outputs[0]))

<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
What is consciousness?.<|im_end|>
<|im_start|>assistant
"Consciousness" is a broad term used to describe the properties and functions that allow living beings to perceive, perceive, reflect, and interact with their environment. It encompasses the way our brains process and interpret information, making us aware of our surroundings, ourselves, and the world around us.

At its core, consciousness is a complex and multifaceted phenomenon that involves the integration of various physiological, psychological, and cognitive mechanisms. It is often described as a state of sentience, where an organism has developed a level of awareness that enables it to understand its body, the world, and the environment around it. This sense of self, which gives rise to conscious experiences, can comprise various aspects, including:

1. Perception: The ability to interpret and process sensory in

### System prompts

Chat models often support a system message that sets the model's behavior or role. This message is typically prepended to the conversation and instructs the model how to respond throughout the interaction. For example, we can use the system prompt to have the model respond in French.

In [None]:
messages = [
    {
        "role": "system",
        "content": "You are an assistant that speaks in French."
    },
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

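# add_generation_prompt=True appends the assistant header so the model replies as the assistant
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)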
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0]))

### Instruction ("zero shot")

We can also use the chat format for zero-shot tasks. The instruction-tuned model may follow instructions more reliably than the base model, though output formatting can still be inconsistent.

In [None]:
messages = [
    {
        "role": "user",
        "content": ("Classify the sentence's sentiment as 'Positive' or 'Negative':\n" +
                    "Sentence: 'I love the NLP class!'\n")
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, num_return_sequences=3)
for i in range(len(outputs)):
    print(f"----{i}----")
    print(tokenizer.decode(outputs[i]))

#### Approach 1: write a detailed instruction (in the user or system prompt)

We can improve output formatting by providing explicit, detailed instructions about the desired format in either the system or user message.

In [None]:
messages = [
    {   "role": "system",
        "content": """You are an expert sentiment classifier.
Your task is to classify a sentence's sentiment as 'Positive' or 'Negative'.
The user will provide you with a sentence.
Format your output as:

Classification: Positive or Negative
"""
    },
    {
        "role": "user",
        "content": ("I love the NLP class!")
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, num_return_sequences=3)
for i in range(len(outputs)):
    print(f"----{i}----")
    print(tokenizer.decode(outputs[i]))

#### Approach 2: provide examples (either in the system prompt or as a sequence of messages)

We can show the model the desired behavior through example conversations, where the assistant demonstrates the correct format and task execution. This is analogous to the few-shot examples we saw earlier, but using the chat format.

In [None]:
messages = [
    {
        "role": "user",
        "content": "Classify the sentence's sentiment as 'Positive' or 'Negative':\nSentence: 'This is such a cool lecture!'"
    },
    {
        "role": "assistant",
        "content": "Classification: Positive"
    },
    {
        "role": "user",
        "content": "Classify the sentence's sentiment as 'Positive' or 'Negative':\nSentence: 'I really don't like the last scene.'"
    },
    {
        "role": "assistant",
        "content": "Classification: Negative"
    },
    {
        "role": "user",
        "content": "Classify the sentence's sentiment as 'Positive' or 'Negative':\nSentence: 'I love the NLP class!'"
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, num_return_sequences=3)
for i in range(len(outputs)):
    print(f"----{i}----")
    print(tokenizer.decode(outputs[i]))
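
The alternating user and assistant messages above can also be generated from a list of labeled examples. This is a minimal sketch; `build_few_shot_messages` is a hypothetical helper, and the wording of the instruction and the `Classification:` prefix are copied from the cell above.

```python
def build_few_shot_messages(instruction, examples, query):
    """Turn labeled examples into alternating user/assistant turns, ending with the query."""
    messages = []
    for sentence, label in examples:
        messages.append({"role": "user", "content": f"{instruction}\nSentence: '{sentence}'"})
        messages.append({"role": "assistant", "content": f"Classification: {label}"})
    # The final user turn has no assistant reply: the model is expected to supply it.
    messages.append({"role": "user", "content": f"{instruction}\nSentence: '{query}'"})
    return messages

few_shot_messages = build_few_shot_messages(
    "Classify the sentence's sentiment as 'Positive' or 'Negative':",
    [("This is such a cool lecture!", "Positive"),
     ("I really don't like the last scene.", "Negative")],
    "I love the NLP class!",
)
for message in few_shot_messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the hand-written one.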

### Chain-of-thought with base model

Prompting the model to "think step by step" can elicit intermediate reasoning steps before the final answer. Even base models can exhibit this behavior when prompted appropriately, though the reasoning may be flawed.

In [None]:
model = "HuggingFaceTB/SmolLM2-360M"

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

In [None]:
prompts = [
    """Q: On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?
A: Let's think step by step.""",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.4,
        do_sample=True
    )
    print(tokenizer.decode(outputs[0]))
    print("====")

### Chain-of-thought with instruct model

Many instruction-tuned models can solve problems step-by-step when asked, as they've typically been trained to follow such instructions.

In [None]:
model = "HuggingFaceTB/SmolLM2-360M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

In [None]:
messages = [
    {
        "role": "user",
        "content": """Solve the problem:
On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?"""
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.4, do_sample=True)
print(tokenizer.decode(outputs[0]))
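
Since step-by-step reasoning from small models may be flawed, it helps to check the final number against a reference. For this problem the fight lasts 5 × 3 = 15 minutes, so the expected answer is 15 × 25 = 375 punches. The sketch below takes the last number in a response as the model's answer, which is a common but imperfect heuristic. The `response` string is written by hand for illustration and is not real model output.

```python
import re

def last_number(text):
    """Return the last integer or decimal that appears in text, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

punches_per_minute, rounds, minutes_per_round = 25, 5, 3
reference = punches_per_minute * rounds * minutes_per_round  # 25 * 5 * 3 = 375

# Hypothetical response, hand-written for illustration.
response = (
    "Each round lasts 3 minutes, so the fight lasts 5 * 3 = 15 minutes. "
    "At 25 punches per minute, Joe throws 15 * 25 = 375 punches."
)
is_correct = last_number(response) == reference
print(is_correct)
```

Running this check over several sampled responses gives a rough accuracy estimate for a given prompt.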

### Program-aided reasoning

Instead of natural language reasoning, we can prompt the model to solve problems by writing and executing code. This combines the model's code generation capabilities with the exact arithmetic of a code interpreter. Note that the cell below only generates the program; running it is a separate step.

In [None]:
messages = [
    {
        "role": "user",
        "content": """Solve the problem by writing a Python program:
On average Joe throws 25 punches per minute.
A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?"""
    }
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.4, do_sample=True)
print(tokenizer.decode(outputs[0]))
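
The generation step above only produces text, so the program still has to be extracted and run. The sketch below shows one way to do this, assuming the model wraps its program in a fenced code block and stores the result in a variable named `answer`. Both are assumptions, and the `response` string is hand-written rather than real model output. Executing model-written code with `exec` is unsafe outside a sandbox.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

# Hypothetical response, hand-written for illustration.
response = (
    "Here is a program:\n"
    f"{FENCE}python\n"
    "punches_per_minute = 25\n"
    "rounds = 5\n"
    "minutes_per_round = 3\n"
    "answer = punches_per_minute * rounds * minutes_per_round\n"
    f"{FENCE}\n"
)

def extract_code(text):
    """Return the body of the first fenced code block in text, or None."""
    pattern = re.escape(FENCE) + r"(?:python)?\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None

code = extract_code(response)
namespace = {}
exec(code, namespace)  # run only in a sandbox when the code comes from a model
print(namespace["answer"])  # 25 * 5 * 3 = 375
```

The arithmetic is now done by the Python interpreter, so the model only has to set up the computation correctly.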