# Prompt Engineering: zero/one/few-shot techniques

Personal study notebook based on the Deep Learning course on LLMs, developed in partnership with AWS specialists, covering both theory and practice.

## 1 - Dependencies

In [None]:
!pip install -U \
    torch==2.5.1 \
    datasets==2.17.0 \
    transformers==4.38.2

In [3]:
import warnings
warnings.filterwarnings("ignore")

# Library that gives access to the datasets hosted at https://huggingface.co/datasets, including text, image, and audio datasets
from datasets import load_dataset 

# Transformers is a library for loading, training, and running state-of-the-art models (such as BERT, GPT, LLaMA, T5, etc.) in a simple, standardized way across several frameworks (PyTorch, TensorFlow, JAX)
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig 

## 2 - Initializing our global variables

In [4]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

In [5]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [6]:
model_name='google/flan-t5-base'

# Load a pre-trained seq2seq (encoder-decoder) model; returns a PyTorch object ready for inference or training
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Create the tokenizer, which turns text into tokens that we can pass to the model's encoder-decoder architecture
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

### 2.1 Tokenizer usage example

In [7]:
sentence = "What time is it, Tom?"

sentence_encoded = tokenizer(sentence, return_tensors='pt')  # tokenizer ENCODE step

sentence_decoded = tokenizer.decode(
    sentence_encoded["input_ids"][0],  # pass only the input_ids, not the attention_mask
    skip_special_tokens=True
)

print('\nENCODED SENTENCE:')
print(sentence_encoded)

print('\nDECODED SENTENCE:')
print(sentence_decoded)

"""
As shown above, encoding and then decoding the same tokens with the same tokenizer returns the original text.

*Important note:*
The original text came back because we used the tokenizer's decode step. If we instead run the tokens through the model's decoder, it generates different tokens,
and passing those model-generated tokens to the tokenizer gives us the model's answer.
"""

"""
Next: the MODEL'S DECODER → generates new tokens
"""

# Generate the output tokens with the model (the model's decoder)
model_output_tokens = model.generate(
    **sentence_encoded,        # inclui input_ids e attention_mask
    max_length=50
)

# Decode those tokens back into text with the tokenizer
model_output_text = tokenizer.decode(
    model_output_tokens[0],
    skip_special_tokens=True
)

print('\nMODEL OUTPUT TOKENS:')
print(model_output_tokens)

print('\nMODEL OUTPUT TEXT:')
print(model_output_text)



ENCODED SENTENCE:
{'input_ids': tensor([[ 363,   97,   19,   34,    6, 3059,   58,    1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

DECODED SENTENCE:
What time is it, Tom?

MODEL OUTPUT TOKENS:
tensor([[    0, 29415,  3246,     1]])

MODEL OUTPUT TEXT:
4:00 PM


## 2 - Summarization without Prompt Engineering

In [8]:
example_indices = [40, 200]

dash_line = '-'.join('' for x in range(100))

for i, index in enumerate(example_indices):

    topic = dataset["test"][index]['topic']
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
    
    inputs = tokenizer(dialogue, return_tensors='pt')

    model_output_tokens = model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0]
        
    output = tokenizer.decode(
        model_output_tokens,
        skip_special_tokens=True
    )
    
    print("TOPIC: ",topic)
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

    # The model's answers are clearly very vague

TOPIC:  transportation
---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

TOPIC:  

## 3. Summarization with Prompt Engineering

### 3.1 Zero Shot Inference

In [9]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
        Summarize the following conversation.

        {dialogue}

        Summary:
    """

    # Tokenize the constructed prompt (instead of the raw dialogue) before generating.
    inputs = tokenizer(prompt, return_tensors='pt')

    model_output_tokens = model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0]

    output = tokenizer.decode(
        model_output_tokens, 
        skip_special_tokens=True
    )
    
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)    
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

    # Slightly more elaborate answers, but still vague

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

        Summarize the following conversation.

        #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

        Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHO

#### 3.1.1 Zero Shot Inference following a FLAN model template
https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py

In [10]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']
        
    prompt = f"""
        Dialogue:

        {dialogue}

        What was going on?
    """

    inputs = tokenizer(prompt, return_tensors='pt')

    model_output_tokens = model.generate(
            inputs["input_ids"], 
            max_new_tokens=50,
        )[0]

    output = tokenizer.decode(
        model_output_tokens, 
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

# Just by using one of the model's own templates, the answers improved drastically

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

        Dialogue:

        #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

        What was going on?
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
Tom is late fo

### 3.2 One Shot Inference

In [11]:
def make_prompt(example_indices_full, example_index_to_summarize):

    prompt = ''
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']
        
        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        # The idea here is to use the human-written summaries as the examples for our shot inference!
        prompt += f"""
            Dialogue:

            {dialogue}

            What was going on?
            {summary}
        """
    
    dialogue = dataset['test'][example_index_to_summarize]['dialogue']
    
    prompt += f"""
        Dialogue:

        {dialogue}

        What was going on?
    """
        
    return prompt
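
The indented f-strings in `make_prompt` leak their leading spaces into the prompt, which is why the printed prompts have unevenly indented lines. A possible variant is sketched below, assuming the examples are passed in as plain `(dialogue, summary)` pairs rather than read from `dataset`. The name `build_prompt` and the `"\n\n\n"` separator (taken from the stop-sequence comment in the function above) are illustrative choices; whether this changes the model's output has not been tested here.

```python
def build_prompt(examples, dialogue_to_summarize):
    """Build a zero/one/few-shot prompt without stray indentation.

    examples: list of (dialogue, summary) pairs; an empty list gives a zero-shot prompt.
    """
    # One block per solved example: dialogue, question, then the human summary.
    blocks = [
        f"Dialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}"
        for dialogue, summary in examples
    ]
    # Final block: the dialogue to summarize, with the answer left open.
    blocks.append(f"Dialogue:\n\n{dialogue_to_summarize}\n\nWhat was going on?\n")
    # Separate blocks with three newlines (assumed separator, see lead-in).
    return "\n\n\n".join(blocks)


demo_examples = [("#Person1#: What time is it, Tom?", "#Person1# asks Tom the time.")]
print(build_prompt(demo_examples, "#Person1#: Have you considered upgrading your system?"))
```

Keeping the function independent of `dataset` also makes it easy to check the prompt format before running the model.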

In [12]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


            Dialogue:

            #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

            What was going on?
            #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
        
        Dialogue:

        #Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: 

In [13]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt')

model_output_tokens = model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0]

output = tokenizer.decode(
    model_output_tokens, 
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


### 3.3 Few Shot Inference

In [14]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)


            Dialogue:

            #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

            What was going on?
            #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
        
            Dialogue:

            #Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't forget to

In [15]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(few_shot_prompt, return_tensors='pt')
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0], 
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


---

# Conclusion on Zero/One/Few-Shot Prompt Engineering techniques

In the experiments above, few-shot showed no significant gain over one-shot or even zero-shot, especially when using the FLAN model's instruction templates.

Even in scenarios where few-shot does beat zero-shot, adding more than 5 or 6 examples usually:

- does not improve the quality of the answers;
- increases the computational cost;
- can even hurt performance because of the excess context.

**If the model still gives unsatisfactory results after 5–6 examples, you most likely need fine-tuning.**

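## About the context limit warning

During the few-shot tests, the following message appeared:

"Token indices sequence length is longer than the specified maximum sequence length for this model (819 > 512). Running this sequence through the model will result in indexing errors."

This is a warning from the tokenizer rather than an exception: the prompt was encoded into 819 tokens, more than the 512 reported as the model's maximum. Since the cells above do not enable truncation, the full sequence is still passed to the model, beyond the input length it is configured for. Staying within the limit means dropping part of the input, either by:

- using fewer or shorter examples, or
- truncating the start or the end of the input

…and either option can hurt the final answer.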
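
One way to stay under the limit is to decide how many examples fit before building the prompt. The sketch below is a minimal illustration under stated assumptions: `select_examples` and `whitespace_token_count` are hypothetical helpers, and the whitespace counter is only a rough stand-in for a real tokenizer.

```python
def whitespace_token_count(text):
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_examples(examples, query, count_tokens, max_tokens=512):
    """Keep examples, in order, while the running total stays within max_tokens.

    examples: list of already-formatted example strings.
    query: the final part of the prompt, which must always fit.
    count_tokens: function mapping a string to its token count.
    """
    used = count_tokens(query)
    kept = []
    for example in examples:
        cost = count_tokens(example)
        if used + cost > max_tokens:
            break  # stop at the first example that would overflow the budget
        kept.append(example)
        used += cost
    return kept


shots = ["one two three", "four five", "six seven eight nine"]
print(select_examples(shots, "final question", whitespace_token_count, max_tokens=8))
```

With the notebook's tokenizer, the counter could be replaced by something like `lambda text: len(tokenizer(text)["input_ids"])`, keeping in mind that per-piece counts only approximate the count of the joined prompt.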

---

# Generation Config

- adjusting our LLM's generation settings, to understand how the different outputs behave and whether it makes sense to use them
- the model's top_k, top_p, temperature, and max_new_tokens...
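
To make these parameters concrete, the sketch below applies them to a tiny made-up list of logits in pure Python. It illustrates the general ideas (temperature scaling before the softmax, keeping the k most likely tokens, keeping the smallest set of tokens whose cumulative probability reaches p) and is not the library's implementation; all function names are hypothetical. In `transformers`, these sampling parameters only take effect when `do_sample=True`.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def renormalize(probs, keep):
    """Zero out every index not in keep and rescale the rest to sum to 1."""
    keep = set(keep)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temperature in (0.1, 0.5, 1.0):
    print(temperature, softmax_with_temperature(logits, temperature))
```

Lower temperatures concentrate probability on the top token, so sampled outputs move closer to the greedy result; higher temperatures flatten the distribution and make outputs more varied.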

In [17]:
from transformers import GenerationConfig

# List of generation configs to compare
configs = [
    ("max_new_tokens=50", GenerationConfig(max_new_tokens=50)),
    ("max_new_tokens=10", GenerationConfig(max_new_tokens=10)),
    ("temp=0.1 (sampling)", GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)),
    ("temp=0.5 (sampling)", GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)),
    ("temp=1.0 (sampling)", GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)),
]

inputs = tokenizer(few_shot_prompt, return_tensors='pt')

print(dash_line)
print("📊 MODEL OUTPUT COMPARISON (FEW-SHOT)")
print(dash_line)

for label, cfg in configs:
    output_tokens = model.generate(
        inputs["input_ids"],
        generation_config=cfg,
    )

    output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

    print(f"\n### 🔧 CONFIG: {label}")
    print(dash_line)
    print(output_text)
    print(dash_line)

print("\n### 🧍 BASELINE HUMAN SUMMARY:")
print(summary)
print(dash_line)


---------------------------------------------------------------------------------------------------
📊 MODEL OUTPUT COMPARISON (FEW-SHOT)
---------------------------------------------------------------------------------------------------

### 🔧 CONFIG: max_new_tokens=50
---------------------------------------------------------------------------------------------------
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.
---------------------------------------------------------------------------------------------------

### 🔧 CONFIG: max_new_tokens=10
---------------------------------------------------------------------------------------------------
#Person1 wants to upgrade his system.
---------------------------------------------------------------------------------------------------

### 🔧 CONFIG: temp=0.1 (sampling)
----------------------------------------------------------------------------