# Introduction to Data Science 2025

# Week 4

In this week's exercise, we look at prompting and zero- and few-shot task settings. Below is a text generation example from https://github.com/TurkuNLP/intro-to-nlp/blob/master/text_generation_pipeline_example.ipynb demonstrating how to load a text generation pipeline with a pre-trained model and generate text with a given prompt. Your task is to load a similar pre-trained generative model and assess whether the model succeeds at a set of tasks in zero-shot, one-shot, and two-shot settings.

**Note: Downloading and running the pre-trained model locally may take some time. Alternatively, you can open and run this notebook on [Google Colab](https://colab.research.google.com/), as assumed in the following example.**

## Text generation example

This is a brief example of how to run text generation with a causal language model and `pipeline`.

Install the [transformers](https://huggingface.co/docs/transformers/index) Python package, which is used to load the model and tokenizer and to run generation.

In [None]:
!pip install --quiet transformers

Import the `AutoTokenizer` and `AutoModelForCausalLM` classes and the `pipeline` function. The two classes support loading tokenizers and generative models from the [Hugging Face repository](https://huggingface.co/models), and `pipeline` wraps a tokenizer and a model for convenience.

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Load a generative model and its tokenizer. You can substitute any other generative model name here (e.g. [other TurkuNLP GPT-3 models](https://huggingface.co/models?sort=downloads&search=turkunlp%2Fgpt3)), but note that Colab may have issues running larger models. 

In [2]:
MODEL_NAME = 'TurkuNLP/gpt3-finnish-large'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

Instantiate a text generation pipeline using the tokenizer and model.

In [3]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=model.device
)

Device set to use cpu


We can now call the pipeline with a text prompt; it will take care of tokenizing, encoding, generation, and decoding:

In [4]:
output = pipe('Terve, miten menee?', max_new_tokens=25)

print(output)

[{'generated_text': 'Terve, miten menee?” kysyin.\n”Oikein hyvin, kiitos kysymästä”, hän sanoi ja ojensi minulle kättään.\n”Sinä olet tosi kiltti”,'}]


Print only the generated text:

In [5]:
print(output[0]['generated_text'])

Terve, miten menee?” kysyin.
”Oikein hyvin, kiitos kysymästä”, hän sanoi ja ojensi minulle kättään.
”Sinä olet tosi kiltti”,


We can also pass the pipeline any arguments that the model's `generate` method supports. For details on text generation with `transformers`, see e.g. [this tutorial](https://huggingface.co/blog/how-to-generate).

Example with sampling and a high `temperature` parameter to generate more chaotic output:

In [6]:
output = pipe(
    'Terve, miten menee?',
    do_sample=True,
    temperature=10.0,
    max_new_tokens=25
)

print(output[0]['generated_text'])

Terve, miten menee? Täällä minä nyt sitten vielä majailta. Kävin kotona keskiviikkona hakemassa kamat sekä katsomassa miten kissat pärjää. (Kissat) pärjäsivät muuten hirveän hauskasti
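
To see why a high `temperature` makes the output more chaotic, recall that sampling divides the model's next-token scores (logits) by the temperature before they are turned into probabilities with softmax. The following is a minimal sketch of that step only. The function name `softmax_with_temperature` and the three logit values are made up for illustration and do not come from the model above.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability; the result is unchanged
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens (illustrative values only)
toy_logits = [4.0, 2.0, 0.0]

for temperature in (0.7, 1.0, 10.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature grows, the probabilities move closer to each other, so unlikely tokens are sampled more often. A temperature below 1 has the opposite effect and concentrates probability on the top-scoring token.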


## Exercise 1

Your task is to assess whether a generative model succeeds in the following tasks in zero-shot, one-shot, and two-shot settings:

- binary sentiment classification (positive / negative)

- person name recognition

- two-digit addition (e.g. 11 + 22 = 33)

For example, for assessing whether a generative model can name capital cities, we could use the following prompts:

- zero-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
- one-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Sweden?\
	>Answer: Stockholm
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
- two-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Sweden?\
	>Answer: Stockholm
	>
	>Question: What is the capital of Denmark?\
	>Answer: Copenhagen
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
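
The three prompts above differ only in how many solved examples precede the final question, so they can be generated from one template. The sketch below shows one possible way to do this. The helper name `build_prompt` is hypothetical, and the strings are taken from the capital city example.

```python
def build_prompt(instruction, examples, question):
    """Build a k-shot prompt, where k is the number of (question, answer) examples."""
    blocks = [instruction]
    for example_question, example_answer in examples:
        blocks.append(f"Question: {example_question}\nAnswer: {example_answer}")
    # The final question is left unanswered for the model to complete
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)

instruction = "Identify the capital cities of countries."
examples = [
    ("What is the capital of Sweden?", "Stockholm"),
    ("What is the capital of Denmark?", "Copenhagen"),
]
question = "What is the capital of Finland?"

# examples[:0] gives zero-shot, examples[:1] one-shot, examples[:2] two-shot
for k in range(3):
    print(f"--- {k}-shot ---")
    print(build_prompt(instruction, examples[:k], question))
    print()
```

Slicing the example list keeps the instruction and the final question identical across settings, so any difference in the model's answers can be attributed to the number of examples.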

You can do the tasks either in English or Finnish and use a generative model of your choice from the Hugging Face models repository, for example the following models:

- English: `gpt2-large`
- Finnish: `TurkuNLP/gpt3-finnish-large`

You can either come up with your own instructions for the tasks or use the following:

- English:
	- binary sentiment classification: "Do the following texts express a positive or negative sentiment?"
	- person name recognition: "List the person names occurring in the following texts."
	- two-digit addition: "This is a first grade math exam."
- Finnish:
	- binary sentiment classification: "Ilmaisevatko seuraavat tekstit positiivista vai negatiivista tunnetta?"
	- person name recognition: "Listaa seuraavissa teksteissä mainitut henkilönnimet."
	- two-digit addition: "Tämä on ensimmäisen luokan matematiikan koe."

Come up with at least two test cases for each of the three tasks, and come up with your own one- and two-shot examples.
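
Once you have test cases with expected answers, you may also want to count how often the model gets them right instead of judging each output by eye. The following is a rough sketch under stated assumptions: the helper names are hypothetical, correctness is a case-insensitive exact match on the first non-empty line of the continuation, and `fake_generate` is a stand-in that returns canned text so that the example runs without a model.

```python
def extract_answer(continuation):
    """Keep the first non-empty line of the model's continuation."""
    for line in continuation.splitlines():
        if line.strip():
            return line.strip()
    return ""

def is_correct(continuation, expected):
    """Case-insensitive exact match between the extracted and the expected answer."""
    return extract_answer(continuation).lower() == str(expected).lower()

def accuracy(generate, cases):
    """Share of (prompt, expected) pairs for which generate(prompt) is judged correct."""
    hits = sum(is_correct(generate(prompt), expected) for prompt, expected in cases)
    return hits / len(cases)

# Hypothetical stand-in for a model call: it returns canned continuations
canned = {
    "Question: 11 + 22 =\nAnswer:": " 33\n\nQuestion: 5 + 5 =",
    "Question: 18 + 47 =\nAnswer:": " 55",
}

def fake_generate(prompt):
    return canned[prompt]

cases = [
    ("Question: 11 + 22 =\nAnswer:", 33),
    ("Question: 18 + 47 =\nAnswer:", 65),
]
print(accuracy(fake_generate, cases))
```

Exact matching is strict: an answer such as "The sentiment is positive" would count as wrong, so for free-form outputs it is still worth reading the generated text as well.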

In [8]:
# The tasks are done in Finnish because the Finnish model is already loaded in an earlier cell

from typing import List

def evaluate_model_performance():
    print("BINARY SENTIMENT CLASSIFICATION (POSITIIVINEN / NEGATIIVINEN)")
    
    sentiment_test_cases = [
        "Olen todella pettynyt tähän palveluun, se oli kaikki kamalaa alusta loppuun.",
        "Ihanaa, että sain vihdoin uuden työpaikan ja voin aloittaa uuden vaiheen elämässäni.",
        "Ruoka oli mautonta ja palvelu hidasta, en varmasti palaa tänne.",
        "Tämä elokuva oli upea! Maltoin tuskin istua paikollani jännityksestä.", # more difficult
    ]
    
    sentiment_examples = [
        ("Olen iloinen tästä uutisesta!", "positiivinen"),
        ("En pidä tästä tuotteesta ollenkaan.", "negatiivinen")
    ]
    
    evaluate_sentiment_classification(sentiment_test_cases, sentiment_examples)
    
    print("\n\nPERSON NAME RECOGNITION (HENKILÖNNIMET)")
    
    name_recognition_test_cases = [
        "Matti Virtanen ja Anna Korhonen tapasivat eilen Helsingissä kahvilassa.",
        "Kirjailija Väinö Linna kirjoitti Tuntemattoman sotilaan vuonna 1954.",
        "Presidentti Sauli Niinistö ja pääministeri Sanna Marin keskustelivat tärkeistä asioista.",
        "Professori Liisa Keltikangas-Järvinen on tunnettu psykologi Suomessa."
    ]
    
    name_examples = [
        ("Pekka Haavisto vieraili Turussa.", "Pekka Haavisto"),
        ("Laura Huhtasaari ja Jussi Halla-aho puhuivat kokouksessa.", "Laura Huhtasaari, Jussi Halla-aho")
    ]
    
    evaluate_name_recognition(name_recognition_test_cases, name_examples)
    
    print("\n\nTWO-DIGIT ADDITION (KOKONAISLUKUJEN YHTEENLASKU)")
    
    addition_test_cases = [
        ("25 + 34", 59),
        ("18 + 47", 65),
        ("52 + 29", 81),
        ("73 + 16", 89)
    ]
    
    addition_examples = [
        # End with "=" so that the examples match the format of the test question
        ("15 + 22 =", "37"),
        ("41 + 33 =", "74")
    ]
    
    evaluate_addition(addition_test_cases, addition_examples)

def create_prompt(task_instruction: str, examples: List[tuple], test_case: str, shot_type: str) -> str:
    # Number of solved examples to include before the test case
    shots = {"zero-shot": 0, "one-shot": 1, "two-shot": 2}
    if shot_type not in shots:
        raise ValueError(f"Unknown shot type: {shot_type}")

    # Instruction first, then the solved examples, then the open test question
    blocks = [task_instruction]
    for question, answer in examples[:shots[shot_type]]:
        blocks.append(f"Kysymys: {question}\nVastaus: {answer}")
    blocks.append(f"Kysymys: {test_case}\nVastaus:")

    return "\n\n".join(blocks)

def generate_and_display(prompt: str, shot_type: str, test_case: str, expected_answer=None):
    print(f"\n{shot_type.upper()}, test case: {test_case}:")

    # Compare against None so that falsy expected values are still shown
    if expected_answer is not None:
        print(f"Expected: {expected_answer}")

    print("\nPrompt:")
    print(prompt)

    try:
        # return_full_text=False returns only the continuation, not the prompt.
        # Searching the full text for the last "Vastaus:" would pick up a later
        # "Vastaus:" that the model itself generated.
        output = pipe(prompt, max_new_tokens=50, do_sample=True, temperature=0.7,
                      pad_token_id=tokenizer.eos_token_id, return_full_text=False)
        continuation = output[0]['generated_text'].strip()

        # Keep only the first line, cut at the first period
        answer = continuation.split('\n')[0].split('.')[0].strip()

        print(f"Model's answer: {answer}")

    except Exception as e:
        print(f"Error generating response: {e}")

# Settings evaluated for every test case, in this order
SHOT_TYPES = ("zero-shot", "one-shot", "two-shot")

def evaluate_sentiment_classification(test_cases: List[str], examples: List[tuple]):
    task_instruction = "Ilmaisevatko seuraavat tekstit positiivista vai negatiivista tunnetta?"

    # Gold labels, in the same order as the test cases
    expected_answers = ["negatiivinen", "positiivinen", "negatiivinen", "positiivinen"]

    for i, (test_case, expected) in enumerate(zip(test_cases, expected_answers)):
        print(f"\n\nSENTIMENT, test case {i+1}")

        for shot_type in SHOT_TYPES:
            prompt = create_prompt(task_instruction, examples, test_case, shot_type)
            generate_and_display(prompt, shot_type, test_case, expected)

def evaluate_name_recognition(test_cases: List[str], examples: List[tuple]):
    task_instruction = "Listaa seuraavissa teksteissä mainitut henkilönnimet."

    # Expected names, in the same order as the test cases
    expected_answers = [
        "Matti Virtanen, Anna Korhonen",
        "Väinö Linna",
        "Sauli Niinistö, Sanna Marin",
        "Liisa Keltikangas-Järvinen"
    ]

    for i, (test_case, expected) in enumerate(zip(test_cases, expected_answers)):
        print(f"\n\nNAME RECOGNITION, test case {i+1}")

        for shot_type in SHOT_TYPES:
            prompt = create_prompt(task_instruction, examples, test_case, shot_type)
            generate_and_display(prompt, shot_type, test_case, expected)

def evaluate_addition(test_cases: List[tuple], examples: List[tuple]):
    task_instruction = "Tämä on ensimmäisen luokan matematiikan koe."

    for i, (question, expected_answer) in enumerate(test_cases):
        print(f"\n\nADDITION, test case {i+1}")

        for shot_type in SHOT_TYPES:
            # The test question ends with "=" so that the model continues with the sum
            prompt = create_prompt(task_instruction, examples, f"{question} =", shot_type)
            generate_and_display(prompt, shot_type, question, expected_answer)

print("Starting comprehensive evaluation of Finnish GPT-3 model...")
print()

evaluate_model_performance()

print("\nEvaluation complete.")

Starting comprehensive evaluation of Finnish GPT-3 model...

BINARY SENTIMENT CLASSIFICATION (POSITIIVINEN / NEGATIIVINEN)


SENTIMENT, test case 1

ZERO-SHOT, test case: Olen todella pettynyt tähän palveluun, se oli kaikki kamalaa alusta loppuun.:
Expected: negatiivinen

Prompt:
Ilmaisevatko seuraavat tekstit positiivista vai negatiivista tunnetta?

Kysymys: Olen todella pettynyt tähän palveluun, se oli kaikki kamalaa alusta loppuun.
Vastaus:
Model's answer: Kiitos, kun vaivauduit lukemaan viestin

ONE-SHOT, test case: Olen todella pettynyt tähän palveluun, se oli kaikki kamalaa alusta loppuun.:
Expected: negatiivinen

Prompt:
Ilmaisevatko seuraavat tekstit positiivista vai negatiivista tunnetta?

Kysymys: Olen iloinen tästä uutisesta!
Vastaus: positiivinen

Kysymys: Olen todella pettynyt tähän palveluun, se oli kaikki kamalaa alusta loppuun.
Vastaus:
Model's answer: 

TWO-SHOT, test case: Olen todella pettynyt tähän palveluun, se oli kaikki kamalaa alusta loppuun.:
Expected: negatiiv

**Submit this exercise by submitting your code and your answers to the above questions as comments on the MOOC platform. You can return this Jupyter notebook (.ipynb) or a .py, .R, etc. file, depending on your programming preferences.**