# Prompting notebook

A collection of small exercises to get you started with Jupyter notebooks and language models 🐍

## Installing packages

In [1]:
!pip install transformers
!pip install torch
!pip install accelerate
!pip install pandas
!pip install pyarrow
!pip install scikit-learn


## Importing packages

In [2]:
from transformers import AutoTokenizer
import transformers 
import torch 
from sklearn.metrics import accuracy_score
import pandas as pd

## Loading the model

Let's take a look at the model we're going to use today: `google/flan-t5-base`.

https://huggingface.co/google/flan-t5-base

The model card on the Hugging Face model hub provides a lot of useful information, such as a description of the model, its training data, its performance on various tasks, and its intended use cases. Try to find information about the following:
- model size
- training data

The Hugging Face transformers library provides a simple way to load all necessary parts of the model for text generation, all in one line of code. This is called the model pipeline.

In [3]:
model = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text2text-generation",
    model=model,
    torch_dtype=torch.float16,
)


We can now use the pipeline as a function that takes an input (the prompt) and returns an output (the generated text). Let's try to ask the model a simple question.

In [4]:
pipeline("What is the capital of Denmark?")



[{'generated_text': 'st helsingborg'}]

Was this the answer you expected? Why do you think the model generated this response?

Hint: Remember the early approaches for language generation, which Kenneth mentioned on Tuesday.

$$p(\text{next word} \mid \text{previous words}) = p(w_i \mid w_{<i})$$

In [5]:
pipeline("The capital of Denmark is ")

[{'generated_text': 'Copenhagen'}]

Why does this prompt get us the answer we expect?

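The model generates the continuation it considers most likely given the prompt. 'The capital of Denmark is' is the start of a declarative sentence, and 'Copenhagen' is by far its most likely next word. The question form is structured differently, so its most likely continuation was different ('st helsingborg').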
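To make the hint about p(w_i | w_<i) concrete, here is a minimal sketch of count-based next-word prediction. It is not how `google/flan-t5-base` works internally (that is a neural network trained on far more data). The tiny corpus, the two-word context window, and the function names `count_continuations` and `next_word_probs` are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only (not real training data).
CORPUS = [
    "the capital of denmark is copenhagen",
    "the capital of sweden is stockholm",
    "the capital of denmark is copenhagen",
    "copenhagen is the capital of denmark",
]

def count_continuations(sentences, context_size=2):
    """Count which word follows each context of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts

def next_word_probs(counts, context):
    """Estimate p(w_i | context) as relative frequencies."""
    continuations = counts.get(tuple(context), Counter())
    total = sum(continuations.values())
    return {word: n / total for word, n in continuations.items()}

counts = count_continuations(CORPUS)
probs = next_word_probs(counts, ["capital", "of"])
print(probs)                      # probability of each observed next word
print(max(probs, key=probs.get))  # greedy choice: the most likely next word
```

Changing the context changes the distribution, which is the same effect we saw when rephrasing the prompt: the model continues whatever text it is given.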



## Creating a simple chatbot

It seems the model performs better when we give it more context about what kind of answer we expect. Let's try to create a simple chatbot that can answer questions.

In [6]:
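def chatbot(sentence: str) -> str:
    """Wrap the question in a Q/A template and return prompt plus answer."""
    # Named `prompt` rather than `input` to avoid shadowing the built-in.
    prompt = f"Q: {sentence} A: "

    result = pipeline(prompt)

    return prompt + result[0]['generated_text']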

In [7]:
chatbot("What is the capital of Denmark?")

'Q: What is the capital of Denmark? A: Copenhagen'

In [8]:
chatbot("What is the meaning of life?")

'Q: What is the meaning of life? A: (D).'

The answer `(D).` is not very helpful. We have multiple tools at our disposal to improve it. For instance, we can adjust the generation parameters of the model pipeline to better suit our needs. You can find more information about the available parameters here: https://huggingface.co/docs/transformers/main_classes/text_generation
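As an illustration of what one such parameter does, the sketch below shows the usual idea behind `temperature`: the model's raw scores are divided by the temperature before being turned into probabilities. The scores are made-up numbers and `softmax_with_temperature` is a hypothetical helper, not part of the transformers library. Note also that in Hugging Face generation, temperature only has an effect when sampling is enabled (`do_sample=True`).

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, rescaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (made-up numbers).
logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures spread it more evenly across the candidates.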

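For now, let's pass a few generation parameters explicitly, including the minimum number of generated tokens (`min_new_tokens`), and see if the model can generate a more coherent response. The values below are only a starting point; raising `min_new_tokens` forces longer answers.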

In [11]:
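def custom_chatbot(sentence: str) -> str:
    """Like chatbot(), but with explicit generation parameters to tune."""
    prompt = f"Q: {sentence} A: "

    result = pipeline(
        prompt,
        temperature=1,         # only affects sampling (do_sample=True)
        repetition_penalty=1,  # 1 means no penalty
        max_new_tokens=20,     # upper bound on generated tokens
        min_new_tokens=1,      # lower bound on generated tokens
    )

    return prompt + result[0]['generated_text']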

In [12]:
custom_chatbot("What is the meaning of life?")

'Q: What is the meaning of life? A: (D).'

Did it help? Try tuning the different parameters to get a better response.
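One way to organise the tuning is to try every combination from a small grid of values. The sketch below assumes a hypothetical helper `sweep` and uses a stand-in function `fake_generate` so that it runs without downloading a model. In the notebook you would pass `pipeline` instead, since it accepts the same keyword arguments and returns the same list-of-dicts structure seen above. The grid values are arbitrary examples.

```python
from itertools import product

def sweep(generate, prompt, grid):
    """Call `generate` once per parameter combination and collect the texts."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        output = generate(prompt, **params)
        results.append((params, output[0]["generated_text"]))
    return results

def fake_generate(prompt, **params):
    """Stand-in for the notebook's `pipeline`; it only echoes its settings."""
    text = f"min={params['min_new_tokens']} rp={params['repetition_penalty']}"
    return [{"generated_text": text}]

# Arbitrary example values; adjust them to whatever you want to compare.
grid = {
    "min_new_tokens": [1, 10],
    "repetition_penalty": [1.0, 1.5],
}

for params, text in sweep(fake_generate, "Q: What is the meaning of life? A: ", grid):
    print(params, "->", text)
```

Printing the settings next to each output makes it easier to see which parameter actually changed the response.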

## Using language models to solve problems


We can use language models to solve many different types of problems, e.g., sentiment classification.

Hugging Face also provides information about various datasets that can be used for NLP tasks. Take a look at the Stanford Sentiment Treebank:

https://huggingface.co/datasets/stanfordnlp/sst2

- what kind of data does this dataset contain?
- how were the labels created?

In [13]:
df = pd.read_parquet("hf://datasets/stanfordnlp/sst2/data/validation-00000-of-00001.parquet")[:50]
df

|    | idx | sentence | label |
|---:|----:|:---------|------:|
| 0 | 0 | it 's a charming and often affecting journey . | 1 |
| 1 | 1 | unflinchingly bleak and desperate | 0 |
| 2 | 2 | allows us to hope that nolan is poised to emba... | 1 |
| 3 | 3 | the acting , costumes , music , cinematography... | 1 |
| 4 | 4 | it 's slow -- very , very slow . | 0 |
| 5 | 5 | although laced with humor and a few fanciful t... | 1 |
| 6 | 6 | a sometimes tedious film . | 0 |
| 7 | 7 | or doing last year 's taxes with your ex-wife . | 0 |
| 8 | 8 | you do n't have to know about music to appreci... | 1 |
| 9 | 9 | in exactly 89 minutes , most of which passed a... | 0 |


In [14]:
df.sentence[0]

"it 's a charming and often affecting journey . "

Let's create a simple function for classifying the sentiment of a sentence by prompting the model to give us the kind of response we need:

In [15]:
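def sentiment_classifier(sentence: str) -> int:
    """Prompt the model for a 0/1 sentiment label and return it as an int."""
    prompt = f"Is the following sentence positive or negative? {sentence}? Answer using 0 for negative and 1 for positive: "

    result = pipeline(prompt)

    # int() raises ValueError if the model answers with anything but a number.
    return int(result[0]['generated_text'])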

In [16]:
sentiment_classifier(df.sentence[0])



1

In [17]:
predictions = [sentiment_classifier(sentence) for sentence in df.sentence[:5]]

In [18]:
predictions

[1, 0, 1, 1, 0]
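The classifier converts the generated text with `int(...)`, which raises a `ValueError` whenever the model replies with something other than a number (recall the `(D).` answer from the chatbot). A more forgiving parser could look like the sketch below. `parse_label` is a hypothetical helper, and the fallback on the words "positive" and "negative" is an assumption about how the model might answer.

```python
import re

def parse_label(generated_text):
    """Return 0 or 1 if the text is a clear answer, else None."""
    # Accept a lone 0 or 1, optionally followed by a full stop.
    match = re.fullmatch(r"\s*([01])\s*\.?\s*", generated_text)
    if match:
        return int(match.group(1))
    # Fall back to the words used in the prompt, in case the model answers in text.
    lowered = generated_text.strip().lower()
    if lowered.startswith("positive"):
        return 1
    if lowered.startswith("negative"):
        return 0
    return None

for text in ["1", " 0 ", "Positive", "(D)."]:
    print(repr(text), "->", parse_label(text))
```

Returning `None` instead of raising lets you count unparseable answers separately rather than stopping the whole evaluation loop.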

### Evaluating the model

Accuracy quantifies how often the predictions are correct. It is the ratio of the number of correct predictions to the total number of predictions.

In [19]:
accuracy_score(df.label[:5], predictions)


1.0
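The ratio is simple enough to compute by hand, which is a useful sanity check on `accuracy_score`. The sketch below uses a hypothetical helper `accuracy` with the first five labels and predictions shown above, plus one made-up example containing a mistake.

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# The first five labels and predictions shown above.
first_labels = [1, 0, 1, 1, 0]
first_predictions = [1, 0, 1, 1, 0]
print(accuracy(first_labels, first_predictions))  # 5 correct out of 5

# A made-up case with one mistake, to show a value below 1.
print(accuracy([1, 0, 1, 1, 0], [1, 0, 1, 1, 1]))  # 4 correct out of 5
```

With only five examples, a single mistake moves the score by 0.2, so an accuracy measured on such a small sample should be read with caution.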

In [20]:
predictions = [sentiment_classifier(sentence) for sentence in df.sentence[:50]]



In [21]:
accuracy_score(df.label[:50], predictions)

0.98

In [22]:
df["prediction"] = predictions
df

|    | idx | sentence | label | prediction |
|---:|----:|:---------|------:|-----------:|
| 0 | 0 | it 's a charming and often affecting journey . | 1 | 1 |
| 1 | 1 | unflinchingly bleak and desperate | 0 | 0 |
| 2 | 2 | allows us to hope that nolan is poised to emba... | 1 | 1 |
| 3 | 3 | the acting , costumes , music , cinematography... | 1 | 1 |
| 4 | 4 | it 's slow -- very , very slow . | 0 | 0 |
| 5 | 5 | although laced with humor and a few fanciful t... | 1 | 1 |
| 6 | 6 | a sometimes tedious film . | 0 | 0 |
| 7 | 7 | or doing last year 's taxes with your ex-wife . | 0 | 0 |
| 8 | 8 | you do n't have to know about music to appreci... | 1 | 1 |
| 9 | 9 | in exactly 89 minutes , most of which passed a... | 0 | 0 |


In [23]:
df[df.label != df.prediction]

|    | idx | sentence | label | prediction |
|---:|----:|:---------|------:|-----------:|
| 20 | 20 | pumpkin takes an admirable look at the hypocri... | 0 | 1 |


In [24]:
df[df.label != df.prediction]["sentence"][20]

'pumpkin takes an admirable look at the hypocrisy of political correctness , but it does so with such an uneven tone that you never know when humor ends and tragedy begins . '
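The misclassified sentence has label 0 and prediction 1, so it is a false positive. Counting each kind of outcome shows whether a model's errors lean in one direction. The sketch below uses a hypothetical helper `confusion_counts` and made-up example lists. In the notebook you could pass `df.label.tolist()` and `df.prediction.tolist()` instead.

```python
from collections import Counter

def confusion_counts(labels, predictions):
    """Count (label, prediction) pairs for a binary 0/1 task."""
    pairs = Counter(zip(labels, predictions))
    return {
        "true_negative": pairs[(0, 0)],
        "false_positive": pairs[(0, 1)],
        "false_negative": pairs[(1, 0)],
        "true_positive": pairs[(1, 1)],
    }

# Made-up labels and predictions for illustration (not the notebook's data).
example_labels = [1, 0, 1, 1, 0, 0]
example_predictions = [1, 0, 1, 1, 0, 1]
print(confusion_counts(example_labels, example_predictions))
```

The four counts always add up to the number of examples, and accuracy is the sum of the two "true" counts divided by that total.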

## Takeaways
- foundation models can solve a wide range of tasks "out-of-the-box" (e.g. text generation, question answering, text classification)
    - though often quite poorly
- the model's performance can be improved by providing more context in the prompt, adjusting the generation parameters, or fine-tuning the model on a specific task