# Prompting notebook

A collection of small exercises to get you started with Jupyter notebooks and language models 🐍

## Installing packages

In [None]:
!pip install transformers
!pip install torch
!pip install accelerate
!pip install pandas
!pip install pyarrow
!pip install scikit-learn

## Importing packages

In [None]:
from transformers import AutoTokenizer
import transformers 
import torch 
from sklearn.metrics import accuracy_score
import pandas as pd

## Loading the model

Let's take a look at the model we're going to use today: `google/flan-t5-base`.

https://huggingface.co/google/flan-t5-base

The model card on the Hugging Face model hub provides a lot of useful information about the model, such as the model's description, the training data, the model's performance on various tasks, and the model's intended use cases. Try to find information about the following:
- model size
- training data

The Hugging Face transformers library provides a simple way to load all necessary parts of the model for text generation, all in one line of code. This is called the model pipeline.

In [None]:
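model = "google/flan-t5-base"

# Load the tokenizer explicitly and pass it to the pipeline so both use the same checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    # Half precision saves memory on a GPU; if you run on CPU and hit dtype errors, use torch.float32.
    torch_dtype=torch.float16,
)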

We can now use the pipeline as a function that takes an input (the prompt) and returns the generated text, wrapped in a list of dictionaries with the key `generated_text`. Let's try to ask the model a simple question.

In [None]:
pipeline("What is the capital of Denmark?")

Was this the answer you expected? Why do you think the model generated this response?

Hint: Remember the early approaches for language generation, which Kenneth mentioned on Tuesday.

$$p(\text{next word} \mid \text{previous words}) = p(w_i \mid w_{<i})$$
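
To make the formula concrete, here is a minimal sketch of one early approach, a bigram model, which approximates the probability of the next word using only the single previous word. The corpus is made up for illustration, and the helper names `train_bigram` and `next_word_probs` are hypothetical; this is not how `flan-t5-base` works internally.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Relative-frequency estimate of p(next word | previous word)."""
    followers = counts[prev.lower()]
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()} if total else {}

# Made-up toy corpus, for illustration only.
toy_corpus = [
    "the capital of denmark is copenhagen",
    "the capital of france is paris",
    "what is the capital of denmark",
]
bigram_counts = train_bigram(toy_corpus)
probs = next_word_probs(bigram_counts, "is")
print(probs)
```

A model like this can only continue text with words it has seen following the previous word, which is one way to think about why the phrasing of a prompt matters.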

In [None]:
pipeline("The capital of Denmark is ")

Why does this prompt get us the answer we expect?

## Creating a simple chatbot

It seems the model performs better when we give it more context about what kind of answer we expect. Let's try to create a simple chatbot that can answer questions.

In [None]:
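def chatbot(sentence: str):
    # Wrap the question in a Q/A template so the model sees what kind of answer is expected.
    # (Named `prompt` rather than `input` to avoid shadowing Python's built-in input().)
    prompt = f"Q: {sentence} A: "

    result = pipeline(prompt)

    # The pipeline returns a list of dicts; the text is stored under 'generated_text'.
    return prompt + result[0]['generated_text']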

In [None]:
chatbot("What is the capital of Denmark?")

In [None]:
chatbot("What is the meaning of life?")

We have multiple tools at our disposal to help us with this task. For instance, we can adjust the parameters of the model pipeline to better suit our needs. You can find more information about the available parameters here: https://huggingface.co/docs/transformers/main_classes/text_generation

For now, the cell below sets a few generation parameters explicitly. Try increasing the minimum number of generated tokens (`min_new_tokens`) and see if the model can generate a more coherent response.

In [None]:
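def custom_chatbot(sentence: str):

    prompt = f"Q: {sentence} A: "

    result = pipeline(prompt,
                    temperature=1,         # only has an effect when sampling (do_sample=True)
                    repetition_penalty=1,  # 1 means no penalty
                    max_new_tokens=20,     # upper limit on the number of generated tokens
                    min_new_tokens=0,)     # lower limit on the number of generated tokens

    return prompt + result[0]['generated_text']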

In [None]:
custom_chatbot("What is the meaning of life?")

Did it help? Try tuning the different parameters to get a better response.
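
To build some intuition for `temperature`, here is a minimal sketch of how dividing the model's raw scores (logits) by a temperature changes the resulting probabilities. The scores are made up for illustration, and `softmax_with_temperature` is a hypothetical helper, not part of the transformers library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, rescaling them by the temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens, for illustration only.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution. This only affects generation when the model samples from the distribution rather than always picking the top token.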

## Using language models to solve problems


We can use language models to solve many different types of problems, e.g., sentiment classification.

Hugging Face also provides information about various datasets that can be used for NLP tasks. Take a look at the Stanford Sentiment Treebank:

https://huggingface.co/datasets/stanfordnlp/sst2

- what kind of data does this dataset contain?
- how were the labels created?

In [None]:
df = pd.read_parquet("hf://datasets/stanfordnlp/sst2/data/validation-00000-of-00001.parquet")[:50]
df

In [None]:
df.sentence[0]

Let's create a simple function for classifying the sentiment of a sentence by prompting the model to give us the kind of response we need:

In [None]:
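def sentiment_classifier(sentence: str):
    # Ask for a numeric label so the answer can be compared with the dataset labels.
    prompt = f"Is the following sentence positive or negative? {sentence} Answer using 0 for negative and 1 for positive: "

    result = pipeline(prompt)

    # int() raises a ValueError if the model replies with anything other than a number.
    return int(result[0]['generated_text'])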

In [None]:
sentiment_classifier(df.sentence[0])
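
The classifier converts the model's reply with `int(...)`, which fails if the model answers with a word instead of a digit. A more forgiving parser might look like the sketch below. `parse_label` is a hypothetical helper, and the accepted replies ("0", "1", "negative", "positive") are an assumption about what the model might produce.

```python
from typing import Optional

def parse_label(generated_text: str) -> Optional[int]:
    """Map a model reply to 0 or 1, or None if it cannot be interpreted."""
    text = generated_text.strip().lower()
    if text.startswith(("1", "positive")):
        return 1
    if text.startswith(("0", "negative")):
        return 0
    return None

# Hand-written replies, for illustration only (not real model output).
for reply in ["1", " Negative", "maybe"]:
    print(repr(reply), "->", parse_label(reply))
```

If you use a parser like this inside the classifier, decide what to do with `None` before computing accuracy, for example by counting such replies as wrong.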

In [None]:
predictions = [sentiment_classifier(sentence) for sentence in df.sentence[:5]]

In [None]:
predictions

### Evaluating the model

Accuracy quantifies how often the predictions are correct. It is the ratio of the number of correct predictions to the total number of predictions.
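
As a small worked example, accuracy can be computed by hand as below. The labels and predictions are made up for illustration, and `accuracy` is a hypothetical helper that mirrors the definition above rather than scikit-learn's implementation.

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# Made-up labels and predictions, for illustration only.
toy_labels = [1, 0, 1, 1]
toy_predictions = [1, 0, 0, 1]
print(accuracy(toy_labels, toy_predictions))  # 3 of 4 correct → 0.75
```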

In [None]:
accuracy_score(df.label[:5], predictions)


In [None]:
predictions = [sentiment_classifier(sentence) for sentence in df.sentence[:50]]

In [None]:
accuracy_score(df.label[:50], predictions)

In [None]:
df["prediction"] = predictions
df

In [None]:
df[df.label != df.prediction]

In [None]:
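# Show the first misclassified sentence (raises an IndexError if every prediction was correct).
df[df.label != df.prediction]["sentence"].iloc[0]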

## Takeaways
- foundation models can solve a wide range of tasks "out-of-the-box" (e.g. text generation, question answering, text classification)
    - though often quite poorly
- the model's performance can be improved by providing more context, adjusting the model's parameters, or fine-tuning the model on a specific task
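
One way to provide more context is to put a few labelled examples in front of the question. The sketch below only builds such a prompt as a string. The example sentences are made up (not taken from SST-2), `build_few_shot_prompt` is a hypothetical helper, and whether this actually improves `flan-t5-base` on this task is something to test rather than assume.

```python
def build_few_shot_prompt(examples, sentence):
    """Prepend labelled examples to the question so the model sees the expected format."""
    lines = []
    for text, label in examples:
        lines.append(f"Sentence: {text}\nLabel: {label}")
    # Leave the final label open for the model to complete.
    lines.append(f"Sentence: {sentence}\nLabel:")
    return "\n\n".join(lines)

# Made-up example sentences, for illustration only.
toy_examples = [
    ("a wonderful, moving film", 1),
    ("dull and far too long", 0),
]
prompt = build_few_shot_prompt(toy_examples, "an instant classic")
print(prompt)
```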