# Prompting

This week we will be diving deeper into prompting and prompt engineering! 🧑‍🔧

## Install packages

In [None]:
!pip install transformers
!pip install torch
!pip install accelerate
!pip install pandas
!pip install pyarrow
!pip install scikit-learn

In [1]:
from transformers import AutoTokenizer
import transformers 
import torch 



## Text completion

In the first class, we loaded a pretrained model from Hugging Face's transformers library. Load the pipeline from the first notebook and use it to generate text from the prompt "Once upon a time, there was a ".

In [33]:
# DEL
model = "google/flan-t5-base"

# pipeline_t5 = 

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline_t5 = transformers.pipeline(
    "text2text-generation",
    model=model,
    torch_dtype=torch.float16,
    max_length=250,
)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [34]:
# DEL
pipeline_t5("Once upon a time, there was a ")

[{'generated_text': 'lion'}]

- Try to figure out what kind of model we are using: is it an encoder-decoder or a decoder-only model?
- Try to switch it out for another architecture from the [huggingface catalogue](https://huggingface.co/models) and see how the results change. Keep in mind that the size of the model can affect the time it takes to generate text (I would suggest something along the lines of [this](https://huggingface.co/openai-community/gpt2)).

HINT: you will also need to change the pipeline task ("text2text-generation"). You can find the list of available tasks [here](https://huggingface.co/transformers/main_classes/pipelines.html).

In [45]:
# DEL
model = "openai-community/gpt2"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline_gpt = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    max_length=250,
)



Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [46]:
# DEL
pipeline_gpt("Once upon a time, there was a ")

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Once upon a time, there was a ichthyosis like this, which can kill a human being for a short amount of time.[53]\n\nA new class of disease is known as "superior disease". While it is not a very real entity as the current outbreak has shown, it seems that the outbreak is more common in the past few years.[54] It is also believed that some of the cases that have occurred on the continent have been caused by individuals rather than the world itself which has led them to spread the disease. Some of the most infamous superimeds are the "Luminaria", or "Luminar Killer", in which a young lady, who had been bitten by a superlative creature was able to revive herself after being bitten by a larger creature which seemed to be eating the woman\'s face.[55] An infected lady was so infected that her tongue was torn and her nose was pulled to the left and right at once before it healed. During the event, in an attempt at saving her mouth,[56] the victim could be seen walking in 

- Try tweaking the prompt, model, or parameters (see the notebook from class 1) to get a meaningful response.
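
One way to experiment with the parameters is to keep them in a single dictionary and override individual values per call. The sketch below is only an illustration: `generate_with` and `fake_pipeline` are hypothetical names introduced here, and the values are arbitrary starting points rather than recommended settings. The keyword names (`max_new_tokens`, `do_sample`, `temperature`, `top_p`, `repetition_penalty`) are standard generation arguments in transformers, which a text-generation pipeline forwards to the model when they are passed at call time.

```python
# Generation settings to experiment with; the values are arbitrary starting points.
SAMPLING_KWARGS = {
    "max_new_tokens": 40,       # limit how much text is generated
    "do_sample": True,          # sample instead of always picking the most likely token
    "temperature": 0.7,         # lower values make the output more conservative
    "top_p": 0.9,               # nucleus sampling threshold
    "repetition_penalty": 1.2,  # discourage repeated phrases
}


def generate_with(pipe, prompt, **overrides):
    """Call a pipeline-like callable with the default settings plus any overrides."""
    kwargs = {**SAMPLING_KWARGS, **overrides}
    outputs = pipe(prompt, **kwargs)
    return outputs[0]["generated_text"]


def fake_pipeline(prompt, **kwargs):
    """Hypothetical stand-in so the sketch runs without downloading a model."""
    summary = f"[{kwargs['max_new_tokens']} new tokens, T={kwargs['temperature']}]"
    return [{"generated_text": prompt + summary}]


print(generate_with(fake_pipeline, "Once upon a time, there was a ", temperature=0.3))
```

To try it on a real model, pass `pipeline_gpt` instead of `fake_pipeline` and compare the outputs for a few different temperatures.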

## Summarisation

Another text generation task is summarisation. Compared to free-form text generation, however, summarisation is much more tightly constrained by the input text. I have added an article to summarise, but feel free to change it to something else (perhaps a paragraph on a topic you know well, so that you are an expert at evaluating the quality of the summary 🤓).

In [11]:
text = """summarize: Forest conservation and restoration could make a major contribution to tackling the climate crisis as long as greenhouse gas emissions are slashed, according to a study.

By allowing existing trees to grow old in healthy ecosystems and restoring degraded areas, scientists say 226 gigatonnes of carbon could be sequestered, equivalent to nearly 50 years of US emissions for 2022. But they caution that mass monoculture tree-planting and offsetting will not help forests realise their potential.

Humans have cleared about half of Earth's forests and continue to destroy places such as the Amazon rainforest and the Congo basin that play crucial roles in regulating the planet's atmosphere.
"""

- Use your model configurations from the previous task to create a summary. Are the results comparable to the free-form text generation task? Why or why not?

In [37]:
# DEL
pipeline_gpt(text)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': 'summarize: Forest conservation and restoration could make a major contribution to tackling the climate crisis as long as greenhouse gas emissions are slashed, according to a study.\n\nBy allowing existing trees to grow old in healthy ecosystems and restoring degraded areas, scientists say 226 gigatonnes of carbon could be sequestered, equivalent to nearly 50 years of US emissions for 2022. But they caution that mass monoculture tree-planting and offsetting will not help forests realise their potential.\n\nHumans have cleared about half of Earth\'s forests and continue to destroy places such as the Amazon rainforest and the Congo basin that play crucial roles in regulating the planet\'s atmosphere.\n\nThe study, published in the journal Nature Climate Change, found that the carbon sequestration could be used to reduce greenhouse gas emissions by about 50 per cent by 2030.\n\n"The carbon sequestration could be used to reduce greenhouse gas emissions by about 50 per c

In [38]:
# DEL
pipeline_t5(text)

[{'generated_text': "Forests could be able to save up to 80% of the world's carbon dioxide emissions by allowing them to grow old and restore degraded areas, according to a new study."}]

## Translation

In [19]:
prompt = "English: Sometimes, I've believed as many as six impossible things before breakfast. Danish: "

- Try translating text to another language using your pipelines. Are the results similar to those of summarisation? Why or why not?
- Try structuring the prompt in different ways to see if you can improve the translation. For instance, try zero-shot or few-shot generalisation, as discussed in Tuesday's lecture.
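
A few-shot prompt for this task could be assembled as in the sketch below. The helper `build_few_shot_prompt` and the two example pairs are assumptions made for illustration (they are not part of the course material); the "English: ... Danish: ..." layout simply mirrors the prompt above. Passing an empty list of examples gives the zero-shot version.

```python
def build_few_shot_prompt(examples, query, source="English", target="Danish"):
    """Format (source, target) example pairs followed by the sentence to translate."""
    lines = []
    for src_text, tgt_text in examples:
        lines.append(f"{source}: {src_text}")
        lines.append(f"{target}: {tgt_text}")
    # End with an open target line so the model continues with the translation.
    lines.append(f"{source}: {query}")
    lines.append(f"{target}:")
    return "\n".join(lines)


# Illustrative example pairs; swap in your own.
examples = [
    ("Good morning.", "Godmorgen."),
    ("Thank you very much.", "Mange tak."),
]
query = "Sometimes, I've believed as many as six impossible things before breakfast."

zero_shot_prompt = build_few_shot_prompt([], query)
few_shot_prompt = build_few_shot_prompt(examples, query)
print(few_shot_prompt)
```

Both strings can be passed straight to `pipeline_gpt` or `pipeline_t5` to compare how much the examples help.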

In [20]:
# DEL
pipeline_gpt(prompt)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'English: Sometimes, I\'ve believed as many as six impossible things before breakfast. Danish: \xa0The Danish "Krona" is not the traditional meaning. Danish pronunciation: \xa0 \xa0\xa0 \xa0(danteenen en nie krona) Danish Pronunciation: \xa0dansden Krona \xa0 \xa0 Dutch: I\'m on your mind you don\'t get any, but where is the Krona? English: Dutch: I\'m on your mind. Danish: \xa0Dutch: I\'m doing something different Dutch: Wobblies! Dutch: The first one where we just said "Krona", it said we could also spell "danish". Finnish: O ikki jane. \xa0Finnish Pronunciation: O ikki kon. Etymologies: Finns say the word "ponen". Finnish:\xa0 ʳekke \xa0kon. Fr. ikelym. Finnish "finn" or "frøk" ( ɛksen) * * * (ɔksi) Finnish Senses: Finnish: Kneel, Lkneel (or the English "k'}]

In [21]:
# DEL
pipeline_t5(prompt)

[{'generated_text': "Sometimes, I've believed as many as six impossible things before breakfast."}]

## Reasoning

Reasoning is hard for models to learn because it requires understanding how the different parts of a prompt relate to one another. With careful prompting, however, we can guide the model to work through the problem in a more structured way.

In [25]:
reasoning_prompt_easy = "There are 5 groups of students in the class. Each group has 4 students. How many students are there in the class?"

In [26]:
# DEL
pipeline_gpt(reasoning_prompt_easy)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'There are 5 groups of students in the class. Each group has 4 students. How many students are there in the class? Averages (with student average numbers) are calculated for each group. Each group averages for 6.5 hours per week.\n\nHow many students are there in the classroom? Averages (with student average numbers) are calculated for each group. Each group averages for 6.5 hours per week. Student averages (%) is the number students who spent the most time in the classroom and the average number students who were enrolled in high school. Note that when both students are enrolled in high school (as in the case of students who completed high school, at most) students who spent the most time in the classroom, or who spent the most time in their home, were more likely to be enrolled in middle school than those who spent the least amount of time in the classroom or those who spent less time in the room.\n\nis the number students who spent the most time in the classroom 

In [27]:
# DEL 
pipeline_t5(reasoning_prompt_easy)

[{'generated_text': 'There are 5 groups of students x 4 students / group = 20 students in the class. There are 20 students x 5 groups = 100 students in the class. There are 100 students x 20 groups = 1000 students in the class. There are 1000 students - 1000 students = 1000 students in the class.'}]

In [None]:
reasoning_prompt_hard = "I baked 15 muffins. I ate 2 muffins and gave 5 muffins to a neighbor. My partner then bought 6 more muffins and ate 2. How many muffins do we now have?"

- Get the models to output the correct answer, by changing the prompt.
- Try chain-of-thought prompting, as introduced in the lecture. Try it with both zero-shot and few-shot generalisation.
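
The two chain-of-thought variants could look roughly like the sketch below. The "Q:/A:" layout, the cue "Let's think step by step." (a commonly used zero-shot cue), and the wording of the worked example are all assumptions chosen for illustration, so treat them as a starting point rather than the intended solution. The last lines compute a reference answer to check the model output against, assuming all muffins end up in one shared pile.

```python
reasoning_prompt_hard = (
    "I baked 15 muffins. I ate 2 muffins and gave 5 muffins to a neighbor. "
    "My partner then bought 6 more muffins and ate 2. How many muffins do we now have?"
)

# Zero-shot chain of thought: append a cue that invites step-by-step reasoning.
zero_shot_cot = "Q: " + reasoning_prompt_hard + "\nA: Let's think step by step."

# Few-shot chain of thought: prepend a worked example with the reasoning spelled out.
worked_example = (
    "Q: There are 5 groups of students in the class. Each group has 4 students. "
    "How many students are there in the class?\n"
    "A: There are 5 groups with 4 students each. 5 * 4 = 20. The answer is 20.\n\n"
)
few_shot_cot = worked_example + "Q: " + reasoning_prompt_hard + "\nA:"

# Reference answer: baked, minus eaten, minus given away, plus bought, minus eaten.
expected_answer = 15 - 2 - 5 + 6 - 2
print(expected_answer)  # 12
```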

## Prompting gone wrong

In [50]:
thug_prompt = "How many helicopters can a human eat in one sitting? Reply as a thug."

In [51]:
pipeline_gpt(thug_prompt)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'How many helicopters can a human eat in one sitting? Reply as a thug.\n\n\n"How many helicopters can a human eat in one sitting?"\n\n@Kris: Well there are many things you can\'t cook, so you can\'t cook anything. So what about the human?\n\n\nIf you\'re a human: "If you have no human food to add to your food then you can\'t have a life. And in fact, to the extent that you can have a life, then you can\'t have many life forms. A human can eat as many human food as he wants, on either one of you, and the only way you get a human being to eat the same foods as you can eat is if you are able to do it."\n\nThis, you say, only takes 4 weeks to grow.\n\n\n"Why would it take 4 years to cook humans? Why would humans do what they do?\n\nSo let\'s first get to the human.\n\n1. A human can only choose one meal a day. 2. A human can only choose one thing that can change his life. 3. A human can only have one person to do his job, and it will start the journey'}]

In [52]:
pipeline_t5(thug_prompt)

[{'generated_text': 'a helicopter can eat a human in one sitting'}]

Models don't always respond the way we expect. Sometimes they say things that are offensive or incorrect; at other times we might actually want such a response but can't get the model to produce it.

- Can you get any of the models to say something they shouldn't? Try to get the model to say something offensive or incorrect.
- Why do you think some models are more prone to this than others? What can we do to prevent this from happening?

## Instruct-tuned models

- Try to load in an instruct-tuned model and see how it fares on some of these tasks.
- Do you expect it to perform better or worse than other pretrained models? Why/why not?
- What are some of the limitations of instruction tuning?


## Bonus task

Create a chatbot function that takes in a prompt and generates a response. Make sure the chatbot can handle multiple turns of conversation (i.e., it can remember previous responses).
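
One possible shape for such a chatbot is sketched below; it is a starting point, not the reference solution. The class `ChatBot`, the "User:/Bot:" tags, and the stand-in `echo_model` are hypothetical names introduced here. The idea is simply to store every turn and to prepend the full history to each new prompt, so that the model sees the earlier conversation.

```python
class ChatBot:
    """Keeps the conversation history and feeds it back into every new prompt."""

    def __init__(self, generate_fn, user_tag="User", bot_tag="Bot"):
        self.generate_fn = generate_fn  # any callable mapping a prompt string to a reply string
        self.user_tag = user_tag
        self.bot_tag = bot_tag
        self.history = []  # list of (speaker, text) tuples

    def build_prompt(self, message):
        """Concatenate all previous turns, the new message, and an open bot turn."""
        turns = [f"{speaker}: {text}" for speaker, text in self.history]
        turns.append(f"{self.user_tag}: {message}")
        turns.append(f"{self.bot_tag}:")
        return "\n".join(turns)

    def chat(self, message):
        prompt = self.build_prompt(message)
        reply = self.generate_fn(prompt).strip()
        # Remember both sides of the exchange for the next turn.
        self.history.append((self.user_tag, message))
        self.history.append((self.bot_tag, reply))
        return reply


def echo_model(prompt):
    """Hypothetical stand-in for a real model: reports how many lines of context it saw."""
    return f"I saw {len(prompt.splitlines())} lines of context."


bot = ChatBot(echo_model)
print(bot.chat("Hello!"))                # I saw 2 lines of context.
print(bot.chat("What did I just say?"))  # I saw 4 lines of context.
```

To plug in a real model, wrap a pipeline in a small function, for example `lambda p: pipeline_t5(p)[0]["generated_text"]`. With a text-generation pipeline such as `pipeline_gpt`, the output normally includes the prompt itself, so you would likely also pass `return_full_text=False`. Keep in mind that the history grows with every turn and will eventually exceed the model's context length, so a real chatbot needs some way of truncating it.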