<a href="https://colab.research.google.com/github/simulate111/Introduction-to-Human-Language-Technology/blob/main/Exercise%20task%2013.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text generation example

This is a brief example of how to run text generation with a causal language model and `pipeline`.

Install the [transformers](https://huggingface.co/docs/transformers/index) Python package, which is used to load the model and tokenizer and to run generation.

In [1]:
!pip install --quiet transformers

Import the `AutoTokenizer` and `AutoModelForCausalLM` classes and the `pipeline` function. The two classes load tokenizers and generative models from the [Hugging Face repository](https://huggingface.co/models), and `pipeline` wraps a tokenizer and a model for convenience.

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Load a generative model and its tokenizer. You can substitute any other generative model name here (e.g. one of the [TurkuNLP GPT-3 models](https://huggingface.co/models?sort=downloads&search=turkunlp%2Fgpt3)), but note that Colab may have issues running larger models.

In [3]:
MODEL_NAME = 'gpt2-large'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Instantiate a text generation pipeline using the tokenizer and model.

In [4]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=model.device
)

We can now call the pipeline with a text prompt; it will take care of tokenizing, encoding, generation, and decoding:

In [5]:
output = pipe('Terve, miten menee?', max_new_tokens=25)

print(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Terve, miten menee? Terve hat, hat man einen Fürsten gegeben können sie aufgebaut'}]


Print only the generated text:

In [6]:
print(output[0]['generated_text'])

Terve, miten menee? Terve hat, hat man einen Fürsten gegeben können sie aufgebaut


We can also call the pipeline with any arguments that the model's `generate` method supports. For details on text generation using `transformers`, see e.g. [this tutorial](https://huggingface.co/blog/how-to-generate).

Example with sampling and a high `temperature` parameter to generate more chaotic output:

In [7]:
output = pipe(
    'Terve, miten menee?',
    do_sample=True,
    temperature=10.0,
    max_new_tokens=25
)

print(output[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Terve, miten menee? Sachgebrift dies eingesind einig- löblichendlicheten i nymm
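
The effect of `temperature` can be illustrated without a model. When sampling, the next-token scores (logits) are divided by the temperature before the softmax, so a large value flattens the distribution and makes unlikely tokens almost as probable as likely ones. The sketch below is a minimal pure-Python illustration; the function name `softmax_with_temperature` and the four logit values are made up for the example and do not come from `gpt2-large`.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens (illustrative values only).
logits = [4.0, 2.0, 1.0, 0.0]

for t in (0.5, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

With `temperature=10.0`, as in the cell above, the four probabilities end up close to each other, which is consistent with the near-random text the model produced.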


Person name recognition:

In [8]:
#zero_shot
pipe("List the person names occurring in the following texts., John and Mary went to the market.", max_length=40, num_return_sequences=3)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'List the person names occurring in the following texts., John and Mary went to the market.John and Mary went to Bethany.John and Mary went to Galilee.John and Mary went to Cap'},
 {'generated_text': 'List the person names occurring in the following texts., John and Mary went to the market. Mary gave her first to him (). Mary was forty, John was thirty. They had three sons. Joseph was'},
 {'generated_text': 'List the person names occurring in the following texts., John and Mary went to the market. Peter and those who followed followed with them., Jesus took a loaf of unleavened bread., The disciples took'}]

In [9]:
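# one_shot (as labeled). Note: a list is treated as a batch of independent prompts,
# so neither text serves as a solved example for the other.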
pipe(['John and Mary went to the market.', "Sophia and Ethan are siblings."], max_length=40, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': 'John and Mary went to the market. They asked for a cask of rum as usual, but the seller said that he had no more in stock. "We\'ll have the rum before you,"'},
  {'generated_text': 'John and Mary went to the market.\n\n15 Then the other Mary said to Elizabeth, "You have nothing to eat."\n\n16 But Elizabeth answered her, "God has given you food'},
  {'generated_text': 'John and Mary went to the market.\n\n16 And Jesus went and told some of the teachers of the Pharisees and Sadducees in the market place which were teaching in the synagogue'}],
 [{'generated_text': 'Sophia and Ethan are siblings.\n\nSophia and Ethan were adopted and raised in the U.S. after their mother was incarcerated for committing the "deadly sins" of incest'},
  {'generated_text': 'Sophia and Ethan are siblings. They have lived alone in the house for years. Sophia is a student at Caltech, and wants to study biology. Ethan is a student at the college to'},
  {'generated_text': "Sophia and Ethan a

In [10]:
# two_shot (as labeled). Note: the three prompts in the list are still run independently.
pipe(['John and Mary went to the market.', "Sophia and Ethan are siblings.", "Tom and Sarah are best friends."], max_length=40, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': 'John and Mary went to the market. After making some small talk at one of the stalls, he asked me, "Where are the sheep? Jesus is coming soon." I couldn\'t think what to'},
  {'generated_text': 'John and Mary went to the market.\n\nThey were there for two hours. One of them called me to come back. We went to the church and ate the Lenten meal. In the'},
  {'generated_text': 'John and Mary went to the market.\n\nAt the counter, an old man told John, "It is bad luck if you fail. Just wait half an hour, and it will be over'}],
 [{'generated_text': 'Sophia and Ethan are siblings. They are the only ones in the original series who are not clones.\n\nOscar and Nicole in the original series also seemed to have a lot in common'},
  {'generated_text': "Sophia and Ethan are siblings. He, also a scientist, is in fact related to Mia's boyfriend from the future, whose name is Miles O'Brien, although he can't get her"},
  {'generated_text': "Sophia and Ethan are siblings. Sophia 
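
In the cells above, each text in the list is continued on its own, so the model never sees a solved example. In few-shot prompting, the solved examples and the new input are usually placed together in a single prompt string. The sketch below shows one possible way to build such a prompt; the helper name `build_few_shot_prompt` and the `Text:` / `Names:` layout are assumptions made for illustration, not a format required by `transformers`.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, solved examples, and an unsolved query into one prompt."""
    lines = [instruction, ""]
    for text, answer in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Names: {answer}")
        lines.append("")
    # The query ends with an open "Names:" field for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Names:")
    return "\n".join(lines)

# One solved example (one-shot); add more tuples to the list for two-shot, etc.
prompt = build_few_shot_prompt(
    "List the person names occurring in the following texts.",
    [("John and Mary went to the market.", "John, Mary")],
    "Sophia and Ethan are siblings.",
)
print(prompt)
```

The resulting string can be passed to the pipeline as a single prompt, e.g. `pipe(prompt, max_new_tokens=10)`. Whether `gpt2-large` then answers correctly is not guaranteed.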

In [11]:
#zero_shot
pipe("This is a first grade math exam. calculate this addition: 1 '+' 2 =", max_length=40, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 =' 3 '+' 4 '+' 5 '+' 6 ='7 '+' 8 '"},
 {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3, multiply by 2 and then 4. It's a second grade algebra exam.\n\nAlgebra is"},
 {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 =' 3 and so on, all the way round, one line at a time. (We'll have an"}]
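
To judge outputs like these without reading each one, the text after the prompt can be scanned for the first number. The sketch below applies this to the three outputs printed above; the helper name `first_number_after_prompt` is hypothetical, and the code assumes that each `generated_text` starts with the prompt, as it does in the outputs shown.

```python
import re

def first_number_after_prompt(prompt, generated_text):
    """Return the first integer in the continuation, or None if there is none."""
    continuation = generated_text[len(prompt):]
    match = re.search(r"-?\d+", continuation)
    return int(match.group()) if match else None

prompt = "This is a first grade math exam. calculate this addition: 1 '+' 2 ="

# The three generated texts printed above, copied verbatim.
outputs = [
    "This is a first grade math exam. calculate this addition: 1 '+' 2 =' 3 '+' 4 '+' 5 '+' 6 ='7 '+' 8 '",
    "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3, multiply by 2 and then 4. It's a second grade algebra exam.\n\nAlgebra is",
    "This is a first grade math exam. calculate this addition: 1 '+' 2 =' 3 and so on, all the way round, one line at a time. (We'll have an",
]

answers = [first_number_after_prompt(prompt, text) for text in outputs]
correct = sum(answer == 3 for answer in answers)
print(answers, f"{correct}/{len(outputs)} correct")  # [3, 3, 3] 3/3 correct
```

This is a lenient check: it only looks at the first number and ignores whatever the model goes on to write afterwards.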

Two-digit addition (e.g. 11 + 22 = 33):

In [12]:
# one_shot (as labeled): the two additions are run as independent prompts.
pipe(["This is a first grade math exam. calculate this addition: 1 '+' 2 =", "This is a first grade math exam. calculate this addition: 11 '+' 22 ="], max_length=40, num_return_sequences=6)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3, and the answer is 3.\n\n\nThis is a first grade spelling, history, grammar, and"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3 3+ = 4\n\n2×3 equals 3.\n\n4×4 equals 4.\n"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3 3 + 2 = 5 So we want to: add 1 '+' 2\n\nadd 3 '"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 ='3' and if you want to multiply it by two, multiply: 1 '+' 2 +'"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = +' + 2. add + 1 2. 2 + 2. add - 1 2. 5 - 2"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 2 '+' 3 = 3 =\n\nThis is a first grade algebra exam. calculate this subtraction"}],
 [{'generated_text': "This is a first grade math exam. calculate this additio

In [13]:
# two_shot (as labeled): the three additions are run as independent prompts.
pipe(["This is a first grade math exam. calculate this addition: 1 '+' 2 =", "This is a first grade math exam. calculate this addition: 11 '+' 22 =", "This is a first grade math exam. calculate this addition: 7 '+' 8 ="], max_length=40, num_return_sequences=9)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 2 3 + 2 4 = 4 The answer is 1. We need to multiply 4 by 2. How many"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3, 4 + 1 = 5, 5+2 = 7, 7/2 2 = 8, 8"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3, 3 1 + 2 (1 + 2) = 5, 5 This is a first grade grammar exam"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3 5 '+' 3 = 4 11 '+' 4 = 5 16 '+' 5 = 6"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 3 Note, that 5+3=10 is not an addition and thus the result is an odd numeral"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 =3 '+' 4 =5 Now add these two together and you know the answer.\n\n(And"},
  {'generated_text': "This is a first grade math exam. calculate this addition: 1 '+' 2 = 

Binary sentiment classification (positive / negative):

In [14]:
#zero_shot
pipe("Do the following texts express a positive or negative sentiment? Sarah is happy these days.", max_length=40, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. Is it time for me to leave? David hates that my clothes are so dirty. Will we break up? Do'},
 {'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days.\n\nThis page contains both negative and positive affirmations. For both of the texts used below, we see the'},
 {'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. I love that he likes her so much.\n\nWhy does Michael break up with Sarah, and what makes him'}]
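
For a classification task, the continuation has to be mapped to a label. One simple heuristic is to take whichever label word appears first in the text after the prompt. The sketch below applies it to the three outputs printed above; the helper name `extract_sentiment` is hypothetical, and the rule is only an assumption about how one might score such outputs.

```python
def extract_sentiment(prompt, generated_text, labels=("positive", "negative")):
    """Return the label that occurs first in the continuation, or None."""
    continuation = generated_text[len(prompt):].lower()
    positions = {label: continuation.find(label) for label in labels}
    found = {label: pos for label, pos in positions.items() if pos != -1}
    return min(found, key=found.get) if found else None

prompt = "Do the following texts express a positive or negative sentiment? Sarah is happy these days."

# The three generated texts printed above, copied verbatim.
outputs = [
    "Do the following texts express a positive or negative sentiment? Sarah is happy these days. Is it time for me to leave? David hates that my clothes are so dirty. Will we break up? Do",
    "Do the following texts express a positive or negative sentiment? Sarah is happy these days.\n\nThis page contains both negative and positive affirmations. For both of the texts used below, we see the",
    "Do the following texts express a positive or negative sentiment? Sarah is happy these days. I love that he likes her so much.\n\nWhy does Michael break up with Sarah, and what makes him",
]

print([extract_sentiment(prompt, text) for text in outputs])
# [None, 'negative', None]
```

None of the three continuations yields the expected label `positive`, which matches the impression that the zero-shot prompt does not lead the model to answer the question.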

In [15]:
# one_shot (as labeled): the two texts are run as independent prompts.
pipe(["Do the following texts express a positive or negative sentiment? Sarah is happy these days.", "Do the following texts express a positive or negative sentiment? Sarah and Tom are enjoying the life and are happy these days."], max_length=40, num_return_sequences=6)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. Can you describe the happy feeling on your part and in Sarah\'s? If the text has the word "thumbs'},
  {'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. She has some new friends. What do you think?\n\n(1) Sarah is happy. She has some'},
  {'generated_text': "Do the following texts express a positive or negative sentiment? Sarah is happy these days. That's amazing Sarah is so beautiful and is very loved Sarah wants to tell her boyfriend she loved him as much as"},
  {'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. She has new shoes, new food, new clothes - she can get what she wants. But at that time,'},
  {'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. Sarah is sad. Sarah is angry. But Sa

In [16]:
# two_shot (as labeled): the three texts are run as independent prompts.
pipe(["Do the following texts express a positive or negative sentiment? Sarah is happy these days.", "Do the following texts express a positive or negative sentiment? Sarah and Tom are enjoying the life and are happy these days.", "Do the following texts express a positive or negative sentiment? The audiences were very sad to see such disasters around the world and want to help people."], max_length=40, num_return_sequences=9)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. Sarah wants everyone to do well. Sarah wants Sarah to be happy. Sarah is happy. Sarah wants everyone to do'},
  {'generated_text': "Do the following texts express a positive or negative sentiment? Sarah is happy these days. Sarah hates these days. Sarah hates these days Sarah likes this week Sarah likes this week Sarah doesn't like this week"},
  {'generated_text': "Do the following texts express a positive or negative sentiment? Sarah is happy these days. This is a good thing. I'm sorry, you did something wrong, and your mother is mad at you."},
  {'generated_text': 'Do the following texts express a positive or negative sentiment? Sarah is happy these days. Would Sarah get married or get divorced if she had known what her husband was really like? Sarah is still a little'},
  {'generated_text': "Do the following texts express a positive or negative sentiment? Sarah is happy