## Experimenting with GPT-2

This notebook is part of the [AI for Beginners Curriculum](http://aka.ms/ai-beginners).

In this notebook, we explore how to experiment with the OpenAI GPT-2 model using the Hugging Face `transformers` library.

Without further ado, let's instantiate a text-generation pipeline and start generating! You can select a smaller GPT-2 model to reduce download time and speed up inference, at the cost of lower output quality.

In [1]:
from transformers import pipeline

model_name = 'gpt2-large' # try 'gpt2' for small model, 'gpt2-medium' for medium one

generator = pipeline('text-generation', model=model_name)

generator("Hello! I am a neural network, and I want to say that", max_length=100, num_return_sequences=5)


Some weights of GPT2Model were not initialized from the model checkpoint at gpt2-large and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias', 'h.12.attn.masked_bias', 'h.13.attn.masked_bias', 'h.14.attn.masked_bias', 'h.15.attn.masked_bias', 'h.16.attn.masked_bias', 'h.17.attn.masked_bias', 'h.18.attn.masked_bias', 'h.19.attn.masked_bias', 'h.20.attn.masked_bias', 'h.21.attn.masked_bias', 'h.22.attn.masked_bias', 'h.23.attn.masked_bias', 'h.24.attn.masked_bias', 'h.25.attn.masked_bias', 'h.26.attn.masked_bias', 'h.27.attn.masked_bias', 'h.28.attn.masked_bias', 'h.29.attn.masked_bias', 'h.30.attn.masked_bias', 'h.31.attn.masked_bias', 'h.32.attn.masked_bias', 'h.33.attn.masked_bias', 'h.34.attn.masked_bias', 'h.35.attn.masked_bi

[{'generated_text': 'Hello! I am a neural network, and I want to say that I am an expert in the area of learning and understanding neural networks. I also know a bit about math. You might have seen "How to make deep neural networks" or "What is a deep learning neural network?". I would like to discuss the first one.\n\nHow to make deep neural networks is a very complicated topic, and I don\'t know how to go into it in detail, but I want to do some'},
 {'generated_text': 'Hello! I am a neural network, and I want to say that you can find a network algorithm, called Naive Bayes, which produces good results, but which is computationally too expensive to be useful, so I will not cover all of its details here.\n\nFirst we are going to define how an NN looks like.\n\nHere is a naive Bayes classification problem with a few variables:\n\nFeature Input Value (x) 1 2 3 4 5 1 3 4'},
 {'generated_text': 'Hello! I am a neural network, and I want to say that… "I\'m done with this crap!"\n\nNow, if yo

## Prompt Engineering

For some problems, you can use GPT-2 generation right away by designing a suitable prompt. Have a look at the examples below:

In [10]:
generator("Synonyms of a word cat:", max_length=20, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Synonyms of a word cat:\n\n(a) cat;\n\n(b) black'},
 {'generated_text': 'Synonyms of a word cat: cat-bitch; cat-queen; cat-c'},
 {'generated_text': 'Synonyms of a word cat: feline, feline form, feline spirit, fas'},
 {'generated_text': 'Synonyms of a word cat:\n\n(a) cat-like, like a cat;'},
 {'generated_text': 'Synonyms of a word cat:\n\nThe more common English words you need to understand the meaning'}]

In [25]:
generator("I love when you say this -> Positive\nI have myself -> Negative\nThis is awful for you to say this ->", max_length=40, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I love when you say this -> Positive\nI have myself -> Negative\nThis is awful for you to say this -> Negative\nHow do you think this hurts us all? -> Positive\nYou seem'},
 {'generated_text': "I love when you say this -> Positive\nI have myself -> Negative\nThis is awful for you to say this -> Negative\nThis is such a horrible way to treat yourself -> Negative (You're"},
 {'generated_text': 'I love when you say this -> Positive\nI have myself -> Negative\nThis is awful for you to say this -> I am disappointed\nI have to admit -> You are a good friend but you'},
 {'generated_text': 'I love when you say this -> Positive\nI have myself -> Negative\nThis is awful for you to say this -> Positive\nThis is awful for me to do this with you -> Negative\nWhy'},
 {'generated_text': "I love when you say this -> Positive\nI have myself -> Negative\nThis is awful for you to say this -> I'm still sad for you!\nMe? \xa0I find this a"}]
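
Few-shot prompts like the sentiment example above follow a regular pattern: a few labelled lines, then an unlabelled line that the model is expected to complete. The sketch below shows one way to assemble such a prompt and to read the predicted label back from the generated text. The helper names `build_few_shot_prompt` and `extract_label` are hypothetical and not part of `transformers`; the example strings are copied verbatim from the cell above.

```python
def build_few_shot_prompt(examples, query, separator=" -> "):
    """Format labelled examples followed by an unlabelled query line."""
    lines = [f"{text}{separator}{label}" for text, label in examples]
    # The final line ends with the separator (minus trailing space),
    # so the model's continuation starts with the label.
    lines.append(f"{query}{separator.rstrip()}")
    return "\n".join(lines)


def extract_label(generated_text, prompt):
    """Return the first line the model produced after the prompt."""
    completion = generated_text[len(prompt):].strip()
    return completion.splitlines()[0].strip() if completion else ""


# Example strings copied as-is from the notebook cell above.
examples = [
    ("I love when you say this", "Positive"),
    ("I have myself", "Negative"),
]
prompt = build_few_shot_prompt(examples, "This is awful for you to say this")
print(prompt)

# One of the recorded outputs from above, shortened after the first new line.
sample_output = prompt + " Negative\nHow do you think this hurts us all? -> Positive"
print(extract_label(sample_output, prompt))
```

The resulting `prompt` string can be passed to `generator(...)` exactly as in the cell above. As the recorded outputs show, the model does not always answer with one of the expected labels, so the extracted text should be validated before use.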

In [20]:
generator("Translate English to French: cat => chat, dog => chien, student => ", top_k=50, max_length=30, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Translate English to French: cat => chat, dog => chien, student => étudiant;\n\n\nOther:\n\nTo learn'},
 {'generated_text': 'Translate English to French: cat => chat, dog => chien, student => étude, french = le français = "'},
 {'generated_text': "Translate English to French: cat => chat, dog => chien, student => été, mama => m'aime. We"}]

In [5]:
generator("People who liked the movie The Matrix also liked ", max_length=40, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'People who liked the movie The Matrix also liked \xa0"Dawn of the Dead" (or in the case of John Goodman "Vanish"),\xa0but their votes didn\'t matter! The first'},
 {'generated_text': 'People who liked the movie The Matrix also liked _____. It\'s a true statement, as "tasteful" movies are often more popular for their plot, characters and themes than for their entertainment'},
 {'generated_text': 'People who liked the movie The Matrix also liked \xa0the book... \xa0And so on, and so forth. Now at the other end of the spectrum...\n...the real "truths'},
 {'generated_text': "People who liked the movie The Matrix also liked 『The Grand Budapest Hotel』. 『The Grand Budapest Hotel』 is the movie people who like to watch their dream come true. I'll explain"},
 {'generated_text': 'People who liked the movie The Matrix also liked \xa0a lot of the lines that were used, and I got to listen to the dialogue in the movie. \xa0The characters were really enjoyable characters'}]

## Text Sampling Strategies

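So far we have relied on the pipeline's default generation settings. The simplest decoding strategy is **greedy** search, which always selects the next word with the highest probability. Here is what the default settings produce for a longer prompt: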

In [28]:
prompt = "It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw"
generator(prompt, max_length=100, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw my girlfriend with a very cute looking guy. I noticed that his eyes that looked into mine were a bit scary and kind of like a predator's eyes. And that's when I knew that I had to know something about him.\n\nAs I sat, studying him, my mind was racing in a crazy way. We didn't exactly"},
 {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw my reflection in a small screen attached to a thin, white piece of glass. Suddenly, I thought about the possibility of making money making a computer, and decided to try it for myself! I started working on making an e-ink display. I was able to make it work by following some simple rules!\n\nThe first step is'},
 {'generated_text': 'It was early evening when I can back from work. I usually work late,
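
To make the greedy rule concrete, here is a minimal sketch that applies it to a hypothetical toy model: a hand-written table of next-word probabilities rather than GPT-2. The table `TOY_MODEL` and the function `greedy_decode` are assumptions made for illustration only.

```python
# Hypothetical toy "language model": next-word probabilities given the previous word.
TOY_MODEL = {
    "<s>": {"I": 0.6, "We": 0.4},
    "I": {"saw": 0.5, "am": 0.3, "like": 0.2},
    "saw": {"a": 0.7, "the": 0.3},
    "a": {"cat": 0.6, "dog": 0.4},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def greedy_decode(model, start, max_steps=5, stop="</s>"):
    """At every step, append the single most probable next word."""
    tokens = [start]
    for _ in range(max_steps):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # the toy table has no continuation for this word
        next_token = max(candidates, key=candidates.get)
        tokens.append(next_token)
        if next_token == stop:
            break
    return tokens


result = greedy_decode(TOY_MODEL, "<s>")
print(" ".join(result))
```

Because the choice at each step is fully determined by the probabilities, greedy decoding returns the same text every time for a given prompt.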

**Beam search** allows the generator to explore several directions (*beams*) of text generation and select the ones with the highest overall score. You can enable beam search by providing the `num_beams` parameter. You can also specify `no_repeat_ngram_size` to prevent the model from repeating any n-gram of the given size:

In [29]:
prompt = "It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw"
generator(prompt, max_length=100, num_return_sequences=5, num_beams=10, no_repeat_ngram_size=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a group of people sitting around a table. One of them was a middle-aged man. He was wearing a black suit and a white shirt. His eyes were closed and he was staring at the floor with his hands on his knees.\n\n"What are you doing here?" I asked him. "What do you want?"\n'},
 {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a man sitting at a table in front of a computer. He was wearing a black suit, a white shirt, and a red tie.\n\n"Hello," he said to me. "How are you?" he asked me in a voice that sounded as if he was speaking to a child. There was a smile on his face.'},
 {'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw a man sitt
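
The idea behind beam search can be sketched on a hypothetical toy model. The code below is a simplified illustration, not the `transformers` implementation (which also handles details such as length penalties and end-of-sequence tokens). The table `TOY_MODEL` and the function `beam_search` are assumptions made for this example. The probabilities are chosen so that the most probable first word does not lead to the most probable sequence.

```python
import math

# Hypothetical next-word probabilities. "The" is the likeliest first word,
# but "A man" is the likeliest two-word sequence (0.4 * 0.9 > 0.6 * 0.55).
TOY_MODEL = {
    "<s>": {"The": 0.6, "A": 0.4},
    "The": {"cat": 0.55, "dog": 0.45},
    "A": {"man": 0.9, "bird": 0.1},
}


def beam_search(model, start, num_beams=2, max_steps=2):
    """Keep the num_beams best partial sequences, scored by summed log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            next_probs = model.get(tokens[-1])
            if not next_probs:
                candidates.append((tokens, score))  # nothing to extend, keep as is
                continue
            for token, prob in next_probs.items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Sort by score, best first, and keep only the top num_beams hypotheses.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


for width in (1, 2):
    tokens, score = beam_search(TOY_MODEL, "<s>", num_beams=width)[0]
    print(width, " ".join(tokens), round(math.exp(score), 2))
```

With a single beam the search degenerates to greedy decoding and commits to "The" early. With two beams it keeps "A" alive long enough to discover the higher-scoring sequence.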

**Sampling** selects the next word non-deterministically, using the probability distribution returned by the model. You turn on sampling with the `do_sample=True` parameter. You can also specify `temperature` to make the model more or less deterministic: values below 1 sharpen the distribution, while values above 1 flatten it.

In [30]:
prompt = "It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw"
generator(prompt, max_length=100, do_sample=True, temperature=0.8)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'It was early evening when I can back from work. I usually work late, but this time it was an exception. When I entered a room, I saw one of my colleagues lying on his stomach. He was unconscious. I could smell a strong odor of alcohol on his breath. I was on my way out. I ran to his room and found him unconscious on the bed. I saw a bottle of wine on the bed. He was in a state of intoxication. He was unconscious. I'}]
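
The effect of `temperature` can be illustrated without the model. In the sketch below, the scores (logits) are divided by the temperature before the softmax, which is the usual way temperature is applied. The candidate words and logit values are made up for illustration, and the helper names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token at random according to the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Made-up candidates and scores for the word following "I saw".
tokens = ["a", "my", "one"]
logits = [2.0, 1.0, 0.1]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

print(sample_token(tokens, logits, temperature=0.8, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the top-scoring word, so sampling behaves more like greedy decoding. Higher temperatures spread probability across more candidates and produce more varied text.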

We can also provide two additional parameters to sampling:
* `top_k` specifies the number of most probable word options to consider when sampling. This reduces the chance of getting weird (low-probability) words in our text.
* `top_p` is similar, but we choose the smallest subset of the most probable words whose total probability is at least p.

Feel free to experiment with adding those parameters in.
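
As a starting point, the two filters can be sketched on a made-up probability distribution. The functions `top_k_filter` and `top_p_filter` below are hypothetical helpers written for illustration. They mirror the descriptions above rather than the exact `transformers` implementation.

```python
def top_k_filter(probs, k):
    """Keep the k most probable words and renormalise their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:k])
    total = sum(kept.values())
    return {word: p / total for word, p in kept.items()}


def top_p_filter(probs, p):
    """Keep the smallest set of most probable words whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for word, prob in ranked:
        kept[word] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {word: q / total for word, q in kept.items()}


# Made-up next-word distribution with one unlikely, "weird" option.
probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "xylophone": 0.05}

print(top_k_filter(probs, k=2))
print(top_p_filter(probs, p=0.9))
```

In both cases the low-probability word is removed before sampling. With the pipeline, the corresponding experiment would pass `top_k` and `top_p` together with `do_sample=True` to `generator(...)`.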

## Fine-Tuning GPT-2

You can also fine-tune GPT-2 text generation on your own dataset. This allows you to adjust the style of the generated text while retaining most of what the language model has already learned. An example of fine-tuning GPT-2 to generate song lyrics can be found [in this blog post](https://towardsdatascience.com/how-to-fine-tune-gpt-2-for-text-generation-ae2ea53bc272).