
## Text generation using GPT-2

Transformers can be pretrained on large bodies of unlabeled data and
then finetuned for other tasks. Two main groups of such pretrained models are:

1. **Bidirectional Encoder Representations from Transformers (BERTs)**
2. **Generative Pretrained Transformers (GPTs)**

The first GPT model was introduced in a 2018 paper by Radford et al. from OpenAI – it demonstrated how a generative language model can acquire knowledge and process long-range dependencies thanks to pretraining on a large, diverse corpus of contiguous text. Two successor models (trained on more extensive corpora) were released in the following years: GPT-2 in 2019 (1.5 billion parameters) and GPT-3 in 2020 (175 billion parameters).

We will be making use of the excellent Transformers library created by Hugging Face (https://huggingface.co/). It abstracts away several components of the building process, allowing us to focus on the model's performance and intended behavior.

References:

1. https://huggingface.co/blog/how-to-generate
2. https://huggingface.co/transformers/model_doc/gpt2.html

## Setup

In [None]:
# installing Transformers and TensorFlow 2.0 in one line
!pip install transformers[tf-gpu]

In [2]:
import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

One of the advantages of the Transformers library – and a reason for its popularity, undoubtedly – is how easily we can download a specific model (and also define the appropriate tokenizer):

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
GPT2 = TFGPT2LMHeadModel.from_pretrained("gpt2-large", pad_token_id=tokenizer.eos_token_id)

It is usually a good idea to fix the random seed to ensure the results are reproducible.

In [4]:
# settings

# for reproducibility
SEED = 34
tf.random.set_seed(SEED)

# maximum number of tokens in the output text (the prompt counts towards this limit)
MAX_LEN = 70

## Different Decoding Methods

How we decode is one of the most important decisions when generating text with a GPT-2 model.

### Greedy search

With **greedy search**, the word with the highest probability is predicted as the next word in the sequence:

In [5]:
input_sequence1 = "I don't know about you, but there's only one thing I want to do after a long day of work"
input_sequence2 = "There are times when I am really tired of people, but I feel lonely too."

Once we have our input sequence, we encode it, call the model's generate method, and decode the result back into text:

In [6]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode(input_sequence1, return_tensors="tf")

# generate text until the output length (which includes the context length) reaches 70
greedy_output = GPT2.generate(input_ids, max_length=MAX_LEN)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
I don't know about you, but there's only one thing I want to do after a long day of work: go to the gym.

I'm not talking about the gym that's right next to my house. I'm talking about the gym that's right next to my house.

I'm not talking about the gym that


In [7]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode(input_sequence2, return_tensors="tf")

# generate text until the output length (which includes the context length) reaches 70
greedy_output = GPT2.generate(input_ids, max_length=MAX_LEN)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I feel like I'm alone in the world. I feel like I'm alone in my own body. I feel like I'm alone in my own mind. I feel like I'm alone in my own heart. I feel like I'm alone in my own mind


As you can see, the results leave some room for improvement: the model starts repeating itself, because always picking the highest-probability word hides the less likely ones, so more diverse combinations are never explored.
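
The repetition is easy to reproduce in miniature. The sketch below is a toy illustration, not GPT-2: TOY_MODEL is a hypothetical hand-written table of next-word probabilities, and greedy_decode is a helper made up for this example. Because greedy search always takes the single most likely word, it falls into a loop as soon as the most likely path returns to a word it has already visited.

```python
# Hypothetical toy "language model": next-word probabilities given the current word.
# The numbers are made up for illustration and do not come from GPT-2.
TOY_MODEL = {
    "I": {"feel": 0.6, "am": 0.4},
    "feel": {"alone": 0.7, "tired": 0.3},
    "am": {"tired": 1.0},
    "alone": {".": 0.9, "again": 0.1},
    "tired": {".": 1.0},
    "again": {".": 1.0},
    ".": {"I": 0.8, "We": 0.2},
    "We": {"feel": 1.0},
}


def greedy_decode(start, max_len):
    """Extend the sequence by always appending the most probable next word."""
    words = [start]
    while len(words) < max_len:
        candidates = TOY_MODEL[words[-1]]
        words.append(max(candidates, key=candidates.get))
    return words


greedy_words = greedy_decode("I", max_len=12)
print(" ".join(greedy_words))
```

The less likely branches ("am", "tired", "We") are never visited, which mirrors the loop seen in the GPT-2 output above.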

### Beam search

A simple remedy is **beam search**: at each step we keep track of several candidate sequences (the beams) instead of just one, and in the end we choose the sequence with the highest overall probability:

In [8]:
# set num_return_sequences > 1 to get several beams back
beam_outputs = GPT2.generate(input_ids, max_length=MAX_LEN, num_beams=5, no_repeat_ngram_size=2, num_return_sequences=5, early_stopping=True)

print("")
print("Output:\n" + 100 * '-')

# now we have 5 output sequences
for i, beam_output in enumerate(beam_outputs):
  print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))


Output:
----------------------------------------------------------------------------------------------------
0: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."

"I feel like I can't do anything right now," she said. "I'm so tired."
1: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."

"I feel like I can't do anything right now," she says. "I'm so tired."
2: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."

"I feel like I can't do anything right now," she says. "I'm not sure what I'm supposed to be doing with my life."
3: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."

"I feel like I can't do anything right now," she says. "I'm not sure what I'm supposed to be doing."
4: There are times when I am really tired of people, but I feel lonely to

This is definitely more diverse – the message is the same, but at least the formulations look a little different from a style point of view.
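
The bookkeeping behind beam search can be sketched on a toy example. This is a minimal illustration under stated assumptions, not how the Transformers library implements it: TOY_MODEL is a hypothetical table of next-word probabilities with made-up numbers, and beam_search is a helper written for this example. Scores are summed log-probabilities, so the best beam is the sequence with the highest joint probability.

```python
import math

# Hypothetical next-word probabilities, made up for illustration (not GPT-2 output).
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "and": 0.05, "runs": 0.05},
    "car": {"is": 0.6, "drives": 0.4},
}


def beam_search(start, steps, num_beams):
    """Return the num_beams best sequences after `steps` extensions, best first."""
    beams = [([start], 0.0)]  # (words, cumulative log-probability)
    for _ in range(steps):
        expanded = []
        for words, score in beams:
            for word, prob in TOY_MODEL[words[-1]].items():
                expanded.append((words + [word], score + math.log(prob)))
        # keep only the highest-scoring candidates for the next step
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:num_beams]
    return beams


# num_beams=1 is equivalent to greedy search
greedy_words, greedy_score = beam_search("The", steps=2, num_beams=1)[0]
beam_words, beam_score = beam_search("The", steps=2, num_beams=2)[0]

print(" ".join(greedy_words), round(math.exp(greedy_score), 2))
print(" ".join(beam_words), round(math.exp(beam_score), 2))
```

With a single beam the search commits to "nice" because it is the most likely first word; with two beams it also keeps "dog" alive and discovers that "The dog has" is more probable overall.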

## Sampling – Indeterministic Decoding

We can also explore sampling, that is, indeterministic decoding. Instead of following a strict path to find the text with the highest probability, we randomly pick the next word according to its conditional probability distribution.

This approach risks producing incoherent ramblings, so we make use of the temperature parameter, which rescales the next-word probability distribution before sampling. Values below 1 concentrate the probability mass on the most likely words:

In [9]:
# use temperature to decrease the sensitivity to low probability candidates
sample_output = GPT2.generate(input_ids, do_sample=True, max_length=MAX_LEN, top_k=0, temperature=0.2)

print("")
print("Output:\n" + 100 * '-')

print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I feel like I'm alone in my own world. I feel like I'm alone in my own life. I feel like I'm alone in my own mind. I feel like I'm alone in my own heart. I feel like I'm alone in my own


What happens if we increase the temperature?

In [10]:
# a higher temperature (still below 1) leaves more probability for less likely candidates
sample_output = GPT2.generate(input_ids, do_sample=True, max_length=MAX_LEN, top_k=0, temperature=0.8)

print("")
print("Output:\n" + 100 * '-')

print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I find it strange how the people around me seem to be always so nice. The only time I feel lonely is when I'm on the road. I can't be alone with my thoughts.

What are some of your favourite things to do in the area


This is getting more interesting, although it still feels a bit like a train of thought – which is perhaps to be expected, given the content of our prompt. Let's explore some more ways to tune the output.
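
As a side note on the temperature parameter used in the two cells above, its effect can be sketched numerically. The example assumes the common formulation in which the logits are divided by the temperature before the softmax; the logits themselves are made-up numbers, and softmax_with_temperature is a helper written for this illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next words.
logits = [2.0, 1.0, 0.0]

cold = softmax_with_temperature(logits, temperature=0.2)
warm = softmax_with_temperature(logits, temperature=0.8)

print("temperature 0.2:", [round(p, 3) for p in cold])
print("temperature 0.8:", [round(p, 3) for p in warm])
```

At 0.2 almost all of the probability sits on the top candidate, which is why the first sample looks so much like the greedy output; at 0.8 the other candidates keep a realistic chance of being picked.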

### Top-K sampling

In **Top-K sampling**, the top k most likely next words are selected and the entire probability mass is shifted to these k words. So instead of increasing the chances of high-probability words occurring and decreasing the chances of low-probability words, we just remove low-probability words altogether.
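
A minimal sketch of the filtering step, assuming a hypothetical next-word distribution with made-up probabilities (top_k_filter is a helper written for this illustration, not a library function):

```python
import random


def top_k_filter(probs, k):
    """Keep the k most likely words, renormalize them, and zero out the rest."""
    keep = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[word] for word in keep)
    return {word: (probs[word] / total if word in keep else 0.0) for word in probs}


# Hypothetical next-word distribution (made-up numbers).
next_word_probs = {"gym": 0.5, "home": 0.3, "bar": 0.15, "moon": 0.05}

filtered = top_k_filter(next_word_probs, k=2)
print(filtered)

# Sample the next word from the filtered distribution.
rng = random.Random(34)
sampled = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(sampled)
```

Only "gym" and "home" can be sampled after filtering; their probabilities are rescaled so that they sum to 1 again.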

In [11]:
# sample from only top_k most likely words
sample_output = GPT2.generate(input_ids, do_sample=True, max_length=MAX_LEN, top_k=50)

print("")
print("Output:\n" + 100 * '-')

print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I go to a place where you can feel comfortable. It's a place where you can relax. But if you're so tired of going along with the rules, maybe I won't go. You know what? Maybe if I don't go, you won't


This seems like a step in the right direction. Can we do better?

### Top-P sampling

**Top-P sampling** (also known as nucleus sampling) is similar to Top-K, but instead of choosing the top k most likely words, we choose the smallest set of words whose total probability is larger than p, and then the entire probability mass is shifted to the words in this set. 

The main difference here is that with Top-K sampling, the size of the set of words is static (obviously), whereas in Top-P sampling, the size of the set can change. 
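
The changing set size can be seen in a small sketch. The two distributions below are hypothetical, with made-up probabilities, and top_p_filter is a helper written for this illustration rather than the library's implementation.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely words whose total probability exceeds p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for word, prob in ranked:
        nucleus.append(word)
        cumulative += prob
        if cumulative > p:
            break
    # shift the entire probability mass onto the words in the nucleus
    total = sum(probs[word] for word in nucleus)
    return {word: probs[word] / total for word in nucleus}


# Hypothetical next-word distributions (made-up numbers).
peaked = {"gym": 0.9, "home": 0.05, "bar": 0.03, "moon": 0.02}
flat = {"gym": 0.35, "home": 0.3, "bar": 0.25, "moon": 0.1}

peaked_nucleus = top_p_filter(peaked, p=0.8)
flat_nucleus = top_p_filter(flat, p=0.8)

print(len(peaked_nucleus), peaked_nucleus)
print(len(flat_nucleus), flat_nucleus)
```

When the model is confident, the nucleus shrinks to a single word; when the distribution is flatter, more words stay in play. A fixed k cannot adapt in this way.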

To use this sampling method, we just set top_k = 0 and choose a top_p value:

In [12]:
# sample only from the smallest set of words whose cumulative probability exceeds 0.8
sample_output = GPT2.generate(input_ids, do_sample=True, max_length=MAX_LEN, top_k=0, top_p=0.8)

print("")
print("Output:\n" + 100 * '-')

print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I feel like I should just be standing there, just sitting there. I know I'm not a danger to anybody. I just feel alone."


We can combine both approaches:

In [13]:
# combine both sampling techniques
sample_outputs = GPT2.generate(input_ids, do_sample=True, max_length=2 * MAX_LEN, top_k=50, top_p=0.85, num_return_sequences=5)

print("")
print("Output:\n" + 100 * '-')

for i, sample_output in enumerate(sample_outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
  print("")


Output:
----------------------------------------------------------------------------------------------------
0: There are times when I am really tired of people, but I feel lonely too. I don't feel like I am being respected by my own country, which is why I am trying to change the government."

In a recent video posted to YouTube, Mr. Jaleel, dressed in a suit and tie, talks about his life in Pakistan and his frustration at his treatment by the country's law enforcement agencies. He also describes how he met a young woman from California who helped him organize the protest in Washington.

"She was a journalist who worked with a television channel in Pakistan," Mr. Jaleel says in the video. "She came to my home one day,

1: There are times when I am really tired of people, but I feel lonely too. It's not that I don't like to be around other people, but it's just something I have to face sometimes.

What is your favorite thing to eat?

The most favorite thing I have eaten is chicken and

Clearly, the more sophisticated sampling settings can give us pretty impressive results.

Let's explore this avenue more – we'll use the prompts taken from OpenAI's GPT-2 website, where they feed them to a full-sized GPT-2 model. This comparison will give us an idea of how well we are doing with a local (smaller) model compared to a full one that was used for the original demos:

In [15]:
MAX_LEN = 500

# the prompt is split across two adjacent string literals, which Python joins automatically
prompt1 = ("In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, "
           "in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.")

input_ids = tokenizer.encode(prompt1, return_tensors="tf")

sample_outputs = GPT2.generate(input_ids, do_sample=True, max_length=MAX_LEN, top_k=50, top_p=0.85)

print("")
print("Output:\n" + 100 * '-')

for i, sample_output in enumerate(sample_outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
  print("")


Output:
----------------------------------------------------------------------------------------------------
0: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

Advertisement

"We discovered that the unicorns are intelligent, talk perfectly English, and are very familiar with humans," said the study's lead researcher, Dr. Alan Stone, who is a professor at the University of New Mexico in Albuquerque. "And we were astonished to discover that the animals can speak English."

According to a report in Science, Stone said that when the scientists first looked for the unicorns, they saw that they were roaming a valley in the mountains near a small village. There, they spotted the animals, which were about the size of a cow, on a nearby rock.

"We thought the unicorns must be extinct," Stone said. "We thought we

In another example, it seems like the trepidations of the model authors were justified: GPT-2 can in fact generate fake news stories.

In [19]:
prompt2 = "Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today."

input_ids = tokenizer.encode(prompt2, return_tensors="tf")

sample_outputs = GPT2.generate(input_ids, 
                               do_sample=True, 
                               max_length=MAX_LEN,
                               temperature=0.8,
                               top_k=50, 
                               top_p=0.85,
                               num_return_sequences=5)

print("")
print("Output:\n" + 100 * '-')

for i, sample_output in enumerate(sample_outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
  print("\n" + 100 * '-')


Output:
----------------------------------------------------------------------------------------------------
0: Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.

She was arrested for allegedly stealing $15,000 worth of clothes from the department store.

Cyrus was caught on surveillance video leaving the store with $15,000 worth of clothes before walking back to her car, TMZ reported.

Cyrus was wearing a black and white striped dress, black heels, a white T-shirt and black leggings.

Cyrus was arrested on suspicion of shoplifting and is now being held at the Los Angeles County Jail.

The singer was seen leaving the department store with a bag of clothes in her hand

The star was spotted wearing a black and white striped dress, black heels, a white T-shirt and black leggings

Cyrus was spotted leaving the store with a bag of clothes in her hand

Cyrus was caught on surveillance video leaving the store with $15,000 worth of clothes before walk

What about riffing off literature classics like Tolkien?

In [18]:
prompt3 = "Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry"

input_ids = tokenizer.encode(prompt3, return_tensors="tf")

sample_outputs = GPT2.generate(input_ids, 
                               do_sample=True, 
                               max_length=MAX_LEN,
                               temperature=0.8,
                               top_k=50, 
                               top_p=0.85,
                               num_return_sequences=5)

print("")
print("Output:\n" + 100 * '-')

for i, sample_output in enumerate(sample_outputs):
  print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
  print("\n" + 100 * '-')


Output:
----------------------------------------------------------------------------------------------------
0: Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry and firing a volley of arrows at them.

Gimli charged at an orc, but was held back by the other warriors, who used their weapons to block Gimli's attacks. The two orcs were thrown back, and Gimli was stabbed in the back by a dagger. The orc quickly recovered and stabbed Gimli again. The orc staggered back, but the two men continued their attack, with Gimli falling to the ground and Gimli's blade impaled on the orc's shoulder.

Gimli tried to fight back, but he was no match for the orc's strength and speed. The orc, however, was no match for the two men's courage and skill, and they began to flee. The orc and Gimli ran across the battlefield, and the two orcs were chased by a number of soldiers, who had arrived to try to stop them.

Gimli's sword slashed the orc's shoulder, and the orc fell

As you can see from the examples above, a GPT-2 model working out of the box (without finetuning) can already generate plausible-looking long-form text. Assessing the future impact of this technology on the field of communication remains an open and highly controversial issue: on the one hand, there is fully justified fear of fake news proliferation (see the Miley Cyrus story above). This is particularly concerning because large-scale automated detection of generated text is an extremely challenging topic. 

On the other hand, GPT-2 text generation capabilities can be helpful for creative types: be it style experimentation or parody, an AI-powered writing assistant can be a tremendous help.