---
# Text Generation using GPT-2

This notebook demonstrates how to generate text with the GPT-2 model from the Hugging Face Transformers library. GPT-2 is an autoregressive language model: given an input prompt, it repeatedly predicts the next token to produce a continuation of that prompt.

The notebook is organized as follows:

1. **Installation and Importing Libraries**: We install the Hugging Face Transformers library, then import `torch` and the GPT-2 model and tokenizer classes.

2. **Checking GPU Availability**: We check whether a GPU is available so the model can run on it, and fall back to the CPU otherwise.

3. **Loading Model and Tokenizer**: We load the GPT-2 model and its corresponding tokenizer from the Hugging Face Transformers library.

4. **Defining Text Generation Function**: We define a function named `generate_prompt` that generates text from a given input prompt. It takes the prompt, the model, the tokenizer, and a maximum length in tokens (default 1024), and returns the decoded text, which begins with the prompt itself.

5. **Generating Text**: We generate text using the GPT-2 model with various input prompts to showcase its capabilities, including prompts about "The Lord of the Rings", "Quantum Physics", "Donald Trump", and "Poisonous Vegetables".

By the end of this notebook, you will have a better understanding of how to use the GPT-2 model from the Hugging Face Transformers library to generate text based on a given input prompt.

---


# install libs

In [2]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.1-py3-none-any.whl (6.7 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.2-py3-none-any.whl (199 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.13.2 tokenizers-0.13.2 transformers-4.27.1
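
Generated text can differ between library versions, so it can help to record which version was actually installed (4.27.1 in the log above). The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name, not part of Transformers.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Prints a version string if the package is installed, otherwise None.
print(installed_version("transformers"))
```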


# import libs and check gpu

In [1]:
!nvidia-smi

Sun Mar 19 19:56:41 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0    27W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")


Using GPU: Tesla T4


# load model and tokenizer

In [4]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').to(device)
model.eval()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0): GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (1): GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dro
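
The printed module tree can be cross-checked with a quick parameter count. The sketch below is plain arithmetic, not a call into Transformers. It takes the embedding shapes shown above (50257 x 768 and 1024 x 768) and assumes the parts of the standard `gpt2` configuration that the truncated printout does not show: 12 blocks, a fused query/key/value projection of width 3 x 768, an MLP of width 4 x 768, a final layer norm, and an output head that shares its weights with `wte`.

```python
# Rough parameter count for the 'gpt2' checkpoint, from layer shapes alone.
# Values marked "assumed" are not visible in the truncated printout above.
VOCAB, N_POS, D = 50257, 1024, 768   # from the wte / wpe lines in the printout
N_LAYER = 12                          # assumed: standard gpt2 depth
D_MLP = 4 * D                         # assumed: MLP hidden width

def linear(n_in, n_out):
    """Weights plus bias of one dense (Conv1D) layer."""
    return n_in * n_out + n_out

layer_norm = 2 * D                    # scale and shift vectors

block = (
    layer_norm            # ln_1
    + linear(D, 3 * D)    # attn.c_attn (fused q, k, v; assumed width)
    + linear(D, D)        # attn.c_proj
    + layer_norm          # ln_2
    + linear(D, D_MLP)    # mlp.c_fc
    + linear(D_MLP, D)    # mlp.c_proj
)

embeddings = VOCAB * D + N_POS * D    # wte + wpe
total = embeddings + N_LAYER * block + layer_norm  # plus final layer norm (assumed)

print(f"per block: {block:,}")   # per block: 7,087,872
print(f"total:     {total:,}")   # total:     124,439,808
```

Under these assumptions the model has roughly 124 million trainable parameters, or just under 0.5 GB at 4 bytes per float32 weight.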

# generate_prompt and detailed LOTR explanation

In [5]:
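def generate_prompt(prompt, model, tokenizer, max_length=1024):
    """Continue `prompt` with greedy decoding and return the decoded text (prompt included)."""
    # Encode the prompt and move it to the same device as the model.
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)
    # max_length counts prompt plus generated tokens; 1024 is GPT-2's context size.
    # early_stopping only affects beam search, so it has no effect on this greedy call.
    output = model.generate(input_ids, max_length=max_length, num_return_sequences=1, no_repeat_ngram_size=2, early_stopping=True, pad_token_id=tokenizer.eos_token_id)
    summary = tokenizer.decode(output[0], skip_special_tokens=True)
    return summary.strip()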


prompt = "Write a detailed explanation of the events in the lord of the rings"
summary = generate_prompt(prompt, model, tokenizer)
summary.split('.')

['Write a detailed explanation of the events in the lord of the rings',
 '\n\nThe Lord of Rings: The Card Game\n',
 ' The Lord is the Lord, the King of all the Rings',
 ' He is a powerful and powerful being, and he is known to be the most powerful of them all',
 ' His power is that of a king',
 ' In the game, he has the power to summon the Ring of Power, which is an ancient artifact that is used to control the world',
 ' It is said that the ring is one of his most important possessions',
 ' When the player defeats the evil lord, they will be able to use it to defeat the other lords',
 ' This is because the powerful lord is able, through the use of magic, to create a ring that can be used by any character',
 ' If the character defeats him, then the person who defeated him will become the king of that world, as well as the ruler of it',
 ' However, if the party defeats this powerful evil, it will not be possible to return to the realm of power',
 ' Instead, a character will have to fight
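
The `no_repeat_ngram_size=2` argument passed to `model.generate` prevents any two-token sequence from appearing twice in the output. The toy sketch below illustrates the idea on plain Python lists of words. It is not the Transformers implementation: `banned_next_tokens` and `pick_next` are hypothetical helper names, and the scores are made up.

```python
def banned_next_tokens(tokens, n=2):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n < 1 or len(tokens) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram that would come next.
    prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

def pick_next(scores, tokens, n=2):
    """Greedy choice: the highest-scoring token not banned by the n-gram rule."""
    banned = banned_next_tokens(tokens, n)
    allowed = {tok: s for tok, s in scores.items() if tok not in banned}
    return max(allowed, key=allowed.get)

history = ["the", "lord", "of", "the"]
scores = {"lord": 0.9, "ring": 0.6, "king": 0.3}   # made-up next-token scores

print(banned_next_tokens(history))   # {'lord'}
print(pick_next(scores, history))    # ring
```

Here "the lord" has already occurred, so "lord" is excluded after the second "the" and the next-best candidate is chosen. In the real model the rule applies to tokenizer tokens rather than whole words.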

# more fun tests

## what is quantum physics?

In [None]:
prompt = "Quantum physics, wikipedia"
summary = generate_prompt(prompt, model, tokenizer)
summary.split('.')

['Quantum physics, wikipedia',
 'com/wiki/Quantums\n\nThe quantum field is a type of particle that is used to describe the properties of matter',
 ' It is the most common type in physics',
 '\n',
 ' The quantum fields are a kind of particles that are used for describing the effects of quantum mechanics',
 ' They are the particles of the quantum theory',
 ' In the case of a particle, the field of view is not the same as the view of its particle',
 ' For example, a photon is an electron, and a wave is another wave',
 ' A particle is also called a quantum particle because it is composed of two particles',
 ' This is why the term quantum is sometimes used',
 ' Quantum physics is based on the theory of relativity',
 ' When you look at a picture of an object, you can see that the object is moving in a straight line',
 ' You can also see the direction of movement of objects',
 ' If you see a moving object moving, it will move in the opposite direction',
 ' So, if you are looking at an image o

## Trump weird allegations

In [6]:
prompt = "Donald Trump, detailed NYT article"
summary = generate_prompt(prompt, model, tokenizer)
summary.split('.')

["Donald Trump, detailed NYT article on the Trump campaign's ties to Russia",
 '\n\n"The Trump team has been working with the Russian government to help elect Donald Trump as president," the article said',
 ' "The campaign has also been providing financial support to the Kremlin to support the campaign of the Democratic nominee for president',
 '"\n',
 '@nytimes: "Trump campaign and Russian officials have been helping elect Hillary Clinton as the next president of United States',
 '" pic',
 'twitter',
 'com/Y4YjYXqYqE — The Hill (@thehill) November 8, 2016\n, which was published on Nov',
 ' 8',
 ' The article was written by a former senior adviser to former President Bill Clinton',
 ' It was also published by the Washington Post',
 '']
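
`summary.split('.')` cuts on every period, which is why `pic.twitter.com` and `Nov. 8` end up as separate fragments in the output above. One possible alternative is sketched below. It assumes a sentence ends with `.`, `!`, or `?` followed by whitespace; `split_sentences` is a hypothetical helper name, and the sample string is invented for illustration.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' only when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "See pic.twitter.com/abc for details. It was published on a Tuesday."
print(split_sentences(sample))
# ['See pic.twitter.com/abc for details.', 'It was published on a Tuesday.']
```

This keeps URLs intact, but it still splits after abbreviations such as `Nov.` and does not split when a closing quote follows the period.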

## poisonous vegetables

In [7]:
prompt = "When I was a child, I loved to eat poisonous vegetables. "
summary = generate_prompt(prompt, model, tokenizer)
summary.split('.')

['When I was a child, I loved to eat poisonous vegetables',
 ' \xa0I loved the smell of the leaves',
 ' I liked to play with my friends',
 '\xa0 I wanted to be a good cook',
 '\nI was always a little bit of a vegetarian',
 " \xa0 I didn't eat meat",
 ' But I did eat vegetables, and I ate them',
 ' And I always wanted them to taste good',
 ' So I started to make my own',
 ' It was very simple',
 " The first time I made my first batch of vegetables was in the summer of '94",
 ' My husband and my family were in town',
 ' We were going to the grocery store',
 ' They were all looking at the same thing',
 ' "What\'s your name?" "John',
 '" "I\'m John',
 '" I said, "You\'re John',
 " You're a farmer",
 '" They said "Yes, John, you\'re the farmer',
 ' What\'s the name of your farm?" I told them, it\'s a farm',
 " John said it was the farm, but I don't know what it is",
 " He said he was going out to buy some vegetables and he said I'm going back to my farm and we're going on a trip",
 ' Then I