# CS 39AA - Notebook 13a: Text Generation with (pre-trained) GPT-2

Using a pre-trained transformer model, let's see what kind of text generation results we can get. Later we'll fine-tune this model on a corpus of our choice and see how the results compare. To make that comparison straightforward, we will use the same prompt for both.

The Hugging Face documentation on text generation with a pre-trained model can be found here:
* https://huggingface.co/transformers/v4.0.1/task_summary.html#text-generation


In [45]:
# pandas is not used anywhere in this notebook, so only the transformers imports are kept
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

The model we'll use is GPT-2 Medium, the second-smallest of the four available GPT-2 models (small, medium, large, and extra large). Its model card can be found here:
* https://huggingface.co/gpt2-medium

In [2]:
MODEL_NAME = 'gpt2-medium'

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


In [3]:
prompt = "The old bullfighter fell and "
# Encode the prompt as a PyTorch tensor of token ids, without adding special tokens
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

In [52]:
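set_seed(41)  # fix the random seed so the sampled sequences are reproducible
# Character length of the decoded prompt (not used below; handy for slicing the prompt off the output)
prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
# Sample 10 continuations; max_length counts the prompt tokens plus the newly generated tokens.
# GPT-2 has no pad token, so the EOS token id is used for padding. Optionally add top_p=0.95.
outputs = model.generate(inputs, max_length=30, do_sample=True, top_k=1000, temperature=1, num_return_sequences=10,
                         pad_token_id=tokenizer.eos_token_id)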
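
To make the `top_k` and `temperature` arguments above more concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-k sampling over a small list of made-up logits. It is an illustration of the idea only, not the code that `transformers` runs internally; the function name `sample_top_k` and the toy logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k, temperature=1.0, rng=random):
    """Sample one index from `logits` after temperature scaling and top-k filtering.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature < 1 sharpens the distribution, > 1 flattens it, 1 leaves it unchanged.
    scaled = [x / temperature for x in logits]

    # Keep only the indices of the top_k largest scaled logits.
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax weights over the kept logits (subtract the max for numerical stability).
    max_logit = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - max_logit) for i in keep]

    # Draw one index in proportion to its weight.
    return rng.choices(keep, weights=weights, k=1)[0]


# Toy example: five made-up "token" logits, sampling only from the two most likely.
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(41)
samples = [sample_top_k(toy_logits, top_k=2, temperature=1.0, rng=rng) for _ in range(200)]
print(sorted(set(samples)))
```

With `top_k=2` only indices 0 and 1 can ever be drawn, and with `top_k=1` the procedure reduces to greedy selection of the largest logit. In the cell above, `top_k=1000` is a fairly loose filter relative to GPT-2's vocabulary of 50,257 tokens, and `temperature=1` leaves the model's distribution unscaled.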


In [53]:
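for i in range(len(outputs)):
    generated = tokenizer.decode(outputs[i])
    len_gen = len(outputs[i])  # number of tokens in the sequence, prompt included
    generated = generated.replace('\n', ' ')  # remove newline characters from the generated text
    # Note: the value printed below is a whitespace-separated word count, not a token count
    print(f"ret_seq{i}: {generated} \n    (len(generated) = {len(generated.split())}) \n")
    # To report the token count instead:
    #print(f"ret_seq{i}: {generated} \n    (len(generated) = {len_gen}) \n")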

ret_seq0: The old bullfighter fell and fractured his skull. There was a massive hole in the body of Khall and that week, the "overrun" 
    (len(generated) = 23) 

ret_seq1: The old bullfighter fell and hit the ground, covered in blood, head to feet, surrounded by unconscious rivals: Lord Purnell, renowned for 
    (len(generated) = 22) 

ret_seq2: The old bullfighter fell and the walls of his mask now looked strangely familiar. It might become his weapon again.  That looks like it could 
    (len(generated) = 25) 

ret_seq3: The old bullfighter fell and broke his liver, ate several bruised bones and sustained major injuries to his spine. Maywood reported the incident to authorities but 
    (len(generated) = 26) 

ret_seq4: The old bullfighter fell and wounded another opponent before the fight ended just as the scene was catching fire and smoke billowed in the background. Another bull 
    (len(generated) = 27) 

ret_seq5: The old bullfighter fell and his family got out. Our older dau
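
Every returned sequence above starts with the prompt itself. When comparing against a fine-tuned model later, it may be easier to look at the continuation alone. The sketch below shows one simple way to slice the prompt off a decoded string; the helper name `continuation_only` is hypothetical, and the example string is a shortened copy of `ret_seq0`.

```python
def continuation_only(generated: str, prompt: str) -> str:
    """Return the text that follows `prompt` in `generated`.

    Hypothetical helper for illustration. If the decoded text does not start
    with the prompt (for example because decoding changed the whitespace),
    the text is returned unchanged rather than sliced blindly.
    """
    if not generated.startswith(prompt):
        return generated
    return generated[len(prompt):]


prompt = "The old bullfighter fell and "
generated = "The old bullfighter fell and fractured his skull."
print(continuation_only(generated, prompt))
```

Checking `startswith` before slicing is a deliberate choice: slicing by a fixed character count would silently cut the wrong span if the decoded prompt ever differed from the original string.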

In [24]:
outputs[0].shape

torch.Size([50])

In [None]:
# TODO: upload my own model to the Hugging Face Hub
# https://huggingface.co/transformers/model_sharing.html
# https://huggingface.co/transformers/model_sharing.html#how-to-share-your-model-with-the-community
