## Generating Simpsons Episodes with GPT

The model used here was trained in [this notebook](link).

- This was a good opportunity to experiment with `repetition_penalty` and `temperature`, both of which have a significant impact on the generated text.

- This project was a great demonstration of the power of GPT to imitate language style. Even though the Simpsons episodes that this model generates don't necessarily make sense, they do seem to capture an understanding of the Simpsons universe.
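To make the effect of those two settings concrete, here is a minimal sketch in plain Python of how they can reshape next-token scores. It is an illustration under assumptions, not the code path `transformers` actually runs: the toy logits are made up, `softmax_with_temperature` and `apply_repetition_penalty` are hypothetical helper names, and the penalty rule (divide positive scores, multiply negative ones) follows the commonly described CTRL-style formulation.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Lower the score of every token id that has already been generated."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both push it down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary

neutral = softmax_with_temperature(toy_logits, temperature=1.0)
sharper = softmax_with_temperature(toy_logits, temperature=0.75)
penalized = apply_repetition_penalty(toy_logits, seen_token_ids=[0, 3], penalty=1.015)

print('top-token probability at T=1.0 :', round(neutral[0], 3))
print('top-token probability at T=0.75:', round(sharper[0], 3))
print('scores after penalty           :', penalized)
```

A temperature below 1 concentrates probability on the highest-scoring tokens, while a penalty only slightly above 1 nudges already-used tokens down without forbidding them.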

Check out some episodes that were generated [here]() or try making your own.

Titles are used as prompts on the assumption that the generated text would relate to the title. Unfortunately, this is not the case. Then again, the model could be trained for more epochs and would certainly benefit from significantly more training data.

In [10]:
#!pip install transformers

In [2]:
# Only the classes needed for inference are imported here.
from transformers import GPT2LMHeadModel, GPT2Tokenizer, pipeline

In [3]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

In [4]:
# GPT-2 has no padding token by default, so register one if it is missing.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

Using pad_token, but it is not set yet.


In [5]:
loaded_model = GPT2LMHeadModel.from_pretrained('caffsean/gpt2-simpsons')

In [6]:
finetuned_generator = pipeline(
    'text-generation',
    model=loaded_model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_length=400,
    do_sample=True,
    temperature=0.75,
    top_k=10,
    top_p=0.9,
    repetition_penalty=1.015,
)
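
The pipeline above sets both `top_k=10` and `top_p=0.9`. As a rough illustration of how the two filters can interact, the sketch below applies them to a made-up probability distribution: keep the `top_k` most likely tokens, then keep the smallest prefix of those whose renormalized probability mass reaches `top_p`. The function name `top_k_top_p_filter` and the toy numbers are assumptions for this example, and the real implementation in `transformers` works on logits and may differ in detail.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return {token_id: renormalized probability} for tokens that survive both filters."""
    # Rank token ids from most to least probable and keep the first top_k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:top_k]
    kept_mass = sum(probs[i] for i in kept)

    # Nucleus step: add tokens until their renormalized mass reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    nucleus_mass = sum(probs[i] for i in nucleus)
    return {i: probs[i] / nucleus_mass for i in nucleus}

toy_probs = [0.5, 0.3, 0.1, 0.05, 0.05]  # made-up next-token probabilities
survivors = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.85)
print(survivors)
```

With a small `top_k`, the top-k cut often decides the candidate set on its own, and `top_p` only trims further when a few tokens dominate the distribution.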

In [7]:
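from tqdm import tqdm

def generate_episode(title, loops, pool=5, lookback=-300):
    """Generate an episode by repeatedly extending a title-based prompt.

    Each loop feeds the tail of the text so far back to the generator,
    samples `pool` continuations, and appends the longest one.
    """
    print(f'Writing episode: {title}')
    episode_text = f'EPISODE TITLE: {title}\n\nEPISODE SUMMARY:'

    for loop in tqdm(range(loops)):
        print(f'Loop {loop}/{loops}')
        # lookback is negative, so this keeps only the last 300 characters as context.
        prompt = episode_text[lookback:]
        options = finetuned_generator(prompt, num_return_sequences=pool)
        # Keep the longest candidate; ties go to the first one generated.
        best = max(options, key=lambda option: len(option['generated_text']))
        episode_text += best['generated_text']

    # Cut off any unfinished sentence after the last period, then add the ending.
    complete_text = '.'.join(episode_text.split('.')[:-1]) + '.'
    return complete_text + '\n\nTHE END \n\n [Roll End Credits]'

def create_episodes_from_title_list(title_list, episode_length):
    """Generate one episode per title and save each to its own text file."""
    for title in tqdm(title_list):
        episode_text = generate_episode(title, episode_length)
        # For example, 'Homer the Chef' is saved as 'homer_the_chef.txt'.
        title_path = title.replace(' ', '_').lower()
        with open(f'{title_path}.txt', 'w') as f:
            f.write(episode_text)
        print(f'Episode {title} - Saved Successfully!')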
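
The loop in `generate_episode` can be tried without downloading the model. The sketch below swaps the real pipeline for a hypothetical `fake_generator` that returns canned text, so it only demonstrates the control flow: slice the last `lookback` characters as context, pick the longest of `pool` candidates, append it, and finally trim back to the last full stop. `extend_story` and `trim_to_last_sentence` are illustrative names, not part of the notebook.

```python
def fake_generator(prompt, num_return_sequences):
    """Hypothetical stand-in for the pipeline: candidate i repeats a sentence i + 1 times."""
    sentence = ' Homer finds a donut.'
    return [{'generated_text': sentence * (i + 1)} for i in range(num_return_sequences)]

def extend_story(seed, loops, pool=3, lookback=-300, generator=fake_generator):
    """Grow `seed` by appending the longest candidate on each loop."""
    story = seed
    for _ in range(loops):
        context = story[lookback:]  # negative index: at most the last 300 characters
        candidates = generator(context, num_return_sequences=pool)
        best = max(candidates, key=lambda c: len(c['generated_text']))
        story += best['generated_text']
    return story

def trim_to_last_sentence(text):
    """Drop whatever follows the final period (text with no period becomes '.')."""
    return '.'.join(text.split('.')[:-1]) + '.'

# Add a dangling fragment to show that the trim step removes it.
story = extend_story('EPISODE SUMMARY:', loops=2) + ' Then Bart'
print(trim_to_last_sentence(story))
```

Choosing the longest candidate favors continuations that did not stop early, and the character-based lookback means the model never sees the title again once the text grows past 300 characters, which may be one reason the output drifts away from it.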


In [9]:
title_list = ['Marge is a Robot',
              'Maggie is a Ghost',
              'Seinfeld goes to Springfield',
              "Krusty's Katastrophe",
              'The Simpsons go to New Zealand',
              "Lisa the Republican",
              "Bart Becomes Santa Clause",
              "Homer the Chef",
              "Marge Turns into a Dolphin",
              "Tales of Ancient Springfield"]


create_episodes_from_title_list(title_list, 6)

  0%|          | 0/10 [00:00<?, ?it/s]
Writing episode: Marge is a Robot
  0%|          | 0/6 [00:00<?, ?it/s]
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Loop 0/6
 17%|█▋        | 1/6 [00:56<04:43, 56.62s/it]
Loop 1/6
 33%|███▎      | 2/6 [01:41<03:17, 49.49s/it]
Loop 2/6
 50%|█████     | 3/6 [02:27<02:24, 48.02s/it]
Loop 3/6
 67%|██████▋   | 4/6 [03:13<01:34, 47.37s/it]
Loop 4/6
 83%|████████▎ | 5/6 [04:00<00:47, 47.08s/it]
Loop 5/6
100%|██████████| 6/6 [04:46<00:00, 47.76s/it]
 10%|█         | 1/10 [04:46<42:59, 286.59s/it]
Episode Marge is a Robot - Saved Successfully!
Writing episode: Maggie is a Ghost
100%|██████████| 6/6 [04:45<00:00, 47.65s/it]
 20%|██        | 2/10 [09:32<38:09, 286.20s/it]
Episode Maggie is a Ghost - Saved Successfully!
Writing episode: Seinfeld goes to Springfield
100%|██████████| 6/6 [04:48<00:00, 48.07s/it]
 30%|███       | 3/10 [14:20<33:30, 287.21s/it]
Episode Seinfeld goes to Springfield - Saved Successfully!
Writing episode: Krusty's Katastrophe
100%|██████████| 6/6 [04:38<00:00, 46.36s/it]
 40%|████      | 4/10 [18:59<28:21, 283.64s/it]
Episode Krusty's Katastrophe - Saved Successfully!
Writing episode: The Simpsons go to New Zealand
100%|██████████| 6/6 [04:40<00:00, 46.81s/it]
 50%|█████     | 5/10 [23:39<23:33, 282.64s/it]
Episode The Simpsons go to New Zealand - Saved Successfully!
Writing episode: Lisa the Republican
100%|██████████| 6/6 [04:43<00:00, 47.31s/it]
 60%|██████    | 6/10 [28:23<18:52, 283.06s/it]
Episode Lisa the Republican - Saved Successfully!
Writing episode: Bart Becomes Santa Clause
100%|██████████| 6/6 [04:37<00:00, 46.24s/it]
 70%|███████   | 7/10 [33:01<14:03, 281.24s/it]
Episode Bart Becomes Santa Clause - Saved Successfully!
Writing episode: Homer the Chef
100%|██████████| 6/6 [04:38<00:00, 46.45s/it]
 80%|████████  | 8/10 [37:40<09:20, 280.43s/it]
Episode Homer the Chef - Saved Successfully!
Writing episode: Marge Turns into a Dolphin
100%|██████████| 6/6 [04:30<00:00, 45.05s/it]
 90%|█████████ | 9/10 [42:10<04:37, 277.27s/it]
Episode Marge Turns into a Dolphin - Saved Successfully!
Writing episode: Tales of Ancient Springfield
100%|██████████| 6/6 [04:39<00:00, 46.61s/it]
100%|██████████| 10/10 [46:50<00:00, 281.00s/it]
Episode Tales of Ancient Springfield - Saved Successfully!



