# How to Build the AI that's "Too Dangerous to Release"



Welcome to the companion notebook tutorial for the [OpenAI GPT-2 article on FloydHub](https://blog.floydhub.com/[ARTICLE NAME]). Today, you'll create your very own copy of the state-of-the-art language model that drove the internet mad: OpenAI's generative pretrained transformer, GPT-2.

The code we will use is heavily based on huggingface's [`pytorch-pretrained-bert` GitHub repo](https://github.com/huggingface/pytorch-pretrained-BERT). Be sure to check it out: they have top-quality implementations of many recent NLP models, as well as fantastic documentation.

One more thing. If you're not running this notebook on FloydHub, [RUN IT ON FLOYDHUB!](https://)

The first step is to download and install the packages that will help us run the code (this includes the package that provides the pretrained GPT-2 model).

In [1]:
! pip install pytorch-pretrained-bert
! pip install spacy ftfy==4.4.3
! python -m spacy download en

Collecting pytorch-pretrained-bert
  Downloading pytorch_pretrained_bert-0.6.2-py3-none-any.whl.metadata (86 kB)
Collecting boto3 (from pytorch-pretrained-bert)
  Downloading boto3-1.35.56-py3-none-any.whl.metadata (6.7 kB)
Collecting botocore<1.36.0,>=1.35.56 (from boto3->pytorch-pretrained-bert)
  Downloading botocore-1.35.56-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3->pytorch-pretrained-bert)
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.11.0,>=0.10.0 (from boto3->pytorch-pretrained-bert)
  Downloading s3transfer-0.10.3-py3-none-any.whl.metadata (1.7 kB)
Downloading pytorch_pretrained_bert-0.6.2-py3-none-any.whl (123 kB)

Next, we'll import the required packages into Python. Here we're being a little specific about which sub-modules we use, but feel free to run something like `from pytorch_pretrained_bert import *` if you're just experimenting.

In [2]:
import logging

import numpy as np
import torch
import torch.nn.functional as F
from tqdm import trange

from pytorch_pretrained_bert import GPT2Tokenizer, GPT2Model, GPT2LMHeadModel

We'll also activate the logger, so that we can see a little more of what's going on apart from the designated outputs. This step is optional.

In [3]:
# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
import logging
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.INFO)
logger = logging.getLogger(__name__)

Models like GPT-2 take tokens (integer IDs for pieces of text) as inputs, not raw strings. Thankfully we don't need to write a tokenizer from scratch, since the good men and women at huggingface already did that for us! All we have to do is initialize it.

In [4]:
# Load pre-trained model tokenizer (vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

100%|██████████| 1042301/1042301 [00:00<00:00, 3433564.21B/s]
100%|██████████| 456318/456318 [00:00<00:00, 1875053.43B/s]
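
To give a feel for what a tokenizer does, here is a minimal sketch of byte-pair-encoding-style merging, the general scheme that GPT-2's tokenizer builds on. It is a simplified illustration, not the library's implementation: the merge rules and vocabulary (`TOY_MERGES`, `TOY_VOCAB`) are made up for this example, and the real tokenizer works on bytes and has a vocabulary of 50257 entries.

```python
# Toy illustration only: TOY_MERGES and TOY_VOCAB are invented for this example.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]  # merge rules, highest priority first
TOY_VOCAB = {"low": 0, "er": 1, "l": 2, "o": 3, "w": 4,
             "e": 5, "r": 6, "s": 7, "t": 8, "lo": 9}
TOY_IDS = {i: tok for tok, i in TOY_VOCAB.items()}


def toy_bpe_split(word, merges=TOY_MERGES):
    """Split a word into characters, then apply each merge rule in order."""
    symbols = list(word)
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            # Fuse an adjacent (left, right) pair into a single symbol.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def toy_encode(word):
    """Map a word to a list of integer token IDs."""
    return [TOY_VOCAB[s] for s in toy_bpe_split(word)]


def toy_decode(ids):
    """Map token IDs back to text."""
    return "".join(TOY_IDS[i] for i in ids)


ids = toy_encode("lower")
print(toy_bpe_split("lower"), ids, toy_decode(ids))  # ['low', 'er'] [0, 1] lower
```

The `tokenizer.encode` and `tokenizer.decode` calls used later in this notebook play the same two roles as `toy_encode` and `toy_decode`, just with the real vocabulary.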


Of course, that's just the tokenizer. The actual model is a whole other animal. Since this tutorial is about GPT-2, we're going to be using a pretrained model (also part of the huggingface package that we installed).

In [23]:
model = GPT2LMHeadModel.from_pretrained('gpt2')



In [18]:
model.lm_head.decoder.weight.sum()

tensor(14659.9668, grad_fn=<SumBackward0>)

In [21]:
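# Caution: in this library the output layer shares its weights with the token
# embeddings, so this call re-initializes both and wrecks the pretrained model.
# If you run it, reload the model with from_pretrained before generating text.
model.lm_head.decoder.reset_parameters()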

In [22]:
model.lm_head.decoder.weight.sum()

tensor(103.2848, grad_fn=<SumBackward0>)

We're **not** training GPT-2. That would take ages. Since we're running inference on a pretrained model instead, we call `.eval()` to put the model in evaluation mode, which turns off training-only behavior such as dropout in layers that have it.

Also, we'll be executing the tensor operations on FloydHub's GPUs, so we'll call `.cuda()` as well.

In [None]:
model.eval()
model.cuda()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (h): ModuleList(
      (0): Block(
        (ln_1): BertLayerNorm()
        (attn): Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
        )
        (ln_2): BertLayerNorm()
        (mlp): MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
        )
      )
      (1): Block(
        (ln_1): BertLayerNorm()
        (attn): Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
        )
        (ln_2): BertLayerNorm()
        (mlp): MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
        )
      )
      (2): Block(
        (ln_1): BertLayerNorm()
        (attn): Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
        )
        (ln_2): BertLayerNorm()
        (mlp): MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
        )
      )
      (3): Block(
        (ln_1): BertLayerNorm()
      

This next block is a helper function that helps make word predictions. Feel free to skip it if you're not comfortable with the math.

In [None]:
def top_k_logits(logits, k):
    """
    Masks everything but the top k entries as -infinity (approximated by -1e10).
    After the softmax, each masked logit contributes e^-infinity -> 0 to the
    sum in the denominator, so it ends up with zero probability.
    """
    if k == 0:
        return logits
    else:
        values = torch.topk(logits, k)[0]
        batch_mins = values[:, -1].view(-1, 1).expand_as(logits)
        return torch.where(logits < batch_mins, torch.ones_like(logits) * -1e10, logits)
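
To see what the masking does to the final probabilities, here is a small worked sketch in plain Python (lists instead of tensors, so it runs without PyTorch). The names `top_k_mask` and `softmax` and the example scores are illustrative stand-ins, not part of the notebook's code.

```python
import math

NEG_INF = -1e10  # same large negative stand-in for -infinity that top_k_logits uses


def top_k_mask(logits, k):
    """Keep the k largest logits and replace the rest with NEG_INF (k == 0: no filtering)."""
    if k == 0:
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]  # the k-th largest value
    return [x if x >= threshold else NEG_INF for x in logits]


def softmax(logits):
    """Numerically stable softmax over a plain list."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
probs = softmax(top_k_mask(logits, k=2))
print([round(p, 3) for p in probs])  # [0.731, 0.269, 0.0, 0.0]
```

With `k=2`, only the two highest-scoring tokens can ever be sampled; the remaining probability mass is redistributed between them.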

Now, this code block is probably the most important in the notebook. We define a sampling function that takes in tokens and spits out predicted tokens.

Essentially, we're calling the forward pass of GPT-2 (stored in `model`), with some extra sampling logic (temperature scaling, top-k filtering, and random sampling) that turns the model's raw scores into predicted words.

In [None]:
def sample_sequence(model, length, start_token=None, batch_size=1, context=None, temperature=1, top_k=0, device='cuda', sample=True):
    # Build the initial input from either a tokenized prompt or a single start token.
    if start_token is None:
        assert context is not None, 'Specify exactly one of start_token and context!'
        context = torch.tensor(context, device=device, dtype=torch.long).unsqueeze(0).repeat(batch_size, 1)
    else:
        assert context is None, 'Specify exactly one of start_token and context!'
        context = torch.full((batch_size, 1), start_token, device=device, dtype=torch.long)
    prev = context
    output = context
    past = None  # cached state from earlier steps, so each step only feeds in the newest token
    with torch.no_grad():
        for _ in trange(length):
            logits, past = model(prev, past=past)
            # Keep only the scores for the last position, scaled by the temperature.
            logits = logits[:, -1, :] / temperature
            logits = top_k_logits(logits, k=top_k)
            # softmax returns probabilities (not log-probabilities).
            probs = F.softmax(logits, dim=-1)
            if sample:
                # Draw the next token at random, weighted by its probability.
                prev = torch.multinomial(probs, num_samples=1)
            else:
                # Greedy decoding: always pick the most likely token.
                _, prev = torch.topk(probs, k=1, dim=-1)
            output = torch.cat((output, prev), dim=1)
    return output
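
The loop in `sample_sequence` can be hard to picture, so here is a toy sketch of the same generate-one-token-and-feed-it-back idea in plain Python. Everything in it is an assumption made for illustration: `TOY_WORDS` and `TOY_LOGITS` are a made-up 4-word vocabulary and score table standing in for GPT-2, and the toy only looks at the previous token, whereas the real model conditions on the whole context.

```python
import math
import random

# Hypothetical vocabulary and "next-token" scores, standing in for GPT-2.
TOY_WORDS = ["the", "cat", "sat", "down"]
TOY_LOGITS = {
    0: [0.1, 3.0, 0.5, 0.2],  # after "the", "cat" scores highest
    1: [0.2, 0.1, 3.0, 0.5],  # after "cat", "sat" scores highest
    2: [0.5, 0.2, 0.1, 3.0],  # after "sat", "down" scores highest
    3: [3.0, 0.5, 0.2, 0.1],  # after "down", "the" scores highest
}


def softmax(logits):
    """Numerically stable softmax over a plain list."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_sample_sequence(start, length, temperature=1.0, sample=True, rng=None):
    """Generate `length` tokens after `start`, feeding each prediction back in."""
    rng = rng or random.Random(0)
    output = [start]
    prev = start
    for _ in range(length):
        logits = [x / temperature for x in TOY_LOGITS[prev]]
        probs = softmax(logits)
        if sample:
            # Random draw, weighted by probability.
            prev = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        else:
            # Greedy: index of the largest probability.
            prev = max(range(len(probs)), key=probs.__getitem__)
        output.append(prev)
    return output


greedy = toy_sample_sequence(start=0, length=3, sample=False)
print(" ".join(TOY_WORDS[i] for i in greedy))  # the cat sat down

# Temperature below 1 sharpens the distribution; above 1 flattens it.
sharp = max(softmax([x / 0.5 for x in TOY_LOGITS[0]]))
flat = max(softmax([x / 2.0 for x in TOY_LOGITS[0]]))
print(sharp > flat)  # True
```

With `sample=False` the output is always the same; with `sample=True` a higher temperature makes unlikely words show up more often, which is why sampled GPT-2 text varies from run to run.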

And now that we're done with the heavy lifting, the fun begins!

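Here are a few numbers that you might like tweaking: `sequence_length` is the number of tokens to generate for each sample, and `number_of_samples` is the number of generation rounds to run.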

In [None]:
sequence_length = 100
number_of_samples = 3

We're done! Run this code block to generate some funky AI text.

Note: you'll have to prompt the model with text, and it will continue writing.

Also note: This model won't perform nearly as well as the one on [the OpenAI blog post](https://openai.com/blog/better-language-models/). In case you didn't know, [they didn't release that model](https://thegradient.pub/openai-please-open-source-your-language-model/).

In [None]:
batch_size = 512  # samples generated in parallel per round; lower this if you run out of GPU memory

raw_text = input("Model prompt >>> ")
while not raw_text:
    print('Prompt should not be empty!')
    raw_text = input("Model prompt >>> ")

context_tokens = tokenizer.encode(raw_text)
generated = 0

# Each of the number_of_samples rounds produces batch_size samples.
for _ in range(number_of_samples):
    out = sample_sequence(
        model=model, length=sequence_length,
        context=context_tokens, batch_size=batch_size
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    out = out[:, len(context_tokens):].tolist()
    for sample_tokens in out:
        generated += 1
        text = tokenizer.decode(sample_tokens)
        print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
        print(text)
    print("\n\n" + "=" * 80)

Model prompt >>> Who is Kyle Rayner? Kyle Rayner is


100%|██████████| 100/100 [00:11<00:00,  6.31it/s]
  0%|          | 0/100 [00:00<?, ?it/s]

 a forty-six year-old Evangelical man living in Arizona. His wife Adriane has three children and spent the past three years (and a half) involved in marriage ceremonies and marriages. As a young man mom, Kyle makes his living blogging about how the Church teaches you, evangelistically. In the article, Kyle complains about being upset with his wife all day long, with giving away all her college scholarships, eating by herself, having sex outside – all while being dominated by a male authority
 a victim of a strange outbreak of part-time illness that has cast the Ashoka Muslim into a permanent discomfiting exile and pursued a different sort of regular life. Both Jason and Sho are all is once again lost to a single narrative in which Kyle is seemingly in immediate danger above all else. The guns from the academy and magical search fund suggest somebody online may have already evacuated the academy. There are new evidence that a confrontation may have ensnared Kyle as a physical threat to 

100%|██████████| 100/100 [00:11<00:00,  6.20it/s]
  0%|          | 0/100 [00:00<?, ?it/s]

 a Mayo Clinic product that bragged about being a kettle snake in 2007. He is a guesswork, but he says he is:

[14]

[15]

[16]

[17]

[18]

[19] In a competition where Eddie Alvarez defeated Tim Kennedy, making that record sixth to the ever Batman-ranked Faber we'd hate to see him go in a loss … not sure how he gets there, maybe his line
 the domain owner, owner, security member and sone. Most people assume that there are 2 inches, 1 inch, and 1 foot of hair in Kyle Rayner's 10 foot wide hand. But, if you look closely at the picture listed below, actions like flying, assembling, teaching John Edmonds and other managers, being an SIS member, being a summer camps volunteer, abusing rashers, and fondling, are not related to Kyle Rayner's behavior at his DC Camp and
 a San Francisco high school football player diagnosed at the age of 18 with an undisclosed medical condition, 3 additions to the list compiled into 54 scholarships over the past six seasons.

In Week 13, it didn't play well, 

100%|██████████| 100/100 [00:11<00:00,  6.15it/s]


 a well-respected NFL quarterback in prep football, a strong offensive line, and a troubled future in the NFL. In the game of football we've seen him help fellow youngsters all over the nation get the help they need. You don't have to be a physical prospect to get his name on the team. In a four-year game, no one is going to look away." -- Eric Pinkston, Ret. Day Contributor


This Puck Daddy Demo Nobody read that too hard
 Cody's buddy in recent Days of Heaven games. He helped create such a devastating Demon Zone artifact in Dead Zone (US have never helped out with the artifact project)

big N 52nd on earth game: rocks a volleyball, level, and various other cool stuff, it you won, baby

NBA 2K19 game: QUAKED ut 5M MM2K18s with "GS drag striker" retarded works

PHOTOS.: Bottom massage scene, proven art creation for Utah
 known for being a staunch sports fan but he knows this would do a great deal to gain the trust of some of the smaller NASCAR teams. After Bryan and Brian make it out o