# Requirements

In [None]:
! pip install torch==1.6.0

## Train a character-level GPT on some text data

The inputs here are simple text files, which we chop up into individual characters and then train GPT on. So you could say this is a char-transformer instead of a char-rnn. Doesn't quite roll off the tongue as well. In this example we will feed it some Shakespeare, which we'll get it to predict at the character level.
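
As a minimal sketch of what "chopping up into individual characters" means, the snippet below builds a character vocabulary from a short string and round-trips it through integer ids. The toy string and the `encode`/`decode` helper names are illustrative assumptions, not part of minGPT.

```python
# Illustrative only: a toy string stands in for the Shakespeare file.
text = "to be or not to be"

# Every distinct character becomes one vocabulary entry.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for i, ch in enumerate(chars)}  # integer id -> character

def encode(s):
    """Map a string to a list of integer ids (hypothetical helper)."""
    return [stoi[ch] for ch in s]

def decode(ids):
    """Map a list of integer ids back to a string (hypothetical helper)."""
    return "".join(itos[i] for i in ids)

ids = encode("to be")
print(len(chars), ids, decode(ids))
```

The model only ever sees the integer ids; the two dictionaries are what turn its predictions back into text.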

In [1]:
# set up logging
import logging
logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
)
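# minGPT_coef divides the context length, layer count, head count and embedding width used below
minGPT_coef = 8
minGPT_coef = min(max(minGPT_coef, 1), 8)  # clamp to the range [1, 8]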

In [2]:
# make deterministic
from mingpt.utils import set_seed
set_seed(42)

In [3]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn import functional as F

In [4]:
import math
from torch.utils.data import Dataset

class CharDataset(Dataset):

    def __init__(self, data, block_size):
        chars = sorted(set(data))  # sorted so the char-to-index mapping is identical on every run
        data_size, vocab_size = len(data), len(chars)
        print('data has %d characters, %d unique.' % (data_size, vocab_size))
        
        self.stoi = { ch:i for i,ch in enumerate(chars) }
        self.itos = { i:ch for i,ch in enumerate(chars) }
        self.block_size = block_size
        self.vocab_size = vocab_size
        self.data = data
    
    def __len__(self):
        return math.ceil(len(self.data) / (self.block_size + 1))

    def __getitem__(self, idx):
        # idx is ignored: we "cheat" and pick a random start position in the dataset instead
        i = np.random.randint(0, len(self.data) - self.block_size)  # upper bound is exclusive
        chunk = self.data[i:i+self.block_size+1]
        dix = [self.stoi[s] for s in chunk]
        x = torch.tensor(dix[:-1], dtype=torch.long)
        y = torch.tensor(dix[1:], dtype=torch.long)
        return x, y
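
To make the relationship between `x` and `y` concrete, here is a small sketch in plain Python (no torch) of how one chunk of `block_size + 1` ids splits into inputs and targets. The ids are made up for illustration.

```python
block_size = 4
chunk = [10, 11, 12, 13, 14]  # block_size + 1 made-up character ids

x = chunk[:-1]  # inputs: everything except the last id
y = chunk[1:]   # targets: the same sequence shifted left by one

# At position t the model sees x[:t + 1] and is trained to predict y[t].
for t in range(block_size):
    print(f"context {x[:t + 1]} -> target {y[t]}")
```

Each chunk therefore yields `block_size` training examples at once, one per position.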


In [5]:
block_size = 128//minGPT_coef # spatial extent of the model for its context

In [6]:
# you can download this file at https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
text = open('input.txt', 'r').read() # don't worry we won't run out of file handles
train_dataset = CharDataset(text, block_size) # one line of poem is roughly 50 characters

data has 1115394 characters, 65 unique.


In [7]:
from mingpt.model import GPT, GPTConfig
mconf = GPTConfig(train_dataset.vocab_size, train_dataset.block_size,
                  n_layer=8//minGPT_coef, n_head=8//minGPT_coef, n_embd=512//minGPT_coef)
model = GPT(
    vocab_size=mconf.vocab_size, 
    block_size=block_size, 
    n_embd=mconf.n_embd, 
    n_layer=mconf.n_layer,
    n_head=mconf.n_head
)

08/20/2020 23:51:29 - INFO - mingpt.model -   number of parameters: 5.945600e+04
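
The parameter count in the log can be checked by hand. The sketch below assumes the layer layout of the original minGPT (token and learned positional embeddings, blocks with two LayerNorms, four `n_embd x n_embd` attention projections and a 4x-wide MLP, a final LayerNorm, and a bias-free output head); `count_gpt_params` is a hypothetical helper, not a minGPT function. Under those assumptions the total comes out at 59456, the value logged above.

```python
def linear(n_in, n_out):
    """Parameters of a dense layer with bias."""
    return n_in * n_out + n_out

def count_gpt_params(vocab_size, block_size, n_embd, n_layer):
    """Hypothetical helper: parameter count for an assumed minGPT-style layout."""
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd
    layer_norm = 2 * n_embd                       # weight + bias
    attn = 4 * linear(n_embd, n_embd)             # key, query, value, output projection
    mlp = linear(n_embd, 4 * n_embd) + linear(4 * n_embd, n_embd)
    block = 2 * layer_norm + attn + mlp
    head = n_embd * vocab_size                    # assumed to have no bias
    return tok_emb + pos_emb + n_layer * block + layer_norm + head

minGPT_coef = 8
total = count_gpt_params(
    vocab_size=65,
    block_size=128 // minGPT_coef,
    n_embd=512 // minGPT_coef,
    n_layer=8 // minGPT_coef,
)
print(total)
```

The number of heads does not appear in the formula: heads split the same projection matrices rather than adding new ones.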


In [8]:
from mingpt.trainer import Trainer, TrainerConfig

# initialize a trainer instance and kick off training
tconf = TrainerConfig(max_epochs=10, batch_size=512, learning_rate=6e-4,
                      lr_decay=True, warmup_tokens=512*20, final_tokens=200*len(train_dataset)*block_size,
                      num_workers=4)
trainer = Trainer(model, train_dataset, None, tconf)
trainer.train()

epoch 1 iter 128: train loss 2.60572. lr 5.999637e-04: 100%|██████████| 129/129 [00:16<00:00,  7.71it/s]
epoch 2 iter 128: train loss 2.41788. lr 5.998534e-04: 100%|██████████| 129/129 [00:18<00:00,  7.16it/s]
epoch 3 iter 128: train loss 2.34223. lr 5.996691e-04: 100%|██████████| 129/129 [00:18<00:00,  6.98it/s]
epoch 4 iter 128: train loss 2.27679. lr 5.994108e-04: 100%|██████████| 129/129 [00:16<00:00,  7.82it/s]
epoch 5 iter 128: train loss 2.24310. lr 5.990787e-04: 100%|██████████| 129/129 [00:16<00:00,  7.83it/s]
epoch 6 iter 128: train loss 2.24525. lr 5.986728e-04: 100%|██████████| 129/129 [00:16<00:00,  7.97it/s]
epoch 7 iter 128: train loss 2.21830. lr 5.981932e-04: 100%|██████████| 129/129 [00:18<00:00,  7.00it/s]
epoch 8 iter 128: train loss 2.17025. lr 5.976399e-04: 100%|██████████| 129/129 [00:17<00:00,  7.55it/s]
epoch 9 iter 128: train loss 2.19934. lr 5.970133e-04: 100%|██████████| 129/129 [00:22<00:00,  5.81it/s]
epoch 10 iter 128: train loss 2.15271. lr 5.963133e-04:
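
The slowly shrinking `lr` in the log is consistent with a linear warmup followed by cosine decay over processed tokens, as in the original minGPT trainer. The sketch below is an assumption about what `Trainer` does with `warmup_tokens` and `final_tokens`, not a copy of its code; `lr_at` is a hypothetical helper. With the settings from this notebook it agrees with the logged learning rates for the first three epochs to the printed precision.

```python
import math

def lr_at(tokens, learning_rate, warmup_tokens, final_tokens):
    """Hypothetical helper: learning rate after `tokens` training tokens."""
    if tokens < warmup_tokens:
        # linear warmup from 0 to the base learning rate
        mult = tokens / max(1, warmup_tokens)
    else:
        # cosine decay, floored at 10% of the base learning rate
        progress = (tokens - warmup_tokens) / max(1, final_tokens - warmup_tokens)
        mult = max(0.1, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return learning_rate * mult

block_size = 16
dataset_len = math.ceil(1115394 / (block_size + 1))  # mirrors CharDataset.__len__
tokens_per_epoch = dataset_len * block_size
final_tokens = 200 * dataset_len * block_size

for epoch in (1, 2, 3):
    lr = lr_at(epoch * tokens_per_epoch, 6e-4, 512 * 20, final_tokens)
    print(epoch, f"{lr:.6e}")
```

Because `final_tokens` corresponds to 200 passes over the dataset while training stops after 10, the run only covers the very start of the cosine curve, which is why the learning rate barely moves.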

In [9]:
# alright, let's sample some character-level Shakespeare
from mingpt.utils import sample

context = "O God, O God!"
x = torch.tensor([train_dataset.stoi[s] for s in context], dtype=torch.long)[None,...].to(trainer.device)
y = sample(model, x, 2000, temperature=0.9, sample=True, top_k=5)[0]
completion = ''.join([train_dataset.itos[int(i)] for i in y])
print(completion)

O God, O God! Cant of a hing to tand that shy live to mosst my sitesel seet mare wind and ther shold,
Tith his the and and thou to the would the witentere trunce me stay, the we the sely my seare withe wor mestay.

Marce.

COMINTENTES:
As alt werstice heare, thin selvioughs say my to sir shat,
Whather ang my the hereace she herst a mor hat shoul sere to me and to me seay arthe and my les arient, wore thy, thost,
Whath sin a with say ther'd,
When to wan that that that to herem ther thilll thear him artince ancone and, me man,
Whathou desen may.

KING RESTO:
How'll and
A heard have with that me,
And ther and this the and he here have wer sel and she how stre tiones thing me me wom thy my the so din somon:
A then and the my that the have wom sher'st thee with won say thou me think that and say, murdentlen mentent man:
The monted, my sought, there me with say hou din and
The me the me,
That ath his to have thou that to day,
Wher thour hearther mone man:
Harth, man, shing alll to mary have 
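
The `temperature` and `top_k` arguments control how each next character is drawn. The sketch below shows the usual recipe in plain Python, assuming `sample` works this way: scale the logits by the temperature, discard everything outside the `top_k` largest, then draw from the softmax of what is left. `top_k_sample` is a hypothetical helper, not the minGPT function.

```python
import math
import random

def top_k_sample(logits, temperature=1.0, top_k=None, rng=random):
    """Hypothetical helper: draw one index from temperature-scaled, top-k filtered logits."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # value of the k-th largest logit; anything smaller is masked out
        # (ties at the threshold are all kept)
        kth = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= kth else float("-inf") for s in scaled]
    # numerically stable softmax weights; masked entries get weight 0
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [0.1, 2.0, -1.0, 1.5]  # made-up scores for a 4-character vocabulary
rng = random.Random(0)
print([top_k_sample(logits, temperature=0.9, top_k=2, rng=rng) for _ in range(10)])
```

A temperature below 1 sharpens the distribution toward the most likely characters, and a small `top_k` rules out unlikely characters entirely.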

In [None]:
# well that was fun