## Train a character-level GPT on some text data

The inputs here are plain text files, which we chop up into individual characters and then train a GPT on. So you could say this is a char-transformer instead of a char-rnn, though that doesn't quite roll off the tongue as well. In this example we will feed it some Shakespeare and train it to predict the text one character at a time.

In [1]:
# set up logging
import logging
logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
)

In [2]:
# make deterministic
from mingpt.utils import set_seed
set_seed(42)

In [3]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn import functional as F

In [4]:
torch.set_num_threads(4)

In [5]:
import math
from torch.utils.data import Dataset

class CharDataset(Dataset):

    def __init__(self, data, block_size):
        chars = sorted(set(data))  # sorted so the char <-> index mapping is the same on every run
        data_size, vocab_size = len(data), len(chars)
        print('data has %d characters, %d unique.' % (data_size, vocab_size))
        
        self.stoi = { ch:i for i,ch in enumerate(chars) }
        self.itos = { i:ch for i,ch in enumerate(chars) }
        self.block_size = block_size
        self.vocab_size = vocab_size
        self.data = data
    
    def __len__(self):
        return math.ceil(len(self.data) / (self.block_size + 1))

    def __getitem__(self, idx):
        # we're actually going to "cheat" and pick a spot in the dataset at random
        i = np.random.randint(0, len(self.data) - (self.block_size + 1))
        chunk = self.data[i:i+self.block_size+1]
        dix = [self.stoi[s] for s in chunk]
        x = torch.tensor(dix[:-1], dtype=torch.long)
        y = torch.tensor(dix[1:], dtype=torch.long)
        return x, y
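
To make the indexing in `__getitem__` concrete, here is a minimal pure-Python sketch on a toy string. The string, the `block_size` of 4, and the fixed start index are illustrative assumptions; the real dataset picks the start at random and returns `torch.long` tensors. Sorting the vocabulary is a choice made here so the mapping is reproducible.

```python
# Toy walk-through of the x/y construction used by CharDataset (illustrative values).
data = "hello world"
block_size = 4

chars = sorted(set(data))                      # sorted so the mapping is stable across runs
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for i, ch in enumerate(chars)}   # integer id -> character

i = 0                                          # fixed start; the dataset samples this at random
chunk = data[i:i + block_size + 1]             # block_size + 1 characters: "hello"
dix = [stoi[s] for s in chunk]
x, y = dix[:-1], dix[1:]                       # targets are the inputs shifted left by one

print("".join(itos[t] for t in x))  # hell
print("".join(itos[t] for t in y))  # ello
```

Each position in `y` is the character that follows the same position in `x`, so a single chunk provides `block_size` next-character prediction targets.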


In [6]:
block_size = 128 # spatial extent of the model for its context

In [7]:
# you can download this file at https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
text = open('input.txt', 'r').read() # don't worry we won't run out of file handles
train_dataset = CharDataset(text, block_size) # one line of poem is roughly 50 characters

data has 1115394 characters, 65 unique.
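
The reported size also fixes how many batches make up one epoch. Here is a quick check of the arithmetic, assuming the data loader keeps the final partial batch; the batch size of 128 is the one passed to the trainer in this notebook.

```python
import math

data_size = 1115394   # characters, from the printout above
block_size = 128
batch_size = 128      # value used for TrainerConfig in this notebook

# CharDataset.__len__: number of (block_size + 1)-character chunks, rounded up
samples_per_epoch = math.ceil(data_size / (block_size + 1))
# iterations per epoch, assuming the last partial batch is not dropped
iters_per_epoch = math.ceil(samples_per_epoch / batch_size)

print(samples_per_epoch, iters_per_epoch)  # 8647 68
```

The 68 iterations agree with the `68/68` shown in the training progress bars.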


In [8]:
from mingpt.model_delight import DelightConfig, DelightModel
mconf = DelightConfig(
    train_dataset.vocab_size, 
    train_dataset.block_size,
    n_layer=12, n_head=1, 
    n_embd=512
)
model = DelightModel(mconf)

08/22/2020 04:22:06 - INFO - mingpt.model_delight -   number of parameters: 1.974738e+07


In [9]:
print(model)

DelightModel(
  (tok_emb): Embedding(65, 512)
  (drop): Dropout(p=0.1, inplace=False)
  (blocks): Sequential(
    (0): DelightBlock(
      (dextra_layer): DExTraUnit(
        (input_layer): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (2): GELU()
          (3): Dropout(p=0.0, inplace=False)
        )
        (dextra_layers): ModuleList(
          (0): GroupLinear(
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (act): GELU()
          )
          (1): GroupLinear(
            (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (act): GELU()
          )
          (2): GroupLinear(
            (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
      

In [10]:
from mingpt.trainer import Trainer, TrainerConfig

# initialize a trainer instance and kick off training
tconf = TrainerConfig(max_epochs=200, batch_size=128, learning_rate=3e-4,
                      weight_decay=0, betas=(0.9, 0.98), grad_norm_clip=0.,
                      lr_decay=True, warmup_tokens=512*20, final_tokens=200*len(train_dataset)*block_size,
                      num_workers=4)
trainer = Trainer(model, train_dataset, None, tconf)
trainer.train()

epoch 1 iter 67: train loss 2.50245. lr 2.999818e-04: 100%|██████████| 68/68 [1:32:57<00:00, 82.02s/it]
epoch 2 iter 67: train loss 2.39625. lr 2.999267e-04: 100%|██████████| 68/68 [1:33:22<00:00, 82.38s/it]
epoch 3 iter 64: train loss 2.31426. lr 2.998386e-04:  96%|█████████▌| 65/68 [1:32:57<04:42, 94.10s/it]

KeyboardInterrupt: 
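
The slowly shrinking `lr` in the log is consistent with a linear warmup followed by cosine decay over `final_tokens`. The sketch below is an assumption about what `lr_decay=True` does, not a copy of the trainer: the helper name `lr_at` and the floor of 10% of the base rate are hypothetical, and 8647 is `len(train_dataset)` for this data, that is `ceil(1115394 / 129)`.

```python
import math

def lr_at(tokens, base_lr=3e-4, warmup_tokens=512 * 20,
          final_tokens=200 * 8647 * 128, floor=0.1):
    """Hypothetical helper: learning rate after `tokens` tokens have been processed."""
    if tokens < warmup_tokens:
        # linear warmup from 0 up to base_lr
        mult = tokens / max(1, warmup_tokens)
    else:
        # cosine decay, never below `floor` times base_lr (assumed floor)
        progress = (tokens - warmup_tokens) / max(1, final_tokens - warmup_tokens)
        mult = max(floor, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return base_lr * mult

tokens_per_epoch = 8647 * 128   # len(train_dataset) * block_size
print("%e" % lr_at(tokens_per_epoch))
```

Under these assumptions the value after one epoch comes out at about 2.9998e-04, in line with the learning rate logged at the end of epoch 1.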

In [11]:
# alright, let's sample some character-level Shakespeare
from mingpt.utils import sample

context = "O God, O God!"
x = torch.tensor([train_dataset.stoi[s] for s in context], dtype=torch.long)[None,...].to(trainer.device)
y = sample(model, x, 2000, temperature=0.9, sample=True, top_k=5)[0]
completion = ''.join([train_dataset.itos[int(i)] for i in y])
print(completion)

O God, O God! I my bar stond thath sthous, ten mon man theresthis.


LIUCHES:


And bothen me wo she hare shonge than sh bere mar thing.


CANAUS:
I, t hanor t we hand sorar m aneat thit tit bean ar stint ten and theall
Wiss s brothor tirersthe t se this ther toond tis har tind at ar monom t shoff tor bere beat be s oorour. tint.

WIOLO:
I bowhin wheld buth s manont m mar tof thand tir be tinghes be se thofrsthot.

SSAULO:
I bowin t wind, we tear when howell thend tingr beeng serom,
An t tiorind bun thant thir thind te t stooures m atong t soreris,
Wind thal blis borite bare st aint athest, bountin t torerst tis anot.


SCHANS:
I, soul wallll,-wert mer wit blaing te and ar tore st ofon ar and
And storent mer maim bene and bend athand sin seandeng teeas,
Whor bend thow bet sthing bere torister hest seand,
Itere that stont seen thours tors meere ars siound beant

Th sthare wer he st hathe the meald tis seast antir ben be siseat t ond.


CHERORDY:
A blouterd t his her thit thou s toul tom
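
For intuition about the `temperature` and `top_k` arguments passed to `sample` above, here is a pure-Python sketch of the usual technique: scale the scores by the temperature, keep only the `top_k` highest, and draw from the softmax of what remains. The helper `sample_next` and the toy scores are hypothetical, and this is not the code in `mingpt.utils`.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Hypothetical helper: pick one token id from a list of raw scores."""
    scaled = [l / temperature for l in logits]         # temperature < 1 sharpens the distribution
    if top_k is not None:
        kth = sorted(scaled, reverse=True)[top_k - 1]  # k-th largest score
        # mask everything below the k-th score (ties with it are kept)
        scaled = [s if s >= kth else float("-inf") for s in scaled]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]           # softmax, shifted for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs

random.seed(0)
logits = [2.0, 1.0, 0.5, -1.0, -3.0, 0.0, 1.5]         # toy scores for a 7-token vocabulary
idx, probs = sample_next(logits, temperature=0.9, top_k=5)
print(idx, [round(p, 3) for p in probs])
```

With `top_k=5` the two lowest-scoring tokens get probability zero, so the sampler can only choose among the five most likely characters at each step.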

In [None]:
# well that was fun