<a href="https://colab.research.google.com/github/forhayley/minGPT/blob/master/Andrej_Karpathy's_minGPT_A_PyTorch_re_implementation_of_GPT_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone https://github.com/karpathy/minGPT.git

Cloning into 'minGPT'...
remote: Enumerating objects: 12, done.[K
remote: Counting objects:   8% (1/12)[Kremote: Counting objects:  16% (2/12)[Kremote: Counting objects:  25% (3/12)[Kremote: Counting objects:  33% (4/12)[Kremote: Counting objects:  41% (5/12)[Kremote: Counting objects:  50% (6/12)[Kremote: Counting objects:  58% (7/12)[Kremote: Counting objects:  66% (8/12)[Kremote: Counting objects:  75% (9/12)[Kremote: Counting objects:  83% (10/12)[Kremote: Counting objects:  91% (11/12)[Kremote: Counting objects: 100% (12/12)[Kremote: Counting objects: 100% (12/12), done.[K
remote: Compressing objects:  10% (1/10)[Kremote: Compressing objects:  20% (2/10)[Kremote: Compressing objects:  30% (3/10)[Kremote: Compressing objects:  40% (4/10)[Kremote: Compressing objects:  50% (5/10)[Kremote: Compressing objects:  60% (6/10)[Kremote: Compressing objects:  70% (7/10)[Kremote: Compressing objects:  80% (8/10)[Kremote: Compressing objects:  90% (9/

In [None]:
!pwd

/content/minGPT


In [None]:
%cd minGPT

/content/minGPT


In [None]:
# set up logging
import logging
logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
)


In [None]:
# make deterministic
from mingpt.utils import set_seed
set_seed(42)



In [None]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn import functional as F

In [None]:
import math
from torch.utils.data import Dataset

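class CharDataset(Dataset):
    """Character-level dataset: each item is a (context, next-character targets) pair."""

    def __init__(self, data, block_size):
        # sort the vocabulary so the character <-> index mapping is the same on every run
        chars = sorted(set(data))
        data_size, vocab_size = len(data), len(chars)
        print('data has %d characters, %d unique.' % (data_size, vocab_size))

        self.stoi = { ch:i for i,ch in enumerate(chars) }  # character -> integer index
        self.itos = { i:ch for i,ch in enumerate(chars) }  # integer index -> character
        self.block_size = block_size
        self.vocab_size = vocab_size
        self.data = data

    def __len__(self):
        # nominal number of non-overlapping (block_size + 1)-character chunks per epoch
        return math.ceil(len(self.data) / (self.block_size + 1))

    def __getitem__(self, idx):
        # we're actually going to "cheat": ignore idx and pick a spot in the dataset at random
        i = np.random.randint(0, len(self.data) - (self.block_size + 1))
        chunk = self.data[i:i+self.block_size+1]
        dix = [self.stoi[s] for s in chunk]
        # x is the first block_size characters; y is the same window shifted one step ahead
        x = torch.tensor(dix[:-1], dtype=torch.long)
        y = torch.tensor(dix[1:], dtype=torch.long)
        return x, y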
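
To make the indexing in `__getitem__` concrete, here is a small torch-free sketch of the same idea on a toy string. The string, the `block_size` of 4 and the fixed start position are made up for illustration. Only the encode and shift-by-one logic is meant to mirror the class above.

```python
# Toy illustration of character-level encoding and the one-step target shift.
text = "hello world"          # hypothetical stand-in for the training text
block_size = 4                # hypothetical; the notebook uses 128

chars = sorted(set(text))     # sorted so the mapping is reproducible
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}

i = 2                                   # fixed start instead of a random one
chunk = text[i:i + block_size + 1]      # block_size + 1 characters
dix = [stoi[s] for s in chunk]
x, y = dix[:-1], dix[1:]                # y is x shifted one position ahead

print(repr(chunk))
print("inputs :", [itos[t] for t in x])
print("targets:", [itos[t] for t in y])
```

Each position in `x` is paired with the character that follows it, so a single chunk gives the model `block_size` next-character prediction targets at once.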

In [None]:
block_size = 128 # spatial extent of the model for its context

In [None]:
!wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt

--2020-08-17 20:35:13--  https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1115394 (1.1M) [text/plain]
Saving to: ‘input.txt’


2020-08-17 20:35:13 (12.5 MB/s) - ‘input.txt’ saved [1115394/1115394]



In [None]:
# you can download this file at https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
with open('input.txt', 'r') as f:
    text = f.read()
train_dataset = CharDataset(text, block_size) # one line of the text is roughly 50 characters



data has 1115394 characters, 65 unique.


In [None]:
# Initializing GPT

In [None]:
from mingpt.model import GPT, GPTConfig
mconf = GPTConfig(train_dataset.vocab_size, train_dataset.block_size,
                  n_layer=8, n_head=8, n_embd=512)
model = GPT(mconf)

08/17/2020 20:35:37 - INFO - mingpt.model -   number of parameters: 2.535219e+07
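
The logged parameter count can be checked by hand. The sketch below assumes a conventional GPT layout: token and learned position embeddings, two layer norms per block, four biased `n_embd x n_embd` attention projections, a 4x-wide MLP, a final layer norm, and a bias-free output layer. That layout is an assumption rather than something read from the model code, but it adds up to the logged figure. Note that `n_head` does not affect the total.

```python
# Back-of-the-envelope parameter count for the configuration above.
vocab_size, block_size = 65, 128
n_layer, n_embd = 8, 512

def linear(n_in, n_out, bias=True):
    """Number of parameters in a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

layer_norm = 2 * n_embd                      # scale and shift vectors
attention = 4 * linear(n_embd, n_embd)       # key, query, value, output projection
mlp = linear(n_embd, 4 * n_embd) + linear(4 * n_embd, n_embd)
block = 2 * layer_norm + attention + mlp

embeddings = vocab_size * n_embd + block_size * n_embd   # token + position
head = linear(n_embd, vocab_size, bias=False)            # assumed bias-free

total = n_layer * block + layer_norm + embeddings + head
print(total)          # 25352192
print("%e" % total)   # 2.535219e+07
```

Almost all of the weights sit in the eight transformer blocks. With a 65-character vocabulary, the embeddings and output layer together contribute well under 1% of the total.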


In [None]:
import gc
gc.collect() 

98

In [None]:
from mingpt.trainer import Trainer, TrainerConfig

# initialize a trainer instance and kick off training
tconf = TrainerConfig(max_epochs=3, batch_size=256, learning_rate=6e-4,
                      lr_decay=True, warmup_tokens=512*20, final_tokens=200*len(train_dataset)*block_size,
                      num_workers=4)
trainer = Trainer(model, train_dataset, None, tconf)
trainer.train()

epoch 1 iter 0: train loss 2.36343. lr 6.000000e-04:   3%|▎         | 1/34 [00:01<01:03,  1.93s/it]
epoch 1 iter 1: train loss 2.74536. lr 5.999999e-04:   6%|▌         | 2/34 [00:03<01:01,  1.92s/it]
epoch 1 iter 2: train loss 2.54288. lr 5.999998e-04:   9%|▉         | 3/34 [00:05<00:59,  1.91s/it]
epoch 1 iter 3: train loss 2.46885. lr 5.999996e-04:   9%

KeyboardInterrupt: ignored
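
The 34 iterations per epoch and the slowly shrinking learning rate in the log both follow from the configuration. The sketch below recomputes them. It assumes the trainer counts tokens processed so far and applies linear warmup followed by cosine decay with a floor at 10% of the base rate. That schedule is an assumption about the trainer's internals, and `lr_at` is a hypothetical helper, but the values it produces agree with the logged ones.

```python
import math

data_size, block_size = 1115394, 128
batch_size, base_lr = 256, 6e-4

# CharDataset.__len__ from the notebook: nominal examples per epoch
n_examples = math.ceil(data_size / (block_size + 1))
steps_per_epoch = math.ceil(n_examples / batch_size)

warmup_tokens = 512 * 20
final_tokens = 200 * n_examples * block_size

def lr_at(tokens):
    """Assumed schedule: linear warmup, then cosine decay floored at 10%."""
    if tokens < warmup_tokens:
        mult = tokens / max(1, warmup_tokens)
    else:
        progress = (tokens - warmup_tokens) / max(1, final_tokens - warmup_tokens)
        mult = max(0.1, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return base_lr * mult

tokens_per_step = batch_size * block_size   # 32768 tokens in a full batch
lrs = [lr_at((step + 1) * tokens_per_step) for step in range(4)]

print(n_examples, steps_per_epoch)          # 8647 34
for step, lr in enumerate(lrs):
    print("iter %d: lr %e" % (step, lr))
```

Under these assumptions a single batch already exceeds `warmup_tokens`, so the warmup phase is over before the first learning rate is logged. Because `final_tokens` is sized for 200 passes over the dataset, the cosine decay barely moves during a 3-epoch run.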

In [None]:
# alright, let's sample some character-level Shakespeare
from mingpt.utils import sample

context = "O God!"
x = torch.tensor([train_dataset.stoi[s] for s in context], dtype=torch.long)[None,...].to(trainer.device)
y = sample(model, x, 2000, temperature=0.9, sample=True, top_k=5)[0]
completion = ''.join([train_dataset.itos[int(i)] for i in y])
print(completion)

O God!

PorllLERIUS:
Hest , Go sLinZuly Unoax tbe touBu breknoas ?
Butere vith : hathithurd, So aUWhoRourd 's dLe Rarx,
Tot the ?



BEONES:
WI qu te whar blXo al t he.

use want hat sDICELARUS:
Anere t tis Ye I veas Qe s hashe 3 hoouthe, quritju me hastou tWhite
And ; RVI $inJune I ze.

I po &CIwhotep thare blds torarathiWhard, is allken t tinCoou$in at F t mnoun qughengese
I a? thenAn& she coBullxt,
I hat !

WhathendenxthYo fowe stherd seaYo me hje the the:
Whe a3 ?


Poto!
ADIS:
AYaJUSAnd UKI:
Tou & teQu Vat ge mahat ast bllJureat meusoPouse xcosthone thearVin.




PARENCETANCETE:
Whers heno qusthe thou heran h a& lisk heTou a.



BUTIO:
IVoVoug me riI Vou RIO:
Te $on at Gors masthere hin Zh -A3
I reate hinknthe I se shof an atend hend these?


nown xd to senp?


gere; anwn : tszit OfzequriBerdss meSMand axe hith,
Thes hanenghinenge ble whin toBoussen $eQar st asente ant s ofurendEand,
Thi's Ore hangen and t ske at Whal tomthill s with thoure and;
And thend al
Town tf hatodf h s wnd
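
For readers unfamiliar with the `temperature` and `top_k` arguments passed to `sample` above, here is a minimal pure-Python sketch of what those knobs conventionally mean for a single sampling step. It is not minGPT's implementation, which works on tensors. `sample_top_k` and the example logits are hypothetical.

```python
import math
import random

def sample_top_k(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one index from softmax(logits / temperature), keeping only the top_k largest logits."""
    scaled = [l / temperature for l in logits]   # temperature < 1 sharpens the distribution
    if top_k is not None:
        k = min(top_k, len(scaled))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        # everything below the k-th largest score gets zero probability
        # (ties at the cutoff are all kept)
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax; exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)
logits = [2.0, 0.5, 1.0, -1.0, 3.0, 0.0]         # made-up scores for a 6-symbol vocabulary
draws = [sample_top_k(logits, temperature=0.9, top_k=3, rng=rng) for _ in range(200)]
print(sorted(set(draws)))                        # only indices among the three highest logits
```

With `top_k=1` the function reduces to greedy decoding and always returns the index of the largest logit. Larger `top_k` and higher `temperature` make the output more varied. The text above is mostly noise because training was interrupted after only a few iterations, so the logits being sampled were still close to uninformative.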