# `nanoGPT`: Shakespeare

## Install / Setup

### Google Colab

```python
from google.colab import drive
drive.mount('/content/drive')
```

### First Time Running

We need to install `wordplay` and set up the Shakespeare dataset.

This only needs to be run the first time you use this notebook.

After running

```python
!python3 -m pip install wordplay
```

you will need to restart your runtime (Runtime -> Restart runtime).

After this, you should be able to:

```python
>>> import wordplay
>>> wordplay.__file__
'/content/wordplay/src/wordplay/__init__.py'
```

In [1]:
%%bash

python3 -c 'import wordplay; print(wordplay.__file__)' 2> '/dev/null'

if [[ $? -eq 0 ]]; then
    echo "Has wordplay installed. Nothing to do."
else
    echo "Does not have wordplay installed. Installing..."
    git clone 'https://github.com/saforem2/wordplay'
    python3 wordplay/data/shakespeare_char/prepare.py
    python3 wordplay/data/shakespeare/prepare.py
    python3 -m pip install deepspeed
    python3 -m pip install -e wordplay
fi

/content/wordplay/src/wordplay/__init__.py
Has wordplay installed. Nothing to do.
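
The same "is it already installed?" check can be done from Python without spawning a subprocess. The snippet below is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical and not part of `wordplay`.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path.

    `find_spec` only locates the package; it does not import it.
    """
    return importlib.util.find_spec(package) is not None


# Report the status of the packages the install cell cares about.
for name in ("wordplay", "deepspeed"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Because nothing is imported, this check stays cheap even for heavy packages.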


## Post Install

If installed correctly, you should be able to:

```python
>>> import wordplay
>>> wordplay.__file__
'/path/to/wordplay/src/wordplay/__init__.py'
```

In [2]:
import os
# os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
os.environ['COLORTERM'] = 'truecolor'

In [3]:
%load_ext autoreload
%autoreload 2

import wordplay
from enrich import get_logger
log = get_logger(level='INFO')
#from rich import print
log.info(wordplay.__file__)

INFO:root:/content/wordplay/src/wordplay/__init__.py


[30m[[0m[90m2024-02-06 [0m[90m14:13:55[0m[30m][0m[30m[[0m[1;32mINFO[0m[30m][0m[30m[[0m[3;36m<ipython-input-3-df3b64ce7190>[0m[92m:[0m[30m8[0m[30m][0m[1;93m - [0m[32m/content/wordplay/src/wordplay/[0m[35m__init__.py[0m


## Build Trainer

Explicitly, we:

1. Set up PyTorch with `setup(...)`
2. Build `cfg: DictConfig = get_config(...)`
3. Instantiate `config: ExperimentConfig = instantiate(cfg)`
4. Build `trainer = Trainer(config)`
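
To make step 2 more concrete, here is a toy illustration of how dotted `key=value` override strings can be merged into a nested configuration. This is not Hydra's or `wordplay`'s implementation: the helper `apply_overrides` and the base values are hypothetical, values are kept as plain strings, and entries such as `data=shakespeare` (which look like Hydra config-group selections) are not modelled.

```python
from copy import deepcopy


def apply_overrides(base: dict, overrides: list[str]) -> dict:
    """Return a copy of `base` with dotted `key=value` overrides applied."""
    cfg = deepcopy(base)  # leave the caller's dict untouched
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            # Walk (or create) the nested dict for each parent key.
            node = node.setdefault(name, {})
        node[leaf] = value
    return cfg


# Hypothetical base values, used only for illustration.
base = {"train": {"backend": "DDP", "max_iters": "600000"}}
cfg = apply_overrides(base, ["train.max_iters=5000", "train.compile=false"])
print(cfg)
```

A real override parser also converts value types (for example `false` to a boolean), which this sketch deliberately skips.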

In [4]:
import os
import numpy as np
from ezpz import setup
from hydra.utils import instantiate
from wordplay.configs import get_config, PROJECT_ROOT
from wordplay.trainer import Trainer

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

BACKEND = 'DDP'

rank = setup(
    framework='pytorch',
    backend=BACKEND,
    seed=1234,
)

cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'optimizer=shakespeare',
        'train=shakespeare',
        f'train.backend={BACKEND}',
        'train.compile=false',
        'train.dtype=float16',
        'train.max_iters=5000',
        'train.log_interval=100',
        'train.eval_interval=500',
    ]
)
config = instantiate(cfg)

In [5]:
trainer = Trainer(config)

## Prompt (**prior** to training)

In [6]:
query = "What is an LLM?"
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=128,
    top_k=1,
    display=False
)
log.info(f"['prompt']: '{query}'")
log.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
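
The `top_k` argument limits sampling to the `k` most likely next tokens, so `top_k=1` amounts to greedy decoding. The sketch below shows the idea in plain Python, assuming the usual nanoGPT-style scheme (keep the `k` largest logits, apply a softmax over them, then sample); whether `trainer.evaluate` does exactly this is an assumption, and `top_k_sample` is a hypothetical helper.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample a token index from the k largest logits."""
    scaled = [x / temperature for x in logits]
    # Indices of the k largest (scaled) logits.
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the kept logits, shifted for numerical stability.
    top = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - top) for i in keep]
    return rng.choices(keep, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0, 1.5]  # made-up scores for a 4-token vocabulary
greedy = top_k_sample(logits, k=1)                     # always the argmax
sampled = top_k_sample(logits, k=2, rng=random.Random(0))  # index 1 or 3
print(greedy, sampled)
```

Larger `k` values trade determinism for variety in the generated text.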

## Train Model

|  name  |       description            |
|:------:|:----------------------------:|
| `step` | Current training step        |
| `loss` | Loss value                   |
| `dt`   | Time per step (in **ms**)    |
| `sps`  | Samples per second           |
| `mtps` | Millions of tokens per second |
| `mfu`  | Model FLOPs utilization[^1]  |

[^1]: in units of A100 `bfloat16` peak FLOPS
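
As a rough guide to how these columns relate, the sketch below derives them from a single step time. The `mfu` estimate follows the formula used by `estimate_mfu` in Karpathy's nanoGPT (about `6N + 12*L*H*Q*T` FLOPs per token, compared against the A100 `bfloat16` peak of 312 TFLOPS); whether `wordplay` computes its metrics exactly this way is an assumption, and the model sizes in the example are made up.

```python
A100_BF16_PEAK_FLOPS = 312e12  # A100 dense bfloat16 peak, in FLOP/s


def throughput_metrics(dt_s, batch_size, block_size,
                       n_params, n_layer, n_head, n_embd):
    """Derive dt / sps / mtps / mfu from the wall-clock time of one step."""
    tokens_per_step = batch_size * block_size
    head_dim = n_embd // n_head
    # nanoGPT-style estimate of FLOPs per token (forward + backward).
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_step = flops_per_token * tokens_per_step
    return {
        "dt": dt_s * 1e3,                       # milliseconds per step
        "sps": batch_size / dt_s,               # samples per second
        "mtps": tokens_per_step / dt_s / 1e6,   # million tokens per second
        "mfu": flops_per_step / dt_s / A100_BF16_PEAK_FLOPS,
    }


# Hypothetical numbers, for illustration only.
metrics = throughput_metrics(
    dt_s=0.5, batch_size=64, block_size=256,
    n_params=10_000_000, n_layer=6, n_head=6, n_embd=384,
)
print(metrics)
```

An `mfu` well below 1 is expected for a small model like this one, since it cannot saturate the GPU.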

In [7]:
trainer.config.device_type

'cuda'

In [8]:
trainer.train()

  0%|          | 0/5000 [00:00<?, ?it/s]

100%|██████████| 5000/5000 [03:00<00:00, 27.72it/s]
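
The config sets `train.max_iters=5000`, `train.log_interval=100`, and `train.eval_interval=500`. Assuming the trainer follows the common nanoGPT convention of acting whenever the step is a multiple of the interval (an assumption about `wordplay`, not something confirmed here), the logging and evaluation steps can be sketched as follows; `interval_steps` is a hypothetical helper.

```python
def interval_steps(max_iters: int, interval: int) -> list[int]:
    """Steps in [0, max_iters) at which an action with this interval fires."""
    return [step for step in range(max_iters) if step % interval == 0]


log_steps = interval_steps(max_iters=5000, interval=100)
eval_steps = interval_steps(max_iters=5000, interval=500)

print(f"log lines: {len(log_steps)}, first few at steps {log_steps[:3]}")
print(f"evaluations: {len(eval_steps)}, first few at steps {eval_steps[:3]}")
```

Evaluation pauses training, so a shorter `eval_interval` lowers the average iteration rate over the whole run.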


## Evaluate Model

In [9]:
import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=2,
    display=False
)
log.info(f'took: {time.perf_counter() - t0:.4f}s')
log.info(f"['prompt']: '{query}'")
log.info("['response']:\n\n" + fr"{outputs['0']['raw']}")