# `nanoGPT`

## Install / Setup

### First Time Running

We need to install `ngpt` and set up the Shakespeare dataset.

This only needs to be run the first time you use this notebook.

After running

```python
!python3 -m pip install nanoGPT
```

you will need to restart your runtime (Runtime -> Restart runtime).

After this, you should be able to import the package:

```python
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
```

In [1]:
%%bash

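# Check whether `ngpt` is already importable; the setup steps below run only if it is not.
# (The original `[ $HAS_NGPT ]` test is true for both "0" and "1", so nothing ever ran.)
python3 -c 'import ngpt; print(ngpt.__file__)' && HAS_NGPT=1 || HAS_NGPT=0
[ "$HAS_NGPT" -eq 1 ] || git clone https://github.com/saforem2/nanoGPT
[ "$HAS_NGPT" -eq 1 ] || python3 nanoGPT/data/shakespeare_char/prepare.py
[ "$HAS_NGPT" -eq 1 ] || python3 -m pip install -e nanoGPT -vvv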
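
The same availability check can be done from Python without shelling out. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical and not part of `ngpt`.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be located on the current import path."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec(package) is not None


if not is_installed("ngpt"):
    print("ngpt not found: clone the repository and run the install steps above")
```

This only locates the package; it does not import it, so it is safe to call before the runtime has been restarted.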

## Post Install

If installed correctly, you should be able to import `ngpt`:

```python
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
```

In [2]:
%load_ext autoreload
%autoreload 2

import ngpt
from rich import print
print(ngpt.__file__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Build Trainer

Explicitly, we:

1. `setup_torch(...)`
2. Build `cfg: DictConfig = get_config(...)`
3. Instantiate `config: ExperimentConfig = instantiate(cfg)`
4. Build `trainer = Trainer(config)`
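
Step 2 is driven by a list of `key=value` override strings (see the `get_config(...)` call in the next cell). As a rough illustration of how dotted overrides map onto a nested configuration, here is a standard-library sketch. `apply_overrides` is a hypothetical helper, not the `ngpt` or Hydra implementation: it handles only dotted value overrides such as `train.max_iters=1000`, and it does not resolve group selections such as `data=owt`.

```python
from typing import Any


def apply_overrides(overrides: list[str]) -> dict[str, Any]:
    """Fold 'a.b=value' strings into a nested dict (values stay strings)."""
    config: dict[str, Any] = {}
    for item in overrides:
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


cfg = apply_overrides(["train.max_iters=1000", "train.dtype=bfloat16"])
print(cfg)
```

A real configuration system would also convert values to their declared types and validate keys against a schema; this sketch leaves every value as a string.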

In [3]:
import os
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config
from ngpt.trainer import Trainer

os.environ['MASTER_PORT'] = '4235'
rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    [
        'data=owt',              # open web text
        'model=gpt2',            # gpt2 arch.
        'optimizer=gpt2',
        'train=gpt2',
        'train.init_from=gpt2',  # init from GPT2
        'train.max_iters=1000',
        'train.dtype=bfloat16',
    ]
)
config = instantiate(cfg)
trainer = Trainer(config)

--------------------------------------------------------------------------

  Local host:   thetagpu24
  Local device: mlx5_0
--------------------------------------------------------------------------
RANK: 0 / 0


[2023-11-10 15:26:45][INFO][configs.py:264] - Rescaling GAS -> GAS // WORLD_SIZE = 1 // 1
[2023-11-10 15:26:45][INFO][configs.py:399] - Tokens per iteration: 12,288
[2023-11-10 15:26:45][INFO][configs.py:431] - Using <torch.amp.autocast_mode.autocast object at 0x7fd7f13aee00>
[2023-11-10 15:26:45][INFO][trainer.py:184] - Initializing from OpenAI GPT-2 Weights: gpt2


2023-11-10 15:26:45.538583: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


[2023-11-10 15:26:49,447] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-10 15:26:50][INFO][model.py:225] - loading weights from pretrained gpt: gpt2
[2023-11-10 15:26:50][INFO][model.py:234] - forcing vocab_size=50257, block_size=1024, bias=True
[2023-11-10 15:26:50][INFO][model.py:240] - overriding dropout rate to 0.0
[2023-11-10 15:26:52][INFO][model.py:160] - number of parameters: 123.65M
[2023-11-10 15:26:56][INFO][model.py:290] - num decayed parameter tensors:


## Train Model

Legend:

<div style="text-align:left;">

|  **NAME**  |     **DESCRIPTION**          |
|:----------:|:----------------------------:|
|   `step`   | Current training step        |
|   `loss`   | Loss value                   |
|   `dt`     | Time per step (in **ms**)    |
|   `sps`    | Samples per second           |
|   `mtps`   | (million) Tokens per sec     |
|   `mfu`    | Model Flops Utilization*     |

*in units of A100 `bfloat16` peak FLOPS

</div>
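
To make the legend concrete, the sketch below recomputes `sps`, `mtps`, and `mfu` from a single step time. It is inferred from the numbers logged in this notebook, not taken from the trainer source, and rests on several assumptions: one "sample" per step (so `sps = 1000 / dt`), 12,288 tokens per iteration as logged during setup (taken here as a hypothetical batch of 12 sequences of 1,024 tokens), the GPT-2 small shape (12 layers, 12 heads, head dimension 64), the logged 123.65M parameters, and the per-token FLOPs estimate `6N + 12·L·H·Q·T` used in Karpathy's nanoGPT with an A100 `bfloat16` peak of 312 TFLOPS.

```python
def throughput(dt_ms: float, tokens_per_iter: int) -> tuple[float, float]:
    """Return (steps per second, million tokens per second) for one step time."""
    sps = 1000.0 / dt_ms                 # assumption: one "sample" per step
    mtps = tokens_per_iter * sps / 1e6   # tokens per second, in millions
    return sps, mtps


def estimate_mfu(
    dt_ms: float,
    n_params: float,
    n_layer: int,
    n_head: int,
    head_dim: int,
    block_size: int,
    batch_size: int,
    peak_flops: float = 312e12,  # A100 bfloat16 peak
) -> float:
    """Return model FLOPs utilization as a percentage of `peak_flops`."""
    # Forward + backward FLOPs per token: dense layers plus attention.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_iter = flops_per_token * block_size * batch_size
    achieved = flops_per_iter / (dt_ms / 1000.0)
    return 100.0 * achieved / peak_flops


# Values read off the step-100 log line; batch_size=12 is an assumption (12,288 / 1,024).
sps, mtps = throughput(dt_ms=123.690, tokens_per_iter=12_288)
mfu = estimate_mfu(123.690, 123.65e6, n_layer=12, n_head=12, head_dim=64,
                   block_size=1024, batch_size=12)
print(f"sps={sps:.3f} mtps={mtps:.3f} mfu={mfu:.2f}")
```

Under these assumptions the result agrees with the step-100 log line (`sps` ≈ 8.085, `mtps` ≈ 0.099, `mfu` ≈ 27.23), which suggests `mfu` is reported as a percentage of peak.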

In [4]:
trainer.train()

  0%|          | 0/1000 [00:00<?, ?it/s]

[2023-11-10 15:29:47][INFO][trainer.py:540] - step=100 loss=3.218 dt=123.690 sps=8.085 mtps=0.099 mfu=27.230 train_loss=3.094 val_loss=3.108
[2023-11-10 15:29:59][INFO][trainer.py:540] - step=200 loss=2.946 dt=123.632 sps=8.089 mtps=0.099 mfu=27.231 train_loss=3.094 val_loss=3.108
[2023-11-10 15:30:12][INFO][trainer.py:540]

## Evaluate Model

In [13]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)

In [18]:
from enrich.console import get_console
console = get_console()

# Escape the opening bracket so the labels print literally instead of being parsed as markup.
console.print(fr'\[prompt]: "{query}"')
console.print("\\[response]:\n\n" + fr"{outputs['0']['raw']}")