# `nanoGPT`: GPT-2 Small (150M Params)

## Install / Setup

### Google Colab

1. Mount Google Drive

    ```python
    from google.colab import drive
    drive.mount('/content/drive')
    ```

2. `bfloat16` doesn't work on Colab; switch `train.dtype` to `float16`

3. Prepare the data

    ```bash
    !python3 drive/MyDrive/Google-Colab/nanoGPT/data/prepare.py yelp_review_full
    ```
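For step 2, one option is to choose the dtype at runtime instead of hard-coding it. The sketch below is illustrative only and is not part of `ngpt`: `pick_dtype` and `detect_dtype` are hypothetical helpers, and the only library calls used are PyTorch's `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`. It assumes that falling back to `float32` on CPU-only machines is acceptable.

```python
def pick_dtype(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of a training dtype for the given hardware capabilities."""
    if not cuda_available:
        return 'float32'   # no GPU: stay in full precision (an assumption of this sketch)
    if bf16_supported:
        return 'bfloat16'
    return 'float16'       # GPU without bfloat16 support, e.g. the Colab case above


def detect_dtype() -> str:
    """Query PyTorch (if it can be imported) and delegate to `pick_dtype`."""
    try:
        import torch
    except ImportError:
        return pick_dtype(False, False)
    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return pick_dtype(cuda, bf16)


# Build an override string in the same `key=value` form used for `train.dtype`
dtype_override = f'train.dtype={detect_dtype()}'
print(dtype_override)
```

The resulting string could be passed alongside the other overrides in place of a hard-coded `train.dtype=bfloat16`.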

### First Time Running

We need to install `ngpt` and set up the Shakespeare dataset.

This needs to be run the first time you run this notebook.

After running

```python
!python3 -m pip install nanoGPT
```

you will need to restart your runtime (Runtime -> Restart runtime).

After this, you should be able to import `ngpt`:

```python
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
```

In [1]:
%%bash

#cd drive/MyDrive/Google-Colab/
echo "pwd: $(pwd)"

HF_DATASETS_CACHE="./.cache/huggingface"
mkdir -p "${HF_DATASETS_CACHE}"

python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'
STATUS=$?

if [[ $STATUS -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 -m pip install -e nanoGPT
fi

#python3 nanoGPT/data/prepare.py yelp_review_full

#cd -

pwd: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.
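The same "install only if missing" check can be sketched in pure Python, which may be convenient when `%%bash` is unavailable. This is a hedged illustration, not the notebook's own method: `is_installed` and `ensure_installed` are hypothetical helper names, and the sketch assumes the repository has already been cloned into a local `nanoGPT` directory.

```python
import importlib.util
import subprocess
import sys


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found on the import path."""
    return importlib.util.find_spec(package) is not None


def ensure_installed(package: str, source: str) -> bool:
    """Install `source` in editable mode unless `package` is already importable.

    Returns True if an install was run, False if there was nothing to do.
    """
    if is_installed(package):
        return False
    # Same command as the shell cell: python3 -m pip install -e <source>
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-e', source])
    return True
```

Calling `ensure_installed('ngpt', 'nanoGPT')` would mirror the shell logic above. On Colab, a runtime restart may still be needed before the import succeeds.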


## Post Install

If installed correctly, you should be able to import `ngpt`:

```python
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
```

In [2]:
%load_ext autoreload
%autoreload 2

import ngpt
from enrich import get_logger
log = get_logger('jupyter')
log.info(ngpt.__file__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[2023-11-30 07:10:16][INFO][3434626787.py:7] - /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py


## Build Trainer

Explicitly, we:

1. `setup_torch(...)`
2. Build `cfg: DictConfig = get_config(...)`
3. Instantiate `config: ExperimentConfig = instantiate(cfg)`
4. Build `trainer = Trainer(config)`

In [3]:
#import os
#os.listdir('../data/yelp_review_full/')

In [4]:
import os
import numpy as np
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer
from enrich.console import get_console

console = get_console()
HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['MASTER_PORT'] = '5278'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

SEED = np.random.randint(2**32)
console.log(f'SEED: {SEED}')

# NOTE: SEED above is only logged; the fixed seed 1234 is what gets passed to `setup_torch`
rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    [
        'data=yelp',
        #'data.dataset=yelp_review_full',
        #'data.root_path=../data/yelp_review_full/',
        #'data.out_dir=output_yelp_review_full',
        'model=gpt2_medium',
        'optimizer=gpt2_medium',
        'train=gpt2_medium',
        'train.dtype=bfloat16',
        'train.max_iters=1000',
        'train.log_interval=100',
        'train.init_from=gpt2-medium',
    ]
)
config = instantiate(cfg)
trainer = Trainer(config)

--------------------------------------------------------------------------

  Local host:   thetagpu23
  Local device: mlx5_0
--------------------------------------------------------------------------


[2023-11-30 07:10:33][INFO][configs.py:72] - Setting HF_DATASETS_CACHE to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/.cache/huggingface/datasets
Failed to download font: Source Sans Pro, skipping!
Failed to download font: Titillium WebRoboto Condensed, skipping!


[2023-11-30 07:10:37][INFO][configs.py:295] - Loading test from /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/data/yelp_review_full/test.bin
[2023-11-30 07:10:37][INFO][configs.py:295] - Loading train from /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/data/yelp_review_full/train.bin
[2023-11-30 07:10:38][INFO][configs.py:270] - Rescaling GAS -> GAS // WORLD_SIZE = 1 // 1

## Prompt (**prior** to training)

In [5]:
query = "What is a supercomputer? Explain like I'm a child, and speak clearly. Double check your logic.."
outputs = trainer.evaluate(query, num_samples=1, display=False)
console.print(fr'\[prompt]: "{query}"')
console.print("\\[response]:\n\n" + fr"{outputs['0']['raw']}")

## Train Model


|  **NAME**  |     **DESCRIPTION**          |
|:----------:|:----------------------------:|
|   `step`   | Current training step        |
|   `loss`   | Loss value                   |
|   `dt`     | Time per step (in **ms**)    |
|   `sps`    | Samples per second           |
|   `mtps`   | Millions of tokens per sec   |
|   `mfu`    | Model FLOPs Utilization*     |

Logging legend.

*in units of A100 `bfloat16` peak FLOPS
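As a quick sanity check on the legend, the logged `sps` values appear to be the reciprocal of `dt`. The sketch below checks this using the values logged at steps 100 and 200 of this run. It is an observation about these two log lines, not a statement of how `ngpt` computes the metric, and `per_second_from_dt` is a hypothetical helper name.

```python
def per_second_from_dt(dt_ms: float) -> float:
    """Convert a per-step time in milliseconds into a rate per second."""
    return 1000.0 / dt_ms


# (dt, sps) pairs copied from the training log at steps 100 and 200
logged = [(280.748, 3.562), (281.203, 3.556)]

for dt, sps in logged:
    derived = per_second_from_dt(dt)
    # The log prints three decimals, so allow half a unit in the last place
    matches = abs(derived - sps) < 5e-4
    print(f'dt={dt} ms -> derived={derived:.3f}, logged sps={sps}, match={matches}')
```

If the two agree, each step processes one "sample" in the sense used by the `sps` column. How many tokens that sample contains is not recoverable from the rounded `mtps` value alone.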

In [6]:
trainer.train()

  0%|          | 0/1000 [00:00<?, ?it/s]

[2023-11-30 07:12:55][INFO][trainer.py:518] - step=100 loss=3.118 dt=280.748 sps=3.562 mtps=0.015 mfu=34.014 train_loss=3.389 val_loss=0.000
[2023-11-30 07:13:23][INFO][trainer.py:518] - step=200 loss=3.138 dt=281.203 sps=3.556 mtps=0.015 mfu=34.008 train_loss=3.389 val_loss=

## Evaluate Model

In [7]:
query = "What is a supercomputer? Explain like I'm a child, and speak clearly. Double check your logic.."
outputs = trainer.evaluate(query, num_samples=1, display=False)
console.print(fr'\[prompt]: "{query}"')
console.print("\\[response]:\n\n" + fr"{outputs['0']['raw']}")