# `nanoGPT`: GPT-2 XL (1.5B Params)

## Install / Setup

### First Time Running

We need to install `ngpt` and setup the Shakespeare dataset

This will need to be ran the first time you are running this notebook.

Following the

```python
!python3 -m pip install nanoGPT
```

you will need to restart your runtime (Runtime -> Restart runtime)

After this, you should be able to

```python
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
```

In [1]:
%%bash

python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'

if [[ $? -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 nanoGPT/data/shakespeare_char/prepare.py
    python3 -m pip install -e nanoGPT -vvv
fi

/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.


## Post Install

If installed correctly, you should be able to:

```python
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
```

In [2]:
%load_ext autoreload
%autoreload 2

import ngpt
from enrich import get_logger
log = get_logger('jupyter')
log.info(ngpt.__file__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[38;2;131;131;131m[2023-11-30 08:04:45][0m[34m[INFO][0m[38;2;119;119;119m[[0m[38;2;119;119;119m3434626787.py[0m[38;2;119;119;119m:[0m[38;2;119;119;119m7[0m[38;2;119;119;119m][0m - [32m/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/[0m[35m__init__.py[0m


## Build Trainer

Explicitly, we:

1. `setup_torch(...)`
2. Build `cfg: DictConfig = get_config(...)`
3. Instnatiate `config: ExperimentConfig = instantiate(cfg)`
4. Build `trainer = Trainer(config)`

In [7]:
import os
import numpy as np
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer
from enrich.console import get_console

console = get_console()
HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['MASTER_PORT'] = '5127'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

SEED = np.random.randint(2**32)
console.print(f'SEED: {SEED}')

rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    [
        'data=owt',
        'model=gpt2_xl',
        'model.block_size=64',
        'optimizer=gpt2_xl',
        'train=gpt2_xl',
        'train.init_from=gpt2-xl',
        'train.max_iters=100',
        'train.dtype=bfloat16',
    ]
)
config = instantiate(cfg)
trainer = Trainer(config)

[38;2;131;131;131m[2023-11-30 08:34:50][0m[34m[INFO][0m[38;2;119;119;119m[[0m[38;2;119;119;119mconfigs.py[0m[38;2;119;119;119m:[0m[38;2;119;119;119m295[0m[38;2;119;119;119m][0m - Loading val from [32m/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/data/openwebtext/[0m[35mval.bin[0m
[38;2;131;131;131m[2023-11-30 08:34:50][0m[34m[INFO][0m[38;2;119;119;119m[[0m[38;2;119;119;119mconfigs.py[0m[38;2;119;119;119m:[0m[38;2;119;119;119m295[0m[38;2;119;119;119m][0m - Loading train from [32m/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/data/openwebtext/[0m[35mtrain.bin[0m
[38;2;131;131;131m[2023-11-30 08:34:50][0m[34m[INFO][0m[38;2;119;119;119m[[0m[38;2;119;119;119mconfigs.py[0m[38;2;119;119;119m:[0m[38;2;119;119;119m270[0m[38;2;119;119;119m][0m - Rescaling GAS -> GAS [32m/[0m[32m/[0m WORLD_SIZE = [35m1[0m [32m/[0m[32m/[0m [35m1[0m
[38;2;131;131;131m[2023-11

## Prompt (**prior** to training)

In [8]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
log.info("['prompt']: '{query}'")
log.info("['response']:\n\n" + fr"{outputs['0']['raw']}")

[38;2;131;131;131m[2023-11-30 08:35:54][0m[34m[INFO][0m[38;2;119;119;119m[[0m[38;2;119;119;119m1657463709.py[0m[38;2;119;119;119m:[0m[38;2;119;119;119m3[0m[38;2;119;119;119m][0m - [1m[[0m[32m'prompt'[0m[1m][0m: [32m'[0m[32m{[0m[32mquery[0m[32m}[0m[32m'[0m
[38;2;131;131;131m[2023-11-30 08:35:54][0m[34m[INFO][0m[38;2;119;119;119m[[0m[38;2;119;119;119m1657463709.py[0m[38;2;119;119;119m:[0m[38;2;119;119;119m4[0m[38;2;119;119;119m][0m - [1m[[0m[32m'response'[0m[1m][0m:

What is a supercomputer? When it comes to massive computing, a supercomputer is simply a large computer system that has the ability to perform many calculations at once. This can be the result of using many different processing cores, or memory, or operating at a high clock speed. Supercomputers are often used to crack complex calculations and research problems.

Image credit: Wikipedia

Image credit: Wikipedia

Image credit: Wikipedia

Image credit: Wikipedia

Image credit: 

## Train Model


|  **NAME**  |     **DESCRIPTION**          |
|:----------:|:----------------------------:|
|   `step`   | Current training step        |
|   `loss`   | Loss value                   |
|   `dt`     | Time per step (in **ms**)    |
|   `sps`    | Samples per second           |
|   `mtps`   | (million) Tokens per sec     |
|   `mfu`    | Model Flops Utilization*     |
^Logging Legend

*in units of A100 `bfloat16` peak FLOPS

In [9]:
trainer.model.module.train()
trainer.train()

  0%|          | 0/100 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 150.00 MiB. GPU 0 has a total capacty of 79.35 GiB of which 48.19 MiB is free. Including non-PyTorch memory, this process has 79.30 GiB memory in use. Of the allocated memory 77.64 GiB is allocated by PyTorch, and 225.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

## Evaluate Model

In [None]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
log.info("['prompt']: '{query}'")
log.info("['response']:\n\n" + fr"{outputs['0']['raw']}")