# `nanoGPT`: GPT-2 Large (775M Params)

## Install / Setup

### First Time Running

We need to install `ngpt` and set up the Shakespeare dataset.

This only needs to be run the first time you run this notebook.

After running

```python
!python3 -m pip install nanoGPT
```

you will need to restart your runtime (Runtime -> Restart runtime).

After restarting, you should be able to import `ngpt`:

```python
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
```

In [1]:
%%bash

python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'

if [[ $? -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 nanoGPT/data/shakespeare_char/prepare.py
    python3 -m pip install -e nanoGPT -vvv
fi

/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.
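
The same "is it installed?" check can also be done from Python without spawning a subprocess. The snippet below is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper name, not part of `ngpt`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    # find_spec returns None when no finder can locate the module
    return importlib.util.find_spec(module_name) is not None


if is_importable("ngpt"):
    print("Has ngpt installed. Nothing to do.")
else:
    print("Does not have ngpt installed. Run the install cell above.")
```

Unlike `import ngpt`, `find_spec` only locates the package and does not execute it, so the check stays cheap even when the package pulls in heavy dependencies.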


## Post Install

If `ngpt` was installed correctly, you should be able to import it:

```python
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
```

In [2]:
%load_ext autoreload
%autoreload 2

import ngpt
from enrich import get_logger
log = get_logger('jupyter')
log.info(ngpt.__file__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[2023-11-30 07:43:58][INFO][3434626787.py:7] - /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py


## Build Trainer

Explicitly, we:

1. `setup_torch(...)`
2. Build `cfg: DictConfig = get_config(...)`
3. Instantiate `config: ExperimentConfig = instantiate(cfg)`
4. Build `trainer = Trainer(config)`

In [3]:
import os
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['MASTER_PORT'] = '5631'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    [
        'data=owt',
        'model=gpt2_large',
        'model.block_size=128',
        'optimizer=gpt2_large',
        'train=gpt2_large',
        'train.init_from=gpt2-large',
        'train.max_iters=1000',
        'train.dtype=bfloat16',
    ]
)
config = instantiate(cfg)
trainer = Trainer(config)

--------------------------------------------------------------------------

  Local host:   thetagpu23
  Local device: mlx5_0
--------------------------------------------------------------------------


[2023-11-30 07:44:02][INFO][configs.py:72] - Setting HF_DATASETS_CACHE to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/.cache/huggingface/datasets
Failed to download font: Source Sans Pro, skipping!
Failed to download font: Titillium WebRoboto Condensed, skipping!
[2023-11-30 07:44:05][INFO][configs.py:295] - Loading val from /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/data/openwebtext/val.bin
[2023-11-30 07:44:05][INFO][configs.py:295]
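
The strings passed to `get_config(...)` look like Hydra-style overrides. Entries without a dot (`data=owt`, `model=gpt2_large`) appear to select a named config, while dotted entries (`model.block_size=128`) override a single field inside it. The sketch below only illustrates how the dotted entries map onto a nested config; `parse_overrides` is a hypothetical helper written for this example and is not how Hydra or `ngpt` implement it.

```python
from typing import Any


def parse_overrides(overrides: list[str]) -> dict[str, Any]:
    """Turn 'a.b=value' strings into a nested dict (values kept as strings)."""
    nested: dict[str, Any] = {}
    for item in overrides:
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = nested
        for name in parents:
            # Walk down, creating intermediate levels as needed
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


# Only the dotted (field-level) overrides from the cell above
field_overrides = [
    "model.block_size=128",
    "train.init_from=gpt2-large",
    "train.max_iters=1000",
    "train.dtype=bfloat16",
]
parsed = parse_overrides(field_overrides)
print(parsed["model"])  # {'block_size': '128'}
print(parsed["train"])
```

In the real pipeline the values are type-converted and merged into the selected config before `instantiate(cfg)` is called; the sketch skips both steps.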

## Prompt (prior to training)

In [4]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
log.info(f"['prompt']: '{query}'")
log.info("['response']:\n\n" + fr"{outputs['0']['raw']}")

[2023-11-30 07:44:53][INFO][1657463709.py:3] - ['prompt']: 'What is a supercomputer?'
[2023-11-30 07:44:53][INFO][1657463709.py:4] - ['response']:

What is a supercomputer? A supercomputer is a computer that can run more than one hundred thousand instructions per second. It's basically like a computer that can simulate a human brain. So, if we had a supercomputer with a billion instructions per second, that's forty-seven times as fast as the brain itself.

A supercomputer is something that we can invent and code and build. But we're not going to have one, because we're not going to have a supercomputer. There are some th

## Train

In [5]:
trainer.train()

  0%|          | 0/1000 [00:00<?, ?it/s]

[2023-11-30 07:46:25][INFO][trainer.py:518] - step=100 loss=2.422 dt=563.654 sps=1.774 mtps=0.001 mfu=36.352 train_loss=2.662 val_loss=2.654
[2023-11-30 07:47:23][INFO][trainer.py:518] - step=200 loss=2.877 dt=575.121 sps=1.739 mtps=0.001 mfu=36.280 train_loss=2.662 val_loss=
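
To read these log lines programmatically, the `name=value` pairs can be pulled out with a regular expression. The sketch below uses a hypothetical helper, `parse_metrics`, and assumes that `dt` is the time per step in milliseconds and `sps` is steps per second. That reading is an inference from the numbers (1000 / 563.654 is about 1.774), not something the log states.

```python
import re


def parse_metrics(line: str) -> dict[str, float]:
    """Extract every `name=number` pair from a cleaned log line."""
    pairs = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in pairs}


line = (
    "step=100 loss=2.422 dt=563.654 sps=1.774 mtps=0.001 "
    "mfu=36.352 train_loss=2.662 val_loss=2.654"
)
metrics = parse_metrics(line)

# Assumption: dt is in milliseconds, so steps/second = 1000 / dt
steps_per_second = 1000.0 / metrics["dt"]
print(f"{steps_per_second:.3f}")  # 1.774, matching the logged sps
```

The same relation holds for the step-200 line (1000 / 575.121 is about 1.739), which supports the millisecond interpretation.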

## Evaluate Model

In [7]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
log.info(f"['prompt']: '{query}'")
log.info("['response']:\n\n" + fr"{outputs['0']['raw']}")

[2023-11-30 08:00:05][INFO][1657463709.py:3] - ['prompt']: 'What is a supercomputer?'
[2023-11-30 08:00:05][INFO][1657463709.py:4] - ['response']:

What is a supercomputer? There is no place on Earth where they can make “a supercomputer.” It’s called Deep Space Computer, or DSC, and only the U.S. military has it.

What kind of computer can you build? DSC lets us take an image, an equation, a concept, and manipulate it in ways that are orders of magnitude more powerful than any human, and could one day outpace us. We use the computer to make scientific breakthroughs, and to make our everyday lives more complex and interes