# `nanoGPT`

## Install / Setup

### First Time Running

We need to install `ngpt` and set up the Shakespeare dataset.

This only needs to be run the first time you use this notebook.

After running

```python
!python3 -m pip install nanoGPT
```

you will need to restart your runtime (Runtime -> Restart runtime).

After this, you should be able to run:

```python
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
```

In [1]:
%%bash

python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'

if [[ $? -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 nanoGPT/data/shakespeare_char/prepare.py
    python3 -m pip install -e nanoGPT -vvv
fi

/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.


## Post Install

If installed correctly, you should be able to run:

```python
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
```
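The same check can be made without raising `ImportError` when the package is missing. The following is a minimal sketch using only the standard library; `find_package_file` is a hypothetical helper name introduced here for illustration and is not part of `ngpt`.

```python
from importlib.util import find_spec
from typing import Optional


def find_package_file(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if it is missing."""
    spec = find_spec(name)  # None when a top-level module cannot be found
    if spec is None:
        return None
    return spec.origin  # for a regular package this is the path to its __init__.py


location = find_package_file("ngpt")
print(location if location else "ngpt is not installed")
```

If this prints a path ending in `src/ngpt/__init__.py`, the editable install is visible to the current interpreter.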

In [2]:
%load_ext autoreload
%autoreload 2

import ngpt
from rich import print
print(ngpt.__file__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Build Trainer

Explicitly, we:

1. `setup_torch(...)`
2. Build `cfg: DictConfig = get_config(...)`
3. Instantiate `config: ExperimentConfig = instantiate(cfg)`
4. Build `trainer = Trainer(config)`

In [3]:
import os
import numpy as np
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer
from enrich.console import get_console

console = get_console()
HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['MASTER_PORT'] = '5432'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'optimizer=shakespeare',
        'train=shakespeare',
        'train.dtype=bfloat16',
        'train.max_iters=5000',
        'train.log_interval=250',
        'train.eval_interval=1000',
    ]
)
config = instantiate(cfg)
trainer = Trainer(config)

--------------------------------------------------------------------------

  Local host:   thetagpu23
  Local device: mlx5_0
--------------------------------------------------------------------------
2023-11-15 09:28:25.224857: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


[2023-11-15 09:28:27][INFO][configs.py:263] - Rescaling GAS -> GAS // WORLD_SIZE = 1 // 1
[2023-11-15 09:28:27][INFO][configs.py:398] - Tokens per iteration: 16,384
[2023-11-15 09:28:28][INFO][configs.py:430] - Using <torch.amp.autocast_mode.autocast object at 0x7f94e170a230>
[2023-11-15 09:28:28][INFO][configs.py:436] - Initializing a new model from scratch
[2023-11-15 09:28:28][INFO][trainer.py:179] - Initializing a new model from scratch
[38;2;131;131;131m[2023-11-15 09:28:28][0m[34

[2023-11-11 01:15:48][INFO][configs.py:436] - Initializing a new model from scratch
[2023-11-11 01:15:48][INFO][trainer.py:179] - Initializing a new model from scratch
[2023-11-11 01:15:48][INFO][model.py:160] - number of parameters: 10.65M
[2023-11-11 01:15:50][INFO][model.py:290] - num decayed parameter tensors: 26, with 10,740,096 parameters
[2023-11-11 01:15:50][INFO][model.py:291] - num non-decayed parameter tensors: 13, with 4,992 parameters
[2023-11-11 01:15:50][INFO][model.py:297] - using fused AdamW: True


## Prompt (**prior** to training)

In [5]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
console.print(fr'\[prompt]: "{query}"')
console.print("\\[response]:\n\n" + fr"{outputs['0']['raw']}")

## Train Model

Legend:

|  name  |         description           |
|:------:|:-----------------------------:|
| `step` | Current training step         |
| `loss` | Loss value                    |
| `dt`   | Time per step (in **ms**)     |
| `sps`  | Samples per second            |
| `mtps` | Millions of tokens per second |
| `mfu`  | Model FLOPs utilization[^1]   |

[^1]: in units of A100 `bfloat16` peak FLOPS
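The throughput columns appear to follow directly from `dt`. The sketch below reproduces the first logged row under the assumption that `sps = 1000 / dt` and `mtps = sps * tokens_per_iter / 1e6`, with `tokens_per_iter = 16384` taken from the "Tokens per iteration" log line. These relations are inferred from the logged numbers, not taken from the `ngpt` source, and `throughput` is a hypothetical helper name.

```python
from typing import Tuple


def throughput(dt_ms: float, tokens_per_iter: int) -> Tuple[float, float]:
    """Derive (sps, mtps) from the time per step, under the assumptions stated above."""
    sps = 1000.0 / dt_ms                 # steps completed per second
    mtps = sps * tokens_per_iter / 1e6   # millions of tokens processed per second
    return sps, mtps


# Values from the step=250 log line: dt=26.551 ms, 16,384 tokens per iteration.
sps, mtps = throughput(dt_ms=26.551, tokens_per_iter=16_384)
print(f"sps={sps:.3f} mtps={mtps:.3f}")  # prints "sps=37.663 mtps=0.617"
```

Both values agree with the logged `sps=37.663` and `mtps=0.617` for that step.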

In [6]:
trainer.train()

  0%|          | 0/5000 [00:00<?, ?it/s]

[2023-11-15 09:29:20][INFO][trainer.py:516] - step=250 loss=2.049 dt=26.551 sps=37.663 mtps=0.617 mfu=14.034 train_loss=4.299 val_loss=4.291
[2023-11-15 09:29:27][INFO][trainer.py:516] - step=500 loss=1.612 dt=26.721 sps=37.424 mtps=0.613 mfu=14.025 train_loss=4.299 val_loss=4.291

[2023-11-11 01:16:20][INFO][trainer.py:516] - step=750 loss=1.418 dt=27.394 sps=36.505 mtps=0.598 mfu=13.619 train_loss=4.299 val_loss=4.291
[2023-11-11 01:16:27][INFO][trainer.py:516] - step=1000 loss=1.332 dt=26.899 sps=37.176 mtps=0.609 mfu=13.642 train_loss=4.299 val_loss=4.291
[2023-11-11 01:16:34][INFO][trainer.py:516] - step=1250 loss=1.277 dt=27.229 sps=36.725 mtps=0.602 mfu=13.647 train_loss=4.299 val_loss=4.291
[2023-11-11 01:16:40][INFO][trainer.py:516] - step=1500 loss=1.234 dt=26.878 sps=37.205 mtps=0.610 mfu=13.668 train_loss=4.299 val_loss=4.291
[2023-11-11 01:16:47][INFO][trainer.py:516] - step=1750 loss=1.175 dt=27.460 sps=36.417 mtps=0.597 mfu=13.659 train_loss=4.299 val_loss=4.291
[2023-11-11 01:16:54][INFO][trainer.py:516] - step=2000 loss=1.140 dt=26.889 sps=37.190 mtps=0.609 mfu=13.678 train_loss=4.299 val_loss=4.291
[2023-11-11 01:16:58][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt
[2023-11-11 01:16:58][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/model.pth
[2023-11-11 01:16:58][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-11 01:17:05][INFO][trainer.py:516] - step=2250 loss=1.121 dt=27.308 sps=36.619 mtps=0.600 mfu=13.675 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:12][INFO][trainer.py:516] - step=2500 loss=1.067 dt=26.838 sps=37.261 mtps=0.610 mfu=13.696 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:19][INFO][trainer.py:516] - step=2750 loss=1.034 dt=27.360 sps=36.550 mtps=0.599 mfu=13.688 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:26][INFO][trainer.py:516] - step=3000 loss=1.009 dt=26.237 sps=38.114 mtps=0.624 mfu=13.740 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:33][INFO][trainer.py:516] - step=3250 loss=0.940 dt=26.991 sps=37.050 mtps=0.607 mfu=13.746 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:39][INFO][trainer.py:516] - step=3500 loss=0.947 dt=26.261 sps=38.080 mtps=0.624 mfu=13.791 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:46][INFO][trainer.py:516] - step=3750 loss=0.885 dt=37.216 sps=26.870 mtps=0.440 mfu=13.413 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:53][INFO][trainer.py:516] - step=4000 loss=0.866 dt=26.241 sps=38.108 mtps=0.624 mfu=13.492 train_loss=1.050 val_loss=1.474
[2023-11-11 01:17:57][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt
[2023-11-11 01:17:57][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/model.pth
[2023-11-11 01:17:57][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-11 01:18:04][INFO][trainer.py:516] - step=4250 loss=0.847 dt=27.228 sps=36.728 mtps=0.602 mfu=13.511 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:11][INFO][trainer.py:516] - step=4500 loss=0.835 dt=26.215 sps=38.147 mtps=0.625 mfu=13.581 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:18][INFO][trainer.py:516] - step=4750 loss=0.822 dt=26.657 sps=37.513 mtps=0.615 mfu=13.621 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:24][INFO][trainer.py:516] - step=5000 loss=0.808 dt=26.635 sps=37.544 mtps=0.615 mfu=13.658 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:31][INFO][trainer.py:516] - step=5250 loss=0.811 dt=26.267 sps=38.071 mtps=0.624 mfu=13.711 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:38][INFO][trainer.py:516] - step=5500 loss=0.769 dt=26.406 sps=37.870 mtps=0.620 mfu=13.751 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:44][INFO][trainer.py:516] - step=5750 loss=0.780 dt=26.239 sps=38.111 mtps=0.624 mfu=13.796 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:51][INFO][trainer.py:516] - step=6000 loss=0.767 dt=26.682 sps=37.478 mtps=0.614 mfu=13.813 train_loss=0.696 val_loss=1.637
[2023-11-11 01:18:55][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt
[2023-11-11 01:18:55][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/model.pth
[2023-11-11 01:18:56][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-11 01:19:02][INFO][trainer.py:516] - step=6250 loss=0.773 dt=31.104 sps=32.151 mtps=0.527 mfu=13.629 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:09][INFO][trainer.py:516] - step=6500 loss=0.759 dt=27.142 sps=36.843 mtps=0.604 mfu=13.639 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:16][INFO][trainer.py:516] - step=6750 loss=0.753 dt=26.712 sps=37.437 mtps=0.613 mfu=13.670 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:22][INFO][trainer.py:516] - step=7000 loss=0.745 dt=26.871 sps=37.215 mtps=0.610 mfu=13.690 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:29][INFO][trainer.py:516] - step=7250 loss=0.733 dt=26.266 sps=38.072 mtps=0.624 mfu=13.740 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:36][INFO][trainer.py:516] - step=7500 loss=0.723 dt=26.817 sps=37.289 mtps=0.611 mfu=13.755 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:43][INFO][trainer.py:516] - step=7750 loss=0.747 dt=26.461 sps=37.791 mtps=0.619 mfu=13.788 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:49][INFO][trainer.py:516] - step=8000 loss=0.729 dt=29.348 sps=34.074 mtps=0.558 mfu=13.679 train_loss=0.556 val_loss=1.755
[2023-11-11 01:19:53][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt
[2023-11-11 01:19:53][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/model.pth
[2023-11-11 01:19:54][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-11 01:20:01][INFO][trainer.py:516] - step=8250 loss=0.718 dt=26.464 sps=37.787 mtps=0.619 mfu=13.719 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:07][INFO][trainer.py:516] - step=8500 loss=0.705 dt=27.051 sps=36.967 mtps=0.606 mfu=13.725 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:14][INFO][trainer.py:516] - step=8750 loss=0.704 dt=26.298 sps=38.026 mtps=0.623 mfu=13.769 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:21][INFO][trainer.py:516] - step=9000 loss=0.694 dt=27.131 sps=36.858 mtps=0.604 mfu=13.766 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:27][INFO][trainer.py:516] - step=9250 loss=0.700 dt=26.291 sps=38.036 mtps=0.623 mfu=13.806 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:34][INFO][trainer.py:516] - step=9500 loss=0.668 dt=27.353 sps=36.560 mtps=0.599 mfu=13.788 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:41][INFO][trainer.py:516] - step=9750 loss=0.658 dt=26.422 sps=37.847 mtps=0.620 mfu=13.819 train_loss=0.473 val_loss=1.840
[2023-11-11 01:20:48][INFO][trainer.py:516] - step=10000 loss=0.678 dt=26.887 sps=37.192 mtps=0.609 mfu=13.823 train_loss=0.473 val_loss=1.840
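
To inspect these metrics programmatically, the `key=value` pairs in each log line can be parsed into a dictionary. The following is a minimal sketch using only the standard library; `parse_metrics` and the two regular expressions are hypothetical helpers written for this notebook's log format and are not part of `ngpt`. Terminal color codes such as `[35m` are stripped first, so the parser works on both colored and plain lines.

```python
import re
from typing import Dict

# Color/style codes such as "[35m" or "[0m", with or without the leading ESC byte.
ANSI_STYLE = re.compile(r"\x1b?\[[0-9;]+m")
# Pairs such as "step=2250" or "val_loss=1.474".
KEY_VALUE = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_metrics(line: str) -> Dict[str, float]:
    """Extract every numeric key=value pair from one training log line."""
    clean = ANSI_STYLE.sub("", line)
    return {key: float(value) for key, value in KEY_VALUE.findall(clean)}


# Two lines copied from the training output above.
lines = [
    "[2023-11-11 01:17:05][INFO][trainer.py:516] - step=2250 loss=1.121 dt=27.308 "
    "sps=36.619 mtps=0.600 mfu=13.675 train_loss=1.050 val_loss=1.474",
    "[2023-11-11 01:20:01][INFO][trainer.py:516] - step=8250 loss=0.718 dt=26.464 "
    "sps=37.787 mtps=0.619 mfu=13.719 train_loss=0.473 val_loss=1.840",
]

for line in lines:
    m = parse_metrics(line)
    gap = m["val_loss"] - m["train_loss"]
    print(f"step={int(m['step'])} train_loss={m['train_loss']:.3f} "
          f"val_loss={m['val_loss']:.3f} gap={gap:.3f}")
```

In the logged run, `train_loss` falls from 1.050 to 0.473 between these two evaluations while `val_loss` rises from 1.474 to 1.840, which is consistent with the model overfitting this small dataset as training continues.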


## Evaluate Model

In [7]:
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
console.print(fr'\[prompt]: "{query}"')
console.print("\\[response]:\n\n" + fr"{outputs['0']['raw']}")