# Try nanoGPT in a Jupyter Notebook

Andrej Karpathy explains it in a straightforward way in his video, so I'll give it a try. First, let's install all the required modules.

``` python
!pip install torch numpy transformers datasets tiktoken wandb tqdm
```

``` python
!git clone https://github.com/karpathy/nanogpt
```

```
Cloning into 'nanogpt'...
remote: Enumerating objects: 682, done.
remote: Total 682 (delta 0), reused 0 (delta 0), pack-reused 682
Receiving objects: 100% (682/682), 952.47 KiB | 19.84 MiB/s, done.
Resolving deltas: 100% (385/385), done.
```


``` python
%cd nanogpt
```

```
/content/nanogpt
```


``` python
!python data/shakespeare_char/prepare.py
```

```
length of dataset in characters: 1,115,394
all the unique characters: 
 !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1,003,854 tokens
val has 111,540 tokens
```
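The prepare step turns the text into integer ids one character at a time. Below is a minimal sketch of that idea, not the actual prepare.py: the helper names are hypothetical, and the 90/10 train/val split is an assumption that happens to reproduce the token counts printed above.

``` python
# Minimal sketch of a character-level "prepare" step.
# Hypothetical helpers; not the code of data/shakespeare_char/prepare.py.

def build_vocab(text: str):
    """Return (chars, stoi, itos) for every distinct character in text."""
    chars = sorted(set(text))                     # sorted vocabulary
    stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
    itos = {i: ch for i, ch in enumerate(chars)}  # integer id -> char
    return chars, stoi, itos


def encode(s: str, stoi: dict) -> list:
    return [stoi[c] for c in s]


def decode(ids: list, itos: dict) -> str:
    return "".join(itos[i] for i in ids)


def split_train_val(ids: list, train_frac: float = 0.9):
    """Assumed split: the first train_frac of the ids is training data."""
    n = int(train_frac * len(ids))
    return ids[:n], ids[n:]


if __name__ == "__main__":
    text = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
    chars, stoi, itos = build_vocab(text)
    ids = encode(text, stoi)
    train_ids, val_ids = split_train_val(ids)
    print("vocab size:", len(chars))
    print("train/val tokens:", len(train_ids), len(val_ids))
    assert decode(ids, itos) == text  # the round trip is lossless

    # The same 90/10 rule applied to the full dataset size printed above:
    n_chars = 1_115_394
    print(int(0.9 * n_chars), n_chars - int(0.9 * n_chars))  # 1003854 111540
```

Splitting by position rather than shuffling keeps the validation text a contiguous, unseen chunk of Shakespeare.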


## Error with T4 GPU

On the T4, the `torch.compile` step in train.py (which generates kernels with the Triton compiler) fails and stops the run. To get past it, add these two lines to train.py:

``` python
import torch._dynamo
torch._dynamo.config.suppress_errors = True
```

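It still threw a lot of errors, but with the errors suppressed PyTorch fell back to running the model without compilation (the log says `WON'T CONVERT forward`), and training went ahead. After a little more than an hour, the model was trained.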
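The ptxas messages quoted further down point at the cause: the generated kernels use bf16 instructions, which need compute capability 8.0 (`sm_80`) or higher, while the T4 is a compute capability 7.5 card. A minimal sketch of picking the training dtype from the GPU's capability might look like this; `pick_dtype` is a hypothetical helper, and the 8.0 threshold is taken from the error text.

``` python
# Sketch: choose a training dtype from the GPU's CUDA compute capability.
# Assumption: bf16 kernels need capability 8.0+ (sm_80), as the ptxas errors
# state; a T4 reports (7, 5).

def pick_dtype(capability):
    """Return 'bfloat16' on sm_80 or newer, otherwise 'float16'."""
    major, minor = capability
    return "bfloat16" if (major, minor) >= (8, 0) else "float16"


if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(torch.cuda.get_device_name(0), cap, "->", pick_dtype(cap))
        else:
            print("no CUDA device found")
    except ImportError:
        print("torch is not installed")
```

On the T4 this should report `float16`. The `Overriding: out_dir = ...` line from sample.py shows that nanoGPT accepts `--key=value` overrides; if train.py exposes its dtype setting the same way, `--dtype=float16` would be one way to apply this without editing code. Treat that flag as an assumption to check against train.py; it was not part of the run above.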

``` python
!python train.py config/train_shakespeare_char.py
```

This took 1 h 8 min 4 s on 2024/07/29 on the T4 runtime. GPU RAM peaked at 2.9 GB of 15 GB, and system RAM at 3 GB.

Start:
```
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2663, time 88620.85ms, mfu -100.00%
iter 10: loss 3.2415, time 527.00ms, mfu 0.71%
iter 20: loss 2.7773, time 528.23ms, mfu 0.71%
```

Result:
```
step 5000: train loss 0.6232, val loss 1.7158
iter 5000: loss 0.8169, time 70561.92ms, mfu 0.64%
```

The time per iteration stays roughly constant at about 520 ms on the 16 GB T4, so the estimated total for 5000 iterations is:

``` python
# Time per training iteration, read off the log (about 520 ms on the T4)
cycle_time = 520e-3
# 5000 iterations in total
time_s = 5000 * cycle_time
time_m = time_s / 60
print(f"The estimated time is {int(time_s)} seconds or about {int(time_m)} minutes.")
```

```
The estimated time is 2600 seconds or about 43 minutes.
```
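The cell above counts only the training iterations. A slightly extended sketch also counts the evaluation pauses discussed in the next paragraph. `estimate_seconds` is a hypothetical helper; its default values are read off the log (about 0.52 s per iteration, and about 70 s for the `iter 5000` line, which presumably includes the final evaluation), and it assumes one evaluation at every multiple of 250, step 0 included, as the `step 0` and `step 5000` log lines suggest.

``` python
# Rough wall-clock estimate for the training run, including evaluation pauses.
# Hypothetical helper; default values are read off this notebook's log:
#   iter_s: ~0.52 s per training iteration
#   eval_s: ~70 s per evaluation (the "iter 5000" line took about 70.6 s)
# Assumption: one evaluation at every multiple of eval_interval, step 0 included.

def estimate_seconds(max_iters=5000, iter_s=0.52, eval_interval=250, eval_s=70.0):
    n_evals = max_iters // eval_interval + 1  # steps 0, 250, ..., 5000 -> 21
    train_s = max_iters * iter_s              # the iteration-only estimate
    return train_s + n_evals * eval_s


if __name__ == "__main__":
    total = estimate_seconds()
    print(f"about {total / 60:.0f} minutes ({total / 3600:.2f} hours)")
```

With these defaults it prints about 68 minutes, close to the measured 1 h 8 min; with a rounder 60 s per evaluation it comes to about 64 minutes.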


Here you have it: the estimated time is 43 minutes. But every 250 steps the script pauses to evaluate the train and val loss, and each evaluation takes about 1 minute. Over 5000 iterations that adds about 20 evaluations, or roughly 20 more minutes. On top of that, the initial compilation throws errors, even with the two lines added to train.py. Some of the error messages are:

``` sh
compiling the model... (takes a ~minute)
torch/_dynamo/convert_frame.py:824] WON'T CONVERT forward /content/nanogpt/model.py line 170
torch/_dynamo/convert_frame.py:824] due to:
torch/_dynamo/convert_frame.py:824] Traceback (most recent call last):
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-0254d7, line 636; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 636; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 638; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 638; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 640; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 640; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 642; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-0254d7, line 642; error   : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal   : Ptx assembly aborted due to errors
```

``` python
!python sample.py --out_dir=out-shakespeare-char
```

```
Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...


Clown:
So, who is her lady?

AUTOLYCUS:
A man and raze the gates start that us are very redelieved,
and that then and more than what they are but mildly;
and their eyes against the clutchyard, and now arms
To be pelish to the tenther death.

AUTOLYCUS:
What manners? nor of this death?

Clown:
Be not only so what evils? what thou wilt not denies?--

AUTOLYCUS:
Nor I thrust the prince to shame to all of me;
For I do with the house of the same advance.

Clown:
To help Marcius, Coriolanus!

VIRGILI
---------------

Men please your grace and given home.

LUCIO:
Well, sir, we will not have been so no well one.

DUKE VINCENTIO:
Do you but think you? peace, sir?

ISABELLA:
I take your cousin, if you throw the utmost of your cast:
I content; and now I will die to some court-probation. Have
your will serve your lordship stops to the right
From Irely in his unbrother's nob
```

This is fantastic! And all done in the cloud!
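As a sanity check on the `number of parameters: 10.65M` line printed by sample.py, the count can be reproduced by hand. The hyperparameters below (6 layers, 384-dimensional embeddings, no bias terms) are assumptions taken from nanoGPT's Shakespeare config and defaults rather than from this notebook; only `vocab_size = 65` comes from the prepare step. `count_params` is a hypothetical helper.

``` python
# Hypothetical helper: hand count of a GPT-2-style (nanoGPT-like) model.
# Assumed hyperparameters: n_layer=6, n_embd=384, bias=False (nanoGPT's
# Shakespeare config and defaults); vocab_size=65 from prepare.py's output.

def count_params(n_layer=6, n_embd=384, vocab_size=65, bias=False):
    d = n_embd
    layer_norm = d + (d if bias else 0)  # scale (+ optional shift)
    attn = d * 3 * d + d * d             # q/k/v projection + output projection
    mlp = d * 4 * d + 4 * d * d          # expand to 4*d, project back to d
    if bias:
        attn += 3 * d + d
        mlp += 4 * d + d
    block = 2 * layer_norm + attn + mlp  # two LayerNorms per block
    # Token embedding counted once: it is tied to the output layer.
    # Position embedding deliberately left out of the count.
    return n_layer * block + layer_norm + vocab_size * d  # + final LayerNorm


if __name__ == "__main__":
    n = count_params()
    print(f"number of parameters: {n / 1e6:.2f}M")
```

With these values it prints `number of parameters: 10.65M`, matching sample.py. Counting the tied token embedding once and leaving out the position embedding is what makes the two numbers agree; with bias terms switched on the total would come out slightly higher.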