# Exploring the project

The function `load_encoder_hparams_and_params` is used to download and then load the `encoder`, hyperparameters `hparams`, and model parameters `params`.

In [4]:
from utils import load_encoder_hparams_and_params

encoder, hparams, params = load_encoder_hparams_and_params("124M", "models")

Fetching checkpoint: 1.00kb [00:00, 304kb/s]
Fetching encoder.json: 1.04Mb [00:00, 3.10Mb/s]
Fetching hparams.json: 1.00kb [00:00, 421kb/s]
Fetching model.ckpt.data-00000-of-00001: 498Mb [00:22, 22.3Mb/s]
Fetching model.ckpt.index: 6.00kb [00:00, 1.25Mb/s]
Fetching model.ckpt.meta: 472kb [00:00, 2.08Mb/s]
Fetching vocab.bpe: 457kb [00:00, 1.65Mb/s]


## Encoder

This is the BPE tokenizer used with GPT-2.
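BPE (byte-pair encoding) builds tokens by repeatedly merging the adjacent pair of symbols that was learned earliest during training. Below is a minimal, hypothetical sketch of that merge loop: the `bpe` function and the `toy_ranks` table are made up for illustration, and the real GPT-2 encoder additionally maps raw bytes to unicode characters and splits the text with a regex before applying the merges stored in `vocab.bpe`.

```python
from __future__ import annotations


def bpe(word: str, ranks: dict[tuple[str, str], int]) -> list[str]:
    """Greedily apply the lowest-ranked (earliest-learned) merge until none applies."""
    parts = list(word)  # start from single characters
    while len(parts) > 1:
        pairs = [(parts[i], parts[i + 1]) for i in range(len(parts) - 1)]
        # pick the adjacent pair with the best (lowest) merge rank
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge left to apply
        merged = []
        i = 0
        while i < len(parts):
            if i < len(parts) - 1 and (parts[i], parts[i + 1]) == best:
                merged.append(parts[i] + parts[i + 1])  # fuse the chosen pair
                i += 2
            else:
                merged.append(parts[i])
                i += 1
        parts = merged
    return parts


# Hypothetical merge ranks: a lower rank means the merge was learned earlier
toy_ranks = {("c", "a"): 0, ("ca", "p"): 1, ("e", "s"): 2}
print(bpe("capes", toy_ranks))  # ['cap', 'es']
```

With these toy ranks, `capes` ends up as `cap` + `es`, which mirrors the `Ġcap` / `es` split the real encoder produces below (there the leading space is folded into the first piece).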

In [5]:
ids = encoder.encode("Not all heroes wear capes.")
ids  # token indices

[3673, 477, 10281, 5806, 1451, 274, 13]

In [6]:
encoder.decode(ids)

'Not all heroes wear capes.'

In [7]:
for x in ids:
    print(encoder.decoder[x])

Not
Ġall
Ġheroes
Ġwear
Ġcap
es
.


> Notice, sometimes our tokens are words (e.g. `Not`), sometimes they are words but with a space in front of them (e.g. `Ġall`, the [`Ġ` represents a space](https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/bpe.py#L22-L33)), sometimes they are part of a word (e.g. `capes` is split into `Ġcap` and `es`), and sometimes they are punctuation (e.g. `.`).
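The `Ġ` comes from a byte-to-character table: every one of the 256 byte values gets a printable unicode character, so printable bytes keep their own glyph while whitespace and control bytes are shifted to code points 256 and up. The sketch below re-types that idea (it follows the `bytes_to_unicode` approach used in OpenAI's GPT-2 encoder and in the minGPT file linked above, but treat it as an illustration rather than the library code):

```python
from __future__ import annotations


def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a distinct, printable unicode character."""
    # Bytes that are already printable keep their own character: '!'..'~', '¡'..'¬', '®'..'ÿ'
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, ...) go to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


table = bytes_to_unicode()
print(table[ord(" ")])  # Ġ  (space is byte 32 -> code point 256 + 32 = 288 = U+0120)
print("".join(table[b] for b in " all".encode("utf-8")))  # Ġall
```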

In [8]:
print(f"vocabulary size: {len(encoder.decoder)=}")

vocabulary size: len(encoder.decoder)=50257


## Hyperparameters

In [10]:
from pprint import pprint

pprint(hparams)

{'n_ctx': 1024, 'n_embd': 768, 'n_head': 12, 'n_layer': 12, 'n_vocab': 50257}


These are:

- `n_vocab`: number of tokens in our vocabulary
- `n_ctx`: maximum possible sequence length of the input
- `n_embd`: embedding dimension (determines the "width" of the network)
- `n_head`: number of attention heads (n_embd must be divisible by n_head)
- `n_layer`: number of layers (determines the "depth" of the network)
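These five numbers fix the size of the model. As a rough check on the "124M" in the model name, the sketch below counts weights from `hparams`, assuming the GPT-2 layout: learned token (`wte`) and position (`wpe`) embeddings, per-block attention and MLP projections with a 4x hidden width, two LayerNorms per block plus a final one, and an output layer that reuses `wte` instead of adding new weights. `count_gpt2_params` is a hypothetical helper, not part of `utils`.

```python
def count_gpt2_params(hparams: dict) -> int:
    """Count weights assuming GPT-2's layout (tied output embedding, 4x MLP, 2 LayerNorms per block)."""
    V, T, C, L = hparams["n_vocab"], hparams["n_ctx"], hparams["n_embd"], hparams["n_layer"]
    assert C % hparams["n_head"] == 0, "n_embd must be divisible by n_head"
    layer_norm = 2 * C                              # gain g and bias b
    attn = (C * 3 * C + 3 * C) + (C * C + C)        # q/k/v projection + output projection
    mlp = (C * 4 * C + 4 * C) + (4 * C * C + C)     # expand to 4*C, then project back to C
    block = 2 * layer_norm + attn + mlp
    return V * C + T * C + L * block + layer_norm   # token emb + position emb + blocks + final LN


hparams_124m = {"n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_layer": 12, "n_vocab": 50257}
print(count_gpt2_params(hparams_124m))                 # 124439808
print(hparams_124m["n_embd"] // hparams_124m["n_head"])  # 64 dimensions per attention head
```

Most of the budget sits in the 12 blocks (about 7.1M weights each) and the token embedding (about 38.6M), which is why `n_embd` and `n_layer` are described as the network's width and depth.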

## Parameters

> `params` is a nested JSON dictionary that holds the trained weights of our model.
> The leaf nodes of the JSON are NumPy arrays.
> If we print `params`, replacing the arrays with their shapes, we get:

In [11]:
import numpy as np

def shape_tree(tree):
    if isinstance(tree, np.ndarray):
        return list(tree.shape)
    elif isinstance(tree, list):
        return [shape_tree(v) for v in tree]
    elif isinstance(tree, dict):
        return {k: shape_tree(v) for k, v in tree.items()}
    else:
        raise ValueError(f"unexpected instance type {type(tree)=}")

In [19]:
shape_tree(params);
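The trailing `;` suppresses the cell's output, so the shape tree is not displayed above. When skimming a large tree, a flat list of `path: shape` lines can be easier to read than nested dictionaries. `flatten_shapes` is a hypothetical helper in the same spirit as `shape_tree`, shown here on a small made-up tree:

```python
import numpy as np


def flatten_shapes(tree, prefix=""):
    """Yield (path, shape) for every NumPy leaf in a nested dict/list tree."""
    if isinstance(tree, np.ndarray):
        yield prefix, tree.shape
    elif isinstance(tree, list):
        for i, v in enumerate(tree):  # list positions become path components
            yield from flatten_shapes(v, f"{prefix}/{i}" if prefix else str(i))
    elif isinstance(tree, dict):
        for k, v in tree.items():
            yield from flatten_shapes(v, f"{prefix}/{k}" if prefix else str(k))
    else:
        raise ValueError(f"unexpected instance type {type(tree)=}")


# Made-up toy tree for illustration only
toy = {"wte": np.zeros((5, 3)), "blocks": [{"ln_1": {"g": np.ones(3)}}]}
for path, shape in flatten_shapes(toy):
    print(f"{path}: {shape}")
# wte: (5, 3)
# blocks/0/ln_1/g: (3,)
```

On the real model, `for path, shape in flatten_shapes(params): print(path, shape)` lists every weight on its own line.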

These weights are loaded from the original OpenAI TensorFlow checkpoint.

In [16]:
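import tensorflow as tf

tf_ckpt_path = tf.train.latest_checkpoint("models/124M")
for name, _ in tf.train.list_variables(tf_ckpt_path):
    # GPT-2 stores its projection weights with a leading singleton axis
    # (e.g. [1, 768, 2304]); squeeze() drops it, giving [768, 2304]
    arr = tf.train.load_variable(tf_ckpt_path, name).squeeze()
    # print(f"{name}: {arr.shape}")  # uncomment to list every variable and its shape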


The function below converts the TensorFlow checkpoint variables into the nested dictionary of NumPy arrays, and it is called inside `load_encoder_hparams_and_params`.
This means that I can skip this conversion step when reimplementing the model in TensorFlow!

In [23]:
from utils import load_gpt2_params_from_tf_ckpt

load_gpt2_params_from_tf_ckpt??

[0;31mSignature:[0m [0mload_gpt2_params_from_tf_ckpt[0m[0;34m([0m[0mtf_ckpt_path[0m[0;34m,[0m [0mhparams[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m <no docstring>
[0;31mSource:[0m   
[0;32mdef[0m [0mload_gpt2_params_from_tf_ckpt[0m[0;34m([0m[0mtf_ckpt_path[0m[0;34m,[0m [0mhparams[0m[0;34m)[0m[0;34m:[0m[0;34m[0m
[0;34m[0m    [0;32mdef[0m [0mset_in_nested_dict[0m[0;34m([0m[0md[0m[0;34m,[0m [0mkeys[0m[0;34m,[0m [0mval[0m[0;34m)[0m[0;34m:[0m[0;34m[0m
[0;34m[0m        [0;32mif[0m [0;32mnot[0m [0mkeys[0m[0;34m:[0m[0;34m[0m
[0;34m[0m            [0;32mreturn[0m [0mval[0m[0;34m[0m
[0;34m[0m        [0;32mif[0m [0mkeys[0m[0;34m[[0m[0;36m0[0m[0;34m][0m [0;32mnot[0m [0;32min[0m [0md[0m[0;34m:[0m[0;34m[0m
[0;34m[0m            [0md[0m[0;34m[[0m[0mkeys[0m[0;34m[[0m[0;36m0[0m[0;34m][0m[0;34m][0m [0;34m=[0m [0;34m{[0m[0;34m}[0m[0;34m[0m
[0;34m[0m        [0md[0

In [20]:
np_params = load_gpt2_params_from_tf_ckpt(tf_ckpt_path, hparams)
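Since `load_encoder_hparams_and_params` calls this same converter on the same checkpoint, we would expect `np_params` to match `params` exactly. A minimal way to check that, using a hypothetical `trees_equal` helper that walks both trees together:

```python
import numpy as np


def trees_equal(a, b) -> bool:
    """True if two nested dict/list trees have the same structure and identical array leaves."""
    if isinstance(a, np.ndarray) and isinstance(b, np.ndarray):
        return np.array_equal(a, b)  # compares shapes and values
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(trees_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(trees_equal(x, y) for x, y in zip(a, b))
    return False  # mismatched node types


# Toy trees for illustration
x = {"w": np.ones((2, 3)), "blocks": [{"b": np.zeros(3)}]}
y = {"w": np.ones((2, 3)), "blocks": [{"b": np.zeros(3)}]}
print(trees_equal(x, y))  # True
```

In the notebook, `trees_equal(params, np_params)` would run the same comparison on the real weights.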