# A basic rundown of this repository

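We start by loading the base `config.json`. The `object_hook` turns every JSON object into a `types.SimpleNamespace`, so settings are read as attributes (`config.seed`) rather than as dictionary keys.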

In [1]:
import json
import types

with open("../config.json") as fd:
    config = json.load(fd, object_hook=lambda d: types.SimpleNamespace(**d))
config

namespace(name='nlab-gpt',
          seed=1728,
          tokenizer='data/tokenizer.json',
          special_tokens=['<|startoftext|>', '<|endoftext|>'],
          vocab_size=8192,
          train_data='data/train_data.pt',
          val_data='data/val_data.pt',
          max_position_embeddings=512,
          hidden_size=256,
          intermediate_size=1024,
          num_hidden_layers=8,
          num_attention_heads=4,
          num_key_value_heads=2,
          tie_word_embeddings=True,
          dropout=0.1,
          rms_norm_eps=1e-05,
          per_device_train_batch_size=16,
          learning_rate=0.0003,
          max_steps=40000,
          eval_steps=500,
          max_eval_samples=50,
          runs_dir='runs',
          checkpoint_steps=1000)
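The `object_hook` pattern used above also applies to nested JSON objects. The following is a minimal, self-contained sketch of that behaviour; the JSON string and its `optimizer` key are made up for illustration and are not part of this repository's `config.json`.

```python
import json
import types

# Hypothetical JSON for illustration only, not the repository's config.json.
raw = '{"name": "demo", "hidden_size": 256, "optimizer": {"lr": 0.0003}}'

# object_hook is called for every decoded JSON object, innermost first,
# so nested objects become namespaces as well.
cfg = json.loads(raw, object_hook=lambda d: types.SimpleNamespace(**d))

print(cfg.hidden_size)
print(cfg.optimizer.lr)

# vars() gives back a plain dict of the top-level attributes when one is needed.
print(sorted(vars(cfg)))
```

One consequence of this design is that attributes can be reassigned freely after loading, which is convenient for overriding individual settings from a notebook.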

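We then create a new `Run` and inspect the model with `torchinfo`. The model is switched to evaluation mode for the summary's forward pass and back to training mode afterwards.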

In [2]:
import sys

sys.path.append("../src")
import train

In [3]:
import torch
import torchinfo

run = train.Run(config)

run.model.eval()
info = torchinfo.summary(
    run.model,
    input_size=(
        run.config.per_device_train_batch_size,
        run.config.max_position_embeddings,
    ),
    dtypes=[torch.long],
)
run.model.train()
info

Layer (type:depth-idx)                        Output Shape              Param #
Transformer                                   [16, 512, 8192]           --
├─Embedding: 1-1                              [16, 512, 256]            2,097,152
├─Embedding: 1-2                              [512, 256]                131,072
├─Dropout: 1-3                                [16, 512, 256]            --
├─ModuleList: 1-4                             --                        --
│    └─DecoderLayer: 2-1                      [16, 512, 256]            --
│    │    └─RMSNorm: 3-1                      [16, 512, 256]            256
│    │    └─GroupedQueryAttention: 3-2        [16, 512, 256]            196,608
│    │    └─RMSNorm: 3-3                      [16, 512, 256]            256
│    │    └─FeedForward: 3-4                  [16, 512, 256]            524,288
│    └─DecoderLayer: 2-2                      [16, 512, 256]            --
│    │    └─RMSNorm: 3-5                      [16, 512, 256]           
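The parameter counts in the summary can be checked by hand from the config values. The sketch below assumes bias-free linear layers, four attention projections (query, key, value, output) and a feed-forward block made of two linear layers. These assumptions are inferred from the numbers in the table, not read from the model source. Because the summary is cut off, the total covers only the two embeddings and the eight decoder layers.

```python
# Values copied from config.json above.
vocab_size = 8192
max_position_embeddings = 512
hidden_size = 256
intermediate_size = 1024
num_hidden_layers = 8
num_attention_heads = 4
num_key_value_heads = 2

head_dim = hidden_size // num_attention_heads  # width of one attention head
kv_dim = num_key_value_heads * head_dim        # width shared by the K and V projections

token_embedding = vocab_size * hidden_size
position_embedding = max_position_embeddings * hidden_size
rms_norm = hidden_size  # one scale weight per hidden unit

# Assumption: bias-free Q and output projections (hidden -> hidden)
# plus bias-free K and V projections (hidden -> kv_dim).
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim

# Assumption: two bias-free linear layers (hidden -> intermediate -> hidden).
feed_forward = 2 * hidden_size * intermediate_size

decoder_layer = 2 * rms_norm + attention + feed_forward
total = token_embedding + position_embedding + num_hidden_layers * decoder_layer

print(f"attention:     {attention:,}")     # 196,608
print(f"feed-forward:  {feed_forward:,}")  # 524,288
print(f"decoder layer: {decoder_layer:,}")
print(f"total:         {total:,}")         # 7,999,488
```

The grouped-query attention saving is visible here: with 2 key/value heads instead of 4, the K and V projections are half the size of the Q projection. The total of roughly 8.0M is consistent with the `8.0M` in the run name used later in this notebook, given that `tie_word_embeddings=True` presumably lets the output projection reuse the token embedding weights.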

We create a new `Trainer` for our model, pointing the data paths at the parent directory and cutting the run down to 1000 steps:

In [4]:
import logging

# We disable logging for the notebook
logging.disable(logging.CRITICAL)

# The config paths are relative to the repository root, so we prefix them
# with "../" because this notebook runs from the "notebooks/" directory
run.config.tokenizer = "../data/tokenizer.json"
run.config.train_data = "../data/train_data.pt"
run.config.val_data = "../data/val_data.pt"

# Change the maximum number of steps to something tractable inside the notebook
run.config.max_steps = 1000
run.config.eval_steps = 1000

trainer = train.Trainer(run, device="cuda")
trainer

<train.Trainer at 0x7f1ccef20510>

And now we're all set to run the actual training!

In [5]:
trainer.train(jupyter_notebook=True, no_warmup=True)

Training:   0%|          | 0/1000 [00:00<?, ?steps/s]

Now let's try our model. The helper below prints the prompt and then streams each generated token as it arrives.

In [6]:
import infer


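def generate_tokens(run, text, tokens=128, tokenizer=None):
    """Print `text`, then stream the model's continuation after it."""
    # Fall back to the tokenizer path stored in the run's config
    tokenizer = tokenizer or run.config.tokenizer
    print(text, end="", flush=True)
    for token in infer.generate(run.model, text, tokens=tokens, tokenizer=tokenizer):
        print(token, end="", flush=True)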

In [30]:
generate_tokens(trainer.run, "A simplicial set is")

A simplicial set is be a context of homotopy theory is not to the above in the same in it is a a right is the category of the model theory of the _E: $X$ be a $X$ is a $T$ but $N$.

The category of the equivalence of the sense of $X$ is a $x$, $U$ that $K_1$ is a $c_n$ is a $X$ (the $n$ be a morphism is one.


The closed $F)$ be a $A$ are the $C$ is a $x}$, such that $X$

Now let's try a checkpoint that was trained for much longer and compare its validation loss with our 1000-step model:

In [15]:
best = train.Run.from_file(
    "../runs/nlab-gpt-8.0M-a8525231/nlab-gpt-8.0M-a8525231-39000-best-7.21.pt"
)
print(
    f"Best validation loss: {best.best_validation_loss} | Our undertrained model: {run.best_validation_loss}"
)

Best validation loss: 1.9755235528945922 | Our undertrained model: 4.786942892074585
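To put the two losses on a more intuitive scale, they can be converted to perplexities. This sketch assumes the validation loss is a mean per-token cross-entropy in nats, which the notebook does not state explicitly. Under that assumption the `7.21` at the end of the checkpoint filename matches the perplexity of the best run.

```python
import math

# Validation losses copied from the output above.
best_loss = 1.9755235528945922
undertrained_loss = 4.786942892074585

# Perplexity is the exponential of the mean cross-entropy (in nats).
best_ppl = math.exp(best_loss)
undertrained_ppl = math.exp(undertrained_loss)

print(f"best checkpoint perplexity: {best_ppl:.2f}")  # 7.21
print(f"undertrained perplexity:    {undertrained_ppl:.1f}")
```

Read this way, the long-trained checkpoint is on average about as uncertain as a uniform choice among roughly 7 tokens, while the 1000-step model sits at roughly 120.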


In [28]:
generate_tokens(best, "A simplicial set is", tokenizer="../data/tokenizer.json")

A simplicial set is a simplicial set that satisfies the Segal condition:

* all hom-objects are the morphisms that differ in $A$;

* the morphisms are the morphisms of simplicial sets.

=--

+-- {: .num_remark}
###### Remark

The intrinsic notion of _simplicial complexes_, def. \ref{ASimplicialSet}, is a model for a simplicial set object in the model structure on simplicial sets.

=--

+-- {: .num_remark }
###### Remark

For $X \in \mathbf{H}$ a simplicial set, the simplicial sets $C(\{X\})$ are