# Welcome to Modal notebooks!

Write Python code and collaborate in real time. Your code runs in Modal's
**serverless cloud**, and anyone in the same workspace can join.

This notebook comes with some common Python libraries installed. Run
cells with `Shift+Enter`.

## 1. Setting up the environment

In [1]:
# Clone the repository
!git clone https://github.com/karpathy/nanochat.git
%cd nanochat

Cloning into 'nanochat'...
remote: Enumerating objects: 989, done.

In [2]:
# Set the base directory for artifacts
import os
os.environ['NANOCHAT_BASE_DIR'] = '/content/nanochat_data'
!mkdir -p $NANOCHAT_BASE_DIR

# Install uv package manager
!curl -LsSf https://astral.sh/uv/install.sh | sh

# Add uv and cargo to the PATH
os.environ['PATH'] = f"/root/.cargo/bin:/root/.local/bin:{os.environ['PATH']}"

# Create a virtual environment and install dependencies
!uv venv
!uv sync
print('-'*40)
print('Environment Set')

# Install Rust/Cargo for the tokenizer
!curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
print('-'*40)
print('Installed Rust/Cargo')

downloading uv 0.9.24 x86_64-unknown-linux-gnu
no checksums to verify
installing to /root/.local/bin
  uv
  uvx
everything's installed!

To add $HOME/.local/bin to your PATH, either restart your shell or run:

    source $HOME/.local/bin/env (sh, bash, zsh)
    source $HOME/.local/bin/env.fish (fish)
WARN: The following commands are shadowed by other commands in your PATH: uv uvx
cpython-3.10.19-linux-x86_64-gnu (download) 27.84 MiB

## 3. Tokenizer

In [4]:
from huggingface_hub import hf_hub_download
# Download the specific file and get its local file path
tokenizer_pkl_path = hf_hub_download(repo_id="karpathy/nanochat-d34", filename="tokenizer.pkl")
tokenizer_pt_path = hf_hub_download(repo_id="karpathy/nanochat-d34", filename="token_bytes.pt")

tokenizer.pkl:   0%|          | 0.00/846k [00:00<?, ?B/s]

token_bytes.pt:   0%|          | 0.00/264k [00:00<?, ?B/s]

In [14]:
tokenizer_pkl_path

'/root/.cache/huggingface/hub/models--karpathy--nanochat-d34/snapshots/c48357d43863a3a6cdc5f5db5b4ec5964e4192d6/tokenizer.pkl'

In [5]:
# Copy the tokenizer files into the folder nanochat expects
!mkdir -p /content/nanochat_data/tokenizer
!cp {tokenizer_pkl_path} /content/nanochat_data/tokenizer/tokenizer.pkl
!cp {tokenizer_pt_path} /content/nanochat_data/tokenizer/token_bytes.pt
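
The same staging step can be written in plain Python, which also works outside a notebook. This is a minimal sketch: `stage_file` is a hypothetical helper name, not part of nanochat or `huggingface_hub`, and it assumes the source path is whatever `hf_hub_download` returned.

```python
import os
import shutil


def stage_file(src, dest_dir, name=None):
    """Copy src into dest_dir (created if missing) and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Keep the original file name unless the caller asks for a different one.
    dest = os.path.join(dest_dir, name or os.path.basename(src))
    shutil.copy(src, dest)
    return dest
```

For example, `stage_file(tokenizer_pkl_path, '/content/nanochat_data/tokenizer')` would place `tokenizer.pkl` in the tokenizer folder used above.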

## 4. Downloading SFT-Trained Weights

Midtraining and supervised fine-tuning (SFT) take roughly 2 hours or more, so this notebook downloads an already fine-tuned checkpoint instead of training one.

In [6]:
# Download the specific file and get its local file path
d34_sft_gpt_model_path = hf_hub_download(repo_id="renatocastro33/nanochat-d34-sft", filename="chatsft_checkpoints/d34/model_000669.pt")
d34_sft_gpt_json_path = hf_hub_download(repo_id="renatocastro33/nanochat-d34-sft", filename="chatsft_checkpoints/d34/meta_000669.json")

chatsft_checkpoints/d34/model_000669.pt:   0%|          | 0.00/8.58G [00:00<?, ?B/s]

meta_000669.json:   0%|          | 0.00/260 [00:00<?, ?B/s]

In [7]:
d34_sft_gpt_model_path

'/root/.cache/huggingface/hub/models--renatocastro33--nanochat-d34-sft/snapshots/8e0266180c5c395b6f53e6625347370e2db645e7/chatsft_checkpoints/d34/model_000669.pt'

In [8]:
# Create the checkpoint folder (including parents) in one step
!mkdir -p '/content/nanochat_data/chatsft_checkpoints/d34'

In [10]:
!cp '/root/.cache/huggingface/hub/models--renatocastro33--nanochat-d34-sft/snapshots/8e0266180c5c395b6f53e6625347370e2db645e7/chatsft_checkpoints/d34/model_000669.pt' '/content/nanochat_data/chatsft_checkpoints/d34/model_000669.pt'
!cp '/root/.cache/huggingface/hub/models--renatocastro33--nanochat-d34-sft/snapshots/8e0266180c5c395b6f53e6625347370e2db645e7/chatsft_checkpoints/d34/meta_000669.json' '/content/nanochat_data/chatsft_checkpoints/d34/meta_000669.json'
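
Before running inference, it can help to confirm that the checkpoint folder holds a matching model and metadata pair. The sketch below is only a sanity check based on the file names downloaded above (`model_<step>.pt` and `meta_<step>.json`). `find_checkpoint_steps` is a hypothetical helper, not nanochat's own loader.

```python
import os
import re


def find_checkpoint_steps(checkpoint_dir):
    """Return the sorted steps that have both model_<step>.pt and meta_<step>.json."""
    names = set(os.listdir(checkpoint_dir))
    steps = []
    for name in names:
        match = re.fullmatch(r"model_(\d+)\.pt", name)
        # Only count a step when its metadata file is present as well.
        if match and f"meta_{match.group(1)}.json" in names:
            steps.append(int(match.group(1)))
    return sorted(steps)
```

Calling `find_checkpoint_steps('/content/nanochat_data/chatsft_checkpoints/d34')` after the copy should list step 669 if both files arrived.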

## 5. Inference

In [11]:
# Chat with the model via the Command Line Interface (CLI)
!bash -c 'source .venv/bin/activate && python -m scripts.chat_cli -p "Why the sky is blue?"'

Autodetected device type: cuda
2026-01-13 00:27:43,076 - nanochat.common - INFO - Distributed world size: 1
2026-01-13 00:27:43,146 - nanochat.checkpoint_manager - INFO - No model tag provided, guessing model tag: d34
2026-01-13 00:27:43,147 - nanochat.checkpoint_manager - INFO - Loading model from /content/nanochat_data/chatsft_checkpoints/d34 with step 669
2026-01-13 00:27:47,968 - nanochat.checkpoint_manager - INFO - Building model with config: {'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 34, 'n_head': 17, 'n_kv_head': 17, 'n_embd': 2176, 'window_pattern': 'L'}

NanoChat Interactive Mode
--------------------------------------------------
Type 'quit' or 'exit' to end the conversation
Type 'clear' to start a new conversation
--------------------------------------------------

Assistant: The sky appears blue due to a phenomenon known as Rayleigh scattering, named after the British physicist Lord Rayleigh. When su
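
The config printed in the log gives a feel for the model's size. The sketch below derives the per-head dimension and a rough parameter estimate. The estimate assumes a standard transformer layout (4x MLP expansion, no biases, untied input and output embeddings), which may not match nanochat's actual architecture, so treat the number as a ballpark only.

```python
# Values copied from the "Building model with config" log line.
config = {
    'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 34,
    'n_head': 17, 'n_kv_head': 17, 'n_embd': 2176,
}

# Each attention head works on an equal slice of the embedding.
head_dim = config['n_embd'] // config['n_head']


def rough_param_count(cfg):
    """Ballpark parameter count under the assumptions stated above."""
    d = cfg['n_embd']
    per_layer = 12 * d * d                  # 4*d^2 attention + 8*d^2 MLP (assumed)
    embeddings = 2 * cfg['vocab_size'] * d  # assumed untied embedding and output head
    return cfg['n_layer'] * per_layer + embeddings


print(head_dim)
print(f"{rough_param_count(config) / 1e9:.2f}B parameters (rough estimate)")
```

Under these assumptions the head dimension is 128 and the estimate comes out at about 2.2 billion parameters.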

In [18]:
# Chat with the model via the Command Line Interface (CLI)
!bash -c 'source .venv/bin/activate && python -m scripts.chat_cli -p "If x = 3, how is 3*x + 5?" --temperature 0.3'

Autodetected device type: cuda
2026-01-13 00:30:01,224 - nanochat.common - INFO - Distributed world size: 1
2026-01-13 00:30:01,249 - nanochat.checkpoint_manager - INFO - No model tag provided, guessing model tag: d34
2026-01-13 00:30:01,249 - nanochat.checkpoint_manager - INFO - Loading model from /content/nanochat_data/chatsft_checkpoints/d34 with step 669
2026-01-13 00:30:05,898 - nanochat.checkpoint_manager - INFO - Building model with config: {'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 34, 'n_head': 17, 'n_kv_head': 17, 'n_embd': 2176, 'window_pattern': 'L'}

NanoChat Interactive Mode
--------------------------------------------------
Type 'quit' or 'exit' to end the conversation
Type 'clear' to start a new conversation
--------------------------------------------------

Assistant: To evaluate 3*x + 5, we need to follow the order of operations, which states that we should perform any operations inside paren

In [19]:
# Chat with the model via the Command Line Interface (CLI)
!bash -c 'source .venv/bin/activate && python -m scripts.chat_cli -p "Calculate 5*x + 3 = 13, then x is equal to ?" --temperature 0.2'

Autodetected device type: cuda
2026-01-13 00:30:23,492 - nanochat.common - INFO - Distributed world size: 1
2026-01-13 00:30:23,527 - nanochat.checkpoint_manager - INFO - No model tag provided, guessing model tag: d34
2026-01-13 00:30:23,527 - nanochat.checkpoint_manager - INFO - Loading model from /content/nanochat_data/chatsft_checkpoints/d34 with step 669
2026-01-13 00:30:28,080 - nanochat.checkpoint_manager - INFO - Building model with config: {'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 34, 'n_head': 17, 'n_kv_head': 17, 'n_embd': 2176, 'window_pattern': 'L'}

NanoChat Interactive Mode
--------------------------------------------------
Type 'quit' or 'exit' to end the conversation
Type 'clear' to start a new conversation
--------------------------------------------------

Assistant: To solve for x, we need to isolate x on one side of the equation. We can start by subtracting 3 from both sides of the equation

## 6. Deploying with ngrok

In [56]:
%uv pip install pyngrok

Using Python 3.12.6 environment at: /usr/local
Resolved 2 packages in 57ms

In [65]:
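import os
from pyngrok import ngrok
# Read the ngrok authtoken from an environment variable rather than hardcoding it
# in the notebook (set NGROK_AUTHTOKEN before running this cell)
authtoken = os.environ["NGROK_AUTHTOKEN"]
ngrok.set_auth_token(authtoken)
public_url = ngrok.connect(8000)
print(f'Click to access the Web UI: {public_url}')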

Click to access the Web UI: NgrokTunnel: "https://6c0b4c788d7f.ngrok-free.app" -> "http://localhost:8000"


t=2026-01-11T20:47:47+0000 lvl=warn msg="failed to open private leg" id=f7266d9824a8 privaddr=localhost:8000 err="dial tcp 127.0.0.1:8000: connect: connection refused"
t=2026-01-11T20:56:10+0000 lvl=warn msg="failed to open private leg" id=614517422a5a privaddr=localhost:8000 err="dial tcp 127.0.0.1:8000: connect: connection refused"
t=2026-01-11T23:18:38+0000 lvl=warn msg="failed to open private leg" id=8a2ad1aa6ab5 privaddr=localhost:8000 err="dial tcp 127.0.0.1:8000: connect: connection refused"
t=2026-01-11T23:23:04+0000 lvl=warn msg="failed to open private leg" id=60e659b50b3a privaddr=localhost:8000 err="dial tcp 127.0.0.1:8000: connect: connection refused"
t=2026-01-11T23:35:58+0000 lvl=warn msg="Stopping forwarder" name=http-8000-6b56b689-738e-4f96-b807-7b8b524e8667 acceptErr="failed to accept connection: Listener closed"
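
The `connection refused` warnings most likely mean that nothing was listening on `localhost:8000` when requests came through the tunnel, since the web server is started in a later cell. One way to avoid this is to wait until the port accepts connections before sharing the public URL. The sketch below uses only the standard library. `wait_for_port` is a hypothetical helper, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

With the web server running in the background, `wait_for_port('localhost', 8000)` could be called before `ngrok.connect(8000)` so the tunnel only opens once the server is reachable.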


In [116]:
!bash -c "source .venv/bin/activate && python -m scripts.chat_web --temperature 0.6 --max-tokens 256 --top-k 50"

Autodetected device type: cuda
2026-01-11 23:32:47,984 - nanochat.common - INFO - Distributed world size: 1
Starting NanoChat Web Server
Temperature: 0.6, Top-k: 50, Max tokens: 256
INFO:     Started server process [15312]
INFO:     Waiting for application startup.
Loading nanochat models across GPUs...
Initializing worker pool with 1 GPUs...
Loading model on GPU 0...
2026-01-11 23:32:48,022 - nanochat.checkpoint_manager - INFO - No model tag provided, guessing model tag: d34
2026-01-11 23:32:48,022 - nanochat.checkpoint_manager - INFO - Loading model from /content/nanochat_data/chatsft_checkpoints/d34 with step 669
2026-01-11 23:32:53,705 - nanochat.checkpoint_manager - INFO - Building model with config: {'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 34, 'n_head': 17, 'n_kv_head': 17, 'n_embd': 2176}
All 1 workers initialized!
Server ready at http://localhost:8000
INFO:     Ap