ruixiang63/microgpt-cpp

microgpt-cpp

C++ port of Karpathy's microgpt — a minimal, dependency-free GPT training and inference implementation.

This version faithfully reproduces the original scalar autograd + GPT architecture in C++, and adds cuBLAS-accelerated linear layers (cublasSgemv for forward, cublasSgemv + cublasSger for backward) to offload matrix-vector operations to the GPU.
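As a rough CPU reference for what those cuBLAS calls compute, the sketch below writes the same three operations as plain loops: a matrix-vector product for the forward pass, a transposed matrix-vector product for the input gradient, and a rank-1 update for the weight gradient. The names `LinearRef`, `linear_forward`, and `linear_backward` are hypothetical and do not come from the repository, and the row-major layout is an assumption made for readability (cuBLAS itself uses column-major storage).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference layer: W is row-major, `rows` outputs by `cols` inputs.
struct LinearRef {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> W;   // weights, size rows * cols
    std::vector<float> dW;  // accumulated weight gradient, same shape as W
};

// Forward pass: y = W x. This is the operation a gemv call performs.
std::vector<float> linear_forward(const LinearRef& l, const std::vector<float>& x) {
    assert(x.size() == l.cols);
    std::vector<float> y(l.rows, 0.0f);
    for (std::size_t i = 0; i < l.rows; ++i)
        for (std::size_t j = 0; j < l.cols; ++j)
            y[i] += l.W[i * l.cols + j] * x[j];
    return y;
}

// Backward pass, given the upstream gradient dy = dL/dy:
//   dx  = W^T dy   (a transposed gemv)
//   dW += dy x^T   (a rank-1 update, which is what ger performs)
std::vector<float> linear_backward(LinearRef& l,
                                   const std::vector<float>& x,
                                   const std::vector<float>& dy) {
    assert(x.size() == l.cols && dy.size() == l.rows);
    std::vector<float> dx(l.cols, 0.0f);
    for (std::size_t i = 0; i < l.rows; ++i) {
        for (std::size_t j = 0; j < l.cols; ++j) {
            dx[j] += l.W[i * l.cols + j] * dy[i];
            l.dW[i * l.cols + j] += dy[i] * x[j];
        }
    }
    return dx;
}
```

With W = [[1, 2], [3, 4]] and x = [1, 1], the forward pass gives y = [3, 7]; an upstream gradient dy = [1, 0] then yields dx = [1, 2] and adds [[1, 1], [0, 0]] to dW.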

Structure

src/
├── main.cpp            # data loading, training loop, sampling
├── value.h             # scalar autograd engine (Value with backward)
├── gpt.h               # GPT forward pass: linear, softmax, rmsnorm, multi-head attention
└── linear_cublas.h     # cuBLAS wrappers for gemv forward & backward
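
To illustrate what a scalar autograd engine like the one in value.h does, here is a minimal sketch of reverse-mode differentiation over a graph of scalars. It is not the repository's code: `ScalarNode`, `make_value`, `add`, `mul`, and `backward` are hypothetical names chosen for this example, and only addition and multiplication are covered.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical graph node: a scalar value, its gradient, and its inputs.
struct ScalarNode {
    double data;
    double grad = 0.0;
    std::vector<std::shared_ptr<ScalarNode>> parents;
    std::function<void()> backward_fn = [] {};  // leaves have nothing to propagate
    explicit ScalarNode(double d) : data(d) {}
};
using NodePtr = std::shared_ptr<ScalarNode>;

NodePtr make_value(double d) { return std::make_shared<ScalarNode>(d); }

// out = a + b, so d(out)/da = 1 and d(out)/db = 1.
NodePtr add(const NodePtr& a, const NodePtr& b) {
    NodePtr out = make_value(a->data + b->data);
    out->parents = {a, b};
    ScalarNode* o = out.get();  // raw pointer avoids a shared_ptr reference cycle
    out->backward_fn = [a, b, o] { a->grad += o->grad; b->grad += o->grad; };
    return out;
}

// out = a * b, so d(out)/da = b and d(out)/db = a.
NodePtr mul(const NodePtr& a, const NodePtr& b) {
    NodePtr out = make_value(a->data * b->data);
    out->parents = {a, b};
    ScalarNode* o = out.get();
    out->backward_fn = [a, b, o] {
        a->grad += b->data * o->grad;
        b->grad += a->data * o->grad;
    };
    return out;
}

// Topologically sort the graph, then apply the chain rule from the root down.
void backward(const NodePtr& root) {
    std::vector<ScalarNode*> topo;
    std::unordered_set<ScalarNode*> visited;
    std::function<void(ScalarNode*)> build = [&](ScalarNode* n) {
        if (!visited.insert(n).second) return;  // already visited
        for (const NodePtr& p : n->parents) build(p.get());
        topo.push_back(n);
    };
    build(root.get());
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) (*it)->backward_fn();
}
```

For a = 2 and b = 3, calling `backward` on `add(mul(a, b), a)` leaves a gradient of 4 on a (3 from the product plus 1 from the sum) and 2 on b. Gradients accumulate with `+=` so that a value used more than once receives the sum of its contributions.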

Build

With CUDA (default)

The default build uses cuBLAS to accelerate linear layers. Requires CUDA toolkit (tested with CUDA 12.8). Adjust the CUDA path in CMakeLists.txt if needed.

cmake -B build
cmake --build build
./build/microgpt

CPU only

To build without CUDA, uncomment the CPU linear loop and comment out the cuBLAS block, then remove the CUDA-related lines from CMakeLists.txt.

The program auto-downloads names.txt from the makemore dataset if input.txt is not present.

Model

Param       Value
n_embd      64
n_head      4
n_layer     1
block_size  16

The model follows the GPT-2 architecture with the same minor differences as the original microgpt: RMSNorm instead of LayerNorm, ReLU instead of GELU, and no biases. It trains with Adam (linear learning-rate decay) for 1000 steps, then samples 20 names at temperature 0.5.
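
The two numeric knobs mentioned above, linear learning-rate decay and sampling temperature, can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: the function names are hypothetical, and the schedule is assumed to decay from the base rate toward zero over the run.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed schedule: lr_t = base_lr * (1 - step / num_steps),
// which starts at base_lr and decays linearly toward zero.
double linear_decay_lr(double base_lr, int step, int num_steps) {
    assert(num_steps > 0 && step >= 0 && step < num_steps);
    return base_lr * (1.0 - static_cast<double>(step) / num_steps);
}

// Softmax over logits divided by a temperature. Temperatures below 1
// sharpen the distribution; temperatures above 1 flatten it.
std::vector<double> softmax_with_temperature(const std::vector<double>& logits,
                                             double temperature) {
    assert(!logits.empty() && temperature > 0.0);
    // Subtract the maximum logit before exponentiating for numerical stability.
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}
```

Under this schedule, step 500 of 1000 uses half the base learning rate. For logits {0, 1}, lowering the temperature from 1.0 to 0.5 moves more probability mass onto the larger logit, which is why a temperature of 0.5 gives more conservative samples.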
