A from-scratch implementation of a GPT-style transformer that lets you peek inside during inference and training.
- Runs entirely on CPU
- No network/API calls and no ML frameworks
- Pure Go; OpenBLAS can optionally be linked in for faster matrix products
$ go run . -mode train -model models/names -data ./data/names -text -v 200 \
-dmodel 32 -ctx 8 -blocks 2 -attn 2 -mlp 2 \
-iters 1000 -lr 0.01 -ub 64
This trains a character-level transformer to generate names:
Model:
- 32-dimensional embedding vectors
- context size of 8 tokens
- 2 blocks
- 2 attention heads per block
- ~19k parameters
Training:
- training data read from ./data/names
- validation set of 200 examples
- 1000 iterations (Adam, learning rate 0.01)
- batch size 64
The training run above takes 2 seconds on my Zen 5 CPU.
$ go run . -mode prompt -model ./models/names -text -prompt 'adam' -n 50
Sample output:
adam
allaunex
bandero
briestyn
nelun
kad
feren
dondlyn
$ go run . -mode peek -model ./models/names -prompt 'adam'
$ go run . -mode peek -attention -model ./models/names -prompt 'briestyn'
$ go test