The world's smallest llama2 inference engine
A complete Llama2 inference engine that fits in 1356 bytes of x86 real mode assembly. It boots directly from disk, loads a quantized model, and generates text before any operating system loads.
It can run the stories260K model, a tiny model trained on children's stories, with 260K parameters across 5 layers, 8 attention heads, and a vocabulary of 512 tokens.
```sh
./download.sh && python3 quantize.py && make run
```
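For reference, the hyperparameters described above as a small Python sketch (only the values stated in this README; anything else, such as the embedding width, is deliberately left out rather than guessed):

```python
# stories260K, as described above.
CONFIG = {
    "n_params": 260_000,   # approximate parameter count
    "n_layers": 5,
    "n_heads": 8,
    "vocab_size": 512,
    "seq_len": 512,        # full context length (see the KV cache note below)
}
```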
The boot sector loads the model data from disk into high memory, then runs a full transformer forward pass for each token.
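Conceptually, the per-token loop looks something like the Python sketch below. `forward()` is only a stand-in for the transformer pass implemented in the assembly, and all names are illustrative:

```python
import numpy as np

def forward(model, token, pos):
    # Stand-in for the real forward pass (attention + FFN over all layers),
    # which is implemented in the boot-sector assembly, not here.
    return np.random.randn(model["vocab_size"])

def generate(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = forward(model, tokens[-1], len(tokens) - 1)  # one full pass per token
        tokens.append(int(np.argmax(logits)))                 # greedy next-token choice
    return tokens
```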
A Python script (quantize.py) packs the model into a custom binary format designed for minimal decoding overhead: weights are quantized to int8 with a global absmax scale, lookup tables for exp and silu are precomputed and embedded directly, and the Q/K/V and gate/up weight matrices are fused so the assembly can issue a single matmul call rather than three.
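A rough Python sketch of those three steps, with illustrative table sizes and function names (this is not the actual quantize.py):

```python
import numpy as np

def quantize_absmax_int8(w):
    # One global absmax scale per tensor, int8 payload.
    scale = float(np.abs(w).max()) / 127.0
    return np.round(w / scale).astype(np.int8), scale

def build_luts(n=256, lo=-8.0, hi=8.0):
    # Precompute exp() and silu() tables so the assembly only does indexed loads.
    # Table size and input range here are illustrative, not the repo's actual layout.
    xs = np.linspace(lo, hi, n)
    exp_lut = np.exp(xs).astype(np.float32)
    silu_lut = (xs / (1.0 + np.exp(-xs))).astype(np.float32)  # silu(x) = x * sigmoid(x)
    return exp_lut, silu_lut

def fuse(*mats):
    # Stack Q/K/V (or gate/up) weights row-wise so a single matmul computes all of them.
    return np.concatenate(mats, axis=0)
```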
The KV cache is quantized to int8 at runtime with a per-token scale stored in a separate buffer, keeping the cache small enough to fit in the available segment space for the full 512-token context.
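In Python terms, the runtime cache quantization amounts to something like this (buffer layout and names are illustrative, not taken from the assembly):

```python
import numpy as np

def cache_store(cache, scales, pos, vec):
    # Quantize one token's key/value vector to int8 with its own per-token scale,
    # kept in a separate scale buffer.
    scale = float(np.abs(vec).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    cache[pos] = np.round(vec / scale).astype(np.int8)
    scales[pos] = scale

def cache_load(cache, scales, pos):
    # Dequantize on read before using the entry in attention.
    return cache[pos].astype(np.float32) * scales[pos]
```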
Sampling is currently greedy argmax only. There should be enough spare bytes for a fancier sampling technique, but the goal was to keep the binary as small as possible.
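A minimal sketch of the current sampler next to one example of what a "fancier" one could look like (temperature sampling, shown purely as an illustration; the binary does not implement it):

```python
import numpy as np

def sample_greedy(logits):
    # What the binary does: always pick the highest-scoring token.
    return int(np.argmax(logits))

def sample_temperature(logits, temperature=1.0, rng=None):
    # One possible "fancier" sampler: softmax with temperature, then a random draw.
    rng = rng or np.random.default_rng()
    p = np.exp((logits - logits.max()) / temperature)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```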
- The code is golfed as aggressively as possible, so performance and precision are not optimal.
- The model architecture and prompt are hardcoded.
It should technically be possible to modify this to load a larger model (like stories15M), but that would require switching to protected mode (or possibly unreal mode), since real mode can only address about 1 MiB of memory.
If you are an assembly god and can find a way to shrink the binary, please contribute! The goal is to show what is possible in as few bytes as possible without cheating. Don't forget to add your name to the code ;)
