knlp: Kernel-Style Machine Learning

Rapid prototyping and automation for open source ML R&D

knlp applies Linux kernel development methodologies to machine learning research for rapid iteration and reproducible experimentation. Kconfig-driven configuration, defconfig presets, Makefile automation, and rigorous test matrices enable fast prototyping of transformer architectures, pruning algorithms, and optimization techniques while keeping experiments reproducible and collaboration manageable at scale.

Browse Interactive Demos

Research Highlights

Area	Result	Docs	Demo
Unified Signal	FIM diagonal ≈ Adam exp_avg_sq — unifies compression, pruning, and tiering	docs	demo
FIM-Guided Quantization	Diagonal Fisher identifies critical tensors. 1.26% better PPL at 1.8% size increase	docs	demo
KVSplice	~20% extra compression on top of MLA (7.2x vs 6x), 25% better PPL, +7 HellaSwag	docs	demo
Reciprocal Attention	Learned Q@K.T ↔ K@Q.T alternation. 5% better PPL, +2 HellaSwag	docs	demo
Adam State-Based Pruning	bitter7 achieves 15.6% better PPL than magnitude baseline (37.28 vs 44.15)	docs	demo
Page-Aware GNN Training	4× better I/O locality (6.8× vs 28.5× RA) with zero quality loss on DGraphFin	docs	demo
KV Bandwidth Scaling	Decode governed by memory bandwidth across 3 GPU architectures (7.6× BW range). 384K context on B200	docs	demo
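
The "Unified Signal" row can be illustrated with a toy simulation. Adam's exp_avg_sq is an exponential moving average of squared gradients, and a diagonal entry of the empirical Fisher is the mean squared gradient, so for a stationary gradient distribution the two estimate the same quantity. The sketch below is not knlp code: the zero-mean Gaussian gradient model, the function names, and the step count are all assumptions chosen for illustration, and each simulated gradient stands in for a per-sample gradient.

```python
import random


def adam_exp_avg_sq(grads, beta2=0.999):
    """EMA of squared gradients, the quantity Adam stores as exp_avg_sq.

    The result is bias-corrected here so it is comparable to a plain mean.
    """
    v = 0.0
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
    return v / (1.0 - beta2 ** len(grads))


def empirical_fisher_diag(grads):
    """Mean squared gradient: one diagonal entry of the empirical Fisher."""
    return sum(g * g for g in grads) / len(grads)


def simulate(stds=(0.1, 1.0, 3.0), steps=20000, seed=0):
    """Return (adam_estimate, fisher_estimate) for each toy parameter."""
    rng = random.Random(seed)
    results = []
    for std in stds:
        # Hypothetical gradient stream: zero-mean Gaussian with fixed scale.
        grads = [rng.gauss(0.0, std) for _ in range(steps)]
        results.append((adam_exp_avg_sq(grads), empirical_fisher_diag(grads)))
    return results


for std, (v, f) in zip((0.1, 1.0, 3.0), simulate()):
    print(f"std={std}: exp_avg_sq={v:.4f}  fisher_diag={f:.4f}")
```

Both estimates should land near std squared for each parameter and rank the parameters in the same order, which is the property that lets one signal drive compression, pruning, and tiering decisions. Real training gradients are not stationary, so the agreement in practice is approximate.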

Research Tracks

Bandwidth-Proportional Attention (BPA)

knlp explores bandwidth-aware transformer inference systems including KV cache scaling, compression, and selective memory access. Measurements across AMD RDNA 3, NVIDIA Hopper, and NVIDIA Blackwell confirm that autoregressive decode performance is governed by memory bandwidth, not compute capacity or model architecture. BPA investigates architectures where KV memory access per token scales with available bandwidth rather than full context length.
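
A back-of-envelope model shows why bandwidth dominates decode. At batch size 1, generating one token requires reading the model weights and the whole KV cache from memory, so the time per token is bounded below by bytes moved divided by memory bandwidth. The sketch below is an illustration under stated assumptions, not knlp's measurement code: the class and function names are hypothetical, the shape is GPT-2-small-like, and the bandwidth figures are placeholders rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    n_layers: int
    n_kv_heads: int
    head_dim: int
    n_params: int
    bytes_per_elem: int = 2  # fp16 / bf16 storage


def kv_bytes_per_token(m: ModelShape) -> int:
    """Bytes of K and V stored per context token across all layers."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * m.bytes_per_elem


def decode_time_lower_bound_s(m: ModelShape, context_len: int,
                              bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only bound: every weight and every cached K/V is read once."""
    weight_bytes = m.n_params * m.bytes_per_elem
    kv_bytes = kv_bytes_per_token(m) * context_len
    return (weight_bytes + kv_bytes) / bandwidth_bytes_per_s


# Illustrative GPT-2-small-like shape (assumption, not a knlp config).
shape = ModelShape(n_layers=12, n_kv_heads=12, head_dim=64, n_params=124_000_000)

# Placeholder bandwidths in bytes/s; substitute measured values for real use.
for name, bw in (("device_a", 1.0e12), ("device_b", 4.0e12)):
    for ctx in (1_024, 32_768):
        t = decode_time_lower_bound_s(shape, ctx, bw)
        print(f"{name} ctx={ctx}: >= {t * 1e6:.1f} us/token")
```

In this model the KV term grows linearly with context length while the weight term stays fixed, which is the motivation for making KV memory access per token scale with available bandwidth instead of full context length.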

See docs/bpa.md for the current high-level BPA overview and docs/paper/bpa/evolution.md for how RGSA evolved into BPA and then into fused KV quantization. The KV Bandwidth visualization gives a general public explanation of why decode is the bottleneck, and AR Decode Bottleneck is a complementary structural explainer. A first public writeup of the concrete fused-kernel result is available at docs/fused_kv_quantization.md.

Paper-facing experiment scaffolding for the BPA KV scaling work lives in docs/paper/bpa/ and scripts/paper/bpa_paper/. These docs and scripts define smoke tests, matrix plans, manifest validation, fit-output contracts, and clean export packaging for the future knlp-paper-memory-decode results tree.

Development Philosophy

knlp applies Linux kernel development practices to machine learning research:

Kconfig-based configuration: Hierarchical menus for experiment management (like make menuconfig)
Defconfig presets: Reproducible configurations for different hardware and research goals
Makefile-driven builds: Consistent build and test workflows across models
Documented decisions: Every architectural choice explained in docs/
Rigorous validation: Automated test matrices before merging experiments
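
To make the Kconfig workflow concrete: Kconfig tooling writes the selected options to a .config file as CONFIG_NAME=value lines, and records disabled boolean options as "# CONFIG_NAME is not set". The sketch below shows one way a Python training script could read such a file. It is an assumption-laden illustration, not knlp's actual loader: the function name parse_dotconfig and every option name in the sample are hypothetical.

```python
import re

# Kconfig records a disabled boolean option as a specially formatted comment.
_NOT_SET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_dotconfig(text: str) -> dict:
    """Parse Kconfig .config text into a dict of Python values."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _NOT_SET.match(line)
        if match:
            config[match.group(1)] = False
            continue
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, ordinary comment, or unrecognized line
        key, value = line.split("=", 1)
        if value == "y":
            config[key] = True
        elif len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            config[key] = value[1:-1]  # string option, quotes stripped
        else:
            try:
                config[key] = int(value, 0)  # decimal or 0x-prefixed hex
            except ValueError:
                config[key] = value  # keep anything else as raw text
    return config


# Hypothetical option names, for illustration only.
SAMPLE = """\
# Automatically generated file; DO NOT EDIT.
CONFIG_MODEL_GPT2=y
# CONFIG_PRUNING_ENABLE is not set
CONFIG_BATCH_SIZE=32
CONFIG_OPTIMIZER="adamw"
"""

for name, value in parse_dotconfig(SAMPLE).items():
    print(f"{name} = {value!r}")
```

Keeping the whole experiment definition in one flat file is what makes defconfig presets reproducible: the same file yields the same parsed options on any machine.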

See docs/architecture.md for details on the kernel-inspired infrastructure.

Installation

For systems using torch.compile(), Python development headers are required:

# Ubuntu/Debian
sudo apt-get install python3-dev

# RHEL/CentOS/Fedora
sudo yum install python3-devel

pip install -r requirements.txt
wandb login # optional

make defconfig-gpt2-vanilla-baseline
make

See docs/quickstart.md for detailed workflow.

Contributing

knlp welcomes contributions.

Citation

If you use this work, please cite:

@misc{knlp2025,
  title        = {knlp: Kernel-Style Machine Learning - Transformer Architecture Research},
  author       = {Luis Chamberlain and contributors},
  year         = {2025},
  howpublished = {\url{https://github.com/mcgrof/knlp}},
  note         = {Collaborative ML research using Linux kernel development workflows}
}

License

This project is licensed under the MIT License.

Code: MIT license
Models: AI models generated by this project can be licensed as you choose
Documentation: CC-BY-SA 4.0 (collaborative, share-alike)

See LICENSE for details.

KNLP

BPA paper scaffolding

Paper-oriented KV scaling scaffolding now lives under scripts/paper/bpa_paper/ with supporting docs in docs/paper/bpa/.

The scaffold provides:

a canonical results/knlp-paper-memory-decode/ tree with raw/, derived/, figures/, manifests/, logs/, system/, and reports/
device configs for a100, h100, b200, and w7900
dry-run capable scripts for smoke validation, matrix planning, fit planning, and public-subset packaging
lightweight manifest/config validation coverage in tests/test_bpa_paper_manifest.py

Example dry-run commands:

python -m scripts.paper.bpa_paper.run_smoke --dry-run
python -m scripts.paper.bpa_paper.run_matrix --dry-run --devices a100 h100
python -m scripts.paper.bpa_paper.fit_scaling --dry-run
python -m scripts.paper.bpa_paper.package_results --dry-run
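
The dry-run pattern behind these commands can be sketched as follows. This is a minimal illustration, not the repository's implementation: the --dry-run and --devices flags, the device names, and the results subdirectories come from the text above, while the --results-root flag, the function names, and the printed messages are assumptions.

```python
import argparse
from pathlib import Path

# Device names and results subdirectories as listed in the scaffold description.
KNOWN_DEVICES = ("a100", "h100", "b200", "w7900")
RESULT_SUBDIRS = ("raw", "derived", "figures", "manifests",
                  "logs", "system", "reports")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Plan a BPA paper run (sketch)")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the plan without touching the filesystem")
    parser.add_argument("--devices", nargs="+", choices=KNOWN_DEVICES,
                        default=list(KNOWN_DEVICES))
    # Hypothetical flag, added here only to make the sketch testable.
    parser.add_argument("--results-root", type=Path,
                        default=Path("results/knlp-paper-memory-decode"))
    return parser


def planned_dirs(root: Path) -> list:
    """Directories of the canonical results tree under root."""
    return [root / sub for sub in RESULT_SUBDIRS]


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    dirs = planned_dirs(args.results_root)
    if args.dry_run:
        for d in dirs:
            print(f"[dry-run] would create {d}")
        for device in args.devices:
            print(f"[dry-run] would plan matrix for {device}")
        return 0
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return 0


main(["--dry-run", "--devices", "a100", "h100"])
```

Separating planning (planned_dirs) from execution keeps the dry-run path free of side effects, so the plan can be validated in tests without creating any files.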

