llama.cpp-moe is built for practical Mixture-of-Experts (MoE) inference: local, efficient, and understandable.
This repository centers on MoE-first workflows with lightweight controls that make expert behavior visible, tunable, and easy to reason about.
The original design philosophy section has been moved to work.md, which focuses on the project's MoE innovations and the `--moe-gpu-expert-slot-num` design approach.
- The previous project README has been preserved as `README_OLD.md`.
- Use that file as a historical and technical reference while this new README defines project intent and guiding principles.