llama.cpp-moe is built for practical Mixture-of-Experts (MoE) inference: local, efficient, and understandable.
This repository centers on MoE-first workflows with lightweight controls that make expert behavior visible, tunable, and easy to reason about.
The original design philosophy section has been moved to work.md, which focuses on the project's MoE innovations and the `--moe-gpu-expert-slot-num` design approach.
- The previous project README has been preserved as `README_OLD.md`.
- Use that file as a historical and technical reference while this new README defines project intent and guiding principles.