Tools for model merging, expert pruning, differential competence-map extraction, and GGUF quantization — built on top of `transformers`, `safetensors`, and `llama.cpp`. The kit started as a fork-flavored alternative to mergekit (the `omnimerge_v2` recipe) and grew to cover Gemma 4 MoE surgery and Qwen3.5 hybrid-attention frankenmerges.
Status: research code. Not packaged for pip yet. Scripts assume the directory layout described below; paths inside scripts may need editing for your environment.
| Path | Purpose |
|---|---|
| `omnimergekit.py` | The main merge script. Methods: `dare_ties`, `omnimerge_v2`, plus features `obim`, `darex`, `emr`, `fisher` (drop-and-rescale sketched just below). |
| `competence/` | Differential competence-map pipeline: extract per-source, per-task Fisher signal → combine across tasks → feed into `omnimergekit.py --fisher`. |
| `gemma4/` | Gemma 4 MoE surgery: expert drop, DERN-style redistribution, CD maps for contribution-aware quants, hybrid-expert assembly. |
| `quantization/` | `quantize_gguf.py` (multi-tier quants with imatrix), `convert_to_4bit.py`, `publish_model.py` (HF push with frontmatter). |
| `eval/` | Eval drivers: GPQA Diamond, LiveCodeBench (`lcb_llama_server.py`), HE/MBPP rescore-with-fence-strip helpers. |
| `recipes/` | End-to-end pipelines: 4B MicroCoder series, Gemma 4 109e/98e/120e/128e, 27B Omnimerge. |
| `pod/` | RunPod / Vast.ai helpers (setup, parallel run, retrieve, README publish). |
| `docs/` | Method docs, experiment journals, recipe deep-dives. |
| `experiments/` | Per-experiment notebooks and logs. |
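For orientation, here is a minimal sketch of the drop-and-rescale idea behind `dare_ties`-style methods, with a crude TIES-style sign election on top. This is the published DARE formulation, not this repo's implementation; tensor shapes and the election rule are illustrative (`density` plays the role of the `--density` flag):

```python
import torch

def dare_delta(theta_src: torch.Tensor, theta_base: torch.Tensor,
               density: float, generator=None) -> torch.Tensor:
    """DARE: randomly drop (1 - density) of the task vector, rescale the rest."""
    delta = theta_src - theta_base                       # task vector
    mask = torch.rand(delta.shape, generator=generator) < density
    return delta * mask / density                        # unbiased rescale

# Toy usage: two sources merged onto a base with a TIES-style sign consensus.
base = torch.zeros(4, 4)
srcs = [torch.randn(4, 4), torch.randn(4, 4)]
weights = [0.6, 0.4]
deltas = [w * dare_delta(s, base, density=0.53) for w, s in zip(weights, srcs)]
sign = torch.sign(sum(deltas))                           # elected sign per param
merged = base + sum(d * (torch.sign(d) == sign) for d in deltas)
```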
Differential competence-map pipeline, end to end:

```bash
# 1. Run HE / MBPP / etc. eval on each source model, save lm-eval samples_*.jsonl
# 2. Extract per-source Fisher signal, restricted to docs THAT source uniquely solved
python competence/competence_extract.py \
    --model $SRC1 --samples $EVAL/source1/samples_humaneval_*.jsonl \
    --task he --keep-doc-ids 1,4,7,11 \
    --output maps/source1__he.safetensors \
    --max-len 4096 --chunk-len 1280   # chunked grad-accum: full context on small VRAM
# 3. Combine per-task maps (per source) into a single competence map
python competence/competence_combine.py \
    --map "source1:humaneval:results.json:maps/source1__he.safetensors" \
    --map "source1:mbpp:results.json:maps/source1__mbpp.safetensors" \
    --raw-rate --signal weight_taylor \
    --output-dir maps/combined/
```
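The extracted signal is essentially a diagonal empirical Fisher: a running sum of squared gradients of the LM loss over exactly the docs that source uniquely solved. A minimal sketch of that accumulation, assuming an HF-style causal LM with labels in the batch (illustrative only; the real script adds `--chunk-len` chunking so long contexts fit in VRAM):

```python
import torch

def diagonal_fisher(model, doc_batches):
    """Diagonal empirical Fisher: running sum of squared per-doc gradients."""
    fisher = {n: torch.zeros_like(p, device="cpu")
              for n, p in model.named_parameters() if p.requires_grad}
    for batch in doc_batches:                 # only the uniquely-solved docs
        model.zero_grad(set_to_none=True)
        out = model(**batch)                  # HF causal LM: loss from labels
        out.loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach().float().cpu().pow(2)
    return fisher
```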
```bash
# 4. Merge with Fisher-aware omnimerge_v2
python omnimergekit.py \
    --base $BASE --source $SRC1 --source $SRC2 --source $SRC3 \
    --output merged/ \
    --method omnimerge_v2 --v2-features fisher,darex \
    --weights 0.35,0.40,0.25 --density 0.53 --darex-q 0.85 \
    --fisher "maps/combined/source1.safetensors,maps/combined/source2.safetensors,maps/combined/source3.safetensors" \
    --pr682-turbo --skip-patterns "model.visual,mtp.layers" --device cuda
```

See docs/METHOD_omnimerge_v2.md for the math (OBIM-lite + DAREx-q + EMR election + Fisher).
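Conceptually, the `--fisher` maps turn the merge into a per-parameter precision-weighted average: parameters a source demonstrably relied on for its unique wins get more say. A minimal sketch of Fisher-weighted averaging (illustrative; `omnimerge_v2` layers this with OBIM-lite, DAREx-q, and EMR as per the method doc):

```python
import torch

def fisher_merge(thetas: list[torch.Tensor], fishers: list[torch.Tensor],
                 weights: list[float], eps: float = 1e-8) -> torch.Tensor:
    """Per-parameter average, with each source scaled by weight * Fisher."""
    num = sum(w * f * t for w, f, t in zip(weights, fishers, thetas))
    den = sum(w * f for w, f in zip(weights, fishers)) + eps
    return num / den
```

With flat Fisher maps this degrades gracefully to the plain `--weights` average.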
Example: prune Gemma 4 from 128 to 109 experts, then quantize with contribution-aware maps:

```bash
# Drop the weakest 19 experts per layer (128e → 109e), preserving routing semantics
python gemma4/expert_pruning/expert_drop.py \
    --model gemma-4-26B-A4B-it --n-keep 109 \
    --analysis gemma4/neuron_analysis/expert_neuron_v4.json \
    --output gemma-4-A4B-109e/ --recalibrate-router 2000

# Quantize with contribution-aware (CD) maps
python gemma4/cd_maps/generate_cd_maps_from_contribution.py \
    --analysis gemma4/neuron_analysis/expert_neuron_v4.json \
    --output cd_maps/
python quantization/quantize_gguf.py gemma-4-A4B-109e \
    --tier CD-Q4_K_M --cd-maps cd_maps/ \
    --imatrix calibration.dat --out gemma-4-A4B-109e-CD-Q4_K_M.gguf
```

See docs/METHOD_gemma4_pruning.md for the full method.
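Mechanically, the expert-drop step reduces to: score each expert per layer from the neuron-analysis file, keep the top `n_keep`, and slice the expert stacks and matching router rows together so routing indices stay consistent. A toy sketch of that slicing under a hypothetical tensor layout (the real script also recalibrates the router, per `--recalibrate-router`):

```python
import torch

def drop_experts(layer: dict[str, torch.Tensor], scores: torch.Tensor,
                 n_keep: int) -> dict[str, torch.Tensor]:
    """Keep the n_keep highest-scoring experts; slice weights and router together."""
    keep = torch.topk(scores, n_keep).indices.sort().values   # preserve expert order
    return {
        "experts.w": layer["experts.w"][keep],          # [E, d_ff, d_model] -> [K, ...]
        "router.weight": layer["router.weight"][keep],  # one router row per kept expert
    }

# Toy layer: 8 experts, keep 6.
layer = {"experts.w": torch.randn(8, 16, 32), "router.weight": torch.randn(8, 32)}
pruned = drop_experts(layer, scores=torch.rand(8), n_keep=6)
assert pruned["experts.w"].shape[0] == pruned["router.weight"].shape[0] == 6
```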
For "small student + larger teacher + small task corpus" scenarios that pair
naturally with merge recipes (merge-then-distill is the standard order),
see docs/METHOD_kl_distillation.md.
It covers FKL / RKL / GKD / Hybrid losses, the on-policy vs off-policy
trade-off, the recommended starter recipe, and the cross-tokenizer case.
Findings backed by full HE-164 + MBPP-378 evals on DS-Coder-1.3B-Instruct
(student) with DS-Coder-6.7B-Instruct (same-vocab teacher). Forward KL on
student-sampled positions ("DistillSpec") is the recommended default —
it beat reverse-KL (MiniLLM) by 1.8 pp and SFT by 4.3 pp HE at the same
recipe. Cross-vocab plumbing factored into the standalone library
mann1x/cross-tokenizer-distill.
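For concreteness, the recommended default reduces to: sample completions from the student, then minimize KL(teacher || student) at each sampled position. A minimal same-vocab sketch of that loss with assumed tensor shapes (illustrative; not the method doc's or library's actual API):

```python
import torch
import torch.nn.functional as F

def forward_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student), averaged over real (non-padding) positions.

    Logits: [batch, seq, vocab]; mask: [batch, seq] float, 1.0 = real token.
    Sequences are assumed sampled from the student (on-policy)."""
    t = F.log_softmax(teacher_logits, dim=-1)
    s = F.log_softmax(student_logits, dim=-1)
    kl = (t.exp() * (t - s)).sum(-1)          # per-position KL
    return (kl * mask).sum() / mask.sum().clamp(min=1.0)
```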
These lessons are baked into the recipe scripts; each cost real compute or a published-model rollback to learn.
- Always pass `--use_cache <path>` and `--log_samples` to lm_eval. Without the sqlite cache, any death (PEG parser, OOM, network blip) restarts from 0. Without samples, you can't tell whether `pass@1 = 0` means "model bad" or "scorer crashed on markdown fences".
- `imatrix.dat` MUST be archived next to every quant. Recomputing it takes 15-20 min of GPU time and depends on calibration data + seed. Lose it → the quant cannot be reproduced bit-for-bit.
- Never run lm_eval on a chat model via `/v1/completions` without an explicit chat template. Gemma 4 / Qwen3.5 reasoning variants emit fenced code that scorers can't `exec()`. Use `/v1/chat/completions` + `apply_chat_template`, or rescore samples with the fence-strip helpers in `eval/` (sketched after this list).
- Gemma 4 needs `--reasoning-format deepseek --reasoning-budget 8192` when served via `llama-server`. Without the budget, it emits malformed channel tokens and crashes eval mid-run.
- Qwen3.5 has hybrid linear/full attention. Without `flash-linear-attention` and `causal-conv1d` installed, gradient extraction OOMs at >1k context. Either install them or use chunked grad-accum (`competence_extract.py --chunk-len`).
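The fence-strip rescoring referenced above is mechanically simple: pull the first fenced code block out of a completion before it reaches the `exec()`-based scorer. A minimal sketch of the core idea (the actual helpers in `eval/` likely cover more edge cases, e.g. channel-token cleanup):

```python
import re

FENCE = re.compile(r"```(?:python)?\s*\n(.*?)```", re.DOTALL)

def strip_fences(completion: str) -> str:
    """Return the first fenced code block's body, or the text unchanged."""
    m = FENCE.search(completion)
    return m.group(1) if m else completion

assert strip_fences("```python\nprint('hi')\n```") == "print('hi')\n"
```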
- `ManniX-ITA/Qwen3.5-27B-Omnimerge-v2` — 27B frankenmerge (4 sources, OBIM-lite + DAREx-q + EMR + Fisher).
- `ManniX-ITA/Qwen3.6-27B-Omnimerge-v3a` — cross-base v3a (Qwen3.6 base + 3 Qwen3.5 sources).
- `ManniX-ITA/Qwen3.6-27B-Omnimerge-v3b` — same-base v3b.
- Gemma 4 A4B 109e — pruned MoE (128e → 109e), 75.25% → 71.72% GPQA Diamond, ~12 GB at Q4_K_M.
MIT (see LICENSE).
If you use this in published work:
```bibtex
@misc{omnimergekit,
  author       = {Calpini, Federico},
  title        = {omnimergekit: model merging, expert pruning, and differential competence maps},
  year         = {2026},
  howpublished = {\url{https://github.com/mann1x/omnimergekit}}
}
```