cavegemma

why use many token when few do trick — now baked in weights

Before/After • Quick start • Training • Eval • Reproduce

Fine-tune of Gemma 4 31B that speak caveman natively: no skill file, no system prompt, no /caveman toggle. Drop articles. Drop filler. Drop pleasantries. Keep code byte-exact. Keep error strings exact. Brain big. Mouth small. Code MIT. Weights ship under Gemma terms.

Before / After

🗣️ Base Gemma 4 31B

"The reason your React component re-renders every time the parent updates is because by default, React re-renders all children when a parent re-renders, regardless of whether their props actually changed. To prevent this, you can wrap the component in React.memo. However, keep in mind that memo does a shallow prop comparison..."

🪨 Caveman Gemma 4 31B

"Parent re-render → child re-render by default. Props 'change' each render if inline obj/array/fn → new ref → child sees new props. Fix: wrap child React.memo(Child), stabilize props with useMemo/useCallback, lift state. Check: if memo'd but still re-renders, inline prop is culprit."

why use many token when few do trick. Same fix. Same brain. Less mouth.

┌─────────────────────────────────────┐
│  COMPRESSION (eval)    ████████ 65% │
│  CODE FENCE EXACT      ████████ 99% │
│  SEMANTIC SIM          ████████ 94% │
│  ARTICLE DENSITY       █░░░░░░░  1% │
│  VIBES                 ████████ OOG │
└─────────────────────────────────────┘

Shipped weights

Two flavors. Pick by VRAM.

Repo	Format	Size	What it is
`JBrussee/gemma-4-31B-caveman`	bf16 merged	62.5 GB	Full Gemma 4 31B, caveman baked in. Drop-in.
`JBrussee/gemma-4-31B-caveman-lora`	LoRA adapter	534 MB	Stack on `google/gemma-4-31B-it`. Light download.
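
Picking by VRAM comes down to bytes per parameter. A rough sketch of that arithmetic follows. It assumes "31B" means 31e9 parameters and counts weights only (no activations, no KV cache, no quantization constants). The `weight_gb` helper and the int8/nf4 rows are illustrative, not something this repo ships.

```python
# Weight-only memory estimate. Real usage is higher: activations and KV cache come on top.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "nf4": 0.5}  # nf4 = 4 bits per weight

def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate weight size in GB, with 1 GB = 1e9 bytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

N_PARAMS = 31e9  # assumption: "31B" taken literally

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gb(N_PARAMS, dtype):.1f} GB")
```

The bf16 line lands near the 62.5 GB listed for the merged repo. The LoRA adapter adds only 534 MB on top of whatever precision the base is loaded in.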

Quick start

Merged model — no extra setup

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("JBrussee/gemma-4-31B-caveman")
model = AutoModelForCausalLM.from_pretrained(
    "JBrussee/gemma-4-31B-caveman",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

msgs = [{"role": "user", "content": "Why does my React component re-render every time the parent updates?"}]
# return_dict=True gives input_ids + attention_mask, so generate() gets an explicit mask
inputs = tok.apply_chat_template(msgs, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
# Decode only the new tokens, skip the prompt
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))

LoRA adapter on base — lighter download

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")
model = PeftModel.from_pretrained(base, "JBrussee/gemma-4-31B-caveman-lora")
# Generate same way as merged model: apply_chat_template, then model.generate

No system prompt needed. Ask question. Model talk caveman.

What model do

Rewrite or answer technical question in caveman style. Source-of-truth ruleset = the JuliusBrussee/caveman skill (MIT). Same rules. Now welded into weights.

Auth bug example (verbose → caveman):

In: "Sure! I'd be happy to help. The issue you're experiencing is most likely caused by your authentication middleware not properly validating the token expiry. Let me take a look..."

Out: "Bug in auth middleware. Token expiry check use < not <=. Fix:"

Training summary

Field	Value
Base	`google/gemma-4-31B-it`
Method	QLoRA NF4 + double-quant + bf16 compute
LoRA	rank 16, α 32, dropout 0, targets all linear
Dataset	1750 train + 193 eval (debug · review · refactor · dialogue · qa)
Schedule	3 epochs, lr 2e-4 cosine, batch 2 × grad accum 8 (eff 16), `completion_only_loss=True`
Hardware	RunPod RTX PRO 6000 Blackwell 96 GB, ~$1.89/hr
Wall time	~50 min (Unsloth + TRL 0.17)
Final loss	train 0.024 · eval 0.72 · eval acc 81.5%

Cost end-to-end: ~$4-5 pod time. Less than lunch.
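
The schedule numbers above can be sanity-checked with a few lines of arithmetic. This is a sketch, not trainer output: it assumes the last partial batch of each epoch is kept, so the real optimizer step count may differ by a step or two.

```python
import math

# Values copied from the training summary table.
train_rows = 1750
per_device_batch = 2
grad_accum = 8
epochs = 3
wall_minutes = 50
usd_per_hour = 1.89

effective_batch = per_device_batch * grad_accum
# Assumption: partial final batch is kept, hence ceil.
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs
train_cost = wall_minutes / 60 * usd_per_hour

print(f"effective batch: {effective_batch}")
print(f"optimizer steps: {total_steps} ({steps_per_epoch} per epoch)")
print(f"training-only cost: ~${train_cost:.2f}")
```

Training alone comes in well under the ~$4-5 figure. The rest is presumably pod time spent on bootstrap, model download, and eval.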

Eval results

193-pair holdout, tagged by source category. code_fence_match = fraction of source code fences appearing byte-exact in target.

Category	n	compression	article density	code_fence	semantic_sim
dialogue	28	0.59	0.020	1.000	0.91
debug	34	0.92	0.009	0.995	0.98
refactor	27	0.92	0.005	0.963	0.98
qa	104	0.65	0.007	1.000	0.92
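
For reference, code_fence_match as defined above can be sketched like this. This is an illustration, not the code in eval/metrics.py. It assumes fences are triple-backtick blocks, compares only the fence body (the language tag is ignored), and scores a source with no fences as 1.0.

```python
import re

FENCE = "`" * 3  # triple backtick, built this way to keep this example readable
FENCE_RE = re.compile(re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL)

def extract_fences(text: str) -> list[str]:
    """Return the body of every fenced code block in text."""
    return FENCE_RE.findall(text)

def code_fence_match(source: str, target: str) -> float:
    """Fraction of source fence bodies that appear byte-exact among target fences."""
    src = extract_fences(source)
    if not src:
        return 1.0  # assumption: nothing to preserve counts as perfect
    tgt = set(extract_fences(target))
    return sum(body in tgt for body in src) / len(src)
```

A single changed character inside a fence drops that fence from the count, which is the point: code stays byte-exact or it does not score.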

Read the numbers:

✅ Code preservation excellent — 96-100% fence-exact
✅ Article density crushed — 0.5-2% (English baseline ~8%)
✅ Semantic preservation strong — 91-98%
⚠️ Compression weaker than gold pairs — model lands 0.6-0.9, gold sits 0.3-0.5. Filter accepted ≤1.0× source; tighten to ≤0.7 next run, push harder.
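
The compression and article density columns can be approximated as below. This is a sketch under assumptions: the README does not say whether compression is measured in tokens, characters, or words, so this version counts whitespace-separated words, and it treats only "a", "an", "the" as articles.

```python
import re

ARTICLES = {"a", "an", "the"}
WORD_RE = re.compile(r"[A-Za-z']+")

def compression(source: str, target: str) -> float:
    """Target length over source length in whitespace words. Lower = terser."""
    src_len = len(source.split())
    return len(target.split()) / src_len if src_len else 0.0

def article_density(text: str) -> float:
    """Share of words in text that are articles."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return sum(w in ARTICLES for w in words) / len(words) if words else 0.0

verbose = "The cat sat on the mat"
caveman = "cat sat mat"
print(compression(verbose, caveman), article_density(verbose), article_density(caveman))
```

Under this definition a compression of 0.65 means the output keeps 65% of the source length, so gold pairs at 0.3-0.5 are the terser target.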

Repo layout

cavegemma/
├── data/
│   ├── seeds/                 caveman repo snapshots (SKILL.md, eval prompts)
│   ├── sources/               per-source HuggingFace loaders
│   ├── build_corpus.py        orchestrator (6 sources → corpus_raw.jsonl)
│   ├── synthesize.py          claude/codex CLI driver, two-step rewrite, resumable
│   ├── filter.py              fence-integrity + dedup + compression band
│   ├── split.py               90/10 split with seed-pair pinning
│   └── prompts/, out/         (out gitignored)
├── training/
│   ├── train_unsloth.py       Unsloth + TRL SFT trainer, resume from checkpoint
│   ├── runpod_bootstrap.sh    pip + auth bootstrap for fresh pod
│   └── config.toml            single source of truth for hyperparams
├── eval/
│   ├── metrics.py             compression / article-drop / code-fence / semantic_sim
│   ├── run_eval.py            score adapter on holdout + workflow prompts
│   ├── judge.py               LLM-judge via claude CLI on 20 holdouts
│   └── workflow_prompts.jsonl 10 hand-curated workflow eval prompts
├── scripts/
│   ├── infer.py               smoke-test against caveman eval prompts
│   └── push_to_hub.py         publish adapter + model card
└── artifacts/                 (gitignored)

Reproduce

End-to-end. ~6-8 hours wall, ~$4-5 pod.

# 1. Local setup
uv sync
uv run python data/extract_seeds.py            # 20 gold seed pairs
uv run python data/build_corpus.py             # 3000 rows from 6 HF sources

# 2. Synthesis (Claude Code or Codex CLI required)
uv run python data/synthesize.py --backend claude --workers 3      # or
uv run python data/synthesize.py --backend codex --workers 3

# 3. Filter + split
uv run python data/filter.py --in data/out/raw_pairs.jsonl --out data/out/clean_pairs.jsonl
uv run python data/split.py --in data/out/clean_pairs.jsonl

# 4. RunPod H100 / RTX PRO 6000 — rsync, ssh, bootstrap, train
rsync -avz --exclude='.git' --exclude='.venv' -e "ssh -p <port> -i ~/.ssh/id_ed25519" ./ root@<pod>:/workspace/cavegemma/
ssh -i ~/.ssh/id_ed25519 -p <port> root@<pod> "
  export HF_TOKEN=...
  export WANDB_API_KEY=...
  cd /workspace/cavegemma
  bash training/runpod_bootstrap.sh
  python training/train_unsloth.py --config training/config.toml
"

# 5. Eval + ship
python eval/run_eval.py --adapter artifacts/adapter --eval data/out/eval.jsonl --workflow eval/workflow_prompts.jsonl --out artifacts/eval_predictions.jsonl
python scripts/push_to_hub.py --adapter artifacts/adapter --repo <hf-user>/gemma-4-31B-caveman-lora

Datasets

All permissively licensed. 6 sources in, 1750 train + 193 eval out.

Source	License	Pulled	Used for
`OpenAssistant/oasst2`	Apache 2.0	400	Multi-turn dialogue
`princeton-nlp/SWE-bench_Verified`	research-permissive	400	Debug-session narratives
`ronantakizawa/github-codereview`	permissive subset	400	Code review
`bigcode/commitpackft`	MIT/Apache subset	300	Refactor walkthroughs
`theblackcat102/evol-codealpaca-v1`	Apache 2.0	1200	Short technical Q&A
`HuggingFaceH4/ultrachat_200k`	MIT	300	Short Q&A overflow

Caveman side synthesized via Claude Code (claude -p) and Codex CLI (codex exec with GPT-5.5), routed through the canonical SKILL.md ruleset. Two-step rewrite + fence-integrity filter.
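
The filter stage (fence integrity, dedup, compression band) might look roughly like this. This is a hypothetical sketch, not the contents of data/filter.py. `keep_pair`, its parameters, the character-length ratio, and dedup on normalized target text are all assumptions made for illustration.

```python
import hashlib

FENCE = "`" * 3  # triple backtick

def fence_chunks(text: str) -> list[str]:
    """Chunks between fence markers, language tag line included."""
    return text.split(FENCE)[1::2]

def keep_pair(source: str, target: str, seen: set[str], max_ratio: float = 0.7) -> bool:
    """Return True if a (verbose, caveman) pair passes all three checks."""
    if not target.strip():
        return False
    # Compression band: target must be at most max_ratio of source length (chars).
    if len(target) > max_ratio * len(source):
        return False
    # Fence integrity: every source fence must survive verbatim in the target.
    tgt_fences = set(fence_chunks(target))
    if any(chunk not in tgt_fences for chunk in fence_chunks(source)):
        return False
    # Dedup on whitespace-normalized, lowercased target text.
    digest = hashlib.sha256(" ".join(target.split()).lower().encode()).hexdigest()
    if digest in seen:
        return False
    seen.add(digest)
    return True
```

With `max_ratio=1.0` this mirrors the band the README says was used for this run. The default of 0.7 is the tighter band proposed for the next one.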

Limitations

Compression weaker than gold caveman. Model averages 0.6-0.9 vs gold's 0.3-0.5. Training filter accepted ≤ 1.0× source length; tighten to ≤ 0.7 next run.
Review category sparse. Codex pairs often mutated diff fences, so filter dropped most. Only ~8 review pairs in eval — review behavior extrapolated from debug/refactor neighbors.
Workflow eval gates partly info-only. Open-ended prompts in workflow_prompts.jsonl have no reference; code_fence_match checks input fences in answer, semantic_sim compares answer to question. Treat as smoke signal, not scoreboard.
Multimodal untouched. Gemma 4 is vision + audio capable. Fine-tune was text-only on language head; vision/audio paths should still work but unverified.

Caveman Ecosystem

Four rocks. One philosophy: model do more with less.

Repo	What
caveman	Output compression skill — why use many token when few do trick
cavemem	Cross-agent memory — why agent forget when agent can remember
cavekit	Spec-driven build loop — why agent guess when agent can know
cavegemma (you here)	Caveman baked into weights — why prompt every session when weights remember

Skill compresses any model at runtime. This repo welds the same style into Gemma 4 31B so caveman survives across hosts, agents, no-system-prompt setups. Cheap inference, no skill loader, same brain.

License

Code in this repo: MIT
Adapter and merged model inherit the Gemma Prohibited Use Policy (Apache 2.0 + Gemma terms). See Google's Gemma terms.
Style ruleset and seed pairs from JuliusBrussee/caveman: MIT.

Citing

@misc{brussee2026cavemanGemma,
  author = {Julius Brussee},
  title  = {Caveman-mode Gemma 4 31B},
  year   = {2026},
  url    = {https://huggingface.co/JBrussee/gemma-4-31B-caveman}
}

Star This Repo

Star cost zero. Help small mouth find big audience. ⭐

Also by Julius Brussee

caveman — the original Claude Code skill this fine-tune is built on
Revu — local-first macOS study app with FSRS spaced repetition. revu.cards
