
Record: Order-Adaptive BackoffMixer (mean val_bpb=0.5440)#825

Open
hypery11 wants to merge 1 commit into openai:main from hypery11:submission/2026-03-26_final_champion

Conversation

@hypery11

Results

| Seed | val_bpb | Eval time |
| --- | --- | --- |
| 42 | 0.5437 | ~391s |
| 1337 | 0.5450 | ~391s |
| 2024 | 0.5434 | ~391s |
| **Mean** | **0.5440** | |
| **Std** | **0.0008** | |
  • Artifact: ~16.0 MB
  • Train: 600s on 8xH100 SXM
  • Eval: ~391s (well under 600s)
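As a quick sanity check, the summary row can be recomputed from the per-seed numbers with the standard library (the reported std of 0.0008 matches the sample estimate to within rounding):

```python
import statistics

# Per-seed validation bits-per-byte from the table above.
val_bpb = [0.5437, 0.5450, 0.5434]

print(f"mean = {statistics.mean(val_bpb):.4f}")
print(f"std  = {statistics.stdev(val_bpb):.5f}")  # sample (n-1) std
```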

Method

11-layer transformer (512d, 8/8 full MHA, XSA-all, LeakyReLU(0.5)^2, 3.5x MLP). Order-adaptive entropy-gated BackoffNgramMixer with per-order entropy thresholds. Score-first, backward-looking, deterministic.
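The PR does not include the mixer code inline, but the "order-adaptive entropy-gated" idea can be sketched roughly as follows. This is a minimal illustration, not the submitted implementation: all names, the per-order thresholds, and the fixed 50/50 blend weight are hypothetical.

```python
import numpy as np

def entropy_bits(p: np.ndarray) -> float:
    """Shannon entropy of a probability distribution, in bits."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log2(p)))

def backoff_mix(neural_p, ngram_p_by_order, thresholds, blend=0.5):
    """Entropy-gated backoff over n-gram orders.

    Try the highest available n-gram order first; use its distribution
    only if it is confident enough (entropy below that order's own
    threshold), otherwise back off to a lower order. If every order is
    too uncertain, keep the neural distribution unchanged.
    """
    for order in sorted(ngram_p_by_order, reverse=True):
        p = ngram_p_by_order[order]
        if p is not None and entropy_bits(p) < thresholds[order]:
            return blend * p + (1.0 - blend) * neural_p
    return neural_p

# Toy usage over a 4-symbol vocabulary:
neural = np.full(4, 0.25)                   # flat neural distribution
sharp = np.array([0.97, 0.01, 0.01, 0.01])  # confident 3-gram context
mixed = backoff_mix(neural, {3: sharp, 2: None}, {3: 1.0, 2: 1.0})
```

The per-order thresholds are what makes the gating "order-adaptive": a high-order n-gram estimate built from few counts can be held to a stricter entropy bar than a well-supported low-order one.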

Acknowledgments

Huge thanks to the incredible community that made this possible.

This competition has been an amazing collaborative experience. Every improvement here builds on ideas shared openly.

  • 8xH100 SXM, train <=600s
  • Eval <=600s (391s)
  • Artifact <=16MB
  • 3-seed validation (std 0.0008)

Seeds: 0.5437 / 0.5450 / 0.5434 (std 0.0008).
Order-adaptive entropy gating + BackoffNgramMixer.
~16MB artifact. Train 600s, eval 391s.
@MatoTeziTanka

Really impressive work — the order-adaptive entropy gating with per-order thresholds is a thoughtful design, and the 3-seed consistency (std 0.0008) is excellent. The acknowledgments section is also great to see — this competition has been genuinely collaborative.

One thing to flag: checking the log output, it looks like seeds 42 and 2024 may exceed the 16,000,000 byte artifact cap:

  • Seed 1337: 15,948,371 bytes ✅
  • Seed 42: ~16,022,243 bytes (over by ~22K)
  • Seed 2024: ~16,030,231 bytes (over by ~30K)

We ran into the exact same issue on our PR #769 seed 42 (over by 25,731 bytes) and had to rerun with tighter quantization. It's a subtle one — the submission.json may not reflect the per-seed sizes accurately.

Might be worth double-checking the individual seed artifact sizes against the 16,000,000 limit before the maintainers review. The fix for us was minor — just tightening the compression/quantization slightly to get the headroom.
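The per-seed check suggested above is straightforward with `pathlib` (the filenames here are hypothetical; substitute the actual per-seed artifact files):

```python
from pathlib import Path

ARTIFACT_CAP = 16_000_000  # bytes, per the competition rules

def check_artifact_sizes(paths, cap=ARTIFACT_CAP):
    """Return {filename: (size_in_bytes, fits_under_cap)} for each artifact."""
    report = {}
    for path in map(Path, paths):
        size = path.stat().st_size
        report[path.name] = (size, size <= cap)
    return report

# Hypothetical per-seed artifact filenames:
# check_artifact_sizes(["artifact_seed42.bin", "artifact_seed1337.bin"])
```

Checking `st_size` on the exact files being submitted avoids relying on whatever submission.json reports.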

