Record: Order-Adaptive BackoffMixer (mean val_bpb=0.5440) #825
hypery11 wants to merge 1 commit into openai:main
Conversation
Seeds: 0.5437 / 0.5450 / 0.5434 (std 0.0008). Order-adaptive entropy gating + BackoffNgramMixer. ~16MB artifact. Train 600s, eval 391s.
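As a quick sanity check, the reported mean can be reproduced from the three per-seed numbers. A minimal sketch; note that `statistics.stdev` is the sample standard deviation (ddof=1), which comes out at ~0.00085 and is consistent with the reported 0.0008:

```python
import statistics

# Per-seed validation bits-per-byte, from the PR description.
seeds_bpb = [0.5437, 0.5450, 0.5434]

mean_bpb = statistics.mean(seeds_bpb)   # arithmetic mean across seeds
std_bpb = statistics.stdev(seeds_bpb)   # sample std (ddof=1)

print(f"mean val_bpb = {mean_bpb:.4f}")  # 0.5440
print(f"std          = {std_bpb:.5f}")   # ~0.00085
```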
Really impressive work. The order-adaptive entropy gating with per-order thresholds is a thoughtful design, and the 3-seed consistency (std 0.0008) is excellent. The acknowledgments section is also great to see; this competition has been genuinely collaborative. One thing to flag: checking the log output, it looks like seeds 42 and 2024 may exceed the 16,000,000-byte artifact cap.
We ran into the exact same issue on our PR #769 with seed 42 (over by 25,731 bytes) and had to rerun with tighter quantization. It's a subtle one: the submission.json may not reflect the per-seed sizes accurately, so it's worth double-checking each seed's artifact size against the 16,000,000-byte limit before the maintainers review. The fix for us was minor, just tightening the compression/quantization slightly to regain some headroom.
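For anyone hitting the same cap, a small pre-submission size check along these lines can catch it early. The paths and filenames here are illustrative, not taken from this repo:

```python
import os

ARTIFACT_CAP = 16_000_000  # bytes, the competition's per-artifact limit


def check_artifact_sizes(paths, cap=ARTIFACT_CAP):
    """Return (path, size, over_by) for every artifact exceeding the cap."""
    violations = []
    for path in paths:
        size = os.path.getsize(path)
        if size > cap:
            violations.append((path, size, size - cap))
    return violations


# Example usage with hypothetical per-seed artifact names:
# for path, size, over in check_artifact_sizes(
#     ["artifacts/seed42.bin", "artifacts/seed1337.bin", "artifacts/seed2024.bin"]
# ):
#     print(f"{path}: {size:,} bytes ({over:,} over the cap)")
```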
Results
Method
11-layer transformer (512d, 8/8 full MHA, XSA-all, LeakyReLU(0.5)^2, 3.5x MLP). Order-adaptive entropy-gated BackoffNgramMixer with per-order entropy thresholds. Score-first, backward-looking, deterministic.
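The PR does not include the gating code itself, but the idea of per-order entropy thresholds can be sketched roughly as follows. Everything here is hypothetical (the function names, the mixing weight `lam`, the threshold values), and the actual implementation may gate on the transformer's own entropy rather than the n-gram distribution's:

```python
import numpy as np


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())


def backoff_mix(model_probs, ngram_probs_by_order, thresholds, lam=0.3):
    """Order-adaptive entropy gating, illustrative sketch.

    Walk n-gram orders from highest to lowest; the first order whose
    backoff distribution is confident enough (entropy below that
    order's threshold) gets mixed into the model distribution with
    weight `lam`. Deterministic and backward-looking: it only uses
    statistics of already-seen context.
    """
    for order in sorted(ngram_probs_by_order, reverse=True):
        q = ngram_probs_by_order[order]
        if entropy(q) < thresholds[order]:
            mixed = (1.0 - lam) * model_probs + lam * q
            return mixed / mixed.sum()
    return model_probs  # no confident n-gram context: keep model as-is
```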
Acknowledgments
Huge thanks to the incredible community that made this possible.
This competition has been an amazing collaborative experience. Every improvement here builds on ideas shared openly.