Record: 5-gram Eval Cache + LeakyReLU² + Parallel Muon val_bpb: 1.0920 (3-seed mean, std 0.0007) | ~15.9 MB | 8×H100 SXM #659
Conversation
I think the EvalCache as implemented here is illegal: at eval time, for every token, the code scores the token under both the 5-gram cache and the actual language model, and then keeps whichever gives the lower loss on the true next token. That means the evaluation rule uses the ground-truth answer to decide which scorer to report after the fact, rather than committing to a single prediction rule in advance. It's effectively peeking at the correct token and then crediting whichever model happened to assign that token higher probability. To be clear, I do not think the EvalCache idea itself is illegal. It is constructed correctly by looking back at tokens that have already been scored, so that part looks legal. The issue is specifically the hindsight selection step. If it used another condition to pick between the language model and the n-gram model (e.g., the entropy of the LM's distribution), I think it would be much more likely to be legal.
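To make the distinction concrete, here is a minimal sketch of what a commit-in-advance gate could look like (the function names and the entropy threshold are hypothetical illustrations, not taken from the PR): the scorer is chosen from the LM's own predictive distribution alone, before the true next token is revealed.

```python
import math

def lm_entropy(probs):
    """Shannon entropy (in nats) of the LM's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def choose_scorer(lm_probs, entropy_threshold=1.0):
    """Commit to a scorer using only the LM's own distribution.

    Returns "ngram" when the LM is uncertain (high entropy), "lm"
    otherwise. Crucially, the decision never consults the true next
    token, so there is no hindsight selection.
    """
    return "ngram" if lm_entropy(lm_probs) > entropy_threshold else "lm"

# A confident LM keeps the LM; a near-uniform LM falls back to the n-gram cache.
confident = [0.97, 0.01, 0.01, 0.01]   # entropy ~0.17 nats
uniform = [0.25, 0.25, 0.25, 0.25]     # entropy = ln(4) ~1.39 nats
```

The key property is that the gate is a function of the prediction context only, so the reported loss is still the loss of a single, fixed prediction rule.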
wow... amazing. Been chasing a 1.108 signal for a minute, congrats, looks like you nailed it.
- LeakyReLU(0.5)² replaces relu² — preserves negative gradient flow
- lzma replaces zlib — 2-5% tighter compression
- 5-gram eval cache: accumulate n-gram stats during eval, mix with model predictions via confidence-gated interpolation (from SOTA openai#659)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
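The activation swap in the first bullet can be sketched in a few lines (a minimal pure-Python version, assuming the straightforward "square the leaky output" reading; the actual kernel in the run may differ):

```python
def relu_sq(x):
    """Baseline relu²: zero output AND zero gradient for x < 0."""
    return max(x, 0.0) ** 2

def leaky_relu_sq(x, slope=0.5):
    """LeakyReLU(slope)²: negative inputs map to (slope * x)², so the
    derivative 2 * slope² * x is nonzero for x < 0, unlike relu²,
    which is why negative gradient flow is preserved."""
    y = x if x >= 0.0 else slope * x
    return y * y

# relu_sq(-2.0)       -> 0.0  (dead for negative inputs)
# leaky_relu_sq(-2.0) -> 1.0  (since (0.5 * -2)² = 1)
# both agree on the positive side: relu_sq(3.0) == leaky_relu_sq(3.0) == 9.0
```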
11L/512d U-Net + legal score-first 5-gram eval interpolation. Inspired by @deanbrr's n-gram cache technique (PR openai#659).

3-seed results:
seed 1337: 1.0451 (15.63MB)
seed 42: 1.0471 (15.59MB)
seed 2045: 1.0460 (15.64MB)
mean: 1.0461

Run:
SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 \
XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 \
NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 \
torchrun --nproc_per_node=8 train_gpt.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Mean val_bpb: 1.0920 (3 seeds, std: 0.0007)
Improvement over merged LeakyReLU_LegalTTT_ParallelMuon record: 0.0274 BPB (2.4% better)
Same architecture and training (TTT disabled); only the eval strategy changed
Seeds

| Seed | BPB | Eval time | Artifact |
|------|--------|------|---------|
| 1337 | 1.0916 | 522s | 15.9 MB |
| 42 | 1.0928 | 515s | 15.9 MB |
| 2024 | 1.0917 | 516s | 15.9 MB |

What Changed
Online 5-gram cache accumulated from already-scored tokens during sliding-window eval. Confidence-gated log-sum-exp mixing with a safety gate (can never worsen a prediction). Zero GPU cost, pure CPU dict lookups. Strictly backward-looking at every step.
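A minimal sketch of the score-first, backward-looking cache described above (the class name, the `min_count` threshold standing in for the confidence gate, and the exact mixing rule are assumptions for illustration; in particular this sketch deliberately gates on cache counts rather than comparing losses on the true token, per the review comment earlier in the thread):

```python
import math
from collections import Counter, defaultdict

class NGramEvalCache:
    """Online 5-gram cache for eval-time interpolation with an LM.

    score() is called before update() at each position, so the cache
    only ever reflects tokens that have already been scored: strictly
    backward-looking, with no peeking at the true next token.
    """

    def __init__(self, order=5, alpha=0.20, min_count=3):
        self.ctx_len = order - 1      # 4 tokens of context for a 5-gram
        self.alpha = alpha            # n-gram mixing weight
        self.min_count = min_count    # confidence gate: require this many observations
        self.counts = defaultdict(Counter)

    def score(self, context, token, model_logprob):
        """Return log((1-alpha)*p_lm + alpha*p_ngram), mixed stably in
        log space, or the LM log-prob alone when the gate is closed."""
        bucket = self.counts[tuple(context[-self.ctx_len:])]
        total = sum(bucket.values())
        if total < self.min_count or bucket[token] == 0:
            return model_logprob      # gate closed: trust the LM alone
        ngram_logprob = math.log(bucket[token] / total)
        a = model_logprob + math.log(1.0 - self.alpha)
        b = ngram_logprob + math.log(self.alpha)
        m = max(a, b)                 # log-sum-exp for numerical stability
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def update(self, context, token):
        """Record the token AFTER it has been scored."""
        self.counts[tuple(context[-self.ctx_len:])][token] += 1
```

Because the gate keys only on how often the context has been seen, the prediction rule is fixed before the true token is revealed; everything runs on plain CPU dict lookups, consistent with the zero-GPU-cost claim above.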
Base: LeakyReLU_LegalTTT_ParallelMuon by @abaybektursun (TTT disabled).