New SOTA: 1.12676 BPB - 11L XSA-all(11) + GPTQ-lite + EMA + Late QAT #478
gowtham0992 wants to merge 1 commit into openai:main
Conversation
XSA on all 11 layers is bold; most people only do the last 3 or 4. Does it actually help on the early layers too, or is it just not hurting? Also, the GPTQ-lite clip search is a nice touch, I haven't seen anyone else do that yet.
Thanks! Yeah, XSA on all layers actually helps, it's not just "not hurting". The ablation:

- XSA-all(11): 1.12676 BPB, 6764 steps, 88.7 ms/step
- XSA(4), last 4 layers only: 1.13266 BPB, 6998 steps, 85.7 ms/step

So it's a ~0.006 BPB win even though it's ~3 ms/step slower and we lose ~230 steps. Early layers tend to repeat self-value patterns; XSA forces them to actually encode new information. At 11L / 512d, every layer counts.

GPTQ-lite is basically free: try 5 clip percentiles per row and pick the one with minimum MSE. It adds about 2 seconds to the save and recovers ~0.0006 BPB of the quantization gap.
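The per-row clip search described above ("5 clip percentiles per row, pick min MSE") can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it assumes symmetric per-row int8 quantization, and the function name, percentile grid, and helper structure are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of a GPTQ-lite-style per-row clip search:
# for each candidate clip percentile, quantize the row symmetrically
# to int8 and keep the clip with the lowest reconstruction MSE.
def clip_search_quantize(w_row, percentiles=(99.0, 99.5, 99.9, 99.99, 100.0), levels=127):
    best = None
    for p in percentiles:
        clip = np.percentile(np.abs(w_row), p)  # candidate clip threshold
        if clip == 0.0:
            continue  # all-zero row under this clip; nothing to quantize
        scale = clip / levels
        q = np.clip(np.round(w_row / scale), -levels, levels)
        deq = q * scale                          # dequantized reconstruction
        mse = float(np.mean((w_row - deq) ** 2))
        if best is None or mse < best[0]:
            best = (mse, q.astype(np.int8), scale)
    return best  # (mse, quantized int8 row, per-row scale)

# Usage: quantize one 512-wide weight row.
row = np.random.randn(512).astype(np.float32)
mse, q, scale = clip_search_quantize(row)
```

Since the search only adds a handful of quantize/dequantize passes per row, it is cheap at save time, which matches the "adds like 2 seconds to save" observation; clipping outliers before scaling is what buys back accuracy over naive full-range quantization.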
New SOTA Record: val_bpb 1.12676 (3-seed mean)
Beats the current SOTA (1.14276 BPB) by 0.016 BPB.
3-Seed Results (8xH100 SXM, 600s)
Key Techniques

- XSA on all 11 layers (vs. last-4-layers baseline)
- GPTQ-lite: per-row clip search over 5 percentiles, minimum-MSE selection
- EMA
- Late QAT (quantization-aware training late in the run)
Dependencies
zstandard, flash_attn_3 (see requirements.txt)

Verified on RunPod 8xH100 SXM (official template): 1.12753 BPB
See README.md for full details.