[Non-record] XSA + EMA + TTT: Negative interaction study (val_bpb=1.1436) by sseanliu · Pull Request #303 · openai/parameter-golf

sseanliu · 2026-03-21T03:00:56Z

Summary

Non-record submission testing test-time training (TTT) on the XSA+EMA base (PR #287). Key finding: TTT worsens val_bpb by 0.016 (1.1280 to 1.1436).

Results

Configuration	val_bpb
XSA + EMA (PR #287, no TTT)	1.1280
XSA + EMA + TTT (this)	1.1436 (+0.016 worse)
SmearGate + TTT (PR #254)	1.1313

TTT makes the XSA+EMA model worse, confirming the mechanism redundancy pattern from #290 and #296.
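
For readers who want the gaps spelled out, the short snippet below recomputes each TTT result relative to the XSA + EMA baseline, using only the val_bpb values from the table above. The variable names are illustrative and not taken from the submission's scripts.

```python
# val_bpb values copied from the results table; lower is better.
baseline = 1.1280  # XSA + EMA (PR #287, no TTT)
results = {
    "XSA + EMA + TTT (this)": 1.1436,
    "SmearGate + TTT (PR #254)": 1.1313,
}

# Positive delta means the configuration is worse than the baseline.
deltas = {name: round(bpb - baseline, 4) for name, bpb in results.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f} BPB vs. XSA + EMA")
```

The reported "+0.016" is the first delta rounded to three decimals.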

Why TTT hurts XSA models

XSA and TTT both target local context modeling. XSA removes self-information from attention outputs; TTT adapts weights to local validation patterns. Stacking them double-counts the same signal, while TTT's SGD updates disrupt the smooth EMA weight landscape.
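
As a rough illustration of the two weight updates in tension here, the toy sketch below contrasts an EMA update (averaging weights across training steps) with a plain SGD step applied to those averaged weights at eval time. It is a minimal sketch under stated assumptions, not the submission's code: the function names, the `decay` and `lr` values, and the single-parameter "model" are all hypothetical.

```python
# Hypothetical toy sketch; nothing here is taken from the PR's training script.

def ema_update(ema, weights, decay=0.999):
    """Move the running average a small step toward the current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


def ttt_sgd_step(weights, grads, lr=1e-3):
    """Apply one plain SGD step, as a test-time adaptation pass might."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Training: raw weights jitter from step to step; the EMA copy smooths that out.
raw_steps = [[1.2], [0.8], [1.1], [0.9]]
ema = [1.0]
for w in raw_steps:
    ema = ema_update(ema, w, decay=0.9)

# Eval: TTT starts from the averaged weights and moves them with un-averaged
# gradient steps. The gradient value below is made up for illustration.
adapted = ttt_sgd_step(ema, grads=[5.0], lr=0.01)
print(ema, adapted)
```

The point of the sketch is only that the averaged weights are the product of many small blended updates, while each TTT step moves them by a full, unsmoothed gradient step.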

Reproducibility (2 seeds)

Seed	val_bpb
1337	1.1436
42	1.1441
Mean	1.1439
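
The seed table can be checked with a few lines of arithmetic. The snippet below recomputes the mean and compares the seed-to-seed range with the gap to the XSA + EMA baseline (1.1280, from the results table); the variable names are illustrative.

```python
from statistics import mean

# val_bpb per seed, copied from the table above.
seed_bpb = {1337: 1.1436, 42: 1.1441}
baseline = 1.1280  # XSA + EMA without TTT

avg = mean(seed_bpb.values())
seed_range = max(seed_bpb.values()) - min(seed_bpb.values())
gap = avg - baseline

print(f"mean={avg:.5f} range={seed_range:.4f} gap={gap:.4f}")
```

The unrounded mean is 1.14385, which the table reports as 1.1439. The gap to the baseline is roughly 30 times the range between the two seeds, although two seeds is a small sample.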

Artifact: 15.3 MB. Training: 6,001 steps at 100 ms/step. TTT: 67 s. Used FlashAttention-2 (FA2), not FA3.
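
For a sense of scale, the wall-clock figures above work out as follows. This is simple arithmetic on the reported numbers, assuming the 100 ms/step figure is an average over the run (it may be rounded).

```python
steps = 6001
ms_per_step = 100   # reported per-step time, possibly rounded
ttt_seconds = 67    # reported TTT time at eval

# Convert the training run to seconds and minutes.
train_seconds = steps * ms_per_step / 1000
train_minutes = train_seconds / 60

print(f"training: {train_seconds:.1f} s ({train_minutes:.1f} min), TTT: {ttt_seconds} s")
```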

See README for full analysis.

mohosy · 2026-03-21T04:18:39Z

this is really good data, I was literally about to stack TTT on EMA+XSA and you saved me the compute lol. do you think there's any eval-time trick that does work with EMA, or is it just fundamentally incompatible with adaptation?

Add XSA+EMA+TTT negative interaction study (non-record)

9227d8e

notapplica mentioned this pull request Mar 21, 2026

Parameter Golf Live AI Commentary + Analysis / Ideas | every 10 minutes #140

Open

mohosy pushed a commit to mohosy/parameter-golf that referenced this pull request Mar 21, 2026

Update: drop TTT (negative interaction with EMA+XSA per PR openai#303)…

c57bfbe

…, clean up script Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mohosy mentioned this pull request Mar 21, 2026

Non-record: 11L EMA + XSA + Int6 MLP3x (pending compute) #291

Open

sseanliu mentioned this pull request Mar 21, 2026

Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal) #318

Open

sseanliu wants to merge 1 commit into openai:main from sseanliu:submission/xsa-ema-ttt