
Record candidate: long-context no-QV rank56/prefix3000 TTT — val_bpb 1.05875 #1965

Open

himanshudongre wants to merge 1 commit into openai:main from himanshudongre:record/rank56-prefix3000-longctx

Conversation

@himanshudongre

Summary

Adds a 3-seed 10min/16MB record candidate package:

val_bpb = 1.05874877 (population std 0.00091680), max artifact 15,980,110 bytes, all three seeds under the 600s train/eval caps.

This is a clean score-first Track B refinement on the late-April CaseOps / LQER / SparseAttnGate / phased-TTT stack. It keeps the long-context no_qv setup from PR #1953 and reallocates the TTT eval budget with:

  • TTT_LORA_RANK=56
  • PHASED_TTT_PREFIX_DOCS=3000
  • PHASED_TTT_NUM_PHASES=3
  • EVAL_SEQ_LEN=2560, TTT_EVAL_SEQ_LEN=2560
  • TTT_MASK=no_qv, TTT_LOCAL_LR_MULT=0.75, QK_GAIN_INIT=5.25
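For reproduction, the settings above map directly to environment-variable overrides; a minimal launcher sketch (variable names are taken verbatim from this PR, but how the harness consumes them and the exact launch command are assumptions):

```shell
# Eval-budget reallocation described above; the launch line is illustrative.
export TTT_LORA_RANK=56
export PHASED_TTT_PREFIX_DOCS=3000
export PHASED_TTT_NUM_PHASES=3
export EVAL_SEQ_LEN=2560
export TTT_EVAL_SEQ_LEN=2560
export TTT_MASK=no_qv
export TTT_LOCAL_LR_MULT=0.75
export QK_GAIN_INIT=5.25
# python3 train_gpt.py   # run with the overrides in place
```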

Results

| Seed | Train ms | Pre-quant BPB | Quant BPB | Final TTT BPB | Eval ms | Artifact bytes |
|------|----------|---------------|-----------|---------------|---------|----------------|
| 42   | 596051   | 1.06108950    | 1.06949683 | 1.05780842   | 519467  | 15,975,989     |
| 0    | 596086   | 1.06200352    | 1.07034149 | 1.05844590   | 425873  | 15,976,674     |
| 1234 | 596146   | 1.06319088    | 1.07176584 | 1.05999198   | 400555  | 15,977,591 → 15,980,110 max |
| Mean | 596094   | 1.06209463    | 1.07053472 | 1.05874877   | 448632  | 15,977,591     |
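The headline mean and population std can be re-derived from the Final TTT BPB column alone (a quick sanity check using only numbers from the table):

```python
import statistics

# Final TTT BPB per seed, from the results table above.
final_bpb = {42: 1.05780842, 0: 1.05844590, 1234: 1.05999198}

vals = list(final_bpb.values())
mean = statistics.fmean(vals)   # arithmetic mean over the three seeds
pstd = statistics.pstdev(vals)  # population (divide-by-N) standard deviation

print(f"{mean:.8f}")  # 1.05874877 — matches the reported val_bpb
print(f"{pstd:.8f}")  # 0.00091680 — matches the reported population std
```

Note the summary uses `pstdev` (population std), not the sample std `stdev`, which would give a larger value for only three seeds.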

Leaderboard context at submission time:

Compliance

Checked against the Issue #1017 validity conditions:

  • C1 strict causal dependence: token predictions depend on artifact + strict prefix only.
  • C2 full normalized distribution: standard full-vocabulary neural distribution over the 8192 CaseOps token ids.
  • C3 score-before-update: in phased TTT, each token is scored before any LoRA/global update consumes it.
  • C4 single pass: each validation token is scored once; no rescoring or best-of-k selection.

Additional exclusions: no SLOT, no byte/token PPM, no n-gram cache, no eval-time logit bias, no pre-quant TTT on validation data.
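To make C1/C2/C4 concrete, here is a minimal scoring-loop sketch; the model interface is hypothetical (a stand-in uniform model, not the submission's actual code), only the constraints mirror the checklist above:

```python
import math

VOCAB = 8192  # full CaseOps token-id space (C2)

class UniformModel:
    """Stand-in model emitting a full normalized distribution (C2).
    A real submission would plug the trained artifact in here."""
    def next_token_logprobs(self, prefix):
        return [math.log(1.0 / VOCAB)] * VOCAB

def mean_bits_per_token(model, tokens):
    """Score a validation stream under C1/C4: one left-to-right pass,
    each prediction conditioned on the strict prefix only."""
    nll_bits = 0.0
    for t in range(1, len(tokens)):                       # C4: scored once
        logprobs = model.next_token_logprobs(tokens[:t])  # C1: strict prefix
        nll_bits += -logprobs[tokens[t]] / math.log(2.0)
    return nll_bits / (len(tokens) - 1)
```

A uniform model over 8192 ids scores exactly log2(8192) = 13 bits per token, which makes the loop easy to unit-test.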

Files

  • records/track_10min_16mb/2026-04-30_LongCtx_NoQV_Rank56Prefix3000/README.md
  • submission.json
  • train_gpt.py
  • train_seed42.log, train_seed0.log, train_seed1234.log

Validation

Local checks before opening this PR:

  • python3 -m json.tool submission.json
  • python3 -m py_compile train_gpt.py
  • parsed all three logs to verify train ms, final BPB, eval ms, and artifact bytes match submission.json
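The log cross-check in the last bullet can be automated; a sketch, with an assumed `key=value` log-line format and a simplified per-seed submission record (both hypothetical — the repo's real formats may differ):

```python
import re

# Assumed log format, e.g. "... train_ms=596051 final_ttt_bpb=1.05780842 ..."
METRIC_RE = re.compile(r"(\w+)=([0-9][0-9.,]*)")

def parse_log_metrics(log_text):
    """Keep the last value seen for each key, so the final summary wins."""
    metrics = {}
    for key, raw in METRIC_RE.findall(log_text):
        metrics[key] = float(raw.replace(",", ""))
    return metrics

def seed_matches_submission(log_text, submitted, tol=1e-8):
    """True iff every submitted field appears in the log and agrees."""
    logged = parse_log_metrics(log_text)
    return all(
        key in logged and abs(logged[key] - value) <= tol
        for key, value in submitted.items()
    )
```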

TanishGudise added a commit to TanishGudise/parameter-golf that referenced this pull request Apr 30, 2026
… breakthrough

NULL/NEUTRAL RESULTS (within ±0.0005 noise):
- S37 GPTQ_BATCHES=32: 1.05884 (null)
- S38 TTT_BETA2=0.995: 1.05884 (null)
- S44 GLOBAL_TTT_LR=0.01: 1.05913 (within noise)
- S46 GLOBAL_TTT_EPOCHS=2: 1.05902 (null)

NEGATIVE RESULTS:
- S36 lzma compressor: rejected
- S36v2 LQER_TOP_K=2: 1.05912
- S41 openai#1965 bundle: 1.05916
- S42 LQER 8/5 + EMA 0.997: 1.05912 (EMA contaminated)
- S43 LQER 8/5 isolated: 1.05925
- S52 LeakyReLU 0.3: 1.05977 (PR openai#1948 doesn't transfer to PR openai#1797)
- S53 WARMDOWN_FRAC=0.95 + MIN_LR=0.05: 1.05950 (best pre-quant 1.06061 but bigger quant tax)

INFRASTRUCTURE FIXES:
- S39 lrzip -k flag bug, S40 SSH disconnect, S45 NCCL crash
- S47/S49/S51 LeakyReLU integration bugs

BREAKTHROUGH:
- S54 n-gram tilt port from PR openai#1145/openai#1967: 1.05692 single seed (seed 314)
  - Pre-quant: 1.06057, Quantized: 1.06917, Final: 1.05692
  - Eval: 503.4s under 600s cap, Size: 15,944,666 bytes under 16MB cap
  - Hint precompute outside timer: 173s (legal path)
  - Mode B with fused_log_softmax_dual_gather kernel
  - Hints fired on 13M of 47M tokens (27%)
  - Delta from current-env baseline: -0.00208 BPB

Validating seeds 42, 1234 next.