Record candidate: long-context no-QV rank56/prefix3000 TTT — val_bpb 1.05875 #1965
Open
himanshudongre wants to merge 1 commit into openai:main from
TanishGudise added a commit to TanishGudise/parameter-golf that referenced this pull request on Apr 30, 2026
… breakthrough

NULL/NEUTRAL RESULTS (within ±0.0005 noise):
- S37 GPTQ_BATCHES=32: 1.05884 (null)
- S38 TTT_BETA2=0.995: 1.05884 (null)
- S44 GLOBAL_TTT_LR=0.01: 1.05913 (within noise)
- S46 GLOBAL_TTT_EPOCHS=2: 1.05902 (null)

NEGATIVE RESULTS:
- S36 lzma compressor: rejected
- S36v2 LQER_TOP_K=2: 1.05912
- S41 openai#1965 bundle: 1.05916
- S42 LQER 8/5 + EMA 0.997: 1.05912 (EMA contaminated)
- S43 LQER 8/5 isolated: 1.05925
- S52 LeakyReLU 0.3: 1.05977 (PR openai#1948 doesn't transfer to PR openai#1797)
- S53 WARMDOWN_FRAC=0.95 + MIN_LR=0.05: 1.05950 (best pre-quant 1.06061 but bigger quant tax)

INFRASTRUCTURE FIXES:
- S39 lrzip -k flag bug, S40 SSH disconnect, S45 NCCL crash
- S47/S49/S51 LeakyReLU integration bugs

BREAKTHROUGH:
- S54 n-gram tilt port from PR openai#1145/openai#1967: 1.05692 single seed (seed 314)
- Pre-quant: 1.06057, Quantized: 1.06917, Final: 1.05692
- Eval: 503.4s under 600s cap, Size: 15,944,666 bytes under 16MB cap
- Hint precompute outside timer: 173s (legal path)
- Mode B with fused_log_softmax_dual_gather kernel
- Hints fired on 13M of 47M tokens (27%)
- Delta from current-env baseline: -0.00208 BPB

Validating seeds 42, 1234 next.
Summary
Adds a 3-seed 10min/16MB record candidate package:
val_bpb = 1.05874877 (population std 0.00091680), max artifact 15,980,110 bytes, all three seeds under the 600s train/eval caps.
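As a rough illustration of how headline numbers like these could be derived from per-seed results, the sketch below aggregates a list of runs into a mean, a population standard deviation, a maximum artifact size, and a cap check. The function name, the dictionary keys, the strict "under" comparison, and the reading of 16MB as 16,000,000 bytes are all assumptions made for illustration; none of them is taken from train_gpt.py or submission.json.

```python
from statistics import fmean, pstdev

# Assumed caps: 600 s for both train and eval, and 16 MB read as 16,000,000 bytes.
TIME_CAP_S = 600.0
SIZE_CAP_BYTES = 16_000_000


def summarize_runs(runs):
    """Aggregate per-seed results into headline numbers.

    Each run is a dict with hypothetical keys:
    val_bpb, artifact_bytes, train_s, eval_s.
    """
    scores = [r["val_bpb"] for r in runs]
    return {
        "mean_val_bpb": fmean(scores),
        # Population std divides by N (not N - 1), matching the wording above.
        "population_std": pstdev(scores),
        "max_artifact_bytes": max(r["artifact_bytes"] for r in runs),
        # Strictly "under" each cap; whether equality is allowed is an assumption.
        "under_caps": all(
            r["artifact_bytes"] < SIZE_CAP_BYTES
            and r["train_s"] < TIME_CAP_S
            and r["eval_s"] < TIME_CAP_S
            for r in runs
        ),
    }
```

With three seeds the population and sample standard deviations differ by a factor of sqrt(3/2), so it matters which one a package reports.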
This is a clean score-first Track B refinement on the late-April CaseOps / LQER / SparseAttnGate / phased-TTT stack. It keeps the long-context no_qv setup from PR #1953 and reallocates the TTT eval budget with:
- TTT_LORA_RANK=56
- PHASED_TTT_PREFIX_DOCS=3000
- PHASED_TTT_NUM_PHASES=3
- EVAL_SEQ_LEN=2560, TTT_EVAL_SEQ_LEN=2560
- TTT_MASK=no_qv, TTT_LOCAL_LR_MULT=0.75, QK_GAIN_INIT=5.25

Results
Leaderboard context at submission time:
- Against the merged leaderboard record (1.06108): -0.00233123 BPB, approximately -0.00510 nats.
- Against PR #1953 (1.05855370): +0.00019507 BPB (worse).

This PR is a clean record candidate against the merged leaderboard and a reproducible ablation on the PR #1953 lineage (Record: PR #1945 base + 2560 long-context + no_qv TTT mask + TTT LR 0.75 + QK_GAIN 5.25, val_bpb 1.05855, 3-seed mean). It is not a claim to beat PR #1953 if that PR is accepted first.

Compliance
Checked against the Issue #1017 validity conditions:
Additional exclusions: no SLOT, no byte/token PPM, no n-gram cache, no eval-time logit bias, no pre-quant TTT on validation data.
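The BPB deltas quoted under Results can be reproduced from the reported means with exact decimal arithmetic. The sketch below is only a worked check of that subtraction; the helper name is hypothetical, and the inputs are the 3-seed mean from the Summary and the two reference scores quoted in the leaderboard context.

```python
from decimal import Decimal


def bpb_delta(candidate, reference):
    """Return candidate minus reference; negative means the candidate is better."""
    # Decimal avoids binary floating-point noise in the eighth decimal place.
    return Decimal(candidate) - Decimal(reference)


CANDIDATE = "1.05874877"  # 3-seed mean val_bpb from the Summary

delta_vs_1_06108 = bpb_delta(CANDIDATE, "1.06108")
delta_vs_1_05855370 = bpb_delta(CANDIDATE, "1.05855370")

print(delta_vs_1_06108)     # -0.00233123
print(delta_vs_1_05855370)  # 0.00019507
```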
Files
records/track_10min_16mb/2026-04-30_LongCtx_NoQV_Rank56Prefix3000/
- README.md
- submission.json
- train_gpt.py
- train_seed42.log, train_seed0.log, train_seed1234.log

Validation
Local checks before opening this PR:
- python3 -m json.tool submission.json
- python3 -m py_compile train_gpt.py
- submission.json
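The two command-line checks above can also be run from a single script. The following is a minimal sketch using only the standard library; the function name is hypothetical, and it covers just the JSON parse and the byte-compile step, not any further checks on the contents of submission.json.

```python
import json
import py_compile


def run_local_checks(json_path, script_path):
    """Return True if the JSON file parses and the script byte-compiles."""
    with open(json_path, encoding="utf-8") as fh:
        json.load(fh)  # raises json.JSONDecodeError on malformed JSON
    # doraise=True turns syntax errors into py_compile.PyCompileError
    # instead of printing them to stderr.
    py_compile.compile(script_path, doraise=True)
    return True
```

A call such as run_local_checks("submission.json", "train_gpt.py") from inside the record directory would mirror the two commands listed above.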