Record: Pair-Geometric Value Projection on PR1855 #2075
Open
deusexnatura wants to merge 6 commits into openai:main
Adds Pair-Geometric Value Projection submission based on accepted PR openai#1855.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on May 1, 2026
…ence

After user feedback that LEAK calls relied too heavily on lineage-inheritance and path heuristics, applied stricter criterion: a LEAK verdict requires at least one of
(a) explicit shell-script invocation of prepare_caseops_data.py without --val-docs=50000,
(b) README "Data setup" matching actual train log path,
(c) audit/submission.json admission text,
(d) train log path with `_caseops/datasets/datasets/<name>` triple-nesting OR single `<root>/datasets/<name>` (which only local prep produces; HF always gives double-nesting).

Records that previously got LEAK by lineage-inheritance alone are now AMBIGUOUS unless they meet at least one of those tests.

Changes:
- openai#1945 LEAK → CLEAN (finalize_v18.sh has snapshot_download from HF; actual run path matches HF target; README's prepare_caseops_data.py section is stale documentation)
- openai#1953 LEAK → AMBIGUOUS (PR ships only train_gpt.py + logs; no prep evidence; path matches HF target; parent openai#1945 confirmed CLEAN, so it leans CLEAN but has no direct PR evidence)
- openai#2041 LEAK → AMBIGUOUS (no prep invocation; double-nested path consistent with EITHER HF or local prep)
- openai#2075 LEAK → AMBIGUOUS (ships prep file but no explicit invocation; path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1).

Headline impact: realistic clean SOTA is at most ~0.012 bpb below the claimed frontier openai#2118 (1.04350).

Best clean BPB candidates in order:
- openai#2019 1.05847 (HF, confirmed)
- openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
- openai#1945 1.05943 (HF, confirmed via re-audit)
- openai#2031 1.05985 (HF, confirmed)
- openai#1908 1.06081 (HF, confirmed)
- openai#1851 1.06128 (HF, MERGED SOTA)
Pair-Geometric Value Projection on PR #1855
This submission starts from the accepted PR #1855 stack and replaces the dense
attention value projection with a structured pair-geometric value projection.
Baseline value path:
PairGeom-V value path:
The validated setting uses `PAIRGEOM_V_COLLAPSE=1`, which algebraically reduces the signed rule to per-dimension learned coefficients on the two hidden halves.
This removes the dense trained/stored `W_v` value matrix while keeping the accepted PR #1855 recipe otherwise aligned.
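As a rough illustration of what "per-dimension learned coefficients on the two hidden halves" could look like, here is a minimal NumPy sketch. It is not the submission's code: the names `pairgeom_v_collapsed`, `a`, and `b`, the toy sizes, and the assumption that the value width equals half the hidden width are all hypothetical choices made for this example. It contrasts the parameter count with a dense `W_v` of the same shape and checks that the collapsed form equals a dense projection whose matrix is two stacked diagonals.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8            # hypothetical hidden width
half = d_model // 2    # assumption: value width is one hidden half
x = rng.standard_normal((3, d_model))  # (tokens, d_model)

# Baseline: dense value projection with a full trained matrix.
W_v = rng.standard_normal((d_model, half))
v_dense = x @ W_v

# Collapsed sketch: one learned coefficient per output dimension
# for each of the two hidden halves (hypothetical parameter names).
a = rng.standard_normal(half)
b = rng.standard_normal(half)


def pairgeom_v_collapsed(x, a, b):
    """Elementwise mix of the two hidden halves; no dense matmul."""
    h = x.shape[-1] // 2
    lo, hi = x[..., :h], x[..., h:]
    return a * lo + b * hi


v_pg = pairgeom_v_collapsed(x, a, b)

# The same map written as a dense projection: two stacked diagonals.
W_equiv = np.vstack([np.diag(a), np.diag(b)])
same = np.allclose(x @ W_equiv, v_pg)

dense_params = W_v.size          # d_model * half
pg_params = a.size + b.size      # 2 * half
print(same, dense_params, pg_params)
```

Under these assumptions the collapsed path stores `2 * half` coefficients where the dense path stores `d_model * half`, which is the sense in which the dense value matrix is no longer trained or stored.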
Validation:
Seed 43 completed, but we ran out of time before recovering the full remote stdout log, so this PR includes a recovered summary log instead.
Comparison:
Claim boundary:
This submission is an architectural change, not a new SOTA claim. It replaces only the dense attention value projection path with a pair-geometric value projection; it does not replace Q/K projections, attention score dot products, output projection, or MLP matrix products.