Record: Pair-Geometric Value Projection on PR1855 by deusexnatura · Pull Request #2075 · openai/parameter-golf

deusexnatura · 2026-05-01T02:56:45Z

Pair-Geometric Value Projection on PR #1855

This submission starts from the accepted PR #1855 stack and replaces the dense
attention value projection with a structured pair-geometric value projection.

Baseline value path:

v = W_v x

PairGeom-V value path:

base = rms_norm(x)
a = base[:kv_dim]
b = base[kv_dim:2*kv_dim]
d = a - b
s = a + b
v = a*w0 + b*w1 + d*wd + s*ws

The validated setting uses PAIRGEOM_V_COLLAPSE=1, which algebraically reduces
the signed rule to per-dimension learned coefficients on the two hidden halves:
since d = a - b and s = a + b, v = a*(w0 + wd + ws) + b*(w1 - wd + ws).
This removes the dense trained/stored W_v value matrix while leaving the rest
of the accepted PR #1855 recipe unchanged.
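
A minimal sketch of the collapse, assuming an RMS norm without a learned gain, an epsilon of 1e-6, and small hypothetical sizes (`model_dim`, `kv_dim`) chosen only for illustration. The function names are not taken from the submission's code. It checks numerically that the four-term signed rule and the two-coefficient form give the same value vector, and compares parameter counts against a dense W_v of the same shape.

```python
import math
import random


def rms_norm(x, eps=1e-6):
    """RMS-normalize a vector (no learned gain; eps is an assumed value)."""
    rms = math.sqrt(sum(t * t for t in x) / len(x) + eps)
    return [t / rms for t in x]


def pairgeom_v(x, w0, w1, wd, ws):
    """Signed four-term rule as written in the PR description."""
    kv_dim = len(w0)
    base = rms_norm(x)
    a, b = base[:kv_dim], base[kv_dim:2 * kv_dim]
    return [
        a[i] * w0[i] + b[i] * w1[i] + (a[i] - b[i]) * wd[i] + (a[i] + b[i]) * ws[i]
        for i in range(kv_dim)
    ]


def collapse(w0, w1, wd, ws):
    """Fold the four weight vectors into one coefficient per half."""
    coef_a = [p + d + s for p, d, s in zip(w0, wd, ws)]
    coef_b = [q - d + s for q, d, s in zip(w1, wd, ws)]
    return coef_a, coef_b


def pairgeom_v_collapsed(x, coef_a, coef_b):
    """Collapsed rule: v = a * coef_a + b * coef_b."""
    kv_dim = len(coef_a)
    base = rms_norm(x)
    a, b = base[:kv_dim], base[kv_dim:2 * kv_dim]
    return [a[i] * coef_a[i] + b[i] * coef_b[i] for i in range(kv_dim)]


random.seed(0)
model_dim, kv_dim = 64, 16  # hypothetical sizes; requires model_dim >= 2 * kv_dim
x = [random.gauss(0.0, 1.0) for _ in range(model_dim)]
w0, w1, wd, ws = ([random.gauss(0.0, 1.0) for _ in range(kv_dim)] for _ in range(4))

v_full = pairgeom_v(x, w0, w1, wd, ws)
coef_a, coef_b = collapse(w0, w1, wd, ws)
v_collapsed = pairgeom_v_collapsed(x, coef_a, coef_b)

max_abs_diff = max(abs(p - q) for p, q in zip(v_full, v_collapsed))
dense_params = model_dim * kv_dim  # parameters in a dense W_v of this shape
collapsed_params = 2 * kv_dim      # one coefficient per dimension per half

print(f"max |v_full - v_collapsed| = {max_abs_diff:.2e}")
print(f"dense W_v params: {dense_params}, collapsed params: {collapsed_params}")
```

Under these assumptions the two forms agree to floating-point rounding, and the value path uses 2*kv_dim coefficients where a dense W_v of the same shape would use model_dim*kv_dim.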

Validation:

hardware:      JarvisLabs 8xH100 80GB HBM3
seeds:         42 and 43
train target:  600s wall-clock
steps:         4981, 4996
artifacts:     15,304,981 and 15,312,945 bytes
post-TTT BPB:  1.07006241, 1.07031169
mean BPB:      1.07018705
std BPB:       0.00017627

Seed 43 completed, but we ran out of time before recovering the full remote stdout log, so this PR includes a recovered summary log instead.

Comparison:

reproduced PR #1855 baseline: 1.06021565 BPB
PairGeom-V candidate mean:    1.07018705 BPB
delta:                        +0.00997140 BPB
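
The summary figures above can be recomputed from the two per-seed scores. The sketch below uses only the numbers quoted in this description and assumes the two BPB values are listed in the same order as the seeds. It prints both the sample and population standard deviation, since the description does not say which was used.

```python
import statistics

# Post-TTT BPB per seed, as quoted in the validation block
# (seed-to-value order is assumed to match the listing).
post_ttt_bpb = {42: 1.07006241, 43: 1.07031169}
baseline_bpb = 1.06021565  # reproduced PR #1855 baseline

values = list(post_ttt_bpb.values())
mean_bpb = statistics.mean(values)
sample_std = statistics.stdev(values)       # divides by n - 1
population_std = statistics.pstdev(values)  # divides by n
delta = mean_bpb - baseline_bpb

print(f"mean BPB:        {mean_bpb:.8f}")
print(f"sample std:      {sample_std:.8f}")
print(f"population std:  {population_std:.8f}")
print(f"delta vs base:   {delta:+.8f}")
```

With two seeds, the sample standard deviation (n - 1 denominator) is the one that reproduces the reported 0.00017627. The positive delta means the candidate scores worse than the reproduced baseline, since lower BPB is better.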

Claim boundary:

This is an architectural alteration using a pair-geometric value-projection
model, not a new SOTA claim. It replaces only the dense attention value
projection path; it does not replace the Q/K projections, attention score dot
products, output projection, or MLP matrix products.

Adds Pair-Geometric Value Projection submission based on accepted PR openai#1855.


…ence

After user feedback that LEAK calls relied too heavily on lineage-inheritance and path heuristics, applied stricter criterion: a LEAK verdict requires at least one of

(a) explicit shell-script invocation of prepare_caseops_data.py without --val-docs=50000,
(b) README "Data setup" matching actual train log path,
(c) audit/submission.json admission text,
(d) train log path with `_caseops/datasets/datasets/<name>` triple-nesting OR single `<root>/datasets/<name>` (which only local prep produces; HF always gives double-nesting).

Records that previously got LEAK by lineage-inheritance alone are now AMBIGUOUS unless they meet at least one of those tests.

Changes:
- openai#1945 LEAK → CLEAN (finalize_v18.sh has snapshot_download from HF; actual run path matches HF target; README's prepare_caseops_data.py section is stale documentation)
- openai#1953 LEAK → AMBIGUOUS (PR ships only train_gpt.py + logs; no prep evidence; path matches HF target; parent openai#1945 confirmed CLEAN — leans CLEAN but no direct PR evidence)
- openai#2041 LEAK → AMBIGUOUS (no prep invocation; double-nested path consistent with EITHER HF or local prep)
- openai#2075 LEAK → AMBIGUOUS (ships prep file but no explicit invocation; path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1).

Headline impact: realistic clean SOTA is at most ~0.012 bpb below the claimed frontier openai#2118 (1.04350).

Best clean BPB candidates in order:
openai#2019 1.05847 (HF, confirmed)
openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
openai#1945 1.05943 (HF, confirmed via re-audit)
openai#2031 1.05985 (HF, confirmed)
openai#1908 1.06081 (HF, confirmed)
openai#1851 1.06128 (HF, MERGED SOTA)

deusexnatura added 6 commits April 30, 2026 19:31

Add PairGeom-V PR1855 record

dac96c1

Adds Pair-Geometric Value Projection submission based on accepted PR openai#1855.

Delete records/track_10min_16mb/2026-04-30_PR1855_PairGeomV_Clean dir…

0d47fc2

…ectory Typo

Add PairGeom-V PR1855 record

d8ee45b

Adds Pair-Geometric Value Projection submission based on accepted PR openai#1855.

Delete pairgeom_v_pr1855_submission directory

c6e1909

Add PairGeom-V PR1855 record

59d2937

Adds Pair-Geometric Value Projection submission based on accepted PR openai#1855.

Add Readme for PairGeom-V PR1855 record

7a33465

leon2k2k2k mentioned this pull request May 1, 2026

Train/val data leakage in CaseOps records — prepare_caseops_data.py default overlaps 80% of val docs with training data #2127

Open

deusexnatura wants to merge 6 commits into openai:main from deusexnatura:pairgeom-v-pr1855
