Record: Pair-Geometric Value Projection on PR1855 #2075

Open
deusexnatura wants to merge 6 commits into openai:main from deusexnatura:pairgeom-v-pr1855

@deusexnatura

Pair-Geometric Value Projection on PR #1855

This submission starts from the accepted PR #1855 stack and replaces the dense
attention value projection with a structured pair-geometric value projection.

Baseline value path:

v = W_v x

PairGeom-V value path:

base = rms_norm(x)
a = base[:kv_dim]
b = base[kv_dim:2*kv_dim]
d = a - b
s = a + b
v = a*w0 + b*w1 + d*wd + s*ws

The validated setting uses PAIRGEOM_V_COLLAPSE=1. Because d = a - b and
s = a + b, the signed rule reduces algebraically to
v = a*(w0 + wd + ws) + b*(w1 - wd + ws), i.e. one learned coefficient per
dimension for each of the two hidden halves. This removes the dense W_v value
matrix from both training and the stored artifact, while keeping the rest of
the recipe aligned with accepted PR #1855.
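
A minimal sketch of the collapse, for readers who want to check the algebra numerically. It is not the submission's code: it assumes elementwise products, an RMS norm without a learned gain, an illustrative `eps`, and toy sizes (`model_dim`, `kv_dim`) that are not the PR #1855 configuration. The function names are hypothetical.

```python
import math
import random


def rms_norm(x, eps=1e-6):
    # RMS normalization without a learned gain; eps is an assumed value.
    scale = math.sqrt(sum(t * t for t in x) / len(x) + eps)
    return [t / scale for t in x]


def pairgeom_v(x, kv_dim, w0, w1, wd, ws):
    # Signed rule from the description, with elementwise products.
    base = rms_norm(x)
    a, b = base[:kv_dim], base[kv_dim:2 * kv_dim]
    return [
        a[i] * w0[i] + b[i] * w1[i] + (a[i] - b[i]) * wd[i] + (a[i] + b[i]) * ws[i]
        for i in range(kv_dim)
    ]


def collapse(w0, w1, wd, ws):
    # Fold the four weight vectors into one coefficient per dimension per half.
    ca = [p + q + r for p, q, r in zip(w0, wd, ws)]
    cb = [p - q + r for p, q, r in zip(w1, wd, ws)]
    return ca, cb


def pairgeom_v_collapsed(x, kv_dim, ca, cb):
    # Collapsed rule: v = a * ca + b * cb.
    base = rms_norm(x)
    a, b = base[:kv_dim], base[kv_dim:2 * kv_dim]
    return [a[i] * ca[i] + b[i] * cb[i] for i in range(kv_dim)]


rng = random.Random(0)
model_dim, kv_dim = 16, 4  # hypothetical toy sizes
x = [rng.gauss(0, 1) for _ in range(model_dim)]
w0, w1, wd, ws = ([rng.gauss(0, 1) for _ in range(kv_dim)] for _ in range(4))

full = pairgeom_v(x, kv_dim, w0, w1, wd, ws)
ca, cb = collapse(w0, w1, wd, ws)
fast = pairgeom_v_collapsed(x, kv_dim, ca, cb)
max_err = max(abs(p - q) for p, q in zip(full, fast))
print(max_err)  # expected to be at floating-point rounding level
```

Under these assumptions the collapsed path stores 2*kv_dim coefficients, whereas a dense W_v mapping model_dim inputs to kv_dim outputs stores kv_dim*model_dim weights (8 versus 64 at the toy sizes above).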

Validation:

hardware:      JarvisLabs 8xH100 80GB HBM3
seeds:         42 and 43
train target:  600s wall-clock
steps:         4981, 4996
artifacts:     15,304,981 and 15,312,945 bytes
post-TTT BPB:  1.07006241, 1.07031169
mean BPB:      1.07018705
std BPB:       0.00017627 (sample std, n = 2)

Seed 43 completed, but we ran out of time before recovering the full remote stdout log, so this PR includes a recovered summary log instead.

Comparison:

reproduced PR #1855 baseline: 1.06021565 BPB
PairGeom-V candidate mean:    1.07018705 BPB
delta:                        +0.00997140 BPB
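
The summary figures above can be recomputed from the two per-seed values. This is a small check using only the numbers quoted in this description; the variable names are illustrative.

```python
import statistics

bpb = [1.07006241, 1.07031169]  # post-TTT BPB for the two seeds, as listed above
baseline = 1.06021565           # reproduced PR #1855 baseline

mean_bpb = statistics.mean(bpb)
std_bpb = statistics.stdev(bpb)  # sample standard deviation (n - 1 denominator)
delta = mean_bpb - baseline

print(f"mean  {mean_bpb:.8f}")   # mean  1.07018705
print(f"std   {std_bpb:.8f}")    # std   0.00017627
print(f"delta {delta:+.8f}")     # delta +0.00997140
```

The reported std matches the sample standard deviation; the population form (`statistics.pstdev`) would give 0.00012464 for the same two values. With only two seeds, either figure is a rough indication of spread.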

Claim boundary:
This submission is an architectural variation built on a pair-geometric
value-projection model, not a new SOTA claim. It replaces only the dense
attention value projection path; it does not replace the Q/K projections,
attention score dot products, output projection, or MLP matrix products.

Adds Pair-Geometric Value Projection submission based on accepted PR openai#1855.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request May 1, 2026
…ence

After user feedback that LEAK calls relied too heavily on lineage-inheritance
and path heuristics, applied stricter criterion: a LEAK verdict requires at
least one of (a) explicit shell-script invocation of prepare_caseops_data.py
without --val-docs=50000, (b) README "Data setup" matching actual train log
path, (c) audit/submission.json admission text, (d) train log path with
`_caseops/datasets/datasets/<name>` triple-nesting OR single `<root>/datasets/<name>`
(which only local prep produces; HF always gives double-nesting).

Records that previously got LEAK by lineage-inheritance alone are now AMBIGUOUS
unless they meet at least one of those tests.

Changes:
  - openai#1945 LEAK → CLEAN  (finalize_v18.sh has snapshot_download from HF;
    actual run path matches HF target; README's prepare_caseops_data.py
    section is stale documentation)
  - openai#1953 LEAK → AMBIGUOUS  (PR ships only train_gpt.py + logs; no prep
    evidence; path matches HF target; parent openai#1945 confirmed CLEAN —
    leans CLEAN but no direct PR evidence)
  - openai#2041 LEAK → AMBIGUOUS  (no prep invocation; double-nested path
    consistent with EITHER HF or local prep)
  - openai#2075 LEAK → AMBIGUOUS  (ships prep file but no explicit invocation;
    path matches HF target)

Updated tally: CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1 (was 8/25/0/1).

Headline impact: realistic clean SOTA is at most ~0.012 bpb below the
claimed frontier openai#2118 (1.04350). Best clean BPB candidates in order:
  openai#2019 1.05847 (HF, confirmed)
  openai#1953 1.05855 (AMBIGUOUS, leans CLEAN)
  openai#1945 1.05943 (HF, confirmed via re-audit)
  openai#2031 1.05985 (HF, confirmed)
  openai#1908 1.06081 (HF, confirmed)
  openai#1851 1.06128 (HF, MERGED SOTA)
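
As a consistency check on the commit message above, the updated tally can be rebuilt from the previous counts and the four listed verdict changes. This sketch uses only the numbers in the message; the data layout is illustrative.

```python
# Previous tally, from "(was 8/25/0/1)".
before = {"CLEAN": 8, "LEAK": 25, "AMBIGUOUS": 0, "INHERIT": 1}

# Verdict changes listed under "Changes": (record, old verdict, new verdict).
moves = [
    ("#1945", "LEAK", "CLEAN"),
    ("#1953", "LEAK", "AMBIGUOUS"),
    ("#2041", "LEAK", "AMBIGUOUS"),
    ("#2075", "LEAK", "AMBIGUOUS"),
]

after = dict(before)
for _record, old, new in moves:
    after[old] -= 1
    after[new] += 1

print(after)                # {'CLEAN': 9, 'LEAK': 21, 'AMBIGUOUS': 3, 'INHERIT': 1}
print(sum(after.values()))  # 34, the same total as before
```

The result agrees with the stated tally of CLEAN 9, LEAK 21, AMBIGUOUS 3, INHERIT 1.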