
[WIP] Sparse Attention + Recursive Weight Sharing for 16MB Efficiency#5

Closed
albertorkive wants to merge 1 commit into openai:main from albertorkive:research/sparse-recurrent-skeleton

Conversation

@albertorkive

Research Lead: Rkive AI

This PR introduces a flexible architecture skeleton designed to maximize parameter density within the 16MB artifact constraint.

Key Mechanisms:

  1. Sliding Window Attention (WINDOW_SIZE): Restricts causal attention to a local window. This reduces the $O(N^2)$ compute bottleneck on the 8xH100s, potentially allowing wider d_model configurations within the 10-minute wall-clock training budget.
  2. Recursive Weight Tying (NUM_PHYSICAL_LAYERS): Implements weight sharing across logical layers (e.g., 3 physical blocks acting as 9 logical layers). This increases "effective depth" without increasing the stored parameter count in the 16MB .pt file.
  3. Optimized for 8xH100s: Includes hooks for torch.compile and quantization-ready layers to hit the sub-1.20 bpb target.
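The windowed mask in mechanism 1 can be sketched as follows. This is illustrative only: `sliding_window_mask` and the exact window semantics (causal, attend to the last `window` positions, `window <= 0` meaning dense) are assumptions, not the actual train_gpt.py implementation.

```python
def sliding_window_mask(n, window):
    # mask[i][j] is True when query i may attend to key j:
    # causal (j <= i) and within the last `window` positions.
    # window <= 0 falls back to dense causal attention.
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i
            local = window <= 0 or i - j < window
            row.append(causal and local)
        mask.append(row)
    return mask

# Each query attends to at most `window` keys, so attention cost
# drops from O(N^2) to O(N * window).
m = sliding_window_mask(5, 2)
assert m[4] == [False, False, False, True, True]  # only the last 2 keys
assert sliding_window_mask(3, 0)[2] == [True, True, True]  # dense causal
```

In a real model this boolean mask (or an equivalent additive `-inf` mask) would be passed to the attention kernel rather than materialized per call.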

Status: Initial skeleton pushed. Seeking Development Grant (~$500) to perform hyperparameter sweeps on sparsity patterns and tie-ratios.

Current baseline: 1.2244 bpb.
Target: < 1.20 bpb.
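For context, bits-per-byte here follows the usual conversion from mean cross-entropy; the exact accounting in train_gpt.py is an assumption, and the sample numbers below are hypothetical.

```python
import math

def bits_per_byte(nats_per_token, bytes_per_token):
    # Standard conversion: cross-entropy in nats -> bits (divide by ln 2),
    # normalized by the number of raw bytes each token covers.
    return nats_per_token / (math.log(2) * bytes_per_token)

# Hypothetical figures: 3.394 nats/token at 4.0 bytes/token
# lands in the same region as the 1.22 baseline above.
assert round(bits_per_byte(3.394, 4.0), 4) == 1.2241
```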

Adds three research levers to train_gpt.py (1177 lines, under 1500 cap):

- Sliding window attention (WINDOW_SIZE env var, default 0 = dense)
  Causal attention restricted to a local window; reduces compute and
  may allow wider/deeper models within the 16MB artifact budget.

- Recursive weight tying (NUM_PHYSICAL_LAYERS env var)
  N physical blocks reused across NUM_LAYERS logical layers. E.g.
  NUM_PHYSICAL_LAYERS=3 with NUM_LAYERS=9 gives 3x effective depth
  at the same parameter count.

- Dev/mini-run support (DEV_MODE, SKIP_QUANT env vars)
  DEV_MODE=1 allows MPS/CPU fallback for local architecture testing.
  SKIP_QUANT=1 skips int8+zlib serialization for fast iteration.
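The tying lever above amounts to a logical-to-physical layer map. A minimal sketch, assuming a round-robin mapping (the mapping, names, and toy blocks are illustrative, not the actual train_gpt.py code):

```python
def tied_layer_schedule(num_logical, num_physical):
    # Map each logical layer to a physical block index, cycling
    # through the physical blocks (simplest possible tie pattern).
    return [i % num_physical for i in range(num_logical)]

# 3 physical blocks reused as 9 logical layers: 3x effective depth
# at the stored-parameter cost of 3 blocks.
schedule = tied_layer_schedule(9, 3)
assert schedule == [0, 1, 2, 0, 1, 2, 0, 1, 2]

def forward(x, blocks, schedule):
    # Apply physical blocks in logical order; parameters are shared
    # whenever the same index repeats.
    for idx in schedule:
        x = blocks[idx](x)
    return x

# Toy stand-ins for transformer blocks:
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
assert forward(0, blocks, schedule) == -7
```

Only the distinct physical blocks are serialized, which is why the 16MB `.pt` file size is unchanged by the number of logical layers.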

Also adds mini_run.sh convenience script for quick local runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jordankzf

bro we can see claude did it

@vukrosic

@jordankzf the goal is scientific advancement, not who did it

@albertorkive
Author

Lol @jordankzf cause you definitely are hand coding every character 😂 Based on that comment you are either a hypocrite or slow (adopting tools I mean) 😬

@jordankzf

@albertorkive Just messing with you, man. Claude (and Codex) is awesome! Good luck on your submission :)

@albertorkive
Author

@jordankzf likewise, good luck. May the best coding assistant win 😌

@albertorkive force-pushed the research/sparse-recurrent-skeleton branch from 193a53b to 31e777e on March 19, 2026 at 11:00
@0hq closed this Mar 19, 2026
gb250e referenced this pull request in gb250e/parameter-golf Mar 21, 2026
dhruvjatkar referenced this pull request in dhruvjatkar/parameter-golf Mar 25, 2026
PR openai#672 maxes TTT at 30 epochs (590s/600s eval budget), so all future
improvements must be orthogonal to TTT. This update:
- Sets 1.0781 BPB (PR openai#672) as the new target to beat
- Reorders Top 8 directions: XSA-all confirmed at #1, Full GPTQ #2,
  SwiGLU #3, Muon-VS #4, aggressive quant #5, MASA #6,
  depth recurrence #7 with int6 risk warning, AdEMAMix #8
- Deprioritizes TTT-related directions already exploited by PR openai#672
- Collapses ~1000 lines of stale Round 0-3.9 session logs into a
  concise historical summary
- Removes resolved blockers (flash_attn, SSH hangs, local runtime)
- Adds fresh Round 1 section with 5 submitted experiments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

4 participants