draft by abaybektursun · Pull Request #650 · openai/parameter-golf

abaybektursun · 2026-03-24T21:59:38Z

No description provided.

Case study: reordering training shards by model difficulty (hardest first) improves BPB by 0.0033 over sequential ordering. Zero architecture changes, zero compute cost, ten lines of code.

Key finding: token-level statistics (KL divergence) show a range of only 0.0009 across shards, while model perplexity shows a range of 0.0475, roughly 50x more variation. The two metrics are uncorrelated (r = -0.056).

3-seed validation on PR openai#549 (merged openai#1):

Seed 1337: 1.1217 -> 1.1183 (-0.0034)
Seed 42: 1.1222 -> 1.1181 (-0.0041)
Seed 2025: 1.1221 -> 1.1198 (-0.0023)
Mean: 1.1220 -> 1.1187 (-0.0033)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
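The diff itself is not shown on this page, so the following is only a minimal sketch of what hardest-first shard ordering could look like, not the PR's actual code. The shard names, the per-shard loss values, and the helper `order_shards_hardest_first` are hypothetical; the per-seed BPB numbers are the ones quoted in the commit message, and the last lines recompute the reported deltas from them.

```python
from statistics import mean


def order_shards_hardest_first(shard_losses):
    """Return shard names sorted by descending model loss (hardest first).

    `shard_losses` maps a shard name to a difficulty score, e.g. the mean
    loss or perplexity of a reference model on that shard (an assumption:
    the PR's exact scoring procedure is not shown on this page).
    """
    return sorted(shard_losses, key=shard_losses.get, reverse=True)


# Hypothetical difficulty scores, for illustration only.
shard_losses = {
    "shard_000.bin": 1.102,
    "shard_001.bin": 1.131,
    "shard_002.bin": 1.087,
    "shard_003.bin": 1.118,
}
ordered = order_shards_hardest_first(shard_losses)

# Per-seed validation BPB as quoted in the commit message.
baseline = {1337: 1.1217, 42: 1.1222, 2025: 1.1221}
reordered = {1337: 1.1183, 42: 1.1181, 2025: 1.1198}

# Recompute the per-seed and mean deltas, rounded to four decimals.
deltas = {seed: round(reordered[seed] - baseline[seed], 4) for seed in baseline}
mean_delta = round(mean(deltas.values()), 4)

print(ordered)
print(deltas, mean_delta)
```

A training loop would then iterate over `ordered` instead of the sequential shard list; how the scores are obtained (which checkpoint, how many tokens per shard) is left open here because the source does not specify it.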

abaybektursun changed the title ~~You're Training on 57% of Your Data. Does It Matter Which 57%?~~ We Beat Our Own #1 Record With 10 Lines of Code Mar 24, 2026

notapplica mentioned this pull request Mar 24, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

abaybektursun changed the title ~~We Beat Our Own #1 Record With 10 Lines of Code~~ 1.1187 BPB by Changing the Order of Training Data Mar 24, 2026

abaybektursun changed the title ~~1.1187 BPB by Changing the Order of Training Data~~ -0.0041 BPB by Changing the Order of Training Data Mar 24, 2026

abaybektursun changed the title ~~-0.0041 BPB by Changing the Order of Training Data~~ -0.0041 BPB by Reordering Training Data (Is This Cheating?) Mar 24, 2026

abaybektursun changed the title ~~-0.0041 BPB by Reordering Training Data (Is This Cheating?)~~ -0.0041 BPB by Reordering Training Data (Curriculum Learning) Mar 24, 2026

abaybektursun force-pushed the submission/shard-ordering-case-study branch 3 times, most recently from beb400c to ca1a938 Compare March 24, 2026 22:48

abaybektursun force-pushed the submission/shard-ordering-case-study branch from ca1a938 to f4b3468 Compare March 24, 2026 23:00

abaybektursun closed this Mar 24, 2026

abaybektursun deleted the submission/shard-ordering-case-study branch March 24, 2026 23:26

abaybektursun changed the title ~~-0.0041 BPB by Reordering Training Data (Curriculum Learning)~~ draft Mar 24, 2026

ndokutovich mentioned this pull request Mar 25, 2026

Record: Curriculum Learning + LeakyReLU(0.9)² + 7-gram Backoff (val_bpb=0.9633) #764

Open

3 tasks

abaybektursun wants to merge 1 commit into openai:main from abaybektursun:submission/shard-ordering-case-study
