draft#650

Closed
abaybektursun wants to merge 1 commit into openai:main from abaybektursun:submission/shard-ordering-case-study

@abaybektursun
Contributor

@abaybektursun abaybektursun commented Mar 24, 2026

No description provided.

@abaybektursun abaybektursun changed the title from "You're Training on 57% of Your Data. Does It Matter Which 57%?" to "We Beat Our Own #1 Record With 10 Lines of Code" Mar 24, 2026
@abaybektursun abaybektursun changed the title from "We Beat Our Own #1 Record With 10 Lines of Code" to "1.1187 BPB by Changing the Order of Training Data" Mar 24, 2026
@abaybektursun abaybektursun changed the title from "1.1187 BPB by Changing the Order of Training Data" to "-0.0041 BPB by Changing the Order of Training Data" Mar 24, 2026
@abaybektursun abaybektursun changed the title from "-0.0041 BPB by Changing the Order of Training Data" to "-0.0041 BPB by Reordering Training Data (Is This Cheating?)" Mar 24, 2026
@abaybektursun abaybektursun changed the title from "-0.0041 BPB by Reordering Training Data (Is This Cheating?)" to "-0.0041 BPB by Reordering Training Data (Curriculum Learning)" Mar 24, 2026
@abaybektursun abaybektursun force-pushed the submission/shard-ordering-case-study branch 3 times, most recently from beb400c to ca1a938, on March 24, 2026 22:48
Case study: reordering training shards by model difficulty (hardest
first) lowers BPB by 0.0033 (3-seed mean) relative to sequential
ordering. Zero architecture changes, zero compute cost, ten lines of
code.
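A hardest-first ordering like the one described above can be sketched in a few lines. This is a minimal illustration, not the code from this PR: the function name `order_shards_hardest_first`, the shard file names, and the per-shard scores are all hypothetical, and it assumes a per-shard difficulty score (for example, mean model loss on that shard) has already been computed.

```python
from typing import Dict, List


def order_shards_hardest_first(shard_score: Dict[str, float]) -> List[str]:
    """Return shard names sorted by descending difficulty score.

    Ties are broken by shard name so the ordering is deterministic.
    """
    return sorted(shard_score, key=lambda name: (-shard_score[name], name))


# Hypothetical per-shard difficulty scores, for illustration only.
example_scores = {
    "shard_000.bin": 1.10,
    "shard_001.bin": 1.14,
    "shard_002.bin": 1.12,
}
ordered = order_shards_hardest_first(example_scores)
print(ordered)
```

The data loader would then consume shards in the order given by `ordered` instead of in their on-disk order; how that hooks into the actual training script is not shown in this thread.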

Key finding: token-level statistics (KL divergence) span a range of
only 0.0009 across shards, while model perplexity spans a range of
0.0475 -- roughly 50x more variation. The two metrics are
uncorrelated (r = -0.056).
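The two summary statistics quoted above (the range of a per-shard metric and the Pearson correlation between two metrics) could be computed roughly as follows. This is a sketch under assumptions: the helper names `value_range` and `pearson_r` and the sample values are hypothetical, and the commit message does not say how the per-shard KL divergence or perplexity scores were obtained.

```python
import math
from typing import Sequence


def value_range(xs: Sequence[float]) -> float:
    """Spread of a per-shard metric: max minus min."""
    return max(xs) - min(xs)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-shard scores, for illustration only.
kl_scores = [0.0101, 0.0104, 0.0098, 0.0107]
model_scores = [1.120, 1.100, 1.145, 1.130]

print(value_range(kl_scores), value_range(model_scores))
print(pearson_r(kl_scores, model_scores))
```

A correlation near zero, as reported here, means that ranking shards by the token-level statistic says little about how the model-based score would rank them.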

Validated over 3 seeds on PR openai#549 (merged, the #1 record):
  Seed 1337: 1.1217 -> 1.1183 (-0.0034)
  Seed 42:   1.1222 -> 1.1181 (-0.0041)
  Seed 2025: 1.1221 -> 1.1198 (-0.0023)
  Mean:      1.1220 -> 1.1187 (-0.0033)
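The per-seed deltas and the means in the table above can be re-derived from the listed BPB values. The snippet below is only an arithmetic check on the numbers as reported; the variable names are illustrative.

```python
# BPB values as listed per seed (lower is better).
baseline = {1337: 1.1217, 42: 1.1222, 2025: 1.1221}
reordered = {1337: 1.1183, 42: 1.1181, 2025: 1.1198}

# Per-seed change, rounded to the 4 decimals used in the table.
deltas = {seed: round(reordered[seed] - baseline[seed], 4) for seed in baseline}

mean_baseline = round(sum(baseline.values()) / len(baseline), 4)
mean_reordered = round(sum(reordered.values()) / len(reordered), 4)
mean_delta = round(sum(deltas.values()) / len(deltas), 4)

print(deltas)
print(mean_baseline, mean_reordered, mean_delta)
```

The -0.0041 figure that appears in several of the PR titles is the best single seed (seed 42), while -0.0033 is the 3-seed mean.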

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abaybektursun abaybektursun force-pushed the submission/shard-ordering-case-study branch from ca1a938 to f4b3468 on March 24, 2026 23:00
@abaybektursun abaybektursun deleted the submission/shard-ordering-case-study branch March 24, 2026 23:26
@abaybektursun abaybektursun changed the title from "-0.0041 BPB by Reordering Training Data (Curriculum Learning)" to "draft" Mar 24, 2026