
perf: add scaling benchmark — FastBPE is 7-12x faster than graph BPE#18

Closed
AmitMY wants to merge 1 commit into main from perf/larger-benchmarks


AmitMY commented Apr 8, 2026

Summary

  • Add bench_scaling.py comparing graph BPE vs FastBPE across text sizes and merge counts
  • FastBPE is consistently 7-12x faster and produces identical merges
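For readers without access to bench_scaling.py, the comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the `Trainer` signature (texts plus a merge count in, a list of merges out) and the names `timed` and `compare` are assumptions made for the example.

```python
import time
from collections.abc import Callable

# Assumed shape of a trainer: (texts, num_merges) -> ordered list of merges.
Merges = list[tuple[str, str]]
Trainer = Callable[[list[str], int], Merges]


def timed(train: Trainer, texts: list[str], num_merges: int) -> tuple[float, Merges]:
    """Run one training call and return (elapsed seconds, merges)."""
    start = time.perf_counter()
    merges = train(texts, num_merges)
    return time.perf_counter() - start, merges


def compare(graph_train: Trainer, fast_train: Trainer,
            texts: list[str], num_merges: int) -> dict[str, float]:
    """Time both trainers on the same input and require identical merges."""
    graph_s, graph_merges = timed(graph_train, texts, num_merges)
    fast_s, fast_merges = timed(fast_train, texts, num_merges)
    if graph_merges != fast_merges:
        raise ValueError("trainers produced different merges")
    # Guard against a zero denominator if the fast trainer returns instantly.
    return {"graph_s": graph_s, "fast_s": fast_s,
            "speedup": graph_s / max(fast_s, 1e-9)}
```

Checking merge equality inside the same call keeps a speedup from being reported for a trainer that is fast but wrong.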

Stacked on #17.

What improved

  • Clear evidence FastBPE should be the default for BPE training
  • Scaling test verifies that training on 270k chars completes in <5s
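A time-budget check like the one in the scaling test could look roughly like this. It is a sketch under stated assumptions: `make_corpus` and `assert_trains_within` are hypothetical helpers, the trainer is passed in as a plain callable, and the real tests in this PR may build their corpus differently.

```python
import time
from collections.abc import Callable


def make_corpus(sentence: str, target_chars: int) -> list[str]:
    """Repeat one sentence until the corpus holds at least target_chars characters."""
    if not sentence:
        raise ValueError("sentence must be non-empty")
    copies = -(-target_chars // len(sentence))  # ceiling division
    return [sentence] * copies


def assert_trains_within(train: Callable[[list[str], int], list],
                         texts: list[str], num_merges: int, budget_s: float) -> list:
    """Fail if a single training run exceeds the wall-clock budget."""
    start = time.perf_counter()
    merges = train(texts, num_merges)
    elapsed = time.perf_counter() - start
    assert elapsed < budget_s, f"training took {elapsed:.2f}s, budget is {budget_s}s"
    return merges
```

With a real trainer, the check from this PR would correspond to something like `assert_trains_within(train, make_corpus(sentence, 270_000), 200, 5.0)`. Wall-clock budgets depend on the machine, so a generous margin helps avoid flaky CI runs.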

Results

| Config | Graph BPE | Fast BPE | Speedup |
|---|---|---|---|
| 10 texts × 10 rep, 50 merges | 0.06s | 0.005s | 12.5x |
| 50 texts × 50 rep, 200 merges | 0.86s | 0.11s | 7.5x |
| 100 texts × 50 rep, 200 merges | 1.79s | 0.23s | 7.9x |
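As a quick sanity check, the speedup column can be recomputed from the rounded timings shown in the results. The snippet below only does that arithmetic on the numbers copied from this PR. The recomputed ratios differ slightly from the reported ones, which is consistent with the reported speedups having been derived from unrounded timings (an assumption, since the raw measurements are not included here).

```python
# (config, graph seconds, fast seconds, reported speedup), copied from the results.
rows = [
    ("10 texts × 10 rep, 50 merges", 0.06, 0.005, 12.5),
    ("50 texts × 50 rep, 200 merges", 0.86, 0.11, 7.5),
    ("100 texts × 50 rep, 200 merges", 1.79, 0.23, 7.9),
]


def speedup(graph_s: float, fast_s: float) -> float:
    """Speedup of the fast trainer relative to the graph trainer."""
    return graph_s / fast_s


for config, graph_s, fast_s, reported in rows:
    recomputed = speedup(graph_s, fast_s)
    print(f"{config}: {recomputed:.1f}x from rounded timings, {reported}x reported")
```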

Test plan

  • 2 scaling tests pass
  • ruff check . passes

🤖 Generated with Claude Code

AmitMY force-pushed the perf/larger-benchmarks branch 15 times, most recently from 81e451c to 887b986 on April 8, 2026 17:25
- bench_scaling.py compares graph BPE vs FastBPE across text sizes
  (5k-270k chars) and merge counts (50-200)
- FastBPE consistently 7-12x faster, identical merge output
- Add scaling tests: 270k chars in <5s, identical merges across sizes

Results (270k chars, 200 merges): Graph 1.8s vs Fast 0.23s (7.9x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AmitMY force-pushed the perf/larger-benchmarks branch from 887b986 to 6e71789 on April 8, 2026 17:26

AmitMY commented Apr 8, 2026

Closing: the scaling benchmark was built on FastBPETrainer, which was removed in #13.

AmitMY closed this Apr 8, 2026