
jlevy commented Jan 15, 2026

Summary

  • Fixed a 54x performance regression in text wrapping (6,300ms → 117ms on the test doc)
  • The current version is now 45% faster than the v0.6.0 baseline
  • Added benchmarking tooling for ongoing performance monitoring

Changes

Performance fixes:

  • Cache word splitter instances with the @cache decorator instead of creating a new instance (with 891 pattern groups) for every paragraph
  • Add a quick-reject optimization that skips pattern matching for words that cannot possibly match any pattern
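The PR does not show the splitter internals, so the following is only a sketch of one way a quick-reject check can work, assuming each pattern is a sequence of regexes matched against consecutive words. `PatternWordSplitter` and its pattern format are hypothetical. The idea: one combined regex over every pattern's first element rejects most ordinary words in a single match, so the loop over all pattern groups only runs for words that could actually start a pattern.

```python
import re


class PatternWordSplitter:
    """Hypothetical splitter that keeps multi-word patterns together as one token.

    Each pattern is a non-empty tuple of regexes that must fully match
    consecutive words, e.g. ("<a", "href=.*>") keeps a simple opening tag whole.
    """

    def __init__(self, patterns: list[tuple[str, ...]]):
        self.patterns = [tuple(re.compile(p) for p in pat) for pat in patterns]
        # Quick-reject filter: one alternation over every pattern's first element.
        # If a word does not match it, no pattern can start at that word.
        self.first_word = re.compile("|".join(f"(?:{pat[0]})" for pat in patterns))

    def split(self, text: str) -> list[str]:
        words = text.split()
        result: list[str] = []
        i = 0
        while i < len(words):
            length = self._match_length(words, i)
            result.append(" ".join(words[i : i + length]))
            i += length
        return result

    def _match_length(self, words: list[str], i: int) -> int:
        # Cheap check first: most words never reach the per-pattern loop below.
        if not self.first_word.fullmatch(words[i]):
            return 1
        for pat in self.patterns:
            end = i + len(pat)
            if end <= len(words) and all(
                rx.fullmatch(w) for rx, w in zip(pat, words[i:end])
            ):
                return len(pat)
        return 1
```

For example, `PatternWordSplitter([("<a", r"href=\S*>")]).split('see <a href="x"> link')` keeps `<a href="x">` as a single token, while plain words pass the quick-reject check and are returned one by one.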

New tooling:

  • Add devtools/benchmark.py for comparing performance against releases
  • Add make benchmark and make profile Makefile targets
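The PR does not include the contents of devtools/benchmark.py. As a rough, hypothetical sketch of the kind of measurement behind a "Mean" column, the helpers below time a callable with `time.perf_counter` and express the result relative to a baseline; `mean_ms` and `relative_to_baseline` are illustrative names, not the script's actual API.

```python
import statistics
import time
from typing import Callable


def mean_ms(fn: Callable[[], object], runs: int = 5) -> float:
    """Call fn `runs` times and return the mean wall-clock time in milliseconds."""
    fn()  # warm-up call so one-time setup is not counted in the samples
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)


def relative_to_baseline(current_ms: float, baseline_ms: float) -> str:
    """Express a timing as a multiple of the baseline (below 1.00x is faster)."""
    return f"{current_ms / baseline_ms:.2f}x the baseline time"
```

To compare against a release, one would run the same workload (for example, wrapping the test doc) under each version and pass both means to `relative_to_baseline`. The warm-up call keeps one-time costs such as cache population out of the mean; drop it if cold-start time is what you want to measure.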

API cleanup:

  • Rename html_md_word_splitter to get_html_md_word_splitter(atomic_tags: bool), with a required explicit parameter to avoid bugs
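A minimal sketch of how the cached factory and the required parameter fit together, assuming a plain `functools.cache` on a module-level function. `WordSplitter` and its construction counter are hypothetical stand-ins; only the name `get_html_md_word_splitter` and its `atomic_tags: bool` parameter come from this PR.

```python
from functools import cache


class WordSplitter:
    """Hypothetical stand-in for a splitter that is expensive to construct."""

    instances_built = 0  # counts constructions so the effect of caching is visible

    def __init__(self, atomic_tags: bool):
        WordSplitter.instances_built += 1
        self.atomic_tags = atomic_tags
        # A real splitter would compile its pattern groups here.


@cache
def get_html_md_word_splitter(atomic_tags: bool) -> WordSplitter:
    """Return a shared splitter for each atomic_tags value.

    There is no default, so every caller must choose atomic_tags explicitly.
    @cache memoizes on the argument, so construction happens once per value
    rather than once per paragraph.
    """
    return WordSplitter(atomic_tags)
```

One caveat of this design: `functools.cache` keys on how arguments are passed, so `get_html_md_word_splitter(True)` and `get_html_md_word_splitter(atomic_tags=True)` may be cached as separate entries. Calling it consistently (or making the parameter keyword-only) avoids building a second instance.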

Test plan

  • All 177 tests pass
  • Linting passes with zero warnings
  • Benchmark confirms performance improvement

Benchmark results

| Version    | Mean time | vs v0.6.0   |
|------------|-----------|-------------|
| Before fix | 6,391ms   | 29x slower  |
| After fix  | 117ms     | 45% faster  |
| v0.6.0     | 219ms     | baseline    |

🤖 Generated with Claude Code

Performance fixes:
- Cache word splitter instances using @cache decorator instead of creating
  new instances (with 891 pattern groups) for every paragraph
- Add quick-reject optimization to skip pattern matching for words that
  cannot possibly match any pattern

Result: 54x faster (6,300ms to 117ms), now 45% faster than v0.6.0

Also:
- Add devtools/benchmark.py for comparing performance against releases
- Add make benchmark and make profile targets
- Rename html_md_word_splitter to get_html_md_word_splitter() with
  explicit atomic_tags parameter (no default to avoid bugs)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jlevy merged commit 704b73a into main on Jan 15, 2026
5 checks passed
jlevy deleted the feature/benchmark-tooling branch on January 15, 2026 06:36