🚀 Chunklet-py v2.3.0: smarter sentence splitting, faster visualizer

@speedyk-005 speedyk-005 released this 01 May 21:58
· 17 commits to main since this release

✨ What's New

  • Non-Latin scripts in fallback splitter: Arabic, Chinese, Japanese, etc. are now handled correctly via Unicode property escapes (\p{Lo}, \p{Lt})
  • Fallback splitter preserves quotes, parens, and numbered lists: quoted text, parenthesized content, and 1. 2. 3. lists stay as single sentences instead of getting split apart (uses hash-based masking)
  • Visualizer API now supports MessagePack: the browser requests it automatically for ~30-50% smaller payloads; programmatic clients can opt in via the Accept: application/msgpack header (JSON is still the default)
  • ~2x faster span detection: replaced the regex-based _find_span with a deterministic finder, so no more backtracking on large texts
  • Visualizer extra has a new shortcut "chunklet-py[viz]"
  • Lazy imports for splitter libraries, for faster startup
  • Better markdown heading detection in DocumentChunker
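
To give a feel for what "hash-based masking" means in the fallback splitter item above, here is a minimal sketch using only the standard library. It is not chunklet-py's actual code: the patterns, the token format, and the names PROTECTED, mask, unmask, split_sentences, and naive_split are all assumptions made up for illustration. The idea is to swap protected spans for punctuation-free tokens, split on sentence punctuation, then put the original text back.

```python
import hashlib
import re

# Spans that should never be split (illustrative patterns, not chunklet-py's own).
PROTECTED = re.compile(
    r'"[^"\n]*"'          # double-quoted text
    r"|\([^()\n]*\)"      # parenthesized text (non-nested)
    r"|^[ \t]*\d+\.",     # list markers such as "1." at the start of a line
    re.MULTILINE,
)
# Naive sentence boundary: whitespace that follows ., ! or ?
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def mask(text):
    """Replace each protected span with a token derived from its hash."""
    table = {}

    def stash(match):
        digest = hashlib.sha256(match.group().encode("utf-8")).hexdigest()[:12]
        token = f"MASK{digest}X"  # contains no sentence punctuation
        table[token] = match.group()
        return token

    return PROTECTED.sub(stash, text), table


def unmask(text, table):
    """Restore the original spans in a piece of masked text."""
    for token, original in table.items():
        text = text.replace(token, original)
    return text


def naive_split(text):
    """Split on sentence punctuation with no protection at all."""
    return [piece for piece in BOUNDARY.split(text.strip()) if piece]


def split_sentences(text):
    """Mask protected spans, split, then unmask each sentence."""
    masked, table = mask(text)
    return [unmask(piece, table) for piece in naive_split(masked)]


sample = 'She said "Wait. Stop!" and left. The fix (see v2.3.0. It is new) works.'
print(naive_split(sample))      # breaks inside the quote and the parentheses
print(split_sentences(sample))  # keeps both spans intact
```

Because the token is derived from the span's content, identical spans map to the same token and restore to the same text. A sentence that ends inside a quote (for example `He said "Go." Then left.`) would not be split by this sketch; a real splitter needs extra rules for that case.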

🔧 The Fixes

  • pkg_resources crash on install: finally sorted out the setuptools dependency mess
  • Custom splitter registration: no more TypeError when registering functools.partial or other callables without a __name__
  • Log spam with lang='auto': stopped warning you every single time you auto-detect a language
  • CodeChunker tree hierarchy: methods now appear under their class instead of "global"
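
For context on the custom splitter registration fix above: functools.partial objects (and instances of classes that define __call__) have no __name__ attribute, so any registry that reads func.__name__ directly will fail on them. The sketch below shows one defensive way to handle that. SplitterRegistry, callable_name, and split_on are hypothetical names invented for this example and are not chunklet-py's API.

```python
import functools


def callable_name(func):
    """Best-effort display name for any callable."""
    name = getattr(func, "__name__", None)
    if name is None and isinstance(func, functools.partial):
        # partial objects expose the wrapped callable as .func
        return f"partial({callable_name(func.func)})"
    # Fall back to the class name for callable instances
    return name or type(func).__name__


class SplitterRegistry:
    """Hypothetical registry mapping a language code to a splitter callable."""

    def __init__(self):
        self._splitters = {}

    def register(self, lang, func):
        if not callable(func):
            raise TypeError(f"splitter for {lang!r} must be callable")
        # Never touch func.__name__ directly; not every callable has one
        self._splitters[lang] = (callable_name(func), func)
        return func

    def name_for(self, lang):
        return self._splitters[lang][0]

    def split(self, lang, text):
        return self._splitters[lang][1](text)


def split_on(text, sep):
    """Toy splitter: cut on a separator and drop empty pieces."""
    return [part.strip() for part in text.split(sep) if part.strip()]


registry = SplitterRegistry()
registry.register("xx", functools.partial(split_on, sep="|"))
print(registry.name_for("xx"))
print(registry.split("xx", "first | second|third"))
```

Looking the name up with getattr and a fallback keeps registration working for plain functions, lambdas, partials, and callable objects alike.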

🧹 Removed

  • Python 3.10 support: dropped because of recurring CI multiprocessing hangs and its approaching EOL.

📦 Quick Install

pip install chunklet-py -U

🔗 Additional Information

Feedback and bug reports welcome. Thanks!