Chunklet-py v2.3.0: smarter sentence splitting, faster visualizer
What's New
- Non-Latin scripts in fallback splitter: Arabic, Chinese, Japanese, etc. are now handled correctly via Unicode property escapes (`\p{Lo}`, `\p{Lt}`)
- Fallback splitter preserves quotes, parens, and numbered lists: quoted text, parenthesized content, and `1. 2. 3.` lists stay as single sentences instead of getting split apart (uses hash-based masking)
- Visualizer API now supports MessagePack: the browser requests it automatically for ~30-50% smaller payloads; programmatic clients can opt in via the `Accept: application/msgpack` header (JSON is still the default)
- ~2x faster span detection: replaced the regex-based `_find_span` with a deterministic finder, so no more backtracking on large texts
- Visualizer extra has a new shortcut "chunklet-py[viz]"
- Lazy imports for splitter libraries: faster startup
- Better markdown heading detection in DocumentChunker
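The hash-based masking mentioned for the fallback splitter can be pictured roughly like this. This is a minimal sketch of the general technique, not Chunklet-py's actual implementation: the `PROTECTED` and `BOUNDARY` patterns, the `MASK` token format, and the `split_with_masking` function are all hypothetical names chosen for illustration.

```python
import hashlib
import re

# Hypothetical patterns: double-quoted text, parenthesized text,
# and "1." style markers at the start of a line.
PROTECTED = re.compile(r'"[^"]*"|\([^()]*\)|^[ \t]*\d+\.(?=\s)', re.MULTILINE)
# Naive boundary: sentence-ending punctuation followed by whitespace.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_with_masking(text: str) -> list[str]:
    """Split text into sentences while keeping protected spans intact."""
    masks: dict[str, str] = {}

    def mask(match: re.Match) -> str:
        original = match.group(0)
        # A hash of the span gives a deterministic token with no punctuation in it.
        digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        token = f"MASK{digest}"
        masks[token] = original
        return token

    # 1. Hide protected spans, 2. split on the naive boundary, 3. restore the spans.
    masked = PROTECTED.sub(mask, text)
    sentences = [s for s in BOUNDARY.split(masked) if s]

    def unmask(sentence: str) -> str:
        for token, original in masks.items():
            sentence = sentence.replace(token, original)
        return sentence

    return [unmask(s) for s in sentences]
```

Without the masking step, the naive boundary would split inside `"Stop. Wait!"` and right after the `1.` list marker; with it, punctuation inside protected spans is invisible to the splitter.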
The Fixes
- `pkg_resources` crash on install: finally sorted out the setuptools dependency mess
- Custom splitter registration: no more `TypeError` when registering `functools.partial` or other callables without a `__name__`
- Log spam with `lang='auto'`: stopped warning you every single time you auto-detect a language
- CodeChunker tree hierarchy: methods now appear under their class instead of "global"
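For context on the custom splitter fix: `functools.partial` objects and callable class instances don't carry a `__name__`, so any registry that reads that attribute directly will fail on them. One way a registry could derive a label safely is sketched below. The helper `callable_name` is hypothetical and is not part of Chunklet-py's API.

```python
import functools


def callable_name(func) -> str:
    """Return a readable name for any callable, including functools.partial objects."""
    # Unwrap (possibly nested) partials to reach the underlying callable.
    while isinstance(func, functools.partial):
        func = func.func
    # Functions and classes have __name__; callable instances fall back to their class name.
    return getattr(func, "__name__", type(func).__name__)
```

With this approach, `functools.partial(split_on, sep=".")` is labeled `split_on`, and an instance of a callable class `S` is labeled `S`.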
Removed
- Python 3.10 support: dropped because of recurring CI multiprocessing hangs and its approaching EOL.
Quick Install
`pip install chunklet-py -U`
Additional Information
Feedback and bug reports welcome. Thanks!