Releases: feyninc/chonkie
v1.6.8
What's Changed
- Support Gemini embedding dimensions in Catsu wrappers by @hannibal-lee in #571
- chore(deps): bump idna from 3.11 to 3.15 by @dependabot[bot] in #595
- feat: add PyEmscripten wheels by @chonk-lain in #597
- fix: write non-ASCII text as-is in JSON export instead of \uXXXX escapes by @zzhdbw in #596
- chore(deps): bump uv from 0.11.6 to 0.11.15 by @dependabot[bot] in #598
- chore: bump library by @chonk-lain in #599
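The JSON export fix in #596 concerns how non-ASCII text is serialized. As a minimal sketch of the difference, using only Python's standard `json` module (this is not Chonkie's export code, and the `record` dict is a made-up example), `ensure_ascii=False` writes characters as-is instead of emitting \uXXXX escapes:

```python
import json

# Hypothetical record for illustration; not Chonkie's actual export format.
record = {"text": "café"}

escaped = json.dumps(record)                    # default: ensure_ascii=True
as_is = json.dumps(record, ensure_ascii=False)  # non-ASCII written unchanged

print(escaped)
print(as_is)
```

Both strings decode to the same data; only the on-disk representation differs.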
Full Changelog: v1.6.7...v1.6.8
v1.6.8-alpha.1
What's Changed
- Support Gemini embedding dimensions in Catsu wrappers by @hannibal-lee in #571
- chore(deps): bump idna from 3.11 to 3.15 by @dependabot[bot] in #595
- feat: add PyEmscripten wheels by @chonk-lain in #597
Full Changelog: v1.6.7...v1.6.8-alpha.1
v1.6.7
Chonkie v1.6.7 ✨
MistralOCR support
Extract text from images with the new MistralOCR integration:
from chonkie import MistralOCR
ocr = MistralOCR()
out = ocr("image.jpg")
print(out.content)
What's Changed
- chore(deps): bump langsmith from 0.7.31 to 0.8.0 by @dependabot[bot] in #590
- Fix: initial CodeChunker languages downloaded by @chonk-lain in #591
- Add justified method for the overlap option by @anaslimem in #562
- feat: add MistralOCR by @chonk-lain in #593
- chore: bump version by @chonk-lain in #594
Full Changelog: v1.6.6...v1.6.7
v1.6.6
Highlight
CodeChunker update and a huge performance speedup.
Benchmarks below use Semble as a reference:
| Metric | Baseline | Semble (using the current version) | Delta | % Change |
|---|---|---|---|---|
| NDCG@10 (search quality) | 0.854 | 0.856 | +0.002 | +0.2% |
| p50 latency | 7.19 ms | 6.16 ms | -1.03 ms | -14.3% |
| p90 latency | 22.15 ms | 18.57 ms | -3.57 ms | -16.1% |
| p95 latency | 25.12 ms | 20.60 ms | -4.51 ms | -18.0% |
| p99 latency | 28.69 ms | 23.70 ms | -4.99 ms | -17.4% |
| Index time | 5,138 ms | 2,853 ms | -2,285 ms | -44.5% |
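For readers who want to check the Delta and % Change columns, here is a small sketch of the arithmetic, assuming percent change is computed relative to the baseline. The helper name `change` is illustrative and not part of Chonkie or Semble:

```python
def change(baseline: float, current: float) -> tuple[float, float]:
    """Return (delta, percent change) of current relative to baseline."""
    delta = current - baseline
    return delta, 100.0 * delta / baseline


# Two rows copied from the table above: (baseline, current).
rows = {
    "p50 latency (ms)": (7.19, 6.16),
    "Index time (ms)": (5138.0, 2853.0),
}

for name, (baseline, current) in rows.items():
    delta, pct = change(baseline, current)
    print(f"{name}: {delta:+.2f} ({pct:+.1f}%)")
```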
What's Changed
- update code chunker by @chonk-lain in #587
Dependencies
- chore(deps): bump urllib3 from 2.6.3 to 2.7.0 by @dependabot[bot] in #584
- chore(deps): bump langchain-core from 1.2.31 to 1.3.3 by @dependabot[bot] in #581
- chore(deps-dev): update turbopuffer requirement from ~=1.0 to >=1,<3 by @dependabot[bot] in #585
- chore(deps): bump mako from 1.3.11 to 1.3.12 by @dependabot[bot] in #579
- chore(deps): bump authlib from 1.6.11 to 1.6.12 by @dependabot[bot] in #588
Full Changelog: v1.6.5...v1.6.6
v1.6.5
What's Changed
Chonkie now supports agentic skills via
You can get started directly by running the following command:
npx skills add chonkie-inc/skills
- feat: add lazy import workflow by @chonk-lain in #568
- feat (workflow) : add trufflehog workflow by @chonk-lain in #577
- feat: replace all tokenizer backends with tokie by @chonknick in #565
- chore: add instructions on how to use skills by @chonk-lain in #578
- chore(deps): bump lxml from 6.0.2 to 6.1.0 by @dependabot[bot] in #567
- chore(deps): bump litellm from 1.83.0 to 1.83.7 by @dependabot[bot] in #569
- chore(deps): bump python-dotenv from 1.0.1 to 1.2.2 by @dependabot[bot] in #573
Full Changelog: v1.6.4...v1.6.5
v1.6.4
Chonkie v1.6.4
Fixes
- Pandas Import Fix : Fixed `import chonkie` failing with `ModuleNotFoundError: No module named 'pandas'` when installed without the `[table]` extra. The pandas import in `utils/table_converter.py` is now lazy-loaded, matching the existing pattern for optional dependencies. By @Pringled in #566
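The lazy-loading pattern mentioned in this fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `utils/table_converter.py`: the helper `lazy_import` and the function `table_to_records` are hypothetical names. The idea is that the optional dependency is imported inside the function that needs it, so importing the package itself never touches pandas.

```python
import importlib
from types import ModuleType


def lazy_import(name: str, extra: str) -> ModuleType:
    """Import an optional dependency only when it is actually needed."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Re-raise with a hint about which extra provides the dependency.
        raise ImportError(
            f"'{name}' is required for this feature. "
            f"Install it with: pip install 'chonkie[{extra}]'"
        ) from exc


def table_to_records(rows: list[dict]) -> list[dict]:
    # Hypothetical converter: pandas is imported here, not at module top level,
    # so importing this module succeeds even when pandas is not installed.
    pd = lazy_import("pandas", extra="table")
    return pd.DataFrame(rows).to_dict(orient="records")
```

With this shape, a missing pandas only surfaces when a table feature is called, and the error message points at the `[table]` extra.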
Tip
If you encountered ModuleNotFoundError: No module named 'pandas' on v1.6.3, upgrading to v1.6.4 resolves the issue:
pip install --upgrade chonkie
Full Changelog: v1.6.3...v1.6.4
v1.6.3
Chonkie v1.6.3
Caution
Known Bug: import chonkie fails with ModuleNotFoundError: No module named 'pandas' when installed without the [table] extra. This is caused by an unconditional top-level pandas import in utils/table_converter.py. Please upgrade to v1.6.4 which fixes this issue.
✨ Features
- LanceDB Handshake : Introduced a new handshake mechanism for LanceDB integration by @chonk-lain in #546
- Metadata Enhancements : Added `filename` to metadata for better traceability by @chonk-lain in #554
- Markdown Support Improvements : Added `MarkdownDocument` support for `CodeChunker` and fixed no-op behavior in `TableChunker` by @chonknick in #563
- Table Utilities : Added a table-to-JSON converter by @anaslimem in #531
Improvements
- Chunking Consistency : Deduplicated delimiter-based text splitting across chunkers by @anaslimem in #510
- Model Loading Robustness : Improved error handling for neural model and tokenizer loading by @chimchim89 in #472
- Refactor Handshake IDs : Moved `_generate_default_id` into `BaseHandshake` by @chimchim89 in #455
Fixes
- CJK Delimiter Handling : Fixed handling of single-character delimiters in `RecursiveChunker._split_text` by @nightcityblade in #537
Documentation
- JavaScript Docs : Added JavaScript documentation by @chonk-lain in #545
- Semantic Chunker Examples : Fixed embedding examples by @narumiruna in #544
- README Cleanup : Removed outdated full API documentation link by @narumiruna in #543
- General Docs Updates : Refactored and improved documentation by @chonk-lain in #542 and #557
- Contribution Guidelines : Added PR checklist to `CONTRIBUTING.md` by @swamy18 in #465
Maintenance & Dependencies
- Test Coverage : Improved test coverage by @chonk-lain in #555
- Version Bump : Bumped library version by @chonk-lain in #564
New Contributors
- @narumiruna made their first contribution in #544
- @nightcityblade made their first contribution in #537
- @swamy18 made their first contribution in #465
Full Changelog: v1.6.2...v1.6.3
v1.6.2
TeraflopAI Chunker
A new chunker has been added to your toolkit.
You can now use the newly added TeraflopAI chunker with the code below:
from chonkie import TeraflopAIChunker
chunker = TeraflopAIChunker(api_key="your_api_key_here")
text = "Your text here"
chunker.chunk(text)
What's Changed
- Fixed per-chunk overlap calculation for float context_size by @anaslimem in #512
- add teraflopai chunker by @chonk-lain in #539
- Validate tree-sitter language support in CodeChunker by @chimchim89 in #469
- chore: bump version by @chonk-lain in #541
Dependencies
- chore(deps): bump requests from 2.32.5 to 2.33.0 by @dependabot[bot] in #532
- chore(deps): bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #535
- chore(deps): bump cryptography from 46.0.5 to 46.0.6 by @dependabot[bot] in #534
- chore(deps): bump langchain-core from 1.2.19 to 1.2.22 by @dependabot[bot] in #533
- chore(deps): bump aiohttp from 3.13.3 to 3.13.4 by @dependabot[bot] in #538
- chore(deps): bump litellm from 1.82.3 to 1.83.0 by @dependabot[bot] in #540
Full Changelog: v1.6.1...v1.6.2
v1.6.1
Chonkie 1.6.1 (patch release)
This patch release focuses on fixing import issues and updating dependencies.
What's Changed
- chore(deps): bump pyjwt from 2.10.1 to 2.12.0 by @dependabot[bot] in #521
- chore(deps): bump orjson from 3.11.3 to 3.11.6 by @dependabot[bot] in #520
- chore: fix imports by @chonk-lain in #525
- chore: add httpx and update [all] by @chonk-lain in #526
- chore: bump version by @chonk-lain in #527
- chore(deps): bump authlib from 1.6.7 to 1.6.9 by @dependabot[bot] in #524
Full Changelog: v1.6.0...v1.6.1
v1.6.0
Chonkie 1.6.0
This release brings HTML table support, a self-hostable chunking API, native async capabilities, and a migration to ty for faster type-checking, alongside a range of internal refactors and quality improvements that make the library leaner and easier to extend.
HTML Table Support ✨
Structured data in HTML documents has historically been a challenge for chunking pipelines. With 1.6.0, we're introducing first-class support for HTML tables via two new components: `TableChef` and `TableChunker`.
- `TableChef` handles the extraction and normalization of HTML tables into clean, structured representations that are ready for downstream processing.
- `TableChunker` builds on top of that to intelligently chunk tabular content, preserving row and column semantics rather than blindly splitting on token boundaries.
Whether you're working with HTML or markdown, you can now process tables in your chunking pipeline.
[Table: Input Code | Original HTML Table | Chunked Output (cell contents not recoverable)]
Chonkie API: Self-Hostable Chunking Server
Chonkie can now run as a fully self-hosted REST API, making it easy to expose chunking capabilities as a service within your infrastructure. Built on FastAPI, the Chonkie OSS API can be spun up with a single CLI command and accessed from any machine on your network.
This is ideal for teams that want to centralize their chunking logic, integrate Chonkie into polyglot stacks, or simply avoid re-initializing models in every service that needs chunking.
To start the server, run:
chonkie serve
Once running, any client on your network can hit the endpoint. On the consumer side, full documentation can be found at http://localhost:8000/docs
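The release notes do not list the endpoint paths, so rather than guess them, one way to discover what the server exposes is to read its OpenAPI schema. The sketch below assumes the server keeps FastAPI's default schema URL (`/openapi.json`) and the `http://localhost:8000` address shown above; `list_routes` and `fetch_schema` are illustrative helper names, not part of Chonkie.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for key in operations:
            # Path items may also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    # Assumption: FastAPI's default /openapi.json route has not been disabled.
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the server running, `list_routes(fetch_schema())` should return the available method and path pairs, which can then be called from any HTTP client.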
⚡ Async Support
Chonkie now supports asynchronous chunking out of the box. This has been a long-requested feature for users building async-native applications, whether that's async data pipelines or anything else running on an event loop. You can now await chunking operations without blocking, making Chonkie a natural fit for high-throughput, I/O-bound workloads.
| Method | Async Equivalent | Description |
|---|---|---|
| `chunk(text)` | `achunk(text)` | Chunk a single text |
| `chunk_batch(texts)` | `achunk_batch(texts)` | Chunk a list of texts |
| `chunk_document(doc)` | `achunk_document(doc)` | Chunk a Document object |
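As a rough sketch of how `achunk` might be used from an event loop, the example below awaits several chunking calls concurrently. To stay runnable without Chonkie installed, it uses `EchoChunker`, a hypothetical stand-in that only mimics the `achunk(text)` method name from the table and returns plain strings rather than real chunk objects; in practice you would pass a Chonkie chunker instead.

```python
import asyncio


class EchoChunker:
    """Hypothetical stand-in exposing the async method named in the table."""

    async def achunk(self, text: str) -> list[str]:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return text.split()


async def chunk_concurrently(chunker, texts: list[str]) -> list[list[str]]:
    # Schedule one achunk() call per text and await them together;
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(chunker.achunk(t) for t in texts))


results = asyncio.run(chunk_concurrently(EchoChunker(), ["a b", "c d e"]))
print(results)
```

For a plain list of texts, `achunk_batch(texts)` from the table is likely the more direct choice; the `gather` pattern is mainly useful when chunking is interleaved with other awaitable work.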
What's Changed
- Add test coverage plan documenting gaps and implementation strategy by @chonknick in #502
- docs: comprehensive documentation review and updates by @chonknick in #503
- Refactor AutoTokenizer for less if/else by @akx in #422
- chore: fix lint errors introduced in #502 by @akx in #506
- refactor: replace redundant embedding providers with CatsuEmbeddings wrappers by @chonknick in #505
- feat: support pathlib.Path objects for filesystem paths by @akx in #507
- Use `logger.warning` for warnings by @akx in #425
- chore: normalize import style for importlib.util by @akx in #501
- chore: fix & simplify tests by @akx in #509
- feat: switch to ty by @chonk-lain in #409
- feat: support html tables by @chonk-lain in #500
- feat: Add Chonkie OSS API - Self-hostable FastAPI server by @chonknick in #504
- chore: cleanup unnecessary files by @chonk-lain in #514
- chore(deps): bump authlib from 1.6.6 to 1.6.7 by @dependabot[bot] in #515
- Implement Async Support by @aryxenv in #437
- bump version by @chonk-lain in #518
- chore: add async docs by @chonk-lain in #519
Full Changelog: v1.5.6...v1.6.0






