v1.6.0
Chonkie 1.6.0 🦛
This release brings HTML table support, a self-hostable chunking API, native async capabilities, and a migration to ty for faster type-checking, alongside a range of internal refactors and quality improvements that make the library leaner and easier to extend.
HTML Table Support ✨
Structured data in HTML documents has historically been a challenge for chunking pipelines. With 1.6.0, we're introducing first-class support for HTML tables via two new components: `TableChef` and `TableChunker`.
`TableChef` handles the extraction and normalization of HTML tables into clean, structured representations that are ready for downstream processing. `TableChunker` builds on top of that to chunk tabular content intelligently, preserving row and column semantics rather than blindly splitting on token boundaries.
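To make the extraction step concrete, here is a minimal standard-library sketch of what normalizing an HTML table can look like. It is not `TableChef`'s API or implementation: `html_table_to_markdown` is a hypothetical helper that assumes a single, well-formed table with explicit closing tags and treats the first row as the header. Nested tables, `colspan`/`rowspan`, and pipe characters inside cells are not handled.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so each cell becomes one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; text between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_markdown(html: str) -> str:
    """Hypothetical helper: render the table rows in `html` as a Markdown table."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

Normalizing to Markdown is just one choice of "structured representation"; it keeps the example short and produces text that a row-aware chunker can work with directly.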
Whether you're working with HTML or Markdown tables, you can now process them in your chunking pipeline.
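The row-preserving idea can be sketched as follows. `chunk_markdown_table` is a hypothetical function, not `TableChunker`'s signature, and it measures size in characters as a stand-in for tokens; the real chunker's parameters and splitting rules may differ. Each chunk contains only whole rows and repeats the header and separator lines, so every chunk remains a valid, self-describing table.

```python
def chunk_markdown_table(table: str, max_chars: int) -> list[str]:
    """Split a Markdown table into chunks of whole rows, repeating the header.

    Sizes are measured in characters as a stand-in for tokens.
    A single row longer than the budget still becomes its own chunk.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    if len(lines) <= 2:
        # Header-only (or empty) table: nothing to split.
        return ["\n".join(lines)] if lines else []
    header = "\n".join(lines[:2])  # header row + |---| separator row
    chunks: list[str] = []
    current: list[str] = []
    for row in lines[2:]:
        candidate = "\n".join([header, *current, row])
        if current and len(candidate) > max_chars:
            # Close the current chunk before it exceeds the budget.
            chunks.append("\n".join([header, *current]))
            current = [row]
        else:
            current.append(row)
    chunks.append("\n".join([header, *current]))
    return chunks
```

Repeating the header costs some space per chunk, but a retriever that returns a single chunk still sees the column names.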
*[Figure: Input Code, Original HTML Table, and Chunked Output]*
💻 Chonkie API: Self-Hostable Chunking Server
Chonkie can now run as a fully self-hosted REST API, making it easy to expose chunking capabilities as a service within your infrastructure. Built on FastAPI, the Chonkie OSS API can be spun up with a single CLI command and accessed from any machine on your network.
This is ideal for teams that want to centralize their chunking logic, integrate Chonkie into polyglot stacks, or simply avoid re-initializing models in every service that needs chunking.
To start the server, run:

```bash
chonkie serve
```

Once the server is running, any client on your network can send requests to it. Full documentation is available at http://localhost:8000/docs.
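Since the server is built on FastAPI, one way to discover its routes from another machine is to read the OpenAPI schema that FastAPI serves at `/openapi.json` by default (the same schema behind the `/docs` page). This sketch assumes Chonkie keeps that default route; the release notes do not list the endpoint paths, so none are hard-coded here. `fetch_openapi_schema` and `list_endpoints` are hypothetical helper names, and `base_url` should point at the host running `chonkie serve`.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def fetch_openapi_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema (assumes FastAPI's default /openapi.json route)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list:
    """Return sorted 'METHOD /path' strings for every operation in an OpenAPI schema."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items can also hold keys like "parameters"; keep only HTTP verbs.
            if method in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)


# Usage, with a server already running:
#   print(list_endpoints(fetch_openapi_schema("http://<server-host>:8000")))
```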
⚡ Async Support
Chonkie now supports asynchronous chunking out of the box. This has been a long-requested feature for users building async-native applications, whether that's an async data pipeline or anything else running on an event loop. You can now await chunking operations without blocking the event loop, making Chonkie a natural fit for high-throughput, I/O-bound workloads.
| Method | Async Equivalent | Description |
|---|---|---|
| `chunk(text)` | `achunk(text)` | Chunk a single text |
| `chunk_batch(texts)` | `achunk_batch(texts)` | Chunk a list of texts |
| `chunk_document(doc)` | `achunk_document(doc)` | Chunk a `Document` object |
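The table pairs each synchronous method with an `a`-prefixed coroutine, so in Chonkie 1.6.0 you would `await chunker.achunk(text)` inside async code. The sketch below only illustrates that pairing with a hypothetical `ToyChunker`; its async methods wrap the sync ones with `asyncio.to_thread`, which is one common pattern and not necessarily how Chonkie implements them.

```python
import asyncio


class ToyChunker:
    """Hypothetical chunker showing the sync/async method pairing from the table.

    Not Chonkie's implementation: the async methods simply run the
    synchronous ones in a worker thread via asyncio.to_thread.
    """

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> list:
        # Fixed-size character windows; a stand-in for a real chunking strategy.
        return [text[i : i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]

    def chunk_batch(self, texts: list) -> list:
        return [self.chunk(t) for t in texts]

    async def achunk(self, text: str) -> list:
        # Offload the blocking call so the event loop stays responsive.
        return await asyncio.to_thread(self.chunk, text)

    async def achunk_batch(self, texts: list) -> list:
        # Schedule all texts concurrently; gather returns results in input order.
        return await asyncio.gather(*(self.achunk(t) for t in texts))


async def main() -> None:
    chunker = ToyChunker(chunk_size=4)
    print(await chunker.achunk("abcdefghij"))
    print(await chunker.achunk_batch(["hello", "async world"]))


if __name__ == "__main__":
    asyncio.run(main())
```

Because `asyncio.to_thread` runs work in a thread, CPU-heavy chunking will not run faster under the GIL; the benefit is that the event loop stays free to handle other I/O while chunking happens.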
What's Changed
- Add test coverage plan documenting gaps and implementation strategy by @chonknick in #502
- docs: comprehensive documentation review and updates by @chonknick in #503
- Refactor AutoTokenizer for less if/else by @akx in #422
- chore: fix lint errors introduced in #502 by @akx in #506
- refactor: replace redundant embedding providers with CatsuEmbeddings wrappers by @chonknick in #505
- feat: support pathlib.Path objects for filesystem paths by @akx in #507
- Use `logger.warning` for warnings by @akx in #425
- chore: normalize import style for `importlib.util` by @akx in #501
- chore: fix & simplify tests by @akx in #509
- feat: switch to ty by @chonk-lain in #409
- feat: support html tables by @chonk-lain in #500
- feat: Add Chonkie OSS API - Self-hostable FastAPI server by @chonknick in #504
- chore: cleanup unnecessary files by @chonk-lain in #514
- chore(deps): bump authlib from 1.6.6 to 1.6.7 by @dependabot[bot] in #515
- Implement Async Support by @aryxenv in #437
- bump version by @chonk-lain in #518
- chore: add async docs by @chonk-lain in #519
New Contributors
Full Changelog: v1.5.6...v1.6.0



