v1.6.0

@chonk-lain chonk-lain released this 11 Mar 04:54
575b789

Chonkie 1.6.0 🎉🦛

This release brings HTML table support, a self-hostable chunking API, native async capabilities, and a migration to ty for faster type-checking, alongside a range of internal refactors and quality improvements that make the library leaner and easier to extend.


田 HTML Table Support ✨

Structured data in HTML documents has historically been a challenge for chunking pipelines. With 1.6.0, we're introducing first-class support for HTML tables via two new components: TableChef and TableChunker.

  • TableChef handles the extraction and normalization of HTML tables into clean, structured representations that are ready for downstream processing.
  • TableChunker builds on top of that to intelligently chunk tabular content, preserving row and column semantics rather than blindly splitting on token boundaries.

Whether you're working with HTML or Markdown, you can now process tables in your chunking pipeline.

Input code: [image: Chonkie table]

Original HTML table:

ID  Status
1   Active
2   Pending
3   Inactive
4   Active

Chunked output:

ID  Status
1   Active
2   Pending

ID  Status
3   Inactive
4   Active
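
The header-preserving split in the example above can be sketched with the standard library alone. This is not Chonkie's implementation, only an illustration of the idea: `extract_rows`, `chunk_rows`, and the `rows_per_chunk` parameter are hypothetical names, and the sketch assumes a single well-formed table whose first row is the header.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table as a list of rows (hypothetical helper)."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def chunk_rows(rows, rows_per_chunk):
    """Split body rows into groups, repeating the header row in each group."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [
        [header] + body[i:i + rows_per_chunk]
        for i in range(0, len(body), rows_per_chunk)
    ]


html = (
    "<table>"
    "<tr><th>ID</th><th>Status</th></tr>"
    "<tr><td>1</td><td>Active</td></tr>"
    "<tr><td>2</td><td>Pending</td></tr>"
    "<tr><td>3</td><td>Inactive</td></tr>"
    "<tr><td>4</td><td>Active</td></tr>"
    "</table>"
)
chunks = chunk_rows(extract_rows(html), rows_per_chunk=2)
for chunk in chunks:
    print(chunk)
```

Repeating the header in every chunk keeps each piece interpretable on its own, which is the property the ID/Status example illustrates.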

💻 Chonkie API: Self-Hostable Chunking Server

Chonkie can now run as a fully self-hosted REST API, making it easy to expose chunking capabilities as a service within your infrastructure. Built on FastAPI, the Chonkie OSS API can be spun up with a single CLI command and accessed from any machine on your network.

This is ideal for teams that want to centralize their chunking logic, integrate Chonkie into polyglot stacks, or simply avoid re-initializing models in every service that needs chunking.

To start the server, run:

chonkie serve

Once the server is running, any client on your network can call its endpoints. Full API documentation is available at http://localhost:8000/docs
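
Since the server is built on FastAPI, one way to discover its routes programmatically is to read the OpenAPI schema, which FastAPI serves at `/openapi.json` by default. The sketch below assumes the Chonkie server keeps that default and listens on `localhost:8000`. `fetch_schema` and `list_routes` are hypothetical helper names, and no specific Chonkie endpoint is assumed.

```python
import json
from urllib.request import urlopen


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url="http://localhost:8000"):
    """Download the OpenAPI schema (assumes FastAPI's default schema URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)
```

With the server running, `list_routes(fetch_schema())` returns every route it exposes, which can then be called from any HTTP client.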


⚑ Async Support

Chonkie now supports asynchronous chunking out of the box. This has been a long-requested feature for users building async-native applications, whether async data pipelines or anything else running on an event loop. You can now await chunking operations without blocking the event loop, making Chonkie a natural fit for high-throughput, I/O-bound workloads.

Method               Async Equivalent      Description
chunk(text)          achunk(text)          Chunk a single text
chunk_batch(texts)   achunk_batch(texts)   Chunk a list of texts
chunk_document(doc)  achunk_document(doc)  Chunk a Document object
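
As a rough illustration of how `achunk` might be used on an event loop, the sketch below awaits several calls concurrently with a cap on how many run at once. `StubChunker` is a stand-in written for this example so that it runs without Chonkie installed. Its sentence split is not how Chonkie chunks text, and `chunk_many` and `max_concurrency` are hypothetical names. In real code you would pass a Chonkie chunker instead.

```python
import asyncio


class StubChunker:
    """Stand-in exposing an `achunk` coroutine; replace with a real chunker."""

    async def achunk(self, text):
        await asyncio.sleep(0)  # yield to the event loop, as async work would
        return text.split(". ")


async def chunk_many(chunker, texts, max_concurrency=4):
    """Await `achunk` for every text, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def chunk_one(text):
        async with semaphore:
            return await chunker.achunk(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(chunk_one(text) for text in texts))


results = asyncio.run(chunk_many(StubChunker(), ["A. B", "C"]))
print(results)
```

For a plain list of texts, `achunk_batch(texts)` from the table above covers the same ground in a single call. The gather pattern is mainly useful when chunking is interleaved with other awaited work.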

What's Changed

Full Changelog: v1.5.6...v1.6.0