Releases: feyninc/chonkie

v1.6.8

@chonk-lain chonk-lain released this 01 Jun 18:07
ec6875d

lancedb handshake

What's Changed

New Contributors

Full Changelog: v1.6.7...v1.6.8

v1.6.8-alpha.1

Pre-release

@chonk-lain chonk-lain released this 29 May 18:17
ad7e0f6

What's Changed

Full Changelog: v1.6.7...v1.6.8-alpha.1

v1.6.7

@chonk-lain chonk-lain released this 19 May 04:50
d2b3422

Chonkie v1.6.7 ✨

MistralOCR support

Extract text from images with the new MistralOCR integration:

from chonkie import MistralOCR
ocr = MistralOCR()
out = ocr("image.jpg")
print(out.content)

What's Changed

Full Changelog: v1.6.6...v1.6.7

v1.6.6

@chonk-lain chonk-lain released this 13 May 16:34
e43e5f5

Highlight

🔥 CodeChunker update and a large performance speedup.
Benchmarks below, using Semble as a reference:

| Metric | Baseline | Semble (using the current version) | Delta | % Change |
|---|---|---|---|---|
| NDCG@10 (search quality) | 0.854 | 0.856 | +0.002 | +0.2% |
| p50 latency | 7.19 ms | 6.16 ms | -1.03 ms | -14.3% |
| p90 latency | 22.15 ms | 18.57 ms | -3.57 ms | -16.1% |
| p95 latency | 25.12 ms | 20.60 ms | -4.51 ms | -18.0% |
| p99 latency | 28.69 ms | 23.70 ms | -4.99 ms | -17.4% |
| Index time | 5,138 ms | 2,853 ms | -2,285 ms | -44.5% |
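
The Delta and % Change columns follow directly from the two measured values. A minimal sketch of that arithmetic, using a hypothetical helper name (`delta_and_pct` is an illustration, not part of Chonkie or Semble):

```python
def delta_and_pct(baseline: float, current: float) -> tuple[float, float]:
    """Return (absolute delta, percent change) of current relative to baseline."""
    delta = current - baseline
    return delta, delta / baseline * 100.0


# Index time from the benchmark figures: 5,138 ms baseline, 2,853 ms current.
delta, pct = delta_and_pct(5138, 2853)
print(f"{delta:+,.0f} ms ({pct:+.1f}%)")  # -2,285 ms (-44.5%)
```

Negative values mean the current version is faster than the baseline.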

What's Changed

Dependencies

  • chore(deps): bump urllib3 from 2.6.3 to 2.7.0 by @dependabot[bot] in #584
  • chore(deps): bump langchain-core from 1.2.31 to 1.3.3 by @dependabot[bot] in #581
  • chore(deps-dev): update turbopuffer requirement from ~=1.0 to >=1,<3 by @dependabot[bot] in #585
  • chore(deps): bump mako from 1.3.11 to 1.3.12 by @dependabot[bot] in #579
  • chore(deps): bump authlib from 1.6.11 to 1.6.12 by @dependabot[bot] in #588

Full Changelog: v1.6.5...v1.6.6

v1.6.5

@chonk-lain chonk-lain released this 06 May 01:17
39d2ef3

What's Changed

Chonkie now supports agentic skills via skills.sh.
Get started by running the following command 🤖

npx skills add chonkie-inc/skills

Full Changelog: v1.6.4...v1.6.5

v1.6.4

@chonknick chonknick released this 21 Apr 20:50

🚀 Chonkie v1.6.4

🐛 Fixes

  • Pandas Import Fix : Fixed import chonkie failing with ModuleNotFoundError: No module named 'pandas' when installed without the [table] extra. The pandas import in utils/table_converter.py is now lazy-loaded, matching the existing pattern for optional dependencies. By @Pringled in #566
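
For readers unfamiliar with the lazy-loading pattern mentioned in this fix, here is a minimal sketch of the general idea. It is not Chonkie's actual code: `lazy_import` and `table_to_records` are hypothetical names, and the only detail taken from the release note is that pandas belongs to the `[table]` extra.

```python
import importlib
import io
from types import ModuleType


def lazy_import(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency only when a feature actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Point the user at the extra that provides the missing package.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'chonkie[{extra}]'"
        ) from exc


def table_to_records(csv_text: str) -> list:
    """Hypothetical converter: pandas is imported here, not at module import time."""
    pd = lazy_import("pandas", extra="table")
    return pd.read_csv(io.StringIO(csv_text)).to_dict(orient="records")
```

Because the import happens inside the function, importing the package itself no longer fails when pandas is absent; only a call to the table feature does, and it fails with an actionable message.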

Tip

If you encountered ModuleNotFoundError: No module named 'pandas' on v1.6.3, upgrading to v1.6.4 resolves the issue:

pip install --upgrade chonkie

Full Changelog: v1.6.3...v1.6.4

v1.6.3

@chonk-lain chonk-lain released this 20 Apr 20:48
33dd1b9

lancedb handshake

🚀 Chonkie v1.6.3

Caution

Known Bug: import chonkie fails with ModuleNotFoundError: No module named 'pandas' when installed without the [table] extra. This is caused by an unconditional top-level pandas import in utils/table_converter.py. Please upgrade to v1.6.4 which fixes this issue.

✨ Features

  • LanceDB Handshake : Introduced a new handshake mechanism for LanceDB integration by @chonk-lain in #546
  • Metadata Enhancements : Added filename to metadata for better traceability by @chonk-lain in #554
  • Markdown Support Improvements : Added MarkdownDocument support for CodeChunker and fixed no-op behavior in TableChunker by @chonknick in #563
  • Table Utilities : Added a table-to-JSON converter by @anaslimem in #531

🧠 Improvements

  • Chunking Consistency : Deduplicated delimiter-based text splitting across chunkers by @anaslimem in #510
  • Model Loading Robustness : Improved error handling for neural model and tokenizer loading by @chimchim89 in #472
  • Refactor Handshake IDs : Moved _generate_default_id into BaseHandshake by @chimchim89 in #455

🐛 Fixes

  • CJK Delimiter Handling : Fixed handling of single-character delimiters in RecursiveChunker._split_text by @nightcityblade in #537

📚 Documentation

  • JavaScript Docs : Added JavaScript documentation by @chonk-lain in #545
  • Semantic Chunker Examples : Fixed embedding examples by @narumiruna in #544
  • README Cleanup : Removed outdated full API documentation link by @narumiruna in #543
  • General Docs Updates : Refactored and improved documentation by @chonk-lain in #542 and #557
  • Contribution Guidelines : Added PR checklist to CONTRIBUTING.md by @swamy18 in #465

🔧 Maintenance & Dependencies

🙌 New Contributors

Full Changelog: v1.6.2...v1.6.3

v1.6.2

@chonk-lain chonk-lain released this 07 Apr 01:21
d24ab74

TeraflopAI Chunker

A new chunker has been added to your toolkit 🎉
You can now freely use the TeraflopAI chunker with the code below:

from chonkie import TeraflopAIChunker

# Create the chunker with your TeraflopAI API key
chunker = TeraflopAIChunker(api_key="your_api_key_here")

text = "Your text here"
# Split the text into chunks
chunks = chunker.chunk(text)

What's Changed

Dependencies

  • chore(deps): bump requests from 2.32.5 to 2.33.0 by @dependabot[bot] in #532
  • chore(deps): bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #535
  • chore(deps): bump cryptography from 46.0.5 to 46.0.6 by @dependabot[bot] in #534
  • chore(deps): bump langchain-core from 1.2.19 to 1.2.22 by @dependabot[bot] in #533
  • chore(deps): bump aiohttp from 3.13.3 to 3.13.4 by @dependabot[bot] in #538
  • chore(deps): bump litellm from 1.82.3 to 1.83.0 by @dependabot[bot] in #540

Full Changelog: v1.6.1...v1.6.2

v1.6.1

@chonk-lain chonk-lain released this 18 Mar 17:05
30d75f4

Chonkie 1.6.1 (patch release) 🔨

This patch release focuses on fixing import issues and updating dependencies.

What's Changed

Full Changelog: v1.6.0...v1.6.1

v1.6.0

@chonk-lain chonk-lain released this 11 Mar 04:54
575b789

Chonkie 1.6.0 🎉🦛

This release brings HTML table support, a self-hostable chunking API, native async capabilities, and a migration to ty for faster type-checking, alongside a range of internal refactors and quality improvements that make the library leaner and easier to extend.


田 HTML Table Support ✨

Structured data in HTML documents has historically been a challenge for chunking pipelines. With 1.6.0, we're introducing first-class support for HTML tables via two new components: TableChef and TableChunker.

  • TableChef handles the extraction and normalization of HTML tables into clean, structured representations that are ready for downstream processing.
  • TableChunker builds on top of that to intelligently chunk tabular content, preserving row and column semantics rather than blindly splitting on token boundaries.

Whether you're working with HTML or markdown, you can now process tables in your chunking pipeline.

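Input code: [image: Chonkie table]

Original HTML table:

| ID | Status |
|---|---|
| 1 | Active |
| 2 | Pending |
| 3 | Inactive |
| 4 | Active |

Chunked output, chunk 1:

| ID | Status |
|---|---|
| 1 | Active |
| 2 | Pending |

Chunked output, chunk 2:

| ID | Status |
|---|---|
| 3 | Inactive |
| 4 | Active |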
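
To illustrate the idea of splitting a table on row boundaries while repeating the header in every chunk, here is a minimal, self-contained sketch. It is not TableChunker's implementation: `chunk_table_rows` is a hypothetical helper, and the fixed `rows_per_chunk` limit is an assumption made for the example, since these notes do not describe how TableChunker decides chunk sizes.

```python
from typing import List, Sequence


def chunk_table_rows(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    rows_per_chunk: int,
) -> List[List[List[str]]]:
    """Split rows into groups and prepend the header to each group."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        body = [list(row) for row in rows[start:start + rows_per_chunk]]
        # Every chunk carries its own copy of the header row.
        chunks.append([list(header)] + body)
    return chunks


header = ["ID", "Status"]
rows = [["1", "Active"], ["2", "Pending"], ["3", "Inactive"], ["4", "Active"]]
for chunk in chunk_table_rows(header, rows, rows_per_chunk=2):
    print(chunk)
```

Each chunk stays interpretable on its own because the column names travel with the rows.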

💻 Chonkie API: Self-Hostable Chunking Server

Chonkie can now run as a fully self-hosted REST API, making it easy to expose chunking capabilities as a service within your infrastructure. Built on FastAPI, the Chonkie OSS API can be spun up with a single CLI command and accessed from any machine on your network.

This is ideal for teams that want to centralize their chunking logic, integrate Chonkie into polyglot stacks, or simply avoid re-initializing models in every service that needs chunking.

To start the server, run:

chonkie serve

Once the server is running, any client on your network can call its endpoints. Full documentation is available at http://localhost:8000/docs
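
Since the endpoint paths are not listed in these notes, one way to discover them from a client is to read the OpenAPI schema. The sketch below assumes the server keeps FastAPI's default schema location (`/openapi.json`), which a deployment may change or disable; `list_routes` and `fetch_routes` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_routes(openapi: dict) -> list:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    routes = []
    for path, operations in openapi.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_routes(base_url: str = "http://localhost:8000") -> list:
    """Download the schema from a running server and list its routes."""
    # Assumption: the schema is served at FastAPI's default /openapi.json path.
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return list_routes(json.load(response))
```

With the server started via `chonkie serve`, calling `fetch_routes()` from another Python process should print the available routes; the interactive page at `/docs` presents the same schema.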

⚑ Async Support

Chonkie now supports asynchronous chunking out of the box. This has been a long-requested feature for users building async-native applications, whether that's async data pipelines, or anything else running on an event loop. You can now await chunking operations without blocking, making Chonkie a natural fit for high-throughput, I/O-bound workloads.

| Method | Async Equivalent | Description |
|---|---|---|
| chunk(text) | achunk(text) | Chunk a single text |
| chunk_batch(texts) | achunk_batch(texts) | Chunk a list of texts |
| chunk_document(doc) | achunk_document(doc) | Chunk a Document object |
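
The sketch below shows how sync and async method pairs like these are typically awaited from an event loop. To stay runnable without Chonkie installed it uses a stand-in class: `WhitespaceChunker` is hypothetical, and offloading to a worker thread with `asyncio.to_thread` is an assumption about one common way to build such wrappers, not a description of Chonkie's internals. Only the method names `chunk`, `achunk`, and `achunk_batch` come from the release notes.

```python
import asyncio
from typing import List


class WhitespaceChunker:
    """Stand-in chunker (not a Chonkie class) exposing sync and async methods."""

    def __init__(self, chunk_size: int = 3) -> None:
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        # Group words into chunks of at most chunk_size words.
        words = text.split()
        return [
            " ".join(words[i:i + self.chunk_size])
            for i in range(0, len(words), self.chunk_size)
        ]

    async def achunk(self, text: str) -> List[str]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.chunk, text)

    async def achunk_batch(self, texts: List[str]) -> List[List[str]]:
        # Chunk all texts concurrently and keep the input order.
        return list(await asyncio.gather(*(self.achunk(t) for t in texts)))


async def main() -> List[List[str]]:
    chunker = WhitespaceChunker(chunk_size=2)
    return await chunker.achunk_batch(["one two three", "four five"])


results = asyncio.run(main())
print(results)  # [['one two', 'three'], ['four five']]
```

With a real Chonkie chunker, the call sites would look the same: `await chunker.achunk(text)` inside a coroutine instead of `chunker.chunk(text)`.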

What's Changed

New Contributors

Full Changelog: v1.5.6...v1.6.0