Chunklet-py 2.0.0 Released: Major Enhancements and New Features

speedyk-005 released this 20 Nov 02:21
6ef8cea

We are thrilled to announce the release of Chunklet-py version 2.0.0! This is a major update that brings a host of new features, significant performance improvements, and a more intuitive user experience.

✨ What's New in Version 2.0.0?

  • New Chunking Engines:

    • DocumentChunker: You can now seamlessly process various document formats including .pdf, .docx, .epub, .html, .rst, and .tex. The DocumentChunker automatically converts documents to Markdown where possible, extracts rich metadata, and provides a unified interface for all your document processing needs.
    • CodeChunker: A new language-agnostic chunker for source code has been introduced. It is designed to understand and preserve the structural integrity of your code for more meaningful chunks.
  • Expanded Multilingual Support: We've significantly improved our multilingual capabilities, now offering robust sentence splitting for over 50 languages.

  • Enhanced Customization:

    • Custom Document Processors: You can now create and plug in your own custom processors to handle any file type you need.
    • Custom Tokenizer Commands: The CLI now supports custom tokenizer commands, allowing for more accurate token counting with your preferred tokenizer.
  • Streamlined CLI: The command-line interface has been refactored for a more user-friendly experience, with simplified flags for input (--source) and output (--destination).

  • Comprehensive Documentation: Our documentation has been completely overhauled for clarity and ease of use. It now includes more examples, detailed guides for each chunker, and a new section comparing chunklet-py to other libraries.
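As an illustration of the custom tokenizer command idea above, here is a minimal sketch of a script that could serve as such a command. It assumes the contract is "read the text from standard input, print a single integer token count to standard output"; that contract, the whitespace-based counting, and the names `count_tokens` and `main` are assumptions made for this example, not something these release notes specify. Check the CLI documentation for the actual flag name and expected behavior.

```python
import sys


def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens.

    This is a stand-in for a real tokenizer; swap in your preferred
    tokenizer library here for more accurate counts.
    """
    return len(text.split())


def main() -> None:
    # Assumed contract: the whole text arrives on stdin and only the
    # integer count is written to stdout.
    print(count_tokens(sys.stdin.read()))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `count_tokens.py`, the script can be checked on its own by piping text into it, for example `echo "some text" | python count_tokens.py`, before it is wired into the CLI.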

📈 Improvements

  • Performance: Batch processing has been optimized for better performance and reduced memory usage.
  • Code Quality: The codebase has undergone significant refactoring for improved readability, maintainability, and security.
  • Error Handling: We have introduced more specific and informative error messages to aid in debugging.

⚠️ Breaking Changes

This release introduces breaking changes, particularly in the CLI and the renaming of some core components. Please consult the Migration Guide for a smooth transition.

📚 Further Information

  • Full Changelog: For a detailed list of every change, bug fix, and improvement, please see our Changelog.
  • Documentation: Explore all features and usage examples on our Documentation Site.

The release is now available on PyPI.
We're excited to see what you'll build with the new and improved chunklet-py! Your feedback is always welcome.

Full Changelog: v1.3.2...v2.0.0