AutoChunks is a specialized engine designed to take the guesswork out of chunking for Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically discovers the best-performing chunking strategy for your specific documents and retrieval models.
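The optimization framing in the paragraph above can be illustrated with a toy example. The sketch below is hypothetical and does not use AutoChunks: `chunk_text`, `needle_recall`, `best_plan`, and the scoring rule (keep every known "needle" passage intact, then prefer smaller chunks and less overlap) are illustrative assumptions, not the library's API or metrics.

```python
from itertools import product


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def needle_recall(chunks: list[str], needles: list[str]) -> float:
    """Fraction of needle passages that survive intact inside at least one chunk."""
    if not needles:
        return 0.0
    return sum(any(n in c for c in chunks) for n in needles) / len(needles)


def best_plan(text, needles, sizes, overlaps):
    """Grid-search (size, overlap); prefer full recall, then smaller chunks, then less overlap.

    Hypothetical scoring rule for illustration only.
    """
    best_score, best = None, None
    for size, overlap in product(sizes, overlaps):
        if overlap >= size:
            continue  # invalid combination, skip it
        recall = needle_recall(chunk_text(text, size, overlap), needles)
        score = (recall, -size, -overlap)  # tuples compare lexicographically
        if best_score is None or score > best_score:
            best_score, best = score, {"size": size, "overlap": overlap}
    if best is None:
        raise ValueError("no valid (size, overlap) combination")
    return best


print(best_plan("abcdefghij", ["cdef"], sizes=[2, 4, 6], overlaps=[0, 2]))
# → {'size': 4, 'overlap': 2}
```

A realistic evaluation would more likely score retrieval quality with an embedding model rather than substring containment; the point of the sketch is only that every candidate configuration receives a measurable score, so the winner is chosen by data rather than by rule of thumb.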
- The Vectorized Tournament: Runs parallel searches across 15+ strategy families (Recursive, Semantic, Layout-Aware) using NumPy-accelerated simulation.
- Adversarial Synthetic QA: Automatically generates "needle-in-a-haystack" QA pairs to test each candidate chunking strategy against realistic search intent.
- Multi-Objective Optimization: Tune the search toward the goal that matters most, such as Speed and Precision, Cost Efficiency, or Comprehensive Recall.
- Framework Native: Built-in bridges for LangChain, LlamaIndex, and Haystack.
- Enterprise Ready: Air-gap compatible, local model support, and SHA-256 binary fingerprinting.
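Multi-objective optimization of the kind listed above can be sketched as weighted scoring over candidate plans. The code below is a hypothetical illustration, not AutoChunks' implementation: the `Candidate` fields, the min-max normalization, and the weight profiles in `WEIGHTS` are assumptions chosen for readability. The tournament is described as NumPy-accelerated; this sketch uses plain Python loops to stay dependency-free.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    recall: float      # higher is better
    latency_ms: float  # lower is better
    cost: float        # lower is better, e.g. tokens embedded


# Hypothetical weight profiles; the real objectives may weigh metrics differently.
WEIGHTS = {
    "balanced": {"recall": 1 / 3, "latency_ms": 1 / 3, "cost": 1 / 3},
    "cost_efficiency": {"recall": 0.2, "latency_ms": 0.2, "cost": 0.6},
    "comprehensive_recall": {"recall": 0.8, "latency_ms": 0.1, "cost": 0.1},
}
HIGHER_IS_BETTER = {"recall": True, "latency_ms": False, "cost": False}


def normalize(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all candidates tie on this metric
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank(candidates, objective):
    """Return (name, weighted score) pairs, best first."""
    weights = WEIGHTS[objective]
    totals = [0.0] * len(candidates)
    for metric, w in weights.items():
        column = normalize([getattr(c, metric) for c in candidates], HIGHER_IS_BETTER[metric])
        for i, s in enumerate(column):
            totals[i] += w * s
    order = sorted(range(len(candidates)), key=lambda i: totals[i], reverse=True)
    return [(candidates[i].name, round(totals[i], 3)) for i in order]


candidates = [
    Candidate("A", recall=0.95, latency_ms=40, cost=1000),
    Candidate("B", recall=0.80, latency_ms=10, cost=300),
    Candidate("C", recall=0.85, latency_ms=20, cost=500),
]
for objective in WEIGHTS:
    print(objective, rank(candidates, objective)[0][0])
# balanced B
# cost_efficiency B
# comprehensive_recall A
```

The same metrics produce different winners depending on the objective, which is why the objective is an explicit input rather than a fixed rule.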
Install the stable version from PyPI:
```bash
pip install autochunks
```

For GPU acceleration or RAGAS semantic evaluation, see the Advanced Installation Guide.
The easiest way to optimize is through the interactive visual dashboard:
```bash
autochunks serve
```

Then navigate to http://localhost:8000 to start your first optimization run.
Search for the best plan directly from the terminal:
```bash
autochunks optimize --docs ./my_data_folder --mode light --objective balanced
```

Or call the Python API directly:

```python
from autochunk import AutoChunker

# Initialize the optimizer and discover the optimal plan
optimizer = AutoChunker(mode="light")
plan, report = optimizer.optimize(documents="./my_data", objective="balanced")

# Apply the winning strategy to new documents
chunks = plan.apply("./new_documents", optimizer)
```

If you want to contribute or build from source:
- Clone the repository:

  ```bash
  git clone https://github.com/s8ilabs/AutoChunks.git
  cd AutoChunks
  ```

- Set up a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # venv\Scripts\activate on Windows
  ```

- Install in editable mode:

  ```bash
  pip install -e .
  ```

- Run the tests:

  ```bash
  pytest tests/
  ```
- Full Documentation Portal
- PyPI Project Page
- Getting Started Guide
- The Optimization Lifecycle
- Metric Definitions and Scoring
Developed with ❤️ for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.
