AutoChunks

The Intelligent Data Optimization Layer for RAG Engineering

AutoChunks is a specialized engine that takes the guesswork out of chunking for Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically searches for the best-performing chunking strategy for your specific documents and retrieval models.
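To make the "chunking as an optimization problem" idea concrete, here is a minimal sketch in plain Python. It is not the AutoChunks implementation: the function names (`chunk_by_words`, `hit_rate`, `run_tournament`), the bag-of-words similarity, and the hand-written QA pairs are all assumptions chosen for illustration. It tries several chunk sizes and keeps the one whose top-ranked chunk most often contains the expected answer.

```python
import math
import re
from collections import Counter


def chunk_by_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hit_rate(chunks, qa_pairs):
    """Fraction of questions whose top-ranked chunk contains the answer."""
    vectors = [bag_of_words(c) for c in chunks]
    hits = 0
    for question, answer in qa_pairs:
        q = bag_of_words(question)
        best = max(range(len(chunks)), key=lambda i: cosine(q, vectors[i]))
        hits += answer.lower() in chunks[best].lower()
    return hits / len(qa_pairs)


def run_tournament(text, qa_pairs, sizes):
    """Score every candidate chunk size and return (best_size, all_scores)."""
    scores = {s: hit_rate(chunk_by_words(text, s), qa_pairs) for s in sizes}
    return max(scores, key=scores.get), scores


# Hypothetical toy corpus and hand-written QA pairs (illustration only).
DOC = (
    "The warranty lasts 24 months. Returns are accepted within 30 days. "
    "Support is open on weekdays."
)
QA = [("How long does the warranty last?", "24"),
      ("When are returns accepted?", "30")]

best_size, all_scores = run_tournament(DOC, QA, sizes=[4, 8, 16])
print(best_size, all_scores)
```

A real search would swap the bag-of-words vectors for the retrieval model you actually deploy, since the best chunking depends on that model as much as on the documents.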

[Figure: AutoChunks architecture diagram]


🚀 Key Features

  • The Vectorized Tournament: Runs parallel searches across 15+ strategy families (Recursive, Semantic, Layout-Aware) using NumPy-accelerated simulation.
  • Adversarial Synthetic QA: Automatically generates "needle-in-a-haystack" QA pairs to test each candidate chunking against realistic search intent.
  • Multi-Objective Optimization: Steers the search toward the goal you care about, such as Speed and Precision, Cost Efficiency, or Comprehensive Recall.
  • Framework Native: Built-in bridges for LangChain, LlamaIndex, and Haystack.
  • Enterprise Ready: Air-gap compatible, local model support, and SHA-256 binary fingerprinting.
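The multi-objective idea in the list above can be sketched as a weighted score over per-candidate metrics. This is an assumption-laden illustration, not how AutoChunks ranks plans: the `CandidateResult` fields, the objective keys, and the weights below are hypothetical, and recall stands in for retrieval quality in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateResult:
    name: str
    recall: float       # fraction of test questions answered, 0..1
    latency_ms: float   # mean retrieval latency (lower is better)
    chunk_count: int    # proxy for embedding and storage cost (lower is better)


# Hypothetical weights per objective; each row sums to 1.0.
OBJECTIVE_WEIGHTS = {
    "speed_and_precision": {"recall": 0.5, "latency": 0.4, "cost": 0.1},
    "cost_efficiency": {"recall": 0.3, "latency": 0.1, "cost": 0.6},
    "comprehensive_recall": {"recall": 0.9, "latency": 0.05, "cost": 0.05},
}


def normalize_lower_is_better(values):
    """Map values to 0..1 so that the smallest value scores 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(hi - v) / (hi - lo) for v in values]


def pick_winner(candidates, objective):
    """Return the candidate with the highest weighted score for `objective`."""
    w = OBJECTIVE_WEIGHTS[objective]
    latency = normalize_lower_is_better([c.latency_ms for c in candidates])
    cost = normalize_lower_is_better([c.chunk_count for c in candidates])
    scored = [
        (w["recall"] * c.recall + w["latency"] * l + w["cost"] * k, c)
        for c, l, k in zip(candidates, latency, cost)
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Made-up numbers for two candidate strategies (illustration only).
results = [
    CandidateResult("small_chunks", recall=0.95, latency_ms=40.0, chunk_count=1000),
    CandidateResult("large_chunks", recall=0.70, latency_ms=10.0, chunk_count=100),
]
print(pick_winner(results, "comprehensive_recall").name)
print(pick_winner(results, "cost_efficiency").name)
```

The same measurements can yield different winners under different objectives, which is why the objective is chosen before the search rather than after it.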

📦 Installation

Install the stable version from PyPI:

pip install autochunks

For GPU acceleration or RAGAS semantic evaluation, see the Advanced Installation Guide.


🛠️ Usage

Launch the Dashboard

The easiest way to optimize is through the interactive visual dashboard:

autochunks serve

Navigate to http://localhost:8000 to start your first optimization run.

CLI Optimization

Search for the best plan directly from the terminal:

autochunks optimize --docs ./my_data_folder --mode light --objective balanced

Python API

from autochunk import AutoChunker

# Initialize the optimizer and discover the optimal plan
optimizer = AutoChunker(mode="light")
plan, report = optimizer.optimize(documents="./my_data", objective="balanced")

# Apply the winning strategy
chunks = plan.apply("./new_documents", optimizer)

Development

If you want to contribute or build from source:

  1. Clone the repository:

    git clone https://github.com/s8ilabs/AutoChunks.git
    cd AutoChunks
  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # venv\Scripts\activate on Windows
  3. Install in editable mode:

    pip install -e .
  4. Run the tests:

    pytest tests/

📖 Documentation and Resources


Developed with ❤️ for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.
