AutoChunks

The Intelligent Data Optimization Layer for RAG Engineering

AutoChunks is a specialized engine designed to eliminate the guesswork from Retrieval-Augmented Generation (RAG). By treating chunking as an optimization problem rather than a set of heuristics, it empirically discovers the most performant data structures for your specific documents and retrieval models.
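To make "chunking as an optimization problem" concrete, here is a minimal toy sketch in plain Python. It is not AutoChunks's implementation: the function names (`chunk_fixed`, `hit_rate`, `tournament`), the word-overlap retriever, and the sample data are all assumptions made for illustration. It tries a few fixed chunk sizes, scores each by how often the best-matching chunk contains the expected answer, and keeps the winner.

```python
from collections import Counter


def chunk_fixed(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of `size` words, stepping by size - overlap."""
    words = text.split()
    step = max(1, size - overlap)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def overlap_score(query: str, chunk: str) -> int:
    """Count shared words (a crude stand-in for a real retrieval model)."""
    q = Counter(query.lower().split())
    c = Counter(chunk.lower().split())
    return sum((q & c).values())


def hit_rate(chunks: list[str], qa_pairs: list[tuple[str, str]]) -> float:
    """Fraction of questions whose top-scoring chunk contains the answer."""
    if not chunks or not qa_pairs:
        return 0.0
    hits = 0
    for question, answer in qa_pairs:
        best = max(chunks, key=lambda ch: overlap_score(question, ch))
        hits += answer.lower() in best.lower()
    return hits / len(qa_pairs)


def tournament(text, qa_pairs, sizes):
    """Score every candidate chunk size and return the best one with all scores."""
    results = {size: hit_rate(chunk_fixed(text, size), qa_pairs) for size in sizes}
    best = max(results, key=results.get)
    return best, results


# Hypothetical sample document and QA pairs, for illustration only.
DOC = (
    "The invoice total is due within thirty days. "
    "Refunds are processed by the billing team in Austin. "
    "Support tickets are answered within four hours."
)
QA = [
    ("Who processes refunds?", "billing team"),
    ("When is the invoice total due?", "thirty days"),
]

best, results = tournament(DOC, QA, sizes=[4, 8, 16])
print(best, results)
```

A real run would replace the word-overlap scorer with the retrieval model you actually deploy, which is why the winning configuration depends on both the documents and the retriever.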

🚀 Key Features

The Vectorized Tournament: Runs parallel searches across 15+ strategy families (Recursive, Semantic, Layout-Aware) using NumPy-accelerated simulation.
Adversarial Synthetic QA: Automatically generates "needle-in-a-haystack" QA pairs to test your data structure against real-world search intent.
Multi-Objective Optimization: Steer the search toward the goal that matters for your workload, such as Speed and Precision, Cost Efficiency, or Comprehensive Recall.
Framework Native: Built-in bridges for LangChain, LlamaIndex, and Haystack.
Enterprise Ready: Air-gap compatible, local model support, and SHA-256 binary fingerprinting.
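The feature list does not say what AutoChunks fingerprints or how it uses the result. As a rough illustration of the general idea only, assuming a fingerprint is the SHA-256 digest of a file's raw bytes, a sketch with the standard library could look like this (the `fingerprint` helper is hypothetical, not part of the AutoChunks API):

```python
import hashlib


def fingerprint(path, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read until fh.read() returns b"", so large files never load fully into memory.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest, so a stored fingerprint can be used to detect whether a document changed between runs.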

📦 Installation

Install the stable version from PyPI:

pip install autochunks

For GPU acceleration or RAGAS semantic evaluation, see the Advanced Installation Guide.

🛠️ Usage

Launch the Dashboard

The easiest way to optimize is through the interactive visual dashboard:

autochunks serve

Navigate to http://localhost:8000 to start your first optimization run.

CLI Optimization

Search for the best plan directly from the terminal:

autochunks optimize --docs ./my_data_folder --mode light --objective balanced
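If you want to trigger the same command from a script or CI job, one option is to wrap it with `subprocess`. The sketch below uses only the flags shown above; the helper names `build_optimize_command` and `run_optimize` are hypothetical, and which other values `--mode` and `--objective` accept is not covered here.

```python
import shlex
import subprocess


def build_optimize_command(docs, mode="light", objective="balanced"):
    """Build the argument list for the `autochunks optimize` call shown above."""
    return [
        "autochunks", "optimize",
        "--docs", str(docs),
        "--mode", mode,
        "--objective", objective,
    ]


def run_optimize(docs, mode="light", objective="balanced"):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        build_optimize_command(docs, mode, objective),
        check=True,
        capture_output=True,
        text=True,
    )


# Print the command instead of running it, so this works without the CLI installed.
print(shlex.join(build_optimize_command("./my_data_folder")))
```

Passing the arguments as a list avoids shell quoting problems when the documents path contains spaces.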

Python API

from autochunk import AutoChunker

# Initialize the optimizer and discover the optimal plan
optimizer = AutoChunker(mode="light")
plan, report = optimizer.optimize(documents="./my_data", objective="balanced")

# Apply the winning strategy
chunks = plan.apply("./new_documents", optimizer)

Development

If you want to contribute or build from source:

Clone the repository:

git clone https://github.com/s8ilabs/AutoChunks.git
cd AutoChunks

Set up a virtual environment:

python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows

Install in editable mode:
```
pip install -e .
```
Run the tests:
```
pytest tests/
```

📖 Documentation and Resources

Developed with ❤️ for the RAG and LLM Community. AutoChunks is released under the Apache License 2.0.
