Skip to content

sonesuke/tiny-rag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tiny-rag

A lightweight RAG (Retrieval-Augmented Generation) package that runs entirely on CPU without requiring GPUs or cloud services.

Overview

tiny-rag is designed to make RAG accessible to everyone by removing the typical hardware and service dependencies. While traditional RAG systems require GPU-accelerated embedding models and reranker models (like OpenAI's text-embedding services), tiny-rag provides a fully functional RAG implementation that runs efficiently on CPU-only environments.

tiny-rag leverages static-embedding-japanese, an ultra-fast embedding model that achieves 126x faster CPU inference compared to traditional transformer models while maintaining strong performance on Japanese text tasks.

Key Features

  • CPU-Only Execution: No GPU required - runs on any standard computer
  • No Cloud Dependencies: Fully offline operation without external API calls
  • Lightweight: Minimal resource footprint for embedding and reranking
  • Easy to Use: Simple API for quick integration into your projects
  • Cost-Effective: No cloud service fees or expensive hardware requirements
  • Japanese Language Support: Currently optimized for Japanese text processing

Language Support

⚠️ Important: tiny-rag currently supports Japanese language only. Support for additional languages is planned for future releases.

Installation

pip install tiny-rag

Quick Start

from tiny_rag import TinyRAG

# Initialize tiny-rag
rag = TinyRAG()

# Add documents (Japanese text)
rag.add_documents([
    "ドキュメント1の内容...",
    "ドキュメント2の内容...",
])

# Query (in Japanese)
results = rag.query("あなたの質問をここに入力")

Use Cases

  • Local Development: Test RAG pipelines without cloud costs
  • Edge Computing: Deploy RAG on resource-constrained devices
  • Privacy-Sensitive Applications: Keep all data processing local
  • Educational Projects: Learn RAG concepts without infrastructure overhead
  • Japanese Text Processing: Optimized for Japanese language applications

Technical Details

Embedding Model

tiny-rag uses static-embedding-japanese, which provides:

  • Ultra-fast CPU Performance: 126x faster inference than comparable transformer models
  • 1024-dimensional embeddings: Can be reduced to 32-512 dimensions for efficiency
  • Simple Architecture: Uses token embedding averaging instead of complex attention mechanisms
  • Strong Benchmark Performance: JMTEB score of 67.17 (micro-average)
  • Matryoshka Representation Learning: Enables efficient dimension reduction without retraining

Reranker Model

For improved retrieval accuracy, tiny-rag employs japanese-reranker-tiny-v2:

  • Ultra-Compact: Only 29.4M parameters with 3 layers
  • Blazing Fast: 50-65% faster query processing than larger models
  • Good Performance: Average score of 0.8138 on Japanese benchmarks
  • CPU-Optimized: Specifically designed for CPU and Apple Silicon
  • Modern Architecture: Based on ModernBERT with 256 hidden dimensions
  • Strong Benchmark Results: JaCWIR (0.9287), JSQuAD (0.9608), MIRACL (0.7201)

Both models are specifically chosen for their exceptional CPU performance while maintaining high-quality results for Japanese text processing.

Benchmark Performance

tiny-rag has been evaluated on standard Japanese retrieval benchmarks to demonstrate its effectiveness:

Datasets

  • JQaRA (Japanese Question Answering with Retrieval Augmentation): 144,372 documents
  • JaCWIR (Japanese Casual Web IR): 513,107 documents

Results

Dataset NDCG@10 MRR@10 MAP@10 Hits@10 Avg Query Time
JQaRA 0.8553 0.8796 - - 0.771 sec
JaCWIR - - 0.8368 0.8646 0.925 sec

Running Benchmarks

# Quick test with 5 queries per dataset
make bench

# Full benchmark evaluation
make bench-full

# Custom benchmark
python -m bench.benchmark --dimensions 512 --max-queries 10

Benchmark Options

  • --dimensions: Embedding dimensions (32, 64, 128, 256, 512, 1024) - Default: 1024
  • --max-queries: Maximum queries per dataset (for testing)

Performance Insights

  • High Accuracy: Achieves 0.84-0.88 scores across all metrics
  • Practical Speed: Query processing under 1 second even for large datasets
  • Scalable: Performance scales reasonably with dataset size
  • CPU-Friendly: All processing runs efficiently on standard hardware

Requirements

  • Python 3.13+
  • No GPU required
  • Minimal RAM requirements

Development

Setup

# Clone the repository
git clone https://github.com/sonesuke/tiny-rag.git
cd tiny-rag

# Install development dependencies
uv sync

# Run tests
uv run pytest --cov=src

# Run benchmarks
make bench       # Quick test (5 queries)
make bench-full  # Full evaluation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Acknowledgments

Built with a focus on accessibility and efficiency, making RAG technology available to everyone regardless of hardware limitations.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •