A lightweight RAG (Retrieval-Augmented Generation) package that runs entirely on CPU without requiring GPUs or cloud services.
tiny-rag is designed to make RAG accessible to everyone by removing the typical hardware and service dependencies. Traditional RAG systems usually depend on GPU-accelerated embedding and reranker models, or on cloud APIs such as OpenAI's text-embedding services. tiny-rag instead provides a fully functional RAG implementation that runs efficiently in CPU-only environments.
tiny-rag leverages static-embedding-japanese, an ultra-fast embedding model that achieves 126x faster CPU inference compared to traditional transformer models while maintaining strong performance on Japanese text tasks.
- CPU-Only Execution: No GPU required - runs on any standard computer
- No Cloud Dependencies: Fully offline operation without external API calls
- Lightweight: Minimal resource footprint for embedding and reranking
- Easy to Use: Simple API for quick integration into your projects
- Cost-Effective: No cloud service fees or expensive hardware requirements
- Japanese Language Support: Currently optimized for Japanese text processing
pip install tiny-rag
from tiny_rag import TinyRAG
# Initialize tiny-rag
rag = TinyRAG()
# Add documents (Japanese text)
rag.add_documents([
"ドキュメント1の内容...",
"ドキュメント2の内容...",
])
# Query (in Japanese)
results = rag.query("あなたの質問をここに入力")
- Local Development: Test RAG pipelines without cloud costs
- Edge Computing: Deploy RAG on resource-constrained devices
- Privacy-Sensitive Applications: Keep all data processing local
- Educational Projects: Learn RAG concepts without infrastructure overhead
- Japanese Text Processing: Optimized for Japanese language applications
tiny-rag uses static-embedding-japanese, which provides:
- Ultra-fast CPU Performance: 126x faster inference than comparable transformer models
- 1024-dimensional embeddings: Can be reduced to 32-512 dimensions for efficiency
- Simple Architecture: Uses token embedding averaging instead of complex attention mechanisms
- Strong Benchmark Performance: JMTEB score of 67.17 (micro-average)
- Matryoshka Representation Learning: Enables efficient dimension reduction without retraining
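The two ideas in the list above, token embedding averaging and Matryoshka-style dimension reduction, can be illustrated with a minimal pure-Python sketch. It does not use the real model: `TOY_TABLE`, `mean_pool`, and `truncate_and_normalize` are hypothetical names, and the 4-dimensional vectors are made up for illustration (the actual model produces 1024-dimensional embeddings from its own tokenizer).

```python
import math

# Hypothetical toy lookup table: token -> 4-dimensional static vector.
# The real model has its own vocabulary and 1024 dimensions.
TOY_TABLE = {
    "東京": [0.9, 0.1, 0.0, 0.2],
    "は": [0.1, 0.1, 0.1, 0.1],
    "首都": [0.8, 0.3, 0.1, 0.0],
}


def mean_pool(tokens, table):
    """Average the static vectors of all tokens found in the table."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def truncate_and_normalize(vector, dims):
    """Keep the first `dims` components, then rescale to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


# Sentence embedding by averaging, with no attention layers involved.
sentence = mean_pool(["東京", "は", "首都"], TOY_TABLE)

# Matryoshka-style reduction: keep only the leading components.
reduced = truncate_and_normalize(sentence, 2)
```

Because a Matryoshka-trained model concentrates the most useful information in the leading components, truncation followed by renormalization needs no retraining. Fewer dimensions mean less memory and faster similarity search, at some cost in accuracy.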
For improved retrieval accuracy, tiny-rag employs japanese-reranker-tiny-v2:
- Ultra-Compact: Only 29.4M parameters with 3 layers
- Blazing Fast: 50-65% faster query processing than larger models
- Good Performance: Average score of 0.8138 on Japanese benchmarks
- CPU-Optimized: Specifically designed for CPU and Apple Silicon
- Modern Architecture: Based on ModernBERT with 256 hidden dimensions
- Strong Benchmark Results: JaCWIR (0.9287), JSQuAD (0.9608), MIRACL (0.7201)
Both models are specifically chosen for their exceptional CPU performance while maintaining high-quality results for Japanese text processing.
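Pairing an embedding model with a reranker usually follows a retrieve-then-rerank pattern: a cheap vector search narrows the corpus to a few candidates, and the more expensive reranker reorders only those. The sketch below shows that general pattern, not tiny-rag's internal code. `cosine`, `retrieve_then_rerank`, and the toy reranker scores are hypothetical stand-ins.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, top_k=2):
    """Return document indices after a two-stage ranking.

    Stage 1 keeps the top_k documents by cosine similarity.
    Stage 2 reorders those candidates by rerank_score(index).
    """
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    return sorted(candidates, key=rerank_score, reverse=True)


# Toy data: three document vectors and made-up reranker scores per index.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_rerank_scores = {0: 0.2, 1: 0.9, 2: 0.5}

order = retrieve_then_rerank(
    [1.0, 0.0], doc_vecs, toy_rerank_scores.get, top_k=2
)
```

Here the vector search keeps documents 0 and 1, and the stand-in reranker then moves document 1 ahead of document 0. Document 2 is never scored by the reranker, which is where the speed saving comes from.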
tiny-rag has been evaluated on standard Japanese retrieval benchmarks to demonstrate its effectiveness:
- JQaRA (Japanese Question Answering with Retrieval Augmentation): 144,372 documents
- JaCWIR (Japanese Casual Web IR): 513,107 documents
| Dataset | NDCG@10 | MRR@10 | MAP@10 | Hits@10 | Avg Query Time |
|---|---|---|---|---|---|
| JQaRA | 0.8553 | 0.8796 | - | - | 0.771 sec |
| JaCWIR | - | - | 0.8368 | 0.8646 | 0.925 sec |
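For readers unfamiliar with the metrics in the table, the sketch below computes three of them for a single query under the common binary-relevance definitions. These definitions are an assumption: the exact formulas used by the benchmark scripts are not stated here, and the function names are hypothetical. Each input is the ranked result list for one query, with 1 marking a relevant document and 0 an irrelevant one. Benchmark scores are averages over all queries.

```python
import math


def mrr_at_k(ranked_relevance, k=10):
    """Reciprocal rank of the first relevant result in the top k (0 if none)."""
    for rank, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def hits_at_k(ranked_relevance, k=10):
    """1.0 if any relevant result appears in the top k, else 0.0."""
    return 1.0 if any(ranked_relevance[:k]) else 0.0


def ndcg_at_k(ranked_relevance, k=10):
    """Binary-relevance NDCG: DCG of the ranking over DCG of the ideal order."""

    def dcg(rels):
        return sum(
            rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1)
        )

    ideal = sorted(ranked_relevance, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked_relevance[:k]) / best if best > 0 else 0.0


# One query whose relevant documents were returned at ranks 2 and 4.
ranking = [0, 1, 0, 1]
mrr = mrr_at_k(ranking)
hits = hits_at_k(ranking)
ndcg = ndcg_at_k(ranking)
```

MAP@10 is omitted for brevity. It averages the precision measured at each relevant result's rank.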
# Quick test with 5 queries per dataset
make bench
# Full benchmark evaluation
make bench-full
# Custom benchmark
python -m bench.benchmark --dimensions 512 --max-queries 10
- --dimensions: Embedding dimensions (32, 64, 128, 256, 512, 1024) - Default: 1024
- --max-queries: Maximum queries per dataset (for testing)
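- High Accuracy: Scores between 0.84 and 0.88 on every reported metric (NDCG@10, MRR@10, MAP@10, Hits@10)
- Practical Speed: Average query time under 1 second, even on a corpus of 513,107 documents
- Scalable: Average query time grows only from 0.771 sec (144,372 documents) to 0.925 sec (513,107 documents)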
- CPU-Friendly: All processing runs efficiently on standard hardware
- Python 3.13+
- No GPU required
- Minimal RAM requirements
# Clone the repository
git clone https://github.com/sonesuke/tiny-rag.git
cd tiny-rag
# Install development dependencies
uv sync
# Run tests
uv run pytest --cov=src
# Run benchmarks
make bench # Quick test (5 queries)
make bench-full # Full evaluation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- sonesuke - GitHub
Built with a focus on accessibility and efficiency, making RAG technology available to everyone regardless of hardware limitations.