🚀 A completely free, local RAG (Retrieval-Augmented Generation) system that runs entirely on your computer with no API costs! Perfect for querying your software documentation using AI.
pip install sentence-transformers faiss-cpu transformers torch requests numpy
mkdir docs
Create these files in the docs/ folder (see examples below)
python rag_system.py
That's it! The system will start and you can begin asking questions.
- Python 3.8+ (check with `python --version`)
- 4GB RAM minimum (for model loading)
- 2GB free disk space (for model downloads)
- Internet connection (initial setup only)
git clone https://github.com/yourusername/your-repo-name.git
cd your-repo-name
Option A: Using pip (Recommended)
pip install sentence-transformers faiss-cpu transformers torch requests numpy
Option B: Using requirements.txt (if provided)
pip install -r requirements.txt
Troubleshooting Dependencies:
- If `faiss-cpu` fails: Try `pip install faiss-cpu --no-cache-dir`
- If `torch` is slow: Visit pytorch.org for platform-specific install
- For GPU support: Replace `faiss-cpu` with `faiss-gpu`
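To see which of the packages listed above are still missing, a small check like the following may help. It is a minimal sketch: the mapping from pip names to import names is written out by hand, and `missing_packages` is a hypothetical helper, not part of `rag_system.py`.

```python
import importlib.util

# pip package name -> module name used in "import ..."
# (the two differ for sentence-transformers and faiss-cpu)
REQUIRED = {
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "transformers": "transformers",
    "torch": "torch",
    "requests": "requests",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All dependencies are importable.")
```

The check only confirms that each module can be located. It does not import the packages, so a broken install can still fail later at load time.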
Create a docs/ folder in your project directory and add your documentation files.
Required file structure:
your-project/
├── rag_system.py
├── docs/ # Create this folder
│ ├── api_guide.md # Your API documentation
│ ├── installation.txt # Installation instructions
│ └── config.json # Configuration documentation
└── README.md
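If you prefer to set up the folder from Python, the sketch below creates `docs/` and reports which of the three example files are not there yet. The helper name `ensure_docs` is an assumption for illustration; the file names come from the structure above.

```python
from pathlib import Path

# File names taken from the structure above; adjust to your own docs.
EXPECTED_FILES = ["api_guide.md", "installation.txt", "config.json"]


def ensure_docs(docs_dir="docs", expected=EXPECTED_FILES):
    """Create docs_dir if needed and return the expected files that are missing."""
    folder = Path(docs_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = ensure_docs()
    if missing:
        print("Still missing in docs/:", ", ".join(missing))
    else:
        print("All expected documentation files are present.")
```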
Example Documentation Files:
docs/api_guide.md:
# CloudSync API Guide
## Authentication
To authenticate with the CloudSync API, include your API key in the Authorization header:
Authorization: Bearer your-api-key-here
## Available Endpoints
### Users
- `GET /api/users` - List all users
- `POST /api/users` - Create new user
- `PUT /api/users/{id}` - Update user
### Synchronization
- `POST /api/sync` - Start synchronization process
- `GET /api/sync/status` - Check sync status
- `DELETE /api/sync/{id}` - Cancel sync job
## Rate Limits
- 1000 requests per hour per API key
- 10 concurrent connections maximum
## Error Codes
- 401: Unauthorized - Invalid API key
- 429: Too Many Requests - Rate limit exceeded
- 500: Internal Server Error - Contact support
docs/installation.txt:
CloudSync Installation Guide
SYSTEM REQUIREMENTS:
- Python 3.8 or higher
- 4GB RAM minimum (8GB recommended)
- 1GB free disk space
- PostgreSQL 12+ database
- Redis server (for caching)
REQUIRED PORTS:
- Port 8080: Main application
- Port 5432: PostgreSQL database
- Port 6379: Redis cache
- Port 443: HTTPS traffic
INSTALLATION STEPS:
1. Install Python dependencies:
pip install -r requirements.txt
2. Configure database:
- Create database: cloudsync_production
- Update connection string in config.yaml
- Run migrations: python manage.py migrate
3. Configure Redis:
- Install Redis server
- Update Redis URL in config.yaml
- Test connection: redis-cli ping
4. Start the application:
python app.py --port 8080
5. Verify installation:
- Visit: http://localhost:8080/health
- Check logs: tail -f logs/application.log
FIREWALL CONFIGURATION:
- Allow inbound: 8080, 443
- Allow outbound: 80, 443, 5432, 6379
- Block all other ports
TROUBLESHOOTING:
- Database connection fails: Check PostgreSQL service status
- Redis errors: Verify Redis server is running
- Port conflicts: Use netstat -an to check port usage
- Permission errors: Run as administrator/sudo
docs/config.json:
{
"configuration_guide": {
"database": {
"connection_string": "postgresql://user:pass@localhost:5432/cloudsync_db",
"pool_size": 20,
"timeout": 30,
"ssl_mode": "require"
},
"redis": {
"url": "redis://localhost:6379",
"max_connections": 10,
"timeout": 5
},
"sync_settings": {
"real_time_sync": true,
"batch_size": 100,
"retry_attempts": 3,
"sync_interval": 300
},
"security": {
"api_key_required": true,
"rate_limit_per_hour": 1000,
"max_file_size": "10MB",
"allowed_file_types": ["pdf", "docx", "txt", "md"]
},
"logging": {
"level": "INFO",
"file_path": "logs/application.log",
"max_size": "100MB",
"backup_count": 5
}
}
}
python rag_system.py
What happens when you run:
- System initializes (downloads models on first run - be patient!)
- Checks for documentation files in `docs/` folder
- Processes and creates embeddings for your documents
- Starts interactive Q&A session
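To make the retrieval step concrete, here is a toy sketch of the idea: turn each chunk and the question into a vector, then return the chunks closest to the question. It deliberately swaps in a bag-of-words count vector and plain cosine similarity for the real embedding model and FAISS index, so it runs with the standard library only. The functions `embed`, `cosine`, and `retrieve` are illustrative names, not the ones used in `rag_system.py`.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=3):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


chunks = [
    {"source": "api_guide.md",
     "text": "To authenticate with the CloudSync API, include your API key in the Authorization header."},
    {"source": "installation.txt",
     "text": "Port 8080: Main application. Port 5432: PostgreSQL database."},
    {"source": "config.json",
     "text": "sync_settings: real_time_sync true, batch_size 100, retry_attempts 3"},
]

top = retrieve("How do I authenticate with the API?", chunks, k=1)
print(top[0]["source"])  # prints: api_guide.md
```

In the real system the retrieved chunks are then passed to the language model as context for the answer, and their file names are what appear under "Sources".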
🚀 FREE RAG SYSTEM FOR SOFTWARE DOCUMENTATION
============================================================
This system runs entirely on your computer - no API costs!
🔧 Initializing RAG system...
(First run will download models - please be patient)
📥 Loading embedding model: all-MiniLM-L6-v2
📥 Loading Hugging Face model (this may take a few minutes on first run)...
✅ Hugging Face model loaded successfully!
📂 Checking for documentation files...
✅ Found: docs/api_guide.md
✅ Found: docs/installation.txt
✅ Found: docs/config.json
📚 Loading and processing 3 documents...
🔄 Creating embeddings for 12 documents...
✅ Added 12 documents to vector store
✅ Successfully loaded and processed all documents!
💾 Saving RAG system...
✅ RAG system saved!
============================================================
🎉 RAG SYSTEM READY!
Ask questions about your CloudSync documentation.
Type 'quit' to exit
============================================================
After the first run, the system starts much faster as models are cached.
🤔 Your question: How do I authenticate with the API?
🤔 Your question: What are the system requirements for installation?
🤔 Your question: How do I configure the database connection?
🤔 Your question: What endpoints are available in the API?
🤔 Your question: What firewall ports need to be open?
🤔 Your question: How do I enable real-time sync?
🤔 Your question: How do I authenticate with the API?
🤖 Thinking... (this may take 10-30 seconds)
✅ Answer:
To authenticate with the CloudSync API, you need to include your API key in the Authorization header using the Bearer token format: "Authorization: Bearer your-api-key-here". The API requires authentication for all endpoints and has rate limits of 1000 requests per hour per API key.
📚 Sources:
📄 api_guide.md
------------------------------------------------------------
- Ask any question about your documentation
- `quit` or `exit` or `q` - Exit the program
- Ctrl+C - Force quit
Option 1: Hugging Face (Default - Simpler Setup)
# In rag_system.py, line ~154:
rag = FreeRAGSystem(llm_backend="huggingface", always_rebuild=True)
Option 2: Ollama (Better Responses)
# In rag_system.py, line ~154:
rag = FreeRAGSystem(llm_backend="ollama", always_rebuild=True)
To use Ollama:
- Install Ollama: ollama.ai
- Install a model: `ollama pull llama2`
- Start Ollama: `ollama serve`
- Change the backend in the code
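Before switching the backend, it can be useful to confirm from Python that the Ollama server is reachable. The sketch below queries the version endpoint on the default local port using only the standard library. `ollama_version` is a hypothetical helper, and it assumes the server answers with a JSON object containing a `version` field.

```python
import http.client
import json
import urllib.request


def ollama_version(base_url="http://localhost:11434", timeout=3):
    """Return the Ollama server version string, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError: connection refused or timed out (URLError is a subclass).
        # ValueError: malformed URL or a response that is not valid JSON.
        return None
    return data.get("version") if isinstance(data, dict) else None


if __name__ == "__main__":
    version = ollama_version()
    if version is None:
        print("Ollama is not reachable. Start it with: ollama serve")
    else:
        print(f"Ollama is running (version {version})")
```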
# Adjust chunk size and overlap
processor = DocumentProcessor(
    chunk_size=512,    # Characters per chunk
    chunk_overlap=50   # Overlap between chunks
)
For Speed:
- Use `chunk_size=256`
- Set `always_rebuild=False` after first run
- Use fewer retrieval results: `k=3`
For Quality:
- Use `chunk_size=1024`
- Use more retrieval results: `k=5`
- Use Ollama backend with larger models
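To see how `chunk_size` and `chunk_overlap` interact, the sketch below splits a string into character chunks where each chunk repeats the last `chunk_overlap` characters of the previous one. This is one plausible reading of the two parameters, not the actual `DocumentProcessor` code, and `chunk_text` is a name made up for the example.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into character chunks; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints: ['abcd', 'defg', 'ghij']
```

Smaller chunks mean more, shorter vectors to search and less text handed to the model per result. Larger chunks keep more surrounding context together, which is why the quality settings above favour them.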
❌ "No module named 'sentence_transformers'"
pip install sentence-transformers
❌ "faiss-cpu not found"
pip install faiss-cpu --no-cache-dir
❌ "No documents loaded"
- Check that `docs/` folder exists
- Verify files exist: `docs/api_guide.md`, `docs/installation.txt`, `docs/config.json`
- Check file permissions
❌ "Error initializing RAG system"
# Install all dependencies
pip install sentence-transformers faiss-cpu transformers torch requests numpy
# If still failing, try one by one:
pip install sentence-transformers
pip install faiss-cpu
pip install transformers
pip install torch
❌ "Model download is slow"
- First run downloads ~400MB of models
- Be patient (2-5 minutes)
- Models are cached for future runs
❌ "Out of memory"
- Close other applications
- Use smaller chunk_size (256)
- Reduce number of documents
❌ "Ollama connection failed"
# Make sure Ollama is installed and running
ollama serve
# Make sure you have a model
ollama pull llama2
# Test connection
curl http://localhost:11434/api/version
Add print statements to see what's happening:
# Check document loading (doc_paths is the list of file paths given to the system)
from pathlib import Path
print(f"Looking for files: {doc_paths}")
for path in doc_paths:
    print(f"File exists: {Path(path).exists()} - {path}")
# Check Python version
python --version
# Check installed packages
pip list | grep -E "(sentence|faiss|transformers|torch)"
# Check available memory
# Windows: wmic OS get TotalVisibleMemorySize,FreePhysicalMemory
# Linux: free -h
# macOS: vm_stat
your-project/
├── rag_system.py # Main application
├── README.md # This file
├── .gitignore # Excludes large files
├── requirements.txt # Dependencies
├── docs/ # Your documentation
│ ├── api_guide.md
│ ├── installation.txt
│ └── config.json
├── rag_index/ # Generated (auto-created)
│ ├── index.faiss # Vector embeddings
│ └── documents.json # Document metadata
└── .cache/ # Model cache (auto-created)
└── huggingface/
└── (model files)
- Python 3.8+ installed
- Dependencies installed: `pip install sentence-transformers faiss-cpu transformers torch requests numpy`
- Created `docs/` folder
- Added documentation files (api_guide.md, installation.txt, config.json)
- Run: `python rag_system.py`
- Wait for models to download (first run only)
- Start asking questions!
- Write clear questions: "How do I configure the database?" vs "database stuff"
- Use specific terms: Use exact terminology from your docs
- Ask follow-up questions: Build on previous answers
- Check sources: The system shows which documents it used
- Update docs regularly: With `always_rebuild=True`, the system rebuilds embeddings on startup to catch changes
🎉 You're ready to go! Run `python rag_system.py` and start querying your documentation with AI!