A powerful Retrieval Augmented Generation (RAG) application that combines real-time web search, Wikipedia knowledge, and local language models to provide comprehensive answers to user questions.
- Multi-Source Knowledge Retrieval: Integrates Google Search and Wikipedia for comprehensive information gathering
- Local LLM Processing: Uses Ollama with Llama 3.1 for intelligent response generation
- Vector Database Storage: Implements Chroma DB with NVIDIA embeddings for semantic search
- Real-time Streaming: Provides live response generation with custom UI feedback
- Persistent Knowledge Base: Maintains conversation history and learned information
- User-Friendly Interface: Clean Streamlit web application with expandable output containers
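At a high level, these components wire together roughly as follows. This is a minimal sketch, not the exact code in RAG.py; the import paths and the NVIDIA embedding model name are assumptions based on common LangChain usage.

```python
# Minimal sketch of the component wiring (not the exact RAG.py code).
from langchain_community.utilities import GoogleSerperAPIWrapper    # real-time web search
from langchain_community.retrievers import WikipediaRetriever       # Wikipedia articles
from langchain_community.vectorstores import Chroma                 # persistent vector store
from langchain_community.chat_models import ChatOllama              # local Llama 3.1 via Ollama
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings          # NVIDIA embedding endpoint

search = GoogleSerperAPIWrapper()                    # reads SERPER_API_KEY from the environment
wikipedia = WikipediaRetriever(top_k_results=2)      # top-k Wikipedia pages per query
embeddings = NVIDIAEmbeddings(model="NV-Embed-QA")   # embedding model name is an assumption
vectordb = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
llm = ChatOllama(model="llama3.1", temperature=0, streaming=True)
```

Retrieved snippets are embedded into Chroma, and the most relevant chunks are passed to the local model as context for each answer.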
Before running this application, ensure you have:
- Python 3.8 or higher
- Anaconda distribution installed
- Ollama installed and running
- Google Serper API key
- NVIDIA API key
- At least 8GB RAM (recommended for local LLM)
```bash
git clone https://github.com/yourusername/rag-qa-system.git
cd rag-qa-system
```

```bash
conda create -n env_name python=3.8
conda activate env_name
```

```bash
pip install -r requirements.txt
```

```bash
# Install Ollama (visit https://ollama.ai/ for OS-specific instructions)
# After installation on your machine, run
ollama --version

# Pull the required model
ollama pull llama3.1
```
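To confirm that Ollama and the model are reachable from Python, a quick check like the following can help (a sketch, assuming the packages from requirements.txt are installed):

```python
# Quick sanity check: ask the locally pulled llama3.1 model for a one-line reply.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3.1", temperature=0)
print(llm.invoke("Reply with OK if you can read this.").content)
```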
Edit the file named `id.py` in the project root directory:

```python
# id.py
SERPER_API_KEY = "your_serper_api_key_here"
nvapi_key = "your_nvidia_api_key_here"
```

To get a Google Serper API key:
- Visit Serper.dev
- Sign up for a free account
- Get your API key from the dashboard
- Add it to your `id.py` file
To get an NVIDIA API key:
- Visit NVIDIA AI Foundation Models
- Sign up and get your API key
- Add it to your `id.py` file
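Inside the application, these keys are typically exported as environment variables so the LangChain clients can pick them up. The exact mechanism in RAG.py may differ; the snippet below is a hedged sketch, and the environment variable names are assumptions.

```python
# Sketch: export the keys from id.py for the Serper and NVIDIA clients to read.
import os
from id import SERPER_API_KEY, nvapi_key

os.environ["SERPER_API_KEY"] = SERPER_API_KEY   # read by GoogleSerperAPIWrapper
os.environ["NVIDIA_API_KEY"] = nvapi_key        # read by NVIDIAEmbeddings
```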
```bash
streamlit run RAG.py
```

Open your browser and navigate to http://localhost:8501
- Enter your question in the text input field
- Watch as the system retrieves information and generates answers in real-time
- Expand the output container to see the complete response
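The live output described above follows the standard Streamlit streaming pattern. The snippet below is a simplified sketch of that UI loop, not the exact code in RAG.py:

```python
# Sketch of the streaming answer UI: tokens are appended inside an expandable container.
import streamlit as st
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3.1", temperature=0, streaming=True)
question = st.text_input("Ask a question")
if question:
    with st.expander("Answer", expanded=True):
        placeholder = st.empty()
        answer = ""
        for chunk in llm.stream(question):   # stream tokens as they are generated
            answer += chunk.content
            placeholder.markdown(answer)
```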
```
rag-qa-system/
├── RAG.py               # Main application file
├── id.py                # API keys configuration
├── requirements.txt     # Python dependencies
├── README.md            # Project documentation
├── chroma_db/           # Vector database storage (created automatically)
└── .gitignore           # Git ignore file
```
To use a different Ollama model, modify the llm initialization in RAG.py:

```python
llm = ChatOllama(model="your-preferred-model", temperature=0, streaming=True)
```

Modify the text splitter parameters for different chunk sizes:

```python
splitter = CharacterTextSplitter(chunk_size=1500, chunk_overlap=300)
```

Change the persistence directory for the Chroma database:

```python
persist_directory = "./your-custom-db-path"
```
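For reference, the customizations above typically come together when the vector store is rebuilt. A minimal sketch, assuming `Chroma.from_texts` and the NVIDIA embedding model name used earlier in this README:

```python
# Sketch: split retrieved text with custom chunk sizes and persist it to a custom path.
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

retrieved_text = "...search and Wikipedia results concatenated here..."  # placeholder
splitter = CharacterTextSplitter(chunk_size=1500, chunk_overlap=300)
chunks = splitter.split_text(retrieved_text)

vectordb = Chroma.from_texts(
    chunks,
    embedding=NVIDIAEmbeddings(model="NV-Embed-QA"),   # model name is an assumption
    persist_directory="./your-custom-db-path",
)
```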
1. Ollama Connection Error

```bash
# Ensure Ollama is running
ollama serve

# Pull the required model
ollama pull llama3.1
```

2. API Key Errors
- Verify your API keys are correctly set in `id.py`
- Check API key quotas and permissions
3. ChromaDB Persistence Issues
- Delete the `chroma_db` folder and restart the application
- Ensure proper write permissions in the project directory
4. Memory Issues
- Reduce chunk size in the text splitter
- Close other memory-intensive applications
- Never commit your `id.py` file to version control
- Add `id.py` to your `.gitignore` file
- Use environment variables in production deployments
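For the last point, a hedged sketch of an environment-variable fallback (the variable names are assumptions, not necessarily what RAG.py currently expects):

```python
# Sketch: read API keys from the environment in production instead of id.py.
import os

SERPER_API_KEY = os.environ.get("SERPER_API_KEY", "")
nvapi_key = os.environ.get("NVIDIA_API_KEY", "")
if not SERPER_API_KEY or not nvapi_key:
    raise RuntimeError("Set SERPER_API_KEY and NVIDIA_API_KEY before starting the app.")
```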
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- LangChain for the RAG framework
- Ollama for local LLM hosting
- Streamlit for the web interface
- ChromaDB for vector storage
- NVIDIA for embedding models
Note: Make sure to keep your API keys secure and never share them publicly!