This project implements a Retrieval-Augmented Generation (RAG) system that combines OpenAI's GPT-4 for answer generation, Pinecone for vector storage, and LangChain for document retrieval and query processing.
- ✅ Document Processing: Supports TXT, PDF, and CSV files.
- ✅ Embeddings with OpenAI: Converts text into vector embeddings.
- ✅ Efficient Search: Uses Pinecone to store and retrieve relevant information.
- ✅ Modular Architecture: Well-structured codebase for easy scalability and maintenance.
- ✅ Logging & Error Handling: Helps identify issues efficiently.
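As a rough sketch of the document-processing feature (the function name and dispatch logic here are illustrative, not the project's actual code), text extraction for TXT and CSV files might look like this; PDF extraction would typically delegate to a library such as `pypdf`:

```python
import csv
from pathlib import Path


def load_document(path: str) -> str:
    """Extract plain text from a supported file.

    TXT and CSV are handled with the standard library; a real pipeline
    would add a PDF branch (e.g. via pypdf).
    """
    p = Path(path)
    if p.suffix == ".txt":
        return p.read_text(encoding="utf-8")
    if p.suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as f:
            # Flatten rows into lines of space-separated cells.
            return "\n".join(" ".join(row) for row in csv.reader(f))
    raise ValueError(f"Unsupported file type: {p.suffix}")
```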
```
git clone https://github.com/patrick-cuppi/rag-system
cd rag-system
pip install -r requirements.txt
```

Create a `.env` file in the root directory and add the following:
```
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENV=your_pinecone_environment
PINECONE_INDEX=your_pinecone_index_name
```

Place your TXT, PDF, or CSV files inside the `data/documents/` folder.
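These variables are presumably read into the environment at startup (commonly via `python-dotenv`'s `load_dotenv()`). A minimal stdlib sketch of that idea, with the parsing logic shown for illustration rather than taken from the project's code:

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Populate os.environ from KEY=value lines in a .env file.

    Blank lines and '#' comments are skipped; variables already set
    in the environment are NOT overwritten.
    """
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```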
```
python main.py
```

You'll be prompted to enter a question based on the stored documents.
Loads Documents → Extracts text from supported formats.
Embeds the Content → Converts documents into vector representations using OpenAI.
Stores in Pinecone → Enables fast and efficient retrieval.
Retrieves & Generates Answers → Finds relevant information and uses GPT-4 to generate a response.
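Conceptually, the retrieval step finds the stored vectors closest to the query embedding — Pinecone does this at scale over an index. A toy in-memory sketch using cosine similarity, with made-up two-dimensional embeddings standing in for OpenAI's:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, top_k=1):
    """Return the top_k (doc_id, score) pairs ranked by similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "index": doc_id -> embedding (real embeddings come from OpenAI).
store = {"ml_doc": [0.9, 0.1], "cooking_doc": [0.1, 0.9]}
print(retrieve([0.8, 0.2], store))  # ml_doc ranks first
```

The retrieved passages are then placed into the GPT-4 prompt as context for answer generation.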
```
Enter your question (or type 'exit' to quit): What is the main topic of the document?

🔹 Response: "The document discusses advanced machine learning techniques for image processing."
```
If you encounter any issues:

- Ensure your API keys in `.env` are correct.
- Verify that the Pinecone index exists.
- Run `pip install -r requirements.txt` to reinstall dependencies.
- Check the logs produced by `rag_pipeline.py` for detailed errors.
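The logging mentioned in the features list might be configured along these lines (the logger name and format string are illustrative; see `rag_pipeline.py` for the project's actual setup):

```python
import logging


def get_logger(name: str = "rag_pipeline") -> logging.Logger:
    """Return a logger that writes timestamped records to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


log = get_logger()
log.info("Pinecone index ready")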
This project is licensed under the MIT License.
Pull requests and improvements are welcome! Feel free to submit issues or enhancements.
