Generative AI RAG Pipeline using LLaMA & FAISS

A retrieval-augmented generation (RAG) system that combines document retrieval with local AI-powered question answering. Ask questions about your documents and get accurate, context-aware answers from LLaMA 3 running locally via Ollama.

Inspiration

The project was inspired by the growing need for efficient, privacy-preserving AI systems that can answer questions based on personal or domain-specific documents without relying on external APIs. With the rise of large language models like LLaMA, we wanted to combine retrieval-augmented generation (RAG) with local vector search to create a tool that enhances factual accuracy and reduces hallucinations in AI responses, making it accessible for developers and researchers.

What it does

This RAG Pipeline processes user-provided documents (text and PDF files), converts them into vector embeddings, stores them in a FAISS index for fast similarity search, retrieves relevant context based on queries, and generates natural language answers using a local LLaMA model via Ollama. It provides an interactive Q&A interface where users can ask questions about their documents, with answers grounded in the retrieved content, ensuring reliable and context-aware responses.
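
The retrieval step described above can be illustrated with a small, dependency-free sketch. It is not the project's code: word counts stand in for Sentence Transformers embeddings, and a brute-force cosine ranking stands in for the FAISS index. The function names and sample chunks are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding vector)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute force over all chunks)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "FAISS stores vectors for similarity search",
    "Ollama runs LLaMA 3 on the local machine",
    "PDF files are parsed into plain text",
]
print(top_k("where are vectors stored for search", chunks, k=1))
```

A real embedding model matches on meaning rather than shared words, and FAISS avoids comparing the query against every stored vector one by one in Python, but the ranking idea is the same.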

How we built it

We built the system using Python with key libraries:

  • LangChain: For the RAG pipeline orchestration
  • Sentence Transformers: For text embeddings
  • FAISS: For vector storage and similarity search
  • PyPDF: For document parsing
  • LlamaIndex: For indexing
  • Ollama: For running LLaMA 3 locally

The architecture includes modular components—data loader, embeddings generator, vector store, retriever, and generator—integrated into a main script that handles initialization and interactive querying. The vector store persists across sessions for efficiency, and the pipeline uses LangChain's RetrievalQA chain to combine retrieval and generation.
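
As a rough picture of how the retriever and generator fit together, the sketch below wires two pluggable callables into one class that retrieves chunks, places them in a single prompt, and returns the answer with its sources. This is an assumption about the general pattern, not LangChain's RetrievalQA implementation or the contents of `src/rag_pipeline.py`; `SimpleRAG`, `Chunk`, `Answer`, and the fake callables are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chunk:
    text: str
    source: str  # file the chunk came from

@dataclass
class Answer:
    text: str
    sources: List[str]

class SimpleRAG:
    """Retrieve chunks, put them into one prompt, then generate an answer."""

    def __init__(self, retrieve: Callable[[str], List[Chunk]], generate: Callable[[str], str]):
        self.retrieve = retrieve
        self.generate = generate

    def ask(self, question: str) -> Answer:
        chunks = self.retrieve(question)
        context = "\n\n".join(c.text for c in chunks)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        sources = sorted({c.source for c in chunks})  # de-duplicated source files
        return Answer(text=self.generate(prompt), sources=sources)

# Stand-ins so the sketch runs without FAISS or Ollama.
def fake_retrieve(question):
    return [Chunk("The index is saved in faiss_index/.", "notes.txt")]

def fake_generate(prompt):
    return "(model output for a prompt of %d characters)" % len(prompt)

rag = SimpleRAG(fake_retrieve, fake_generate)
result = rag.ask("Where is the index saved?")
print(result.text, result.sources)
```

Keeping the retriever and generator as plain callables is one way to get the modularity described above: either side can be swapped without touching the other.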

Challenges we ran into

  • Local LLM Integration: Ensuring model availability and compatibility with Ollama
  • Vector Database Optimization: Handling large document sets without memory overflow
  • Document Format Handling: Parsing diverse formats (especially PDFs with encoding issues)
  • Embedding Quality: Handling short or noisy text effectively
  • Balance: Achieving coherence between retrieval relevance and generation quality
  • Debugging: Managing end-to-end pipeline error handling for missing dependencies
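
For the memory issue with large document sets, one common mitigation is to embed and index chunks in fixed-size batches instead of all at once. The sketch below shows that idea only; the README does not say how the project solved it, and `embed_batch` and `add_to_index` are hypothetical callables standing in for the embedding model and the vector store.

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def index_in_batches(chunks, embed_batch, add_to_index, size=64):
    """Embed and store chunks one batch at a time; return the number of chunks indexed."""
    total = 0
    for batch in batched(chunks, size):
        add_to_index(embed_batch(batch))
        total += len(batch)
    return total
```

This bounds the memory used by the embedding step. The index itself still grows with the number of vectors.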

Accomplishments we're proud of

  • Fully functional, end-to-end RAG system running entirely locally
  • No external APIs required: complete data privacy
  • Modular, extensible architecture for easy improvements
  • Efficient FAISS-based vector search for scalability
  • Practical integration of LLaMA for high-quality text generation
  • Interactive user experience with source document attribution

What we learned

  • Architecture: Importance of modular design for maintainability and scalability
  • Vector Databases: Trade-offs between speed and accuracy in FAISS
  • Local LLMs: Value of self-hosted solutions for data privacy
  • Optimization: Balancing computational resources with performance
  • Prompt Engineering: Techniques to improve answer quality through context and phrasing
  • Retrieval Strategies: Advanced methods for better context relevance
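
To make the prompt engineering point concrete, here is a minimal sketch of a prompt builder that numbers each retrieved chunk, labels it with its source, caps the context size, and tells the model to admit when the context has no answer. The wording of the instruction, the function name, and the character budget are assumptions, not the prompt the project uses.

```python
def build_prompt(question, chunks, max_context_chars=2000):
    """Build a grounded prompt. `chunks` is a list of (source, text) pairs, best match first."""
    parts, used = [], 0
    for i, (source, text) in enumerate(chunks, start=1):
        entry = f"[{i}] ({source}) {text.strip()}"
        if used + len(entry) > max_context_chars and parts:
            break  # keep at least one chunk, drop the rest once over budget
        parts.append(entry)
        used += len(entry)
    context = "\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("What is FAISS?", [("a.txt", "FAISS is a similarity search library.")]))
```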

What's next

Planned improvements:

  • Support for more document types (images via OCR)
  • Advanced chunking strategies for better retrieval
  • Multi-modal inputs and processing
  • Domain-specific LLaMA fine-tuning
  • Web UI for broader accessibility
  • Hybrid retrieval (keyword + semantic search)
  • Cloud deployment options
  • Evaluation metrics and user feedback loops
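
For the planned hybrid retrieval, one widely used way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion, where each document scores 1 / (k + rank) in every list it appears in. The sketch below shows that technique as one possible approach; the README does not say which fusion method is planned, and the document ids are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; each list is ordered best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score (descending), then by id so ties resolve deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["doc3", "doc1", "doc2"]   # e.g. from a keyword search
semantic_hits = ["doc1", "doc4", "doc3"]  # e.g. from the vector index
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Because the method uses only ranks, it needs no score normalization between the two retrievers.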

Built with

Languages & Frameworks:

  • Python
  • LangChain
  • Sentence Transformers
  • FAISS (Vector Database)
  • PyPDF
  • LlamaIndex
  • Ollama API

Platforms: Local execution (Windows/Linux/Mac)

Cloud Services: None (fully local)


Installation & Usage

Prerequisites

  • Python 3.8+
  • Ollama installed and running
  • LLaMA 3 model downloaded: ollama pull llama3
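
A quick way to check the first two prerequisites before running anything is a small preflight script like the sketch below. It is not part of the repository, and `preflight` is a hypothetical helper. It only checks the Python version and whether an `ollama` executable is on the PATH; it does not verify that the server is running or that the model has been pulled (running `ollama list` shows the downloaded models).

```python
import shutil
import sys

def preflight(min_python=(3, 8)):
    """Return a dict of check name -> bool for the prerequisites."""
    return {
        "python_version": sys.version_info >= min_python,
        "ollama_on_path": shutil.which("ollama") is not None,
    }

for name, ok in preflight().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```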

Setup

  1. Clone the repository

    git clone https://github.com/indrahacks/RAG_Hackathon.git
    cd RAG_Hackathon
  2. Create and activate a virtual environment (run only the activation line for your OS)

    python -m venv .venv
    .venv\Scripts\activate  # On Windows
    source .venv/bin/activate  # On macOS/Linux
  3. Install dependencies

    pip install -r requirements.txt
  4. Ensure Ollama is running (start it in a separate terminal if it is not already running in the background)

    ollama serve
  5. Add sample documents (place .txt or .pdf files in data/sample_documents/)

  6. Run the pipeline

    python main.py
  7. Ask questions - Follow the interactive prompts to query your documents
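
Step 5 expects .txt or .pdf files in data/sample_documents/. The sketch below shows one way a loader could discover those files. It is an illustration rather than the contents of `src/data_loader.py`; `find_documents` is a hypothetical helper, and searching subfolders is an assumption.

```python
from pathlib import Path

SUPPORTED = {".txt", ".pdf"}

def find_documents(folder="data/sample_documents"):
    """Return supported files under `folder`, sorted by path; empty list if the folder is missing."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in SUPPORTED)

for path in find_documents():
    print(path)
```

An empty result is a useful early signal that the folder is missing or holds no supported files.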


Project Structure

rag_project/
├── main.py                    # Entry point
├── requirements.txt           # Python dependencies
├── config/
│   ├── __init__.py
│   └── settings.py           # Configuration settings
├── data/
│   ├── processed/            # (Reserved for processed data)
│   └── sample_documents/     # Place your .txt/.pdf files here
├── faiss_index/
│   └── index.faiss           # Persisted vector index
└── src/
    ├── __init__.py
    ├── data_loader.py        # Document loading logic
    ├── embeddings.py         # Embedding generation
    ├── vector_store.py       # FAISS integration
    ├── retriever.py          # Context retrieval
    ├── generator.py          # LLaMA integration
    └── rag_pipeline.py       # RAG chain orchestration
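
The README does not show what `config/settings.py` contains. As a hedged illustration, a settings module for this layout might look like the sketch below. The paths and model name come from this README; the `Settings` class itself and the chunking and retrieval values are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Settings:
    documents_dir: Path = Path("data/sample_documents")  # folder from the tree above
    index_dir: Path = Path("faiss_index")                # folder from the tree above
    llm_model: str = "llama3"                            # model pulled in Prerequisites
    chunk_size: int = 500     # hypothetical value
    chunk_overlap: int = 50   # hypothetical value
    top_k: int = 4            # hypothetical value

    @property
    def index_file(self) -> Path:
        """Location of the persisted FAISS index."""
        return self.index_dir / "index.faiss"

settings = Settings()
print(settings.index_file)
```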

How to Present to a Judge

  1. Setup Environment: Install dependencies, start Ollama, pull LLaMA 3
  2. Prepare Data: Add sample documents to data/sample_documents/
  3. Run the System: Execute python main.py to initialize the pipeline
  4. Demonstrate Features:
    • Show step-by-step initialization output
    • Ask sample questions about documents
    • Highlight retrieval accuracy and generation quality
    • Display source documents used for answers
  5. Key Points to Emphasize:
    • Scalability: Handles multiple documents efficiently
    • Privacy: No cloud services or external APIs
    • Accuracy: Retrieval-augmented approach reduces hallucinations
    • Local Execution: Runs entirely on-device for data security

Generative AI Implementation

LLaMA 3 is the generative model in the pipeline. It handles the final and hardest step: turning retrieved context into a natural language answer. It is used in:

  • src/generator.py: Model initialization and connection
  • src/rag_pipeline.py: Integration into RetrievalQA chain
  • main.py: Interactive question answering loop

Running LLaMA 3 locally gives coherent responses while keeping the model fully under the user's control.
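
The project connects to the model through LangChain, and that code is not shown here. For readers who want to see what a local generation call involves, the sketch below talks to Ollama's HTTP API directly using only the standard library. It assumes Ollama is listening on its default port 11434 and that the `llama3` model has been pulled; `build_request` and `generate` are illustrative names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt, model="llama3"):
    """Build the POST request; stream=False asks for one JSON object instead of chunks."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def generate(prompt, model="llama3", timeout=120):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `generate("Say hello in one sentence.")` needs the server from step 4 to be running; without it the call raises a connection error.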


License

MIT License


Built at the RAG Hackathon
