## Retrieval-Augmented Generation (RAG) Using Open-Source Tools

### Overview of the Project

This project demonstrates an implementation of Retrieval-Augmented Generation (RAG) using an open-source large language model (LLM), `llama3.2`, served locally through Ollama, with Pinecone as the vector database.

RAG improves LLM answers by retrieving relevant passages from your own documents and adding them to the prompt before the model generates a response. Grounding answers in trusted documents, rather than in whatever the model memorized during training, makes RAG valuable in domains where accuracy and privacy matter.
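The retrieve, augment, generate loop can be sketched in a few lines of plain Python. This is a toy illustration, not the pipeline used in this notebook: `retrieve` ranks chunks by shared words instead of embedding similarity, `generate` is a stub standing in for the LLM call, and the sample chunks are made-up text. All function names here are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(question, chunks, top_k=2):
    """Retrieval step: rank chunks by shared words (a toy stand-in for embedding similarity)."""
    q_counts = Counter(tokenize(question))
    def shared_words(chunk):
        c_counts = Counter(tokenize(chunk))
        return sum(min(n, c_counts[w]) for w, n in q_counts.items())
    return sorted(chunks, key=shared_words, reverse=True)[:top_k]

def augment(question, contexts):
    """Augmentation step: place the retrieved chunks in the prompt."""
    context_text = "\n\n".join(contexts)
    return ("Answer the question based only on the context below.\n\n"
            f"Context:\n{context_text}\n\nQuestion: {question}\nAnswer:")

def generate(prompt):
    """Generation step: a stub in place of a real LLM call."""
    return f"[LLM answer based on a {len(prompt)}-character prompt]"

# Made-up sample chunks, for illustration only
toy_chunks = [
    "The Data Science course is taught by Professor Smith.",
    "The library opens at 9 am.",
    "Exams are held in June.",
]
toy_question = "Who teaches the Data Science course?"
toy_top = retrieve(toy_question, toy_chunks, top_k=1)
toy_prompt = augment(toy_question, toy_top)
print(toy_top)
print(generate(toy_prompt))
```

In the real pipeline, `retrieve` becomes a Pinecone similarity query over sentence-transformer embeddings and `generate` becomes a call to the local Ollama model.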


#### Document Collection and Library Installation

Create a folder named `docs` in the same directory as your notebook and add your files to `./docs/` (supported formats: `.pdf`, `.docx`, `.txt`).

**Example files to use:**  
- University course syllabi (PDF)   
- Research papers (PDF)
- Company policy documents (Word/PDF) 
- Product manuals (PDF) 

*Required*: Add at least 2-3 documents for the RAG pipeline to work.
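A quick way to create the folder and confirm it holds enough supported files is sketched below. `list_supported_files` is a hypothetical helper, not part of any library; it matches extensions case-insensitively, so `Report.PDF` counts too.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}

def list_supported_files(folder="docs"):
    """Create the folder if needed and return the supported files it contains, sorted by name."""
    folder_path = Path(folder)
    folder_path.mkdir(parents=True, exist_ok=True)
    return sorted(p for p in folder_path.iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS)

files = list_supported_files("docs")
if len(files) < 2:
    print(f"Only {len(files)} supported file(s) found in docs/; add a few more before continuing.")
else:
    print(f"Found {len(files)} supported files:", [p.name for p in files])
```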

#### Install Dependencies
Run the cell below to install the required Python packages.

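**Key libraries:**  
- `llama3.2` (via Ollama) - Local LLM  
- `sentence-transformers` - Text embeddings  
- `pinecone` - Pinecone vector database client  
- `langchain-text-splitters` and `langchain-ollama` - Text chunking and LLM integration  
- `pymupdf` and `python-docx` - PDF and Word parsing  

Together, these packages cover document parsing (PDF, DOCX), text chunking and embedding, vector storage in Pinecone, and LLM integration through LangChain and Ollama.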

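*Note*: Requires Python ≥3.9 and pip. Ollama itself is installed separately (not via pip), and the model must be downloaded once with `ollama pull llama3.2`.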

```python
%pip install pinecone langchain-text-splitters langchain-ollama sentence-transformers
%pip install pymupdf python-docx
```
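After installing, a check like the following can confirm that each package imports under the name the notebook uses. The list of import names (`pymupdf` is imported as `fitz`, `python-docx` as `docx`, and so on) is written by hand here and is an assumption to adjust if your setup differs; `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names used in this notebook (some differ from their pip package names)
REQUIRED_MODULES = ["fitz", "docx", "langchain_text_splitters",
                    "sentence_transformers", "pinecone", "langchain_ollama"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("All dependencies found." if not missing else f"Missing: {missing}")
```

`find_spec` only locates a module without importing it, so the check is fast even for heavy packages such as `sentence_transformers`.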

#### Load and Prepare Documents (TXT, PDF, Word)
Read every supported document in the `docs/` folder and extract its plain text for the next steps.

```python
import os
import fitz  # PyMuPDF, used for PDF text extraction
import docx  # python-docx, used for Word (.docx) files

def extract_text_from_file(file_path):
    """Return the plain text of a .txt, .pdf or .docx file, or None if the format is unsupported."""
    ext = os.path.splitext(file_path)[1].lower()  # case-insensitive: ".PDF" also matches
    if ext == '.txt':
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    elif ext == '.pdf':
        with fitz.open(file_path) as pdf:
            # Concatenate the text of every page
            return ''.join(page.get_text() for page in pdf)
    elif ext == '.docx':
        doc = docx.Document(file_path)
        return '\n'.join(para.text for para in doc.paragraphs)
    else:
        return None  # Unsupported format
```

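```python
# Load every supported document from the 'docs/' folder
def load_documents(folder_path='docs/'):
    documents = []
    # Sort so the document (and therefore chunk ID) order is the same on every run
    for filename in sorted(os.listdir(folder_path)):
        file_path = os.path.join(folder_path, filename)
        if not os.path.isfile(file_path):
            continue  # skip sub-folders
        text = extract_text_from_file(file_path)
        if text:
            documents.append(text)
    return documents

docs = load_documents()
```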
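The loader above returns bare strings, so the file each chunk came from is lost. If you want answers that can cite their source, one option is to keep the filename next to the text, as in this sketch. `load_documents_with_sources` is a hypothetical helper; it takes the extraction function (for example `extract_text_from_file`) as an argument so it can be tested on its own.

```python
import os

def load_documents_with_sources(extract, folder_path="docs/"):
    """Return (filename, text) pairs for every file that `extract` can read.

    `extract` is a text-extraction function such as extract_text_from_file;
    it should return None for unsupported files.
    """
    records = []
    for filename in sorted(os.listdir(folder_path)):
        file_path = os.path.join(folder_path, filename)
        if not os.path.isfile(file_path):
            continue  # skip sub-folders
        text = extract(file_path)
        if text:
            records.append((filename, text))
    return records
```

For example, `doc_records = load_documents_with_sources(extract_text_from_file)` yields `(filename, text)` pairs; the filename could then be stored in each vector's metadata alongside the text, e.g. `{"text": chunk, "source": filename}`.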

#### Chunking the Documents
We use a chunk size of 500 characters with an overlap of 50 characters between consecutive chunks.

```python
# Split documents into overlapping chunks of at most 500 characters
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Joining all documents means a chunk can span the end of one file and the start of the next
chunks = text_splitter.split_text(' '.join(docs))

print(f'Loaded {len(chunks)} text chunks.')
```

```text
Loaded 13663 text chunks.
```
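To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` tries to break on paragraph, line, and word boundaries first, whereas this sketch cuts at exact character offsets. It also chunks each document separately (`chunk_documents`, a hypothetical helper), which keeps a chunk from straddling two files; with LangChain, the same effect comes from calling `text_splitter.split_text(doc)` once per document instead of once on `' '.join(docs)`.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Fixed-size character windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    pieces = []
    start = 0
    while True:
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window reached the end of the text
        start += step
    return pieces

def chunk_documents(docs, chunk_size=500, chunk_overlap=50):
    """Chunk each document separately so no chunk mixes text from two files."""
    return [piece for doc in docs for piece in chunk_text(doc, chunk_size, chunk_overlap)]

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so a sentence cut at a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.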


#### Embed Chunks and Store Them in Pinecone
We use `sentence-transformers` to convert each chunk into a vector embedding and store the embeddings in Pinecone.

```python
from pinecone import Pinecone, ServerlessSpec

# Read the Pinecone API key from a local text file; strip() removes the trailing newline
with open("Pinecone_API_key.txt", "r", encoding="utf-8") as file:
    API_KEY = file.read().strip()

# Initialize the Pinecone client
pc = Pinecone(api_key=API_KEY)
```
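Storing the key in a text file works, but an environment variable keeps it out of the project folder (and out of version control). Below is a sketch of a loader that checks `PINECONE_API_KEY` first and falls back to the file. The helper name `load_api_key` and this precedence order are assumptions, not Pinecone requirements.

```python
import os
from pathlib import Path

def load_api_key(key_file="Pinecone_API_key.txt", env_var="PINECONE_API_KEY"):
    """Prefer an environment variable; otherwise read the key file, stripping whitespace."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        key = path.read_text(encoding="utf-8").strip()
        if key:
            return key
    raise RuntimeError(f"No API key found in ${env_var} or {key_file}")
```

Usage: `pc = Pinecone(api_key=load_api_key())`.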

```python
index_name = "rag-pipeline"
dimension = 384  # Output dimension of all-MiniLM-L6-v2, the embedding model loaded below

# Create the index if it doesn't exist
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=dimension,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

# Connect to the index
index = pc.Index(index_name)
```
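The index uses the `cosine` metric, so Pinecone scores a match by the angle between the query vector and each stored vector, ignoring their lengths. The computation is sketched below in plain Python purely for illustration (Pinecone performs it server-side). The length check also shows why `dimension` must equal the embedding model's output size: vectors of different lengths cannot be compared, and Pinecone rejects vectors whose length differs from the index dimension.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity dot(a, b) / (|a| * |b|): 1 = same direction, 0 = orthogonal, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0
```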

```python
from sentence_transformers import SentenceTransformer

# Load the embedding model that maps text to 384-dimensional vectors
embedder = SentenceTransformer("all-MiniLM-L6-v2")
```

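```python
# Encode all chunks in batches (much faster than one encode() call per chunk)
embeddings = embedder.encode(chunks, batch_size=64, show_progress_bar=True)

# Create one vector record per chunk; the original text is kept as metadata for retrieval
vectors = [
    {
        "id": str(i),
        "values": embedding.tolist(),
        "metadata": {"text": chunk}
    }
    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
]
```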
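The records above use the chunk's position as its ID, so re-running with the same documents overwrites the same vectors. If a document is added or edited, however, positions shift and an existing ID can end up pointing at different text than before. One alternative, sketched here, is to derive each ID from a hash of the chunk's content. `chunk_id` and `make_records` are hypothetical helpers, and truncating the SHA-256 digest to 32 hex characters is an arbitrary choice.

```python
import hashlib

def chunk_id(text, source=""):
    """Deterministic ID from a chunk's source and content: identical chunks always get the same ID."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]

def make_records(chunks, embeddings, source=""):
    """Pair each chunk with its embedding under a content-derived ID; repeated chunks collapse to one record."""
    records = {}
    for chunk, emb in zip(chunks, embeddings):
        rid = chunk_id(chunk, source)
        records[rid] = {
            "id": rid,
            "values": [float(x) for x in emb],  # plain floats, whether emb is a list or a NumPy array
            "metadata": {"text": chunk},
        }
    return list(records.values())
```

For example, `vectors = make_records(chunks, embedder.encode(chunks))`. Removing vectors for chunks that no longer exist would still require an explicit delete.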

```python
# Upsert = update values and metadata if the ID exists, insert otherwise.
# Re-running this cell therefore overwrites vectors instead of duplicating them, as long as the IDs stay the same.
# Upsert in small batches to stay under Pinecone's request size limit
batch_size = 50
for i in range(0, len(vectors), batch_size):
    batch = vectors[i:i + batch_size]
    index.upsert(vectors=batch)

print(f"✅ Inserted {len(vectors)} vectors into '{index_name}'.")
```

```text
✅ Inserted 13663 vectors into 'rag-pipeline'.
```
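Slicing works here because `vectors` is a complete list in memory. A generator-based `batched` helper, sketched below, does the same for any iterable, so records could also be produced lazily instead of all 13,663 at once. The helper is hypothetical; on Python 3.12+, the standard library's `itertools.batched` offers the same behavior but yields tuples.

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

print([len(b) for b in batched(range(120), 50)])  # [50, 50, 20]
```

Usage with the index: `for batch in batched(vectors, 50): index.upsert(vectors=batch)`.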


#### Load an Open-Source LLM and Embed Your Question
We run an open-source LLM (`llama3.2`) locally through Ollama to generate answers, and embed the question with the same model used for the chunks.

```python
from langchain_ollama import OllamaLLM

# Create an LLM interface pointing to the local Ollama model "llama3.2"
model = OllamaLLM(model="llama3.2")

question = "Who is the professor of the Data Science course?"  # Your question
# Embed the question with the same model used for the chunks so the vectors are comparable
question_vector = embedder.encode(question).tolist()
```
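`OllamaLLM` talks to the Ollama server running on your machine. For debugging (for example, checking that the server is up and the model has been pulled), an equivalent request can be sent directly to Ollama's REST API without LangChain. This sketch assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint with streaming turned off; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(prompt, model="llama3.2", url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(url, data=payload,
                                  headers={"Content-Type": "application/json"}, method="POST")

def ollama_generate(prompt, model="llama3.2", timeout=120):
    """Send the request and return the generated text (requires a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `print(ollama_generate("Say hello in one word."))` should print a short reply; a connection error usually means Ollama is not running.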

#### Retrieve Relevant Information from Pinecone and Prompt Engineering
We retrieve the 10 chunks most similar to the question and pass them to the LLM as context.

```python
# Query Pinecone for the 10 nearest chunks
results = index.query(
    vector=question_vector,
    top_k=10,
    include_metadata=True
)

# Prepare the retrieved context
contexts = [match["metadata"]["text"] for match in results["matches"]]
context_text = "\n\n".join(contexts)

# Build the prompt
prompt = f"""Answer the question based only on the context below.

Context:
{context_text}

Question: {question}
Answer:"""

# Query the local LLM via Ollama and show the answer
response = model.invoke(prompt)
print(response)
```
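All 10 matches are passed to the model, even weak ones. One possible refinement, sketched here: keep only matches whose similarity score clears a threshold, best first, and stop once a character budget is reached so the prompt stays within the model's context window. `select_contexts` is hypothetical, and the defaults (`min_score=0.3`, `max_chars=4000`) are illustrative values to tune, not recommendations from Pinecone or Ollama.

```python
def select_contexts(matches, min_score=0.3, max_chars=4000):
    """Keep matches above a similarity threshold, best first, until the character budget is used up."""
    selected, used = [], 0
    for match in sorted(matches, key=lambda m: m["score"], reverse=True):
        if match["score"] < min_score:
            break  # remaining matches score even lower
        text = match["metadata"]["text"]
        if used + len(text) > max_chars:
            break  # adding this chunk would exceed the budget
        selected.append(text)
        used += len(text)
    return selected
```

It would replace the list comprehension above, e.g. `contexts = select_contexts(results["matches"])`. If nothing passes the threshold, the context is empty and the prompt's "based only on the context" instruction should lead the model to say it cannot answer.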