# Retrieval-Augmented Generation (RAG) - Basic Demo

This notebook walks through a minimal RAG pipeline: sentence-transformers for embeddings, FAISS for retrieval, and OpenAI's chat completions API for generation (a local model can be substituted).

## Objectives
- Ingest documents and chunk them
- Create embeddings and store them in a vector index (FAISS)
- Retrieve the top-k chunks for a query
- Send retrieved context to a language model to generate an answer
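
The first objective, chunking, is not exercised by the cells below, which start from a hand-written list of chunks. A minimal sketch of fixed-size word chunking with overlap might look like the following; the `chunk_text` helper, its parameters, and the sample text are illustrative assumptions, not part of the original notebook.

```python
def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words. Hypothetical helper for
    illustration; real pipelines often chunk by sentences or tokens
    rather than whitespace-separated words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample_text = (
    "Photosynthesis occurs in the chloroplasts of plant cells. "
    "ATP is the energy currency of the cell."
)
sample_chunks = chunk_text(sample_text, chunk_size=8, overlap=2)
for chunk in sample_chunks:
    print("-", chunk)
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of some duplicated text in the index.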

In [None]:
# Install required libraries
%pip install faiss-cpu sentence-transformers openai python-dotenv -q

## Step 1: Prepare and Embed Document Chunks

In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss

# Sample document split into chunks
doc_chunks = [
    "The mitochondria is the powerhouse of the cell.",
    "Photosynthesis occurs in the chloroplasts of plant cells.",
    "DNA contains genetic information and is located in the nucleus.",
    "Proteins are synthesized by ribosomes using mRNA.",
    "ATP is the energy currency of the cell."
]

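# Load a small sentence-embedding model (all-MiniLM-L6-v2 produces 384-dimensional vectors)
model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
doc_embeddings = model.encode(doc_chunks, show_progress_bar=True)

# Create an exact (brute-force) L2 index sized to the embedding dimension
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
# FAISS expects float32 input
index.add(np.asarray(doc_embeddings, dtype="float32"))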
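
`IndexFlatL2` performs exact search: it compares the query against every stored vector and ranks them by squared Euclidean distance. The pure-Python sketch below mirrors that idea on toy 2-D vectors; `flat_l2_search` and the sample vectors are hypothetical and only illustrate the computation, not how FAISS implements it internally.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors closest to `query`.

    Brute-force scan using squared Euclidean distance.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first; ties broken by index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy data for illustration only
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
toy_query = [1.0, 1.0]
toy_distances, toy_indices = flat_l2_search(toy_vectors, toy_query, k=2)
print(toy_distances, toy_indices)
```

Because every vector is scanned, results are exact but cost grows linearly with the number of chunks, which is fine for a demo of this size.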

## Step 2: Retrieve Relevant Chunks Based on Query

In [None]:
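# Define a user query and embed it with the same model used for the chunks
query = "What produces energy in a cell?"
query_vec = model.encode([query])

# Search for the 2 nearest chunks; returns distances and chunk indices, one row per query
distances, indices = index.search(np.asarray(query_vec, dtype="float32"), k=2)
retrieved_chunks = [doc_chunks[i] for i in indices[0]]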

print("Top chunks:")
for chunk in retrieved_chunks:
    print("-", chunk)
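
The search always returns `k` slots per query, so a returned chunk may be only weakly related to the question, and FAISS pads the index row with `-1` when fewer than `k` vectors are available. A small sketch of a post-filter, assuming a hypothetical `select_chunks` helper and an arbitrary distance cutoff chosen purely for illustration:

```python
def select_chunks(chunks, distances, indices, max_distance):
    """Keep retrieved chunks whose distance is within `max_distance`.

    `distances` and `indices` are one row of search output (plain lists
    or 1-D arrays). Index -1 marks an empty slot and is skipped.
    """
    selected = []
    for dist, idx in zip(distances, indices):
        if idx < 0:
            continue  # padded slot, no matching vector
        if dist <= max_distance:
            selected.append((chunks[idx], float(dist)))
    return selected


# Made-up search output for illustration (not real model distances)
example_chunks = [
    "ATP is the energy currency of the cell.",
    "Proteins are synthesized by ribosomes using mRNA.",
]
example_distances = [0.62, 1.48, 3.4e38]
example_indices = [0, 1, -1]

kept = select_chunks(example_chunks, example_distances, example_indices, max_distance=1.0)
for text, dist in kept:
    print(f"{dist:.2f}  {text}")
```

A suitable cutoff depends on the embedding model and the data, so it would need to be tuned rather than copied from this example.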

## Step 3: Use Retrieved Context for Answer Generation

In [None]:
from dotenv import load_dotenv
import os
from openai import OpenAI

# Load OPENAI_API_KEY from a .env file into the environment
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)

# Build the user message: retrieved chunks as context, followed by the question
context = "\n".join(retrieved_chunks)
prompt = f"Context:\n{context}\n\nQuestion: {query}"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4" if you have access
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.3,
    max_tokens=100
)

print(response.choices[0].message.content.strip())
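
The prompt is where retrieval and generation meet: the retrieved chunks are placed ahead of the question so the model can draw on them. One possible way to package this is sketched below, assuming a hypothetical `build_messages` helper and an illustrative system instruction that asks the model to stay within the context; neither is part of the original notebook.

```python
def build_messages(context_chunks, question):
    """Build chat messages that put retrieved chunks before the question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    # Illustrative instruction; the wording is an assumption, not a requirement
    system = (
        "Answer using only the provided context. "
        "If the context is insufficient, say so."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


demo_messages = build_messages(
    ["ATP is the energy currency of the cell."],
    "What produces energy in a cell?",
)
print(demo_messages[1]["content"])
```

The returned list has the same shape as the `messages` argument passed to `client.chat.completions.create` in the cell above.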


## Summary
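- This notebook built a basic RAG pipeline end to end.
- Document chunks were embedded and indexed in FAISS, and the top-matching chunks were retrieved for the query.
- The retrieved context was included in the prompt so the language model could generate a grounded answer.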