# Retrieval-Augmented Generation Pipeline (Gemini)

This notebook implements the **final RAG pipeline** by integrating:
- Semantic retrieval from a FAISS vector store
- Answer generation using Google Gemini Pro

The system produces **grounded answers** along with retrieved source chunks.


In [None]:
import os

from dotenv import load_dotenv

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA


In [None]:
load_dotenv()

print("Environment variables loaded.")


In [None]:
VECTOR_DB_PATH = "../vectorstore/faiss_index"

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

vectorstore = FAISS.load_local(
    VECTOR_DB_PATH,
    embeddings,
    allow_dangerous_deserialization=True
)

print("FAISS vector store loaded.")


In [None]:
llm = ChatGoogleGenerativeAI(
    model="gemini-pro",
    google_api_key=os.getenv("GOOGLE_API_KEY"),
    temperature=0.2
)

print("Gemini LLM initialized.")


In [None]:
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4}
)

print("Retriever configured.")


In [None]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    return_source_documents=True
)

print("RAG pipeline constructed.")


In [None]:
query = "What is the main topic discussed in this video?"

response = qa_chain(query)

print("Answer:\n")
print(response["result"])


In [None]:
print("\nRetrieved source chunks:\n")

for i, doc in enumerate(response["source_documents"], start=1):
    print(f"--- Source {i} ---")
    print(doc.page_content[:400])
    print()


In [None]:
follow_up_query = "Explain the key idea in simple terms."

follow_up_response = qa_chain(follow_up_query)

print("Follow-up Answer:\n")
print(follow_up_response["result"])


In [None]:
print("RAG pipeline executed successfully.")


## Observations

- The RAG system generates coherent and context-aware answers.
- Retrieved chunks align well with the user query, validating the embedding and chunking strategy.
- Gemini Pro produces concise and grounded responses when provided with retrieved context.

This confirms the effectiveness of the end-to-end RAG pipeline.


## Summary

- Loaded a persisted FAISS vector store
- Retrieved semantically relevant transcript chunks
- Generated grounded answers using Gemini Pro
- Returned source documents for transparency

The RAG pipeline is now **complete and reusable**.

This notebook serves as the **final experimental validation** before
consolidating the system into `main.py`.


In [1]:
"""
PROJECT: NeuralTranscript: Semantic Search & Q&A for YouTube Content
MODULE: 04_RAG_QUERY_ENGINE
-------------------------------------------------------------------------
DESCRIPTION:
This is the final stage of the pipeline. It takes a user query, retrieves
relevant context from the FAISS vector store, and uses Google Gemini to 
generate a precise, context-aware answer based on the video transcript.

AUTHOR: Engr. Inam Ullah Khan
-------------------------------------------------------------------------
"""

import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain.prompts import PromptTemplate

# --- 1. CONFIGURATION & ENV SETUP ---
load_dotenv() # Ensure your GOOGLE_API_KEY is in your .env file
INDEX_PATH = "data/faiss_index"

# --- 2. CORE FUNCTIONS ---

def load_vector_store():
    """
    Loads the FAISS index. Note: We must provide the same embedding 
    function used during indexing to 'understand' the vectors.
    """
    print("üìÇ Loading Vector Database...")
    
    # We use the same model as Notebook 03 for consistency
    # (If you used HuggingFace in 03, we use it here to load)
    from langchain_huggingface import HuggingFaceEmbeddings
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    
    vector_db = FAISS.load_local(
        INDEX_PATH, 
        embeddings, 
        allow_dangerous_deserialization=True # Required for loading local pkl files
    )
    return vector_db

def build_rag_chain(vector_db):
    """
    Sets up the RAG pipeline: Retriever + Prompt + Gemini LLM.
    """
    print("ü§ñ Initializing Google Gemini Pro...")
    
    # 1. Initialize Gemini
    llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-pro",
        temperature=0.2, # Lower temperature for factual accuracy
        top_p=0.9
    )
    
    # 2. Create a Custom Prompt (Human-Centered Design)
    template = """
    You are an AI Assistant specialized in analyzing video content. 
    Use the following pieces of retrieved context from a video transcript to answer the question. 
    If you don't know the answer based on the context, just say you don't know. 
    Keep the answer concise and professional.

    CONTEXT:
    {context}

    QUESTION: 
    {question}

    HELPFUL ANSWER:
    """
    
    QA_CHAIN_PROMPT = PromptTemplate(
        input_variables=["context", "question"],
        template=template,
    )

    # 3. Create the Chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff", # "Stuffs" all retrieved context into the prompt
        retriever=vector_db.as_retriever(search_kwargs={"k": 3}),
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
        return_source_documents=True # Critical for citations
    )
    
    return qa_chain

# --- 3. EXECUTION PIPELINE ---

if __name__ == "__main__":
    print("--- Starting NeuralTranscript Query Engine ---")
    
    # Step 1: Load the "Memory"
    db = load_vector_store()
    
    # Step 2: Initialize the "Brain"
    neural_qa = build_rag_chain(db)
    
    # Step 3: Interactive Query
    user_query = "What is Demis Hassabis's view on the potential of AI to solve scientific problems?"
    
    print(f"\n‚ùì User Query: {user_query}")
    print("‚è≥ Processing answer...\n")
    
    response = neural_qa.invoke({"query": user_query})
    
    # Step 4: Display Result
    print("‚ú® AI RESPONSE:")
    print(response["result"])
    
    print("\nüìö SOURCES USED:")
    for doc in response["source_documents"]:
        print(f"- Chunk ID: {doc.metadata.get('chunk_id')} (Source: {doc.metadata.get('source')})")

ModuleNotFoundError: No module named 'langchain.chains'

In [4]:
"""
PROJECT: NeuralTranscript: Semantic Search & Q&A for YouTube Content
MODULE: 04_RAG_QUERY_ENGINE (v2026 Fixed)
"""

import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

load_dotenv()

def build_rag_chain(vector_db):
    print("ü§ñ Initializing Gemini 1.5 Pro & Modern RAG Chain...")
    
    # 1. Initialize Gemini
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.2)
    
    # 2. Define the System Prompt (Human-Centered Instruction)
    system_prompt = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, say that you don't know. "
        "Use three sentences maximum and keep the answer concise.\n\n"
        "{context}"
    )
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("human", "{input}"),
    ])

    # 3. Create the "Stuff Documents" Chain (Handles the context)
    question_answer_chain = create_stuff_documents_chain(llm, prompt)

    # 4. Create the final Retrieval Chain
    rag_chain = create_retrieval_chain(
        vector_db.as_retriever(search_kwargs={"k": 3}), 
        question_answer_chain
    )
    
    return rag_chain

# --- EXECUTION ---
if __name__ == "__main__":
    # (Assuming load_vector_store() from previous response is above)
    db = load_vector_store()
    neural_qa = build_rag_chain(db)
    
    query = "What is Demis Hassabis's view on the potential of AI to solve scientific problems?"
    
    # In modern LangChain, we use 'input' instead of 'query'
    response = neural_qa.invoke({"input": query})
    
    print("\n‚ú® AI RESPONSE:")
    print(response["answer"]) # Note: Key is 'answer' in modern chains

ModuleNotFoundError: No module named 'langchain.chains'

In [5]:
"""
PROJECT: NeuralTranscript: Semantic Search & Q&A for YouTube Content
MODULE: 04_RAG_QUERY_ENGINE (Modern LCEL Version)
"""
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

def build_rag_chain(vector_db):
    print("ü§ñ Initializing Modern LCEL RAG Chain with Gemini...")
    
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.2)
    
    # 1. Define the Human-Centered Prompt
    template = """You are a research assistant for the NeuralTranscript project. 
    Use the transcript excerpts below to answer the user's question.
    
    Context: {context}
    Question: {question}
    
    Answer:"""
    
    prompt = ChatPromptTemplate.from_template(template)
    
    # 2. Define the LCEL Chain Logic
    # This replaces the 'create_retrieval_chain' with a transparent flow
    rag_chain = (
        {"context": vector_db.as_retriever(), "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    
    return rag_chain

# --- EXECUTION ---
if __name__ == "__main__":
    db = load_vector_store()
    neural_qa = build_rag_chain(db)
    
    # In LCEL, we simply pass the string question
    response = neural_qa.invoke("How does Demis view AI's role in science?")
    print(f"\n‚ú® AI RESPONSE:\n{response}")

NameError: name 'load_vector_store' is not defined

In [6]:
"""
PROJECT: NeuralTranscript: Semantic Search & Q&A for YouTube Content
MODULE: 04_RAG_QUERY_ENGINE
-------------------------------------------------------------------------
DESCRIPTION:
Final stage of the pipeline. It takes a user query, retrieves
relevant context from the FAISS vector store, and uses Google Gemini to 
generate a precise, context-aware answer based on the video transcript.

AUTHOR: Engr. Inam Ullah Khan
-------------------------------------------------------------------------
"""

import os
from dotenv import load_dotenv

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


# --------------------------------------------------
# 1. CONFIGURATION & ENVIRONMENT SETUP
# --------------------------------------------------

load_dotenv()  # Make sure GOOGLE_API_KEY is inside your .env file
INDEX_PATH = "data/faiss_index"


# --------------------------------------------------
# 2. CORE FUNCTIONS
# --------------------------------------------------

def load_vector_store():
    """
    Loads the FAISS index.
    IMPORTANT: The embedding model must match the one used during indexing.
    """
    print("üìÇ Loading Vector Database...")

    embeddings = HuggingFaceEmbeddings(
        model_name="all-MiniLM-L6-v2"
    )

    vector_db = FAISS.load_local(
        INDEX_PATH,
        embeddings,
        allow_dangerous_deserialization=True
    )

    return vector_db


def build_rag_chain(vector_db):
    """
    Builds modern LangChain v1 RAG pipeline using LCEL.
    """
    print("ü§ñ Initializing Google Gemini Pro...")

    # 1. Initialize Gemini LLM
    llm = ChatGoogleGenerativeAI(
        model="gemini-2.5-flash",
        temperature=0.2,
        top_p=0.9
    )

    # 2. Prompt Template (LCEL style)
    prompt = ChatPromptTemplate.from_template("""
You are an AI Assistant specialized in analyzing video content.
Use the following transcript context to answer the question.
If the answer is not contained in the context, say you don't know.
Keep the answer concise and professional.

CONTEXT:
{context}

QUESTION:
{question}

ANSWER:
""")

    # 3. Create Retriever
    retriever = vector_db.as_retriever(search_kwargs={"k": 3})

    # 4. Build RAG Chain using LCEL
    rag_chain = (
        {
            "context": retriever,
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )

    return rag_chain


# --------------------------------------------------
# 3. EXECUTION PIPELINE
# --------------------------------------------------

if __name__ == "__main__":

    print("\n--- Starting NeuralTranscript Query Engine ---\n")

    # Step 1: Load Vector Database
    db = load_vector_store()

    # Step 2: Build RAG Chain
    neural_qa = build_rag_chain(db)

    # Step 3: User Query
    user_query = "What is the main topic of this video??"

    print(f"\n‚ùì User Query:\n{user_query}")
    print("\n‚è≥ Processing answer...\n")

    # Step 4: Invoke Chain
    response = neural_qa.invoke(user_query)

    # Step 5: Display Result
    print("‚ú® AI RESPONSE:\n")
    print(response)
    print("\n--- Query Completed Successfully ---\n")



--- Starting NeuralTranscript Query Engine ---

üìÇ Loading Vector Database...
ü§ñ Initializing Google Gemini Pro...

‚ùì User Query:
What is the main topic of this video??

‚è≥ Processing answer...

‚ú® AI RESPONSE:

The main topic of this video is a conversation with Demas about solving fundamental mysteries of the universe, including consciousness, life, and gravity, and the search for deeper explanations beyond the standard model of physics, potentially through the application of intelligence and reinforcement learning.

--- Query Completed Successfully ---

