# Assignment 3: Agentic RAG System with Gemini, LangGraph, and MLflow

### Core Technologies

- **LLM**: Gemini 1.5 Pro on Google Cloud Vertex AI
- **Embeddings**: `textembedding-gecko@003` on Google Cloud Vertex AI
- **Agentic Flows**: LangGraph
- **Vector Database**: Pinecone
- **Experiment Tracking**: MLflow
- **Language**: Python 3.10+

## Step 1: Install Dependencies

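First, install the required Python packages. Run the cell below when setting up a new environment; skip it if the packages are already installed. The Pinecone client is pinned below version 3 because this notebook uses the `pinecone.init(...)` API, which later releases removed.

In [None]:
!pip install langgraph google-cloud-aiplatform "pinecone-client<3" pydantic mlflow python-dotenv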
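
Several of these libraries changed their APIs between major versions (the `pinecone.init(...)` call used later exists only in pinecone-client releases before 3.0), so it can help to confirm what is installed before continuing. The sketch below uses only the standard library; `installed_version` and `REQUIRED` are illustrative names, not part of the assignment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as they are passed to pip in the install cell.
REQUIRED = ["langgraph", "google-cloud-aiplatform", "pinecone-client", "mlflow", "python-dotenv"]

for name in REQUIRED:
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```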

## Step 2: Setup and Configuration

### A. Import Libraries
We'll import all the necessary libraries for our workflow.

In [None]:
import json
import os
import logging
from typing import List, TypedDict, Optional
from getpass import getpass

# LangGraph
from langgraph.graph import StateGraph, END

# Vertex AI (Gemini)
import vertexai
from vertexai.generative_models import GenerativeModel, Part
from vertexai.language_models import TextEmbeddingModel

# Pinecone
import pinecone

# MLflow
import mlflow

# Environment Variables
from dotenv import load_dotenv

load_dotenv()

### B. Configure Logging
A proper logging setup helps in debugging and monitoring the flow of the agent.

In [None]:
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

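### C. Initialize Vertex AI and Pinecone
Both clients read their settings from environment variables (loaded from `.env` above) and prompt for any value that is missing.

In [None]: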
# GCP and Vertex AI Initialization
GCP_PROJECT_ID = os.getenv("GCP_PROJECT_ID")
if not GCP_PROJECT_ID:
    GCP_PROJECT_ID = input("Please enter your GCP Project ID: ")
vertexai.init(project=GCP_PROJECT_ID, location="us-central1")

# Pinecone Initialization
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
if not PINECONE_API_KEY:
    PINECONE_API_KEY = getpass("Please enter your Pinecone API Key: ")
PINECONE_ENVIRONMENT = os.getenv("PINECONE_ENVIRONMENT")
if not PINECONE_ENVIRONMENT:
    PINECONE_ENVIRONMENT = input("Please enter your Pinecone Environment (e.g., 'gcp-starter'): ")
    
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENVIRONMENT
)

logging.info("Successfully connected to Vertex AI and Pinecone.")
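
The three settings above follow the same pattern: read an environment variable, then prompt if it is missing. A minimal sketch of that pattern as one helper is shown below. The name `get_setting` and the `DEMO_SETTING` variable are hypothetical, introduced only for illustration.

```python
import os
from getpass import getpass


def get_setting(name: str, prompt: str, secret: bool = False) -> str:
    """Read a setting from the environment, prompting only if it is missing."""
    value = os.getenv(name)
    if value:
        return value
    # getpass hides the typed value, which suits API keys
    ask = getpass if secret else input
    value = ask(prompt).strip()
    if not value:
        raise ValueError(f"{name} is required but was not provided")
    return value


# Placeholder value so the example runs without prompting.
os.environ.setdefault("DEMO_SETTING", "demo-value")
print(get_setting("DEMO_SETTING", "Enter DEMO_SETTING: "))
```

In the notebook, a call such as `get_setting("PINECONE_API_KEY", "Pinecone API key: ", secret=True)` would stand in for the corresponding `os.getenv` and `getpass` pair.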

## Step 3: Preprocessing & Indexing

In this step, we'll load our knowledge base, generate embeddings for each entry, and store them in a Pinecone vector index.
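
Retrieval works by comparing the query's embedding with each stored embedding. The index in this notebook is created with the cosine metric, which can be sketched in plain Python. The three-dimensional vectors and the `doc_a`/`doc_b`/`doc_c` names are made-up stand-ins for real 768-dimensional embeddings.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [2.0, 0.0, 2.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # partially aligned
}

# Rank documents from most to least similar, as a vector index does for top_k.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```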

### A. Prepare the Dataset
For this demo, we'll create the `dataset.json` file directly in the notebook.

In [None]:
kb_data = [
    {
        "doc_id": "KB001",
        "question": "What are best practices for debugging Python?",
        "answer_snippet": "When debugging Python, standard best practices include using the built-in PDB (Python Debugger), understanding tracebacks thoroughly, and strategically placing print statements to trace variable states. PDB allows you to step through code line by line.",
        "source": "python_debugging_guide.md"
    },
    {
        "doc_id": "KB002",
        "question": "How do I tune performance in a Python application?",
        "answer_snippet": "Performance tuning in Python starts with profiling. Use tools like cProfile to identify bottlenecks. Common optimization techniques include using efficient data structures, avoiding global variables, and leveraging memoization for expensive function calls.",
        "source": "performance_tuning.md"
    },
    {
        "doc_id": "KB003",
        "question": "What are Python virtual environments?",
        "answer_snippet": "A Python virtual environment is a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages. It helps manage dependencies and avoid conflicts between projects.",
        "source": "virtual_environments.md"
    },
    {
        "doc_id": "KB004",
        "question": "What are some less common debugging techniques in Python?",
        "answer_snippet": "Advanced and less common debugging techniques include using the logging module for robust, configurable output, and 'rubber duck debugging' where you explain your code line-by-line to an inanimate object to spot logical errors.",
        "source": "advanced_debugging.md"
    }
]

with open("dataset.json", "w") as f:
    json.dump(kb_data, f, indent=2)

logging.info("'dataset.json' created successfully.")
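
Before indexing, it may be worth checking that every record has the fields the later steps read and that `doc_id` values are unique, since upserting two vectors with the same ID keeps only the last one. The sketch below is one possible check; `validate_records` and `REQUIRED_FIELDS` are hypothetical names, and the sample records are invented.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("doc_id", "question", "answer_snippet", "source")


def validate_records(records: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems found; an empty list means the data looks usable."""
    problems = []
    seen_ids = set()
    for position, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not str(record.get(field, "")).strip():
                problems.append(f"record {position}: missing or empty '{field}'")
        doc_id = record.get("doc_id")
        if doc_id is not None and doc_id in seen_ids:
            problems.append(f"record {position}: duplicate doc_id '{doc_id}'")
        seen_ids.add(doc_id)
    return problems


sample = [
    {"doc_id": "KB001", "question": "q1", "answer_snippet": "a1", "source": "s1.md"},
    {"doc_id": "KB001", "question": "q2", "answer_snippet": "", "source": "s2.md"},
]
for problem in validate_records(sample):
    print(problem)
```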

### B. Define Helper Functions for Embeddings and LLM Calls

In [None]:
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
llm_model = GenerativeModel("gemini-1.5-pro-preview-0409")

def get_embedding(text: str) -> List[float]:
    """Generates embeddings for a given text using Vertex AI."""
    # get_embeddings takes a list of texts; we send a single text and read its vector
    embeddings = embedding_model.get_embeddings([text])
    return embeddings[0].values

def call_gemini_llm(prompt: str) -> str:
    """Calls the Gemini LLM with the given prompt at temperature 0.0 for repeatable output."""
    response = llm_model.generate_content(
        [Part.from_text(prompt)],
        generation_config={
            "temperature": 0.0, # for consistency
            "max_output_tokens": 1024
        }
    )
    return response.text.strip()
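
Calls to hosted models can fail transiently (rate limits, network errors). A generic retry wrapper with exponential backoff is one way to make the helpers above more tolerant. The sketch below is an assumption about how this could be done, not something the assignment prescribes: `with_retries` is a hypothetical name, and catching every exception is a simplification that real code would narrow.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn, retrying on any exception with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try
            time.sleep(base_delay * 2 ** (attempt - 1))


# Simulated flaky call: fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = with_retries(flaky, attempts=3, base_delay=0.01)
print(result, calls["count"])
```

In the notebook this could wrap an LLM call as `with_retries(lambda: call_gemini_llm(prompt))`.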

### C. Index the Data in Pinecone

In [None]:
PINECONE_INDEX_NAME = "agentic-rag-index"

def preprocess_and_index(json_path: str):
    """Loads, preprocesses, and indexes the knowledge base in Pinecone."""
    logging.info("--- Starting Preprocessing and Indexing ---")

    # Create index if it doesn't exist
    if PINECONE_INDEX_NAME not in pinecone.list_indexes():
        # The embedding dimension for textembedding-gecko is 768
        pinecone.create_index(name=PINECONE_INDEX_NAME, dimension=768, metric='cosine')
        logging.info(f"Created Pinecone index: {PINECONE_INDEX_NAME}")
    
    index = pinecone.Index(PINECONE_INDEX_NAME)
    
    with open(json_path, 'r') as f:
        knowledge_base = json.load(f)
    
    logging.info(f"Processing {len(knowledge_base)} documents...")
    vectors_to_upsert = []
    for item in knowledge_base:
        embedding = get_embedding(item['answer_snippet'])
        vector = {
            'id': item['doc_id'],
            'values': embedding,
            'metadata': {
                'text': item['answer_snippet'],
                'source': item['source']
            }
        }
        vectors_to_upsert.append(vector)
        
    # Upsert in batches (Pinecone recommends batches of 100 or less)
    batch_size = 100
    for start in range(0, len(vectors_to_upsert), batch_size):
        index.upsert(vectors=vectors_to_upsert[start:start + batch_size])
    if vectors_to_upsert:
        logging.info(f"Successfully upserted {len(vectors_to_upsert)} vectors into Pinecone.")
    
    logging.info("--- Preprocessing and Indexing Complete ---")
    return index

# Run the indexing process
pinecone_index = preprocess_and_index("dataset.json")
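
The indexing function mentions upserting in batches of 100 or fewer. With four documents the batch size never matters, but for a larger knowledge base a reusable batching helper might look like the sketch below. `chunked` is a hypothetical name and `fake_vectors` is placeholder data.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


fake_vectors = [{"id": f"KB{i:03d}"} for i in range(250)]
batch_sizes = [len(batch) for batch in chunked(fake_vectors, 100)]
print(batch_sizes)
```

Each yielded batch could then be passed to `index.upsert(vectors=batch)`.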

## Step 4: Define the LangGraph Workflow

Now we'll define the state, nodes, and edges for our agentic graph.

### A. Define the Graph State
The state is a dictionary that gets passed between nodes, carrying all the necessary information.

In [None]:
class GraphState(TypedDict):
    """
    Represents the state of our graph.
    
    Attributes:
        question: The user's question.
        snippets: A list of retrieved document snippets.
        answer: The LLM-generated answer.
        critique: The critique of the generated answer.
        refine_count: The number of refinement loops.
    """
    question: str
    snippets: List[dict]
    answer: str
    critique: str
    refine_count: int
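
LangGraph nodes may also return only the keys they changed, and the framework merges that partial update into the state. The sketch below imitates that merge with a plain dictionary so it runs without LangGraph installed. `DemoState`, `fake_retrieve`, and `apply_update` are illustrative names, and the merge is a simplification of the library's behavior (by default, each returned key overwrites the previous value).

```python
from typing import List, TypedDict


class DemoState(TypedDict, total=False):
    question: str
    snippets: List[dict]
    refine_count: int


def fake_retrieve(state: DemoState) -> dict:
    """A node that returns only the key it produces."""
    return {"snippets": [{"doc_id": "KB001"}]}


def apply_update(state: DemoState, update: dict) -> dict:
    """Merge a partial update into a copy of the state; returned keys overwrite."""
    merged = dict(state)
    merged.update(update)
    return merged


state: DemoState = {"question": "How do I debug Python?", "refine_count": 0}
new_state = apply_update(state, fake_retrieve(state))
print(new_state)
```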

### B. Define the Graph Nodes
Each node is a function that performs a specific action (retrieve, generate, critique, refine).

In [None]:
def retrieve_kb(state: GraphState) -> GraphState:
    """Retrieves relevant snippets from the knowledge base."""
    logging.info("--- Node: retrieve_kb ---")
    question = state["question"]
    query_embedding = get_embedding(question)
    
    # Retrieve only the top 2 snippets at first, leaving room for the refinement step to add context
    retrieved = pinecone_index.query(vector=query_embedding, top_k=2, include_metadata=True)
    
    snippets = []
    for match in retrieved['matches']:
        snippets.append({
            'doc_id': match['id'], 
            'answer_snippet': match['metadata']['text'],
            'source': match['metadata']['source']
        })
    
    state["snippets"] = snippets
    mlflow.log_text(json.dumps(snippets, indent=2), "1_initial_snippets.json")
    return state

def generate_answer(state: GraphState) -> GraphState:
    """Generates an answer using the retrieved snippets."""
    logging.info("--- Node: generate_answer ---")
    question = state["question"]
    snippets = state["snippets"]
    
    context = "\n".join([f"{s['doc_id']}: {s['answer_snippet']}" for s in snippets])
    prompt = f"""Context snippets:
{context}

Generate a comprehensive answer for the question below, citing the sources using their doc_id (e.g., [KB001]).
Question: {question}
"""
    
    answer = call_gemini_llm(prompt)
    logging.info(f"Generated Answer: {answer}")
    state["answer"] = answer
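    # Prefix 2 marks the first draft, prefix 5 the answer regenerated after refinement
    artifact_prefix = 2 if state["refine_count"] == 0 else 5
    mlflow.log_text(answer, f"{artifact_prefix}_generated_answer.txt")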
    return state

def critique_answer(state: GraphState) -> GraphState:
    """Critiques the generated answer for completeness."""
    logging.info("--- Node: critique_answer ---")
    question = state["question"]
    snippets = state["snippets"]
    answer = state["answer"]

    context = "\n".join([f"{s['doc_id']}: {s['answer_snippet']}" for s in snippets])
    prompt = f"""You are a meticulous quality checker.
Question: {question}
Context Snippets:
{context}

Answer: {answer}

Does the 'Answer' fully address the 'Question' based *only* on the 'Context Snippets'?
If it is complete, respond with only the word "COMPLETE".
If it is not complete, respond with "REFINE:" followed by a concise list of missing keywords or concepts.
"""
    critique = call_gemini_llm(prompt)
    logging.info(f"Critique: {critique}")
    state["critique"] = critique
    mlflow.log_text(critique, "3_critique.txt")
    return state

def refine_answer(state: GraphState) -> GraphState:
    """Refines the answer by retrieving more context and regenerating."""
    logging.info("--- Node: refine_answer ---")
    question = state["question"]
    critique = state["critique"]
    
    missing_info = critique.replace("REFINE:", "").strip()
    refinement_query = f"{question} {missing_info}"
    query_embedding = get_embedding(refinement_query)
    
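    # Retrieve extra candidates and keep the first one not already in the context
    seen_ids = {s['doc_id'] for s in state["snippets"]}
    retrieved = pinecone_index.query(
        vector=query_embedding, top_k=len(seen_ids) + 1, include_metadata=True
    )
    new_snippet = []
    for match in retrieved['matches']:
        if match['id'] not in seen_ids:
            new_snippet.append({
                'doc_id': match['id'], 
                'answer_snippet': match['metadata']['text'],
                'source': match['metadata']['source']
            })
            break
    
    updated_snippets = state["snippets"] + new_snippet
    state["snippets"] = updated_snippets
    # Record the refinement in the state this node returns, so the count persists
    # and generate_answer logs the regenerated answer under its own artifact name
    state["refine_count"] = state["refine_count"] + 1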
    mlflow.log_text(json.dumps(updated_snippets, indent=2), "4_refined_snippets.json")
    
    # Regenerate the answer with the richer context
    logging.info("Regenerating answer with additional context...")
    return generate_answer(state)
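
A refinement query can return a document that is already in the context, which would repeat it in the prompt. One way to keep the context free of repeats, however the extra snippets are retrieved, is to merge by `doc_id`. The helper below is a sketch with a hypothetical name, `merge_snippets`, and invented sample data.

```python
from typing import Dict, List


def merge_snippets(existing: List[Dict[str, str]], new: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Append new snippets to existing ones, skipping doc_ids already present."""
    seen = {snippet["doc_id"] for snippet in existing}
    merged = list(existing)
    for snippet in new:
        if snippet["doc_id"] not in seen:
            merged.append(snippet)
            seen.add(snippet["doc_id"])
    return merged


current = [{"doc_id": "KB001"}, {"doc_id": "KB002"}]
candidates = [{"doc_id": "KB001"}, {"doc_id": "KB004"}]
merged_ids = [s["doc_id"] for s in merge_snippets(current, candidates)]
print(merged_ids)
```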

### C. Define Conditional Edges
This logic decides which node to go to next based on the critique.

In [None]:
def should_refine(state: GraphState) -> str:
    """Determines whether to refine the answer or end the process."""
    logging.info("--- Decision: should_refine ---")
    critique = state["critique"]
    # Allow at most one refinement pass
    if state["refine_count"] > 0:
        logging.info("Decision: Max refinements reached. END")
        return "end"
        
    # Routing functions only choose the next node; state updates belong in nodes
    if critique.strip().upper().startswith("REFINE"):
        logging.info("Decision: REFINE answer.")
        return "refine_answer"
    else:
        logging.info("Decision: Answer is COMPLETE. END")
        return "end"
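
The routing decision depends on the model following the `COMPLETE` / `REFINE:` format requested in the critique prompt. A slightly more tolerant parser, which also extracts the missing terms, could look like the sketch below. `parse_critique` is a hypothetical helper, and the accepted separators (commas, semicolons, newlines, leading bullets) are assumptions about how the model might format its list.

```python
import re
from typing import List, Tuple


def parse_critique(critique: str) -> Tuple[bool, List[str]]:
    """Return (needs_refinement, missing_terms) parsed from the critique text."""
    text = critique.strip()
    if not text.upper().startswith("REFINE"):
        return False, []
    # Drop the "REFINE" marker and any colon or whitespace that follows it
    remainder = text[len("REFINE"):].lstrip(": \n")
    # Split on commas, semicolons, or newlines, then trim bullets and spaces
    terms = [term.strip(" -*\t") for term in re.split(r"[,;\n]", remainder)]
    return True, [term for term in terms if term]


print(parse_critique("COMPLETE"))
print(parse_critique("REFINE: logging module, rubber duck debugging"))
```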

### D. Assemble the Graph

In [None]:
workflow = StateGraph(GraphState)

# Add nodes
workflow.add_node("retrieve_kb", retrieve_kb)
workflow.add_node("generate_answer", generate_answer)
workflow.add_node("critique_answer", critique_answer)
workflow.add_node("refine_answer", refine_answer)

# Set entry point
workflow.set_entry_point("retrieve_kb")

# Add edges
workflow.add_edge("retrieve_kb", "generate_answer")
workflow.add_edge("generate_answer", "critique_answer")
workflow.add_conditional_edges(
    "critique_answer",
    should_refine,
    {
        "refine_answer": "refine_answer",
        "end": END
    }
)
workflow.add_edge("refine_answer", END)

# Compile the graph
agentic_rag_app = workflow.compile()

logging.info("LangGraph workflow compiled successfully.")
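
To check the control flow without cloud credentials, the same sequence of edges (retrieve, generate, critique, then either refine or stop) can be walked by hand with stub nodes. The runner below is a hand-written stand-in for illustration only. It is not how LangGraph executes a graph, and `run_flow`, `stub_nodes`, and `stub_route` are hypothetical names.

```python
from typing import Callable, Dict, List


def run_flow(
    nodes: Dict[str, Callable[[dict], dict]],
    route: Callable[[dict], str],
    state: dict,
) -> List[str]:
    """Run retrieve -> generate -> critique, then refine or stop; return visited nodes."""
    visited = []
    for name in ("retrieve_kb", "generate_answer", "critique_answer"):
        state.update(nodes[name](state))
        visited.append(name)
    if route(state) == "refine_answer":
        state.update(nodes["refine_answer"](state))
        visited.append("refine_answer")
    return visited


# Stub nodes return canned values instead of calling Pinecone or Gemini.
stub_nodes = {
    "retrieve_kb": lambda s: {"snippets": ["KB001"]},
    "generate_answer": lambda s: {"answer": "draft"},
    "critique_answer": lambda s: {"critique": "REFINE: logging"},
    "refine_answer": lambda s: {
        "snippets": s["snippets"] + ["KB004"],
        "answer": "final",
        "refine_count": s["refine_count"] + 1,
    },
}


def stub_route(s: dict) -> str:
    """Refine once if the critique asks for it; otherwise stop."""
    if s["refine_count"] == 0 and s["critique"].startswith("REFINE"):
        return "refine_answer"
    return "end"


demo_state = {"question": "demo", "refine_count": 0}
path = run_flow(stub_nodes, stub_route, demo_state)
print(path)
```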

## Step 5: Execute the Workflow

Finally, we run the agent with a sample question. We will wrap the execution inside an MLflow run to log all the artifacts.

In [None]:
with mlflow.start_run(run_name="Agentic RAG Run") as run:
    logging.info(f"\n{'='*50}")
    logging.info(f" Kicking off Agentic RAG Workflow (MLflow Run ID: {run.info.run_id}) ".center(50, '='))
    logging.info(f"{'='*50}\n")

    # This question is designed to miss the 'advanced' part on the first pass
    user_question = "What are best practices for debugging Python, including less common techniques?"
    mlflow.log_param("user_question", user_question)
    
    initial_state = {
        "question": user_question,
        "refine_count": 0
    }
    
    final_state = agentic_rag_app.invoke(initial_state)
    
    logging.info(f"\n{'='*50}")
    logging.info(" Workflow Complete ".center(50, '='))
    logging.info(f"{'='*50}\n")
    print(f"Initial Question: {final_state['question']}")
    print(f"\nFinal Answer:\n{final_state['answer']}")
    print("\nCited Sources:")
    for snippet in final_state['snippets']:
        print(f"- [{snippet['doc_id']}] {snippet['source']}")
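
The cell above lists every retrieved snippet as a cited source, whether or not the answer actually cites it. Since the generation prompt asks for citations in the form `[KB001]`, a quick check can compare the IDs cited in the answer with the IDs that were retrieved. This is a sketch under that formatting assumption. `cited_doc_ids` and `check_citations` are hypothetical helpers, the sample answer is invented, and grouped citations such as `[KB001, KB004]` would not be matched.

```python
import re
from typing import Dict, List, Set


def cited_doc_ids(answer: str) -> Set[str]:
    """Collect doc_ids that appear in the answer as [KB###]."""
    return set(re.findall(r"\[(KB\d+)\]", answer))


def check_citations(answer: str, snippets: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Report cited IDs that were never retrieved and retrieved IDs never cited."""
    retrieved = {snippet["doc_id"] for snippet in snippets}
    cited = cited_doc_ids(answer)
    return {"unsupported": cited - retrieved, "unused": retrieved - cited}


sample_answer = "Use PDB [KB001] and the logging module [KB004]. See also [KB009]."
sample_snippets = [{"doc_id": "KB001"}, {"doc_id": "KB002"}, {"doc_id": "KB004"}]
report = check_citations(sample_answer, sample_snippets)
print(report)
```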