<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/GEMINI_RAG2ARAG_DEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
import time
import json
import os
import re # Import regex for better keyword extraction
import google.generativeai as genai

# 1. Dummy knowledge base for demonstration purposes
# In a real application, this would be a vector database or external API
knowledge_base = [
    {
        'id': 'doc1',
        'title': 'Introduction to AI',
        'content': 'Artificial Intelligence (AI) is a broad field of computer science that enables machines to perform tasks typically requiring human intelligence. This includes learning, problem-solving, perception, and language understanding. Early AI research focused on symbolic AI and expert systems.'
    },
    {
        'id': 'doc2',
        'title': 'AI in Healthcare',
        'content': 'AI is transforming healthcare by assisting with diagnostics, drug discovery, personalized treatment plans, and administrative tasks. Machine learning algorithms can analyze vast amounts of patient data to identify patterns and predict disease outbreaks. Benefits include improved efficiency and accuracy.'
    },
    {
        'id': 'doc3',
        'title': 'Quantum Computing Basics',
        'content': 'Quantum computing is a new type of computing that uses the principles of quantum mechanics, such as superposition and entanglement, to solve complex problems that are intractable for classical computers. It has the potential to revolutionize fields like cryptography, materials science, and drug development.'
    },
    {
        'id': 'doc4',
        'title': 'Applications of Quantum Computing',
        'content': 'Key applications of quantum computing include breaking modern encryption, designing new materials with specific properties, optimizing complex logistical problems, and simulating molecular interactions for drug discovery. While still in its early stages, quantum computing promises significant advancements.'
    },
    {
        'id': 'doc5',
        'title': 'History of AI',
        'content': 'The history of AI dates back to the 1950s with pioneers like Alan Turing. The Dartmouth Workshop in 1956 is often considered the birth of AI as a field. Early periods saw excitement, followed by "AI winters." Recent advancements in deep learning and computational power have led to a resurgence.'
    },
    {
        'id': 'doc6',
        'title': 'Societal Impact of AI',
        'content': 'The impact of AI on society is profound, affecting employment, ethics, privacy, and governance. AI can automate tasks, create new industries, and enhance human capabilities. However, concerns exist regarding job displacement, algorithmic bias, and the need for robust ethical guidelines and regulations.'
    },
    {
        'id': 'doc7',
        'title': 'Benefits of AI',
        'content': 'The benefits of AI are numerous, including increased efficiency, automation of repetitive tasks, enhanced decision-making through data analysis, improved accuracy in various fields (like diagnostics), and the ability to solve problems previously thought impossible. AI can also personalize experiences and drive innovation.'
    }
]

# 2. Configuration for Agent
class AgentConfig:
    # Gemini model name passed to genai.GenerativeModel. If it returns an API error
    # (e.g. model not found), check which models your API key can access and
    # substitute an available one, such as 'gemini-2.0-flash'.
    LLM_MODEL_NAME: str = "gemini-2.5-flash"
    MAX_AGENT_RETRIES: int = 2 # Total retrieve-and-answer attempts, including the first

# 3. Google Colab / Gemini API Imports and Configuration
GOOGLE_API_KEY = None
try:
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get('GEMINI')
    print("Google Generative AI configured successfully using Colab Secrets.")
except Exception:
    # ImportError outside Colab; inside Colab, userdata.get raises its own exception
    # types (not KeyError) when the secret is missing or notebook access is denied.
    print("Not running in Google Colab or 'GEMINI' secret not found. Attempting to get 'GEMINI' environment variable.")
    GOOGLE_API_KEY = os.getenv('GEMINI')

# Initialize Gemini API
if GOOGLE_API_KEY:
    genai.configure(api_key=GOOGLE_API_KEY)
    print(f"Gemini API configured with model: {AgentConfig.LLM_MODEL_NAME}")
else:
    print("Warning: GOOGLE_API_KEY not found. LLM calls will not work.")

def retrieve_context(query, num_results=2):
    """
    Retrieves relevant documents from the knowledge base by keyword matching.
    Keywords are extracted with a regex tokenizer, and each document is scored by
    keyword frequency (title matches count double) plus exact-phrase bonuses.

    In a production RAG system, this function would typically involve:
    1.  **Embedding Generation:** Converting the query into a vector embedding.
    2.  **Vector Database Search:** Querying a vector database (e.g., Pinecone, Weaviate, Milvus, ChromaDB)
        to find documents with similar embeddings.
    3.  **Re-ranking:** Using a re-ranking model (e.g., cross-encoder) to further refine the relevance
        of the retrieved documents.

    Args:
        query (str): The user query or sub-query.
        num_results (int): The maximum number of results to return.

    Returns:
        list: A list of relevant documents (dictionaries), sorted by relevance score.
    """
    # Extract keywords using regex to handle punctuation and split words
    query_keywords = set(re.findall(r'\b\w+\b', query.lower()))
    # Filter out stop words and any word of two characters or fewer.
    # This is a very basic stop word list; a real system would use a more comprehensive one.
    # Note that the length filter also discards short domain terms such as "ai".
    stop_words = {'a', 'an', 'the', 'is', 'are', 'and', 'or', 'of', 'in', 'to', 'what', 'how', 'tell', 'me', 'about', 'its'}
    query_keywords = {word for word in query_keywords if word not in stop_words and len(word) > 2}

    scored_docs = []

    for doc in knowledge_base:
        score = 0
        doc_content_lower = doc['content'].lower()
        doc_title_lower = doc['title'].lower()

        # Score based on individual keyword presence and frequency
        for keyword in query_keywords:
            # Using re.findall for more accurate word counting, avoiding partial matches within words
            score += len(re.findall(r'\b' + re.escape(keyword) + r'\b', doc_content_lower)) # Count occurrences for content
            score += len(re.findall(r'\b' + re.escape(keyword) + r'\b', doc_title_lower)) * 2 # Title matches weighted higher

        # Add bonus for exact phrase match in content or title
        if query.lower() in doc_content_lower:
            score += 5
        if query.lower() in doc_title_lower:
            score += 10

        if score > 0: # Only consider documents that have at least one match
            scored_docs.append({'doc': doc, 'score': score})

    # Sort by score in descending order and return the top N documents
    # We return only the 'doc' part, discarding the temporary score wrapper
    return [item['doc'] for item in sorted(scored_docs, key=lambda x: x['score'], reverse=True)[:num_results]]
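
# Hypothetical sketch (not called by the demo): the docstring above says a production
# retriever would embed the query and search a vector index. As a dependency-free
# stand-in, this ranks documents by cosine similarity between bag-of-words
# term-frequency vectors. A real system would use learned embeddings from an
# embedding model instead of raw word counts. `docs` is assumed to have the same
# 'title'/'content' shape as `knowledge_base`.
import math
import re
from collections import Counter

def _term_vector(text):
    """Term-frequency vector of lowercase word tokens (stand-in for an embedding)."""
    return Counter(re.findall(r'\b\w+\b', text.lower()))

def cosine_retrieve(query, docs, num_results=2):
    """Return up to `num_results` docs with positive cosine similarity to `query`."""
    q_vec = _term_vector(query)
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    scored = []
    for doc in docs:
        d_vec = _term_vector(doc['title'] + ' ' + doc['content'])
        d_norm = math.sqrt(sum(v * v for v in d_vec.values()))
        if q_norm == 0 or d_norm == 0:
            continue
        # Counter returns 0 for terms the document lacks, so this is a sparse dot product.
        dot = sum(q_vec[t] * d_vec[t] for t in q_vec)
        similarity = dot / (q_norm * d_norm)
        if similarity > 0:
            scored.append((similarity, doc))
    # Sort on the similarity only, so dicts are never compared on ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]

# Usage (hypothetical): cosine_retrieve("applications of quantum computing", knowledge_base, 2)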

def generate_content_with_llm(prompt):
    """
    Calls the Gemini API to generate content.

    Args:
        prompt (str): The prompt to send to the LLM.

    Returns:
        str: The generated text from the LLM.
    """
    if not GOOGLE_API_KEY:
        return "Error: Gemini API key not configured. Cannot call LLM."

    print(f"\n--- Calling LLM with Prompt ---\n{prompt}\n-------------------------------\n")

    try:
        model = genai.GenerativeModel(AgentConfig.LLM_MODEL_NAME)
        response = model.generate_content(prompt)
        # Access the text from the response.
        # Check if candidates and parts exist before accessing.
        if response.candidates and response.candidates[0].content.parts:
            return response.candidates[0].content.parts[0].text
        else:
            # Log the raw response object (including prompt_feedback) for debugging.
            # Don't access response.text here: it raises ValueError when there are no parts.
            print(f"LLM response did not contain expected text content: {response}")
            return "Error: LLM response was empty or malformed."
    except genai.types.BlockedPromptException as e:
        # Specific error for content blocking
        print(f"Error calling Gemini API: Prompt blocked due to safety concerns or policy. Details: {e}")
        return f"Error: LLM response blocked due to safety concerns or policy. Details: {e}"
    except Exception as e:
        print(f"Error calling Gemini API: {e}")
        return f"Error: Could not generate content from LLM. Details: {e}"


def run_traditional_rag(query):
    """
    Implements Traditional RAG.
    This approach is linear: retrieve context once, then generate a response.
    """
    print("\n--- Running Traditional RAG ---")
    print(f"User Query: {query}")

    # Step 1: Retrieve relevant context
    # In a real system, this would be a single, optimized retrieval call.
    retrieved_docs = retrieve_context(query, 3) # Get up to 3 relevant docs
    context = "\n\n---\n\n".join([f"Title: {doc['title']}\nContent: {doc['content']}" for doc in retrieved_docs])

    print("\nTraditional RAG - Retrieved Context:")
    print(context if context else 'No highly relevant context found in knowledge base.')

    # Step 2: Formulate prompt for LLM
    # Prompt engineering is crucial here to guide the LLM's response based on the context.
    prompt = f"Based on the following context, answer the user's question. If the answer is not in the context, state that you don't have enough information.\n\nContext:\n{context}\n\nQuestion: {query}\n\nAnswer: (Provide a concise, direct answer based ONLY on the provided context.)"

    # Step 3: Generate response using LLM
    response = generate_content_with_llm(prompt)

    print("\nTraditional RAG - Generated Response:")
    print(response)
    print("-----------------------------------\n")
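
# Hypothetical sketch (not called by the demo): run_agentic_rag below decides whether
# and how to split a query with hard-coded keyword rules, and its comments note that a
# more advanced agent could use another LLM call instead. This assumes an injectable
# `llm` callable (prompt -> text), such as generate_content_with_llm, and a reply
# format chosen here for illustration: a JSON array of sub-query strings. It falls
# back to the original query if the reply cannot be parsed.
import json
import re

def decompose_query_with_llm(query, llm, max_sub_queries=3):
    prompt = (
        f"Split the question below into at most {max_sub_queries} "
        "self-contained search queries. Reply with a JSON array of strings only.\n\n"
        f"Question: {query}"
    )
    reply = llm(prompt)
    # Models sometimes wrap JSON in prose; take the span from the first '[' to the last ']'.
    match = re.search(r'\[.*\]', reply, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            sub_queries = [s.strip() for s in parsed if isinstance(s, str) and s.strip()]
            if sub_queries:
                return sub_queries[:max_sub_queries]
    return [query]  # Fallback: treat the query as a single retrieval unit

# Usage (hypothetical): sub_queries = decompose_query_with_llm(query, generate_content_with_llm)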

def run_agentic_rag(query):
    """
    Implements Agentic RAG (simulated).
    This approach simulates an AI agent that can reason, plan, and execute multiple steps
    to refine context retrieval and generate a better response for complex queries.
    Includes a basic reflection loop with query refinement.
    """
    print("\n--- Running Agentic RAG ---")
    print(f"User Query: {query}")

    full_context = []
    process_steps = []
    final_response = "Initial response not generated."
    current_query_for_retrieval = query # The query used for retrieval, might be refined by the agent

    for attempt in range(AgentConfig.MAX_AGENT_RETRIES):
        process_steps.append(f"Agent: Attempt {attempt + 1}/{AgentConfig.MAX_AGENT_RETRIES} - Analyzing query intent...")
        print(process_steps[-1])
        time.sleep(0.5) # Simulate agent thinking time

        lower_case_query_original = query.lower() # Use original query for initial complexity check
        is_complex_query = False

        # Agent's simple reasoning logic: determine if query requires multi-hop or specific focus
        # In a more advanced agent, this could involve:
        # - Another LLM call to classify query type (e.g., factual, comparative, multi-hop)
        # - Using a tool to check if the query relates to a specific domain or requires external API calls
        # Match "and" as a whole word so words such as "understand" don't trigger it.
        if 'history' in lower_case_query_original or 'impact' in lower_case_query_original or \
           ('explain' in lower_case_query_original and (re.search(r'\band\b', lower_case_query_original) or 'applications' in lower_case_query_original)):
            is_complex_query = True

        if is_complex_query and attempt == 0: # Only break down complex queries on the first attempt
            process_steps.append("Agent: Detected complex query. Breaking down into sub-queries.")
            print(process_steps[-1])
            time.sleep(0.5)

            sub_queries = []
            # Agent's planning: generating sub-queries based on initial analysis
            if 'history' in lower_case_query_original:
                sub_queries.append('history of AI')
            if 'impact' in lower_case_query_original:
                sub_queries.append('societal impact of AI')
            if 'quantum computing' in lower_case_query_original and 'applications' in lower_case_query_original:
                sub_queries.append('quantum computing basics')
                sub_queries.append('applications of quantum computing')

            if not sub_queries: # Fallback if no specific sub-queries detected for a complex query
                sub_queries.append(current_query_for_retrieval) # Treat as a single complex query for retrieval

            process_steps.append(f"Agent: Executing multiple retrievals for sub-queries: {', '.join(sub_queries)}")
            print(process_steps[-1])
            time.sleep(1.0) # Simulate parallel/sequential retrieval delay

            # Agent's execution: performing multiple retrieval steps
            retrieved_for_this_attempt = []
            for sub_q in sub_queries:
                sub_retrieved_docs = retrieve_context(sub_q, 2) # Get 2 docs per sub-query
                retrieved_for_this_attempt.extend(sub_retrieved_docs)
                process_steps.append(f"Agent: Retrieved context for \"{sub_q}\".")
                print(process_steps[-1])
                time.sleep(0.5)
            full_context.extend(retrieved_for_this_attempt) # Add to overall context

        else: # Simple query or subsequent retry for complex query
            process_steps.append("Agent: Performing direct retrieval.")
            print(process_steps[-1])
            time.sleep(0.5)
            retrieved_for_this_attempt = retrieve_context(current_query_for_retrieval, 3)
            full_context.extend(retrieved_for_this_attempt)

        # Agent's reflection/synthesis: removing duplicates and consolidating context
        # This step is crucial for managing context across multiple retrieval attempts
        unique_context_ids = set()
        unique_context = []
        for doc in full_context:
            if doc['id'] not in unique_context_ids:
                unique_context.append(doc)
                unique_context_ids.add(doc['id'])

        formatted_context = "\n\n---\n\n".join([f"Title: {doc['title']}\nContent: {doc['content']}" for doc in unique_context])

        print("\nAgentic RAG - Retrieved Context (Aggregated):")
        print(formatted_context if formatted_context else 'No highly relevant context found for this attempt.')

        process_steps.append("Agent: Synthesizing retrieved information and preparing prompt.")
        print(process_steps[-1])
        time.sleep(0.5)

        # Agent's final generation prompt, potentially more detailed for complex queries
        prompt = f"Based on the following comprehensive context, answer the user's question thoroughly. If the answer is not fully in the context, state that you don't have enough information.\n\nContext:\n{formatted_context}\n\nQuestion: {query}\n\nAnswer: (Provide a detailed, comprehensive answer based on the provided context. If the context is insufficient, explain what's missing.)"

        current_response = generate_content_with_llm(prompt)

        # Agent's Reflection Loop:
        # This is a critical part of Agentic RAG. A real reflection step would involve:
        # 1.  **LLM-based Evaluation:** Another LLM call where the agent acts as a "critic."
        #     It would evaluate `current_response` against `query` and `formatted_context`
        #     for relevance, completeness, factual accuracy, and coherence.
        #     Prompt example: "Given the question '{query}', the retrieved context '{formatted_context}',
        #     and the generated answer '{current_response}', is the answer satisfactory? If not,
        #     suggest how to improve the answer or what additional information is needed."
        # 2.  **Heuristics:** Basic checks for keywords like "I don't have enough information" or "cannot answer."
        # 3.  **Action Decision:** Based on the evaluation, the agent decides on the next action:
        #     - If satisfactory: Finalize response and exit loop.
        #     - If unsatisfactory:
        #         - **Query Refinement:** Use an LLM to rephrase or expand `current_query_for_retrieval`.
        #         - **Tool Use:** Decide to use an external tool (e.g., web search, database query).
        #         - **Context Expansion:** Try to retrieve more documents or from different sources.
        #         - **Clarification:** Ask the user for more information.
        # Normalize curly apostrophes so "don’t" also matches the heuristic.
        response_check = current_response.lower().replace('\u2019', "'")
        if "don't have enough information" in response_check or "cannot answer" in response_check:
            process_steps.append("Agent: Reflection - Initial response indicates insufficient information. Attempting to refine retrieval or query.")
            print(process_steps[-1])
            if attempt < AgentConfig.MAX_AGENT_RETRIES - 1:
                # Simulate query refinement for the next attempt
                # In a real agent, this would be an LLM-driven query rewriting tool
                current_query_for_retrieval = f"more detailed context on {query}" # Example of simple query refinement
                process_steps.append(f"Agent: Refined query for next retrieval attempt: '{current_query_for_retrieval}'")
                print(process_steps[-1])
            else:
                # If still no good after max retries, finalize with current best response
                final_response = current_response
                process_steps.append("Agent: Reflection - Max retries reached. Finalizing response with available information.")
                print(process_steps[-1])
                break
        else:
            final_response = current_response
            process_steps.append("Agent: Reflection - Response satisfactory. Task completed.")
            print(process_steps[-1])
            break # Exit loop if response is good

    print("\nAgentic RAG - Generated Response:")
    print(final_response)
    print("-----------------------------------\n")
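
# Hypothetical sketch (not called by the demo): the reflection comments in
# run_agentic_rag describe an LLM "critic" plus LLM-driven query rewriting. This
# assumes an injectable `llm` callable (prompt -> text), such as
# generate_content_with_llm, and a reply protocol invented for illustration:
# the critic answers "SATISFACTORY" or "INSUFFICIENT: <better search query>".

def looks_insufficient(answer):
    """Heuristic check: refusal phrases (straight or curly apostrophe) or an LLM error string."""
    text = answer.lower().replace('\u2019', "'")
    return (text.startswith('error:')
            or "don't have enough information" in text
            or 'cannot answer' in text)

def critique_answer(query, context, answer, llm):
    """Return (is_satisfactory, refined_query); refined_query is None when satisfactory."""
    if not looks_insufficient(answer):
        # LLM-based evaluation: ask the model to act as a critic.
        verdict = llm(
            f"Question: {query}\n\nContext:\n{context}\n\nAnswer:\n{answer}\n\n"
            "Is the answer relevant, complete, and supported by the context? "
            "Reply 'SATISFACTORY', or 'INSUFFICIENT: <better search query>'."
        ).strip()
        if verdict.upper().startswith('SATISFACTORY'):
            return True, None
        if verdict.upper().startswith('INSUFFICIENT:'):
            suggestion = verdict.split(':', 1)[1].strip()
            if suggestion:
                return False, suggestion
    # The heuristic or the critic rejected the answer (or the verdict was
    # unparseable): ask the model to rewrite the retrieval query.
    refined = llm(
        f"Rewrite this search query so it retrieves more specific context: {query}\n"
        "Reply with the rewritten query only."
    ).strip()
    return False, refined or query

# Usage (hypothetical), inside the retry loop:
# ok, refined = critique_answer(query, formatted_context, current_response, generate_content_with_llm)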

# --- Demonstration ---
if __name__ == "__main__":
    test_queries = [
        "What are the benefits of AI in healthcare?",
        "Explain quantum computing and its applications.",
        "Tell me about the history of AI and its impact on society."
    ]

    for i, query in enumerate(test_queries):
        print("\n\n=========================================")
        print(f"DEMO SCENARIO {i+1}: Query = \"{query}\"")
        print("=========================================\n")

        run_traditional_rag(query)
        print("\n" + "="*70 + "\n") # Separator for clarity
        run_agentic_rag(query)
        print("\n\n")

Google Generative AI configured successfully using Colab Secrets.
Gemini API configured with model: gemini-2.5-flash


DEMO SCENARIO 1: Query = "What are the benefits of AI in healthcare?"


--- Running Traditional RAG ---
User Query: What are the benefits of AI in healthcare?

Traditional RAG - Retrieved Context:
Title: AI in Healthcare
Content: AI is transforming healthcare by assisting with diagnostics, drug discovery, personalized treatment plans, and administrative tasks. Machine learning algorithms can analyze vast amounts of patient data to identify patterns and predict disease outbreaks. Benefits include improved efficiency and accuracy.

---

Title: Benefits of AI
Content: The benefits of AI are numerous, including increased efficiency, automation of repetitive tasks, enhanced decision-making through data analysis, improved accuracy in various fields (like diagnostics), and the ability to solve problems previously thought impossible. AI can also personalize experiences and dri