# Aimpoint Digital AI Engineering Assignment
---

## Objective
Your assignment is to design, build, and explain a novel agentic workflow that utilizes a subset of the Wikipedia dataset. As part of this, you will need to define a distinctive GenAI use case that your system is intended to solve. The aim is to showcase not just your technical implementation skills, but also your ability to apply agentic system design innovatively and practically. You will implement your workflow in the Databricks Free Edition, starting from the provided notebook `01_agentic_wikipedia_aimpoint_interview.ipynb`.

To get you started, we pre-installed LangChain and LangGraph, which are open-source GenAI orchestration frameworks that work well in a Databricks workspace. In addition, we have provided a basic setup for accessing the data source with a LangChain document loader (https://python.langchain.com/docs/integrations/document_loaders/wikipedia/).

You may use coding assistants for this assignment, but you must provide your own custom prompts and demonstrate your own critical thinking. Large language models must not be used to generate responses for the open-response questions in Part B of this notebook.

Note: This assignment uses serverless clusters. At the time of creating this notebook, all components run successfully. However, you may need to address package dependency issues in the future to ensure your GenAI solution continues to function properly. 
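
If dependency issues do appear, a first step could be to list which versions are actually installed. The snippet below is a minimal sketch using only the standard library; the package list is an assumption taken from the `%pip install` cell of this notebook, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Assumption: package names mirror the %pip install cell of this notebook.
PACKAGES = ["databricks-langchain", "langchain", "langgraph", "faiss-cpu", "wikipedia"]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Return {package: version}, with None for packages that are not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Comparing this listing against the pinned versions (for example `langgraph==0.5.3`) is usually enough to spot a drifted dependency.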

## Deliverables

1. Reference Architecture
    - This should highlight your approach to addressing your use case or problem, in either PDF or image format; include the technical details of the agentic workflow here.

2. Databricks Notebook(s)
    - Includes the primary notebook `01_agentic_wikipedia_aimpoint_interview.ipynb` and any supplemental notebooks required to run the agent.
    - In the `01_agentic_wikipedia_aimpoint_interview.ipynb` notebook, complete the **GenAI Application Development** and **Reflection** sections. The GenAI Application Development section is where you add your own custom logic to create and run your agentic workflow. In the Reflection section, you write a markdown response to the two questions.
    - To reduce your development time, we created the logic for you to have a FAISS vector store and made the LLM accessible as well.
    - Before finalizing, use "Run All" to validate that your code runs correctly. Then go to "File" → "Export" → "HTML" to download the notebook as an HTML file, open that file, and save it as a PDF (see the instructions below). __Note: Your submission must be in PDF format.__

    > **Save HTML as PDF**
    > - Windows: (ctrl + P) → Save as PDF → Save
    > - MacOS: (⌘ + P) → Save as PDF → Save


## Data Source

The Wikipedia Loader ingests documents from the Wikipedia API and converts them into LangChain document objects. The page content includes the first sections of the Wikipedia articles and the metadata is described in detail below.

__Recommendation__: If you are using the LangChain document loader, we recommend filtering down to 10k or fewer documents. The `query_terms` argument below can be updated to change the search term used to query Wikipedia. Make sure you update it based on the use case you defined.

The metadata of each LangChain document object contains the following information:

| Column  | Definition                                                                 |
|---------|-----------------------------------------------------------------------------|
| title   | The Wikipedia page title (e.g., "Quantum Computing").                       |
| summary | A short extract or condensed description from the page content.             |
| source  | The URL link to the original Wikipedia article.                             |
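
The table above can be read as a plain dictionary attached to each document. The following is a minimal sketch of accessing those fields; `Doc` is a hypothetical stand-in for the LangChain document object (so the snippet runs without LangChain installed), and the sample values are placeholders rather than real loader output.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def describe(doc: Doc, preview_chars: int = 60) -> str:
    """Build a one-line description from the title, source, and summary fields."""
    title = doc.metadata.get("title", "<untitled>")
    source = doc.metadata.get("source", "<no source>")
    summary = doc.metadata.get("summary", "")
    return f"{title} ({source}): {summary[:preview_chars]}"


# Placeholder values, for illustration only.
sample = Doc(
    page_content="Placeholder page content.",
    metadata={
        "title": "Quantum Computing",
        "summary": "Placeholder summary text.",
        "source": "https://en.wikipedia.org/wiki/Quantum_computing",
    },
)
print(describe(sample))
```

Using `.get` with a default keeps downstream code from failing if a page happens to lack one of the fields.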

In [0]:
# %pip install -U -qqqq 
# backoff 
# databricks-langchain 
# langgraph==0.5.3 
# uv 
# databricks-agents 
# mlflow-skinny[databricks] 
# chromadb 
# sentence-transformers 
# langchain-huggingface
# langchain-chroma 
# wikipedia 
# faiss-cpu

In [0]:
%pip install -U -q databricks-langchain langchain==0.3.7 faiss-cpu wikipedia langgraph==0.5.3 

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
dbutils.library.restartPython()

## a) GenAI Application Development

__REQUIRED__: This section is where you add your custom logic to create and run your agentic workflow. Feel free to add as many code cells as you need for this assignment.

### Configure LLM and Embeddings

In [0]:
import json
import re
import time
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional, Sequence, TypedDict, Annotated, Tuple

import numpy as np
import faiss  

# LangChain core
from langchain_core.documents import Document
from langchain_core.messages import BaseMessage, SystemMessage, AIMessage
from langchain_core.prompts import PromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

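# Document loaders live in langchain_community; the langchain.document_loaders path is deprecated.
from langchain_community.document_loaders import WikipediaLoader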
from databricks_langchain import ChatDatabricks, DatabricksEmbeddings
from langchain.agents import AgentExecutor, create_react_agent

from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages

In [0]:
# DataLoader Config
TOPICS = ["c-rag", "self-rag", "kg-rag"]
WIKI_QUERY_MAP: Dict[str, str] = {
    "c-rag": "Corrective Retrieval-Augmented Generation",
    "self-rag": "Self-Reflective Retrieval-Augmented Generation",
    "kg-rag": "Knowledge Graph Retrieval-Augmented Generation",
}

MAP_PROMPT = PromptTemplate.from_template(
    "You are summarizing a Wikipedia chunk about Retrieval-Augmented Generation.\n"
    "Chunk:\n{chunk}\n\n"
    "Write a concise summary focusing on factual technical points and definitions:"
)

REDUCE_PROMPT = PromptTemplate.from_template(
    "You are writing a final technical summary from chunk summaries.\n"
    "Chunk summaries:\n{summaries}\n\n"
    "Write a coherent high-level summary (definition, mechanism, hallucination mitigation, typical uses). "
    "Keep it factual and grounded in the summaries:"
)


# Retriever Config
MAX_WIKI_DOCS_PER_TOPIC = 10 #TODO: recommend starting with a smaller number for testing purposes
VECTOR_TOP_K = 10 # number of documents to return
EMBEDDING_MODEL = "databricks-bge-large-en" # Embedding model endpoint name

# LLM Config
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-1-8b-instruct" # Model Serving endpoint name; for other options, see "Serving" under the AI/ML tab (e.g. databricks-gpt-oss-20b)

In [0]:
# Initialize embeddings + LLM
embeddings = DatabricksEmbeddings(endpoint=EMBEDDING_MODEL)
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, temperature=0.2)
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
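
To build intuition for `chunk_size=800` and `chunk_overlap=120`, the sketch below uses a plain fixed-size sliding window. This is a deliberate simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its real chunks will differ. `sliding_window_chunks` is a hypothetical helper written only for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_window_chunks("x" * 2000, chunk_size=800, chunk_overlap=120)
print([len(c) for c in chunks])  # [800, 800, 640]
```

With these settings the window advances 680 characters at a time, so each chunk repeats the last 120 characters of the previous one. That overlap reduces the chance that a sentence relevant to a query is split across two chunks with no chunk containing it whole.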

In [0]:
def sanity_check_openai_compatible(llm, embeddings):
    """Fail fast if the chat endpoint or the embedding endpoint cannot be called."""
    # LLM check
    try:
        r = llm.invoke("Reply with exactly: OK")
        print("[SanityCheck] LLM OK:", getattr(r, "content", r))
    except Exception as e:
        raise RuntimeError(f"[SanityCheck] LLM call failed: {e}") from e

    # Embedding check (this is what FAISS indexing needs)
    try:
        v = embeddings.embed_query("hello")
        print("[SanityCheck] Embeddings OK. dim =", len(v))
    except Exception as e:
        raise RuntimeError(
            f"[SanityCheck] Embedding call failed. This will prevent FAISS indexing.\n{e}"
        ) from e

In [0]:
sanity_check_openai_compatible(llm, embeddings)

[SanityCheck] LLM OK: OK
[SanityCheck] Embeddings OK. dim = 1024


In [0]:
def load_and_split_wikipedia(topic: str, max_docs: int) -> List[Document]:
    """
    Load Wikipedia pages for a given RAG method and split into chunks.
    Topics: one of {"c-rag", "self-rag", "kg-rag"}
    """
    if topic not in WIKI_QUERY_MAP:
        raise ValueError(f"Unknown topic={topic!r}. Expected: {list(WIKI_QUERY_MAP.keys())}")

    wiki_query = WIKI_QUERY_MAP[topic]
    loader = WikipediaLoader(query=wiki_query, load_max_docs=max_docs)
    docs = loader.load()

    for d in docs:
        md = d.metadata or {}
        source = md.get("source") or md.get("url") or "wikipedia"
        title = md.get("title") or wiki_query
        d.metadata = {
            **md,
            "topic": topic,
            "wiki_query": wiki_query,
            "source": source,
            "title": title,
        }

    return splitter.split_documents(docs)

In [0]:
doc_chunks = load_and_split_wikipedia("c-rag", max_docs=MAX_WIKI_DOCS_PER_TOPIC)
print(len(doc_chunks))




26


In [0]:
doc_chunks = load_and_split_wikipedia("self-rag", max_docs=MAX_WIKI_DOCS_PER_TOPIC)
print(len(doc_chunks))

50


In [0]:
doc_chunks = load_and_split_wikipedia("kg-rag", max_docs=MAX_WIKI_DOCS_PER_TOPIC)
print(len(doc_chunks))

75


### Tool Agent Definitions

In [0]:
# BaseToolAgent definitions (borrowed & lightly extended)
class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"
    RETRYING = "retrying"


class ToolType(Enum):
    DATA_RETRIEVAL = "data_retrieval"
    ANALYSIS = "analysis"
    GENERATION = "generation"


@dataclass
class PerformanceMetrics:
    execution_time: float
    cost_estimate: float
    memory_usage: float
    success_rate: float


@dataclass
class ToolExecutionResult:
    tool_name: str
    status: TaskStatus
    result: Any
    performance: PerformanceMetrics
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    optimization_suggestions: Optional[List[str]] = None
    retry_count: int = 0

In [0]:
class BaseToolAgent:
    def __init__(self, name: str, tool_type: ToolType):
        self.name = name
        self.tool_type = tool_type
        self.execution_history: List[ToolExecutionResult] = []

    def execute(self, params: Dict[str, Any]) -> ToolExecutionResult:
        start = time.time()
        try:
            result = self._execute_core(params)
            elapsed = time.time() - start
            perf = PerformanceMetrics(
                execution_time=elapsed,
                cost_estimate=self._estimate_cost(params),
                memory_usage=self._get_memory_usage(),
                success_rate=self._calculate_success_rate(),
            )
            out = ToolExecutionResult(
                tool_name=self.name,
                status=TaskStatus.COMPLETED,
                result=result,
                performance=perf,
                optimization_suggestions=self._generate_optimization_suggestions(perf),
            )
            self.execution_history.append(out)
            return out
        except Exception as e:
            elapsed = time.time() - start
            out = ToolExecutionResult(
                tool_name=self.name,
                status=TaskStatus.FAILED,
                result=None,
                performance=PerformanceMetrics(elapsed, 0.0, 0.0, 0.0),
                error_code="EXECUTION_ERROR",
                error_message=str(e),
                optimization_suggestions=[],
            )
            self.execution_history.append(out)
            return out

    def _execute_core(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError

    def _estimate_cost(self, params: Dict[str, Any]) -> float:
        return 0.01

    def _get_memory_usage(self) -> float:
        return 10.0

    def _calculate_success_rate(self) -> float:
        if not self.execution_history:
            return 1.0
        ok = sum(1 for r in self.execution_history if r.status == TaskStatus.COMPLETED)
        return ok / max(1, len(self.execution_history))

    def _generate_optimization_suggestions(self, perf: PerformanceMetrics) -> List[str]:
        s: List[str] = []
        if perf.execution_time > 5:
            s.append("Consider caching to reduce execution time.")
        if perf.memory_usage > 100:
            s.append("Optimize memory usage (batching / streaming).")
        return s
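
The class above follows a template-method layout: `execute` handles timing, error capture, and history, while subclasses only implement `_execute_core`. Below is a stripped-down sketch of that pattern; `MiniTool` and `WordCountTool` are hypothetical names, and the result is a plain dict rather than a `ToolExecutionResult`.

```python
import time
from typing import Any, Dict, List


class MiniTool:
    """Simplified stand-in for BaseToolAgent: execute() wraps _execute_core()."""

    def __init__(self, name: str):
        self.name = name
        self.history: List[Dict[str, Any]] = []

    def execute(self, params: Dict[str, Any]) -> Dict[str, Any]:
        start = time.perf_counter()
        try:
            record = {"status": "completed", "result": self._execute_core(params), "error": None}
        except Exception as exc:  # failures are recorded, not re-raised
            record = {"status": "failed", "result": None, "error": str(exc)}
        record["elapsed"] = time.perf_counter() - start
        self.history.append(record)
        return record

    def _execute_core(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError


class WordCountTool(MiniTool):
    """Hypothetical tool used only to illustrate the subclassing pattern."""

    def _execute_core(self, params: Dict[str, Any]) -> Any:
        text = params.get("text")
        if not text:
            raise ValueError("Missing required param: text")
        return len(text.split())


tool = WordCountTool("word_count")
ok = tool.execute({"text": "corrective retrieval augmented generation"})
bad = tool.execute({})
print(ok["status"], ok["result"])   # completed 4
print(bad["status"], bad["error"])  # failed Missing required param: text
```

Because errors are captured in the result instead of propagating, callers need to check the status field before using the result. Note also that in `BaseToolAgent.execute` the success rate is computed before the current result is appended, so it reflects earlier runs only.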

In [0]:
class WikiRetrieveTool(BaseToolAgent):
    def __init__(self, name: str, retrievers_by_topic: Dict[str, Any], default_k: int = 3):
        super().__init__(name=name, tool_type=ToolType.DATA_RETRIEVAL)
        self.retrievers_by_topic = retrievers_by_topic
        self.default_k = default_k

    def _execute_core(self, params: Dict[str, Any]) -> Any:
        query = str(params.get("query", "") or "")
        k = int(params.get("k", self.default_k))
        topics = params.get("topics")

        if not query:
            raise ValueError("Missing required param: query")
        if not topics:
            raise ValueError("Missing required param: topics (router should fill this)")

        if isinstance(topics, str):
            topics = [topics]
        topics = [t for t in topics if t in self.retrievers_by_topic]
        if not topics:
            raise ValueError(f"No valid topics found in topics={params.get('topics')}")

        out_blocks = []
        for topic in topics:
            retriever = self.retrievers_by_topic[topic]
            if hasattr(retriever, "invoke"):
                docs = retriever.invoke(query)
            else:
                docs = retriever.get_relevant_documents(query)
            docs = list(docs or [])[:k]
            if not docs:
                continue
            joined = "\n\n".join(
                f"[{i+1}] (topic={topic}, title={d.metadata.get('title')}, source={d.metadata.get('source')})\n{d.page_content}"
                for i, d in enumerate(docs)
            )
            out_blocks.append(f"## Topic: {topic}\n{joined}")

        return {"topics": topics, "k": k, "evidence": "\n\n".join(out_blocks)}

    def _estimate_cost(self, params: Dict[str, Any]) -> float:
        return 0.0

    def _get_memory_usage(self) -> float:
        return 25.0

In [0]:
class FakeRetriever:
    def __init__(self, topic: str):
        self.topic = topic

    def get_relevant_documents(self, query: str) -> List[Document]:
        return [
            Document(
                page_content=f"Fake content about {self.topic}. Query was: {query}",
                metadata={"title": f"{self.topic} page", "source": "fake://test"},
            )
        ]

retrievers = {
    "c-rag": FakeRetriever("c-rag"),
    "self-rag": FakeRetriever("self-rag"),
    "kg-rag": FakeRetriever("kg-rag"),
}

test_retriever = WikiRetrieveTool(name="wiki_retrieve", retrievers_by_topic=retrievers, default_k=2)
test_case = test_retriever.execute({"query": "Self-RAG hallucinations", "topics": ["self-rag"], "k": 1})
print(test_case)

ToolExecutionResult(tool_name='wiki_retrieve', status=<TaskStatus.COMPLETED: 'completed'>, result={'topics': ['self-rag'], 'k': 1, 'evidence': '## Topic: self-rag\n[1] (topic=self-rag, title=self-rag page, source=fake://test)\nFake content about self-rag. Query was: Self-RAG hallucinations'}, performance=PerformanceMetrics(execution_time=4.4345855712890625e-05, cost_estimate=0.0, memory_usage=25.0, success_rate=1.0), error_code=None, error_message=None, optimization_suggestions=[], retry_count=0)


In [0]:
class WikiSummarizeTool(BaseToolAgent):
    def __init__(
        self,
        name: str,
        llm,
        chunks_by_topic: Dict[str, List[Document]],
        map_prompt: PromptTemplate,
        reduce_prompt: PromptTemplate,
        default_max_chunks: int = 12,
    ):
        super().__init__(name=name, tool_type=ToolType.GENERATION)
        self.llm = llm
        self.chunks_by_topic = chunks_by_topic
        self.map_prompt = map_prompt
        self.reduce_prompt = reduce_prompt
        self.default_max_chunks = default_max_chunks

    def _execute_core(self, params: Dict[str, Any]) -> Any:
        topics = params.get("topics")
        max_chunks = int(params.get("max_chunks", self.default_max_chunks))

        if not topics:
            raise ValueError("Missing required param: topics (router should fill this)")
        if isinstance(topics, str):
            topics = [topics]
        topics = [t for t in topics if t in self.chunks_by_topic]
        if not topics:
            raise ValueError(f"No valid topics found in topics={params.get('topics')}")

        def _txt(x) -> str:
            return getattr(x, "content", x) if x is not None else ""

        summaries: Dict[str, str] = {}
        for topic in topics:
            chunks = self.chunks_by_topic[topic][:max_chunks]
            chunk_summaries = []
            for d in chunks:
                s = _txt(self.llm.invoke(self.map_prompt.format(chunk=d.page_content))).strip()
                if s:
                    chunk_summaries.append(s)
            combined = "\n".join(f"- {s}" for s in chunk_summaries)
            final = _txt(self.llm.invoke(self.reduce_prompt.format(summaries=combined))).strip()
            summaries[topic] = final

        return {"topics": topics, "max_chunks": max_chunks, "summaries": summaries}

    def _estimate_cost(self, params: Dict[str, Any]) -> float:
        return 0.02

    def _get_memory_usage(self) -> float:
        return 50.0
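
The summarization tool above is a map-reduce pipeline: each chunk is summarized on its own (map), then the partial summaries are merged in one final call (reduce). The sketch below shows the same control flow with toy functions in place of LLM calls; `map_reduce_summarize`, `first_sentence`, and `count_bullets` are hypothetical names used only for illustration.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    summarize_chunk: Callable[[str], str],
    combine: Callable[[str], str],
    max_chunks: int = 12,
) -> str:
    """Summarize up to max_chunks chunks individually, then combine the partial summaries."""
    partial: List[str] = []
    for chunk in chunks[:max_chunks]:        # map step: one call per chunk
        s = summarize_chunk(chunk).strip()
        if s:
            partial.append(s)
    bullets = "\n".join(f"- {s}" for s in partial)
    return combine(bullets).strip()          # reduce step: one final call


# Toy stand-ins for the two LLM prompts.
def first_sentence(chunk: str) -> str:
    return chunk.split(".")[0] + "."


def count_bullets(bullets: str) -> str:
    return f"{len(bullets.splitlines())} chunk summaries combined"


chunks = ["Alpha is first. More text.", "Beta is second. More text.", "Gamma is third."]
print(map_reduce_summarize(chunks, first_sentence, count_bullets, max_chunks=2))
# 2 chunk summaries combined
```

One consequence of this design is cost: each topic triggers up to `max_chunks` map calls plus one reduce call, so the default of 12 means up to 13 LLM invocations per topic.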

In [0]:
class FakeLLM:
    def invoke(self, prompt: str):
        class R:
            def __init__(self, content):
                self.content = content
        # Always return something deterministic
        return R("FAKE_SUMMARY")
    
chunks_by_topic = {
    "self-rag": [
        Document(page_content="Self-RAG chunk 1", metadata={"title": "Self-RAG"}),
        Document(page_content="Self-RAG chunk 2", metadata={"title": "Self-RAG"}),
    ]
}
tool = WikiSummarizeTool(
    name="wiki_summarize",
    llm=FakeLLM(),
    chunks_by_topic=chunks_by_topic,
    map_prompt=MAP_PROMPT,
    reduce_prompt=REDUCE_PROMPT,
)
test_case = tool.execute({"topics": ["self-rag"]})
print(test_case)

ToolExecutionResult(tool_name='wiki_summarize', status=<TaskStatus.COMPLETED: 'completed'>, result={'topics': ['self-rag'], 'max_chunks': 12, 'summaries': {'self-rag': 'FAKE_SUMMARY'}}, performance=PerformanceMetrics(execution_time=8.130073547363281e-05, cost_estimate=0.02, memory_usage=50.0, success_rate=1.0), error_code=None, error_message=None, optimization_suggestions=[], retry_count=0)


In [0]:
def build_corpora_and_tools(llm: Any, embeddings: Any) -> Dict[str, Any]:
    chunks_by_topic: Dict[str, List[Document]] = {}
    retrievers_by_topic: Dict[str, Any] = {}
    loaded_topics: List[str] = []

    for t in TOPICS:
        print(f"[WikiLoad] {t}: query={WIKI_QUERY_MAP[t]!r}")
        chunks = load_and_split_wikipedia(t, max_docs=MAX_WIKI_DOCS_PER_TOPIC)
        print(f"[WikiLoad] {t}: chunks={len(chunks)}")
        if not chunks:
            print(f"[WARN] No Wikipedia chunks for {t}, skipping.")
            continue

        loaded_topics.append(t)
        chunks_by_topic[t] = chunks
        vs = FAISS.from_documents(chunks, embeddings)
        retrievers_by_topic[t] = vs.as_retriever(search_kwargs={"k": VECTOR_TOP_K})

    if not loaded_topics:
        raise RuntimeError("No Wikipedia data loaded for any topic. Check WikipediaLoader/network.")

    tools: Dict[str, BaseToolAgent] = {
        "wiki_retrieve": WikiRetrieveTool("wiki_retrieve", retrievers_by_topic=retrievers_by_topic, default_k=VECTOR_TOP_K),
        "wiki_summarize": WikiSummarizeTool(
            "wiki_summarize",
            llm=llm,
            chunks_by_topic=chunks_by_topic,
            map_prompt=MAP_PROMPT,
            reduce_prompt=REDUCE_PROMPT,
            default_max_chunks=12,
        ),
    }
    return {
        "chunks_by_topic": chunks_by_topic,
        "retrievers_by_topic": retrievers_by_topic,
        "tools": tools,
        "loaded_topics": loaded_topics,
    }

### Hybrid Routing

In [0]:
INTENTS = ["fetch", "summarize", "compare"]

INTENT_KEYWORD_RULES: Dict[str, List[str]] = {
    "compare": [r"\bcompare\b", r"\bdifference\b", r"\bvs\b", r"\bversus\b", r"\bcontrast\b", r"\btrade[- ]?off\b"],
    "summarize": [r"\bsummarize\b", r"\bsummary\b", r"\boverview\b", r"\bhigh[- ]level\b", r"\btl;dr\b", r"\brecap\b"],
    "fetch": [r"\bwhat is\b", r"\bdefine\b", r"\bdefinition\b", r"\bmechanism\b", r"\bhow does\b", r"\bexplain\b"],
}

TOPIC_KEYWORD_RULES: Dict[str, List[str]] = {
    "c-rag": [r"\bc[- ]?rag\b", r"\bcorrective\b"],
    "self-rag": [r"\bself[- ]?rag\b", r"\bself[- ]?reflect\w*\b"],  # \w* also matches "self-reflective" / "self-reflection"
    "kg-rag": [r"\bkg[- ]?rag\b", r"\bknowledge graph\b", r"\bgraph rag\b"],
}
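
The rules above rely on `\b` word boundaries and the optional separator `[- ]?`. As a quick illustration of what they do and do not match, the sketch below counts pattern hits per label. The patterns are a subset copied from the rules above; `keyword_hits` is a hypothetical helper, not the notebook's scoring function.

```python
import re
from typing import Dict, List

# Subset of the patterns defined above, copied for illustration.
RULES: Dict[str, List[str]] = {
    "compare": [r"\bcompare\b", r"\bvs\b", r"\bversus\b"],
    "c-rag": [r"\bc[- ]?rag\b", r"\bcorrective\b"],
    "kg-rag": [r"\bkg[- ]?rag\b", r"\bknowledge graph\b"],
}


def keyword_hits(query: str, rules: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many patterns of each label match the lower-cased query."""
    q = query.lower()
    return {label: sum(1 for p in patterns if re.search(p, q)) for label, patterns in rules.items()}


# "CRAG", "C-RAG" and "C RAG" all match because the separator is optional.
print(keyword_hits("Compare CRAG vs KG RAG", RULES))
# {'compare': 2, 'c-rag': 1, 'kg-rag': 1}

# Word boundaries prevent matches inside longer words such as "scraggy".
print(keyword_hits("Scraggy canvas", RULES))
# {'compare': 0, 'c-rag': 0, 'kg-rag': 0}
```

The same word boundaries also mean that inflected forms only match when a pattern explicitly allows for them, which is worth keeping in mind when adding new rules.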

In [0]:
@dataclass
class LabeledExample:
    label: str
    text: str


class EmbeddingLabelIndex:
    """
    embeddings_model must provide: encode(list[str]) -> list[list[float]]
    Similarity: cosine
    """
    def __init__(self, embeddings_model: Any, examples: List[LabeledExample]):
        self.model = embeddings_model
        self.examples = examples
        self.texts = [e.text for e in examples]
        self.labels = [e.label for e in examples]
        self.vecs = self.model.encode(self.texts)

    @staticmethod
    def _cos(a, b) -> float:
        import math
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) + 1e-12
        nb = math.sqrt(sum(x * x for x in b)) + 1e-12
        return dot / (na * nb)

    def score(self, query: str, labels: List[str], k: int = 8) -> Dict[str, float]:
        qv = self.model.encode([query])[0]
        scored = [(self._cos(qv, v), lab) for v, lab in zip(self.vecs, self.labels)]
        scored.sort(reverse=True, key=lambda x: x[0])
        top = scored[:k]
        out = {lab: 0.0 for lab in labels}
        for sim, lab in top:
            if lab in out:
                out[lab] += float(sim)
        return out


def _keyword_scores(query: str, rules: Dict[str, List[str]], labels: List[str]) -> Dict[str, float]:
    q = (query or "").lower()
    out = {lab: 0.0 for lab in labels}
    for lab in labels:
        for p in rules.get(lab, []):
            if re.search(p, q, flags=re.IGNORECASE):
                out[lab] += 1.0
    return out


def _combine(kw: Dict[str, float], emb: Dict[str, float], alpha_kw: float, beta_emb: float, labels: List[str]) -> Dict[str, float]:
    return {lab: alpha_kw * float(kw.get(lab, 0.0)) + beta_emb * float(emb.get(lab, 0.0)) for lab in labels}


def _argmax(d: Dict[str, float]) -> str:
    return max(d.items(), key=lambda kv: kv[1])[0] if d else ""

In [0]:
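# Redefinition: this version supersedes the EmbeddingLabelIndex above. It uses the
# LangChain embeddings interface (embed_documents / embed_query) and keeps the best
# cosine similarity per label instead of summing over the top-k examples.
class EmbeddingLabelIndex: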
    def __init__(self, embeddings_model: Any, examples: List[LabeledExample]):
        self.embeddings = embeddings_model
        self.examples = examples
        self.texts = [e.text for e in examples]
        self.labels = [e.label for e in examples]
        self._vecs = self.embeddings.embed_documents(self.texts)

    @staticmethod
    def _cos(a: List[float], b: List[float]) -> float:
        import math
        dot = 0.0
        na = 0.0
        nb = 0.0
        for x, y in zip(a, b):
            dot += x * y
            na += x * x
            nb += y * y
        if na <= 0 or nb <= 0:
            return 0.0
        return dot / (math.sqrt(na) * math.sqrt(nb))

    def score(self, query: str, labels: List[str]) -> Dict[str, float]:
        qv = self.embeddings.embed_query(query)
        best: Dict[str, float] = {l: 0.0 for l in labels}
        for vec, lab in zip(self._vecs, self.labels):
            if lab not in best:
                continue
            s = self._cos(qv, vec)
            if s > best[lab]:
                best[lab] = float(s)
        return best
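
The scoring rule in this class is "best cosine similarity among a label's examples". A small worked example with hand-picked 2-D vectors may make that concrete. The vectors and the helper names `cosine` and `best_similarity_per_label` are made up for illustration; real embeddings have far more dimensions (1024 for the endpoint checked earlier).

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def best_similarity_per_label(
    query_vec: List[float],
    examples: List[Tuple[str, List[float]]],
    labels: List[str],
) -> Dict[str, float]:
    """For each label, keep the highest similarity between the query and that label's examples."""
    best = {lab: 0.0 for lab in labels}
    for lab, vec in examples:
        if lab in best:
            best[lab] = max(best[lab], cosine(query_vec, vec))
    return best


# Toy vectors: two "fetch" examples and one "compare" example.
examples = [("fetch", [1.0, 0.0]), ("fetch", [0.6, 0.8]), ("compare", [0.0, 1.0])]
scores = best_similarity_per_label([1.0, 0.0], examples, ["fetch", "compare", "summarize"])
print(scores)  # {'fetch': 1.0, 'compare': 0.0, 'summarize': 0.0}
```

Two properties follow from starting every label at 0.0: labels with no examples score 0.0, and negative similarities are clipped to 0.0 rather than reported.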

In [0]:
class HybridRouter:
    def __init__(self, embeddings_model: Any, alpha_kw: float = 1.0, beta_emb: float = 1.5):
        self.alpha_kw = alpha_kw
        self.beta_emb = beta_emb

        intent_examples = [
            LabeledExample("compare", "Compare C-RAG vs Self-RAG vs KG-RAG and highlight their differences."),
            LabeledExample("compare", "What are the trade-offs among Corrective RAG, Self-Reflective RAG, and KG-RAG?"),
            LabeledExample("summarize", "Give a high-level overview of Self-RAG."),
            LabeledExample("summarize", "Summarize the main idea and workflow of KG-RAG."),
            LabeledExample("fetch", "What is Corrective RAG? Provide its definition and mechanism."),
            LabeledExample("fetch", "Explain how Self-RAG reduces hallucinations."),
        ]
        topic_examples = [
            LabeledExample("c-rag", "Corrective RAG detects retrieval errors and corrects them to reduce hallucinations."),
            LabeledExample("c-rag", "Corrective Retrieval-Augmented Generation (C-RAG)."),
            LabeledExample("self-rag", "Self-RAG uses self-reflection to decide when to retrieve and how to verify answers."),
            LabeledExample("self-rag", "Self-Reflective RAG (Self-RAG) reduces hallucinations through self-critique."),
            LabeledExample("kg-rag", "KG-RAG uses a knowledge graph for relational retrieval and structured grounding."),
            LabeledExample("kg-rag", "Knowledge Graph Retrieval-Augmented Generation (KG-RAG)."),
        ]

        self.intent_index = EmbeddingLabelIndex(embeddings_model, intent_examples)
        self.topic_index = EmbeddingLabelIndex(embeddings_model, topic_examples)

    def route_intent(self, query: str) -> Dict[str, Any]:
        kw = _keyword_scores(query, INTENT_KEYWORD_RULES, INTENTS)
        emb = self.intent_index.score(query, INTENTS)
        hybrid = _combine(kw, emb, self.alpha_kw, self.beta_emb, INTENTS)
        return {"intent": _argmax(hybrid), "intent_scores": hybrid, "intent_keyword_scores": kw, "intent_embedding_scores": emb}

    def route_topic(self, query: str) -> Dict[str, Any]:
        kw = _keyword_scores(query, TOPIC_KEYWORD_RULES, TOPICS)
        emb = self.topic_index.score(query, TOPICS)
        hybrid = _combine(kw, emb, self.alpha_kw, self.beta_emb, TOPICS)
        topic_order = sorted(TOPICS, key=lambda t: hybrid.get(t, 0.0), reverse=True)
        return {"topic": topic_order[0], "topic_scores": hybrid, "topic_order": topic_order, "topic_keyword_scores": kw, "topic_embedding_scores": emb}

    def route(self, query: str) -> Dict[str, Any]:
        return {**self.route_intent(query), **self.route_topic(query)}
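
The router's final score per label is `alpha_kw * keyword_score + beta_emb * embedding_score`. The sketch below works through that arithmetic with the default weights (1.0 and 1.5) and made-up embedding scores, to show how a single keyword hit can override a slightly higher embedding similarity. `combine_scores` is a hypothetical helper mirroring `_combine`.

```python
from typing import Dict, List


def combine_scores(
    kw: Dict[str, float],
    emb: Dict[str, float],
    labels: List[str],
    alpha_kw: float = 1.0,
    beta_emb: float = 1.5,
) -> Dict[str, float]:
    """Weighted sum of keyword and embedding scores for each label."""
    return {lab: alpha_kw * kw.get(lab, 0.0) + beta_emb * emb.get(lab, 0.0) for lab in labels}


labels = ["fetch", "summarize", "compare"]
kw = {"fetch": 0.0, "summarize": 1.0, "compare": 0.0}       # one keyword hit for "summarize"
emb = {"fetch": 0.5, "summarize": 0.25, "compare": 0.125}   # made-up similarities

hybrid = combine_scores(kw, emb, labels)
print(hybrid)                       # {'fetch': 0.75, 'summarize': 1.375, 'compare': 0.1875}
print(max(hybrid, key=hybrid.get))  # summarize

# With embeddings alone, "fetch" would have won.
print(max(emb, key=emb.get))        # fetch
```

Since cosine similarity is at most 1.0, the embedding term contributes at most 1.5 per label under the defaults, while each additional keyword hit adds a full 1.0. Two keyword hits therefore outweigh any embedding evidence.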

In [0]:
import math
import re
from typing import List

class FakeEmbeddings:
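    """
    Super simple embedding: bag-of-words hashed into a small vector.
    Good enough for a smoke test. Note that Python randomizes str hashes per
    process (unless PYTHONHASHSEED is set), so vectors are stable within a
    run but exact scores can differ between runs.
    """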
    def __init__(self, dim: int = 64):
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        v = [0.0] * self.dim
        for w in re.findall(r"[a-zA-Z0-9\-]+", (text or "").lower()):
            idx = hash(w) % self.dim
            v[idx] += 1.0
        # normalize
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(text) for text in texts]
    
router = HybridRouter(embeddings_model=FakeEmbeddings(), alpha_kw=1.0, beta_emb=1.5)
tests = [
    "What is Self-RAG? Explain the mechanism and how it reduces hallucinations.",
    "Summarize KG-RAG in 5 bullets.",
    "Compare C-RAG vs Self-RAG vs KG-RAG and highlight differences.",
]
for q in tests:
    r = router.route(q)
    print("\nQUERY:", q)
    print("intent =", r["intent"])
    print("topic  =", r["topic"])
    # optional: inspect scores
    print("intent_scores =", r["intent_scores"])
    print("topic_scores  =", r["topic_scores"])


QUERY: What is Self-RAG? Explain the mechanism and how it reduces hallucinations.
intent = fetch
topic  = self-rag
intent_scores = {'fetch': 4.011299793694864, 'summarize': 0.4797016118001235, 'compare': 0.35032452487268534}
topic_scores  = {'c-rag': 0.5454545454545454, 'self-rag': 1.7263001870593784, 'kg-rag': 0.40451991747794536}

QUERY: Summarize KG-RAG in 5 bullets.
intent = summarize
topic  = kg-rag
intent_scores = {'fetch': 0.223606797749979, 'summarize': 1.7115124735378853, 'compare': 0.1936491673103709}
topic_scores  = {'c-rag': 0.0, 'self-rag': 0.0, 'kg-rag': 1.3}

QUERY: Compare C-RAG vs Self-RAG vs KG-RAG and highlight differences.
intent = compare
topic  = self-rag
intent_scores = {'fetch': 0.30151134457776363, 'summarize': 0.31980107453341566, 'compare': 3.436140661634507}
topic_scores  = {'c-rag': 1.2261335084333227, 'self-rag': 1.5128225940683708, 'kg-rag': 1.3763089045031909}


### LangGraph ReAct Loop

In [0]:
FENCED_RE = re.compile(r"```json\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)

def _normalize_one_call(d: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    name = str(d.get("name", "") or "").strip()
    if not name:
        return None
    args = d.get("args", d.get("arguments", {}))
    if args is None:
        args = {}
    if not isinstance(args, dict):
        return None
    return {"name": name, "args": args}

def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    text = (text or "").strip()
    if not text:
        return []
    m = FENCED_RE.search(text)
    body = (m.group(1).strip() if m else text)

    # full JSON body
    try:
        obj = json.loads(body)
        if isinstance(obj, dict):
            c = _normalize_one_call(obj)
            return [c] if c else []
        if isinstance(obj, list):
            out = []
            for item in obj:
                if isinstance(item, dict):
                    c = _normalize_one_call(item)
                    if c:
                        out.append(c)
            return out
    except Exception:
        pass

    # line-delimited dicts
    out: List[Dict[str, Any]] = []
    for line in body.splitlines():
        line = line.strip().rstrip(",")
        if line.startswith("{") and line.endswith("}"):
            try:
                d = json.loads(line)
            except Exception:
                continue
            if isinstance(d, dict):
                c = _normalize_one_call(d)
                if c:
                    out.append(c)
    return out

def extract_tool_calls(msg: BaseMessage) -> List[Dict[str, Any]]:
    content = getattr(msg, "content", "") or ""
    calls = parse_tool_calls(content)
    if calls:
        return calls
    tool_calls = getattr(msg, "tool_calls", None)
    if isinstance(tool_calls, list):
        out = []
        for c in tool_calls:
            if isinstance(c, dict):
                cc = _normalize_one_call(c)
                if cc:
                    out.append(cc)
        return out
    return []


def has_tool_calls(msg: BaseMessage) -> bool:
    return len(extract_tool_calls(msg)) > 0


def render_available_tools(tools: Dict[str, BaseToolAgent]) -> str:
    return "\n".join([f"- {name} (type={agent.tool_type.value})" for name, agent in tools.items()])

In [0]:

SYSTEM_TEMPLATE = """\
You are an expert in Retrieval-Augmented Generation (RAG).
Follow ReAct: Thought → Action → Observation → Answer.
Only use Observations and do not fabricate information.
If evidence is insufficient, say so explicitly and call another tool.

Long-term memory (may be empty):
{long_term_memory}

Available tools:
{tool_list}

Tool call format rules:
- When calling tools, output ONLY a JSON call (or JSON list of calls) in a ```json fenced block.
- Each call must be: {{"name": "<tool_name>", "args": {{...}}}}
- Do NOT include any extra text outside the fenced block when calling tools.

When you want to call a tool, output EXACTLY one of these:

1) Single call
```json
{{"name": "wiki_retrieve", "args": {{"query": "<text>", "topics": ["self-rag"], "k": 3}}}}
```

2) Multiple calls
```json
[
  {{"name": "wiki_retrieve", "args": {{"query": "<text>", "topics": ["c-rag"], "k": 3}}}},
  {{"name": "wiki_retrieve", "args": {{"query": "<text>", "topics": ["self-rag"], "k": 3}}}}
]
```

When you are ready to answer, output plain text (NO ```json block).
"""
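
The doubled braces in the template above are required because the template is later rendered with `str.format`, which treats single braces as placeholders. A minimal sketch of that behavior, using a shortened, hypothetical template:

```python
# Shortened template for illustration: {tool_list} is a placeholder,
# while {{ and }} render as literal braces.
TEMPLATE = (
    "Available tools:\n{tool_list}\n"
    'Each call must be: {{"name": "<tool_name>", "args": {{...}}}}'
)

rendered = TEMPLATE.format(tool_list="- wiki_retrieve (type=data_retrieval)")
print(rendered)
# Available tools:
# - wiki_retrieve (type=data_retrieval)
# Each call must be: {"name": "<tool_name>", "args": {...}}

# Forgetting to escape the braces makes format() look for a field named '"name"'.
try:
    '{"name": "wiki_retrieve"}'.format()
    escaped_ok = True
except (KeyError, ValueError, IndexError):
    escaped_ok = False
print(escaped_ok)  # False
```

Any JSON example added to the system template later needs the same escaping, otherwise rendering the system message fails at runtime.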

### Define Nodes and Edges

In [0]:
class AgentState(TypedDict, total=False):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    iteration_count: int
    long_term_memory: str
    pending_tool_calls: List[Dict[str, Any]]
    evidence_topics: List[str]
    compare_topics: List[str]   # which topics we want evidence for (loaded topics)


@dataclass
class GraphContext:
    llm: Any
    router: HybridRouter
    tools: Dict[str, BaseToolAgent]
    system_template: str
    tool_list_text: str
    available_topics: List[str]
    enable_long_term_memory: bool = True
    max_iters: int = 1

In [0]:
from langchain_core.messages import BaseMessage, SystemMessage, AIMessage, HumanMessage

def build_system_message(ctx: GraphContext, long_term_memory: str) -> SystemMessage:
    return SystemMessage(content=ctx.system_template.format(long_term_memory=long_term_memory or "", tool_list=ctx.tool_list_text))


def memory_recall_node(state: AgentState, ctx: GraphContext) -> Dict[str, Any]:
    return {"long_term_memory": (state.get("long_term_memory", "") or "") if ctx.enable_long_term_memory else ""}


def _first_user_query(state: AgentState) -> str:
    for m in state.get("messages", []):
        if hasattr(m, "content"):
            return str(m.content)
        if isinstance(m, tuple) and len(m) == 2:
            return str(m[1])
    return ""


def normalize_messages(msgs: Sequence[Any]) -> List[BaseMessage]:
    out: List[BaseMessage] = []
    for m in msgs:
        if isinstance(m, BaseMessage):
            out.append(m)
        elif isinstance(m, tuple) and len(m) == 2:
            role, content = m
            if role == "user":
                out.append(HumanMessage(content=str(content)))
            else:
                out.append(AIMessage(content=str(content)))
    return out

def has_all_compare_evidence(state: AgentState) -> bool:
    want = set(state.get("compare_topics") or [])
    if not want:
        want = set(TOPICS)
    have = set(state.get("evidence_topics") or [])
    return all(t in have for t in want)

In [0]:
def llm_node(state: AgentState, ctx: GraphContext) -> Dict[str, Any]:
    it = int(state.get("iteration_count", 0))
    if it >= ctx.max_iters:
        return {"messages": [AIMessage(content="Reached max iterations; evidence may be insufficient.")], "iteration_count": it + 1}

    user_query = _first_user_query(state)
    routing = ctx.router.route(user_query)
    intent = routing.get("intent", "fetch")

    system = build_system_message(ctx, state.get("long_term_memory", "") or "")

    extra_msgs: List[BaseMessage] = []
    # Stop tool loops once we have enough observations.
    if intent == "compare" and has_all_compare_evidence(state):
        extra_msgs.append(AIMessage(content="You have observations for all requested topics. Do NOT call tools. Answer now."))
    if intent == "fetch" and (state.get("evidence_topics") or []):
        extra_msgs.append(AIMessage(content="You have sufficient evidence. Answer now without calling tools."))

    msgs = normalize_messages([system] + list(state.get("messages", [])) + extra_msgs)
    resp = ctx.llm.invoke(msgs)

    calls = extract_tool_calls(resp)
    updates: Dict[str, Any] = {"messages": [resp], "iteration_count": it + 1}
    if calls:
        updates["pending_tool_calls"] = calls
    return updates


In [0]:
def _choose_default_tool(intent: str) -> str:
    return "wiki_summarize" if intent in ("summarize", "compare") else "wiki_retrieve"


def tool_node(state: AgentState, ctx: GraphContext) -> Dict[str, Any]:
    pending = list(state.get("pending_tool_calls") or [])
    if not pending:
        pending = extract_tool_calls(state["messages"][-1])

    if not pending:
        return {"messages": [], "pending_tool_calls": []}

    call = pending.pop(0)
    tool_name = str(call.get("name", "") or "").strip()
    args = call.get("args", {}) or {}

    user_query = _first_user_query(state)
    routing = ctx.router.route(user_query)
    intent = routing.get("intent", "fetch")

    if tool_name not in ctx.tools:
        tool_name = _choose_default_tool(intent)

    # Fill in standard arguments
    args.setdefault("query", user_query)

    if not args.get("topics"):
        if intent == "compare":
            args["topics"] = list(ctx.available_topics)
        else:
            routed = routing.get("topic", "")
            args["topics"] = [routed] if routed in ctx.available_topics else [ctx.available_topics[0]]

    agent = ctx.tools.get(tool_name)
    if not agent:
        obs = f"[ToolError] Unknown tool: {tool_name}. Available: {list(ctx.tools.keys())}"
        return {"messages": [AIMessage(content=f"Observation:\n{obs}")], "pending_tool_calls": pending}

    exec_result = agent.execute(args)

    evidence_topics = list(state.get("evidence_topics") or [])
    if exec_result.status == TaskStatus.COMPLETED and tool_name == "wiki_retrieve":
        for t in (args.get("topics") or []):
            if t not in evidence_topics:
                evidence_topics.append(t)

    if exec_result.status == TaskStatus.COMPLETED:
        perf = exec_result.performance
        payload = exec_result.result
        obs = (
            f"Tool={exec_result.tool_name} status=completed\n"
            f"Performance: time={perf.execution_time:.3f}s cost={perf.cost_estimate:.4f} mem={perf.memory_usage:.1f}MB success_rate={perf.success_rate:.2f}\n"
            f"{json.dumps(payload, ensure_ascii=False, indent=2) if isinstance(payload, dict) else str(payload)}"
        )
    else:
        obs = f"Tool={exec_result.tool_name} status=failed\nError: {exec_result.error_code} {exec_result.error_message}"

    return {"messages": [AIMessage(content=f"Observation:\n{obs}")], "pending_tool_calls": pending, "evidence_topics": evidence_topics}


In [0]:
def route_edge(state: AgentState) -> str:
    if state.get("pending_tool_calls"):
        return "tools"
    msgs = state.get("messages", [])
    if not msgs:
        return "end"
    last = msgs[-1]
    return "tools" if has_tool_calls(last) else "end"

### Build Graph

In [0]:
def make_graph(ctx: GraphContext):
    g = StateGraph(AgentState)
    g.add_node("memory_recall", lambda s: memory_recall_node(s, ctx))
    g.add_node("llm", lambda s: llm_node(s, ctx))
    g.add_node("tools", lambda s: tool_node(s, ctx))
    g.set_entry_point("memory_recall")
    g.add_edge("memory_recall", "llm")
    g.add_conditional_edges("llm", route_edge, {"tools": "tools", "end": END})
    g.add_edge("tools", "llm")
    return g

In [0]:
built = build_corpora_and_tools(llm=llm, embeddings=embeddings)
tools: Dict[str, BaseToolAgent] = built["tools"]
loaded_topics: List[str] = built["loaded_topics"]
router = HybridRouter(embeddings_model=embeddings, alpha_kw=1.0, beta_emb=1.5)

ctx = GraphContext(
    llm=llm,
    router=router,
    tools=tools,
    system_template=SYSTEM_TEMPLATE,
    tool_list_text=render_available_tools(tools),
    available_topics=loaded_topics,
    enable_long_term_memory=True,
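    max_iters=1,  # allows one LLM turn; the llm_node visit after a tool observation returns the max-iterations message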
)

[WikiLoad] c-rag: query='Corrective Retrieval-Augmented Generation'






[WikiLoad] c-rag: chunks=26
[WikiLoad] self-rag: query='Self-Reflective Retrieval-Augmented Generation'
[WikiLoad] self-rag: chunks=50
[WikiLoad] kg-rag: query='Knowledge Graph Retrieval-Augmented Generation'
[WikiLoad] kg-rag: chunks=75


In [0]:
print("=== ROUTER TESTS ===")
fetch_tests = [
    ("c-rag", "What is C-RAG? Explain the mechanism and how it reduces hallucinations."),
    ("self-rag", "What is Self-RAG? Explain the mechanism and how it reduces hallucinations."),
    ("kg-rag", "What is KG-RAG? Explain the mechanism and how it reduces hallucinations."),
]
for expected_topic, q in fetch_tests:
    r = router.route(q)
    print("\n---")
    print("Query:", q)
    print("Expected topic:", expected_topic)
    print("Pred intent:", r["intent"], "| Pred topic:", r["topic"])
    print("Intent scores:", r["intent_scores"])
    print("Topic scores:", r["topic_scores"])

=== ROUTER TESTS ===

---
Query: What is C-RAG? Explain the mechanism and how it reduces hallucinations.
Expected topic: c-rag
Pred intent: fetch | Pred topic: c-rag
Intent scores: {'fetch': 4.241512272076761, 'summarize': 0.9013956853516873, 'compare': 0.9298304392172447}
Topic scores: {'c-rag': 2.1259171646037993, 'self-rag': 1.1350037344369532, 'kg-rag': 0.7592284196685379}

---
Query: What is Self-RAG? Explain the mechanism and how it reduces hallucinations.
Expected topic: self-rag
Pred intent: fetch | Pred topic: self-rag
Intent scores: {'fetch': 4.429376027794736, 'summarize': 1.0729543251518905, 'compare': 0.9785469003749365}
Topic scores: {'c-rag': 1.1322950696902692, 'self-rag': 2.317654830564059, 'kg-rag': 0.7191608928133016}

---
Query: What is KG-RAG? Explain the mechanism and how it reduces hallucinations.
Expected topic: kg-rag
Pred intent: fetch | Pred topic: kg-rag
Intent scores: {'fetch': 4.245850564218703, 'summarize': 1.0533061631304168, 'compare': 0.972868536328061

In [0]:
print("=== INTENT TESTS ===")
q = "Compare C-RAG vs Self-RAG vs KG-RAG and highlight differences."
r = router.route(q)  # route once and reuse the result for every print
print("=== TEST ===")
print("Query:", q)
print("Expected topic:", "self-rag")
print("Pred intent:", r["intent"], "| Pred topic:", r["topic"])
print("Intent scores:", r["intent_scores"])
print("Topic scores:", r["topic_scores"])

=== INTENT TESTS ===
=== TEST ===
Query: Compare C-RAG vs Self-RAG vs KG-RAG and highlight differences.
Expected topic: self-rag
Pred intent: compare | Pred topic: self-rag
Intent scores: {'fetch': 0.9991316985066052, 'summarize': 1.09009301228323, 'compare': 3.4758081892094017}
Topic scores: {'c-rag': 1.9489114322072685, 'self-rag': 1.9725814449363481, 'kg-rag': 1.9200912843596363}
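
For the compare query the three topic scores are nearly tied, so the single predicted topic carries little information. One possible way to surface this is a relative-margin check on the score dictionary, sketched below. `ambiguous_topics` and the 5% threshold are illustrative assumptions, not part of `HybridRouter`.

```python
from typing import Dict, List


def ambiguous_topics(scores: Dict[str, float], rel_margin: float = 0.05) -> List[str]:
    """Return the topics whose score lies within rel_margin (relative) of the best score."""
    if not scores:
        return []
    best = max(scores.values())
    if best <= 0:
        return []
    return [topic for topic, s in scores.items() if (best - s) / best <= rel_margin]


# Scores copied from the router outputs in this notebook.
compare_scores = {"c-rag": 1.9489114322072685, "self-rag": 1.9725814449363481, "kg-rag": 1.9200912843596363}
fetch_scores = {"c-rag": 2.1259171646037993, "self-rag": 1.1350037344369532, "kg-rag": 0.7592284196685379}

print(ambiguous_topics(compare_scores))  # ['c-rag', 'self-rag', 'kg-rag']
print(ambiguous_topics(fetch_scores))    # ['c-rag']
```

A result with more than one topic could be treated as a multi-topic request, which matches how `tool_node` already falls back to all available topics when the intent is `compare`.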


In [0]:
print("=== TOOL TESTS ===")
for expected_topic, q in fetch_tests:
    r = router.route(q)

    # Force the topic for deterministic testing (do not trust LLM args)
    topics = [expected_topic]

    # Make the retrieval query a bit more "wikipedia-friendly"
    tool_query = {
        "c-rag": "Corrective Retrieval-Augmented Generation hallucination reduction",
        "self-rag": "Self-Reflective Retrieval-Augmented Generation hallucination reduction",
        "kg-rag": "Knowledge Graph Retrieval-Augmented Generation hallucination reduction",
    }[expected_topic]

    exec_result = tools["wiki_retrieve"].execute({"query": tool_query, "topics": topics, "k": 10})

    print("\n---")
    print("Topic:", expected_topic)
    print("Router predicted:", r["intent"], r["topic"])
    print("Tool status:", exec_result.status.value)

    if exec_result.status.value == "completed":
        payload = exec_result.result or {}
        ev = payload.get("evidence", "")
        print("Evidence chars:", len(ev))
        print("Evidence preview:\n", ev[:800])
    else:
        print("Error:", exec_result.error_code, exec_result.error_message)

=== TOOL TESTS ===

---
Topic: c-rag
Router predicted: fetch c-rag
Tool status: completed
Evidence chars: 5126
Evidence preview:
 ## Topic: c-rag
[1] (topic=c-rag, title=Large language model, source=https://en.wikipedia.org/wiki/Large_language_model)
LLMs evolved from earlier statistical and recurrent neural network approaches to language modeling. The transformer architecture, introduced in 2017, replaced recurrence with self-attention, allowing efficient parallelization, longer context handling, and scalable training on unprecedented data volumes. This innovation enabled models like GPT, BERT, and their successors, which demonstrated emergent behaviors at scale, such as few-shot learning and compositional reasoning.

[2] (topic=c-rag, title=Large language model, source=https://en.wikipedia.org/wiki/Large_language_model)
Benchmark evaluations for LLMs have evolved from narrow linguistic assessments toward comprehens

---
Topic: self-rag
Router predicted: fetch self-rag
Tool status: co

In [0]:
graph = make_graph(ctx)
app = graph.compile()

In [0]:
query = "What is Self-RAG?"
state: AgentState = {
    "messages": [("user", query)],
    "iteration_count": 0,
    "long_term_memory": "",
    "evidence_topics": [],
    "compare_topics": loaded_topics,
}
res = app.invoke(state)
print("\nFINAL ANSWER:\n")
print(res["messages"])


FINAL ANSWER:

[HumanMessage(content='What is Self-RAG?', additional_kwargs={}, response_metadata={}, id='6aa96b60-f3e4-4758-b657-f55fffd0caa5'), AIMessage(content='```json\n[\n  {"name": "wiki_retrieve", "args": {"query": "Self-RAG", "topics": ["self-rag"], "k": 3}},\n  {"name": "wiki_retrieve", "args": {"query": "Retrieval-Augmented Generation", "topics": ["self-rag"], "k": 3}}\n]\n```', additional_kwargs={}, response_metadata={'usage': {'prompt_tokens': 315, 'completion_tokens': 78, 'total_tokens': 393}, 'prompt_tokens': 315, 'completion_tokens': 78, 'total_tokens': 393, 'model': 'meta-llama-3.1-8b-instruct-110524', 'model_name': 'meta-llama-3.1-8b-instruct-110524', 'finish_reason': 'stop'}, id='run--c93ff209-231d-4c7b-af3f-6eebf237fe4b-0'), AIMessage(content='Observation:\nTool=wiki_retrieve status=completed\nPerformance: time=0.099s cost=0.0000 mem=25.0MB success_rate=1.00\n{\n  "topics": [\n    "self-rag"\n  ],\n  "k": 3,\n  "evidence": "## Topic: self-rag\\n[1] (topic=self-rag,
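
Printing the raw message list is hard to scan. A small helper like the one below could condense each message to one line. It is a sketch that only assumes each message exposes a `content` attribute; `summarize_messages` is a hypothetical name and not part of LangChain.

```python
from typing import Any, Iterable, List


def summarize_messages(messages: Iterable[Any], width: int = 80) -> List[str]:
    """Return one line per message: index, class name, and a truncated content preview."""
    lines: List[str] = []
    for i, m in enumerate(messages):
        role = type(m).__name__
        # Fall back to str(m) for items without a content attribute (e.g. tuples).
        content = str(getattr(m, "content", m))
        flat = " ".join(content.split())  # collapse newlines and repeated spaces
        if len(flat) > width:
            flat = flat[: width - 3] + "..."
        lines.append(f"[{i}] {role}: {flat}")
    return lines
```

In the notebook this could be used as `print("\n".join(summarize_messages(res["messages"])))`.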

In [0]:
query = "Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets."
state: AgentState = {
    "messages": [("user", query)],
    "iteration_count": 0,
    "long_term_memory": "",
    "evidence_topics": [],
    "compare_topics": loaded_topics,
}
res = app.invoke(state)
print("\nFINAL ANSWER:\n")
print(res["messages"])


FINAL ANSWER:

[HumanMessage(content='Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets.', additional_kwargs={}, response_metadata={}, id='3e564273-52aa-4940-af55-d756f3209029'), AIMessage(content='```json\n[\n  {"name": "wiki_retrieve", "args": {"query": "KG-RAG definition", "topics": ["self-rag"], "k": 3}},\n  {"name": "wiki_retrieve", "args": {"query": "KG-RAG mechanism", "topics": ["self-rag"], "k": 3}},\n  {"name": "wiki_retrieve", "args": {"query": "KG-RAG typical use cases", "topics": ["self-rag"], "k": 3}}\n]\n```\n\nPlease wait for the observations.', additional_kwargs={}, response_metadata={'usage': {'prompt_tokens': 331, 'completion_tokens': 121, 'total_tokens': 452}, 'prompt_tokens': 331, 'completion_tokens': 121, 'total_tokens': 452, 'model': 'meta-llama-3.1-8b-instruct-110524', 'model_name': 'meta-llama-3.1-8b-instruct-110524', 'finish_reason': 'stop'}, id='run--8e7007ee-d93c-4ff5-8ac9-fba51a434815-0'), AIMessage(content='Observation:\nTool=wi
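
In the trace above the model asks about KG-RAG but passes `"topics": ["self-rag"]`, the value that appears in the system prompt's example. A minimal sketch of one possible guard follows. It assumes the router's predicted topic is more trustworthy than the model-supplied argument for single-topic intents; `reconcile_topics` is a hypothetical helper, not something the notebook defines.

```python
from typing import List, Optional, Sequence


def reconcile_topics(
    llm_topics: Optional[Sequence[str]],
    routed_topic: str,
    available: Sequence[str],
    intent: str,
) -> List[str]:
    """Choose retrieval topics, preferring the router over model-supplied args."""
    valid = [t for t in (llm_topics or []) if t in available]
    if intent == "compare":
        # Comparisons need several topics: keep valid model choices, else use all.
        return valid or list(available)
    if routed_topic in available:
        # Single-topic intents: the router's prediction overrides the model's args.
        return [routed_topic]
    # Router gave nothing usable: fall back to the first valid model topic, then the first available one.
    return valid[:1] or list(available[:1])


topics = ["c-rag", "self-rag", "kg-rag"]
print(reconcile_topics(["self-rag"], "kg-rag", topics, "summarize"))  # ['kg-rag']
```

Whether the router actually predicts `kg-rag` for this summarize query is not shown in the outputs; the example assumes it does, as it did for the KG-RAG fetch test.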

In [0]:
query = "Compare c-rag, self-rag, and kg-rag. For each: definition, mechanism, and how it reduces hallucinations. Use evidence from tools."
state: AgentState = {
    "messages": [("user", query)],
    "iteration_count": 0,
    "long_term_memory": "",
    "evidence_topics": [],
    "compare_topics": loaded_topics,
}
res = app.invoke(state)
print("\nFINAL ANSWER:\n")
print(res["messages"][-1].content)


FINAL ANSWER:

Reached max iterations; evidence may be insufficient.
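
This output is consistent with `max_iters=1`: `llm_node` increments `iteration_count` on its first visit, so the visit that follows the tool observation hits the guard before the model is invoked. The sketch below replays that control flow with a scripted stand-in for the model. `simulate_run` is a simplified, hypothetical model of the `llm` → `tools` → `llm` loop, not the graph itself.

```python
from typing import List


def simulate_run(max_iters: int, scripted_calls: List[int]) -> List[str]:
    """Trace node visits. scripted_calls[i] is the number of tool calls in the
    i-th real model reply (0 means a plain-text answer)."""
    trace: List[str] = []
    iteration_count = 0
    pending = 0
    replies = iter(scripted_calls)
    while True:
        # llm_node: the guard fires before the model is invoked.
        if iteration_count >= max_iters:
            trace.append("llm:max_iters_message")
            last_calls = 0
        else:
            last_calls = next(replies, 0)
            trace.append("llm:tool_calls" if last_calls else "llm:answer")
            if last_calls:
                pending = last_calls  # new calls replace the pending list
        iteration_count += 1
        # route_edge: stop only when no calls are pending.
        if pending == 0:
            trace.append("end")
            return trace
        # tool_node: one pending call is executed per visit.
        pending -= 1
        trace.append("tools")


print(simulate_run(1, [1, 0]))  # ['llm:tool_calls', 'tools', 'llm:max_iters_message', 'end']
print(simulate_run(2, [1, 0]))  # ['llm:tool_calls', 'tools', 'llm:answer', 'end']
```

Under these assumptions, a run needs `max_iters` of at least 2 for the model to see a single tool observation and answer from it.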


In [0]:
print("\n[TEST] Graph end-to-end")
memory = "Preference: ground factual claims in tool outputs; if unsure, say insufficient evidence."
tests = [
    ("FETCH", "What is Self-RAG? Explain the mechanism and how it reduces hallucinations."),
    ("SUMMARIZE", "Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets."),
    ("COMPARE", "Compare c-rag, self-rag, and kg-rag. For each: definition, mechanism, and how it reduces hallucinations. Use evidence from tools."),
]
for tag, q in tests:
    print("\n" + "=" * 80)
    print(tag, "QUERY:\n", q)
    res = app.invoke({"messages": [("user", q)], "iteration_count": 0, "long_term_memory": memory, "evidence_topics": []})
    print("\nFINAL ANSWER:\n")
    print(res["messages"][-1].content)




[TEST] Graph end-to-end

FETCH QUERY:
 What is Self-RAG? Explain the mechanism and how it reduces hallucinations.

FINAL ANSWER:

Answer:
Self-RAG is a type of Retrieval-Augmented Generation model that uses a self-supervised approach to reduce hallucinations. The mechanism involves training the model on a large corpus of text, where the model generates text and then retrieves relevant information from the corpus to correct its own errors. This process allows the model to learn from its own mistakes and improve its performance over time. The self-RAG mechanism reduces hallucinations by providing the model with a way to fact-check its own generated text, ensuring that the output is more accurate and reliable.

SUMMARIZE QUERY:
 Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets.

FINAL ANSWER:



COMPARE QUERY:
 Compare c-rag, self-rag, and kg-rag. For each: definition, mechanism, and how it reduces hallucinations. Use evidence from tools.


com.databricks.backend.common.rpc.CommandCancelledException

In [0]:
tests = [
    # # 1) FETCH (Self-RAG)
    # ("FETCH", "What is Self-RAG? Explain the mechanism and how it reduces hallucinations."),

    # 2) SUMMARIZE (KG-RAG)
    ("SUMMARIZE", "Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets."),

    # # 3) COMPARE (all three)
    # ("COMPARE",
    #  "Compare c-rag, self-rag, and kg-rag.\n"
    #  "For each: definition, mechanism, and how it reduces hallucinations.\n"
    #  "Use evidence from tools."),
]

memory = "Preference: ground factual claims in tool outputs; if unsure, say insufficient evidence."

for tag, q in tests:
    print("\n" + "=" * 80)
    print(tag, "QUERY:\n", q)
    res = app.invoke({"messages": [("user", q)], "iteration_count": 0, "long_term_memory": memory})
    print("\nFINAL ANSWER:\n")
    print(res["messages"][-1].content)
    print(res["messages"])



SUMMARIZE QUERY:
 Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets.

FINAL ANSWER:


[HumanMessage(content='Summarize KG-RAG: definition, mechanism, and typical use cases in 6-8 bullets.', additional_kwargs={}, response_metadata={}, id='84e98300-34a3-4ff3-8ff8-e969e21f5934'), AIMessage(content='```json\n[\n  {"name": "wiki_retrieve", "args": {"query": "KG-RAG definition", "topics": ["self-rag"], "k": 3}},\n  {"name": "wiki_retrieve", "args": {"query": "KG-RAG mechanism", "topics": ["self-rag"], "k": 3}},\n  {"name": "wiki_retrieve", "args": {"query": "KG-RAG typical use cases", "topics": ["self-rag"], "k": 3}}\n]\n```\n\nPlease wait for the tool output...', additional_kwargs={}, response_metadata={'usage': {'prompt_tokens': 346, 'completion_tokens': 122, 'total_tokens': 468}, 'prompt_tokens': 346, 'completion_tokens': 122, 'total_tokens': 468, 'model': 'meta-llama-3.1-8b-instruct-110524', 'model_name': 'meta-llama-3.1-8b-instruct-110524', 'finish_reason': 

In [0]:
question = (
        "Compare c-rag, self-rag, and kg-rag.\n"
        "For each: definition, mechanism, and how it reduces hallucinations.\n"
        "Use evidence from tools."
    )

long_term_memory = (
    "Preference: cover all three methods (c-rag, self-rag, kg-rag) explicitly; "
    "ground factual claims in tool outputs; when unsure, say evidence is insufficient."
)

result = app.invoke(
    {
        "messages": [("user", question)],
        "iteration_count": 0,
        "long_term_memory": long_term_memory,
    }
)

print(result["messages"])

[HumanMessage(content='Compare c-rag, self-rag, and kg-rag.\nFor each: definition, mechanism, and how it reduces hallucinations.\nUse evidence from tools.', additional_kwargs={}, response_metadata={}, id='f11b6111-81ca-47d7-96e5-3f9c9f2f9b50'), AIMessage(content='To compare c-rag, self-rag, and kg-rag, I will first retrieve relevant information from the web and then summarize the key points.\n\n```json\n[\n  {"name":"wiki_retrieve","args":{"query":"c-rag definition","topics":["c-rag"]}},\n  {"name":"wiki_retrieve","args":{"query":"self-rag definition","topics":["self-rag"]}},\n  {"name":"wiki_retrieve","args":{"query":"kg-rag definition","topics":["kg-rag"]}}\n]\n```\n\nAfter retrieving the information, I will summarize the definitions, mechanisms, and how each method reduces hallucinations.\n\nPlease wait for the tool outputs...', additional_kwargs={}, response_metadata={'usage': {'prompt_tokens': 233, 'completion_tokens': 139, 'total_tokens': 372}, 'prompt_tokens': 233, 'completion_t

✅ Uses render_available_tools(tools) to list the tools in the system prompt

✅ Uses ```json ... ``` to simulate tool calls

✅ Parses the tool call JSON, executes the tool, and returns the Observation as an AIMessage

✅ Loops until the model stops calling tools (plain text = final answer) or max_iters is reached

## b) Reflection

__REQUIRED:__ Provide a detailed reflection addressing these two questions:
1. If you had more time, which specific improvements or enhancements would you make to your agentic workflow, and why?
2. What concrete steps are required to move this workflow from prototype to production?


> Enter your reflection here

