# [STARTER] Udaplay Project

## Part 02 - Agent

In this part of the project, you'll plug your VectorDB into your Agent as one of its tools.

You're building UdaPlay, an AI Research Agent for the video game industry. The agent will:
1. Answer questions using internal knowledge (RAG)
2. Search the web when needed
3. Maintain conversation state
4. Return structured outputs
5. Store useful information for future use

### Setup

In [1]:
# Only needed for Udacity workspace

import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [2]:
import os, sys
## Add current directory to sys.path to import the local "lib" folder
sys.path.append(os.path.join(os.getcwd()))

In [3]:
# TODO: Import the necessary libs
# For example:
import json
import re
from typing import List, Optional

import chromadb
from dotenv import load_dotenv
from pydantic import BaseModel, RootModel
from tavily import TavilyClient

from lib.evaluation import EvaluationResult, JudgeEvaluation
from lib.llm import LLM
from lib.messages import AIMessage, SystemMessage, ToolMessage, UserMessage
from lib.parsers import PydanticOutputParser
from lib.rag import RAG
from lib.state_machine import Run, Snapshot
from lib.tooling import tool
from lib.vector_db import VectorStore

In [4]:
# TODO: Load environment variables
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")

In [5]:
# Custom Pydantic Models for RAG and WEB search results
class GameDoc(BaseModel):
    Platform: str
    Name: str
    YearOfRelease: int
    Description: str

class GameDocList(RootModel[list[GameDoc]]):
    root: list[GameDoc]

class WebSearchResult(BaseModel):
    url: str
    title: str
    content: str
    score: float
    raw_content: Optional[str] = None

class WebSearchResultList(RootModel[list[WebSearchResult]]):
    root: list[WebSearchResult]

### Tools

Build at least 3 tools:
- retrieve_game: To search the vector DB
- evaluate_retrieval: To assess the retrieval performance
- game_web_search: To search the web when the retrieved documents are not good enough
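
The intended order of the three tools can be sketched as plain control flow. This is only an illustration: in the notebook the LLM decides when to call each tool, and the three `fake_*` functions below are hypothetical stand-ins, not the real tools.

```python
from typing import Callable


def answer_with_fallback(
    question: str,
    retrieve: Callable[[str], list[dict]],
    evaluate: Callable[[str, list[dict]], bool],
    web_search: Callable[[str], list[dict]],
) -> dict:
    """Try the vector DB first; fall back to the web if the judge rejects the docs."""
    docs = retrieve(question)
    if docs and evaluate(question, docs):
        return {"source": "rag", "documents": docs}
    return {"source": "web", "documents": web_search(question)}


# Hypothetical stand-ins for the real tools, for illustration only.
def fake_retrieve(question: str) -> list[dict]:
    return [{"Name": "Super Mario 64", "YearOfRelease": 1996}]


def fake_evaluate(question: str, docs: list[dict]) -> bool:
    # Pretend the judge accepts the docs only when a game name appears in the question.
    return any(d["Name"].lower() in question.lower() for d in docs)


def fake_web_search(question: str) -> list[dict]:
    return [{"title": "stub web result", "score": 0.9}]


result_rag = answer_with_fallback(
    "When was Super Mario 64 released?", fake_retrieve, fake_evaluate, fake_web_search
)
result_web = answer_with_fallback(
    "Was Mortal Kombat X released for PlayStation 5?", fake_retrieve, fake_evaluate, fake_web_search
)
print(result_rag["source"], result_web["source"])
```

Passing the tools in as callables keeps the sketch testable without a vector DB, an LLM, or an API key.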


#### Retrieve Game Tool

In [6]:
# TODO: Create retrieve_game tool
# It should use chroma client and collection you created
# chroma_client = chromadb.PersistentClient(path="chromadb")
# collection = chroma_client.get_collection("udaplay")
# Tool Docstring:
#    Semantic search: Finds the most similar results in the vector DB
#    args:
#    - query: a question about game industry.
#
#    You'll receive results as list. Each element contains:
#    - Platform: like Game Boy, Playstation 5, Xbox 360...
#    - Name: Name of the Game
#    - YearOfRelease: Year when that game was released for that platform
#    - Description: Additional details about the game
GAME_DOC_PATTERN = re.compile(
    r"""
    ^\[
        (?P<Platform>.+?)
    \]                      # closing ]
    \s+
    (?P<Name>.+?)           # game name (lazy)
    \s+
    \(
        (?P<YearOfRelease>\d{4})
    \)                      # closing )
    \s*-\s*
    (?P<Description>.+)     # rest of the line
    $""",
    re.VERBOSE | re.DOTALL,
)

def _parse_game_doc(doc: str):
    """
    Parse strings like:
    "[Nintendo 64] Super Mario 64 (1996) - A groundbreaking 3D platformer..."
    into a structured dict with keys:
    Platform, Name, YearOfRelease, Description.
    """
    m = GAME_DOC_PATTERN.match(doc.strip())
    if not m:
        # Fallback if format doesn't match
        return {
            "Platform": None,
            "Name": None,
            "YearOfRelease": None,
            "Description": doc.strip(),
        }

    data = m.groupdict()
    # Cast year to int
    data["YearOfRelease"] = int(data["YearOfRelease"])
    return data

@tool
def retrieve_game(query: str) -> str:
    """
    Semantic search: Finds the most similar results in the vector DB
    args:
    - query: a question about game industry.

    You'll receive results as list. Each element contains:
    - Platform: like Game Boy, Playstation 5, Xbox 360...
    - Name: Name of the Game
    - YearOfRelease: Year when that game was released for that platform
    - Description: Additional details about the game
    """
    chroma_client = chromadb.PersistentClient(path="chromadb")
    collection = chroma_client.get_collection("udaplay")
    vector_db = VectorStore(collection)
    
    results = vector_db.query(query_texts=[query])
    raw_docs = results["documents"][0] if results["documents"] else []
    parsed = [_parse_game_doc(doc) for doc in raw_docs]
    return json.dumps(parsed, ensure_ascii=False)


In [7]:
user_query = "What is the first 3D platformer Mario game?"
retrieved_docs = retrieve_game(user_query)
retrieved_docs

'[{"Platform": "Nintendo 64", "Name": "Super Mario 64", "YearOfRelease": 1996, "Description": "A groundbreaking 3D platformer that set new standards for the genre, featuring Mario\'s quest to rescue Princess Peach."}, {"Platform": "Super Nintendo Entertainment System (SNES)", "Name": "Super Mario World", "YearOfRelease": 1990, "Description": "A classic platformer where Mario embarks on a quest to save Princess Toadstool and Dinosaur Land from Bowser."}, {"Platform": "Nintendo Switch", "Name": "Mario Kart 8 Deluxe", "YearOfRelease": 2017, "Description": "An enhanced version of Mario Kart 8, featuring new characters, tracks, and improved gameplay mechanics."}]'
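
To see how the `[Platform] Name (Year) - Description` layout is split into fields, here is a small standalone sketch that needs no vector DB. It uses a compact one-line version of the same pattern; `LINE_PATTERN` and `parse_line` are illustrative names, not part of the project.

```python
import re

# Compact form of the pattern: "[Platform] Name (Year) - Description"
LINE_PATTERN = re.compile(
    r"^\[(?P<Platform>.+?)\]\s+(?P<Name>.+?)\s+\((?P<YearOfRelease>\d{4})\)\s*-\s*(?P<Description>.+)$",
    re.DOTALL,
)


def parse_line(doc: str) -> dict:
    """Return the four fields, or a fallback dict when the layout does not match."""
    m = LINE_PATTERN.match(doc.strip())
    if m is None:
        return {"Platform": None, "Name": None, "YearOfRelease": None, "Description": doc.strip()}
    data = m.groupdict()
    data["YearOfRelease"] = int(data["YearOfRelease"])  # the regex guarantees four digits
    return data


parsed = parse_line("[Nintendo 64] Super Mario 64 (1996) - A groundbreaking 3D platformer.")
unparsed = parse_line("free text without the expected layout")
print(parsed)
print(unparsed)
```

Note that the fallback dict carries `None` values, so it would not validate against `GameDoc`, whose `Platform`, `Name` and `YearOfRelease` fields are required `str` and `int` values.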

#### Evaluate Retrieval Tool

In [8]:
# TODO: Create evaluate_retrieval tool
# You might use an LLM as judge in this tool to evaluate the performance
# You need to prompt that LLM with something like:
# "Your task is to evaluate if the documents are enough to respond the query. "
# "Give a detailed explanation, so it's possible to take an action to accept it or not."
# Use EvaluationReport to parse the result
# Tool Docstring:
#    Based on the user's question and on the list of retrieved documents, 
#    it will analyze the usability of the documents to respond to that question. 
#    args: 
#    - question: original question from user
#    - retrieved_docs: retrieved documents most similar to the user query in the Vector Database
#    The result includes:
#    - useful: whether the documents are useful to answer the question
#    - description: description about the evaluation result

@tool
def evaluate_retrieval(question: str, retrieved_docs: GameDocList) -> JudgeEvaluation:
    """
    Based on the user's question and on the list of retrieved documents, 
    it will analyze the usability of the documents to respond to that question. 
    
    args: 
    - question: original question from user
    - retrieved_docs: retrieved documents most similar to the user query in the Vector Database
    
    The result includes:
    - useful: whether the documents are useful to answer the question
    - description: description about the evaluation result
    """
    judge = LLM(api_key=OPENAI_API_KEY)
    judge_response = judge.invoke(
        [
            SystemMessage(
                content="""
                You are a retrieval-quality judge for a video game knowledge system.

                Your task: Evaluate whether the retrieved documents are sufficient and useful to answer the user's question accurately.

                Evaluation criteria:
                1. RELEVANCE: Does at least one document directly contain information that answers the question?
                2. SUFFICIENCY: Is the information complete enough (e.g., correct game name, platform, year) without needing external search?
                3. ACCURACY: Do the documents support a factual, unambiguous answer?

                Output requirements:
                - task_completed: True if the documents allow a confident, correct answer; False if information is missing, conflicting, or irrelevant.
                - format_correct: True if the retrieved docs are well-formed (Platform, Name, YearOfRelease, Description).
                - instructions_followed: True if the evaluation is coherent and actionable.
                - explanation: Give a concise rationale. When citing a matching document, wrap its game name in angle brackets, e.g. <Super Mario 64> or <Pokémon Gold and Silver>. If documents are insufficient, state what is missing and recommend web search."""
            ),
            UserMessage(
                content=f"Question: {question}\nRetrieved Documents: {retrieved_docs}"
            )
        ], 
        response_format=JudgeEvaluation
    )
    # Parse the structured response
    parser = PydanticOutputParser(model_class=JudgeEvaluation)
    return parser.parse(judge_response)

In [9]:
### Utility function to extract, from the Eval Agent explanation, the citations
### for the documents that match the user query

def _extract_and_match_citations(explanation: str, retrieved_docs: GameDocList) -> list[dict]:
    """
    Extract citations from explanation (marked by <NAME>) and match them to retrieved documents.
    
    Args:
        explanation: Text containing citations in angle brackets, e.g. "The document <Super Mario 64> answers..."
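        retrieved_docs: Retrieved documents with keys Platform, Name, YearOfRelease, Description,
            given as a list or as a JSON string encoding that list
        
    Returns:
        List of dicts {"citation": str, "matched_doc": GameDoc}, one per citation that matches a document name.
        Citations without a matching document are omitted.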
    """
    # Extract citations from explanation as per the eval agent instructions
    pattern = re.compile(r"<([^>]+)>")

    citations = [m.strip() for m in pattern.findall(explanation)]
    
    results = []
    if not retrieved_docs:
        return []
    try:
        docs = GameDocList.model_validate(retrieved_docs)
    except Exception:
        # retrieved_docs may be a JSON string rather than a list
        try:
            docs = GameDocList.model_validate(json.loads(retrieved_docs))
        except Exception as e:
            print(e)
            return []
    
    for cite in citations:
        for doc in docs.root:
            name = doc.Name.strip()
            # Case-insensitive exact match on the game name
            if name and (cite.lower() == name.lower()):
                results.append({"citation": cite, "matched_doc": doc})
        
    return results

In [10]:
## Test Eval Agent
eval_result = evaluate_retrieval(user_query, retrieved_docs)
eval_result

JudgeEvaluation(task_completed=True, format_correct=True, instructions_followed=True, explanation='The document containing <Super Mario 64> directly answers the question by identifying it as the first 3D platformer Mario game, released in 1996 on the Nintendo 64. The information is complete and accurate, providing the necessary details.')

In [11]:
explanation = eval_result.explanation
print("Explanation: ", explanation)
citations = _extract_and_match_citations(eval_result.explanation, retrieved_docs)
print("Citations: ", citations)

Explanation:  The document containing <Super Mario 64> directly answers the question by identifying it as the first 3D platformer Mario game, released in 1996 on the Nintendo 64. The information is complete and accurate, providing the necessary details.
Citations:  [{'citation': 'Super Mario 64', 'matched_doc': GameDoc(Platform='Nintendo 64', Name='Super Mario 64', YearOfRelease=1996, Description="A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach.")}]
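
The angle-bracket convention can also be exercised without the LLM judge. The sketch below uses plain dicts and a made-up explanation string; `match_citations` is a hypothetical name, and the lookup table is a simplification of the nested loop used in the notebook.

```python
import re

CITATION_PATTERN = re.compile(r"<([^>]+)>")


def match_citations(explanation: str, docs: list[dict]) -> list[dict]:
    """Pair each <Name> found in the explanation with the document of the same name."""
    # Lower-cased lookup table so the comparison is case-insensitive.
    by_name = {d["Name"].strip().lower(): d for d in docs if d.get("Name")}
    matches = []
    for cite in (c.strip() for c in CITATION_PATTERN.findall(explanation)):
        doc = by_name.get(cite.lower())
        if doc is not None:
            matches.append({"citation": cite, "matched_doc": doc})
    return matches


docs = [
    {"Name": "Super Mario 64", "Platform": "Nintendo 64", "YearOfRelease": 1996},
    {"Name": "Super Mario World", "Platform": "SNES", "YearOfRelease": 1990},
]
# Made-up judge explanation, for illustration only.
explanation = "The document <super mario 64> answers the question; <Unknown Game> is not in the list."
citations = match_citations(explanation, docs)
print(citations)
```

One design difference to keep in mind: the lookup table keeps a single document per name, whereas the nested loop returns every document sharing that name (for example the same game on two platforms).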


#### Game Web Search Tool

In [12]:
# TODO: Create game_web_search tool
# Please use Tavily client to search the web
# Tool Docstring:
#    Web search: Finds the most relevant results on the web
#    args:
#    - question: a question about game industry. 


@tool
def game_web_search(question: str) -> str:
    """
    Web search: Finds the most relevant results on the web
    args:
    - question: a question about game industry.
    """
    # Run the question as a Tavily search query
    tavily_client = TavilyClient(api_key=TAVILY_API_KEY)
    results = tavily_client.search(question)
    tavily_results = results["results"] if results else []
    return json.dumps(tavily_results, ensure_ascii=False)

In [13]:
def _get_web_citations(documents: str, threshold: float = 0.9) -> list[WebSearchResult]:
    """
    Extract cited documents with score >= threshold, ranked from highest to lowest.

    Args:
        documents: JSON string encoding a list of search results, each with at least a 'score' key.
        threshold: Minimum score (default 0.9). Documents below this are excluded.

    Returns:
        List of WebSearchResult objects with score >= threshold, sorted by score descending.
    """
    try:
        print(">> Documents")
        # Remove the extra quotes that could make the parsing fail
        if documents.startswith('"') and documents.endswith('"'):
            documents = documents[1:-1].encode().decode('unicode_escape')  # handles \" → "
        web_docs = WebSearchResultList.model_validate(json.loads(documents))
        
    except Exception as e:
        print(e)
        return []

    return sorted(
        (d for d in web_docs.root if (d.score or 0) >= threshold),
        key=lambda d: d.score,
        reverse=True,
    )

In [14]:
web_result = game_web_search(user_query)
web_result

'[{"url": "https://medium.com/super-jump/exploring-super-marios-35th-anniversary-anthology-a3dc624c2555", "title": "Exploring Super Mario\'s 35th Anniversary Anthology - Medium", "content": "*Super Mario 64* is the first 3D Mario game. But it’s not just that this was Mario’s first foray into 3D — *Super Mario 64* became the benchmark for *all* 3D games in the mid-90s. *Super Mario 64* set the standard, and its DNA is present across all 3D Mario games. *Super Mario Sunshine* is regarded — understandably — as the black sheep of the 3D Mario games. Partly because the Wii U sold so poorly, relative to other Nintendo machines, I’d wager that *Super Mario 3D World* is probably the least-played 3D Mario game at this point. Aside from being a great Mario game, *Super Mario 3D World* is particularly unique in the sense that you can play the entire experience cooperatively on the same screen. Even if these games (at least those prior to *Super Mario 3D World*) are only slightly improved to be a 
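
The tool output may reach the citation helper serialized twice (a JSON string that itself holds JSON), which is why `_get_web_citations` strips the outer quotes. As a hedged alternative sketch, a second `json.loads` undoes the extra layer without `unicode_escape`, which can garble non-ASCII text such as "Pokémon". The helper names and the three results below are hypothetical.

```python
import json


def decode_tool_content(content: str):
    """Decode JSON that may have been serialized twice."""
    value = json.loads(content)
    if isinstance(value, str):  # still a string: it was double-encoded
        value = json.loads(value)
    return value


def filter_by_score(results: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep results scoring at least `threshold`, best first."""
    kept = [r for r in results if (r.get("score") or 0) >= threshold]
    return sorted(kept, key=lambda r: r.get("score") or 0, reverse=True)


# Hypothetical search results, shaped like the Tavily output above.
raw = [
    {"url": "https://example.com/a", "title": "A", "content": "Pokémon", "score": 0.62},
    {"url": "https://example.com/b", "title": "B", "content": "...", "score": 0.95},
    {"url": "https://example.com/c", "title": "C", "content": "...", "score": 0.81},
]
once = json.dumps(raw, ensure_ascii=False)
twice = json.dumps(once, ensure_ascii=False)

top = filter_by_score(decode_tool_content(twice), threshold=0.75)
print([r["title"] for r in top])
```

With a 0.75 threshold the low-scoring result is dropped and the remaining two come back in descending score order.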

### Agent

In [15]:
# TODO: Create your Agent abstraction using StateMachine
# Equip with an appropriate model
# Craft a good set of instructions 
# Plug all Tools you developed
from lib.agents import Agent

games_agent = Agent(
    model_name="gpt-4o-mini",
    instructions=(
        "You are a helpful agent that can answer questions about the video game industry. "
        "You can use the following tools to help you answer the questions. "
        "To answer a question, you must follow this order:\n"
        "1. Start by searching for information in the vector DB\n"
        "2. Evaluate whether the retrieved documents are enough to answer the question\n"
        "3. If the documents are not enough, search the web for more information"
    ),
    tools=[
        retrieve_game,
        evaluate_retrieval,
        game_web_search
    ]
)


In [20]:
# TODO: Invoke your agent
# - When Pokémon Gold and Silver was released?
# - Which one was the first 3D platformer Mario game?
# - Was Mortal Kombat X realeased for Playstation 5?

In [21]:
def _get_snapshot_citations(run: Run) -> tuple[list[dict], list[WebSearchResult]]:
    """
    Extract citations from the snapshot messages.
    Look at the messages that follow the last user message in the run's last snapshot
    and extract all relevant citations captured by the RAG and Web Search tools.
    For the RAG citations, use the evaluation agent to filter the most relevant ones (if any).
    For the Web search citations, use a threshold of 0.75 to filter the most relevant ones (if any).

    Args:
        - run: Run object containing the snapshot messages
    Returns:
        - list of RAG citations
        - list of Web search citations
    """

    # Get the last snapshot messages containing all calls of the run session
    snapshot_messages = run.snapshots[-1].state_data["messages"]
    
    current_retrieved_docs=[]
    # RAG citation confirmed by the evaluation agent
    rag_citations= []

    # Web citation from the web search tool exceeding a certain threshold of relevance
    web_citations= []

    # Find the last user message index, to get citation of the last call only (if any)
    last_user_msg_indx = -1
    
    for  indx, s in enumerate(snapshot_messages):
        if isinstance(s, UserMessage):
            last_user_msg_indx = indx

    if last_user_msg_indx == -1:
        return [], []

    for s in snapshot_messages[last_user_msg_indx+1:]:
        
        if not isinstance(s, ToolMessage):
            # Ignore all non-tool messages
            continue

        if s.name == "retrieve_game":
            # Get the list of retrieved documents
            # The retrieve step should happen before the evaluation one
            current_retrieved_docs = GameDocList.model_validate_json(json.loads(s.content)).root
    
        if s.name == "evaluate_retrieval":
            # Get the valid citations based on the evaluation result
            # and the last retrieve results
            rag_citations = _extract_and_match_citations(s.content, current_retrieved_docs)

        if s.name == "game_web_search":
            # Get the web search results
            print(">> Web Search Result")
            web_citations = _get_web_citations(s.content, threshold=0.75)

    return rag_citations, web_citations
    
def display_citations(run: Run)-> None:
    """
    Display the citations from the last snapshot of the run
    Args:
        - run: Run object containing the snapshot messages
    """             
    rag_citations, web_citations = _get_snapshot_citations(run)
    print("-"*100)
    if rag_citations:
        print(">> RAG Citations:\n", rag_citations)
        print("-"*100)

    if web_citations: 
        print("@@ Web Citations:\n", web_citations)
        print("-"*100)

def invoke_agent(agent: Agent, query: str, session_id: str)-> Run:
    """
    Invoke the agent and return the run
    Display the agent answer and the appropriate citations (if any)
    Args:
        - agent: Agent object
        - query: str, the question to ask the agent
        - session_id: str, the session id
    Returns:
        - Run object containing the snapshot messages
    """

    run_results = agent.invoke(query=query, session_id=session_id) 
    print("-"*100)
    print(">> Agent answer: ", run_results.get_final_state()["messages"][-1].content)
    display_citations(run_results)
    return run_results


## Testing Agent Runs

In [22]:
## Session 1: ask the three questions below in the same session (id_1)
## Session 1 (shows the agent's short-term memory): ask question 1 again
## Session 2: ask question 1 once more in a new session (id_2)
question_1 = "When Pokémon Gold and Silver was released?"
question_2 = "Which one was the first 3D platformer Mario game?"
question_3 = "Was Mortal Kombat X realeased for Playstation 5?"

### Test Session 1 
Running three different queries gives different results, each with the appropriate citations (RAG or WebSearchResult).

In [24]:
# Session 1, Question 1: "When Pokémon Gold and Silver was released?"
invoke_agent(games_agent, query=question_1, session_id="id_1")

[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
----------------------------------------------------------------------------------------------------
>> Agent answer:  Pokémon Gold and Silver was released in 1999 for the Game Boy Color.
----------------------------------------------------------------------------------------------------
>> RAG Citations:
 [{'citation': 'Pokémon Gold and Silver', 'matched_doc': GameDoc(Platform='Game Boy Color', Name='Pokémon Gold and Silver', YearOfRelease=1999, Description='Second-generation Pokémon games introducing new regions, Pokémon, and gameplay mechanics.')}]
----------------------------------------------------------------------------------------------

Run('145941e2-49f1-4298-8b57-b0aae90df215')

#### Note: 
- The agent went through 6 steps (message_prep, three llm_processor and two tool_executor), using the RAG tool and the evaluation tool (judge) to answer the question.

- The citation corresponds to the RAG extraction, filtered by the Eval Judge.
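
How many tools a run used can be read off the `[StateMachine]` log by counting the executed steps. The sketch below parses the log text printed above; `count_steps` is a hypothetical helper and not part of `lib.state_machine`.

```python
from collections import Counter

# Log copied from the run above.
LOG = """\
[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
"""

PREFIX = "[StateMachine] Executing step: "


def count_steps(log: str) -> Counter:
    """Count how many times each step name appears in 'Executing step' lines."""
    return Counter(
        line[len(PREFIX):].strip()
        for line in log.splitlines()
        if line.startswith(PREFIX)
    )


steps = count_steps(LOG)
print(steps)
```

Two `tool_executor` entries correspond to the retrieval and judge calls; a third one would indicate that the web search tool was used as well.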

In [25]:
# Session 1, Question 2: 'Which one was the first 3D platformer Mario game?'
invoke_agent(games_agent, query=question_2, session_id="id_1")

[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
----------------------------------------------------------------------------------------------------
>> Agent answer:  The first 3D platformer Mario game is **Super Mario 64**, which was released in 1996 for the Nintendo 64.
----------------------------------------------------------------------------------------------------
>> RAG Citations:
 [{'citation': 'Super Mario 64', 'matched_doc': GameDoc(Platform='Nintendo 64', Name='Super Mario 64', YearOfRelease=1996, Description="A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach.")}]
-------------------------------------------------

Run('287c6784-c666-4f61-b1c6-bb0364ad5290')

#### Note: 
- Same behavior as Session 1, Question 1, since this is a new question and the answer
is in the agent's knowledge base.

In [26]:
# Session 1, Question 3: 'Was Mortal Kombat X realeased for Playstation 5?'
invoke_agent(games_agent, query=question_3, session_id="id_1")

[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
----------------------------------------------------------------------------------------------------
>> Agent answer:  Mortal Kombat X was not originally released for the PlayStation 5; it was developed for PlayStation 4, Xbox One, and other platforms. However, it is playable on the PlayStation 5 through backward compatibility, which allows players to run the PlayStation 4 version of the game on the newer console. 

Additionally, players have reported that they can play Mortal Kombat X on PS5 without issues.
>> Web Search Result
>> Documents
--------------

Run('01884c35-6eb8-4257-ab45-b6e4b34b967e')

#### Notes:
- Same as the previous runs, but with more steps (states).
- The additional steps reflect the fact that the agent did not find the answer in the knowledge base and went to the web, using the web search tool this time.
- Note that the citations correspond to the type of search (@@ Web Citations).

In [27]:
## Session 1, Question 1 (bis): 'When Pokémon Gold and Silver was released?'
# Ask again the same question that was asked at the beginning of the same session (id_1)
invoke_agent(games_agent, question_1, "id_1")

[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
----------------------------------------------------------------------------------------------------
>> Agent answer:  Pokémon Gold and Silver was released in 1999 for the Game Boy Color.
----------------------------------------------------------------------------------------------------


Run('8feb37e0-c777-44fc-9501-94a4caef7c33')

#### Note:
- The agent went through only 2 steps this time (message_prep and llm_processor), without invoking any tool, when given "question_1", which had already been asked in this session.

- This demonstrates that the agent recognized the earlier question and used its short-term memory to respond rather than calling the tools.

In [28]:
## Session 2, Question 1: 'When Pokémon Gold and Silver was released?'
# Ask the same question but with a new session (id_2)
invoke_agent(games_agent, question_1, "id_2")

[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Executing step: tool_executor
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
----------------------------------------------------------------------------------------------------
>> Agent answer:  Pokémon Gold and Silver was released in 1999 for the Game Boy Color.
----------------------------------------------------------------------------------------------------
>> RAG Citations:
 [{'citation': 'Pokémon Gold and Silver', 'matched_doc': GameDoc(Platform='Game Boy Color', Name='Pokémon Gold and Silver', YearOfRelease=1999, Description='Second-generation Pokémon games introducing new regions, Pokémon, and gameplay mechanics.')}]
----------------------------------------------------------------------------------------------

Run('7ba64b37-a823-4645-b7c6-b8671d1dbbd0')

#### Note:
- As in the first run of the first session, the agent went through 6 steps, including two tool_executor steps (RAG and Judge).
- This demonstrates that there is no contamination between sessions: the agent acts as if it sees the question for the first time.
- The citation corresponds to the RAG extraction, filtered by the Eval Judge.
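
The session behavior observed above (memory within `id_1`, a clean slate for `id_2`) can be pictured as one message history per session id. The class below is a hypothetical illustration of that idea, not how `lib.agents` actually stores its state.

```python
from collections import defaultdict


class SessionMemory:
    """Hypothetical in-memory store: one message list per session id."""

    def __init__(self) -> None:
        self._messages: dict[str, list[dict]] = defaultdict(list)

    def append(self, session_id: str, role: str, content: str) -> None:
        self._messages[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._messages.get(session_id, []))


memory = SessionMemory()
memory.append("id_1", "user", "When Pokémon Gold and Silver was released?")
memory.append("id_1", "assistant", "Pokémon Gold and Silver was released in 1999 for the Game Boy Color.")

history_1 = memory.history("id_1")  # earlier exchange is available to the model
history_2 = memory.history("id_2")  # a new session starts empty
print(len(history_1), len(history_2))
```

Because the earlier answer is already in the `id_1` history, a model given that history can reply directly, while a model given the empty `id_2` history has to call the tools again.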

### (Optional) Advanced