# Agentic TripAdvisor

## Core Concepts

### Agents
Agents are autonomous decision-makers that use language models to determine which actions to take and in what order. Rather than following a fixed chain of operations, agents can reason about problems, choose appropriate tools, and adapt their approach based on intermediate results. Think of them as the "brain" that orchestrates your application's workflow.

### Tools
Tools are functions or capabilities that agents can invoke to perform specific tasks. These might include:
-   Database queries (like MongoDB operations)
-   API calls
-   Web searches
-   Calculations
-   File operations

Tools extend the LLM's capabilities beyond text generation, allowing it to interact with external systems. Each tool has a name, description, and defined input schema that helps the agent understand when and how to use it.
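
As a library-independent sketch of that idea, a tool can be modelled as a function bundled with a name, a description, and an input schema. The `Tool` class, its `call` method, and the `nightly_total` function are hypothetical names used only for illustration; frameworks such as LangChain ship their own tool abstractions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """A callable plus the metadata an agent needs to decide when to use it."""
    name: str
    description: str
    input_schema: dict[str, type]  # argument name -> expected type
    func: Callable[..., Any]

    def call(self, **kwargs: Any) -> Any:
        # Validate the arguments against the schema before running the function.
        for arg, expected in self.input_schema.items():
            if arg not in kwargs:
                raise ValueError(f"missing argument: {arg}")
            if not isinstance(kwargs[arg], expected):
                raise TypeError(f"{arg} must be {expected.__name__}")
        return self.func(**kwargs)


def nightly_total(price: float, nights: int) -> float:
    """Total cost of a stay, ignoring fees and taxes."""
    return price * nights


total_tool = Tool(
    name="nightly_total",
    description="Compute the total price of a stay from the nightly price and the number of nights.",
    input_schema={"price": float, "nights": int},
    func=nightly_total,
)

print(total_tool.call(price=80.0, nights=3))  # 240.0
```

The description and schema are what a model would read when choosing a tool; the validation step shows why a precise schema matters before any external system is touched.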


### Memory
Memory enables applications to maintain context across interactions. There are several types:

- Short-term memory: Maintains conversation history within a session (stored in-memory or temporarily)
- Long-term memory: Persists important information across sessions (often in databases like MongoDB)
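
The difference between the two can be sketched in a few lines. This is an illustration only: `SessionMemory` and `ProfileStore` are hypothetical names, and the dict inside `ProfileStore` merely stands in for a persistent store such as a MongoDB collection.

```python
from collections import deque


class SessionMemory:
    """Short-term memory: keeps only the most recent turns of one session."""

    def __init__(self, max_turns: int = 4):
        self.turns = deque(maxlen=max_turns)  # oldest turns are dropped automatically

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))


class ProfileStore:
    """Long-term memory: a dict standing in for a persistent database."""

    def __init__(self):
        self._profiles: dict[str, dict] = {}

    def save(self, user_id: str, **facts) -> None:
        self._profiles.setdefault(user_id, {}).update(facts)

    def load(self, user_id: str) -> dict:
        return dict(self._profiles.get(user_id, {}))


store = ProfileStore()

# Session 1: the conversation is short-lived, the extracted preferences are persisted.
session = SessionMemory(max_turns=2)
session.add("user", "I'd like a place in Barcelona")
session.add("assistant", "Great choice! What's your budget?")
session.add("user", "Up to 120 per night")
store.save("user-1", location="Barcelona", budget_max=120)

# Session 2: the history starts empty, but the stored profile is still available.
new_session = SessionMemory()
print(len(session.turns), len(new_session.turns))  # 2 0
print(store.load("user-1"))  # {'location': 'Barcelona', 'budget_max': 120}
```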


### Additional Key Concepts
**Chains**: Sequential operations where the output of one step feeds into the next. Simpler than agents but more predictable.

**Retrievers**: Components that fetch relevant documents from vector stores or databases based on semantic similarity.

**Embeddings**: Vector representations of text used for semantic search in MongoDB's vector search capabilities.

**Prompts**: Templates that structure inputs to the LLM, including system messages, few-shot examples, and dynamic variables.
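
The four concepts above fit together as a small pipeline. The sketch below is a toy, not how the notebook computes embeddings: `embed` counts words over a tiny fixed vocabulary as a stand-in for a real embedding model, and `VOCAB`, `DOCS`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math

VOCAB = ["beach", "quiet", "city", "family", "nightlife", "garden"]


def embed(text: str) -> list[float]:
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


DOCS = [
    "quiet house with garden for a family",
    "city loft close to nightlife",
    "beach bungalow steps from the beach",
]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Retriever: rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


PROMPT = "Answer using only this listing:\n{context}\n\nQuestion: {question}"


def build_prompt(question: str) -> str:
    """Chain: the retriever's output feeds the prompt template."""
    context = "\n".join(retrieve(question))
    return PROMPT.format(context=context, question=question)


print(retrieve("quiet place with a garden"))
# ['quiet house with garden for a family']
```

In a real application the string returned by `build_prompt` would be sent to the LLM as the final step of the chain.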

In [None]:
!pip install -q -r requirements.txt

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()

if not os.environ.get("MONGO_CONNECTION_STRING"):
    print("Connection string for MONGO is not set. Please check your .env file.")
else:
    print("MONGO_CONNECTION_STRING loaded successfully.")

if not os.environ.get("OPENAI_API_KEY"):
    print("API KEY for OPENAI is not set. Please check your .env file.")
else:
    print("OPENAI_API_KEY loaded successfully.")

if not os.environ.get("GROQ_API_KEY"):
    print("API key for Groq is not set. Please check your .env file.")
else:
    print("GROQ_API_KEY loaded successfully.")

# Do not print the secret values themselves: notebook output is saved with the file.

In [None]:
import pymongo

MONGO_CONNECTION_STRING = os.environ.get("MONGO_CONNECTION_STRING")
mongo_client = pymongo.MongoClient(MONGO_CONNECTION_STRING)

try:
    mongo_client.admin.command('ping')
    print("✅ Connected successfully!")
except Exception as e:
    print("❌ Connection failed:", e)

db = mongo_client["sample_airbnb"]
collection = db["listingsAndReviews"]


In [None]:
from openai import OpenAI

# model = "text-embedding-3-small"
embedding_model_ada_002 = "text-embedding-ada-002"
openai_client = OpenAI()

def get_embedding(text, embedding_model):
    """Generates vector embeddings for the given text."""

    embedding = openai_client.embeddings.create(input=[text], model=embedding_model).data[0].embedding
    return embedding

```python
# generate embeddings for descriptions
import time

for property_doc in collection.find(
        {"$and":[
            {"description": {"$exists": True , "$ne": ""}},
            {"embedding": {"$exists": False}}
        ]
        }, {"_id": 1, "description": 1}):
    text = property_doc.get("description", "")

    embedding = get_embedding(text, embedding_model_ada_002)
    if embedding:
        collection.update_one(
            {"_id": property_doc["_id"]},
            {"$set": {"embedding": embedding}}
        )
    else:
        print(f"⚠️ Skipped doc {property_doc['_id']} (no description or embedding failed)")

    time.sleep(0.2)
```

```python
from pymongo.operations import SearchIndexModel
import time

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "similarity": "cosine",
                "numDimensions": 1536
            }
        ]
    },
    name="vector_index",
    type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
# A search index is ready once it is reported as queryable
predicate = lambda index: index.get("queryable") is True

while True:
    indices = list(collection.list_search_indexes(result))
    if len(indices) and predicate(indices[0]):
        break
    time.sleep(5)
print(result + " is ready for querying.")
```
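
The polling loop above waits indefinitely if the index never becomes queryable. One possible guard is a generic polling helper with a timeout. This is a sketch under assumptions: `wait_until` and its parameters are hypothetical names, and `fake_index_ready` only simulates an index that becomes ready on the third poll.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for the index status: reports "ready" on the third poll.
polls = {"count": 0}


def fake_index_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until(fake_index_ready, timeout=1.0, interval=0.01))  # True
```

In the notebook, `check` could be a small function that lists the search index and applies `predicate` to the first entry, so a `False` return value signals that the index was not ready in time.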

In [None]:
from pydantic import BaseModel
from typing import Annotated, Sequence
from langchain_core.messages import BaseMessage
from enum import StrEnum
import operator

class Phase(StrEnum):
    DISCOVERY = "discovery"
    SEARCH = "search"
    SERACH = "search"  # alias of SEARCH; keeps the original misspelled name working
    REFINEMENT = "refinement"

class DiscoveryState(BaseModel):

    messages: Annotated[Sequence[BaseMessage], operator.add]


    # User preferences
    location: str | None
    travel_purpose: str | None
    budget_min: float | None
    budget_max: float | None
    property_type: str | None

    # Conversation control
    search_results: list[dict] | None
    current_phase: Phase
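
The `Annotated[..., operator.add]` annotation on `messages` marks that field as accumulating: when a node returns new messages they are added to the existing list, while the other fields are simply overwritten. The sketch below imitates that merge rule in plain Python. It is not LangGraph's implementation; `REDUCERS` and `apply_update` are hypothetical names, and strings stand in for message objects.

```python
import operator

# Fields with a reducer are combined with the old value; all others are overwritten.
REDUCERS = {"messages": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with `update` merged in, honouring per-field reducers."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: hi"], "location": None}
state = apply_update(state, {"messages": ["agent: where to?"], "location": "Barcelona"})

print(state["messages"])  # ['user: hi', 'agent: where to?']
print(state["location"])  # Barcelona
```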

### Hybrid search

**vector_score** measures semantic similarity between the listing and the user query, based on vector embeddings.

**metadata_score** reflects listing quality using review ratings, number of reviews, and host status.

The **$vectorSearch** stage finds the listings whose stored description embeddings are most similar to the user’s query embedding. By default it runs an Approximate Nearest Neighbor (ANN) search, which requires the ```numCandidates``` parameter: that many candidates are examined to find the top-K closest vectors quickly. If ```exact``` is set to true, the database instead runs an Exact Nearest Neighbor (ENN) search and computes the similarity between the query vector and every indexed embedding; ```numCandidates``` is then left out.
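
The two search modes differ only in which parameters the stage carries. The helper below is a sketch for illustration: `vector_search_stage` is a hypothetical name, and the default index name and path are assumed to match the index created earlier in this notebook.

```python
def vector_search_stage(
    query_vector: list[float],
    index: str = "vector_index",
    path: str = "embedding",
    limit: int = 5,
    exact: bool = False,
    num_candidates: int = 100,
) -> dict:
    """Build a $vectorSearch stage for ANN (default) or ENN (exact=True)."""
    stage = {
        "index": index,
        "path": path,
        "queryVector": query_vector,
        "limit": limit,
    }
    if exact:
        # ENN: every indexed vector is compared, so numCandidates is left out.
        stage["exact"] = True
    else:
        # ANN: numCandidates must be at least as large as limit.
        stage["numCandidates"] = max(num_candidates, limit)
    return {"$vectorSearch": stage}


ann = vector_search_stage([0.1, 0.2, 0.3])
enn = vector_search_stage([0.1, 0.2, 0.3], exact=True)
print(sorted(ann["$vectorSearch"]))
print(sorted(enn["$vectorSearch"]))
```

ANN is usually the practical choice on large collections; ENN is mainly useful on small collections or for checking how much recall the approximate search gives up.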

In [None]:
from langchain_core.tools import tool

@tool
def search_properties_hybrid(
        query_text: str,
        location: str | None = None,
        property_type: str | None = None,
        min_price: float | None = None,
        max_price: float | None = None,
        top_k: int = 5
) -> list[dict]:
    """
    Hybrid search combining vector similarity with metadata filters.
    """
    query_embedding = get_embedding(query_text, embedding_model_ada_002)

    match_conditions = []

    if location:
        match_conditions.append({
            "$or": [
                {"address.market": {"$regex": location, "$options": "i"}},
                {"address.country": {"$regex": location, "$options": "i"}},
                {"address.suburb": {"$regex": location, "$options": "i"}}
            ]
        })

    if property_type:
        match_conditions.append({"property_type": property_type})

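    # Compare against None so that a bound of 0 is not silently ignored
    if min_price is not None or max_price is not None:
        price_condition = {}
        if min_price is not None:
            price_condition["$gte"] = min_price
        if max_price is not None:
            price_condition["$lte"] = max_price
        match_conditions.append({"price": price_condition})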

    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",  # the index created earlier on the "embedding" field
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 200,
                "limit": 50
            }
        }
    ]

    if match_conditions:
        pipeline.append({"$match": {"$and": match_conditions}})

    pipeline.extend([
        {
            "$addFields": {
                "vector_score": {"$meta": "vectorSearchScore"},
                "metadata_score": {
                    "$add": [
                        {"$multiply": [{"$ifNull": ["$review_scores.review_scores_rating", 0]}, 0.2]},
                        {"$min": [{"$multiply": [{"$ifNull": ["$number_of_reviews", 0]}, 0.1]}, 10]},
                        {"$cond": [{"$eq": ["$host.host_is_superhost", True]}, 5, 0]}
                    ]
                }
            }
        },
        {
            "$addFields": {
                "final_score": {
                    "$add": [
                        {"$multiply": ["$vector_score", 0.6]},
                        {"$multiply": ["$metadata_score", 0.4]}
                    ]
                }
            }
        },
        {"$sort": {"final_score": -1}},
        {"$limit": top_k}
    ])

    # Build projection
    fields = ["_id", "name", "property_type", "bedrooms", "beds", "price", "address", "amenities", "review_scores",
              "final_score", "vector_score", "metadata_score"]
    projection = {field: 1 for field in fields}
    # append, not extend: extending a list with a dict would add only its key
    pipeline.append({"$project": projection})

    # Execute the search
    try:
        results = list(collection.aggregate(pipeline))
    except Exception as e:
        return [{"error": f"Vector search failed: {str(e)}"}]

    return results

In [None]:
from langchain.chat_models import init_chat_model

llm_discovery = init_chat_model("llama-3.1-8b-instant", model_provider="groq")
llm_extract_preferences = init_chat_model("openai/gpt-oss-120b", model_provider="groq")

In [None]:
from langchain_core.tools import tool

@tool
def vector_search(query_text: str, location: str | None = None, limit: int = 5) -> list[dict]:
    """
    Executes a vector similarity search in MongoDB Atlas using a defined search index.
    """
    # Build projection
    fields = ["_id", "name", "property_type", "bedrooms", "beds", "price", "address", "amenities", "review_scores"]
    projection = {field: 1 for field in fields}
    projection["score"] = {"$meta": "vectorSearchScore"}

    query_vector = get_embedding(query_text, embedding_model_ada_002)

    # Build aggregation pipeline
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_vector,
                "path": "embedding",
                "limit": limit,
                "numCandidates": 100
            }
        },
        {"$project": projection}
    ]

    # Add location filter if provided
    # case-insensitive substring search
    if location:
        pipeline.insert(1, {
            "$match": {
                "$or": [
                    {"address.market": {"$regex": location, "$options": "i"}},
                    {"address.country": {"$regex": location, "$options": "i"}},
                    {"address.suburb": {"$regex": location, "$options": "i"}}
                ]
            }
        })

    # Execute the search
    try:
        results = list(collection.aggregate(pipeline))
    except Exception as e:
        return [{"error": f"Vector search failed: {str(e)}"}]

    return results

In [None]:
DISCOVERY_SYSTEM_PROMPT = """You are a friendly and helpful AirBnB property discovery assistant.

Your goal is to understand the user's travel needs through natural conversation, then help them find the perfect property.

DISCOVERY PHASE GUIDELINES:
1. Ask questions naturally, one at a time (don't overwhelm with multiple questions)
2. Be conversational and warm, not robotic
3. Adapt questions based on previous answers
4. Once you have enough information to search, synthesize their preferences into a search query

KEY INFORMATION TO GATHER:
- Location (city, neighborhood, country)
- Travel purpose (work, leisure, family, etc.)
- Budget range per night
- Property preferences (entire place, private room, shared)
- Group composition (solo, couple, family, group size)
- Must-have amenities
- Desired vibe/atmosphere

CURRENT STATE: You are gathering information. When you have location + at least 3 other preference points, you can proceed to search.

Be enthusiastic and make the user excited about their trip!
"""

In [None]:
EXTRACTION_PROMPT  = """Based on the conversation, extract the following information in a structured format:

    EXTRACT (use "null" if not mentioned):
    - location: The city/country/neighborhood they want to stay
    - travel_purpose: Why they're traveling (work, leisure, family visit, etc.)
    - budget_min: Minimum price per night (number only)
    - budget_max: Maximum price per night (number only)
    - property_type: Type preference (entire home/apt, private room, shared room)
    - group_composition: Who's traveling (solo, couple, family with X kids, group of X, etc.)
    - amenities: List of must-have amenities mentioned
    - vibe_preference: Desired atmosphere/vibe of property or neighborhood

    Return ONLY a JSON object with these fields. Be precise and extract only explicitly mentioned information.
"""

In [None]:
from langchain_core.messages import SystemMessage

def discovery_agent(state: DiscoveryState) -> dict:
    """
    Agent node that asks questions and gathers preferences.

    Returns a partial state update (the new message and the current phase).
    """
    messages = [SystemMessage(content=DISCOVERY_SYSTEM_PROMPT)] + list(state.messages)
    current_phase = state.current_phase

    # Check if we have enough information
    info_collected = sum([
        bool(state.location),
        bool(state.travel_purpose),
        bool(state.property_type),
        bool(state.budget_min),
        bool(state.budget_max),
    ])

    # Add context about what we know
    context_msg = f"\nInformation collected so far: {info_collected}/5 key points."
    if state.location:
        context_msg += f"\n- Location: {state.location}"
    if state.travel_purpose:
        context_msg += f"\n- Purpose: {state.travel_purpose}"
    if state.property_type:
        context_msg += f"\n- Property type: {state.property_type}"
    if state.budget_min:
        context_msg += f"\n- Budget min: {state.budget_min}"
    if state.budget_max:
        context_msg += f"\n- Budget max: {state.budget_max}"

    print("info collected:", info_collected)

    if info_collected >= 3:
        context_msg += "\n\nYou have enough information to search! Synthesize their preferences and offer to find properties."
        current_phase = Phase.SERACH

    messages.append(SystemMessage(content=context_msg))

    # Generate response
    response = llm_discovery.invoke(messages)

    return {
        "messages": [response],
        "current_phase": current_phase
    }

In [None]:
import json

from langchain_core.messages import HumanMessage

def extract_preferences(state: DiscoveryState) -> dict:
    """
    Extract structured preferences from the conversation using an LLM.

    Returns a partial state update, or an empty update if the reply cannot be parsed.
    """
    messages = list(state.messages)

    result = llm_extract_preferences.invoke(messages + [HumanMessage(content=EXTRACTION_PROMPT)])

    # Parse the JSON response; an unparsable reply leaves the state unchanged
    try:
        extracted = json.loads(result.content)
    except (json.JSONDecodeError, TypeError):
        return {}

    if not isinstance(extracted, dict):
        return {}

    return {
        "location": extracted.get("location"),
        "travel_purpose": extracted.get("travel_purpose"),
        "property_type": extracted.get("property_type"),
        "budget_min": extracted.get("budget_min"),
        "budget_max": extracted.get("budget_max")
    }
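
The parsing step above expects the reply to be bare JSON. Models sometimes wrap the object in extra text, and the extraction prompt asks for the string "null" for missing fields, so a more tolerant parser may help. The following is a sketch; `parse_preferences` is a hypothetical helper, not part of the notebook.

```python
import json
import re


def parse_preferences(reply: str) -> dict:
    """Parse an LLM reply into a dict, tolerating surrounding text and "null" strings."""
    # Keep only the outermost {...} span, which drops any text around the object.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if not match:
        return {}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # The prompt asks for "null" when a field is missing; map that string to None.
    return {key: (None if value == "null" else value) for key, value in data.items()}


reply = 'Sure! Here is the JSON:\n{"location": "Barcelona", "budget_max": 120, "travel_purpose": "null"}'
print(parse_preferences(reply))
# {'location': 'Barcelona', 'budget_max': 120, 'travel_purpose': None}
```

Mapping "null" to `None` matters here because the discovery node counts a field as collected whenever its value is truthy, and the non-empty string "null" would count.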

In [None]:
from langgraph.graph import StateGraph, END

def create_discovery_graph():
    """Build the LangGraph workflow for new user discovery."""

    workflow = StateGraph(DiscoveryState)

    # Add nodes
    workflow.add_node("discovery", discovery_agent)
    workflow.add_node("extract_preferences", extract_preferences)


    workflow.set_entry_point("discovery")
    workflow.add_edge("discovery", "extract_preferences")
    workflow.add_edge("extract_preferences", END)

    return workflow.compile()
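
The graph above always runs discovery, then extraction, then stops. A routing function is one way to let the state decide what happens next. The sketch below is an assumption-laden illustration: `route_after_extraction` and the "search" node are hypothetical, and the state is a plain dict to keep the example self-contained. The phase is compared with the string "search", which is the value of the search member of `Phase`.

```python
def route_after_extraction(state: dict) -> str:
    """Return the label of the next step based on the collected state."""
    if state.get("current_phase") == "search" and state.get("location"):
        return "search"  # hypothetical node that would call a search tool
    return "end"         # stop and wait for the next user message


# In the graph, this could replace the fixed edge to END, for example:
# workflow.add_conditional_edges(
#     "extract_preferences", route_after_extraction, {"search": "search", "end": END}
# )

print(route_after_extraction({"current_phase": "search", "location": "Barcelona"}))  # search
print(route_after_extraction({"current_phase": "discovery", "location": None}))      # end
```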

In [None]:
from langchain_core.messages import AIMessage

app = create_discovery_graph()

# Initialize state
initial_state = DiscoveryState(
    messages = [],
    location = None,
    property_type= None,
    travel_purpose = None,
    budget_min = None,
    budget_max = None,
    search_results =  None,
    current_phase = Phase.DISCOVERY
)

# Run the conversation
print("🏠 AirBnB Discovery Agent")
print("=" * 50)
print("Agent: Hi! I'm here to help you find the perfect place to stay. Where would you like to go?\n")

state = initial_state
while True:
        # Get user input
        # example: "Hi! I'm looking for a place to stay in Barcelona"
        user_input = input("You: ").strip()

        if user_input.lower() in ["quit", "exit"]:
            print("\nAgent: Thanks for chatting! Your preferences have been saved. Happy travels! 🌍")
            break

        if not user_input:
            continue

        # Add user message to state
        state.messages.append(HumanMessage(content=user_input))

        # Run agent
        try:
            # Execute graph
            result = app.invoke(state)

            # invoke() returns the full final state: "messages" already holds the
            # whole history (input plus new replies), so it must not be appended again
            state = state.model_copy(update=result)

            print("state current phase:", state.current_phase)

            # Display agent response
            last_message = state.messages[-1]
            if isinstance(last_message, AIMessage):
                print(f"\nAgent: {last_message.content}\n")

        except Exception as e:
            print(f"\nAgent: I encountered an error: {str(e)}")
            print("Let's continue our conversation.\n")


print(result)

In [None]:
from IPython.display import Image, display

graph = create_discovery_graph()
print(graph.get_graph().draw_mermaid())
# try at https://mermaid.live/
#display(Image(graph.get_graph().draw_mermaid_png()))

## Exercise 1

Modify the state of the LangGraph agent that searches listings to include new contextual search criteria.

Add the following fields to the agent’s state and ensure they are incorporated into the exploration step:
group_composition, amenities, vibe_preference.

## Exercise 2

Build an intelligent movie search agent using the sample_mflix MongoDB dataset.
The agent should use Retrieval-Augmented Generation (RAG) and short-term memory to help users find a specific movie through natural-language dialogue. The user interacts conversationally. The agent should remember the context from previous turns, retrieve relevant documents from MongoDB, combine retrieval results with LLM reasoning, and produce a refined natural-language answer.

## Exercise 3

Design an intelligent agent that converts natural-language questions into executable SQL queries, retrieves results from a relational database, and stores user feedback on the generated queries and responses in a MongoDB collection.

The agent should execute the query and summarize the results in natural language, and allow the user to rate the response or correct the generated query. Add conditional edges: if a question is found in the long-term history, the agent should use the history database to return the results.


