### SG Transport Query Agent - Agentic Workflow Demo

This project demonstrates the design and implementation of an agentic workflow for answering user queries about Singapore public transport.
The system is built with LangGraph, which models reasoning, planning, and tool execution as separate steps.

#### 1. Problem Statement, Scope

##### Objective
Design an **agent-based system** that answers natural-language questions about Singapore public bus transport. For each query, the agent dynamically decides **what information it needs, which tools to invoke, and how to present the results**, rather than relying on a single LLM call.

##### Agent Responsibilities
- Interpret user queries
- Identify user intent and relevant entities (e.g. bus stop)
- Retrieve transport data from structured datasets
- Incorporate contextual constraints such as time of day, weather conditions, and holidays/events
- Generate the final response with an LLM

#### 2. Tools, Data Source Selection

##### Primary Data Source: LTA DataMall Datasets
The agent uses JSON datasets provided by LTA DataMall to simulate real-time public transport information. In a production system, these dataset readers can be replaced with authenticated, rate-limited APIs without changing the agent workflow.

##### Selected Datasets
- Bus stops dataset - `BusStops.json`
- Bus arrival dataset - `BusArrival.json`

**The BusArrival dataset represents a snapshot of arrival information for a single queried bus stop, mirroring the behavior of the real LTA Bus Arrival API.**
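
The tools later in this notebook read only a handful of fields from each file. As a rough sketch, the shapes they assume look like the following. The field names come from the tool code; the specific values are illustrative placeholders, not real DataMall data.

```python
# Minimal shapes the dataset tools rely on.
# Field names mirror what the tools read; all values are illustrative placeholders.
bus_stops_sample = {
    "value": [
        {
            "BusStopCode": "01012",
            "RoadName": "Victoria St",
            "Description": "Hotel Grand Pacific",
        }
    ]
}

bus_arrival_sample = {
    "BusStopCode": "01012",
    "Services": [
        {
            "ServiceNo": "176",
            "Operator": "SMRT",
            "NextBus": {
                "EstimatedArrival": "2024-08-22T15:27:00+08:00",
                "Load": "SEA",
                "Feature": "WAB",
                "Type": "DD",
            },
            # an empty EstimatedArrival means no further bus is known
            "NextBus2": {"EstimatedArrival": "", "Load": "", "Feature": "", "Type": ""},
        }
    ],
}

# Collect only the arrivals that carry a timestamp, as the arrival tool does.
arrivals = [
    service[key]["EstimatedArrival"]
    for service in bus_arrival_sample["Services"]
    for key in ("NextBus", "NextBus2", "NextBus3")
    if service.get(key) and service[key].get("EstimatedArrival")
]
print(arrivals)
```

Because the snapshot holds a single stop, any lookup for a different `BusStopCode` returns an error from the arrival tool instead of data.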

#### 3. Agentic Workflow
The system is implemented as a **stateful, agentic workflow** rather than a single LLM prompt. Each step in the workflow has a single responsibility, and transitions between steps are explicit.

##### Agent State 
The agent operates on a shared state object containing:
- **user_query** – raw input from the user
- **intent** – inferred intent
- **entities** – extracted entities (bus stop, service number)
- **context** – time, weather, holiday constraints
- **planned_tools** – tools selected for execution
- **tool_results** – outputs from dataset-based tools
- **final_response** – LLM-generated answer

##### Workflow Nodes
1. **Input Interpretation**
Analyzes the user query to infer intent and extract entities.

2. **Context Enrichment**
Adds non-user signals such as time of day, weather, and events.

3. **Planning**
Determines which tools need to be invoked and in what order.

4. **Tool Execution**
Executes dataset-based tools and stores results in state.

5. **Response Synthesis**
Uses an LLM to generate a grounded, user-facing response based on:
- Tool outputs
- Contextual constraints
- The original query
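
To make the single-responsibility idea concrete, here is a dependency-free sketch in which each node is a plain function that takes the state and returns an updated copy. The node bodies are stubs and `run_pipeline` is a hypothetical helper used only for illustration; the notebook itself wires the real nodes together with LangGraph.

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]


def interpret(state: State) -> State:
    # Stub: the real node infers intent and extracts entities from the query.
    return {**state, "intent": "bus_arrival_query"}


def plan(state: State) -> State:
    # Planning reads only what interpretation wrote, and writes only the plan.
    planned = ["bus_arrival_api"] if state["intent"] == "bus_arrival_query" else []
    return {**state, "planned_tools": planned}


def run_pipeline(state: State, nodes: list[Node]) -> State:
    # Transitions are explicit: nodes run in the listed order, one after another.
    for node in nodes:
        state = node(state)
    return state


final = run_pipeline({"user_query": "Next bus at stop 01012"}, [interpret, plan])
print(final["planned_tools"])
```

Each function returns a new dict rather than mutating its input, which keeps every step easy to test in isolation.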

## IMPORTS AND SETUP

In [1]:
# IMPORTS
import json
import subprocess
from typing import TypedDict, Any, Optional
from pathlib import Path
from langgraph.graph import StateGraph, END
from datetime import datetime


## AGENT STATE DEFINITION

In [2]:
# AGENT STATE DEFINITION

# using typed dict to make value types consistent
class AgentState(TypedDict):
    user_query: str
    intent: str
    entities: dict[str, Any]
    context: dict[str, Any]
    planned_tools: list[str]
    tool_results: dict[str, Any]
    final_response: str

## DATASET LOADER 

In [3]:
# LOAD DATASET HELPER FUNCTION
DATA_DIR_PATH = Path("dataset")

# load json dataset from disk
def load_json_dataset(filename: str) -> dict:
    path = DATA_DIR_PATH / filename
    with open(path, "r") as f:
        return json.load(f)

## DATASET BASED TOOLS

In [4]:
# DATASET BASED TOOLS

# BUS STOP SEARCH TOOL
def search_bus_stop(query: str) -> Optional[dict[str, Any]]:
    data = load_json_dataset("BusStops.json")
    stops = data.get("value", [])

    query_lower = query.lower()

    for stop in stops:
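        desc = stop.get("Description", "").lower()
        road = stop.get("RoadName", "").lower()

        # skip empty fields: "" is a substring of every query and would match the first stop
        if (desc and desc in query_lower) or (road and road in query_lower):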
            return {
                "bus_stop_code": stop["BusStopCode"],
                "description": stop["Description"],
                "road_name": stop["RoadName"]
            }

    return None



# BUS ARRIVAL LOOKUP TOOL

def get_bus_arrival(bus_stop_code: str, service_no: Optional[str] = None) -> dict[str, Any]:
    data = load_json_dataset("BusArrival.json")

    if data.get("BusStopCode") != bus_stop_code:
        return {"error": "Bus stop code not found in dataset"}

    services = data.get("Services", [])

    if service_no:
        services = [
            s for s in services
            if s.get("ServiceNo") == service_no
        ]

    if not services:
        return {"error": "No matching services found"}

    results = []

    for service in services:
        entry = {
            "service_no": service.get("ServiceNo"),
            "operator": service.get("Operator"),
            "next_buses": []
        }

        for key in ["NextBus", "NextBus2", "NextBus3"]:
            bus = service.get(key)
            if bus and bus.get("EstimatedArrival"):
                entry["next_buses"].append({
                    "estimated_arrival": bus["EstimatedArrival"],
                    "load": bus.get("Load"),
                    "feature": bus.get("Feature"),
                    "type": bus.get("Type")
                })

        results.append(entry)

    return {
        "bus_stop_code": bus_stop_code,
        "services": results
    }

## CONTEXT TOOLS

In [5]:
# CONTEXT TOOLS
# TOOLS TO MODEL CONTEXT

# TIME OF DAY
def get_time_context() -> dict[str, Any]:
    hour = datetime.now().hour

    if 7 <= hour <= 10 or 17 <= hour <= 20:
        return {"time_of_day": "peak"}
    elif hour >= 22 or hour <= 5:
        return {"time_of_day": "late_night"}
    else:
        return {"time_of_day": "off_peak"}

# WEATHER CONTEXT
# simulated, hardcoded for now
def get_weather_context() -> dict[str, Any]:
    return {"weather": "clear"}

# event/holiday context
def get_event_context() -> dict[str, Any]:
    return {"is_holiday": False}


## LLM UTILITIES

In [6]:
# LLM UTILITIES

# ADD LLM BASED RESPONSE GENERATION
def build_response_prompt(state: AgentState) -> str:
    return f"""
You are a transport assistant for Singapore public transport.

User query:
{state['user_query']}

Context:
- Time of day: {state['context'].get('time_of_day')}
- Weather: {state['context'].get('weather')}
- Holiday: {state['context'].get('is_holiday')}

Bus arrival data:
{state['tool_results'].get('bus_arrival')}

Generate a helpful, concise response for the user.
If data is missing, explain the uncertainty.
"""

# use ollama - local llm
# can also use ollama library - ollama.generate()
def call_llm(prompt: str) -> str:
    result = subprocess.run(
        ["ollama", "run", "mistral"],
        input=prompt,
        text=True,
        capture_output=True
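    )
    # surface failures (e.g. ollama not installed or model not pulled) instead of returning ""
    if result.returncode != 0:
        raise RuntimeError(f"ollama failed: {result.stderr.strip()}")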
    return result.stdout.strip()
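
The comment above mentions the `ollama` Python package as an alternative to shelling out. A minimal sketch of that variant follows, assuming the package is installed and a local Ollama server is running. The `generate` parameter is a hypothetical hook added here only so the function can be exercised without a model.

```python
from typing import Any, Callable, Optional


def call_llm_via_library(
    prompt: str,
    model: str = "mistral",
    generate: Optional[Callable[..., Any]] = None,
) -> str:
    if generate is None:
        # Imported lazily so this module still loads when the package is absent.
        import ollama

        generate = ollama.generate
    result = generate(model=model, prompt=prompt)
    # The generated text is read from the "response" field of the result.
    return result["response"].strip()


# Stand-in for ollama.generate, used here so the example runs without a model.
def fake_generate(model: str, prompt: str) -> dict[str, str]:
    return {"response": f"  [{model}] echo: {prompt}  "}


reply = call_llm_via_library("hello", generate=fake_generate)
print(reply)
```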


## AGENT NODES

In [7]:
# AGENT NODES 

# node to interpret user input
def interpret_input(state: AgentState) -> AgentState:
    query = state["user_query"]
    query_lower = query.lower()
    # strip surrounding punctuation so tokens like "01012?" or "176," still match
    tokens = [t.strip(".,?!") for t in query.split()]

    entities = {}

    # intent
    intent = "bus_arrival_query" if "bus" in query_lower else "unknown"

    # bus stop code
    for token in tokens:
        if token.isdigit() and len(token) == 5:
            entities["bus_stop_code"] = token

    # bus stop name fallback
    if "bus_stop_code" not in entities:
        stop = search_bus_stop(query)
        if stop:
            entities["bus_stop_code"] = stop["bus_stop_code"]
            entities["bus_stop_description"] = stop["description"]

    # service number (1 to 3 digits)
    for token in tokens:
        if token.isdigit() and len(token) <= 3:
            entities["service_no"] = token

    return {
        **state,
        "intent": intent,
        "entities": entities
    }


# node to enrich context - time of day, weather, etc.
def enrich_context(state: AgentState) -> AgentState:
    context = {}
    context.update(get_time_context())
    context.update(get_weather_context())
    context.update(get_event_context())

    return {
        **state,
        "context": context
    }

# planning node - decides which tools/APIs to invoke
# the decision is based on the interpreted intent and entities
def plan_tools(state: AgentState) -> AgentState:
    planned_tools = []

    if state["intent"] == "bus_arrival_query":
        planned_tools.append("bus_arrival_api")

    return {
        **state,
        "planned_tools": planned_tools
    }

# tool execution node
def execute_tools(state: AgentState) -> AgentState:
    # print("DEBUG entities:", state["entities"])

    tool_results = {}

    if "bus_arrival_api" in state["planned_tools"]:
        bus_stop_code = state["entities"].get("bus_stop_code")
        service_no = state["entities"].get("service_no")

        # print("DEBUG bus_stop_code:", bus_stop_code)

        if bus_stop_code:
            tool_results["bus_arrival"] = get_bus_arrival(
                bus_stop_code=bus_stop_code,
                service_no=service_no
            )
        else:
            tool_results["bus_arrival"] = {"error": "Missing bus stop code"}

    # print("DEBUG tool_results:", tool_results)

    return {
        **state,
        "tool_results": tool_results
    }


# final response node
def synthesize_response(state: AgentState) -> AgentState:
    prompt = build_response_prompt(state)
    response = call_llm(prompt)

    return {
        **state,
        "final_response": response
    }

## GRAPH CONSTRUCTION

In [8]:
# LANGGRAPH CONSTRUCTION

graph = StateGraph(AgentState)

graph.add_node("interpret_input", interpret_input)
graph.add_node("enrich_context", enrich_context)
graph.add_node("plan_tools", plan_tools)
graph.add_node("execute_tools", execute_tools)
graph.add_node("synthesize_response", synthesize_response)

graph.set_entry_point("interpret_input")

graph.add_edge("interpret_input", "enrich_context")
graph.add_edge("enrich_context", "plan_tools")
graph.add_edge("plan_tools", "execute_tools")
graph.add_edge("execute_tools", "synthesize_response")
graph.add_edge("synthesize_response", END)

agent = graph.compile()


## USER QUERIES - MULTIPLE INTENTS

In [9]:
# SIMULATE USER QUERIES

simulated_queries = [
    # simple arrival query
    "When is the next bus arriving at Hotel Grand Pacific?",
    "When is bus 176 arriving at Hotel Grand Pacific?",

    # location reference
    "Next bus arrival on Victoria Street",

    # query made during/about peak hours
    "How long will I wait for the next bus at Hotel Grand Pacific during rush hour?",

    # accessibility 
    "Is the next bus at Hotel Grand Pacific wheelchair accessible?",

    # bus capacity
    "Is the next bus crowded at Hotel Grand Pacific?",

    "Are there any buses arriving late night at Hotel Grand Pacific?",

    # missing stop reference, explicit stop code, loose phrasing
    "When is the next bus arriving at this stop?",
    "Next bus arrival at stop 01012",
    "Can you tell me when the next bus comes near Hotel Grand Pacific?"
]


## EXECUTION - RUN SIMULATION

In [10]:
# EXECUTION
# INITIALIZE AGENT STATE PER USER
# INVOKE AGENT, CAPTURE RESPONSE

def run_simulation(agent, queries):
    results = []

    for i, query in enumerate(queries, start=1):
        initial_state: AgentState = {
            "user_query": query,
            "intent": "",
            "entities": {},
            "context": {},
            "planned_tools": [],
            "tool_results": {},
            "final_response": ""
        }

        result = agent.invoke(initial_state)

        results.append({
            "user_id": i,
            "query": query,
            "response": result["final_response"]
        })

    return results

# run sim, print results

simulation_results = run_simulation(agent, simulated_queries)

for r in simulation_results:
    print(f"User {r['user_id']} Query:")
    print(r["query"])
    print("\nAgent Response:")
    print(r["response"])
    print("-" * 60)

User 1 Query:
When is the next bus arriving at Hotel Grand Pacific?

Agent Response:
The next bus to arrive at Hotel Grand Pacific (Bus Stop Code 01012) during off-peak hours with clear weather conditions is:

1. Bus Service 176 (Operator SMRT), estimated arrival at 3:27 PM.
2. Bus Service 30 (Operator SBST), estimated arrival at 3:26 PM.
3. Bus Service 78 (Operator TTS), estimated arrival at 3:29 PM.

All three buses are scheduled to arrive with low passenger load (SEA). Please note that these are estimates and the actual arrival times may vary.
------------------------------------------------------------
User 2 Query:
When is bus 176 arriving at Hotel Grand Pacific?

Agent Response:
The next Bus 176 from SMRT is estimated to arrive at Hotel Grand Pacific (Bus Stop 01012) during off-peak hours, around 3:27 PM on August 22nd. Another bus is expected at approximately 3:42 PM and another one at about 3:49 PM. Please note that these are estimated times, so slight delays might occur due to

#### DEPLOYMENT CONSIDERATIONS, REAL WORLD INTEGRATIONS
In a production system, I'd make the following changes:

##### 1. Replace Static Datasets with Live APIs
- Dataset-backed tools would be replaced with authenticated LTA DataMall API clients
- Tool interfaces (`get_bus_arrival`, `search_bus_stop`) would remain unchanged
- Only the underlying data access layer would change
- The workflow itself would stay API-agnostic
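
One way to keep the tool interface stable is to inject the data access function. The sketch below is an assumption about how that could look, not the notebook's implementation: `make_bus_arrival_tool`, `fetch_from_file`, and `fetch_from_api` are hypothetical names, and the endpoint URL (passed in by the caller) and the `AccountKey` header should be checked against the current LTA DataMall documentation before use.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any, Callable

Fetcher = Callable[[str], dict[str, Any]]


def fetch_from_file(path: Path) -> Fetcher:
    # Dataset-backed access: ignores the code and returns the stored snapshot.
    def fetch(bus_stop_code: str) -> dict[str, Any]:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    return fetch


def fetch_from_api(base_url: str, account_key: str) -> Fetcher:
    # Live access (assumed request shape): query parameter plus an API key header.
    def fetch(bus_stop_code: str) -> dict[str, Any]:
        query = urllib.parse.urlencode({"BusStopCode": bus_stop_code})
        request = urllib.request.Request(
            f"{base_url}?{query}",
            headers={"AccountKey": account_key, "accept": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)

    return fetch


def make_bus_arrival_tool(fetch: Fetcher) -> Callable[[str], dict[str, Any]]:
    # The tool's signature and return shape do not depend on where data comes from.
    def get_bus_arrival(bus_stop_code: str) -> dict[str, Any]:
        data = fetch(bus_stop_code)
        if data.get("BusStopCode") != bus_stop_code:
            return {"error": "Bus stop code not found"}
        return {"bus_stop_code": bus_stop_code, "services": data.get("Services", [])}

    return get_bus_arrival


# Exercise the tool with an in-memory fetcher; no file or network is needed.
tool = make_bus_arrival_tool(
    lambda code: {"BusStopCode": code, "Services": [{"ServiceNo": "176"}]}
)
result = tool("01012")
print(result)
```

Swapping `fetch_from_file(...)` for `fetch_from_api(...)` changes only the argument to `make_bus_arrival_tool`, so the graph nodes that call the tool stay untouched.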

##### 2. Agent Service Architecture
The agent would be deployed as a **stateless backend service** (e.g. a REST API), with one agent invocation per request.

Each request would:
- Initialize a fresh agent state
- Execute the LangGraph workflow
- Return the final LLM-generated response

Stateless execution enables:
- Horizontal scaling
- Simpler failure recovery
- Easier observability
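
The per-request flow above can be sketched as a single handler function. This is a minimal illustration under stated assumptions: `handle_query` and `EchoAgent` are hypothetical names, and the stub agent stands in for the compiled LangGraph workflow. Any web framework could wrap the handler in an endpoint.

```python
from typing import Any, Protocol


class Agent(Protocol):
    def invoke(self, state: dict[str, Any]) -> dict[str, Any]: ...


def handle_query(agent: Agent, query: str) -> dict[str, str]:
    # A fresh state per request: nothing is shared between calls.
    state: dict[str, Any] = {
        "user_query": query,
        "intent": "",
        "entities": {},
        "context": {},
        "planned_tools": [],
        "tool_results": {},
        "final_response": "",
    }
    result = agent.invoke(state)
    # Return a JSON-serializable payload for the API layer.
    return {"query": query, "response": result["final_response"]}


class EchoAgent:
    # Stub with the same invoke() shape as the compiled graph.
    def invoke(self, state: dict[str, Any]) -> dict[str, Any]:
        return {**state, "final_response": f"echo: {state['user_query']}"}


payload = handle_query(EchoAgent(), "Next bus at stop 01012")
print(payload)
```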

##### 3. Performance and Scaling
To support higher traffic:
- Frequently accessed bus stop data can be cached
- Arrival data can be reused for short time windows

Planner-driven tool invocation ensures that:
- Unnecessary external calls are avoided
- Latency remains predictable
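
Reusing arrival data for a short window could be done with a small time-to-live cache. The following is a sketch, not part of the notebook: `TTLCache` is a hypothetical helper, the 30-second window is an arbitrary assumption, and the injectable `clock` exists only so the expiry logic can be demonstrated without waiting.

```python
import time
from typing import Any, Callable


class TTLCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        # Reuse the stored value while it is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = loader()
        self._entries[key] = (now, value)
        return value


# Demonstrate expiry with a fake clock instead of real waiting.
fake_now = [0.0]
calls: list[float] = []


def load_arrivals() -> dict[str, Any]:
    calls.append(fake_now[0])
    return {"bus_stop_code": "01012", "fetched_at": fake_now[0]}


cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
first = cache.get_or_load("01012", load_arrivals)
fake_now[0] = 10.0
second = cache.get_or_load("01012", load_arrivals)  # within the window: reused
fake_now[0] = 45.0
third = cache.get_or_load("01012", load_arrivals)  # window elapsed: reloaded
print(len(calls))
```

Bus stop metadata changes rarely, so the same helper with a much longer TTL would suit the stop lookup as well.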