# Financial AI Agent
***
**An AI-powered conversational assistant that helps retail banking customers understand their financial behavior through natural language queries.**
***

This project implements a production-grade Financial AI Agent that:
* Understands natural language financial questions
* Executes queries against transaction data with accurate calculations
* Explains answers in simple, conversational language
* Logs complete reasoning trails for compliance and observability
***
⚠️ Note on Data:
* transactions.csv contains partial, synthetically generated data for demonstration purposes only. It is not real banking data. In production, this would connect to actual transaction databases.

In [1]:
# import project_structure as ps
# ps.show_project_structure()

In [2]:
# ============================================================================
# STANDARD LIBRARY IMPORTS
# ============================================================================

import os
import sys
import json
from pathlib import Path
from datetime import date, datetime, timedelta
from typing import Dict, Any, List, Optional
from pprint import pprint

# ============================================================================
# THIRD-PARTY IMPORTS
# ============================================================================

import pandas as pd
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langgraph.graph import StateGraph

# ============================================================================
# PROMPTS (LLM-1 & LLM-2)
# ============================================================================

from prompts.llm1_prompt import OPTIMIZED_ROUTER_SYSTEM_PROMPT
from prompts.llm2_prompt import llm2_prompt_builder, BASE_LLM2_SYSTEM_PROMPT

# print("✅ Prompts imported successfully")

# ============================================================================
# SCHEMAS & MODELS
# ============================================================================

from schemas.router_models import (
    GraphState,
    RouterOutput,
    ConversationSummary,
    PreferenceEntry,
    ExecutionResult,
    BackofficeLog,
    DataSources,
    ClarificationStep
)

from schemas.executor_models_llm2 import (
    TransactionQuerySpec,
    TransactionQueryResult,
    TransactionRecord,
)

# print("✅ Schemas & models imported successfully")

# ============================================================================
# GRAPH DEFINITION (LangGraph Nodes & Edges)
# ============================================================================

from graph_definition import (
    build_graph,
    router_node,
    vague_handler_node,
    executor_node,
    summary_update_node,
    build_router_payload,
    build_executor_payload
)

# print("✅ Graph definition imported successfully")

# ============================================================================
# TOOLS
# ============================================================================

from schemas.transactions_tool import (
    query_transactions_lc_tool, 
    query_transactions_tool
)

# print("✅ Tools imported successfully")

# ============================================================================
# BACK-OFFICE LOGGING
# ============================================================================

import backoffice_logging as bolog

# print("✅ Back-office logging imported successfully")

# ============================================================================
# TESTS
# ============================================================================

# import tests.llm1_tests as tst_llm1
# import tests.llm2_tests as tst_llm2
# import tests.pipeline_no_rag_tests as tst_no_rag_pipeline

# print("✅ Test modules imported successfully")

# Configurations 

In [3]:
# Enable autoreload for development
%load_ext autoreload
%autoreload 2

In [4]:
"""
Create prompts/__init__.py
Makes prompts/ directory a proper Python package
"""

from pathlib import Path

# Define the complete content
INIT_CONTENT = """\"\"\"
Prompt modules for LLM-1 (Router & Clarifier) and LLM-2 (Executor)
\"\"\"

from .llm1_prompt import OPTIMIZED_ROUTER_SYSTEM_PROMPT
from .llm2_prompt import llm2_prompt_builder, BASE_LLM2_SYSTEM_PROMPT

__all__ = [
    'OPTIMIZED_ROUTER_SYSTEM_PROMPT',
    'llm2_prompt_builder',
    'BASE_LLM2_SYSTEM_PROMPT',
]
"""

# Create the file
project_root = Path.cwd()
init_file = project_root / "prompts" / "__init__.py"

print("=" * 80)
print("Creating prompts/__init__.py")
print("=" * 80)

# Write file
with open(init_file, 'w', encoding='utf-8') as f:
    f.write(INIT_CONTENT)

# # Verify
# if init_file.exists():
#     file_size = init_file.stat().st_size
#     print(f"\n✅ SUCCESS: File created")
#     print(f"   Location: {init_file}")
#     print(f"   Size: {file_size} bytes")
    
#     # Show contents
#     print("\nüìÑ File Contents:")
#     print("-" * 80)
#     with open(init_file, 'r', encoding='utf-8') as f:
#         print(f.read())
#     print("-" * 80)
    
#     print("\n" + "=" * 80)
#     print("✅ DONE! Now restart kernel and re-run imports")
#     print("=" * 80)
# else:
#     print("\n❌ ERROR: File was not created")


Creating prompts/__init__.py


In [5]:
# Load autoreload extension
%load_ext autoreload

# Set autoreload mode
# Mode 2: Reload all modules (except those excluded) every time before executing code
%autoreload 2
    
print("✅ Autoreload enabled (mode 2)")
print("   All modules will be automatically reloaded when changed")
print("\n📝 Watched modules:")
print("   • All modules in schemas/")
print("   • All modules in prompts/")
print("   • All modules in tests/")
print("   • Changes will be reflected immediately without kernel restart")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
✅ Autoreload enabled (mode 2)
   All modules will be automatically reloaded when changed

📝 Watched modules:
   • All modules in schemas/
   • All modules in prompts/
   • All modules in tests/
   • Changes will be reflected immediately without kernel restart


# ***************************************************************************************

# TEST-1: Schema Validation for all Pydantic models 
***
* Validate all Pydantic models load correctly before running LLM or RAG tests.
* No LLM calls, no API keys needed - runs in <1 second.
***
**Schemas Tested:**
* **PreferenceEntry** - User preferences with override tracking (previous_value, previous_turn_id)
* **ConversationSummary** - Session-scoped preferences (time_window, amount_threshold_large)
* **RouterOutput** - LLM-1 structured output (clarity, core_use_cases, summary_update)
* **BackofficeLog** - Compliance logging with reasoning_steps and data_sources
* **GraphState** - LangGraph state with raw_messages for multi-turn
***
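As a rough illustration of what a schema check of this kind does, the sketch below builds one instance per model and records a `(name, passed, error)` tuple, the same shape as the result list printed by the test. It is not the project's code: the real models are Pydantic classes, while this stand-in uses stdlib dataclasses so it runs without dependencies. The field names come from the test output; the field types, the sample values, and the `check_schema` helper are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class PreferenceEntry:
    """Stand-in for the project's Pydantic model (types are assumed)."""
    value: Any
    source: str
    turn_id: int
    original_query: Optional[str] = None
    previous_value: Any = None               # override tracking
    previous_turn_id: Optional[int] = None   # override tracking


def check_schema(name: str, factory: Callable[[], Any]) -> Tuple[str, bool, Optional[str]]:
    """Hypothetical helper: build one instance and report (name, passed, error)."""
    try:
        instance = factory()
        print(f"OK {name}: {', '.join(f.name for f in fields(instance))}")
        return (name, True, None)
    except Exception as exc:  # any construction failure counts as a schema failure
        return (name, False, str(exc))


results: List[Tuple[str, bool, Optional[str]]] = [
    check_schema(
        "PreferenceEntry",
        lambda: PreferenceEntry(value="last_30_days", source="user", turn_id=2),
    ),
    # Missing required fields raise TypeError, which is reported instead of propagated.
    check_schema("BrokenEntry", lambda: PreferenceEntry(value="x")),
]
```

Because nothing here touches an LLM or an API key, a check like this is cheap enough to run before every other test.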

In [6]:
import tests.test_schemas as ts
ts.test_all_schemas()

🧪 SCHEMA VALIDATION TESTS

✅ PreferenceEntry
   Fields: value, source, turn_id, original_query, previous_value, previous_turn_id

✅ ResolvedDates
   Fields: start_date, end_date, interpretation

✅ ConversationSummary
   Fields: time_window, amount_threshold_large, account_scope, category_preferences

✅ RouterOutput
   Fields: clarity, core_use_cases, uc_operations, primary_use_case, ...
   Fields: complexity_axes, needed_tools, clarifying_question, missing_info, ...
   Fields: summary_update, resolved_dates, resolved_trn_categories, resolved_amount_threshold

✅ DataSources
   Fields: tables_used, fields_accessed, filters_applied, aggregations_used

✅ ClarificationStep
   Fields: question, user_answer, turn_id

✅ BackofficeLog
   Fields: user_query, resolved_query, answer, analysis, reasoning_steps, ...
   Fields: data_sources, transactions_analyzed, preferences_used, ...
   Fields: clarification_history, confidence, rag_used, router_output_snapshot

✅ ExecutionResult

[('PreferenceEntry', True, None),
 ('ResolvedDates', True, None),
 ('ConversationSummary', True, None),
 ('RouterOutput', True, None),
 ('DataSources', True, None),
 ('ClarificationStep', True, None),
 ('BackofficeLog', True, None),
 ('ExecutionResult', True, None),
 ('GraphState', True, None)]

# TEST-2: LLM-1 Multi-Turn (VAGUE Queries only)
***
Test LLM-1 router_node and summary_update_node behavior with multi-turn flow.
***
**What This Validates:**
* **Turn 1 VAGUE** - Clarifying question generated for missing time_window or amount_threshold
* **Turn 2 CLEAR** - Correct core_use_cases and primary_use_case after user answers
* **summary_update** - LLM-1 produces preference updates from user clarification
* **conversation_summary** - Preferences merged correctly via summary_update_node
* **raw_messages** - Full conversation history passed to LLM-1 in Turn 2
***
***
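The preference merge validated above can be pictured with a small sketch. It assumes the conversation summary behaves like a mapping from preference name to entry, and that an override stores the replaced value and its turn in `previous_value` and `previous_turn_id`. The `merge_summary_update` function, the dataclass stand-in, and the sample values are hypothetical; the project's `summary_update_node` may work differently.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PreferenceEntry:
    """Simplified stand-in for the project's Pydantic model."""
    value: Any
    source: str
    turn_id: int
    previous_value: Any = None
    previous_turn_id: Optional[int] = None


def merge_summary_update(
    summary: Dict[str, PreferenceEntry],
    update: Dict[str, PreferenceEntry],
) -> Dict[str, PreferenceEntry]:
    """Return a new summary with the update applied, tracking overrides."""
    merged = dict(summary)  # copy so the caller's summary is left untouched
    for key, new_entry in update.items():
        old = summary.get(key)
        if old is not None and old.value != new_entry.value:
            # Remember what was overridden and in which turn it had been set.
            new_entry = replace(
                new_entry, previous_value=old.value, previous_turn_id=old.turn_id
            )
        merged[key] = new_entry
    return merged


# Turn 2: the user answers the clarifying question about the time window.
s1 = merge_summary_update(
    {}, {"time_window": PreferenceEntry("last_30_days", "user_clarification", 2)}
)
# Turn 4: the user changes their mind, so the earlier value is recorded.
s2 = merge_summary_update(
    s1, {"time_window": PreferenceEntry("last_7_days", "user_clarification", 4)}
)
```

Returning a new mapping instead of mutating the old one keeps each turn's state inspectable, which suits the audit-trail goal stated at the top of the notebook.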
**LangGraph Coverage:**
* ✅ router_node() execution
* ✅ summary_update_node() execution
* ✅ State passed correctly between nodes
* ❌ Full graph.compile() - tested in pipeline tests
* ❌ Conditional edges (VAGUE→vague_handler, CLEAR→executor) - tested in pipeline tests
* ❌ app.invoke() end-to-end - tested in pipeline tests
***
**Why Partial Coverage of LangGraph:**
These tests isolate LLM-1 behavior by calling nodes directly.
Full orchestration is covered by the pipeline tests (tests/pipeline_rag_tests.py and tests/pipeline_no_rag_tests.py), which exercise the app.invoke() flow.
***
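For orientation, the conditional edges that these node-level tests skip amount to a small routing decision on the router's `clarity` value. The sketch below shows that decision as a plain function, without LangGraph. The state layout (`state["router_output"]["clarity"]`) and the function name `route_after_router` are assumptions; in a LangGraph build, a function like this would typically be passed to `StateGraph.add_conditional_edges`.

```python
from typing import Any, Dict


def route_after_router(state: Dict[str, Any]) -> str:
    """Pick the next node name from the router's clarity verdict."""
    clarity = state["router_output"]["clarity"]
    if clarity == "VAGUE":
        return "vague_handler"   # ask the user a clarifying question
    if clarity == "CLEAR":
        return "executor"        # hand the resolved query to LLM-2
    raise ValueError(f"Unexpected clarity value: {clarity!r}")


vague_target = route_after_router({"router_output": {"clarity": "VAGUE"}})
clear_target = route_after_router({"router_output": {"clarity": "CLEAR"}})
```

Calling `router_node` directly never runs this branch, which is why the edges are left to the pipeline tests.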

In [7]:
import tests.llm1_tests as tst_llm1

# # Test VAGUE queries 

# # 1: All queries (default)
vague_results = tst_llm1.llm1_test_all_vague_queries_multiturn()

# # 2: Random N queries
# vague_results = tst_llm1.llm1_test_all_vague_queries_multiturn(num_examples_to_check=2)
# clear_results = tst_llm1.llm1_test_all_clear_queries(num_examples_to_check=3)

# 3: Specific queries by ID
# vague_results = tst_llm1.llm1_test_all_vague_queries_multiturn(query_ids=[14, 15])

📋 Testing all 5 VAGUE queries
🧪 LLM-1 MULTI-TURN TEST: VAGUE QUERIES

Each conversation is INDEPENDENT with fresh ConversationSummary

VAGUE Query #11: "What are my coffee shop expenses?"
Missing: timeframe

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ TURN 1: User Query (VAGUE)                                                                       │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘

👤 User: "What are my coffee shop expenses?"
📝 ConversationSummary: (empt

            id = uuid7()
Future versions will require UUID v7.
  input_data = validator(cls_, input_data)


LLM-1 wants to use 1 tool(s)
Executing tool: search_transaction_categories with args: {'terms': ['coffee shop']}

--- LLM-1 Router Iteration 2 ---
LLM-1 responded without tool calls - parsing RouterOutput

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ TURN 1: LLM-1 Response (Clarification)                                                           │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘

🤖 LLM-1 Classification:
   clarity: VAGUE
   clarifying_question: For what time per

# *********************************** END TO END PIPELINE TESTS _WITH_ RAG *********************************
# LLM-1 + LLM-2 for Use Case 04 (RAG required):
---

WHAT THE TEST SHOWS:

🔹 RAG TOOL EXECUTION:
   - Was the tool called?
   - What category was found?
   - Expected vs Actual comparison

🔹 LLM-1 OUTPUT:
   - Clarity correct?
   - UC-04 detected?
   - resolved_trn_categories populated?

🔹 LLM-2 INPUT CHECK:
   - Did LLM-2 receive the category?
   - Did LLM-2 receive the dates?

🔹 LLM-2 GROUNDING CHECK:
   - Did LLM-2 use ONLY parameters from LLM-1?
   - No hallucinated filters?

🔹 LLM-2 TOOL USAGE:
   - Which tables accessed?
   - What filters applied (SQL-like)?
   - What aggregations used?

🔹 BACKOFFICE LOG:
   - Complete audit trail
   - Reasoning steps
   - Data sources
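The grounding check is the least obvious item in this list, so here is a minimal sketch of one way it could work: compare the filters LLM-2 actually applied against the parameters LLM-1 resolved, and flag any filter LLM-1 never supplied. The check names match the keys in the test results, but the `grounding_checks` function, the dictionary layout, and the sample category and dates are assumptions, not the test module's code.

```python
from typing import Any, Dict


def grounding_checks(resolved: Dict[str, Any], filters: Dict[str, Any]) -> Dict[str, bool]:
    """Compare LLM-2's applied filters with the parameters resolved by LLM-1."""
    checks = {
        "category_grounded": filters.get("category") in resolved.get("categories", []),
        "start_date_grounded": filters.get("start_date") == resolved.get("start_date"),
        "end_date_grounded": filters.get("end_date") == resolved.get("end_date"),
    }
    # Any filter key outside this set was not provided by LLM-1 (hallucinated).
    allowed = {"category", "start_date", "end_date"}
    no_extra_filters = set(filters) <= allowed
    checks["fully_grounded"] = all(checks.values()) and no_extra_filters
    return checks


# Hypothetical values for illustration only.
resolved_by_llm1 = {
    "categories": ["Coffee Shops"],
    "start_date": "2025-11-01",
    "end_date": "2025-11-30",
}
grounded = grounding_checks(
    resolved_by_llm1,
    {"category": "Coffee Shops", "start_date": "2025-11-01", "end_date": "2025-11-30"},
)
hallucinated = grounding_checks(
    resolved_by_llm1,
    {"category": "Coffee Shops", "start_date": "2025-11-01",
     "end_date": "2025-11-30", "merchant": "Starbucks"},
)
```

In the second call every individual parameter matches, yet `fully_grounded` is false because of the extra `merchant` filter.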


In [10]:
from tests.pipeline_rag_tests import test_rag_pipeline

# # 1. Test ALL UC-04 queries (default):
# test_rag_pipeline()
test_rag_pipeline([3, 4, 7, 8, 9, 10, 16, 17])

# # 2. Test ALL queries in silent mode:
# test_rag_pipeline(silent=True)

# 3. Test specific queries (verbose - see all LLM iterations):
# test_rag_pipeline([3, 7, 10])

# # 4. Test with shorter wait time:
# test_rag_pipeline(wait_seconds=5)



════════════════════════════════════════════════════════════════════════════════
🧪 RAG PIPELINE TEST - UC-04 QUERIES WITH CATEGORY RESOLUTION
════════════════════════════════════════════════════════════════════════════════
📋 Running 8 selected queries

📂 Loading Q&A mapping...
✅ Q&A mapping loaded

📊 Initializing dynamic expected calculator...
✅ Calculator ready (reference date: 2025-12-01)

📝 QUERIES TO TEST:
────────────────────────────────────────────────────────────────────────────────
   [

[{'id': 3,
  'status': '✅ PASS',
  'passed': True,
  'checks': {'rag_called': True,
   'category_match': True,
   'clarity_correct': True,
   'uc04_detected': True,
   'categories_populated': True,
   'correct_category_passed': True,
   'dates_passed': True,
   'category_grounded': True,
   'start_date_grounded': True,
   'end_date_grounded': True,
   'fully_grounded': True,
   'tables_accessed': True,
   'filters_applied': True,
   'aggregations_used': True,
   'answer_validated': True}},
 {'id': 4,
  'status': '✅ PASS',
  'passed': True,
  'checks': {'rag_called': True,
   'category_match': True,
   'clarity_correct': True,
   'uc04_detected': True,
   'categories_populated': True,
   'correct_category_passed': True,
   'dates_passed': True,
   'category_grounded': True,
   'start_date_grounded': True,
   'end_date_grounded': True,
   'fully_grounded': True,
   'tables_accessed': True,
   'filters_applied': True,
   'aggregations_used': True,
   'answer_validated': True}},
 {'id'

# **************************** END TO END PIPELINE TESTS _WITHOUT_ RAG *************************
# Full pipeline tests WITHOUT RAG (for UC-01, UC-02, UC-03): LLM-1 + LLM-2
***

**Silent mode (silent=True, the default => clean output)**

***

* Usage:
    * ✅ Normal testing - you just want to see results
    * ✅ Multiple queries - clean output is easier to read
    * ✅ Production-like testing - focus on LLM outputs

* Functions:
    * Test CLEAR queries (no category RAG needed)
        * test_clear_queries_no_rag_trn_categories([1, 2], wait_seconds=10)
        * All queries: [1, 2, 5, 6]

    * Test VAGUE queries (no category RAG needed)
        * test_vague_queries_no_rag_trn_categories([12, 13], wait_seconds=10)
        * All queries: [12, 13]
***

**Verbose mode (silent=False => see all tool calls)**

***

* Usage:
    * 🐛 Debugging - something is failing and you need to see why
    * 🔍 Learning - you want to understand what happens internally
    * 🛠️ Development - testing new tools or graph nodes
* Function:
    * test_clear_queries_no_rag_trn_categories([1], silent=False, wait_seconds=10)
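
One plausible way to implement a `silent` switch like this is to redirect standard output while each query runs, so internal tool-call logs are swallowed in silent mode and shown in verbose mode. The sketch below is an assumption about the mechanism, not the test module's code: `run_queries`, `noisy_query`, and the plain `PASS`/`FAIL` strings are hypothetical, and the `wait_seconds` pause between queries is left out.

```python
import contextlib
import io
from typing import Any, Callable, Dict, List


def run_queries(
    query_ids: List[int],
    run_one: Callable[[int], None],
    silent: bool = True,
) -> List[Dict[str, Any]]:
    """Run each query, hiding its printed logs when silent=True."""
    results: List[Dict[str, Any]] = []
    for qid in query_ids:
        # In silent mode stdout goes to a throwaway buffer; otherwise it is untouched.
        ctx = (
            contextlib.redirect_stdout(io.StringIO())
            if silent
            else contextlib.nullcontext()
        )
        try:
            with ctx:
                run_one(qid)
            results.append({"id": qid, "status": "PASS", "error": None})
        except Exception as exc:
            results.append({"id": qid, "status": "FAIL", "error": str(exc)})
    return results


def noisy_query(qid: int) -> None:
    """Stand-in for a pipeline run that logs its internal steps."""
    print(f"tool call for query {qid}")
    if qid == 99:
        raise ValueError("unknown query id")


results = run_queries([1, 99], noisy_query)                # logs hidden
# run_queries([1], noisy_query, silent=False)              # logs visible
```

The returned list has the same `id` / `status` / `error` shape as the results printed by the pipeline tests, so the summary looks the same in both modes.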


In [11]:
from tests.pipeline_no_rag_tests import test_clear_queries_no_rag_trn_categories  # Test CLEAR queries (no category RAG needed)
from tests.pipeline_no_rag_tests import test_vague_queries_no_rag_trn_categories  # Test VAGUE queries (no category RAG needed)

### Silent Mode ("clean output")

In [12]:
# # Silent Mode: CLEAR non-category queries, test 2 queries:
test_clear_queries_no_rag_trn_categories([1, 2], wait_seconds=10)



🧪 TEST: CLEAR QUERIES WITHOUT RAG TRANSACTION CATEGORIES


Testing 2 queries: [1, 2]
Wait time: 10 seconds between queries
Silent mode: ON

📂 Loading Q&A mapping...
✅ Q&A mapping loaded
🔧 Building graph...
✅ Graph compiled and ready



----------------------------------------------------------------------------------------------------

📝 QUERY 1/2 (ID: 1)

----------------------------------------------------------------------------------------------------

👤 USER: "What is my current balance?"

🔹 LLM-1 ROUTER OUTPUT:
   ✓ Clarity: CLEAR
   ✓ Core UCs: ['UC-01']
   ✓ Primary UC: UC-01
   ✓ UC Operations:
      • UC-01: ['get_account_balance']
   ✓ Complexity Axes: []
   ✓ Resolved Dates: None (not temporal)
   ✓ Resolved Categories: None (no categories)
   ✓ Needed Tools: []
   ✓ Confidence: high
   ✓ Clarity Reason: Query requests a single direct value (current account balance) with no ambiguity. 'Current' is deterministic (means now/latest

[{'id': 1, 'status': '✅ PASS', 'error': None},
 {'id': 2, 'status': '✅ PASS', 'error': None}]

In [13]:
# # Silent Mode: VAGUE non-category queries, test 2 queries:
test_vague_queries_no_rag_trn_categories([12, 13], wait_seconds=10)



🧪 TEST: VAGUE QUERIES WITHOUT RAG TRANSACTION CATEGORIES


Testing 2 queries: [12, 13]
Wait time: 10 seconds between queries
Silent mode: ON

📂 Loading Q&A mapping...
✅ Q&A mapping loaded
🔧 Building graph...
✅ Graph compiled and ready



----------------------------------------------------------------------------------------------------

📝 QUERY 1/2 (ID: 12)

----------------------------------------------------------------------------------------------------

👤 USER: "Show me recent transactions"

🔹 LLM-1 ROUTER OUTPUT (VAGUE):
   ✓ Clarity: VAGUE
   ✓ Core UCs: ['UC-03', 'UC-01']
   ✓ Primary UC: UC-03
   ❓ Clarifying Question:
      "What time period do you mean by 'recent'? For example: last 7 days, last 30 days, or this month?"
   ❓ Missing Info: ['time_window']
   ✓ Clarity Reason: Query contains subjective temporal term 'recent' without definition in conversation_summary. This is a user-preference dependent term that requires clarification.

[{'id': 12, 'status': '✅ PASS', 'error': None},
 {'id': 13, 'status': '✅ PASS', 'error': None}]

### Verbose mode (see ALL internal logs, including Tool Calling Loop)

In [None]:
# # Verbose mode: CLEAR non-category queries, test 1 query:
# test_clear_queries_no_rag_trn_categories([1], silent=False) #, wait_seconds=10)

# # Verbose mode: VAGUE non-category queries, test 2 queries:
# test_vague_queries_no_rag_trn_categories([12, 13],  silent=False, wait_seconds=10)