# Reasoning Agent HITL Verification

This notebook verifies **Stage 3 Step 3: The Code/Reasoning Agent**.

### Verification Goals:
1. **Deterministic Math:** Does it use Python to calculate deltas or percentages?
2. **Self-Awareness:** Does it warn the user if data is missing or if it's guessing?
3. **Nuance & Citations:** Does it cite `row_id` and `chunk_id` correctly?
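
The third goal can be partly checked mechanically by comparing the cited IDs against the IDs that were actually retrieved. The sketch below is a minimal illustration, not part of `venra`: the helper name `unknown_citations`, the assumption that citations are plain ID strings, and the sample IDs are all hypothetical.

```python
# Assumption: citations are plain strings holding either a row_id or a chunk_id.
# The real structure of response.citations is not shown in this notebook.
def unknown_citations(citations, known_row_ids, known_chunk_ids):
    """Return the cited IDs that do not appear in the retrieved context."""
    known = set(known_row_ids) | set(known_chunk_ids)
    return [c for c in citations if c not in known]

# Hypothetical IDs for illustration only.
rows = {"row_12", "row_13"}
chunks = {"chunk_a", "chunk_b"}
print(unknown_citations(["row_12", "chunk_b", "row_99"], rows, chunks))  # ['row_99']
```

An empty list means every citation points at something the retriever returned. It does not prove the cited row or chunk supports the claim, so a human review of the answer is still needed.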

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os
import nest_asyncio
from dotenv import load_dotenv

sys.path.append(os.path.abspath("../src"))
load_dotenv("../.env")

from venra.navigator import Navigator
from venra.retriever import DualRetriever
from venra.assembler import ContextAssembler
from venra.agent import ReasoningAgent

nest_asyncio.apply()

In [2]:
PDF_PATH = "../data/10K_TD_test.pdf"
file_prefix = os.path.basename(PDF_PATH).replace(".pdf", "")

nav = Navigator(file_prefix=file_prefix)
retriever = DualRetriever(file_prefix=file_prefix)
assembler = ContextAssembler()
agent = ReasoningAgent()

2026-02-03 17:52:42,287 - venra - INFO - Retriever loaded UFL with 253 rows.


## 2. Complex Query (Math + Text)
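
For reference when reviewing the agent's generated Python, the query in this section reduces to a year-over-year comparison of net sales with acquisition sales subtracted. The sketch below shows one plausible reading of that calculation. The function name `organic_sales_change` and the figures are placeholders, not values from the 10-K, and the filing may define the measure differently.

```python
def organic_sales_change(net_cur, acq_cur, net_prev, acq_prev=0.0):
    """Return (absolute delta, percent change) of sales excluding acquisitions."""
    organic_cur = net_cur - acq_cur
    organic_prev = net_prev - acq_prev
    if organic_prev == 0:
        raise ValueError("prior-year base is zero; percent change is undefined")
    delta = organic_cur - organic_prev
    return delta, delta / organic_prev * 100

# Placeholder figures, NOT taken from the filing.
delta, pct = organic_sales_change(net_cur=1100.0, acq_cur=50.0, net_prev=1000.0)
print(f"delta={delta:.1f}, pct={pct:.2f}%")  # delta=50.0, pct=5.00%
```

Comparing the agent's `code_result` output against a hand calculation like this is a quick way to confirm that the numbers came from executed code and not from the model's free-text arithmetic.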

"How much the sale excluding acquisition increase compare to last year and what it was due to?"

In [3]:
query = "How much the sale excluding acquisition increase compare to last year and what it was due to?"

# 1. Navigate
plan = await nav.navigate(query)

# 2. Retrieve
results = await retriever.retrieve(plan, k=3, include_all_chunks_for_ufl=True, include_all_ufl_for_chunks=True)

# 3. Assemble
context = assembler.assemble(results)

# 4. Answer
full_result = await agent.answer(query, context)
response = full_result["final_response"]
reasoning = full_result["reasoning"]
code_result = full_result["code_result"]

print("=== AGENT THOUGHT PROCESS ===")
print(f"PLAN: {reasoning.plan}")
if reasoning.requires_math:
    print("\n--- EXECUTED PYTHON ---")
    print(reasoning.python_code)
    print("\n--- CODE OUTPUT ---")
    print(code_result['output'] if code_result else 'None')

print("\n=== FINAL AGENT RESPONSE ===")
print(f"ANSWER:    {response.answer}")
print(f"NUANCES:   {response.nuances}")
print(f"CITATIONS: {response.citations}")
print(f"GROUNDED:  {response.groundedness_score}")
if response.is_self_aware_warning:
    print("!!! WARNING: Agent is not fully confident in this answer.")

2026-02-03 17:38:11,716 - venra - INFO - Navigating query: How much the sale excluding acquisition increase compare to last year and what it was due to?
2026-02-03 17:38:12,459 - venra - INFO - Plan generated. Reasoning: The user is asking for the increase in Net Sales excluding Acquisition Sales compared to last year and what it was due to. This requires looking at the Net Sales and Acquisition Sales metrics for both 2025 and 2024, and then calculating the increase in Net Sales excluding Acquisition Sales. The nuance focus is null, indicating that the user is looking for the raw data without any adjustments.
2026-02-03 17:38:12,459 - venra - INFO - Starting retrieval for query: Net Sales excluding Acquisition Sales Increase Com... (k=3)
2026-02-03 17:38:13,390 - venra - INFO - Retrieval complete: 22 UFL rows, 5 text chunks.
2026-02-03 17:38:13,402 - venra - INFO - Agent processing query (Pass 1: Kimi Reasoning): How much the sale excluding acquisition increase compare to last year and

## 3. Missing Data Test
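
A pass/fail check for this test could look like the sketch below. It assumes `groundedness_score` is a number where lower means less grounded and `is_self_aware_warning` is a boolean. Neither is confirmed by the output shown here, and both the helper `missing_data_handled` and the 0.5 threshold are hypothetical.

```python
from types import SimpleNamespace

def missing_data_handled(response, threshold=0.5):
    """True if the warning flag is set or the groundedness score is below threshold."""
    return bool(response.is_self_aware_warning) or response.groundedness_score < threshold

# Stand-in objects; a real run would pass full_result["final_response"].
flagged = SimpleNamespace(is_self_aware_warning=True, groundedness_score=0.1)
overconfident = SimpleNamespace(is_self_aware_warning=False, groundedness_score=0.9)
print(missing_data_handled(flagged), missing_data_handled(overconfident))  # True False
```

For an unanswerable question, a result of `False` would indicate that the agent answered confidently without support, which is the failure this test is meant to catch.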

"What was the CEO's favorite color in 2025?" (Should trigger self-awareness/not found)

In [4]:
query = "What was the CEO's favorite color in 2025?"
plan = await nav.navigate(query)
results = await retriever.retrieve(plan, k=2)
context = assembler.assemble(results)
full_result = await agent.answer(query, context)
response = full_result["final_response"]

print("=== AGENT RESPONSE (MISSING DATA) ===")
print(f"ANSWER:    {response.answer}")
print(f"GROUNDED:  {response.groundedness_score}")

2026-02-03 17:40:04,187 - venra - INFO - Navigating query: What was the CEO's favorite color in 2025?
2026-02-03 17:40:04,679 - venra - INFO - Plan generated. Reasoning: The question asks for a non-financial, non-quantitative piece of information about the CEO, which is not typically found in a 10-K document. This suggests that the answer may not be available in the provided schema.
2026-02-03 17:40:04,680 - venra - INFO - Starting retrieval for query: CEO's favorite color in 2025... (k=2)
2026-02-03 17:40:05,541 - venra - INFO - Retrieval complete: 7 UFL rows, 2 text chunks.
2026-02-03 17:40:05,546 - venra - INFO - Agent processing query (Pass 1: Kimi Reasoning): What was the CEO's favorite color in 2025?
2026-02-03 17:40:25,266 - venra - INFO - Reasoning Plan: 1. Search the UFL table for any metric related to 'CEO', 'favorite', or 'color' - no such entries found. 2. Search the Source Text Chunks (f1bb65a4-9938-465d-8a10-db1c7f16795f and 0e2eb9eb-0bce-4201-a212-1b4fc07e6d97) for any m

In [3]:
query = "How much the sale decrease overal compare to last year and why it happend?"

# 1. Navigate
plan = await nav.navigate(query)

# 2. Retrieve
results = await retriever.retrieve(plan, k=3, include_all_chunks_for_ufl=True, include_all_ufl_for_chunks=True)

# 3. Assemble
context = assembler.assemble(results)

# 4. Answer
full_result = await agent.answer(query, context)
response = full_result["final_response"]
reasoning = full_result["reasoning"]
code_result = full_result["code_result"]

print("=== AGENT THOUGHT PROCESS ===")
print(f"PLAN: {reasoning.plan}")
if reasoning.requires_math:
    print("\n--- EXECUTED PYTHON ---")
    print(reasoning.python_code)
    print("\n--- CODE OUTPUT ---")
    print(code_result['output'] if code_result else 'None')

print("\n=== FINAL AGENT RESPONSE ===")
print(f"ANSWER:    {response.answer}")
print(f"NUANCES:   {response.nuances}")
print(f"CITATIONS: {response.citations}")
print(f"GROUNDED:  {response.groundedness_score}")
if response.is_self_aware_warning:
    print("!!! WARNING: Agent is not fully confident in this answer.")

2026-02-03 17:52:47,704 - venra - INFO - Navigating query: How much the sale decrease overal compare to last year and why it happend?
2026-02-03 17:52:48,266 - venra - INFO - Plan generated. Reasoning: The user is asking for the overall decrease in sales compared to last year and the reason behind it. This requires looking at the 'Net Sales' metric for both the current and prior years, and then finding the reason for the decrease.
2026-02-03 17:52:48,267 - venra - INFO - Starting retrieval for query: Net Sales Decrease of 5.6% Compared to Last Year.... (k=3)
2026-02-03 17:52:48,945 - venra - INFO - Keyword Boost Search: 'Net Sales Decrease Last Year Reason' (k=5)
2026-02-03 17:52:49,699 - venra - INFO - Retrieval complete: 23 UFL rows, 5 text chunks.
2026-02-03 17:52:49,717 - venra - INFO - Agent processing query (Pass 1: Kimi Reasoning): How much the sale decrease overal compare to last year and why it happend?
2026-02-03 17:54:05,389 - venra - INFO - Reasoning Plan: 1. Identify the c