# Context Assembler HITL Verification

This notebook verifies **Stage 3 Step 2: Context Assembler**.

### Verification Workflow:
1. **Retrieve:** Use the `DualRetriever` with expansion hyperparameters.
2. **Assemble:** Use the `ContextAssembler` to deduplicate and format the context.
3. **Verify:** Ensure the output is a clean Markdown string ready for the Reasoning Agent.

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os
import nest_asyncio
from dotenv import load_dotenv

# Add src to path
sys.path.append(os.path.abspath("../src"))
load_dotenv("../.env")

from venra.navigator import Navigator
from venra.retriever import DualRetriever
from venra.assembler import ContextAssembler
from venra.logging_config import logger

nest_asyncio.apply()

## 1. Setup Data Context

We assume `../data/10K_TD_test.pdf` has already been ingested.

In [2]:
PDF_PATH = "../data/10K_TD_test.pdf"
file_prefix = os.path.basename(PDF_PATH).replace(".pdf", "")

nav = Navigator(file_prefix=file_prefix)
retriever = DualRetriever(file_prefix=file_prefix)
assembler = ContextAssembler()

2026-02-03 15:21:21,833 - venra - INFO - Retriever loaded UFL with 253 rows.


## 2. Execute Full Retrieval Chain

We re-run the query that previously returned incomplete context and check whether the two expansion options recover the missing data.

In [3]:
query = "How much the sale excluding acquisition increase compare to last year and what it was due to?"
plan = await nav.navigate(query)

print(f"--- Clues Generated --- Metrics: {plan.ufl_query.metric_keywords if plan.ufl_query else 'None'}")

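# Expansion settings
results = await retriever.retrieve(
    plan,
    k=3,
    include_all_chunks_for_ufl=True,  # Option 1: also fetch the source chunks of retrieved UFL rows
    include_all_ufl_for_chunks=True,  # Option 2: also fetch the UFL rows extracted from retrieved chunks
)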

print(f"Retrieved {len(results['ufl_rows'])} rows and {len(results['text_chunks'])} chunks.")

2026-02-03 15:21:32,897 - venra - INFO - Navigating query: How much the sale excluding acquisition increase compare to last year and what it was due to?
2026-02-03 15:21:33,544 - venra - INFO - Plan generated. Reasoning: The user is asking for the increase in Net Sales excluding Acquisition Sales compared to last year and what it was due to. The entity ID is ID_TDG, which corresponds to TransDigm Group Incorporated. The metric keywords are Net Sales, Acquisition Sales, and Increase in Organic Sales. The years are 2025 and 2024. The nuance focus is null, indicating no specific nuance filtering is required.
--- Clues Generated --- Metrics: ['Net Sales', 'Acquisition Sales', 'Increase in Organic Sales']
2026-02-03 15:21:33,545 - venra - INFO - Starting retrieval for query: Net Sales excluding Acquisition Sales Increase Com...
2026-02-03 15:21:34,185 - venra - INFO - Retrieval complete: 20 UFL rows, 4 text chunks.
Retrieved 20 rows and 4 chunks.
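
The retriever's internals aren't shown in this notebook, so the following is only a minimal sketch of what the two expansion flags plausibly do, inferred from their names alone. `UFLRow`, `expand`, and the one-hop semantics are hypothetical and not the actual `DualRetriever` code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UFLRow:
    # Hypothetical, trimmed-down ledger row; the real venra schema has more fields.
    row_id: str
    metric_name: str
    source_chunk_id: str


def expand(
    ufl_hits: list[UFLRow],
    chunk_hits: list[str],
    ledger: list[UFLRow],
    include_all_chunks_for_ufl: bool = True,
    include_all_ufl_for_chunks: bool = True,
) -> tuple[list[UFLRow], list[str]]:
    """Expand top-k hits one hop along the row <-> source-chunk links."""
    rows = list(ufl_hits)
    chunks = list(chunk_hits)

    if include_all_chunks_for_ufl:
        # Option 1: add the source chunk of every retrieved UFL row.
        for row in ufl_hits:
            if row.source_chunk_id not in chunks:
                chunks.append(row.source_chunk_id)

    if include_all_ufl_for_chunks:
        # Option 2: add every ledger row extracted from an originally retrieved chunk.
        hit_chunks = set(chunk_hits)
        seen = {row.row_id for row in rows}
        for row in ledger:
            if row.source_chunk_id in hit_chunks and row.row_id not in seen:
                rows.append(row)
                seen.add(row.row_id)

    return rows, chunks


ledger = [
    UFLRow("r1", "Net Sales", "c1"),
    UFLRow("r2", "Acquisition Sales", "c1"),
    UFLRow("r3", "Organic Sales", "c2"),
]
rows, chunks = expand([ledger[0]], ["c2"], ledger)
print([r.row_id for r in rows], chunks)  # ['r1', 'r3'] ['c2', 'c1']
```

Expanding only one hop from the original hits keeps the context bounded. Chaining the two options (rows to chunks, then back to rows) would also pull in `r2`, i.e. every sibling row of each matched chunk, which grows the context faster.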


## 3. Assemble and Inspect Output

This is the exact context string the Reasoning Agent (the final LLM) will receive.

In [4]:
final_context = assembler.assemble(results)

print("=== FINAL ASSEMBLED CONTEXT ===")
print(final_context)

=== FINAL ASSEMBLED CONTEXT ===

# VERIFIABLE FACT LEDGER (UFL) ROWS
| row_id                           | metric_name                          |        value | unit    | period             | nuance_note   | source_chunk_id                      |
|:---------------------------------|:-------------------------------------|-------------:|:--------|:-------------------|:--------------|:-------------------------------------|
| 7d339b31d9fab3372f267495734f8685 | Net Sales                            |    5e+08     | USD     | 2025               |               | 9d7492d3-c7c0-44b7-b666-f112cb7ff135 |
| c7ff6dac0728f1b924f80b5f1f504c56 | Net Sales                            |    5e+08     | USD     | 2024               |               | 9d7492d3-c7c0-44b7-b666-f112cb7ff135 |
| 9253ddcec69200773381fc5df22b0a84 | Acquisition Sales                    |  nan         |         | 2025               |               | 9d7492d3-c7c0-44b7-b666-f112cb7ff135 |
| 8c51683e431208c414e06943746ab253 | Acquisiti

## 4. Verification Checklist
- [ ] Is every `CHUNK_ID` header unique (no duplicates)?
- [ ] Is the UFL table formatted as readable Markdown?
- [ ] Do the `source_chunk_id` values in the table match the headers in the text section?
- [ ] Did the expansion retrieve the text chunk containing the '$615 million' answer?
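
Most of this checklist can be scripted. The hypothetical `check_context` helper below is a sketch that assumes text chunks are introduced by `## CHUNK_ID: <id>` headers (the assembler's real header format may differ) and that the first Markdown table in the string is the UFL ledger.

```python
import re
from collections import Counter

# Assumed header format for text chunks; adjust to the assembler's real output.
CHUNK_HEADER = re.compile(r"^## CHUNK_ID: (\S+)\s*$", re.MULTILINE)


def _split_row(line: str) -> list[str]:
    # "| a | b |" -> ["a", "b"]
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def check_context(context: str, expected_answer: str) -> dict[str, bool]:
    headers = CHUNK_HEADER.findall(context)
    counts = Counter(headers)

    # Take only the first contiguous block of "|" lines: chunk text may hold its own tables.
    lines = context.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("|")), None)
    table: list[str] = []
    if start is not None:
        for ln in lines[start:]:
            if not ln.startswith("|"):
                break
            table.append(ln)

    # Collect the source_chunk_id column, skipping the header and separator rows.
    source_ids: set[str] = set()
    if table and "source_chunk_id" in _split_row(table[0]):
        idx = _split_row(table[0]).index("source_chunk_id")
        for ln in table[2:]:
            cells = _split_row(ln)
            if idx < len(cells) and cells[idx]:
                source_ids.add(cells[idx])

    return {
        "no_duplicate_chunk_headers": all(n == 1 for n in counts.values()),
        "ufl_table_present": len(table) >= 2,
        "source_ids_have_headers": source_ids <= set(headers),
        "answer_in_context": expected_answer in context,
    }
```

In the notebook this would be called as `check_context(final_context, "$615 million")`. The "readable Markdown" item still needs a visual check, since the function only confirms that a table is present.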