# Knowledge Synthesis Demo (Stage 2)

This notebook transforms the raw DOM produced in Stage 1 into the Universal Fact Ledger (UFL).
It covers three steps:
1. **Entity Resolution:** Identifying the filer (TransDigm) from the cover page.
2. **Table Melting:** Converting Markdown tables into structured UFLRows.
3. **Text Extraction:** Extracting facts from narrative text with Llama-3, using Instructor for structured output.
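
Both the table melter and the text synthesizer produce rows that the cells below print through the same attributes (`metric_name`, `period`, `value`, `unit`, `confidence`, `nuance_note`). A record along the following lines is a reasonable mental model. This is a sketch, not venra's actual class: `UFLRowSketch`, the `entity_id` field, and all types and defaults are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UFLRowSketch:
    """Hypothetical stand-in for a UFL row; field names mirror the attributes
    this notebook reads, while the types and defaults are assumptions."""
    entity_id: str                      # canonical ID from entity resolution, e.g. "ID_TDG"
    metric_name: str                    # table row label or metric named by the text extractor
    period: Optional[str]               # column header or reporting period, e.g. "9/30/2021"
    value: Optional[float]              # numeric value; None when no number was found
    unit: Optional[str] = None          # e.g. "USD"
    confidence: float = 1.0             # extraction confidence (default value assumed)
    nuance_note: Optional[str] = None   # free-text qualifier attached by the extractor


row = UFLRowSketch(
    entity_id="ID_TDG",
    metric_name="TransDigm Group Inc.",
    period="9/30/2021",
    value=131.46,
    unit="USD",
)
print(f"{row.metric_name} ({row.period}): {row.value} {row.unit}")
```

Making the record frozen means a fact cannot be mutated after it enters the ledger, which keeps provenance simple.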

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os
import nest_asyncio

# Add src to path
sys.path.append(os.path.abspath("../src"))

from venra.ingestion import StructuralParser
from venra.synthesis import EntityResolver, TableMelter, TextSynthesizer
from venra.models import BlockType
from venra.logging_config import logger

nest_asyncio.apply()
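
IPython lets cells use `await` at the top level, as cells 3 and 5 do, and `nest_asyncio.apply()` patches asyncio so that `asyncio.run` or `loop.run_until_complete` can also be called while Jupyter's own event loop is already running. In a standalone script there is no running loop, so coroutines must be driven explicitly. A minimal sketch, where `resolve_entity_stub` is a hypothetical stand-in for an async call such as `EntityResolver.resolve_entity`:

```python
import asyncio


async def resolve_entity_stub(blocks: list[str]) -> str:
    """Hypothetical stand-in for an async, network-bound resolver call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real API call would
    return "ID_TDG" if any("TransDigm" in b for b in blocks) else "UNKNOWN"


def main() -> str:
    # No loop is running in a plain script, so asyncio.run creates one,
    # drives the coroutine to completion, and closes the loop afterwards.
    return asyncio.run(resolve_entity_stub(["TransDigm Group Incorporated"]))


if __name__ == "__main__":
    print(main())
```

Inside this notebook, calling `main()` would also work, but only because `nest_asyncio` has been applied; without it, `asyncio.run` raises a `RuntimeError` when a loop is already running.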



## 1. Load Data (Stage 1 Output)

In [2]:
DOM_PATH = "../data/processed/10k_test_dom.pkl"
blocks = StructuralParser.load_dom(DOM_PATH)
print(f"Loaded {len(blocks)} blocks.")

Loaded 56 blocks.
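
Before synthesis, it can help to see what the loaded blocks contain; the outputs further down show, for example, that several tables come from the cover page rather than the financial statements. A minimal inventory sketch, assuming each block exposes `block_type`, `content`, and `section_path` as this notebook uses them. `BlockSketch` and `BlockTypeSketch` are hypothetical stand-ins for venra's models, and only two block types are assumed:

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class BlockTypeSketch(Enum):
    """Hypothetical mirror of venra.models.BlockType (two members assumed)."""
    TEXT = "text"
    TABLE = "table"


@dataclass
class BlockSketch:
    """Hypothetical block shape; attribute names match those used in this notebook."""
    block_type: BlockTypeSketch
    content: str
    section_path: list[str] = field(default_factory=list)


def summarize_dom(blocks):
    """Count blocks per type, and tables per top-level section."""
    by_type = Counter(b.block_type.name for b in blocks)
    tables_by_section = Counter(
        b.section_path[0] if b.section_path else "<root>"
        for b in blocks
        if b.block_type is BlockTypeSketch.TABLE
    )
    return by_type, tables_by_section


demo = [
    BlockSketch(BlockTypeSketch.TEXT, "Cover page text", ["Cover"]),
    BlockSketch(BlockTypeSketch.TABLE, "| Net sales | 8,831 |", ["Results of Operations"]),
    BlockSketch(BlockTypeSketch.TABLE, "| Organic sales | 8,510 |", ["Total Company"]),
]
by_type, tables_by_section = summarize_dom(demo)
print(by_type)
print(tables_by_section)
```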


## 2. Entity Resolution

In [3]:
# Initialize Resolver
resolver = EntityResolver()

# Run Resolution (Simulated call to Groq if key is present)
entity_meta = await resolver.resolve_entity(blocks)
print(f"Canonical ID: {entity_meta.canonical_id}")
print(f"Official Name: {entity_meta.official_name}")
print(f"Aliases: {entity_meta.aliases}")

2026-01-30 23:30:01,674 - venra - INFO - Resolving Entity from Cover Page context...
2026-01-30 23:30:02,287 - venra - INFO - Resolved Entity: ID_TDG (TransDigm Group Incorporated)
Canonical ID: ID_TDG
Official Name: TransDigm Group Incorporated
Aliases: ['The Company', 'TransDigm']
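
The aliases matter because a 10-K refers to the filer in several ways. One way to use them downstream is to map every surface form back to the canonical ID, preferring the longest match so "TransDigm Group Incorporated" is not also counted as "TransDigm". This is a sketch; `build_alias_index` and `tag_mentions` are hypothetical helpers, not part of venra:

```python
import re


def build_alias_index(canonical_id, official_name, aliases):
    """Map lower-cased surface forms (official name plus aliases) to the canonical ID."""
    index = {official_name.lower(): canonical_id}
    for alias in aliases:
        index[alias.lower()] = canonical_id
    return index


def tag_mentions(text, index):
    """Return (surface form, canonical ID) pairs in text order.

    Longer aliases are matched first, and shorter matches that overlap an
    already-claimed span are skipped.
    """
    taken = []  # (start, end) spans already claimed by a longer alias
    found = []
    for surface in sorted(index, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a longer alias
            taken.append((m.start(), m.end()))
            found.append((m.start(), m.group(0), index[surface]))
    return [(surface, cid) for _, surface, cid in sorted(found)]


index = build_alias_index("ID_TDG", "TransDigm Group Incorporated", ["The Company", "TransDigm"])
print(tag_mentions("Mentions: TransDigm Group Incorporated, the Company, and TransDigm.", index))
```

A generic alias such as "The Company" is only safe to resolve inside the filer's own document; in text that discusses other companies, it would need additional context.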


## 3. Table Melting

In [4]:
melter = TableMelter(entity_id=entity_meta.canonical_id)
table_rows = []

for block in blocks:
    if block.block_type == BlockType.TABLE:
        print(f"Melting table in section: {block.section_path}")
        rows = melter.melt(block)
        table_rows.extend(rows)
        # Preview the first three rows of each table (slicing an empty list is safe)
        for r in rows[:3]:
            print(f"  - {r.metric_name} ({r.period}): {r.value} {r.unit}")

print(f"Total Table Rows Extracted: {len(table_rows)}")

Melting table in section: ['Securities registered pursuant to Section 12(b) of the Act:']
  - Common Stock, $0.01 par value (Trading Symbol:): None USD
Melting table in section: ['Securities registered pursuant to Section 12(g) of the Act:']
  - Accelerated Filer (☒): None USD
  - Non-Accelerated Filer (☒): None USD
  - Smaller Reporting Company (☒): None USD
Melting table in section: ['COMPARISON OF 5 YEAR CUMULATIVE TOTAL RETURN*']
Melting table in section: ['COMPARISON OF 5 YEAR CUMULATIVE TOTAL RETURN*']
  - TransDigm Group Inc. (9/30/2020): 100.0 USD
  - TransDigm Group Inc. (9/30/2021): 131.46 USD
  - TransDigm Group Inc. (9/30/2022): 113.58 USD
Melting table in section: ['Results of Operations']
  - Fiscal Years Ended September 30, (Unnamed: 1): 2025.0 USD
  - Net sales (Unnamed: 1): 8831.0 USD
  - Cost of sales (Unnamed: 1): 3520.0 USD
Melting table in section: ['Total Company']
  - Organic sales (Fiscal Years Ended): 8510.0 USD
  - Acquisition sales (Fiscal Years Ended): 321.0
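
"Melting" turns a wide table into long form: one record per (row label, column header) cell. A minimal pure-Python sketch for a well-formed Markdown table with a single header row follows. `melt_markdown_table` and `parse_number` are hypothetical helpers, not `TableMelter`'s implementation, and the header layout of the demo table is assumed; only the values come from the output above.

```python
import re
from typing import Optional


def parse_number(cell: str) -> Optional[float]:
    """Parse '$8,831', '(12.5)' (accounting negative) or '113.58'; None if non-numeric."""
    s = cell.strip().replace("$", "").replace(",", "").replace("%", "")
    negative = s.startswith("(") and s.endswith(")")
    s = s.strip("()").strip()
    try:
        value = float(s)
    except ValueError:
        return None
    return -value if negative else value


def melt_markdown_table(md: str, entity_id: str):
    """Turn a wide Markdown table into long (entity, metric, period, value) records."""
    lines = [ln.strip() for ln in md.strip().splitlines() if ln.strip()]
    rows = [[c.strip() for c in ln.strip("|").split("|")] for ln in lines]
    # Drop the |---|---| separator row that Markdown places under the header.
    rows = [r for r in rows if not all(re.fullmatch(r":?-{3,}:?", c) for c in r)]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    long_rows = []
    for r in body:
        metric = r[0]
        for period, cell in zip(header[1:], r[1:]):
            value = parse_number(cell)
            if value is None:
                continue  # skip blanks and text cells such as checkbox glyphs
            long_rows.append({"entity_id": entity_id, "metric_name": metric,
                              "period": period, "value": value})
    return long_rows


TSR_TABLE = """
| Company | 9/30/2020 | 9/30/2021 | 9/30/2022 |
|---|---|---|---|
| TransDigm Group Inc. | 100.00 | 131.46 | 113.58 |
"""
for row in melt_markdown_table(TSR_TABLE, entity_id="ID_TDG"):
    print(f"  - {row['metric_name']} ({row['period']}): {row['value']}")
```

Skipping non-numeric cells would drop the cover-page checkbox rows that appear above with `None` values. The sketch assumes the header is a single row. The "Unnamed: 1" periods in the Results of Operations output suggest a header split across rows (the fiscal year lands in the first data row), which would need header promotion before melting.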

## 4. Text Extraction (SLM)
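
The extraction cell below awaits one model call per block in sequence, so its runtime grows linearly with the number of blocks. When the extractor is I/O-bound, a bounded-concurrency variant can overlap those calls. This is a sketch: `extract_facts_stub` is a hypothetical stand-in for `TextSynthesizer.extract_facts`, the limit of 2 is arbitrary, and whether the real synthesizer is safe to call concurrently is an assumption.

```python
import asyncio


async def extract_facts_stub(text: str) -> list[str]:
    """Hypothetical stand-in for one LLM extraction call per block."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"fact from: {text[:20]}"]


async def extract_all(texts, max_concurrency=2):
    """Extract from every block concurrently, with at most max_concurrency calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(text):
        async with sem:  # bound in-flight requests, e.g. to respect provider rate limits
            return await extract_facts_stub(text)

    # asyncio.gather returns results in input order, regardless of completion order.
    per_block = await asyncio.gather(*(worker(t) for t in texts))
    return [fact for facts in per_block for fact in facts]


if __name__ == "__main__":
    texts = ["Cover page block one", "Cover page block two", "MD&A paragraph three"]
    print(asyncio.run(extract_all(texts)))
```

In this notebook, the same coroutine would be awaited directly (`await extract_all(...)`) rather than wrapped in `asyncio.run`.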

In [5]:
text_synth = TextSynthesizer(entity_id=entity_meta.canonical_id)
text_rows = []

# For demo speed, only process the first three text blocks longer than 100 characters
demo_blocks = [b for b in blocks if b.block_type == BlockType.TEXT and len(b.content) > 100][:3]

for block in demo_blocks:
    print(f"Extracting from text in: {block.section_path}")
    print(f"Preview: {block.content[:100]}...")

    facts = await text_synth.extract_facts(block)
    text_rows.extend(facts)

    for f in facts:
        print(f"  -> Extracted: {f.metric_name}: {f.value} (Conf: {f.confidence})")
        if f.nuance_note:
            print(f"     Nuance: {f.nuance_note}")

print(f"Total Text Facts Extracted: {len(text_rows)}")

Extracting from text in: ['41-2101738']
Preview: (I.R.S. Employer Identification No.)

1350 Euclid Avenue, Suite 1600, Cleveland, Ohio 44115

(Addres...
  -> Extracted: Address of principal executive offices: None (Conf: 1.0)
     Nuance: 1350 Euclid Avenue, Suite 1600, Cleveland, Ohio 44115
  -> Extracted: Zip Code: None (Conf: 1.0)
     Nuance: 44115
Extracting from text in: ['(216) 706-2960']
Preview: (Registrant’s telephone number, including area code)

(Former name, former address and former fiscal...
  -> Extracted: Former Name: None (Conf: 1.0)
     Nuance: Former name, former address and former fiscal year, if changed since last report.
  -> Extracted: Former Address: None (Conf: 1.0)
     Nuance: Former name, former address and former fiscal year, if changed since last report.
  -> Extracted: Former Fiscal Year: None (Conf: 1.0)
     Nuance: Former name, former address and former fiscal year, if changed since last report.
  -> Extracted: Registrant’s Telephone Number: None (Co