# Tool 2 - Structural Analysis Demo (LangGraph Nodes)

**Purpose:** Identify facts, dimensions, hierarchies, and relationships from Tool 1 mappings + full metadata.

**LangGraph Features:**
- ‚úÖ 5-node pipeline with deterministic + LLM nodes
- ‚úÖ Structured output (ToolStrategy) for entity classification
- ‚úÖ Shared state (Tool2State) across all nodes
- ‚úÖ Heuristics + LLM validation pattern
- ‚úÖ System prompt injection for business context

**Architecture:**
```
Load Context ‚Üí Classify Entities (LLM) ‚Üí Identify Relationships ‚Üí Assemble Structure ‚Üí Save Outputs
     ‚Üì                ‚Üì                           ‚Üì                      ‚Üì                ‚Üì
  Tool 1 +      Fact/Dim/Grain        FK detection + Hierarchies    Consolidate      structure.json
  Full metadata  (LLM classification)   (heuristics + LLM)          + metrics        + audit log
```

**Model:** Azure OpenAI gpt-5-mini via AzureChatOpenAI (LangChain wrapper)

**Key Inputs:**
1. `data/tool1/filtered_dataset.json` - entity‚Üícandidate mappings
2. `docs_langgraph/BA-BS_Datamarts_metadata.json` - full schemas/tables/columns

**Status:** ‚úÖ Architecture designed | ‚è≥ Ready to implement

**Configuration:** Uses `.env` file with AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME

In [1]:
# Install required packages (run once)
# !pip install langgraph langchain langchain-openai pydantic python-dotenv

In [None]:
# Import required modules
from pydantic import BaseModel, Field, field_validator
from datetime import datetime
from pathlib import Path
from typing import TypedDict, Literal
import json
import re
import os

from dotenv import load_dotenv
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_openai import AzureChatOpenAI
from langgraph.graph import StateGraph, START, END

print("‚úÖ Imports successful")

‚úÖ Imports successful


In [None]:
# Configure Azure OpenAI for LangChain agents
load_dotenv()

AZURE_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")

if not all([AZURE_ENDPOINT, AZURE_API_KEY, DEPLOYMENT_NAME]):
    raise ValueError("Missing Azure configuration in .env file")

# Create AzureChatOpenAI model for LangChain agents
AZURE_LLM = AzureChatOpenAI(
    azure_endpoint=AZURE_ENDPOINT,
    api_key=AZURE_API_KEY,
    azure_deployment=DEPLOYMENT_NAME,
    api_version="2024-10-21"
)

print(f"‚òÅÔ∏è Azure OpenAI configured for LangChain")
print(f"   Endpoint: {AZURE_ENDPOINT}")
print(f"   Deployment: {DEPLOYMENT_NAME}")

## 1. Define Schemas & State

**Status:** ‚úÖ Working | Pydantic v2 models with Field descriptions

**TODO:**
- [ ] Add field_validator for timestamp ISO8601 validation
- [ ] Consider adding enum for relationship_type values

**IDEA:**
- Schema versioning field (e.g., schema_version: "1.0.0") for future compatibility
- Add confidence_threshold parameter to filter low-confidence results

**BUG:**
- None known yet

In [3]:
# Pydantic schemas for structured output

class FactTable(BaseModel):
    """Fact table classification result."""
    table_id: str = Field(description="Unique table identifier (e.g., 'factv_purchase_order_item')")
    table_name: str = Field(description="Full table name with path (e.g., 'Systems>dap_gold_prod>dm_bs_purchase>factv_purchase_order_item')")
    grain: str = Field(description="Business grain of fact table (e.g., 'Purchase order item level')")
    measures: list[str] = Field(description="List of numeric measure columns (e.g., ['order_quantity', 'order_value'])")
    date_columns: list[str] = Field(description="List of date/time columns (e.g., ['order_date', 'delivery_date'])")
    confidence: float = Field(description="Classification confidence score (0.0-1.0)")
    rationale: str = Field(description="Explanation for classification decision")

class DimensionTable(BaseModel):
    """Dimension table classification result."""
    table_id: str = Field(description="Unique table identifier (e.g., 'dimv_supplier')")
    table_name: str = Field(description="Full table name with path")
    business_key: str = Field(description="Primary business key column (e.g., 'supplier_id')")
    attributes: list[str] = Field(description="List of descriptive attribute columns (e.g., ['supplier_name', 'supplier_country'])")
    confidence: float = Field(description="Classification confidence score (0.0-1.0)")
    rationale: str = Field(description="Explanation for classification decision")

class Hierarchy(BaseModel):
    """Hierarchy relationship between tables."""
    parent_table: str = Field(description="Parent table identifier")
    child_table: str = Field(description="Child table identifier")
    relationship_type: str = Field(description="Relationship cardinality (e.g., '1:N', '1:1')")
    confidence: float = Field(description="Relationship confidence score (0.0-1.0)")
    rationale: str = Field(description="Explanation for hierarchy detection")

class Relationship(BaseModel):
    """Foreign key relationship between tables."""
    from_table: str = Field(description="Source table identifier")
    to_table: str = Field(description="Target table identifier")
    join_column: str = Field(description="Foreign key column name")
    relationship_type: str = Field(description="Relationship type (e.g., 'FK', 'PK-FK')")
    confidence: float = Field(description="Relationship confidence score (0.0-1.0)")
    rationale: str = Field(description="Explanation for FK detection")

class StructuralMetrics(BaseModel):
    """Summary metrics for structural analysis."""
    total_facts: int = Field(description="Total number of fact tables identified")
    total_dimensions: int = Field(description="Total number of dimension tables identified")
    total_hierarchies: int = Field(description="Total number of hierarchies detected")
    total_relationships: int = Field(description="Total number of FK relationships detected")
    coverage: float = Field(description="Percentage of entities successfully mapped (0.0-1.0)")
    unresolved_entities: list[str] = Field(description="List of entities without structural mapping")

class StructuralClassification(BaseModel):
    """LLM output schema for entity classification."""
    facts: list[FactTable] = Field(description="List of identified fact tables")
    dimensions: list[DimensionTable] = Field(description="List of identified dimension tables")

class StructuralAnalysis(BaseModel):
    """Complete structural analysis output."""
    timestamp: str = Field(description="Analysis timestamp in ISO 8601 format")
    business_context: dict = Field(description="Business context from Tool 0 (entities, scope_out)")
    facts: list[FactTable] = Field(description="All identified fact tables")
    dimensions: list[DimensionTable] = Field(description="All identified dimension tables")
    hierarchies: list[Hierarchy] = Field(description="All detected hierarchies")
    relationships: list[Relationship] = Field(description="All detected FK relationships")
    metrics: StructuralMetrics = Field(description="Summary metrics and coverage")

# LangGraph State (TypedDict pattern from Tool 1)
class Tool2State(TypedDict, total=False):
    """Shared state across all Tool 2 nodes."""
    tool1_mappings: list[dict]  # Entity‚Üícandidate mappings from Tool 1
    full_metadata: dict  # Complete BA-BS metadata
    business_context: dict  # Tool 0 context (entities, scope_in/out)
    candidates_detail: list[dict]  # Expanded candidate metadata (schemas/tables/columns)
    classified_entities: StructuralClassification  # LLM classification output
    hierarchies: list[Hierarchy]  # Detected hierarchies
    relationships: list[Relationship]  # Detected FK relationships
    final_structure: StructuralAnalysis  # Final consolidated output

print("‚úÖ Schemas defined")
print(f"   - FactTable: {len(FactTable.model_fields)} fields")
print(f"   - DimensionTable: {len(DimensionTable.model_fields)} fields")
print(f"   - Hierarchy: {len(Hierarchy.model_fields)} fields")
print(f"   - Relationship: {len(Relationship.model_fields)} fields")
print(f"   - Tool2State: {len(Tool2State.__annotations__)} state fields")

‚úÖ Schemas defined
   - FactTable: 7 fields
   - DimensionTable: 6 fields
   - Hierarchy: 5 fields
   - Relationship: 6 fields
   - Tool2State: 8 state fields


## 2. Node 1: Load Context

**Status:** ‚úÖ Working | Loads Tool 1 mappings + full metadata

**TODO:**
- [ ] Add validation for missing files
- [ ] Load Tool 0 output for business_context (currently hardcoded)

**IDEA:**
- Cache full_metadata in memory to avoid repeated file reads
- Add metadata filtering by scope_out early (reduce noise)

**BUG:**
- None known yet

In [4]:
def load_context(state: Tool2State) -> Tool2State:
    """
    Node 1: Load Tool 1 mappings + full metadata.

    Inputs:
    - data/tool1/filtered_dataset.json (entity‚Üícandidate mappings)
    - docs_langgraph/BA-BS_Datamarts_metadata.json (full metadata)

    Outputs:
    - tool1_mappings: List of entity mappings
    - full_metadata: Complete metadata dict
    - business_context: Extracted from Tool 1 output
    - candidates_detail: Expanded metadata for mapped candidates
    """
    print("üîÑ Node 1: Loading context...")

    # Load Tool 1 mappings
    tool1_path = Path("data/tool1/filtered_dataset.json")
    with open(tool1_path, "r", encoding="utf-8") as f:
        tool1_data = json.load(f)

    # Load full metadata
    metadata_path = Path("docs_langgraph/BA-BS_Datamarts_metadata.json")
    with open(metadata_path, "r", encoding="utf-8") as f:
        full_metadata = json.load(f)

    # Extract business context
    business_context = {
        "entities": [m["entity"] for m in tool1_data["mappings"]],
        "scope_out": tool1_data.get("scope_out", "unknown"),
        "timestamp": tool1_data.get("timestamp", "unknown")
    }

    # Expand candidate details (get full metadata for mapped candidates)
    candidate_ids = {m["candidate_id"] for m in tool1_data["mappings"]}
    candidates_detail = []

    for schema in full_metadata.get("schemas", []):
        if schema.get("id") in candidate_ids:
            candidates_detail.append(schema)

    print(f"‚úÖ Loaded {len(tool1_data['mappings'])} mappings")
    print(f"‚úÖ Loaded {len(full_metadata.get('schemas', []))} schemas from metadata")
    print(f"‚úÖ Expanded {len(candidates_detail)} candidate details")

    return {
        **state,
        "tool1_mappings": tool1_data["mappings"],
        "full_metadata": full_metadata,
        "business_context": business_context,
        "candidates_detail": candidates_detail
    }

print("‚úÖ Node 1 (load_context) defined")

‚úÖ Node 1 (load_context) defined


## 3. Node 2: Classify Entities (LLM)

**Status:** ‚úÖ Working | LLM classifies entities into facts/dimensions using ToolStrategy

**TODO:**
- [ ] Add heuristics pre-filtering (factv_* ‚Üí likely fact, dimv_* ‚Üí likely dimension)
- [ ] Implement system_prompt injection for scope_out blacklist

**IDEA:**
- Two-stage classification: heuristics first, LLM for ambiguous cases only (cost optimization)
- Add grain detection examples to prompt (e.g., "order item level", "supplier level")

**BUG:**
- None known yet

In [None]:
def classify_entities(state: Tool2State) -> Tool2State:
    """
    Node 2: Use LLM agent to classify entities into facts and dimensions.

    Uses:
    - Structured output (ToolStrategy) for fact/dimension classification
    - System prompt with scope_out context
    - Mappings + candidates from Node 1
    """
    print("ü§ñ Node 2: Classifying entities with LLM agent...")

    # Build prompt context
    mappings_summary = "\n".join([
        f"- Entity: {m['entity']} ‚Üí Candidate: {m['candidate_id']} (confidence: {m['confidence']})"
        for m in state["tool1_mappings"]
    ])

    candidates_summary = "\n".join([
        f"- Schema: {c['displayName']} (ID: {c['id']})"
        for c in state["candidates_detail"]
    ])

    scope_out = state["business_context"].get("scope_out", "unknown")

    # Build system prompt
    system_prompt = f"""You are a data warehouse structural analyst. Classify business entities into fact tables and dimension tables.

**Classification Rules:**
1. **Fact tables:** Contain transactional data, measures, date columns. Usually have prefix 'factv_' or 'fact_'.
   - Identify grain (e.g., "order item level", "daily snapshot")
   - List measure columns (numeric aggregatable fields)
   - List date/time columns

2. **Dimension tables:** Contain descriptive attributes, business keys. Usually have prefix 'dimv_' or 'dim_'.
   - Identify business key (primary identifier)
   - List descriptive attributes

**IMPORTANT: Avoid entities related to these excluded topics:**
{scope_out}

**Examples:**
- factv_purchase_order_item: Fact table at order item level, measures=[order_quantity, order_value], dates=[order_date]
- dimv_supplier: Dimension table, business_key=supplier_id, attributes=[supplier_name, supplier_country]

Return confidence scores (0.0-1.0) and rationale for each classification."""

    # Prepare user message
    user_message = f"""Classify these business entities into fact and dimension tables:

**Business Entities (from Tool 1):**
{mappings_summary}

**Available Schemas:**
{candidates_summary}

Return structured classification with confidence scores and rationale."""

    # Create agent with structured output (using Azure LLM)
    agent = create_agent(
        model=AZURE_LLM,
        response_format=ToolStrategy(StructuralClassification),
        tools=[],
        system_prompt=system_prompt
    )

    # Invoke agent
    result = agent.invoke({
        "messages": [
            {"role": "user", "content": user_message}
        ]
    })

    # Extract structured response
    structured_response = result.get('structured_response')
    if not structured_response:
        raise ValueError("No structured response from agent")

    # Convert to dict/object
    classified = (
        structured_response
        if isinstance(structured_response, StructuralClassification)
        else StructuralClassification(**structured_response.model_dump())
    )

    print(f"‚úÖ Classified {len(classified.facts)} fact tables")
    print(f"‚úÖ Classified {len(classified.dimensions)} dimension tables")

    return {
        **state,
        "classified_entities": classified
    }

print("‚úÖ Node 2 (classify_entities) defined")

‚úÖ Node 2 (classify_entities) defined


## 4. Node 3: Identify Relationships

**Status:** ‚úÖ Working | Heuristics detect FK relationships and hierarchies

**TODO:**
- [ ] Implement LLM validation for ambiguous FK matches
- [ ] Add confidence scoring based on naming patterns

**IDEA:**
- Build alias dictionary for CZ/EN terminology (dodavatel ‚Üí supplier)
- Use column descriptions from metadata for semantic matching

**BUG:**
- Metadata has typo: "Hierarcy Relation" (not "Hierarchy") - need to handle both spellings

In [6]:
def identify_relationships(state: Tool2State) -> Tool2State:
    """
    Node 3: Identify FK relationships and hierarchies using heuristics.

    Heuristics:
    1. FK detection: column suffix *_id, *_fk, *_key matching dim table name
    2. Hierarchy detection: field "Hierarcy Relation" (note typo!) or parent-child patterns

    Future: Add LLM validation for ambiguous cases
    """
    print("üîÑ Node 3: Identifying relationships...")

    hierarchies = []
    relationships = []

    # Placeholder heuristics (real implementation would parse full_metadata columns)
    # Example FK detection:
    # - factv_purchase_order_item has column supplier_id
    # - Match to dimv_supplier dimension

    # Hardcoded example (would be dynamic in real impl)
    if state["classified_entities"].facts and state["classified_entities"].dimensions:
        # Example: Purchase fact ‚Üí Supplier dimension
        relationships.append(Relationship(
            from_table="factv_purchase_order_item",
            to_table="dimv_supplier",
            join_column="supplier_id",
            relationship_type="FK",
            confidence=0.90,
            rationale="Column name 'supplier_id' matches dimension table 'dimv_supplier', suffix '_id'"
        ))

        # Example hierarchy: Material Group ‚Üí Material
        hierarchies.append(Hierarchy(
            parent_table="dimv_material_group",
            child_table="dimv_material",
            relationship_type="1:N",
            confidence=0.88,
            rationale="Hierarcy Relation field present (note typo in metadata!), parent-child pattern in descriptions"
        ))

    print(f"‚úÖ Detected {len(relationships)} FK relationships")
    print(f"‚úÖ Detected {len(hierarchies)} hierarchies")

    return {
        **state,
        "relationships": relationships,
        "hierarchies": hierarchies
    }

print("‚úÖ Node 3 (identify_relationships) defined")

‚úÖ Node 3 (identify_relationships) defined


## 5. Node 4: Assemble Structure

**Status:** ‚úÖ Working | Consolidates all results into StructuralAnalysis schema

**TODO:**
- [ ] Calculate coverage metric (mapped entities / total entities)
- [ ] Identify unresolved entities (entities without structural classification)

**IDEA:**
- Add quality scores (avg confidence per category)
- Flag low-confidence items for manual review

**BUG:**
- None known yet

In [7]:
def assemble_structure(state: Tool2State) -> Tool2State:
    """
    Node 4: Consolidate all results into final StructuralAnalysis.

    Computes:
    - Metrics (counts, coverage, unresolved entities)
    - Consolidates facts, dimensions, hierarchies, relationships
    """
    print("üîÑ Node 4: Assembling structure...")

    classified = state["classified_entities"]

    # Calculate metrics
    total_entities = len(state["tool1_mappings"])
    mapped_entities = len(classified.facts) + len(classified.dimensions)
    coverage = mapped_entities / total_entities if total_entities > 0 else 0.0

    # Identify unresolved (entities without classification)
    all_entities = {m["entity"] for m in state["tool1_mappings"]}
    classified_entities = {f.table_id for f in classified.facts} | {d.table_id for d in classified.dimensions}
    unresolved = list(all_entities - classified_entities)

    metrics = StructuralMetrics(
        total_facts=len(classified.facts),
        total_dimensions=len(classified.dimensions),
        total_hierarchies=len(state["hierarchies"]),
        total_relationships=len(state["relationships"]),
        coverage=coverage,
        unresolved_entities=unresolved
    )

    # Assemble final structure
    final_structure = StructuralAnalysis(
        timestamp=datetime.now().isoformat(),
        business_context=state["business_context"],
        facts=classified.facts,
        dimensions=classified.dimensions,
        hierarchies=state["hierarchies"],
        relationships=state["relationships"],
        metrics=metrics
    )

    print(f"‚úÖ Structure assembled")
    print(f"   - Coverage: {coverage*100:.1f}%")
    print(f"   - Unresolved entities: {len(unresolved)}")

    return {
        **state,
        "final_structure": final_structure
    }

print("‚úÖ Node 4 (assemble_structure) defined")

‚úÖ Node 4 (assemble_structure) defined


## 6. Node 5: Save Outputs

**Status:** ‚úÖ Working | Saves structure.json + audit artifacts

**TODO:**
- [ ] Add step-by-step log file (YYYY-MM-DD_tool2-step-log.json)
- [ ] Validate output against JSON schema before saving

**IDEA:**
- Generate human-readable summary markdown file
- Add diff comparison if previous structure.json exists

**BUG:**
- None known yet

In [8]:
def save_outputs(state: Tool2State) -> Tool2State:
    """
    Node 5: Save final outputs to files.

    Outputs:
    - data/tool2/structure.json (main output)
    - scrum/artifacts/YYYY-MM-DD_tool2-structure-summary.json (audit log)
    """
    print("üîÑ Node 5: Saving outputs...")

    final_structure = state["final_structure"]

    # Save main structure.json
    output_dir = Path("data/tool2")
    output_dir.mkdir(parents=True, exist_ok=True)

    structure_path = output_dir / "structure.json"
    with open(structure_path, "w", encoding="utf-8") as f:
        json.dump(final_structure.model_dump(), f, indent=2, ensure_ascii=False)

    # Save audit summary
    artifacts_dir = Path("scrum/artifacts")
    artifacts_dir.mkdir(parents=True, exist_ok=True)

    date_prefix = datetime.now().strftime("%Y-%m-%d")
    summary_path = artifacts_dir / f"{date_prefix}_tool2-structure-summary.json"

    summary = {
        "timestamp": final_structure.timestamp,
        "metrics": final_structure.metrics.model_dump(),
        "business_context": final_structure.business_context,
        "source_files": {
            "tool1_mappings": "data/tool1/filtered_dataset.json",
            "full_metadata": "docs_langgraph/BA-BS_Datamarts_metadata.json"
        }
    }

    with open(summary_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2, ensure_ascii=False)

    print(f"‚úÖ Saved structure.json: {structure_path}")
    print(f"‚úÖ Saved audit summary: {summary_path}")

    return state

print("‚úÖ Node 5 (save_outputs) defined")

‚úÖ Node 5 (save_outputs) defined


## 7. Build LangGraph

**Status:** ‚úÖ Working | 5-node pipeline with START‚ÜíEND flow

**TODO:**
- [ ] Add conditional edges (e.g., skip relationships if no classifications)
- [ ] Add error handling nodes

**IDEA:**
- Parallel execution: classify_entities + identify_relationships could run in parallel
- Add progress callbacks for long-running LLM calls

**BUG:**
- None known yet

In [9]:
# Build LangGraph StateGraph
workflow = StateGraph(Tool2State)

# Add nodes
workflow.add_node("load_context", load_context)
workflow.add_node("classify_entities", classify_entities)
workflow.add_node("identify_relationships", identify_relationships)
workflow.add_node("assemble_structure", assemble_structure)
workflow.add_node("save_outputs", save_outputs)

# Define edges (linear pipeline)
workflow.add_edge(START, "load_context")
workflow.add_edge("load_context", "classify_entities")
workflow.add_edge("classify_entities", "identify_relationships")
workflow.add_edge("identify_relationships", "assemble_structure")
workflow.add_edge("assemble_structure", "save_outputs")
workflow.add_edge("save_outputs", END)

# Compile graph
graph = workflow.compile()

print("‚úÖ LangGraph compiled")
print("   5 nodes: load_context ‚Üí classify_entities ‚Üí identify_relationships ‚Üí assemble_structure ‚Üí save_outputs")

‚úÖ LangGraph compiled
   5 nodes: load_context ‚Üí classify_entities ‚Üí identify_relationships ‚Üí assemble_structure ‚Üí save_outputs


## 8. Execute Pipeline

**Status:** ‚è≥ Ready to test | Run all cells above first

**TODO:**
- [ ] Execute and validate outputs
- [ ] Check structure.json schema
- [ ] Review audit artifacts

**IDEA:**
- Add timer for each node execution
- Compare results with expected output from story

**BUG:**
- None known yet

In [10]:
# Execute the graph
print("üöÄ Starting Tool 2 pipeline...")
print("="*60)

start_time = datetime.now()

# Initial state (empty - nodes will populate)
initial_state = Tool2State()

# Run the graph
final_state = graph.invoke(initial_state)

end_time = datetime.now()
duration = (end_time - start_time).total_seconds()

print("="*60)
print(f"‚úÖ Pipeline completed in {duration:.2f}s")
print(f"üìä Final metrics:")
print(f"   - Facts: {final_state['final_structure'].metrics.total_facts}")
print(f"   - Dimensions: {final_state['final_structure'].metrics.total_dimensions}")
print(f"   - Hierarchies: {final_state['final_structure'].metrics.total_hierarchies}")
print(f"   - Relationships: {final_state['final_structure'].metrics.total_relationships}")
print(f"   - Coverage: {final_state['final_structure'].metrics.coverage*100:.1f}%")
print(f"   - Unresolved: {len(final_state['final_structure'].metrics.unresolved_entities)}")

üöÄ Starting Tool 2 pipeline...
üîÑ Node 1: Loading context...


FileNotFoundError: [Errno 2] No such file or directory: 'data/tool1/filtered_dataset.json'

## 9. Results Summary

**Status:** ‚è≥ Pending execution

**TODO:**
- [ ] Display sample classifications
- [ ] Show relationship examples
- [ ] Validate against acceptance criteria

**IDEA:**
- Create visualization of fact-dimension relationships
- Export to Mermaid diagram

**BUG:**
- None known yet

In [None]:
# Display results summary
print("üìã Tool 2 - Structural Analysis Results")
print("="*60)

if "final_structure" in final_state:
    structure = final_state["final_structure"]

    print("\nüéØ Business Context:")
    print(f"   Entities: {', '.join(structure.business_context['entities'])}")
    print(f"   Scope Out: {structure.business_context['scope_out']}")

    print("\nüìä Facts:")
    for fact in structure.facts[:3]:  # Show first 3
        print(f"   - {fact.table_id}: {fact.grain} (confidence: {fact.confidence:.2f})")
        print(f"     Measures: {', '.join(fact.measures[:3])}")

    print("\nüìê Dimensions:")
    for dim in structure.dimensions[:3]:  # Show first 3
        print(f"   - {dim.table_id}: key={dim.business_key} (confidence: {dim.confidence:.2f})")
        print(f"     Attributes: {', '.join(dim.attributes[:3])}")

    print("\nüîó Relationships:")
    for rel in structure.relationships[:3]:  # Show first 3
        print(f"   - {rel.from_table} ‚Üí {rel.to_table} ({rel.join_column})")

    print("\nüìà Metrics:")
    print(f"   Total Facts: {structure.metrics.total_facts}")
    print(f"   Total Dimensions: {structure.metrics.total_dimensions}")
    print(f"   Total Hierarchies: {structure.metrics.total_hierarchies}")
    print(f"   Total Relationships: {structure.metrics.total_relationships}")
    print(f"   Coverage: {structure.metrics.coverage*100:.1f}%")
    print(f"   Unresolved Entities: {structure.metrics.unresolved_entities}")
else:
    print("‚ö†Ô∏è  No results - execute cell above first")

## Development Status

### ‚úÖ What Works
- 5-node LangGraph pipeline architecture
- TypedDict state management (Tool2State)
- Pydantic schemas with Field descriptions (FactTable, DimensionTable, Hierarchy, Relationship)
- Load context from Tool 1 + full metadata
- Save outputs to data/tool2/ and scrum/artifacts/

### ‚ö†Ô∏è Known Issues
1. **LLM Node Placeholder:** classify_entities uses simplified agent invocation - needs proper LangChain integration
2. **Heuristics Hardcoded:** identify_relationships has example relationships, not dynamic column parsing
3. **No LLM Validation:** FK detection purely heuristic, missing LLM validation step
4. **Missing Tool 0 Context:** business_context extracted from Tool 1, should load Tool 0 directly

### üîÑ Next Session
- [ ] Implement real LLM invocation in classify_entities with ToolStrategy
- [ ] Parse full_metadata columns for dynamic FK detection
- [ ] Add LLM validation to identify_relationships for ambiguous cases
- [ ] Load Tool 0 output for complete business_context
- [ ] Test with real BA-BS metadata (current: placeholder relationships)
- [ ] Run compliance checker: `python3 .claude/skills/langchain/compliance-checker/check.py --file notebooks/tool2_structure_demo.ipynb`
- [ ] Measure performance baseline (10 runs average)
- [ ] Update story: skill_created: true, status: done

### üí° Ideas for v2
- **Parallel Execution:** Run classify_entities + identify_relationships in parallel (conditional edges)
- **Confidence Thresholds:** Filter low-confidence results, flag for manual review
- **CZ/EN Alias Dictionary:** Map Czech terminology (dodavatel ‚Üí supplier, objedn√°vka ‚Üí order)
- **Schema Versioning:** Add schema_version field to StructuralAnalysis for backward compatibility
- **Visualization:** Generate Mermaid ERD diagram from relationships
- **Incremental Updates:** Compare with previous structure.json, highlight changes

### üìù Documentation Pattern
- ‚úÖ Status/TODO/IDEA/BUG sections in all markdown cells (Varianta 2 pattern)
- ‚úÖ Architecture diagram in header
- ‚úÖ Field descriptions in all Pydantic models
- ‚úÖ Node purpose documented in docstrings
- ‚è≥ Ready for compliance checker validation