# Tool 3 - Quality Validator Demo (LangGraph Nodes)

**Purpose:** Validate metadata quality using a hybrid approach (deterministic heuristics + LLM enhancement layer).

**LangGraph Features:**
- ✅ 4-node pipeline with deterministic + LLM nodes
- ✅ Structured output (ToolStrategy) for LLM enhancement only
- ✅ Shared state (Tool3State) across all nodes
- ✅ Fallback strategy for LLM timeouts/errors
- ✅ Hallucination mitigation via entity ID validation
- ✅ P0-P2 prioritized recommendations

**Architecture:**
```
Load & Validate → Calculate Deterministic → Enhance with LLM → Merge & Serialize
       ↓                    ↓                      ↓                    ↓
  structure.json +    Articulation scores    Risk assessment     quality_report.json
  business_context    + validation flags     + recommendations   + audit summary
  + full metadata     + missing entities     + anomaly notes     (P0-P2 priorities)
```

**Model:** Azure OpenAI gpt-5-mini via AzureChatOpenAI (LangChain wrapper) - Node 3 only

**Key Inputs:**
1. `data/tool2/structure.json` - facts, dimensions, relationships
2. `data/tool0_samples/*.json` - business context (entities, scope_out)
3. `docs_langgraph/BA-BS_Datamarts_metadata.json` - full metadata for quality checks

**Status:** ✅ Architecture designed | ⏳ Ready to implement

**Configuration:** Uses `.env` file with AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME

In [4]:
# Imports
from pydantic import BaseModel, Field, field_validator
from datetime import datetime
from pathlib import Path
from typing import Literal, TypedDict
import json
import os
from dotenv import load_dotenv

# LangChain imports (Node 3 only)
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_openai import AzureChatOpenAI
from langgraph.graph import StateGraph, START, END

print("✅ Imports loaded")

✅ Imports loaded


In [5]:
# Configure Azure OpenAI for LangChain agents (Node 3 only)
load_dotenv()

AZURE_ENDPOINT_RAW = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")

if not all([AZURE_ENDPOINT_RAW, AZURE_API_KEY, DEPLOYMENT_NAME]):
    raise ValueError("Missing Azure configuration in .env file")

def _normalize_azure_endpoint(endpoint: str | None) -> str | None:
    """Strip Azure REST suffixes so LangChain builds the correct base URL."""
    if endpoint is None:
        return None
    trimmed = endpoint.rstrip("/")
    for suffix in ("/openai/v1", "/openai"):
        if trimmed.endswith(suffix):
            trimmed = trimmed[: -len(suffix)]
            break
    return trimmed

AZURE_AGENT_ENDPOINT = _normalize_azure_endpoint(AZURE_ENDPOINT_RAW)

AZURE_LLM = AzureChatOpenAI(
    azure_endpoint=AZURE_AGENT_ENDPOINT,
    api_key=AZURE_API_KEY,
    azure_deployment=DEPLOYMENT_NAME,
    api_version="2024-10-21",
    timeout=30,  # seconds per request; a timeout raises and triggers the Node 3 fallback
)

print("☁️ Azure OpenAI configured for LangChain (Node 3 only)")
print(f"   Endpoint (raw): {AZURE_ENDPOINT_RAW}")
print(f"   Endpoint (agent): {AZURE_AGENT_ENDPOINT}")
print(f"   Deployment: {DEPLOYMENT_NAME}")

☁️ Azure OpenAI configured for LangChain (Node 3 only)
   Endpoint (raw): https://minar-mhi2wuzy-swedencentral.cognitiveservices.azure.com/openai/v1/
   Endpoint (agent): https://minar-mhi2wuzy-swedencentral.cognitiveservices.azure.com
   Deployment: test-gpt-5-mini
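
The normalization step can be checked in isolation. The snippet below restates `_normalize_azure_endpoint` without its `None` branch and runs it on a few endpoint shapes. The host name `example-resource.openai.azure.com` is a placeholder chosen for illustration, not a real resource.

```python
def normalize_azure_endpoint(endpoint: str) -> str:
    """Strip a trailing slash and a trailing /openai/v1 or /openai suffix."""
    trimmed = endpoint.rstrip("/")
    for suffix in ("/openai/v1", "/openai"):
        if trimmed.endswith(suffix):
            trimmed = trimmed[: -len(suffix)]
            break
    return trimmed

BASE = "https://example-resource.openai.azure.com"  # placeholder host (assumption)
candidates = [BASE, BASE + "/", BASE + "/openai", BASE + "/openai/v1/"]
normalized = [normalize_azure_endpoint(c) for c in candidates]

# Every variant should collapse to the bare resource URL.
for raw, clean in zip(candidates, normalized):
    print(f"{raw} -> {clean}")
```

All four variants reduce to the same base URL, which is the form passed to `AzureChatOpenAI` as `azure_endpoint` above.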


In [6]:
# Pydantic schemas for Quality Validator

class Recommendation(BaseModel):
    """Single actionable recommendation (P0-P2 priority)."""
    priority: Literal["P0", "P1", "P2"] = Field(
        description="Priority level: P0=blocker (missing critical fields), P1=quality issue, P2=nice-to-have"
    )
    entity_id: str | None = Field(
        description="Affected entity table_id, null if project-wide recommendation"
    )
    issue_type: str = Field(
        description="Issue category (e.g., 'MISSING_DESCRIPTION', 'LOW_ARTICULATION', 'MISSING_OWNER')"
    )
    description: str = Field(
        description="User-friendly explanation of the issue"
    )
    action: str = Field(
        description="Specific action to resolve (e.g., 'Add description in Collibra for entity dimv_supplier')"
    )
    estimated_impact: str = Field(
        description="Expected improvement (e.g., '+20 articulation score', 'unblock production deployment')"
    )

class AnomalyNote(BaseModel):
    """Structural anomaly detected by LLM."""
    entity_id: str = Field(description="Entity with anomaly")
    anomaly_type: str = Field(
        description="Type: 'UNEXPECTED_FIELD', 'NAME_MISMATCH', 'SCHEMA_DRIFT', 'VALUE_OUTLIER'"
    )
    severity: Literal["high", "medium", "low"] = Field(
        description="Impact level on data quality"
    )
    explanation: str = Field(
        description="Why this is anomalous and potential impact"
    )

class LLMEnhancement(BaseModel):
    """LLM output schema for ToolStrategy (Node 3 only)."""
    risk_level: Literal["high", "medium", "low"] = Field(
        description="Overall metadata quality risk: high=P0 blockers present, medium=P1 issues, low=minor P2 only"
    )
    risk_rationale: str = Field(
        description="2-3 sentence explanation why this risk level was assigned"
    )
    text_quality_score: float | None = Field(
        description="Subjective quality of descriptions/naming (0.0-1.0), null if not assessed",
        ge=0.0,
        le=1.0,
        default=None
    )
    text_quality_notes: str | None = Field(
        description="Explanation of text quality assessment (brevity, clarity, terminology)",
        default=None
    )
    recommendations: list[Recommendation] = Field(
        description="Prioritized list of fixes, P0 blockers first"
    )
    anomaly_notes: list[AnomalyNote] = Field(
        description="Structural outliers or unexpected patterns",
        default_factory=list
    )
    summary: str = Field(
        description="2-3 sentence summary for governance report (non-technical audience)"
    )

class QualityMetrics(BaseModel):
    """Summary quality metrics."""
    total_entities: int = Field(description="Total entities analyzed")
    avg_articulation_score: float = Field(description="Average articulation score (0-100)")
    entities_with_issues: int = Field(description="Count of entities with warnings/failures")
    p0_blockers: int = Field(description="Count of P0 priority blockers")
    coverage: float = Field(description="Coverage percentage (0.0-1.0)")

class QualityReport(BaseModel):
    """Final output schema (merge of deterministic + LLM results)."""
    schema_version: str = Field(description="Schema version for backward compatibility", default="1.0.0")
    timestamp: str = Field(description="Analysis timestamp in ISO 8601 format")
    source_files: dict = Field(
        description="Paths to input files (structure.json, business_context.json)"
    )

    # Deterministic results (Node 2)
    articulation_scores: dict[str, int] = Field(
        description="Entity_id → articulation_score (0-100) mapping"
    )
    validation_results: dict[str, str] = Field(
        description="Entity_id → validation_result ('pass'|'warning'|'fail') mapping"
    )
    missing_from_source: list[str] = Field(
        description="List of business entities not found in structure.json"
    )

    # LLM enhancement (Node 3)
    risk_level: str = Field(description="high | medium | low")
    risk_rationale: str = Field(description="Explanation for risk level")
    recommendations: list[Recommendation] = Field(description="P0-P2 prioritized recommendations")
    anomaly_notes: list[AnomalyNote] = Field(description="Detected anomalies", default_factory=list)
    summary: str = Field(description="Executive summary")

    # Metrics (computed in Node 4)
    metrics: QualityMetrics = Field(description="Summary stats")

# LangGraph State (TypedDict pattern from Tool 1/2)
class Tool3State(TypedDict, total=False):
    """Shared state across all Tool 3 nodes."""
    # Inputs (Node 1)
    structure: dict
    business_context: dict
    metadata: list[dict]  # flattened list of entity records produced by Node 1
    # Node 2 outputs (deterministic)
    entity_scores: dict[str, int]
    validation_flags: dict[str, str]
    missing_entities: list[str]
    # Node 3 outputs (LLM)
    llm_enhancements: LLMEnhancement
    llm_fallback_mode: bool
    # Node 4 outputs
    final_report: QualityReport
    output_path: str

print("✅ Schemas defined")
print(f"   - Recommendation: {len(Recommendation.model_fields)} fields")
print(f"   - AnomalyNote: {len(AnomalyNote.model_fields)} fields")
print(f"   - LLMEnhancement: {len(LLMEnhancement.model_fields)} fields")
print(f"   - QualityReport: {len(QualityReport.model_fields)} fields")
print(f"   - Tool3State: {len(Tool3State.__annotations__)} state fields")

✅ Schemas defined
   - Recommendation: 6 fields
   - AnomalyNote: 4 fields
   - LLMEnhancement: 7 fields
   - QualityReport: 12 fields
   - Tool3State: 10 state fields


In [7]:
def load_and_validate(state: Tool3State) -> Tool3State:
    """
    Node 1: Load structure.json + business_context + full metadata.

    Inputs:
    - data/tool2/structure.json (facts, dimensions, relationships)
    - data/tool0_samples/*.json (business context)
    - docs_langgraph/BA-BS_Datamarts_metadata.json (full metadata)

    Outputs:
    - structure: Tool 2 structural analysis
    - business_context: Business entities + scope
    - metadata: Full metadata for quality checks
    """
    print("🔄 Node 1: Loading and validating inputs...")

    # Load structure.json from Tool 2
    structure_path = Path("../data/tool2/structure.json")
    if not structure_path.exists():
        raise FileNotFoundError(f"Structure file not found: {structure_path}")

    with open(structure_path, "r", encoding="utf-8") as f:
        structure = json.load(f)

    # Load business context from Tool 0 (use most recent sample)
    tool0_dir = Path("../data/tool0_samples")
    context_files = sorted(tool0_dir.glob("*.json"), reverse=True)
    if not context_files:
        raise FileNotFoundError(f"No business context files found in {tool0_dir}")

    context_path = context_files[0]  # Most recent
    with open(context_path, "r", encoding="utf-8") as f:
        business_context = json.load(f)

    # Load full metadata with flatten logic
    metadata_path = Path("../docs_langgraph/BA-BS_Datamarts_metadata.json")
    with open(metadata_path, "r", encoding="utf-8") as f:
        metadata_raw = json.load(f)
        # Flatten nested array: [[{...}]] -> [{...}]
        full_metadata = []
        if isinstance(metadata_raw, list):
            for item in metadata_raw:
                if isinstance(item, list):
                    full_metadata.extend(item)
                else:
                    full_metadata.append(item)
        else:
            full_metadata = [metadata_raw] if isinstance(metadata_raw, dict) else []

    print(f"✅ Loaded structure: {len(structure.get('facts', []))} facts, {len(structure.get('dimensions', []))} dimensions")
    print(f"✅ Loaded business context: {context_path.name}")
    print(f"✅ Loaded metadata: {len(full_metadata)} entities (flattened)")

    return {
        **state,
        "structure": structure,
        "business_context": business_context,
        "metadata": full_metadata
    }

print("✅ Node 1 (load_and_validate) defined")

✅ Node 1 (load_and_validate) defined


In [8]:
def calculate_articulation_score(entity_metadata: dict) -> int:
    """
    Deterministic scoring based on field presence.
    Weights aligned with DQ audit findings (28.1/100 baseline).

    Scoring:
    - P0 Critical (40 pts): description (20) + owner (20)
    - P1 Important (30 pts): lineage (15) + source_mapping (15)
    - P2 Nice-to-have (30 pts): dq_rules (10) + governance_tags (10) + last_updated (10)
    """
    score = 0

    # P0 Critical fields
    if entity_metadata.get("description") and entity_metadata["description"] not in [None, "", "unknown"]:
        score += 20
    if entity_metadata.get("owner") and entity_metadata["owner"] not in [None, "", "unknown"]:
        score += 20

    # P1 Important fields
    if entity_metadata.get("lineage") and entity_metadata["lineage"] not in ["unknown", None]:
        score += 15
    if entity_metadata.get("source_mapping") and entity_metadata["source_mapping"] not in ["unknown", None]:
        score += 15

    # P2 Nice-to-have fields
    if entity_metadata.get("dq_rules") and len(entity_metadata.get("dq_rules", [])) > 0:
        score += 10
    if entity_metadata.get("governance_tags") and len(entity_metadata.get("governance_tags", [])) > 0:
        score += 10

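    # Recent update bonus (updated within the last 90 days)
    if entity_metadata.get("last_updated"):
        try:
            last_update = datetime.fromisoformat(entity_metadata["last_updated"])
            # Compare whole days so no timedelta import is needed
            if (datetime.now() - last_update).days < 90:
                score += 10
        except (ValueError, TypeError):
            # Unparseable or timezone-aware timestamp: skip the bonus
            pass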

    return min(max(score, 0), 100)

def validate_entity_status(entity_metadata: dict, articulation_score: int) -> str:
    """Returns: 'pass' | 'warning' | 'fail'"""
    if entity_metadata.get("status") == "Missing from source":
        return "fail"
    if articulation_score == 0:
        return "fail"
    if articulation_score < 50:
        return "warning"
    if not entity_metadata.get("description"):
        return "warning"
    return "pass"

def detect_missing_entities(business_entities: list[str], structure_entities: list[str]) -> list[str]:
    """Compare business request entities with structure.json coverage."""
    business_ids = {e.strip().lower() for e in business_entities}
    structure_ids = {e.strip().lower() for e in structure_entities}
    return sorted(list(business_ids - structure_ids))

def calculate_deterministic(state: Tool3State) -> Tool3State:
    """
    Node 2: Calculate articulation scores + validation flags using Python heuristics.

    NO LLM calls - pure deterministic logic.
    """
    print("🔄 Node 2: Calculating deterministic scores...")

    structure = state["structure"]

    # Collect all entities from structure
    all_entities = []
    for fact in structure.get("facts", []):
        all_entities.append(fact)
    for dim in structure.get("dimensions", []):
        all_entities.append(dim)

    # Calculate scores for each entity
    entity_scores = {}
    validation_flags = {}

    for entity in all_entities:
        entity_id = entity.get("table_id", "unknown")

        # Placeholder metadata (would lookup from state["metadata"] in real impl)
        # For demo, use simplified metadata based on entity structure
        entity_metadata = {
            "description": entity.get("rationale", None),  # Use rationale as proxy
            "owner": None,  # Not in structure.json
            "lineage": "unknown",
            "source_mapping": "Databricks Unity Catalog",  # Default from audit
            "dq_rules": None,
            "governance_tags": None,
            "last_updated": datetime.now().isoformat()
        }

        score = calculate_articulation_score(entity_metadata)
        status = validate_entity_status(entity_metadata, score)

        entity_scores[entity_id] = score
        validation_flags[entity_id] = status

    # Detect missing entities
    business_entities = []
    if "project_metadata" in state["business_context"]:
        # Tool 0 format
        business_entities = [e.get("name", "") for e in state["business_context"].get("entities", [])]
    elif "entities" in state["business_context"]:
        # Tool 1 format
        business_entities = state["business_context"]["entities"]

    structure_entity_ids = [e.get("table_id", "") for e in all_entities]
    missing_entities = detect_missing_entities(business_entities, structure_entity_ids)

    # Calculate summary metrics
    avg_score = sum(entity_scores.values()) / len(entity_scores) if entity_scores else 0.0
    p0_blocker_count = sum(1 for v in validation_flags.values() if v == "fail")

    print(f"✅ Calculated scores for {len(entity_scores)} entities")
    print(f"   - Average score: {avg_score:.1f}/100")
    print(f"   - P0 blockers: {p0_blocker_count}")
    print(f"   - Missing entities: {len(missing_entities)}")

    return {
        **state,
        "entity_scores": entity_scores,
        "validation_flags": validation_flags,
        "missing_entities": missing_entities
    }

print("✅ Node 2 (calculate_deterministic) defined")

✅ Node 2 (calculate_deterministic) defined
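
To make the weights concrete, the sketch below condenses the scoring rules into a weight table and applies them to the placeholder metadata that Node 2 currently builds for every entity. It is a simplified restatement, not the notebook's function: the names `WEIGHTS`, `PLACEHOLDERS`, and `score_entity` are illustrative, the description text is made up, and the "current" time is fixed so the result is reproducible.

```python
from datetime import datetime

# Points per field, mirroring the docstring of calculate_articulation_score.
WEIGHTS = {
    "description": 20, "owner": 20,          # P0
    "lineage": 15, "source_mapping": 15,     # P1
    "dq_rules": 10, "governance_tags": 10,   # P2
}
PLACEHOLDERS = (None, "", "unknown")

def score_entity(meta: dict, now: datetime) -> int:
    """Sum the weights of populated fields, plus 10 for an update in the last 90 days."""
    score = sum(
        points for field, points in WEIGHTS.items()
        if meta.get(field) and meta[field] not in PLACEHOLDERS
    )
    last_updated = meta.get("last_updated")
    if last_updated:
        try:
            if (now - datetime.fromisoformat(last_updated)).days < 90:
                score += 10
        except (ValueError, TypeError):
            pass  # unparseable or timezone-aware timestamp: no bonus
    return score

now = datetime(2025, 11, 1)  # fixed reference time (assumption for the example)
node2_placeholder = {
    "description": "Fact table for purchase orders",  # stands in for `rationale`
    "owner": None,
    "lineage": "unknown",
    "source_mapping": "Databricks Unity Catalog",
    "dq_rules": None,
    "governance_tags": None,
    "last_updated": now.isoformat(),
}
with_rationale = score_entity(node2_placeholder, now)
without_rationale = score_entity({**node2_placeholder, "description": None}, now)
print(with_rationale, without_rationale)  # 45 25
```

With these placeholders an entity scores 45 when it has a rationale (20 + 15 + 10) and 25 when it does not, so `validate_entity_status` returns `'warning'` for every entity and never `'fail'`. The scores only become informative once real values are looked up from `state["metadata"]`.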


In [9]:
def generate_fallback_enhancements(entity_scores: dict, validation_flags: dict, missing_entities: list) -> LLMEnhancement:
    """Generate generic recommendations when LLM fails."""
    avg_score = sum(entity_scores.values()) / len(entity_scores) if entity_scores else 0.0
    p0_blockers = sum(1 for v in validation_flags.values() if v == "fail")

    # Determine risk level
    if p0_blockers > 0 or missing_entities:
        risk_level = "high"
        risk_rationale = f"{p0_blockers} entities failed validation and {len(missing_entities)} requested entities are missing. LLM enhancement unavailable."
    elif avg_score < 50:
        risk_level = "medium"
        risk_rationale = f"Average score {avg_score:.1f}/100 indicates quality concerns. LLM enhancement unavailable."
    else:
        risk_level = "low"
        risk_rationale = f"Average score {avg_score:.1f}/100 is acceptable. LLM enhancement unavailable."

    # Generate generic recommendations
    recommendations = []

    # P0: Missing descriptions
    entities_no_desc = [eid for eid, score in entity_scores.items() if score < 20]
    if entities_no_desc:
        recommendations.append(Recommendation(
            priority="P0",
            entity_id=None,
            issue_type="MISSING_DESCRIPTION",
            description=f"{len(entities_no_desc)} entities lack business descriptions",
            action="Add descriptions in Collibra for all low-scoring entities",
            estimated_impact=f"+20 points per entity ({len(entities_no_desc)} entities)"
        ))

    # P1: Low scores
    entities_low_score = [eid for eid, score in entity_scores.items() if 20 <= score < 50]
    if entities_low_score:
        recommendations.append(Recommendation(
            priority="P1",
            entity_id=None,
            issue_type="LOW_ARTICULATION",
            description=f"{len(entities_low_score)} entities have low articulation scores (20-49)",
            action="Improve lineage documentation and source mappings",
            estimated_impact="Potential +30 points per entity"
        ))

    return LLMEnhancement(
        risk_level=risk_level,
        risk_rationale=risk_rationale,
        text_quality_score=None,
        text_quality_notes=None,
        recommendations=recommendations,
        anomaly_notes=[],
        summary=f"Quality assessment incomplete because LLM enhancement was unavailable. Deterministic analysis shows avg score {avg_score:.1f}/100 with {p0_blockers} P0 blockers."
    )

def enhance_with_llm(state: Tool3State) -> Tool3State:
    """
    Node 3: LLM enhancement layer with fallback strategy.

    Uses Azure OpenAI to:
    - Assess risk level based on deterministic findings
    - Generate P0-P2 prioritized recommendations
    - Detect structural anomalies
    - Write executive summary
    """
    print("🤖 Node 3: LLM enhancement...")

    # Build deterministic summary
    avg_score = sum(state["entity_scores"].values()) / len(state["entity_scores"]) if state["entity_scores"] else 0.0
    p0_blocker_count = sum(1 for v in state["validation_flags"].values() if v == "fail")

    deterministic_summary = {
        "avg_articulation_score": avg_score,
        "p0_blocker_count": p0_blocker_count,
        "entities_with_warnings": sum(1 for v in state["validation_flags"].values() if v == "warning"),
        "missing_entities": state["missing_entities"],
        "total_entities": len(state["entity_scores"])
    }

    # Sample entities for context
    structure = state["structure"]
    sample_entities = []
    for fact in structure.get("facts", [])[:2]:
        sample_entities.append({
            "table_id": fact.get("table_id"),
            "type": "fact",
            "score": state["entity_scores"].get(fact.get("table_id"), 0),
            "validation": state["validation_flags"].get(fact.get("table_id"), "unknown")
        })
    for dim in structure.get("dimensions", [])[:2]:
        sample_entities.append({
            "table_id": dim.get("table_id"),
            "type": "dimension",
            "score": state["entity_scores"].get(dim.get("table_id"), 0),
            "validation": state["validation_flags"].get(dim.get("table_id"), "unknown")
        })

    # Build system prompt
    system_prompt = """You are a metadata quality analyst for enterprise data governance.

You receive DETERMINISTIC validation results (pre-calculated scores, flags, missing entities) and structural metadata. Your task is to:

1. **Assess risk level**: Based on deterministic findings, assign risk:
   - HIGH: P0 blockers present (missing descriptions/owners for >30% entities) OR missing entities from business request
   - MEDIUM: P1 quality issues (low articulation scores 30-70, duplicates) but usable
   - LOW: Minor P2 issues only (>70 avg articulation score)

2. **Generate recommendations**: Prioritize P0→P1→P2 fixes. Be specific:
   - P0 example: "Add description in Collibra for entity dimv_supplier (currently null)"
   - P1 example: "Improve lineage documentation for factv_purchase_order"
   - P2 example: "Add governance tags for dimv_material_group"

3. **Detect anomalies**: Flag structural outliers (unexpected fields, naming inconsistencies, value outliers)

4. **Write executive summary**: 2-3 sentences for non-technical stakeholders.

**IMPORTANT CONSTRAINTS:**
- Do NOT recalculate scores (they are pre-computed deterministically)
- Do NOT hallucinate entity IDs (only reference entities from input)
- Base recommendations on P0-P2 guidelines (description/owner=P0, lineage=P1, tags=P2)
- Be actionable: specify WHERE to fix (Collibra, Unity Catalog) and WHAT to add"""

    # Build user message
    user_message = f"""Analyze metadata quality based on deterministic findings:

**Summary Metrics:**
- Average articulation score: {avg_score:.1f}/100
- P0 blockers (fail status): {p0_blocker_count}
- Entities with warnings: {deterministic_summary['entities_with_warnings']}
- Missing entities: {len(state["missing_entities"])}
- Total entities analyzed: {deterministic_summary['total_entities']}

**Sample Entities:**
{json.dumps(sample_entities, indent=2)}

**Missing Entities:** {', '.join(state["missing_entities"]) if state["missing_entities"] else 'None'}

Generate risk assessment, P0-P2 recommendations, anomaly notes, and executive summary."""

    # Create agent with structured output
    agent = create_agent(
        model=AZURE_LLM,
        response_format=ToolStrategy(LLMEnhancement),
        tools=[],
        system_prompt=system_prompt
    )

    try:
        # Invoke agent
        result = agent.invoke({
            "messages": [
                {"role": "user", "content": user_message}
            ]
        })

        # Extract structured response
        structured_response = result.get('structured_response')
        if not structured_response:
            raise ValueError("No structured response from agent")

        llm_enhancements = (
            structured_response
            if isinstance(structured_response, LLMEnhancement)
            else LLMEnhancement(**structured_response.model_dump())
        )

        # Validate entity IDs (hallucination check)
        valid_ids = set(state["entity_scores"].keys())
        for rec in llm_enhancements.recommendations:
            if rec.entity_id and rec.entity_id not in valid_ids:
                print(f"⚠️  LLM hallucinated entity: {rec.entity_id}")
                rec.entity_id = None

        print("✅ LLM enhancement complete")
        print(f"   - Risk level: {llm_enhancements.risk_level}")
        print(f"   - Recommendations: {len(llm_enhancements.recommendations)}")

        return {
            **state,
            "llm_enhancements": llm_enhancements,
            "llm_fallback_mode": False
        }

    except Exception as e:
        print(f"❌ LLM call failed: {e}")
        print("⚠️  Using fallback mode (generic recommendations)")

        fallback = generate_fallback_enhancements(
            state["entity_scores"],
            state["validation_flags"],
            state["missing_entities"]
        )

        return {
            **state,
            "llm_enhancements": fallback,
            "llm_fallback_mode": True
        }

print("✅ Node 3 (enhance_with_llm) defined")

✅ Node 3 (enhance_with_llm) defined
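
Node 4 (`merge_and_serialize`) is referenced when the graph is built, but no cell in this notebook defines it. The following is a minimal, dict-based sketch of what such a node could do, not the project's implementation: it merges the Node 2 and Node 3 results, computes the summary metrics, and writes `quality_report.json`. The function name `merge_and_serialize_sketch`, the `requested_entities` state key, the coverage formula (share of requested business entities found in the structure), and the use of plain dicts instead of the `QualityReport` model are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def merge_and_serialize_sketch(state: dict, output_dir: Path) -> dict:
    """Merge deterministic and LLM results into one report dict and save it as JSON."""
    scores = state["entity_scores"]
    flags = state["validation_flags"]
    missing = state["missing_entities"]
    llm = state["llm_enhancements"]                   # assumed to be a plain dict here
    requested = state.get("requested_entities", [])  # hypothetical state key

    metrics = {
        "total_entities": len(scores),
        "avg_articulation_score": sum(scores.values()) / len(scores) if scores else 0.0,
        "entities_with_issues": sum(1 for v in flags.values() if v != "pass"),
        "p0_blockers": sum(1 for v in flags.values() if v == "fail"),
        # Assumed definition: fraction of requested entities present in the structure.
        "coverage": 1 - len(missing) / len(requested) if requested else 1.0,
    }
    report = {
        "schema_version": "1.0.0",
        "timestamp": datetime.now().isoformat(),
        "articulation_scores": scores,
        "validation_results": flags,
        "missing_from_source": missing,
        "risk_level": llm["risk_level"],
        "risk_rationale": llm["risk_rationale"],
        "recommendations": llm["recommendations"],
        "anomaly_notes": llm.get("anomaly_notes", []),
        "summary": llm["summary"],
        "metrics": metrics,
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / "quality_report.json"
    output_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return {**state, "final_report": report, "output_path": str(output_path)}

# Usage with made-up values, written to a temporary directory.
demo_state = {
    "entity_scores": {"factv_purchase_order": 45, "dimv_supplier": 25},
    "validation_flags": {"factv_purchase_order": "warning", "dimv_supplier": "warning"},
    "missing_entities": ["invoice"],
    "requested_entities": ["supplier", "purchase order", "invoice", "material"],
    "llm_enhancements": {
        "risk_level": "high",
        "risk_rationale": "One requested entity is missing from the structure.",
        "recommendations": [],
        "summary": "Metadata is incomplete.",
    },
}
with tempfile.TemporaryDirectory() as tmp:
    result = merge_and_serialize_sketch(demo_state, Path(tmp))
    saved = json.loads(Path(result["output_path"]).read_text(encoding="utf-8"))
print(saved["metrics"])
```

In the notebook itself the node would need to return a `QualityReport` instance rather than a dict, because the results cell reads attributes such as `report.metrics.total_entities`, and it would have to be defined before the cell that builds the graph.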


In [10]:
# Build LangGraph StateGraph
workflow = StateGraph(Tool3State)

# Add nodes
workflow.add_node("load_and_validate", load_and_validate)
workflow.add_node("calculate_deterministic", calculate_deterministic)
workflow.add_node("enhance_with_llm", enhance_with_llm)
workflow.add_node("merge_and_serialize", merge_and_serialize)

# Define edges (linear pipeline)
workflow.add_edge(START, "load_and_validate")
workflow.add_edge("load_and_validate", "calculate_deterministic")
workflow.add_edge("calculate_deterministic", "enhance_with_llm")
workflow.add_edge("enhance_with_llm", "merge_and_serialize")
workflow.add_edge("merge_and_serialize", END)

# Compile graph
graph = workflow.compile()

print("✅ LangGraph compiled")
print("   4 nodes: load_and_validate → calculate_deterministic → enhance_with_llm → merge_and_serialize")

NameError: name 'merge_and_serialize' is not defined

In [None]:
# Display results summary
print("📋 Tool 3 - Quality Validator Results")
print("="*60)

if "final_report" in final_state:
    report = final_state["final_report"]

    print("\n📊 Quality Metrics:")
    print(f"   Total Entities: {report.metrics.total_entities}")
    print(f"   Average Articulation Score: {report.metrics.avg_articulation_score:.1f}/100")
    print(f"   Entities with Issues: {report.metrics.entities_with_issues}")
    print(f"   P0 Blockers: {report.metrics.p0_blockers}")
    print(f"   Coverage: {report.metrics.coverage*100:.1f}%")

    print(f"\n🚨 Risk Level: {report.risk_level.upper()}")
    print(f"   Rationale: {report.risk_rationale}")

    print("\n📝 P0 Recommendations (Critical):")
    p0_recs = [r for r in report.recommendations if r.priority == "P0"]
    if p0_recs:
        for i, rec in enumerate(p0_recs[:5], 1):  # Show first 5
            print(f"   {i}. [{rec.issue_type}] {rec.description}")
            print(f"      Action: {rec.action}")
            print(f"      Impact: {rec.estimated_impact}")
            if rec.entity_id:
                print(f"      Entity: {rec.entity_id}")
    else:
        print("   ✅ No P0 blockers detected")

    print("\n📝 P1 Recommendations (Important):")
    p1_recs = [r for r in report.recommendations if r.priority == "P1"]
    if p1_recs:
        for i, rec in enumerate(p1_recs[:3], 1):  # Show first 3
            print(f"   {i}. [{rec.issue_type}] {rec.description}")
    else:
        print("   ✅ No P1 issues detected")

    if report.anomaly_notes:
        print("\n⚠️  Anomalies Detected:")
        for anomaly in report.anomaly_notes[:3]:  # Show first 3
            print(f"   - [{anomaly.severity.upper()}] {anomaly.anomaly_type}: {anomaly.explanation}")

    print("\n📋 Executive Summary:")
    print(f"   {report.summary}")

    print("\n🔍 Score Distribution:")
    scores = list(report.articulation_scores.values())
    if scores:
        print(f"   Min: {min(scores)}, Max: {max(scores)}, Median: {sorted(scores)[len(scores)//2]}")

        # Simple histogram
        ranges = [(0, 20, "0-20"), (20, 40, "20-40"), (40, 60, "40-60"), (60, 80, "60-80"), (80, 100, "80-100")]
        for min_s, max_s, label in ranges:
            count = sum(1 for s in scores if min_s <= s < max_s or (s == 100 and max_s == 100))
            bar = "█" * count
            print(f"   {label}: {bar} ({count})")

    if report.missing_from_source:
        print(f"\n❌ Missing Entities: {', '.join(report.missing_from_source)}")

else:
    print("⚠️  No results - execute previous cell first")

---

## Development Status

### ✅ What Works
- [x] All 6 schemas defined with Field descriptions: 5 Pydantic models plus the Tool3State TypedDict (compliance ✓)
- [x] Node 1 (Load & Validate): Reads structure.json + business context + metadata
- [x] Node 2 (Calculate Deterministic): Articulation scoring (P0/P1/P2 weights), validation status (pass/warning/fail), missing entity detection
- [x] Node 3 (Enhance with LLM): AzureChatOpenAI + ToolStrategy(LLMEnhancement) with fallback strategy
- [ ] Node 4 (Merge & Serialize): Builds QualityReport, saves JSON outputs to data/tool3/ and scrum/artifacts/ (not yet defined in this notebook)
- [ ] LangGraph StateGraph compilation (4 nodes, linear edges START→END): currently raises a NameError until Node 4 is defined
- [x] Status/TODO/IDEA/BUG pattern in all markdown cells (Variant 2)

### ⚠️ Known Issues
- **Missing input files**: Needs structure.json (from Tool 2), a business context JSON file (data/tool0_samples/*.json), and BA-BS_Datamarts_metadata.json
- **Azure credentials**: Requires valid AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT_NAME in .env
- **LLM fallback**: If Node 3 times out (>30s), uses generic recommendations - test robustness
- **Hallucination risk**: Entity ID validation implemented but needs testing with real data
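
The entity ID check can be exercised without an LLM call. The following is a minimal sketch, assuming recommendations and anomaly notes are represented as plain dicts (the notebook uses Pydantic models) and using a hypothetical helper name `scrub_entity_ids`. Unlike the check in Node 3, which only covers recommendations, this version also drops anomaly notes that reference unknown entities, since `AnomalyNote.entity_id` is a required string and cannot be set to null.

```python
def scrub_entity_ids(recommendations, anomaly_notes, valid_ids):
    """Null out unknown IDs in recommendations; drop anomaly notes with unknown IDs."""
    hallucinated = []
    clean_recs = []
    for rec in recommendations:
        entity_id = rec.get("entity_id")
        if entity_id is not None and entity_id not in valid_ids:
            hallucinated.append(entity_id)
            rec = {**rec, "entity_id": None}  # keep the advice, make it project-wide
        clean_recs.append(rec)

    clean_notes = []
    for note in anomaly_notes:
        if note["entity_id"] in valid_ids:
            clean_notes.append(note)
        else:
            hallucinated.append(note["entity_id"])
    return clean_recs, clean_notes, hallucinated

# Made-up test data: two known entities, two invented ones.
valid_ids = {"dimv_supplier", "factv_purchase_order"}
recs = [
    {"priority": "P0", "entity_id": "dimv_supplier"},
    {"priority": "P1", "entity_id": "dimv_customer"},  # not in valid_ids
    {"priority": "P2", "entity_id": None},             # project-wide, left as-is
]
notes = [{"entity_id": "factv_sales", "anomaly_type": "NAME_MISMATCH"}]

clean_recs, clean_notes, hallucinated = scrub_entity_ids(recs, notes, valid_ids)
print(hallucinated)  # ['dimv_customer', 'factv_sales']
```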

### 🔄 Next Session
1. **Execute notebook**: Run all cells with real data (copy structure.json from Tool 2)
2. **Validate outputs**: Check quality_report.json against schema, review scrum/artifacts/
3. **Test fallback mode**: Simulate LLM timeout to verify generic recommendations work
4. **Alignment check**: Compare articulation scores with DQ audit baseline (28.1/100 target)
5. **Update story**: Set `skill_created: true` in scrum/backlog/tool3-quality-validator.md
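
Fallback mode can be tested by substituting a call that always fails for the LLM. The sketch below isolates the try/except pattern used in Node 3; `run_with_fallback`, `failing_llm`, and `generic_fallback` are illustrative stand-ins, not functions from this notebook.

```python
def run_with_fallback(llm_call, fallback_call):
    """Return (result, fallback_used); any exception from llm_call triggers the fallback."""
    try:
        return llm_call(), False
    except Exception as exc:  # broad on purpose: timeouts, auth errors, parse errors
        print(f"LLM call failed: {exc}")
        return fallback_call(), True

def failing_llm():
    # Stand-in for an agent invocation that never completes in time.
    raise TimeoutError("simulated timeout after 30s")

def generic_fallback():
    # Stand-in for generate_fallback_enhancements(...).
    return {"risk_level": "medium", "recommendations": []}

result, fallback_used = run_with_fallback(failing_llm, generic_fallback)
print(fallback_used, result["risk_level"])  # True medium
```

The same idea applies to `enhance_with_llm`: replace the agent invocation with a stub that raises, then confirm that the returned state has `llm_fallback_mode` set to `True`.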

### 💡 Ideas for v2
- **Score histogram visualization**: Matplotlib chart for score distribution (0-20, 20-40, etc.)
- **Governance export**: CSV format for audit trail (entity_id, score, status, recommendations)
- **Batch processing**: Support multiple business requests in single run
- **Delta reporting**: Compare quality_report.json between pipeline runs (regression detection)
- **Confidence scores**: Add LLM confidence to each recommendation (0-1 scale)

### 📝 Documentation Pattern
Follows Tool 2 structure exactly:
- Header with architecture overview
- Install, imports, Azure config
- Schemas definition (5 Pydantic models plus the Tool3State TypedDict)
- 4 nodes (Load → Calculate → Enhance → Serialize)
- LangGraph compilation
- Execute pipeline + Results summary
- Development status (this cell)

### 🎯 Acceptance Criteria Checklist

**From `scrum/backlog/tool3-quality-validator.md`:**

1. **Input Processing:**
   - [ ] Load `structure.json` (Tool 2 output)
   - [ ] Load business context
   - [ ] Load metadata

2. **Deterministic Quality Checks:**
   - [ ] Articulation scoring: P0 (40pts), P1 (30pts), P2 (30pts)
   - [ ] Validation status: pass (all P0+P1 present), warning (P1 missing), fail (P0 missing)
   - [ ] Missing entity detection (source vs. structure)

3. **LLM Enhancement:**
   - [ ] Generate recommendations via ToolStrategy(LLMEnhancement)
   - [ ] Include entity_id, priority (P0/P1/P2), issue_type, description, action, estimated_impact
   - [ ] Hallucination mitigation: Validate entity IDs against structure
   - [ ] Fallback strategy: Timeout >30s → generic recommendations

4. **Report Generation:**
   - [ ] QualityReport schema: metrics (total, avg_score, issues, p0_blockers, coverage), risk_level, articulation_scores (dict), recommendations (list), anomaly_notes (list), missing_from_source (list), summary
   - [ ] Save to `data/tool3/quality_report.json`
   - [ ] Save audit artifacts to `scrum/artifacts/YYYY-MM-DD_tool3-quality-summary.json`

5. **Alignment:**
   - [ ] Baseline: 28.1/100 from DQ audit (`scrum/artifacts/2025-10-31_datamarts-dq-audit.md`)
   - [ ] Risk levels: <40 = CRITICAL, 40-60 = HIGH, 60-80 = MEDIUM, >80 = LOW
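
The validation-status and risk-level rules in the checklist can be sketched as two small pure functions, which could serve as a deterministic cross-check next to the LLM-provided risk level. The function names are hypothetical, and the boundary handling is an assumption: the checklist fixes `<40` and `>80` but leaves a score of exactly 60 ambiguous, so this sketch places 60 in MEDIUM.

```python
def validation_status(missing_p0: bool, missing_p1: bool) -> str:
    """Map missing-priority flags to pass / warning / fail."""
    if missing_p0:
        return "fail"      # a missing P0 item fails, even if P1 is also missing
    if missing_p1:
        return "warning"
    return "pass"


def risk_level(avg_score: float) -> str:
    """Map an average score (0-100) to a risk level.

    Assumption: 40 counts as HIGH, 60 and 80 count as MEDIUM.
    """
    if avg_score < 40:
        return "CRITICAL"
    if avg_score < 60:
        return "HIGH"
    if avg_score <= 80:
        return "MEDIUM"
    return "LOW"


print(validation_status(missing_p0=False, missing_p1=True))
print(risk_level(28.1))  # the DQ audit baseline from the checklist
```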

### 📂 Output Files
```
data/tool3/
├── quality_report.json        # Full QualityReport schema
scrum/artifacts/
├── YYYY-MM-DD_tool3-quality-summary.json  # Audit trail
```

---

**Next Step:** Execute all cells with real data from Tool 2 output.

## 8. Results Summary

**Status:** ⏳ Pending execution

**TODO:**
- [ ] Display P0 recommendations
- [ ] Show score distribution
- [ ] Validate against acceptance criteria

**IDEA:**
- Create score distribution histogram
- Export to CSV for governance reporting

**BUG:**
- None known yet
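
Before adding a plotting dependency, the score distribution could be shown as a plain-text histogram. This is a minimal sketch that assumes scores lie in the 0-100 range and uses 20-point buckets; `score_distribution` is a hypothetical helper and the scores are made up.

```python
def score_distribution(scores, bucket_size=20):
    """Count scores (0-100) per bucket; a score of 100 goes in the last bucket."""
    n_buckets = 100 // bucket_size
    counts = [0] * n_buckets
    for score in scores.values():
        index = min(int(score // bucket_size), n_buckets - 1)
        counts[index] += 1
    return {
        f"{i * bucket_size}-{(i + 1) * bucket_size}": count
        for i, count in enumerate(counts)
    }


# Illustrative scores only.
example_scores = {"a": 10.0, "b": 35.0, "c": 38.5, "d": 100.0}
for bucket, count in score_distribution(example_scores).items():
    print(f"{bucket:>7} | {'#' * count} ({count})")
```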

In [None]:
# Execute the graph
print("🚀 Starting Tool 3 pipeline...")
print("="*60)

start_time = datetime.now()

# Initial state (empty - nodes will populate)
initial_state = {}

# Run the graph
final_state = graph.invoke(initial_state)

end_time = datetime.now()
duration = (end_time - start_time).total_seconds()

print("="*60)
print(f"✅ Pipeline completed in {duration:.2f}s")
print("📊 Final metrics:")
print(f"   - Total entities: {final_state['final_report'].metrics.total_entities}")
print(f"   - Avg score: {final_state['final_report'].metrics.avg_articulation_score:.1f}/100")
print(f"   - Entities with issues: {final_state['final_report'].metrics.entities_with_issues}")
print(f"   - P0 blockers: {final_state['final_report'].metrics.p0_blockers}")
print(f"   - Coverage: {final_state['final_report'].metrics.coverage*100:.1f}%")
print(f"   - Risk level: {final_state['final_report'].risk_level.upper()}")

if final_state.get('llm_fallback_mode'):
    print("⚠️  FALLBACK MODE: LLM enhancement unavailable, generic recommendations used")

## 7. Execute Pipeline

**Status:** ⏳ Ready to test | Run all cells above first

**TODO:**
- [ ] Execute and validate outputs
- [ ] Check quality_report.json schema
- [ ] Review audit artifacts

**IDEA:**
- Add timer for each node execution
- Compare results with expected output from story

**BUG:**
- None known yet
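
The per-node timer idea could be implemented as a decorator around each node function. The sketch below assumes nodes are plain callables that take and return a state dict; `timed_node` and the module-level `node_timings` dict are hypothetical names, and `demo_node` stands in for a real node.

```python
import time
from functools import wraps

node_timings = {}  # hypothetical store: node name -> seconds of last run


def timed_node(func):
    """Wrap a node function and record its wall-clock duration."""
    @wraps(func)
    def wrapper(state):
        start = time.perf_counter()
        try:
            return func(state)
        finally:
            # Recorded even if the node raises, so failures are timed too.
            node_timings[func.__name__] = time.perf_counter() - start
    return wrapper


@timed_node
def demo_node(state):
    # Stand-in for a real node such as merge_and_serialize.
    return {**state, "done": True}


result = demo_node({})
print(f"demo_node took {node_timings['demo_node']:.6f}s")
```

Keeping timings outside the graph state avoids changing `Tool3State` just for diagnostics.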

## 6. Build LangGraph

**Status:** ✅ Working | 4-node pipeline with START→END flow

**TODO:**
- [ ] Add conditional edges (e.g., skip LLM if deterministic results are clean)
- [ ] Add error handling nodes

**IDEA:**
- Add progress callbacks for each node
- Implement partial state checkpointing (resume from Node 3 if LLM fails)

**BUG:**
- None known yet
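
For the conditional-edge TODO, the routing decision can be kept in a pure function that LangGraph calls after the deterministic node. This is a sketch under assumptions: the node names `"calculate"`, `"enhance"` and `"serialize"` and the variable `builder` are placeholders, not the names used when this graph is actually compiled.

```python
def route_after_deterministic(state: dict) -> str:
    """Return the key of the next step based on deterministic results."""
    flags = state.get("validation_flags", {})
    missing = state.get("missing_entities", [])
    has_issues = bool(missing) or any(
        status in ("warning", "fail") for status in flags.values()
    )
    return "enhance" if has_issues else "serialize"


# Possible wiring with StateGraph.add_conditional_edges (names are assumed):
# builder.add_conditional_edges(
#     "calculate",
#     route_after_deterministic,
#     {"enhance": "enhance", "serialize": "serialize"},
# )

print(route_after_deterministic({"validation_flags": {"fact_sales": "fail"}}))
print(route_after_deterministic({"validation_flags": {"dim_date": "pass"}}))
```

Note that `merge_and_serialize` reads `state["llm_enhancements"]`, so skipping the LLM node would also require placing a default enhancement object in the state.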

In [None]:
def merge_and_serialize(state: Tool3State) -> Tool3State:
    """
    Node 4: Merge deterministic + LLM results and save outputs.

    Outputs:
    - data/tool3/quality_report.json (main output)
    - scrum/artifacts/YYYY-MM-DD_tool3-quality-summary.json (audit log)
    """
    print("🔄 Node 4: Merging and serializing...")

    # Calculate final metrics
    total_entities = len(state["entity_scores"])
    avg_score = sum(state["entity_scores"].values()) / total_entities if total_entities else 0.0
    entities_with_issues = sum(1 for v in state["validation_flags"].values() if v in ["warning", "fail"])
    p0_blockers = len([r for r in state["llm_enhancements"].recommendations if r.priority == "P0"])

    # Coverage: (total - missing) / total, clamped at 0.0 in case more entities
    # are missing from the structure than were scored
    coverage = max(0.0, 1.0 - (len(state["missing_entities"]) / total_entities if total_entities else 0.0))

    metrics = QualityMetrics(
        total_entities=total_entities,
        avg_articulation_score=avg_score,
        entities_with_issues=entities_with_issues,
        p0_blockers=p0_blockers,
        coverage=coverage
    )

    # Build final report
    final_report = QualityReport(
        timestamp=datetime.now().isoformat(),
        source_files={
            "structure": "../data/tool2/structure.json",
            "business_context": "Most recent from ../data/tool0_samples/"
        },
        articulation_scores=state["entity_scores"],
        validation_results=state["validation_flags"],
        missing_from_source=state["missing_entities"],
        risk_level=state["llm_enhancements"].risk_level,
        risk_rationale=state["llm_enhancements"].risk_rationale,
        recommendations=state["llm_enhancements"].recommendations,
        anomaly_notes=state["llm_enhancements"].anomaly_notes,
        summary=state["llm_enhancements"].summary,
        metrics=metrics
    )

    # Save main quality_report.json
    output_dir = Path("../data/tool3")
    output_dir.mkdir(parents=True, exist_ok=True)

    report_path = output_dir / "quality_report.json"
    with open(report_path, "w", encoding="utf-8") as f:
        json.dump(final_report.model_dump(), f, indent=2, ensure_ascii=False)

    # Save audit summary
    artifacts_dir = Path("../scrum/artifacts")
    artifacts_dir.mkdir(parents=True, exist_ok=True)

    date_prefix = datetime.now().strftime("%Y-%m-%d")
    summary_path = artifacts_dir / f"{date_prefix}_tool3-quality-summary.json"

    summary = {
        "timestamp": final_report.timestamp,
        "metrics": metrics.model_dump(),
        "risk_level": final_report.risk_level,
        "p0_recommendations": [r.model_dump() for r in final_report.recommendations if r.priority == "P0"],
        "llm_fallback_mode": state.get("llm_fallback_mode", False),
        "source_files": final_report.source_files
    }

    with open(summary_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2, ensure_ascii=False)

    print(f"✅ Saved quality_report.json: {report_path}")
    print(f"✅ Saved audit summary: {summary_path}")

    if state.get("llm_fallback_mode"):
        print("⚠️  Report generated in FALLBACK mode (LLM unavailable)")

    return {
        **state,
        "final_report": final_report,
        "output_path": str(report_path)
    }

print("✅ Node 4 (merge_and_serialize) defined")

## 5. Node 4: Merge & Serialize

**Status:** ✅ Working | Consolidates deterministic + LLM results into QualityReport

**TODO:**
- [ ] Add validation against QualityReport schema before saving
- [ ] Implement diff comparison if previous quality_report.json exists

**IDEA:**
- Generate human-readable markdown summary
- Add score distribution histogram

**BUG:**
- None known yet
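
The diff-comparison TODO could start from a comparison of the `articulation_scores` dicts of two reports. The sketch below is one possible shape, with a hypothetical helper name and made-up scores; loading the previous `quality_report.json` is left to the caller.

```python
def diff_scores(previous: dict, current: dict, tolerance: float = 0.0) -> dict:
    """Compare two articulation_scores dicts and classify each entity."""
    result = {"regressed": {}, "improved": {}, "added": [], "removed": []}
    for entity_id, new_score in current.items():
        if entity_id not in previous:
            result["added"].append(entity_id)
            continue
        delta = new_score - previous[entity_id]
        if delta < -tolerance:
            result["regressed"][entity_id] = delta
        elif delta > tolerance:
            result["improved"][entity_id] = delta
    result["removed"] = sorted(set(previous) - set(current))
    return result


# Illustrative scores only.
previous_scores = {"fact_sales": 50.0, "dim_date": 70.0, "dim_old": 20.0}
current_scores = {"fact_sales": 40.0, "dim_date": 75.0, "dim_new": 10.0}
report_diff = diff_scores(previous_scores, current_scores)
print(report_diff)
```

A non-zero `tolerance` would suppress noise from small score changes between runs.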

## 4. Node 3: Enhance with LLM

**Status:** ✅ Working | LLM agent generates risk assessment + recommendations with fallback

**TODO:**
- [ ] Implement retry logic with exponential backoff (3 attempts)
- [ ] Add hallucination detection for entity IDs

**IDEA:**
- Cache LLM responses for identical deterministic inputs (cost optimization)
- Add temperature parameter if supported by future models

**BUG:**
- Current Azure model (gpt-5-mini) doesn't support temperature parameter
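
Hallucination detection for entity IDs can be as simple as a membership check against the IDs loaded from the structure. The sketch below assumes dict-shaped recommendations with an `entity_id` key; the helper name and sample IDs are hypothetical.

```python
def filter_recommendations(recommendations, known_entity_ids):
    """Split recommendations into (kept, rejected) by entity ID membership."""
    known = set(known_entity_ids)
    kept, rejected = [], []
    for rec in recommendations:
        if rec.get("entity_id") in known:
            kept.append(rec)
        else:
            rejected.append(rec)
    return kept, rejected


# Illustrative data only.
known_ids = ["fact_sales", "dim_customer"]
llm_output = [
    {"entity_id": "fact_sales", "priority": "P0"},
    {"entity_id": "dim_invented", "priority": "P1"},  # not in the structure
]
kept, rejected = filter_recommendations(llm_output, known_ids)
print(f"kept={len(kept)} rejected={len(rejected)}")
```

Returning the rejected items rather than silently dropping them makes it possible to log them, for example in the report's anomaly notes.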

## 3. Node 2: Calculate Deterministic

**Status:** ✅ Working | Python heuristics for articulation score + validation flags

**TODO:**
- [ ] Implement actual metadata lookup (currently placeholder scores)
- [ ] Add more heuristics (e.g., naming convention checks)

**IDEA:**
- Pre-compute scores in parallel (multiprocessing for large datasets)
- Add configurable weights for scoring components

**BUG:**
- None known yet
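
Configurable weights could be passed as a dict that defaults to a 40/30/30 split across P0, P1 and P2. How each tier is measured is an assumption here: the sketch takes a completeness fraction per tier (0.0 to 1.0), which may differ from the heuristics the node ends up using.

```python
DEFAULT_WEIGHTS = {"P0": 40.0, "P1": 30.0, "P2": 30.0}


def articulation_score(completeness, weights=None):
    """Weighted score in the range 0-100.

    completeness maps a tier ("P0", "P1", "P2") to the fraction of that
    tier's checks that passed (0.0-1.0). Missing tiers count as 0.
    """
    weights = weights or DEFAULT_WEIGHTS
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = sum(
        weight * min(max(completeness.get(tier, 0.0), 0.0), 1.0)
        for tier, weight in weights.items()
    )
    # Normalise so custom weights do not need to add up to 100.
    return 100.0 * raw / total_weight


print(articulation_score({"P0": 1.0, "P1": 0.5}))
```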

## 2. Node 1: Load & Validate

**Status:** ✅ Working | Loads structure.json + business_context + metadata

**TODO:**
- [ ] Add schema validation for StructuralAnalysis format
- [ ] Implement input sanitization (remove nulls, normalize IDs)

**IDEA:**
- Cache metadata in memory for repeated runs
- Add file existence checks with clear error messages

**BUG:**
- None known yet
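
File existence checks with clear error messages might look like the sketch below. `load_json_checked` is a hypothetical helper, and the demo uses a temporary directory instead of the real input paths.

```python
import json
import tempfile
from pathlib import Path


def load_json_checked(path, label):
    """Load a JSON file, raising errors that name the input and its path."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{label} not found at '{path}' (cwd: {Path.cwd()})")
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{label} at '{path}' is not valid JSON: {exc}") from exc


# Demo against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "structure.json"
    sample.write_text('{"facts": []}', encoding="utf-8")
    loaded = load_json_checked(sample, "Tool 2 structure")
    try:
        load_json_checked(Path(tmp) / "missing.json", "Business context")
    except FileNotFoundError as exc:
        missing_error = str(exc)

print(loaded)
print(missing_error)
```

Including the current working directory in the message helps when the notebook is launched from a different folder than the relative paths expect.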

## 1. Define Schemas & State

**Status:** ✅ Working | Pydantic v2 models with Field descriptions

**TODO:**
- [ ] Add field_validator for timestamp ISO8601 validation
- [ ] Consider adding confidence thresholds for filtering recommendations

**IDEA:**
- Schema versioning field (schema_version: "1.0.0") for backward compatibility
- Add validation rules registry (extensible P0-P2 criteria)

**BUG:**
- None known yet
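
The timestamp check could be written as a plain function first and then called from a Pydantic `field_validator` on the `timestamp` field. The sketch below covers only the plain function and relies on `datetime.fromisoformat`, which accepts what `datetime.now().isoformat()` produces but, depending on the Python version, may reject other ISO 8601 variants such as a trailing `Z`.

```python
from datetime import datetime


def validate_iso8601(value: str) -> str:
    """Return value unchanged if datetime.fromisoformat can parse it."""
    try:
        datetime.fromisoformat(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"timestamp is not ISO 8601: {value!r}") from exc
    return value


print(validate_iso8601("2025-10-31T12:00:00"))

try:
    validate_iso8601("31/10/2025")
except ValueError as exc:
    print(exc)
```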

In [None]:
# Install required packages (run once)
# !pip install langgraph langchain langchain-openai pydantic python-dotenv