
Data loading is the right place to start a review, because it is where many “AI agents” quietly become unreliable.

---

## Why This Matters to Executives

This module quietly enforces some non-negotiable properties leaders care about:

* **Reliability** — the agent does not reason over bad data
* **Predictability** — data requirements are explicit
* **Auditability** — inputs and relationships are traceable
* **Control** — failures are visible, not hidden
* **Scalability** — new datasets can be added without destabilizing the system

Many AI agents reason first and validate later, if at all.

This agent validates first, *then* reasons.

---

## Data Loading Utilities – Making the Agent Reliable Before It Thinks

These data loading utilities are not just about reading JSON files.
They establish the **trust boundary** for the entire Integration & Risk Management Orchestrator.

Before the agent evaluates risk, prioritizes issues, or produces executive reports, it must first answer a more fundamental question:

**“Can I trust the data I’m reasoning over?”**

This module ensures the agent proceeds only when the answer is *yes*: if validation fails, loading stops with an error.

---

## What This Code Does in Practice

At a high level, this utility performs three critical functions:

1. **Loads all operational data used by the agent**
2. **Validates structure and required fields before use**
3. **Builds deterministic lookup tables for explainable reasoning**

As a result, every downstream decision starts from inputs that have passed **structural validation**, not from assumptions.

---

## Defensive Data Loading: Fail Fast, Not Quietly

Each `load_*` function follows the same disciplined pattern:

* Construct the file path explicitly
* Validate the JSON structure and required fields
* Raise a hard error if validation fails

This is a deliberate design choice.

Instead of allowing partial, malformed, or inconsistent data to propagate into analysis logic, the agent **fails immediately and visibly**.

For executives and operators, this means:

* No silent data corruption
* No misleading reports
* No false sense of system health

If the agent produces output, leadership can trust that the underlying data met minimum quality standards.
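
The difference between failing quietly and failing fast is easiest to see side by side. The sketch below is illustrative only: `load_quietly` and `load_fail_fast` are hypothetical functions operating on in-memory sample records, not code from this module, and the single `status` check stands in for the fuller validation the real loaders delegate to `validate_json_file`.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Invented sample data: the second record is malformed.
records: List[Record] = [
    {"agent_id": "A1", "status": "active"},
    {"agent_id": "A2"},  # "status" is missing
]


def load_quietly(items: List[Record]) -> List[Record]:
    """Anti-pattern: drop invalid records and carry on as if nothing happened."""
    return [item for item in items if "status" in item]


def load_fail_fast(items: List[Record]) -> List[Record]:
    """Fail-fast pattern: one invalid record stops the load with a visible error."""
    errors = [
        f"item {index}: missing 'status'"
        for index, item in enumerate(items)
        if "status" not in item
    ]
    if errors:
        raise ValueError(f"Validation errors loading agents: {errors}")
    return items


# The quiet loader returns one record; agent A2 has vanished without a trace.
quiet_result = load_quietly(records)
print(len(quiet_result))

# The fail-fast loader refuses to return anything and names the bad record.
try:
    load_fail_fast(records)
    failure = None
except ValueError as exc:
    failure = str(exc)
print(failure)
```

With the quiet loader, a report built on the result would simply omit agent `A2`, and nobody would know. With the fail-fast loader, the run stops and the error message points at the offending record.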

---

## Explicit Validation Over Implicit Assumptions

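Each loader passes three arguments to `validate_json_file`, which together enforce:

* The expected top-level data type (`expected_type=list`)
* The expected type of each record (`item_type=dict`)
* The required fields for each record (`required_fields`)

Every dataset goes through the same checks. Note that the loaders validate each file on its own; they do not verify that an `agent_id` in one file also exists in another.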

This eliminates a common failure mode in AI systems:
*logic that assumes data is “probably fine.”*

Here, assumptions are replaced with **explicit contracts**.

That contract-first mindset is what makes the system auditable and safe to scale.
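
The internals of `validate_json_file` are not shown in this notebook, but the loaders reveal its contract: it accepts `expected_type`, `item_type`, and `required_fields`, and returns a `(data, errors)` pair. The sketch below illustrates that contract on in-memory data. `validate_records` is a hypothetical name, it assumes a list-like top-level container, and the real function may check more or word its errors differently.

```python
from typing import Any, List, Optional, Tuple


def validate_records(
    data: Any,
    expected_type: type = list,
    item_type: type = dict,
    required_fields: Optional[List[str]] = None,
) -> Tuple[Any, List[str]]:
    """Hypothetical validator: collect every violation rather than stop at the first."""
    errors: List[str] = []
    if not isinstance(data, expected_type):
        errors.append(
            f"top level: expected {expected_type.__name__}, got {type(data).__name__}"
        )
        return data, errors  # item checks are meaningless if the container is wrong

    for index, item in enumerate(data):
        if not isinstance(item, item_type):
            errors.append(
                f"item {index}: expected {item_type.__name__}, got {type(item).__name__}"
            )
            continue
        for field in required_fields or []:
            if field not in item:
                errors.append(f"item {index}: missing required field '{field}'")
    return data, errors


# Invented sample data with two deliberate contract violations.
workflows = [
    {"workflow_id": "W1", "agent_id": "A1", "failure_rate_7d": 0.02},
    {"workflow_id": "W2", "agent_id": "A1"},  # missing failure_rate_7d
    "not-a-record",                            # wrong item type
]
_, errors = validate_records(
    workflows, required_fields=["workflow_id", "agent_id", "failure_rate_7d"]
)
for message in errors:
    print(message)
```

Returning a list of errors instead of raising inside the validator lets the caller decide what to do with them. The loaders in this module choose the strict option and raise `ValueError` whenever the list is non-empty.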

---

## Clear Separation of Data Domains

Each loader function is scoped to a single responsibility:

* Agent inventory
* System integrations
* Workflows
* Risk signals
* KPIs and cost metrics
* Historical snapshots
* Governance and review history
* Expected vs actual value

This separation makes the system:

* Easier to reason about
* Easier to extend
* Easier to debug when something changes

It also mirrors how real organizations manage operational data across teams.
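
One way to see the domain boundaries at a glance is to collect each loader's required fields into a single table. The field lists below are copied from the `load_*` functions in this notebook; the `REQUIRED_FIELDS` dictionary, its key names, and the `missing_fields` helper are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

# Required fields per data domain, as declared in the load_* functions.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "agents": ["agent_id", "name", "status", "criticality"],
    "systems": ["system_id", "type", "uptime_30d", "latency_ms_p95", "auth_status"],
    "workflows": ["workflow_id", "agent_id", "failure_rate_7d"],
    "risks": ["risk_event_id", "risk_id", "agent_id", "risk_type", "severity", "status"],
    "kpis": ["agent_id", "kpis"],
    "snapshots": [
        "snapshot_date", "agent_id", "integration_score",
        "risk_score", "value_leakage_score",
    ],
    "reviews": ["review_id", "agent_id", "review_date", "review_type", "review_outcome"],
    "expected_vs_actual": ["period_start", "period_end", "agent_id", "expected", "actual"],
}


def missing_fields(domain: str, record: Dict[str, Any]) -> List[str]:
    """Return the required fields of `domain` that `record` lacks, in declared order."""
    return [field for field in REQUIRED_FIELDS[domain] if field not in record]


# A systems record with only two of its five required fields.
print(missing_fields("systems", {"system_id": "S1", "type": "crm"}))
# ['uptime_30d', 'latency_ms_p95', 'auth_status']

# Which domains are not keyed by agent_id?
not_agent_keyed = [name for name, fields in REQUIRED_FIELDS.items() if "agent_id" not in fields]
print(not_agent_keyed)
# ['systems']
```

The second query shows why the domains compose so easily: every dataset except system integrations carries an `agent_id`, which is the key the lookups are later built on.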

---

## Lookup Construction: Deterministic Reasoning at Scale

The `build_lookups` function is one of the most important pieces in this module.

Rather than repeatedly scanning raw lists, the agent builds **explicit lookup tables**:

* Agent → metadata
* System → metadata
* Agent → workflows
* Agent → risks
* Agent → KPIs
* Agent → historical snapshots
* Agent → governance reviews
* Agent → expected vs actual value

This does more than improve performance.

It ensures that:

* Relationships between entities are explicit
* Joins are deterministic
* Reasoning paths are inspectable

When the agent later explains *why* something was flagged or prioritized, those explanations trace cleanly back through these lookups.
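
The grouping step and the way an explanation traces back through it can be sketched in a few lines. This is a minimal sketch, not the module's code: `group_by_agent` and `explain_flag` are hypothetical helpers, `group_by_agent` uses `collections.defaultdict` where `build_lookups` uses explicit membership checks, and the sample records and the 0.05 threshold are invented for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def group_by_agent(records: List[Record]) -> Dict[str, List[Record]]:
    """Group records by agent_id, preserving input order within each group."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        grouped[record["agent_id"]].append(record)
    # Return a plain dict so later reads cannot silently create empty entries.
    return dict(grouped)


def explain_flag(
    agent_id: str,
    agents_lookup: Dict[str, Record],
    workflows_lookup: Dict[str, List[Record]],
    threshold: float = 0.05,  # illustrative value, not taken from the module
) -> List[str]:
    """Return reasons for flagging an agent, each naming the record behind it."""
    agent = agents_lookup[agent_id]
    reasons = []
    for workflow in workflows_lookup.get(agent_id, []):
        rate = workflow["failure_rate_7d"]
        if rate > threshold:
            reasons.append(
                f"{agent['name']}: workflow {workflow['workflow_id']} "
                f"has failure_rate_7d={rate}, above threshold {threshold}"
            )
    return reasons


agents = [
    {"agent_id": "A1", "name": "Invoice Bot"},
    {"agent_id": "A2", "name": "Triage Bot"},
]
workflows = [
    {"workflow_id": "W1", "agent_id": "A1", "failure_rate_7d": 0.12},
    {"workflow_id": "W2", "agent_id": "A1", "failure_rate_7d": 0.01},
    {"workflow_id": "W3", "agent_id": "A2", "failure_rate_7d": 0.02},
]

agents_lookup = {agent["agent_id"]: agent for agent in agents}
workflows_lookup = group_by_agent(workflows)
reasons = explain_flag("A1", agents_lookup, workflows_lookup)
print(reasons)
```

Each reason names the agent, the workflow, and the exact field value that triggered it, so a reviewer can walk from the explanation back to a single input record.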

---

## Versioned Expansion Without Breaking the Core

Notice how the v2 features (historical snapshots, ownership reviews, and expected vs actual value) are handled as **optional extensions**: in `build_lookups` they are `Optional` parameters that default to `None`.

This allows the agent to:

* Run in a simpler MVP mode
* Gradually introduce historical and governance reasoning
* Avoid brittle dependencies between features

Nothing in `build_lookups` assumes the v2 data is present: the corresponding lookup keys are added to the result only when that data is supplied.

That makes the system robust during iteration and safer in production-like environments.
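
Because `build_lookups` adds the v2 keys only when the corresponding data is supplied, downstream code has to read them defensively. The sketch below assumes a `lookups` dictionary shaped like the one `build_lookups` returns. `latest_snapshot` is a hypothetical helper, the sample values are invented, and it further assumes `snapshot_date` holds ISO 8601 date strings, which sort chronologically as plain text.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Any]


def latest_snapshot(lookups: Dict[str, Any], agent_id: str) -> Optional[Record]:
    """Return the most recent snapshot for an agent, or None in MVP mode."""
    # The key is absent when no snapshots were passed to build_lookups.
    snapshots_lookup = lookups.get("historical_snapshots_lookup", {})
    snapshots = snapshots_lookup.get(agent_id, [])
    if not snapshots:
        return None
    # Assumes ISO 8601 date strings, so string comparison matches date order.
    return max(snapshots, key=lambda snapshot: snapshot["snapshot_date"])


# MVP mode: only core lookups exist.
mvp_lookups = {"agents_lookup": {"A1": {"agent_id": "A1", "name": "Invoice Bot"}}}

# v2 mode: the same core lookups plus historical snapshots.
v2_lookups = {
    **mvp_lookups,
    "historical_snapshots_lookup": {
        "A1": [
            {"snapshot_date": "2024-01-01", "agent_id": "A1", "risk_score": 40},
            {"snapshot_date": "2024-02-01", "agent_id": "A1", "risk_score": 55},
        ]
    },
}

print(latest_snapshot(mvp_lookups, "A1"))
print(latest_snapshot(v2_lookups, "A1"))
```

The same function works in both modes without a feature flag. One detail worth knowing: `build_lookups` tests `if snapshots:`, so an empty list is treated the same as `None` and the key is omitted in both cases.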

---

## Architectural Takeaway

This data loading layer is not plumbing.
It is a **governance layer**.

By enforcing validation, explicit structure, and deterministic relationships, it ensures that everything the agent produces downstream (scores, trends, recommendations, and executive reports) rests on inputs that met an explicit contract.

This is how you build AI systems that don’t surprise leadership.




In [None]:
"""Data loading utilities for Integration & Risk Management Orchestrator"""

from pathlib import Path
from typing import Dict, List, Any, Optional
from toolshed.validation import validate_json_file


def load_agents(data_dir: str, agents_file: str) -> List[Dict[str, Any]]:
    """Load agents inventory data"""
    file_path = Path(data_dir) / agents_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["agent_id", "name", "status", "criticality"]
    )
    if errors:
        raise ValueError(f"Validation errors loading agents: {errors}")
    return data


def load_system_integrations(data_dir: str, systems_file: str) -> List[Dict[str, Any]]:
    """Load system integration data"""
    file_path = Path(data_dir) / systems_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["system_id", "type", "uptime_30d", "latency_ms_p95", "auth_status"]
    )
    if errors:
        raise ValueError(f"Validation errors loading systems: {errors}")
    return data


def load_workflows(data_dir: str, workflows_file: str) -> List[Dict[str, Any]]:
    """Load workflow data"""
    file_path = Path(data_dir) / workflows_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["workflow_id", "agent_id", "failure_rate_7d"]
    )
    if errors:
        raise ValueError(f"Validation errors loading workflows: {errors}")
    return data


def load_risk_signals(data_dir: str, risks_file: str) -> List[Dict[str, Any]]:
    """Load risk signal data"""
    file_path = Path(data_dir) / risks_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["risk_event_id", "risk_id", "agent_id", "risk_type", "severity", "status"]
    )
    if errors:
        raise ValueError(f"Validation errors loading risks: {errors}")
    return data


def load_kpis_cost(data_dir: str, kpis_file: str) -> List[Dict[str, Any]]:
    """Load KPI and cost metrics data"""
    file_path = Path(data_dir) / kpis_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["agent_id", "kpis"]
    )
    if errors:
        raise ValueError(f"Validation errors loading KPIs: {errors}")
    return data


def load_historical_snapshots(data_dir: str, snapshots_file: str) -> List[Dict[str, Any]]:
    """Load historical snapshot data (v2)"""
    file_path = Path(data_dir) / snapshots_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["snapshot_date", "agent_id", "integration_score", "risk_score", "value_leakage_score"]
    )
    if errors:
        raise ValueError(f"Validation errors loading snapshots: {errors}")
    return data


def load_ownership_review_history(data_dir: str, reviews_file: str) -> List[Dict[str, Any]]:
    """Load ownership review history data (v2)"""
    file_path = Path(data_dir) / reviews_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["review_id", "agent_id", "review_date", "review_type", "review_outcome"]
    )
    if errors:
        raise ValueError(f"Validation errors loading reviews: {errors}")
    return data


def load_expected_vs_actual_value(data_dir: str, expected_file: str) -> List[Dict[str, Any]]:
    """Load expected vs actual value data (v2)"""
    file_path = Path(data_dir) / expected_file
    data, errors = validate_json_file(
        file_path,
        expected_type=list,
        item_type=dict,
        required_fields=["period_start", "period_end", "agent_id", "expected", "actual"]
    )
    if errors:
        raise ValueError(f"Validation errors loading expected vs actual: {errors}")
    return data


def build_lookups(
    agents: List[Dict[str, Any]],
    systems: List[Dict[str, Any]],
    workflows: List[Dict[str, Any]],
    risks: List[Dict[str, Any]],
    kpis: List[Dict[str, Any]],
    snapshots: Optional[List[Dict[str, Any]]] = None,
    reviews: Optional[List[Dict[str, Any]]] = None,
    expected_vs_actual: Optional[List[Dict[str, Any]]] = None
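) -> Dict[str, Any]:
    """Build lookup dictionaries for fast access.

    agents_lookup, systems_lookup, and kpis_lookup map an ID to a single
    value, so a later record with a duplicate ID overwrites an earlier one.
    The remaining lookups map agent_id to a list of records in input order.
    The v2 lookups are added to the result only when the corresponding
    argument is a non-empty list.
    """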
    agents_lookup = {agent["agent_id"]: agent for agent in agents}
    systems_lookup = {system["system_id"]: system for system in systems}

    workflows_lookup: Dict[str, List[Dict[str, Any]]] = {}
    for workflow in workflows:
        agent_id = workflow["agent_id"]
        if agent_id not in workflows_lookup:
            workflows_lookup[agent_id] = []
        workflows_lookup[agent_id].append(workflow)

    risks_lookup: Dict[str, List[Dict[str, Any]]] = {}
    for risk in risks:
        agent_id = risk["agent_id"]
        if agent_id not in risks_lookup:
            risks_lookup[agent_id] = []
        risks_lookup[agent_id].append(risk)

    kpis_lookup = {kpi["agent_id"]: kpi["kpis"] for kpi in kpis}

    result = {
        "agents_lookup": agents_lookup,
        "systems_lookup": systems_lookup,
        "workflows_lookup": workflows_lookup,
        "risks_lookup": risks_lookup,
        "kpis_lookup": kpis_lookup
    }

    # v2 lookups
    if snapshots:
        snapshots_lookup: Dict[str, List[Dict[str, Any]]] = {}
        for snapshot in snapshots:
            agent_id = snapshot["agent_id"]
            if agent_id not in snapshots_lookup:
                snapshots_lookup[agent_id] = []
            snapshots_lookup[agent_id].append(snapshot)
        result["historical_snapshots_lookup"] = snapshots_lookup

    if reviews:
        reviews_lookup: Dict[str, List[Dict[str, Any]]] = {}
        for review in reviews:
            agent_id = review["agent_id"]
            if agent_id not in reviews_lookup:
                reviews_lookup[agent_id] = []
            reviews_lookup[agent_id].append(review)
        result["ownership_reviews_lookup"] = reviews_lookup

    if expected_vs_actual:
        expected_lookup: Dict[str, List[Dict[str, Any]]] = {}
        for period in expected_vs_actual:
            agent_id = period["agent_id"]
            if agent_id not in expected_lookup:
                expected_lookup[agent_id] = []
            expected_lookup[agent_id].append(period)
        result["expected_vs_actual_lookup"] = expected_lookup

    return result
