
This test suite shows how a risk platform can validate its reasoning layer. It goes beyond “does the function run?”: it checks that **each analytical result the agent might later rely on is grounded in loaded data, structured, and defensible**.



---

# Risk Analysis Tests — Proving the Agent Can Reason Safely

## What This Test Suite Does (In Plain English)

This test file answers a deceptively simple question:

> **“If the agent says a vendor is risky, can we prove *why*?”**

You’re not testing outputs for beauty or cleverness — you’re testing:

* analytical correctness
* structural integrity
* invariants that must always hold
* failure behavior when evidence is missing

This is what separates **risk intelligence** from **risk guesswork**.

---

## 1. Why You Test Utilities *Before* Nodes (Again)

You’ve reinforced a critical engineering principle:

> **Nodes orchestrate. Utilities reason.**

By testing utilities independently:

* failures are localized
* logic is isolated
* fixes are obvious
* confidence is earned incrementally

This prevents the “everything broke at once” debugging nightmare.

---

## 2. `test_analyze_control_compliance`: Enforcing Control Logic

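This test validates that, for `VEND_001` (which has an expired SOC2):

* the Information Security domain is analyzed
* its control status is one of `partial`, `expired`, or `missing`
* a `missing_controls` entry is present
* a `score` is present (the test checks the key, not yet that the value is numeric or bounded)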

The key insight here is that you’re testing **structure, not values**.

You don’t hardcode expected scores — you assert:

* presence
* plausibility
* semantic meaning

That makes the system resilient to future tuning while preserving correctness.
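
One way to express structure-over-values checks is a small validator that reports problems instead of comparing against hardcoded scores. This is a sketch under stated assumptions: `check_domain_result` is a hypothetical helper, the 0 to 100 score range and the list type of `missing_controls` are assumptions, and the default status set is only the one the test above accepts for `VEND_001`.

```python
from numbers import Real

# Statuses accepted by the existing test for VEND_001. A real system
# probably has more (assumption), so the set is a parameter.
DEFAULT_STATUSES = frozenset({"partial", "expired", "missing"})


def check_domain_result(domain_result, allowed_statuses=DEFAULT_STATUSES,
                        low=0.0, high=100.0):
    """Return a list of structural problems for one domain (empty if none)."""
    problems = []

    status = domain_result.get("status")
    if status not in allowed_statuses:
        problems.append(f"unexpected status: {status!r}")

    score = domain_result.get("score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, Real):
        problems.append("score is not numeric")
    elif not low <= score <= high:
        problems.append(f"score {score} outside [{low}, {high}]")

    # Assumed shape: missing_controls is a list of control names.
    if not isinstance(domain_result.get("missing_controls"), list):
        problems.append("missing_controls is not a list")

    return problems
```

A test could then assert `check_domain_result(data) == []` for every domain, which keeps passing when scoring weights are tuned.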

---

## 3. `test_analyze_external_signals`: Making Signals Actionable

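This test checks that the signal analysis for `VEND_001` reports:

* a `total_signals` count, which must be greater than zero
* a `high_severity_count`, so severity is tracked
* a single `signal_impact_score` that summarizes the signals

The intent behind these fields is:

> “External events don’t just exist; they influence risk.”

The assertions confirm that the fields exist. Whether the impact score actually moves the final risk score is something a later scoring stage would need to verify. Keeping these fields in the contract still guards against a common failure mode where signals are logged but never meaningfully integrated.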
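
Beyond key presence, a few relationships between these fields could be checked as invariants. The sketch below is hypothetical: `check_signal_invariants` is not part of the project, and the rules it encodes (impact is non-negative, zero without signals, positive when a high-severity signal exists) are assumptions about how the scoring is meant to behave.

```python
def check_signal_invariants(analysis):
    """Raise AssertionError if the signal analysis is internally inconsistent."""
    total = analysis["total_signals"]
    high = analysis["high_severity_count"]
    impact = analysis["signal_impact_score"]

    # A subset count can never exceed the total.
    assert 0 <= high <= total, "high_severity_count must be between 0 and total_signals"

    # Assumed scoring rules (not confirmed by the source):
    assert impact >= 0, "impact score should not be negative"
    if total == 0:
        assert impact == 0, "no signals should mean no impact"
    if high > 0:
        assert impact > 0, "a high-severity signal should produce some impact"

    return True
```

Called on the result of `analyze_external_signals`, this would catch a case where signals are counted but contribute nothing to the impact score.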

---

## 4. `test_analyze_performance_metrics`: Operational Risk Matters

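Here you validate that the performance analysis for `VEND_001` includes:

* `sla_compliance`, so SLA performance is recognized
* `performance_score`, so operational degradation is surfaced
* `performance_issues`, so performance issues are named, not hidden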

This reinforces that vendor risk is not just about compliance — it’s about **reliability under real conditions**.

That distinction is extremely important to business stakeholders.

---

## 5. `test_detect_risk_drift`: Time Is a First-Class Dimension

This is one of the strongest parts of your design.

You test two critical cases:

1. Vendor *with* history → drift detected
2. Vendor *without* history → graceful `None`

This proves:

* drift logic is conditional, not forced
* history absence doesn’t break the system
* temporal reasoning is deliberate

Most agents ignore time. Yours models it explicitly.
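
The two cases above can be illustrated with a minimal sketch. It is not the project's `detect_risk_drift`: the function name `sketch_detect_drift`, the record fields (`vendor_id`, `assessment_date`, `risk_score`), the extra `current_score` argument, and the direction labels are all assumptions made for illustration.

```python
def sketch_detect_drift(vendor_id, history, current_score):
    """Compare current_score with the vendor's latest past assessment.

    Returns None when the vendor has no history, so callers can tell
    "no drift information" apart from "no change".
    """
    records = [r for r in history if r["vendor_id"] == vendor_id]
    if not records:
        return None  # graceful: absence of history is not an error

    # ISO 8601 date strings (YYYY-MM-DD) sort correctly as plain strings.
    latest = max(records, key=lambda r: r["assessment_date"])
    previous = latest["risk_score"]

    if current_score > previous:
        direction = "increasing"
    elif current_score < previous:
        direction = "decreasing"
    else:
        direction = "stable"

    return {
        "previous_score": previous,
        "previous_assessment_date": latest["assessment_date"],
        "drift_direction": direction,
    }


# Made-up records for illustration only.
history = [
    {"vendor_id": "V1", "assessment_date": "2025-07-01", "risk_score": 70},
    {"vendor_id": "V1", "assessment_date": "2026-01-06", "risk_score": 78},
]
print(sketch_detect_drift("V1", history, current_score=85))
print(sketch_detect_drift("V2", history, current_score=50))
```

Returning `None` rather than an empty dict or a zero-valued result keeps the missing-history case explicit, which is what the second assertion in the test relies on.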

---

## 6. `test_identify_risk_drivers`: Turning Data Into Explanation

This test verifies the **narrative layer** of your system.

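You’re asserting that:

* risk drivers are returned as a list
* at least one driver exists for `VEND_001`, where evidence is known to exist

The printed drivers (for example, “Expired SOC2 in Information Security”) show that they are human-readable, although the test does not assert this directly.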

This is crucial because:

> **Executives don’t act on scores — they act on reasons.**

For `VEND_001`, this test ensures those reasons exist when risk evidence exists.
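
Readability and the "drivers exist when evidence exists" rule could also be asserted directly. The sketch below is hypothetical: `has_control_evidence` and `check_drivers` are illustrative helpers, and the assumption that each domain carries a `missing_controls` list comes from the keys the control compliance test checks.

```python
def has_control_evidence(control_analysis):
    """True if any domain lists at least one missing control (assumed shape)."""
    return any(d.get("missing_controls") for d in control_analysis.values())


def check_drivers(drivers, evidence_present):
    """Raise AssertionError if the driver list is malformed or empty despite evidence."""
    assert isinstance(drivers, list), "drivers should be a list"
    # Each driver must be a non-blank string a person can read.
    assert all(isinstance(d, str) and d.strip() for d in drivers), \
        "every driver should be a non-empty string"
    if evidence_present:
        assert drivers, "evidence exists, so at least one driver is expected"
    return True


# Driver wording taken from the sample output; the analysis dict is made up.
control_analysis = {"Information Security": {"missing_controls": ["Encryption"]}}
drivers = [
    "Expired SOC2 in Information Security",
    "Missing Encryption in Information Security",
]
check_drivers(drivers, has_control_evidence(control_analysis))
```

This still avoids asserting exact wording, so driver phrasing can change without breaking the test.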

---

## 7. `test_risk_analysis_node`: Contract-Level Validation

This test doesn’t just test correctness — it tests **workflow integrity**.

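You validate that:

* data loading succeeds first
* analysis produces no errors
* the result contains both `vendor_risk_analysis` and `risk_drift_detection`
* the first vendor's analysis has all four sections: `control_compliance`, `external_signals`, `performance_metrics`, and `risk_drivers`

This gives downstream nodes (scoring, escalation, reporting) a shape they can rely on. Only the first vendor is inspected, so this is a spot check rather than a guarantee for every vendor.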

That’s how you prevent cascading logic failures.
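
The same contract can be applied to every vendor rather than only the first one. A minimal sketch, assuming the four section names used in the node test; `find_contract_violations` is a hypothetical helper, not part of the project.

```python
# Section names taken from the assertions in the node test.
REQUIRED_VENDOR_KEYS = frozenset({
    "control_compliance",
    "external_signals",
    "performance_metrics",
    "risk_drivers",
})


def find_contract_violations(vendor_risk_analysis):
    """Map each vendor ID to the sorted list of sections its analysis lacks.

    Vendors whose analysis is complete are omitted, so an empty dict
    means every vendor satisfies the contract.
    """
    violations = {}
    for vendor_id, analysis in vendor_risk_analysis.items():
        missing = REQUIRED_VENDOR_KEYS.difference(analysis)
        if missing:
            violations[vendor_id] = sorted(missing)
    return violations
```

In the node test this could replace the first-vendor check with `assert find_contract_violations(vendor_analysis) == {}`, and a failure would name the vendor and the missing sections.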

---

## 8. Why Your Assertions Are So Well Chosen

Notice what you *don’t* assert:

* exact scores
* exact driver wording
* fixed counts

Instead, you assert:

* presence
* structure
* meaning
* consistency

This makes the system:

* tunable
* evolvable
* robust to iteration

That’s professional-grade test design.

---

## 9. Why This Makes the Agent Credible

Because of this test suite, you can say with evidence that, for the vendors tested:

* Risk findings are derived from loaded data, not generated free-form
* Control, signal, and performance dimensions are each analyzed explicitly
* Missing assessment history is handled safely
* Time-based changes are detected when history exists
* Explanations are grounded in evidence

Many AI systems lack this kind of traceability, and decision-makers notice the difference.

---

## 10. Strategic Payoff: Safe Path to Scoring & Automation

By locking down analysis correctness **before** scoring:

* aggregation math becomes safer
* escalation thresholds become defensible
* KPIs become meaningful
* LLM summaries become trustworthy

You’ve built the foundation that allows automation *without losing control*.

---

## Bottom Line

This test suite proves that your agent:

* reasons before it judges
* explains before it escalates
* fails loudly instead of silently
* earns trust instead of assuming it

It’s one of the strongest signals of engineering maturity in your entire project.



In [None]:
"""Test risk analysis utilities for Third-Party Risk Orchestrator

Run this file to test the risk analysis utilities independently.
Following MVP-first approach: Test utilities before nodes.
"""

import sys
from pathlib import Path

# Add project root to path
project_root = Path(__file__).parent
sys.path.insert(0, str(project_root))

from agents.third_party_risk_orchestrator.utilities.risk_analysis import (
    analyze_control_compliance,
    analyze_external_signals,
    analyze_performance_metrics,
    detect_risk_drift,
    identify_risk_drivers
)
from agents.third_party_risk_orchestrator.utilities.data_loading import (
    load_third_parties,
    load_risk_domains,
    load_vendor_controls,
    load_external_signals,
    load_vendor_performance,
    load_assessment_history,
    build_vendor_lookup,
    build_risk_domain_lookup
)
from config import ThirdPartyRiskOrchestratorConfig


def test_analyze_control_compliance():
    """Test control compliance analysis"""
    print("Testing analyze_control_compliance...")
    config = ThirdPartyRiskOrchestratorConfig()

    # Load data
    vendor_controls = load_vendor_controls(config.data_dir, config.vendor_controls_file)
    risk_domains = load_risk_domains(config.data_dir, config.risk_domains_file)
    risk_domain_lookup = build_risk_domain_lookup(risk_domains)

    # Test with VEND_001 (has expired SOC2)
    analysis = analyze_control_compliance(
        "VEND_001",
        vendor_controls,
        risk_domains,
        risk_domain_lookup
    )

    assert "Information Security" in analysis, "Should analyze Information Security domain"
    assert analysis["Information Security"]["status"] in ["partial", "expired", "missing"], "Should have status"
    assert "score" in analysis["Information Security"], "Should have score"
    assert "missing_controls" in analysis["Information Security"], "Should identify missing controls"

    print(f"✅ Analyzed {len(analysis)} risk domains for VEND_001")
    for domain, data in analysis.items():
        print(f"   - {domain}: {data['status']} (score: {data['score']:.1f})")

    return analysis


def test_analyze_external_signals():
    """Test external signal analysis"""
    print("\nTesting analyze_external_signals...")
    config = ThirdPartyRiskOrchestratorConfig()

    # Load data
    external_signals = load_external_signals(config.data_dir, config.external_signals_file)

    # Test with VEND_001 (has high-severity security incident)
    analysis = analyze_external_signals("VEND_001", external_signals)

    assert "total_signals" in analysis, "Should have total_signals"
    assert "high_severity_count" in analysis, "Should have high_severity_count"
    assert "signal_impact_score" in analysis, "Should have signal_impact_score"
    assert analysis["total_signals"] > 0, "VEND_001 should have signals"

    print("✅ Analyzed signals for VEND_001")
    print(f"   - Total signals: {analysis['total_signals']}")
    print(f"   - High severity: {analysis['high_severity_count']}")
    print(f"   - Impact score: {analysis['signal_impact_score']:.1f}")

    return analysis


def test_analyze_performance_metrics():
    """Test performance metrics analysis"""
    print("\nTesting analyze_performance_metrics...")
    config = ThirdPartyRiskOrchestratorConfig()

    # Load data
    vendor_performance = load_vendor_performance(config.data_dir, config.vendor_performance_file)

    # Test with VEND_001
    analysis = analyze_performance_metrics("VEND_001", vendor_performance)

    assert "sla_compliance" in analysis, "Should have sla_compliance"
    assert "performance_score" in analysis, "Should have performance_score"
    assert "performance_issues" in analysis, "Should have performance_issues"

    print("✅ Analyzed performance for VEND_001")
    print(f"   - SLA compliance: {analysis['sla_compliance']}")
    print(f"   - Performance score: {analysis['performance_score']:.1f}")
    print(f"   - Issues: {len(analysis['performance_issues'])}")

    return analysis


def test_detect_risk_drift():
    """Test risk drift detection"""
    print("\nTesting detect_risk_drift...")
    config = ThirdPartyRiskOrchestratorConfig()

    # Load data
    assessment_history = load_assessment_history(config.data_dir, config.assessment_history_file)

    # Test with VEND_001 (has history)
    drift = detect_risk_drift("VEND_001", assessment_history)

    assert drift is not None, "VEND_001 should have drift detection"
    assert "previous_score" in drift, "Should have previous_score"
    assert "previous_assessment_date" in drift, "Should have previous_assessment_date"
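    assert "drift_direction" in drift, "Should have drift_direction"
    # drift_trigger is printed below, so assert it here to fail cleanly instead of with a KeyError
    assert "drift_trigger" in drift, "Should have drift_trigger"

    print("✅ Detected drift for VEND_001")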
    print(f"   - Previous score: {drift['previous_score']}")
    print(f"   - Previous date: {drift['previous_assessment_date']}")
    print(f"   - Trigger: {drift['drift_trigger']}")

    # Test with vendor that has no history
    drift_none = detect_risk_drift("VEND_010", assessment_history)
    assert drift_none is None, "VEND_010 should have no history"
    print("✅ Correctly returned None for vendor with no history")

    return drift


def test_identify_risk_drivers():
    """Test risk driver identification"""
    print("\nTesting identify_risk_drivers...")
    config = ThirdPartyRiskOrchestratorConfig()

    # Load data
    vendor_controls = load_vendor_controls(config.data_dir, config.vendor_controls_file)
    risk_domains = load_risk_domains(config.data_dir, config.risk_domains_file)
    risk_domain_lookup = build_risk_domain_lookup(risk_domains)
    external_signals = load_external_signals(config.data_dir, config.external_signals_file)
    vendor_performance = load_vendor_performance(config.data_dir, config.vendor_performance_file)
    third_parties = load_third_parties(config.data_dir, config.third_parties_file)
    vendor_lookup = build_vendor_lookup(third_parties)

    # Analyze components
    control_analysis = analyze_control_compliance(
        "VEND_001",
        vendor_controls,
        risk_domains,
        risk_domain_lookup
    )
    signal_analysis = analyze_external_signals("VEND_001", external_signals)
    performance_analysis = analyze_performance_metrics("VEND_001", vendor_performance)
    vendor_data = vendor_lookup.get("VEND_001", {})

    # Identify drivers
    drivers = identify_risk_drivers(
        "VEND_001",
        control_analysis,
        signal_analysis,
        performance_analysis,
        vendor_data
    )

    assert isinstance(drivers, list), "Should return list of drivers"
    assert len(drivers) > 0, "VEND_001 should have risk drivers"

    print(f"✅ Identified {len(drivers)} risk drivers for VEND_001")
    for i, driver in enumerate(drivers[:5], 1):  # Show first 5
        print(f"   {i}. {driver}")

    return drivers


def test_risk_analysis_node():
    """Test the risk analysis node"""
    print("\n" + "="*60)
    print("Testing risk_analysis_node...")
    print("="*60)

    from agents.third_party_risk_orchestrator.nodes import (
        data_loading_node,
        risk_analysis_node
    )

    # First load data
    state = {
        "vendor_id": None,
        "errors": []
    }
    state = data_loading_node(state)

    assert len(state.get("errors", [])) == 0, f"Data loading should have no errors, got: {state.get('errors', [])}"

    # Then analyze
    result = risk_analysis_node(state)

    assert "errors" in result, "Result should have errors field"
    assert len(result.get("errors", [])) == 0, f"Should have no errors, got: {result.get('errors', [])}"
    assert "vendor_risk_analysis" in result, "Result should have vendor_risk_analysis"
    assert "risk_drift_detection" in result, "Result should have risk_drift_detection"

    vendor_analysis = result["vendor_risk_analysis"]
    assert len(vendor_analysis) > 0, "Should analyze at least one vendor"

    # Check structure of first vendor analysis
    first_vendor_id = next(iter(vendor_analysis))
    first_analysis = vendor_analysis[first_vendor_id]

    assert "control_compliance" in first_analysis, "Should have control_compliance"
    assert "external_signals" in first_analysis, "Should have external_signals"
    assert "performance_metrics" in first_analysis, "Should have performance_metrics"
    assert "risk_drivers" in first_analysis, "Should have risk_drivers"

    print(f"✅ Node analyzed {len(vendor_analysis)} vendors")
    print(f"✅ Node detected drift for {len(result['risk_drift_detection'])} vendors")

    # Show example analysis
    print(f"\nExample analysis for {first_vendor_id}:")
    print(f"   - Risk drivers: {len(first_analysis['risk_drivers'])}")
    print(f"   - Control domains analyzed: {len(first_analysis['control_compliance'])}")
    print(f"   - External signals: {first_analysis['external_signals']['total_signals']}")
    print(f"   - Performance score: {first_analysis['performance_metrics']['performance_score']:.1f}")

    return result


def main():
    """Run all tests"""
    print("="*60)
    print("Testing Risk Analysis Utilities")
    print("="*60)

    try:
        # Test individual utilities
        test_analyze_control_compliance()
        test_analyze_external_signals()
        test_analyze_performance_metrics()
        test_detect_risk_drift()
        test_identify_risk_drivers()

        # Test node
        test_risk_analysis_node()

        print("\n" + "="*60)
        print("✅ ALL TESTS PASSED!")
        print("="*60)

    except AssertionError as e:
        print(f"\n❌ TEST FAILED: {e}")
        raise
    except Exception as e:
        print(f"\n❌ UNEXPECTED ERROR: {e}")
        import traceback
        traceback.print_exc()
        raise


if __name__ == "__main__":
    main()


# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac AI_AGENTS_015_Third-Party_Risk_Orchestrator % python test_risk_analysis.py
============================================================
Testing Risk Analysis Utilities
============================================================
Testing analyze_control_compliance...
✅ Analyzed 4 risk domains for VEND_001
   - Information Security: partial (score: 25.0)
   - Regulatory Compliance: missing (score: 0.0)
   - Operational Resilience: missing (score: 0.0)
   - Reputational Risk: missing (score: 0.0)

Testing analyze_external_signals...
✅ Analyzed signals for VEND_001
   - Total signals: 1
   - High severity: 1
   - Impact score: 20.0

Testing analyze_performance_metrics...
✅ Analyzed performance for VEND_001
   - SLA compliance: 0.89
   - Performance score: 59.3
   - Issues: 1

Testing detect_risk_drift...
✅ Detected drift for VEND_001
   - Previous score: 78
   - Previous date: 2026-01-06
   - Trigger: external_signal
✅ Correctly returned None for vendor with no history

Testing identify_risk_drivers...
✅ Identified 14 risk drivers for VEND_001
   1. Expired SOC2 in Information Security
   2. Missing Encryption in Information Security
   3. Missing Access Controls in Information Security
   4. Missing GDPR in Regulatory Compliance
   5. Missing SOX in Regulatory Compliance

============================================================
Testing risk_analysis_node...
============================================================
✅ Node analyzed 10 vendors
✅ Node detected drift for 9 vendors

Example analysis for VEND_001:
   - Risk drivers: 14
   - Control domains analyzed: 4
   - External signals: 1
   - Performance score: 59.3

============================================================
✅ ALL TESTS PASSED!
============================================================
