# Third-Party Risk Orchestrator — State & Configuration Overview

## What This Code Does (Big Picture)

This code defines **how the Third-Party Risk Orchestrator thinks, remembers, and stays accountable**.

Instead of scattering logic across opaque functions, you’ve made the orchestrator **explicitly stateful**. Every important decision, signal, escalation, and outcome flows through a single, inspectable state object, governed by a transparent configuration layer.

This design makes the agent:

* auditable
* debuggable
* explainable to non-engineers
* safe to operate in high-risk environments

---

## 1. The Orchestrator State: A Living Risk Ledger

### Why a Stateful Design Matters

`ThirdPartyRiskOrchestratorState` is the **single source of truth** for the entire workflow.

Rather than recomputing risk in isolation, the orchestrator:

* ingests evidence
* reasons over it
* tracks changes over time
* records human decisions
* measures its own performance

All of that happens inside one well-defined structure.

This mirrors how real risk teams operate — with **files, histories, approvals, and outcomes**, not stateless API calls.

---

## 2. Clear Separation of Responsibility Inside the State

The state is intentionally divided into logical layers. Each layer has one job, so a reviewer can tell which fields are inputs, which are evidence, and which are decisions.

### Input & Planning Layer

```python
vendor_id
run_date
goal
plan
```

This allows the orchestrator to operate in **controlled modes**:

* assess one vendor
* assess all vendors
* run on a scheduled cadence
* run in response to an external event

Even though the MVP uses fixed goals and plans, the structure already supports future planning intelligence without refactoring.
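
As a minimal sketch of how the input fields select a run mode, the snippet below builds a starting state for either a single vendor or the whole portfolio. `InputState` is a cut-down stand-in for `ThirdPartyRiskOrchestratorState`, and `make_initial_state` is a hypothetical helper, not part of the original code.

```python
from datetime import date
from typing import Any, Dict, List, Optional, TypedDict


class InputState(TypedDict, total=False):
    """Cut-down stand-in for the input and planning fields of the full state."""
    vendor_id: Optional[str]
    run_date: Optional[str]
    goal: Dict[str, Any]
    plan: List[Dict[str, Any]]


def make_initial_state(vendor_id: Optional[str] = None,
                       run_date: Optional[str] = None) -> InputState:
    """Hypothetical helper: vendor_id=None means 'assess all vendors'."""
    return InputState(
        vendor_id=vendor_id,
        run_date=run_date or date.today().isoformat(),  # default to today
    )


single = make_initial_state(vendor_id="VEND_001", run_date="2026-01-10")
portfolio = make_initial_state(run_date="2026-01-10")
```

Because the state is declared with `total=False`, `goal` and `plan` can stay absent until the goal and planning nodes fill them in.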

---

### Data Ingestion Layer (Evidence, Not Opinions)

```python
third_parties
risk_domains
vendor_controls
external_signals
vendor_performance
assessment_history
```

This layer answers a critical executive question:

> “What evidence did the system actually look at?”

Each dataset corresponds to a **real-world source of risk intelligence**:

* contracts & ownership
* policy expectations
* control documentation
* external events
* operational performance
* historical trend data

Nothing is hidden. Nothing is inferred magically.

---

### Lookup Layer (Performance & Control)

```python
vendor_lookup
risk_domain_lookup
```

These fields exist for one reason: **fast, consistent access by key**.

Instead of repeatedly scanning raw lists, the orchestrator builds explicit lookup tables keyed by `vendor_id` and `risk_domain`. This:

* improves performance
* reduces accidental inconsistency
* makes decision paths reproducible

This is the kind of detail auditors and senior engineers care about.
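
The lookup tables described above can be sketched as plain dictionaries built once from the loaded lists. `build_lookup` is a hypothetical helper, and rejecting duplicate keys is an assumption about how inconsistency might be guarded against, not something the original code shows.

```python
from typing import Any, Dict, List


def build_lookup(records: List[Dict[str, Any]], key: str) -> Dict[str, Dict[str, Any]]:
    """Index records by `key`; refuse duplicates so each key maps to one record."""
    lookup: Dict[str, Dict[str, Any]] = {}
    for record in records:
        value = record[key]
        if value in lookup:
            raise ValueError(f"duplicate {key}: {value}")
        lookup[value] = record
    return lookup


third_parties = [
    {"vendor_id": "VEND_001", "vendor_name": "CloudOps Solutions", "criticality": "high"},
]
risk_domains = [
    {"risk_domain": "Information Security", "weight": 0.35, "escalation_threshold": 70},
]

vendor_lookup = build_lookup(third_parties, "vendor_id")
risk_domain_lookup = build_lookup(risk_domains, "risk_domain")
```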

---

## 3. Risk Reasoning, Not Just Risk Scoring

### Vendor Risk Analysis

```python
vendor_risk_analysis
```

This is where the orchestrator **builds its internal case** for each vendor.

Instead of jumping straight to a score, the agent records:

* control compliance by domain
* missing or degraded controls
* external signals
* performance weaknesses
* narrative risk drivers

This makes downstream decisions *explainable*, not just numerically defensible.

---

### Risk Drift Detection (A Major Differentiator)

```python
risk_drift_detection
```

This is one of the strongest parts of your design.

You are explicitly modeling **change over time**, including:

* previous vs current score
* direction of change
* what triggered the change

This enables:

* early warning KPIs
* post-incident analysis
* scoring model validation
* executive confidence that risk is being monitored continuously

Most AI agents completely skip this.
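
A drift record of the shape documented in the state could be produced roughly as follows. `detect_drift` is a hypothetical helper; only the `"increasing"` label appears in the original, so `"decreasing"` and `"stable"` are assumed names.

```python
from typing import Any, Dict, Optional


def detect_drift(vendor_id: str, previous_score: float, current_score: float,
                 trigger: str, signal_id: Optional[str] = None) -> Dict[str, Any]:
    """Compare the last recorded score with the current one."""
    delta = current_score - previous_score
    if delta > 0:
        direction = "increasing"
    elif delta < 0:
        direction = "decreasing"   # assumed label
    else:
        direction = "stable"       # assumed label
    return {
        "vendor_id": vendor_id,
        "previous_score": previous_score,
        "current_score": current_score,
        "score_delta": delta,
        "drift_direction": direction,
        "drift_trigger": trigger,
        "signal_id": signal_id,
    }


drift = detect_drift("VEND_001", 42, 78, "external_signal", "SIG_001")
```

With the example values from the state comments (42 then 78), the delta is 36 and the direction is `"increasing"`.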

---

## 4. Decision Output & Governance

### Risk Assessments

```python
risk_assessments
escalation_required
```

This is the orchestrator’s **decision output**, not just an analysis artifact.

Each assessment includes:

* a score
* a category
* the domains driving the risk
* a recommended action
* a clear signal of whether humans must intervene

This allows leadership to answer:

> “What is risky, why, and what are we doing about it?”

---

### Human-in-the-Loop (HITL)

```python
pending_approvals
approval_history
```

This is where **trust is enforced**.

The system never pretends to replace human authority. Instead, it:

* routes high-impact decisions
* records who approved what
* preserves rationale and conditions

This audit trail is a prerequisite for deploying the system in regulated environments.

---

### Mitigation Tracking (Closing the Loop)

```python
mitigation_actions
```

Risk management fails when issues are identified but never fixed.

By tracking mitigation actions explicitly, the orchestrator can measure:

* follow-through
* remediation speed
* overdue actions

This turns risk assessment into **risk management**, not just reporting.
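
One way to surface overdue actions from `mitigation_actions` is to compare each `target_completion_date` with a reference date. This is a sketch: `overdue_actions` is a hypothetical helper, and the `"completed"` status value and the `MIT_002` record are assumptions added for illustration (the original only shows `"in_progress"`).

```python
from datetime import date
from typing import Any, Dict, List


def overdue_actions(actions: List[Dict[str, Any]], as_of: date) -> List[Dict[str, Any]]:
    """Return actions that are not completed and whose target date has passed."""
    return [
        action for action in actions
        if action["status"] != "completed"  # assumed terminal status
        and date.fromisoformat(action["target_completion_date"]) < as_of
    ]


actions = [
    {"action_id": "MIT_001", "status": "in_progress", "target_completion_date": "2026-02-10"},
    {"action_id": "MIT_002", "status": "completed", "target_completion_date": "2026-01-20"},
]
late = overdue_actions(actions, as_of=date(2026, 2, 15))
```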

---

## 5. Measuring the Agent Itself (Accountability Layer)

### KPI Metrics

```python
kpi_metrics
orchestrator_metrics
```

This is where your philosophy really shows.

The agent does not ask to be trusted — it **measures itself**:

* operational health
* effectiveness
* business value
* cost vs benefit

Executives don’t have to guess whether the system is working. The system tells them.
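
The cost-versus-benefit figures in the state comments appear to follow simple arithmetic: net value is cost avoidance minus run cost, and ROI is net value over run cost. The formula below is inferred from those example numbers and is an assumption, as is the helper name `net_value_and_roi`.

```python
from typing import Tuple


def net_value_and_roi(cost_avoidance_usd: float, run_cost_usd: float) -> Tuple[float, float]:
    """Return (net value in USD, ROI as a percentage)."""
    if run_cost_usd <= 0:
        raise ValueError("run cost must be positive to compute ROI")
    net_value = cost_avoidance_usd - run_cost_usd
    roi_percentage = net_value / run_cost_usd * 100
    return net_value, roi_percentage


# Example figures taken from the kpi_metrics / orchestrator_metrics comments.
net_value, roi = net_value_and_roi(cost_avoidance_usd=52000, run_cost_usd=318.15)
```

Under this assumption, the example inputs give a net value of about 51,681.85 USD and an ROI of roughly 16,244%, which matches the sample `orchestrator_metrics` record.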

---

### Progress & Status Tracking

```python
progress_percentage
orchestrator_status
completion_reason
```

These fields make the agent operationally safe:

* resumable
* observable
* failure-aware

This matters in real deployments, not demos.
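
The progress fields can be derived from the vendor counters and elapsed time. The sketch below assumes a simple linear estimate (average minutes per completed vendor times vendors remaining); `progress_snapshot` is a hypothetical helper and the original does not specify how the estimate is computed.

```python
from typing import Dict


def progress_snapshot(vendors_completed: int, vendors_total: int,
                      elapsed_time_minutes: float) -> Dict[str, float]:
    """Compute progress_percentage and a linear estimate of remaining time."""
    if vendors_total <= 0:
        return {"progress_percentage": 0.0, "estimated_remaining_minutes": 0.0}
    progress = 100.0 * vendors_completed / vendors_total
    if vendors_completed == 0:
        remaining = 0.0  # nothing completed yet, so there is no basis for an estimate
    else:
        per_vendor = elapsed_time_minutes / vendors_completed
        remaining = per_vendor * (vendors_total - vendors_completed)
    return {
        "progress_percentage": progress,
        "estimated_remaining_minutes": remaining,
    }


snapshot = progress_snapshot(vendors_completed=4, vendors_total=10, elapsed_time_minutes=20.0)
```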

---

## 6. Configuration: Executive Control Without Code Changes

The `ThirdPartyRiskOrchestratorConfig` class is the **control panel** for the system.

### Why This Matters

Every important behavior is configurable:

* risk thresholds
* escalation rules
* KPI targets
* auto-approval behavior
* LLM usage

This means:

* policies change without rewrites
* executives retain control
* experiments are safe and reversible

---

### Rules First, LLMs Second

```python
enable_llm_rationale = False
enable_llm_summary = False
```

This is a deliberate design choice.

The system works **entirely without LLMs** in MVP mode. When enabled later, LLMs:

* explain decisions
* summarize outcomes
* never replace core logic

This dramatically improves trust and debuggability.

---

## Why This Design Inspires Confidence

This code shows that the orchestrator:

* does not hide its reasoning
* does not blur analysis and decision
* does not over-automate authority
* does not rely on black-box AI

Instead, it behaves like a **well-run risk organization**, expressed in software.

That is what gives a CEO, compliance leader, or auditor grounds to trust it.




In [None]:
# ============================================================================
# Third-Party Risk Orchestrator Agent
# ============================================================================

import os
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, TypedDict

class ThirdPartyRiskOrchestratorState(TypedDict, total=False):
    """State for Third-Party Risk Orchestrator Agent"""

    # Input fields
    vendor_id: Optional[str]                      # Specific vendor to assess (None = assess all vendors)
    run_date: Optional[str]                       # Assessment run date (ISO format, defaults to today)

    # Goal & Planning fields (MVP: Fixed goal, template-based plan)
    goal: Dict[str, Any]                          # Goal definition (from goal_node)
    plan: List[Dict[str, Any]]                    # Execution plan (from planning_node)

    # Data Ingestion
    third_parties: List[Dict[str, Any]]           # Loaded vendor data
    # Structure per vendor:
    # {
    #   "vendor_id": "VEND_001",
    #   "vendor_name": "CloudOps Solutions",
    #   "vendor_type": "Cloud Infrastructure",
    #   "criticality": "high",
    #   "data_access_level": "sensitive",
    #   "business_owner": "IT Operations",
    #   "contract_status": "active",
    #   "onboarding_date": "2023-06-15",
    #   "last_full_review": "2024-09-01"
    # }

    risk_domains: List[Dict[str, Any]]            # Loaded risk domain definitions
    # Structure per domain:
    # {
    #   "risk_domain": "Information Security",
    #   "weight": 0.35,
    #   "required_controls": ["SOC2", "Encryption", "Access Controls"],
    #   "escalation_threshold": 70
    # }

    vendor_controls: List[Dict[str, Any]]        # Loaded control evidence
    # Structure per control:
    # {
    #   "vendor_id": "VEND_001",
    #   "risk_domain": "Information Security",
    #   "control": "SOC2",
    #   "status": "expired" | "active" | "partial",
    #   "evidence_date": "2023-08-01",
    #   "confidence": 0.88,
    #   "document_evidence": {...}
    # }

    external_signals: List[Dict[str, Any]]       # Loaded external risk signals
    # Structure per signal:
    # {
    #   "signal_id": "SIG_001",
    #   "vendor_id": "VEND_001",
    #   "signal_type": "security_incident" | "regulatory_notice" | "negative_media" | "service_disruption" | "audit_result",
    #   "severity": "high" | "medium" | "low",
    #   "source": "news" | "regulator" | "internal_monitoring" | "internal_audit",
    #   "source_url": Optional[str],
    #   "source_confidence": 0.92,
    #   "detected_date": "2026-01-05",
    #   "summary": "..."
    # }

    vendor_performance: List[Dict[str, Any]]      # Loaded performance metrics
    # Structure per vendor:
    # {
    #   "vendor_id": "VEND_001",
    #   "metric_period": "2025-Q4",
    #   "sla_compliance": 0.89,
    #   "incident_count": 4,
    #   "response_time_avg_hours": 5.6,
    #   "customer_satisfaction_score": 3.4
    # }

    assessment_history: List[Dict[str, Any]]      # Loaded historical assessments
    # Structure per assessment:
    # {
    #   "vendor_id": "VEND_001",
    #   "assessment_date": "2025-10-01",
    #   "risk_score": 42,
    #   "risk_level": "medium",
    #   "trigger": "scheduled_review" | "external_signal",
    #   "signal_id": Optional[str]
    # }

    # Data Lookups (for fast access)
    vendor_lookup: Dict[str, Dict[str, Any]]      # vendor_id → vendor data
    risk_domain_lookup: Dict[str, Dict[str, Any]] # risk_domain → domain definition

    # Risk Analysis
    vendor_risk_analysis: Dict[str, Dict[str, Any]]  # Per-vendor risk analysis
    # Structure per vendor:
    # {
    #   "vendor_id": "VEND_001",
    #   "control_compliance": {
    #     "Information Security": {"status": "partial", "score": 45, "missing_controls": ["SOC2"]},
    #     "Regulatory Compliance": {"status": "active", "score": 80}
    #   },
    #   "external_signals": [...],
    #   "performance_metrics": {...},
    #   "risk_drivers": ["Expired SOC2", "Recent security incident"]
    # }

    risk_drift_detection: Dict[str, Dict[str, Any]]  # Risk changes over time
    # Structure per vendor:
    # {
    #   "vendor_id": "VEND_001",
    #   "previous_score": 42,
    #   "current_score": 78,
    #   "score_delta": 36,
    #   "drift_direction": "increasing",
    #   "drift_trigger": "external_signal",
    #   "signal_id": "SIG_001"
    # }

    # Risk Scoring
    risk_assessments: List[Dict[str, Any]]        # Calculated risk assessments
    # Structure per assessment:
    # {
    #   "assessment_id": "RA_001",
    #   "vendor_id": "VEND_001",
    #   "assessment_date": "2026-01-10",
    #   "overall_risk_score": 78,
    #   "risk_level": "high",
    #   "primary_risk_domains": ["Information Security", "Operational Resilience"],
    #   "key_drivers": ["Expired SOC2 report", "Recent high-severity security incident"],
    #   "recommended_action": "Immediate remediation plan and executive review",
    #   "human_review_required": true
    # }

    escalation_required: List[str]                # Vendor IDs requiring human review

    # HITL (Human-in-the-Loop) - Toolshed Integration
    pending_approvals: List[Dict[str, Any]]       # Escalations awaiting review
    # Structure per approval:
    # {
    #   "vendor_id": "VEND_001",
    #   "assessment_id": "RA_001",
    #   "risk_score": 78,
    #   "risk_level": "high",
    #   "escalation_reason": "Risk score exceeds threshold",
    #   "requested_at": "2026-01-10T10:00:00",
    #   "status": "pending"
    # }

    approval_history: List[Dict[str, Any]]       # Review decisions made
    # Structure per approval:
    # {
    #   "review_id": "HR_001",
    #   "assessment_id": "RA_001",
    #   "vendor_id": "VEND_001",
    #   "reviewer_role": "Chief Information Security Officer",
    #   "decision": "approve_with_conditions",
    #   "conditions": [...],
    #   "decision_date": "2026-01-11",
    #   "rationale": "..."
    # }

    mitigation_actions: List[Dict[str, Any]]      # Mitigation actions created
    # Structure per action:
    # {
    #   "action_id": "MIT_001",
    #   "vendor_id": "VEND_001",
    #   "assessment_id": "RA_001",
    #   "action_type": "security_remediation_plan",
    #   "status": "in_progress",
    #   "created_date": "2026-01-11",
    #   "target_completion_date": "2026-02-10",
    #   "assigned_to": "Security Officer"
    # }

    # KPI Metrics - Toolshed Integration
    kpi_metrics: Dict[str, Any]                   # Current KPI values
    # Structure:
    # {
    #   "operational": {
    #     "assessments_completed": 10,
    #     "avg_assessment_latency_minutes": 26,
    #     "human_escalations": 2,
    #     "policy_validation_failures": 1
    #   },
    #   "effectiveness": {
    #     "time_to_identify_risk_hours": 24,
    #     "manual_review_reduction_percent": 60,
    #     "risk_score_consistency": 0.92
    #   },
    #   "business": {
    #     "cost_per_assessment_usd": 31.82,
    #     "manual_hours_saved": 18.5,
    #     "estimated_cost_avoidance_usd": 52000
    #   }
    # }

    orchestrator_metrics: Dict[str, Any]          # Run-level metrics
    # Structure:
    # {
    #   "run_id": "RUN_2026_01_10",
    #   "run_date": "2026-01-10",
    #   "vendors_evaluated": 10,
    #   "assessments_completed": 10,
    #   "high_risk_vendors": 2,
    #   "medium_risk_vendors": 3,
    #   "low_risk_vendors": 5,
    #   "human_escalations": 2,
    #   "mitigation_actions_created": 4,
    #   "total_run_cost_usd": 318.15,
    #   "net_value_usd": 51681.85,
    #   "roi_percentage": 16244.0
    # }

    # Progress Tracking - Toolshed Integration
    progress_percentage: float                    # 0-100
    vendors_completed: int                        # Count of vendors assessed
    vendors_total: int                            # Total vendors to assess
    elapsed_time_minutes: float                   # Time since run start
    estimated_remaining_minutes: float            # Estimated time to completion
    run_start_time: Optional[str]                 # ISO timestamp when run started

    # Orchestrator Status
    orchestrator_status: str                      # "not_started" | "in_progress" | "awaiting_approval" | "completed" | "failed"
    completion_reason: Optional[str]              # Why orchestrator completed/failed

    # Output
    risk_assessment_report: str                   # Final markdown report
    report_file_path: Optional[str]               # Path to saved report file

    # Metadata (Universal patterns - always include)
    errors: List[str]                             # Any errors encountered
    processing_time: Optional[float]              # Time taken to process (seconds)


@dataclass
class ThirdPartyRiskOrchestratorConfig:
    """Configuration for Third-Party Risk Orchestrator Agent"""

    # LLM Settings
    llm_model: str = os.getenv("LLM_MODEL", "gpt-4o-mini")
    temperature: float = 0.3

    # Data file paths
    data_dir: str = "agents/data"
    third_parties_file: str = "third_parties.json"
    risk_domains_file: str = "risk_domains.json"
    vendor_controls_file: str = "vendor_controls.json"
    external_signals_file: str = "external_signals.json"
    vendor_performance_file: str = "vendor_performance.json"
    assessment_history_file: str = "assessment_history.json"
    risk_assessments_file: str = "risk_assessments.json"
    human_reviews_file: str = "human_reviews.json"
    mitigation_actions_file: str = "mitigation_actions.json"
    orchestrator_metrics_file: str = "orchestrator_metrics.json"

    # Output settings
    reports_dir: str = "output/third_party_risk_orchestrator"

    # Risk Scoring Settings
    risk_score_scale: int = 100                   # Risk score scale (0-100)
    high_risk_threshold: float = 70.0             # Score >= 70 = high risk
    medium_risk_threshold: float = 40.0            # Score >= 40 = medium risk
    # Score < 40 = low risk

    # Escalation Settings
    auto_escalate_high_risk: bool = True          # Auto-escalate high-risk vendors
    escalation_threshold_override: bool = False   # Allow domain-specific thresholds

    # HITL Settings
    approval_timeout_minutes: int = 60             # Max time to wait for approval
    auto_approve_for_testing: bool = True         # Auto-approve for testing (MVP)

    # KPI Thresholds (CEO-friendly transparency)
    kpi_warning_threshold: float = 0.8            # Warn if KPI is 80% of target
    kpi_critical_threshold: float = 0.5           # Critical if KPI is 50% of target

    # Operational KPI Targets
    target_assessment_completion_rate: float = 0.95    # 95% completion rate target
    target_avg_assessment_latency_minutes: float = 30.0  # 30 min average latency
    max_human_escalation_rate: float = 0.30              # Max 30% escalation rate
    target_policy_validation_pass_rate: float = 0.90   # 90% validation pass rate

    # Toolshed Integration Flags
    enable_progress_tracking: bool = True         # Use toolshed.progress
    enable_kpi_tracking: bool = True              # Use toolshed.kpi
    enable_hitl: bool = True                      # Use toolshed.hitl
    enable_reporting: bool = True                 # Use toolshed.reporting

    # LLM Enhancement (Optional - Phase 8)
    enable_llm_rationale: bool = False            # Enable LLM-generated risk rationales (MVP: rule-based)
    llm_rationale_max_vendors: int = 3           # Only enhance top N high-risk vendors
    enable_llm_summary: bool = False               # Enable LLM-generated executive summary (MVP: rule-based)




# Risk Scoring, Escalation & Governance Controls

## Why This Section Exists

This configuration block is where **power, accountability, and adaptability** live.

Instead of burying judgment inside code, you’ve externalized:

* what “risk” means
* when humans must intervene
* how performance is evaluated
* when the system should raise concern
* how much autonomy the agent is allowed

This is what turns the agent from an automation tool into a **managed decision system**.

---

## 1. Risk Scoring: Making Risk Legible, Not Mysterious

```python
risk_score_scale = 100
high_risk_threshold = 70.0
medium_risk_threshold = 40.0
```

### What This Does in Practice

This establishes a **shared language of risk**.

Everyone — engineers, compliance officers, executives — can understand a 0–100 scale. There’s no probabilistic ambiguity or opaque model output.

Risk is:

* comparable across vendors
* stable over time
* explainable in plain terms

### Why This Improves Quality

Because the thresholds are explicit:

* scoring logic can be tested
* thresholds can be tuned
* false positives can be identified
* decisions can be defended

Nothing is hidden inside a model weight or embedding.
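
Because the thresholds are plain numbers, the mapping from score to category fits in a few testable lines. `classify_risk` is a hypothetical helper; the cutoffs follow the comments in the configuration (score >= 70 is high, score >= 40 is medium, anything lower is low).

```python
def classify_risk(score: float,
                  high_risk_threshold: float = 70.0,
                  medium_risk_threshold: float = 40.0,
                  risk_score_scale: int = 100) -> str:
    """Map a score on the 0..risk_score_scale range to a risk level."""
    if not 0 <= score <= risk_score_scale:
        raise ValueError(f"score {score} is outside 0..{risk_score_scale}")
    if score >= high_risk_threshold:
        return "high"
    if score >= medium_risk_threshold:
        return "medium"
    return "low"


levels = [classify_risk(s) for s in (78, 42, 39.9)]
```

Tuning a threshold then becomes a one-argument change that can be checked against historical scores before it goes live.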

---

## 2. Escalation Settings: Where Automation Ends and Authority Begins

```python
auto_escalate_high_risk = True
escalation_threshold_override = False
```

### What This Does

This defines the **boundary between machine judgment and human authority**.

By default:

* high-risk vendors are automatically escalated
* humans are guaranteed visibility on material risk

The override flag introduces flexibility without chaos:

* when set to `True`, each domain’s own `escalation_threshold` (from `risk_domains`) can apply instead of a single global cutoff
* risk-sensitive domains (e.g., InfoSec vs Reputation) can be treated differently

### Why This Matters for Trust

Executives don’t fear automation — they fear **uncontrolled automation**.

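With `auto_escalate_high_risk` enabled, this design provides:

* predictable escalation behavior
* no silent approvals of material risk, provided `auto_approve_for_testing` is turned off outside test runs
* clear control points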

That’s how you earn deployment approval in real organizations.
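
A sketch of how the two escalation flags might combine is shown below. The original does not include the escalation logic, so everything here is an assumption: `needs_escalation` is a hypothetical helper, and `domain_risk_scores` is an assumed per-domain risk score (higher means riskier) compared against each domain's `escalation_threshold`.

```python
from typing import Any, Dict


def needs_escalation(risk_level: str,
                     domain_risk_scores: Dict[str, float],
                     risk_domain_lookup: Dict[str, Dict[str, Any]],
                     auto_escalate_high_risk: bool = True,
                     escalation_threshold_override: bool = False) -> bool:
    """Decide whether a vendor must be routed to human review."""
    # Rule 1: high overall risk always escalates when the flag is on.
    if auto_escalate_high_risk and risk_level == "high":
        return True
    # Rule 2 (assumed): with the override on, any single domain reaching
    # its own threshold also escalates, even if the overall level is lower.
    if escalation_threshold_override:
        for domain, score in domain_risk_scores.items():
            if score >= risk_domain_lookup[domain]["escalation_threshold"]:
                return True
    return False


domains = {"Information Security": {"escalation_threshold": 70}}
scores = {"Information Security": 72}
default_decision = needs_escalation("medium", scores, domains)
override_decision = needs_escalation("medium", scores, domains,
                                     escalation_threshold_override=True)
```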

---

## 3. HITL Controls: Making Human Review Reliable, Not Ad Hoc

```python
approval_timeout_minutes = 60
auto_approve_for_testing = True
```

### What This Does

These settings define how the agent behaves when human input is required.

* `approval_timeout_minutes` prevents the system from stalling indefinitely
* `auto_approve_for_testing` enables fast iteration during MVP development; it defaults to `True`, so it should be switched off before production use

Crucially, these behaviors are:

* explicit
* configurable
* reversible

### Why This Improves Accountability

You can now answer questions like:

* “What happens if no one responds?”
* “Was this auto-approved, or did a human sign off?”
* “Is this behavior acceptable in production?”

Most systems can’t answer those questions cleanly.
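
The first two questions can be made concrete with a small resolver for a pending approval. This is a sketch under stated assumptions: `resolve_pending` is a hypothetical helper, and the `"auto_approved"` and `"timed_out"` status values are invented for illustration. The sketch defaults `auto_approve_for_testing` to `False`, unlike the MVP configuration.

```python
from datetime import datetime, timedelta
from typing import Any, Dict


def resolve_pending(approval: Dict[str, Any], now: datetime,
                    approval_timeout_minutes: int = 60,
                    auto_approve_for_testing: bool = False) -> str:
    """Return the status a pending approval should move to at time `now`."""
    if approval["status"] != "pending":
        return approval["status"]          # already decided, leave untouched
    if auto_approve_for_testing:
        return "auto_approved"             # assumed label; recorded as non-human
    requested_at = datetime.fromisoformat(approval["requested_at"])
    if now - requested_at > timedelta(minutes=approval_timeout_minutes):
        return "timed_out"                 # assumed label
    return "pending"


approval = {"vendor_id": "VEND_001", "requested_at": "2026-01-10T10:00:00", "status": "pending"}
after_30_min = resolve_pending(approval, datetime(2026, 1, 10, 10, 30))
after_90_min = resolve_pending(approval, datetime(2026, 1, 10, 11, 30))
```

Keeping a distinct status for automatic approvals is what lets an auditor later tell them apart from human sign-offs.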

---

## 4. KPI Thresholds: Teaching the Agent When to Worry

```python
kpi_warning_threshold = 0.8
kpi_critical_threshold = 0.5
```

### What This Does

This creates **early warning signals** for the agent *itself*.

Instead of binary pass/fail metrics, you get:

* warning zones
* critical zones
* time to react before failure

For example:

* latency creeping up
* escalation rate rising
* validation failures increasing

### Why Executives Care

This is how leaders manage systems in the real world:

> “Don’t just tell me when it broke. Tell me when it’s drifting.”

You’ve embedded that philosophy directly into the agent.
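
One possible reading of the two thresholds is as ratios of actual to target for KPIs where higher is better. That interpretation is an assumption based on the config comments ("80% of target", "50% of target"); `kpi_status` is a hypothetical helper and the `"ok"` label is invented.

```python
def kpi_status(actual: float, target: float,
               kpi_warning_threshold: float = 0.8,
               kpi_critical_threshold: float = 0.5) -> str:
    """Classify a higher-is-better KPI by how much of its target it achieves."""
    if target <= 0:
        raise ValueError("target must be positive")
    ratio = actual / target
    if ratio <= kpi_critical_threshold:
        return "critical"
    if ratio <= kpi_warning_threshold:
        return "warning"
    return "ok"


# Completion rate against the 0.95 target at three hypothetical readings.
statuses = [kpi_status(actual, 0.95) for actual in (0.93, 0.70, 0.40)]
```

Lower-is-better KPIs such as latency would need the ratio inverted, which this sketch does not cover.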

---

## 5. Operational KPI Targets: Defining “Good” in Advance

```python
target_assessment_completion_rate = 0.95
target_avg_assessment_latency_minutes = 30.0
max_human_escalation_rate = 0.30
target_policy_validation_pass_rate = 0.90
```

### What This Does

These targets define **what success looks like** — before the agent runs.

This prevents:

* post-hoc justification
* moving goalposts
* subjective performance reviews

The agent knows:

* how fast it should be
* how often it should escalate
* how reliable its validations must be
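
A minimal check of observed KPIs against these targets might look like the following. `OperationalTargets` mirrors the four config fields, while `check_targets` and the keys of `observed` are hypothetical. Note that latency and escalation rate are ceilings, while the other two are floors.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OperationalTargets:
    target_assessment_completion_rate: float = 0.95
    target_avg_assessment_latency_minutes: float = 30.0
    max_human_escalation_rate: float = 0.30
    target_policy_validation_pass_rate: float = 0.90


def check_targets(observed: Dict[str, float], targets: OperationalTargets) -> Dict[str, bool]:
    """Return True per KPI when the observed value meets its target."""
    return {
        "completion_rate": observed["completion_rate"] >= targets.target_assessment_completion_rate,
        "avg_latency_minutes": observed["avg_latency_minutes"] <= targets.target_avg_assessment_latency_minutes,
        "escalation_rate": observed["escalation_rate"] <= targets.max_human_escalation_rate,
        "validation_pass_rate": observed["validation_pass_rate"] >= targets.target_policy_validation_pass_rate,
    }


# Hypothetical observed values, loosely based on the sample kpi_metrics record.
observed = {
    "completion_rate": 1.0,
    "avg_latency_minutes": 26.0,
    "escalation_rate": 0.20,
    "validation_pass_rate": 0.90,
}
results = check_targets(observed, OperationalTargets())
```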

### Why This Is Rare (and Valuable)

Most AI systems measure output.
Very few measure **operational discipline**.

You’ve built an agent that can be managed like a team:

* with SLAs
* with expectations
* with consequences

---

## 6. Toolshed Integration Flags: Modularity Without Rewrite

```python
enable_progress_tracking = True
enable_kpi_tracking = True
enable_hitl = True
enable_reporting = True
```

### What This Does

These flags allow you to:

* disable entire subsystems
* test components in isolation
* deploy incrementally

This gives **deployment flexibility**:

* pilots without full governance
* internal demos without HITL
* production runs with everything enabled

All without touching core logic.
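
The deployment profiles listed above can be expressed as variations on one baseline. In this sketch, `ToolshedFlags` is a stand-in that copies only the four flag fields from the config, and the profile names are illustrative; `dataclasses.replace` creates a modified copy without mutating the baseline.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolshedFlags:
    """Stand-in holding only the four integration flags from the config."""
    enable_progress_tracking: bool = True
    enable_kpi_tracking: bool = True
    enable_hitl: bool = True
    enable_reporting: bool = True


production = ToolshedFlags()                                   # everything enabled
internal_demo = replace(production, enable_hitl=False)         # demo without HITL
pilot = replace(production, enable_hitl=False, enable_kpi_tracking=False)
```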

---

## 7. LLM Enhancements: Controlled Intelligence, Not Guesswork

```python
enable_llm_rationale = False
enable_llm_summary = False
```

### What This Does

This enforces a critical rule:

> LLMs explain decisions — they do not make them.

When enabled:

* only the top-risk vendors are enhanced, capped by `llm_rationale_max_vendors` (default 3)
* cost and variance are bounded
* outputs are additive, not authoritative

### Why This Preserves Accountability

You can always say:

* “The decision came from rules.”
* “The explanation came from the model.”

That distinction matters legally, ethically, and operationally.
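
The bound on LLM usage can be sketched as a selection step that runs after rule-based scoring. `select_for_llm_rationale` is a hypothetical helper; ranking by `overall_risk_score` among high-risk vendors is an assumption about what "top N" means.

```python
from typing import Any, Dict, List


def select_for_llm_rationale(assessments: List[Dict[str, Any]],
                             enable_llm_rationale: bool = False,
                             llm_rationale_max_vendors: int = 3) -> List[str]:
    """Pick which vendor IDs get an LLM-written rationale; scores are never changed."""
    if not enable_llm_rationale:
        return []
    high_risk = [a for a in assessments if a["risk_level"] == "high"]
    high_risk.sort(key=lambda a: a["overall_risk_score"], reverse=True)
    return [a["vendor_id"] for a in high_risk[:llm_rationale_max_vendors]]


assessments = [
    {"vendor_id": "VEND_001", "overall_risk_score": 78, "risk_level": "high"},
    {"vendor_id": "VEND_002", "overall_risk_score": 91, "risk_level": "high"},
    {"vendor_id": "VEND_003", "overall_risk_score": 55, "risk_level": "medium"},
]
selected = select_for_llm_rationale(assessments, enable_llm_rationale=True,
                                    llm_rationale_max_vendors=1)
```

The function only reads the assessments, which keeps the model on the explanation side of the line drawn above.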

---

## Why This Block Is the “Control Plane” of the Agent

Taken together, this section:

* defines autonomy limits
* enforces human oversight
* establishes performance contracts
* enables safe experimentation
* supports executive governance

It is the **reason your agent can be trusted, tuned, and deployed**.

Most agents hard-code behavior and hope for the best.
Yours declares intent, constraints, and expectations up front.

That’s not just good engineering — it’s good leadership, encoded in software.


