<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/504_EPOv2_decision_utils.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Excellent, this node is **structurally sound** and very much in line with the architecture you've been building. I'll review it the same way we've been doing the others: crisp, practical, and focused on correctness and future resilience.

I'll cover:

1. ✅ What this node already does *very well*
2. 🔍 One subtle but important state-management issue
3. ⚠️ Two small fixes that will prevent future bugs
4. 🧠 Why this node completes your "analysis → decision" contract cleanly

No rewrites yet, just review and guidance while you're mid-revision.

---

## 1. ✅ What This Node Already Nails

### 🔹 Correct separation of concerns

This node **does not re-analyze data** and **does not compute stats**. It strictly:

* Consumes analysis
* Applies policy
* Emits decisions

That's exactly right.

You avoided the common mistake of letting "decision logic creep into analysis logic."

---

### 🔹 Proper merge of calculated + loaded analyses

This is *very good*:

```python
analysis_lookup = state.get("analysis_lookup", {})
calculated_analyses = state.get("calculated_analyses", [])

for analysis in calculated_analyses:
    exp_id = analysis.get("experiment_id")
    if exp_id:
        analysis_lookup[exp_id] = analysis
```

This ensures:

* Fresh calculations override stale stored analysis
* Downstream nodes see a unified view

This is **state-centric orchestration done right**.

---

### 🔹 Symmetric handling of single vs portfolio scope

Both branches behave consistently:

| Scope     | Behavior                              |
| --------- | ------------------------------------- |
| Single    | Evaluate exactly one experiment       |
| Portfolio | Evaluate only those needing decisions |
| Fallback  | Safe default for partial pipelines    |

This symmetry makes the system predictable.

---

## 2. 🔍 Subtle but Important State Issue

### ❗ You are mutating `analysis_lookup` in-place

This line is the culprit:

```python
analysis_lookup = state.get("analysis_lookup", {})
```

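In most setups, `state.get("analysis_lookup", {})` returns the dict object stored in the state, not a copy, so the merge loop mutates shared state without explicitly returning it. If the key is missing, the `{}` default is a fresh dict instead, and the merged result never reaches the state at all.

Right now this *works* because:

* You don't rely on the original `analysis_lookup` later
* Python dicts are mutable, so the in-place merge happens to be visible through the state

But this can cause **silent coupling** between nodes later.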
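
The coupling is easy to reproduce in isolation. The sketch below assumes a plain-dict state and uses a hypothetical `merge_in_place` helper that mirrors the merge loop quoted earlier. Neither the helper name nor the sample data comes from the node under review.

```python
from typing import Any, Dict


def merge_in_place(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the node's merge loop (no copy is taken)."""
    analysis_lookup = state.get("analysis_lookup", {})
    for analysis in state.get("calculated_analyses", []):
        exp_id = analysis.get("experiment_id")
        if exp_id:
            analysis_lookup[exp_id] = analysis
    return analysis_lookup


fresh = {"experiment_id": "E001", "source": "calculated"}

# Case 1: the key exists, so .get() hands back the very dict held in state.
state = {
    "analysis_lookup": {"E001": {"experiment_id": "E001", "source": "stored"}},
    "calculated_analyses": [fresh],
}
merged_existing = merge_in_place(state)
print(merged_existing is state["analysis_lookup"])   # True: same object
print(state["analysis_lookup"]["E001"]["source"])    # calculated: state changed as a side effect

# Case 2: the key is missing, so the {} default is a throwaway dict.
state_without_lookup = {"calculated_analyses": [fresh]}
merged_missing = merge_in_place(state_without_lookup)
print("analysis_lookup" in state_without_lookup)     # False: the merge never reached state
```

In the first case another node reading `state["analysis_lookup"]` sees the merged entries even though nothing was returned. In the second case the merged entries exist only in the local variable.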

---

### ✅ Recommended (safe) pattern

Make the merge explicit and return it:

```python
# Shallow copy: the merge below no longer mutates the dict held in state
analysis_lookup = dict(state.get("analysis_lookup", {}))

for analysis in calculated_analyses:
    exp_id = analysis.get("experiment_id")
    if exp_id:
        analysis_lookup[exp_id] = analysis
```

And then **return it**:

```python
return {
    "analysis_lookup": analysis_lookup,
    "generated_decisions": generated_decisions,
    "errors": errors
}
```

This keeps your orchestrator **pure, explicit, and testable**.

---

## 3. ⚠️ Two Small Fixes You Should Make

These are easy and worth doing now.

---

### ❗ Fix 1: Require `analysis_lookup` explicitly

Right now you check:

```python
if not definitions_lookup:
    return error
```

But you don't check for `analysis_lookup`.

In single-experiment mode, this can silently do nothing if:

* Definitions exist
* Analysis is missing

#### ✅ Add this guard:

```python
if not analysis_lookup and not calculated_analyses:
    return {
        "errors": errors + ["decision_evaluation_node: analysis required. Run statistical_analysis_node first."]
    }
```

This prevents "empty success."

---

### ❗ Fix 2: Align error messages with other nodes

Most of your nodes follow this pattern:

```python
"<node_name>: <clear actionable message>"
```

This one already mostly does, so just make it consistent everywhere.

Example tweak:

```python
"decision_evaluation_node: definitions_lookup required. Run data_loading_node first."
```

is perfect. Apply the same clarity to the missing-analysis error.
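
One way to keep the prefix consistent is to build every message through a single helper. The sketch below is an assumption, not part of the existing codebase: `node_error` and `missing_input_errors` are hypothetical names, and the state keys mirror the ones discussed in this review.

```python
from typing import Any, Dict, List

NODE_NAME = "decision_evaluation_node"


def node_error(node_name: str, message: str) -> str:
    """Format an error as '<node_name>: <message>'."""
    return f"{node_name}: {message}"


def missing_input_errors(state: Dict[str, Any]) -> List[str]:
    """Return one formatted error per missing prerequisite (hypothetical helper)."""
    errors: List[str] = []
    if not state.get("definitions_lookup"):
        errors.append(node_error(
            NODE_NAME, "definitions_lookup required. Run data_loading_node first."
        ))
    if not state.get("analysis_lookup") and not state.get("calculated_analyses"):
        errors.append(node_error(
            NODE_NAME, "analysis required. Run statistical_analysis_node first."
        ))
    return errors


# Definitions are loaded but no analysis exists yet.
state = {"definitions_lookup": {"E001": {"status": "running"}}}
for message in missing_input_errors(state):
    print(message)
```

Because every message passes through `node_error`, the prefix format lives in one place and a test can assert on exact strings.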

---

## 4. 🧠 Why This Node Completes the System

This node is where your architecture **crosses the line** from:

> "A data analysis pipeline"

to

> **"An organizational decision system"**

Because now you have:

* Evidence → Analysis
* Analysis → Policy
* Policy → Action
* Action → Review date
* Review date → Reversal triggers

That's a *full governance loop*.

---

## Final Verdict

### This node is:

* ✅ Correctly scoped
* ✅ State-aware
* ✅ Policy-driven
* ✅ Deterministic
* ✅ CEO-defensible

And with **two small safety tweaks**, it becomes production-grade.


In [None]:
"""Decision Evaluation Utilities for Experimentation Portfolio Orchestrator

Functions to evaluate experiments and generate decision recommendations
based on statistical analysis results and decision policies.
"""

from typing import Dict, Any, Optional
from datetime import datetime, timedelta


def evaluate_decision_confidence(
    analysis: Dict[str, Any],
    p_value_threshold: float = 0.05
) -> str:
    """
    Evaluate decision confidence based on statistical analysis.

    Args:
        analysis: Experiment analysis result
        p_value_threshold: P-value threshold for significance

    Returns:
        "high", "medium", or "low"
    """
    statistical_test = analysis.get("statistical_test", {})
    p_value = statistical_test.get("p_value")

    if p_value is None:
        return "low"

    if p_value < 0.01:
        return "high"
    elif p_value < p_value_threshold:
        return "medium"
    else:
        return "low"


def evaluate_decision_risk(
    definition: Dict[str, Any],
    analysis: Dict[str, Any],
    portfolio_entry: Optional[Dict[str, Any]] = None
) -> str:
    """
    Evaluate decision risk based on experiment characteristics and results.

    Args:
        definition: Experiment definition
        analysis: Experiment analysis result
        portfolio_entry: Optional portfolio entry for additional context

    Returns:
        "low", "medium", or "high"
    """
    # Start with risk tier from portfolio
    risk_tier = portfolio_entry.get("risk_tier", "medium") if portfolio_entry else "medium"

    # Check guardrails
    guardrails_passed = analysis.get("guardrails_passed", True)
    if not guardrails_passed:
        return "high"

    # Check for data quality flags
    metrics = analysis.get("metrics", [])
    has_quality_flags = any(
        metric.get("data_quality_flags", [])
        for metric in metrics
    )
    if has_quality_flags:
        if risk_tier == "low":
            risk_tier = "medium"
        else:
            risk_tier = "high"

    # Check segment consistency
    segment_consistency = analysis.get("segment_consistency", "consistent")
    if segment_consistency != "consistent":
        if risk_tier == "low":
            risk_tier = "medium"

    # Check statistical significance
    statistical_test = analysis.get("statistical_test", {})
    is_significant = statistical_test.get("is_statistically_significant", False) or statistical_test.get("is_significant", False)
    if not is_significant and risk_tier == "low":
        risk_tier = "medium"

    return risk_tier


def determine_decision(
    analysis: Dict[str, Any],
    definition: Dict[str, Any],
    config: Any,
    decision_signal: Optional[str] = None
) -> str:
    """
    Determine decision recommendation based on analysis and config thresholds.

    Args:
        analysis: Experiment analysis result
        definition: Experiment definition
        config: Config with decision thresholds
        decision_signal: Optional pre-calculated decision signal from analysis

    Returns:
        "scale", "iterate", "retire", or "do_not_start"
        ("pause" is not produced by the current logic)
    """
    # Use decision_signal from analysis if available
    if decision_signal:
        if decision_signal == "strong_scale":
            return "scale"
        elif decision_signal == "cautious_scale":
            return "iterate"  # Cautious scale = iterate first
        elif decision_signal == "iterate":
            return "iterate"
        elif decision_signal == "retire":
            return "retire"

    # Fallback to rule-based decision
    status = definition.get("status", "unknown")

    # Planned experiments that shouldn't start
    if status == "planned":
        risk_notes = (definition.get("risk_notes") or "").lower()  # tolerate an explicit None
        if "bias" in risk_notes or "compliance" in risk_notes or "regulatory" in risk_notes:
            return "do_not_start"

    # Get lift metrics
    relative_lift_percent = analysis.get("relative_lift_percent")
    if relative_lift_percent is None:
        # "or 0" also covers a key that is present but set to None
        relative_lift_percent = analysis.get("relative_change_percent") or 0
        # For decrease metrics, make it positive
        if definition.get("expected_direction") == "decrease":
            relative_lift_percent = abs(relative_lift_percent)

    meets_minimum_effect = analysis.get("meets_minimum_effect", False)
    statistical_test = analysis.get("statistical_test", {})
    is_significant = statistical_test.get("is_statistically_significant", False) or statistical_test.get("is_significant", False)

    # Decision logic
    if not meets_minimum_effect:
        return "retire"

    if is_significant and relative_lift_percent >= config.scale_threshold_lift:
        return "scale"
    elif is_significant and relative_lift_percent >= config.iterate_threshold_lift:
        return "iterate"
    elif relative_lift_percent < config.retire_threshold_lift:
        return "retire"
    else:
        return "iterate"


def generate_decision_rationale(
    analysis: Dict[str, Any],
    definition: Dict[str, Any],
    decision: str
) -> str:
    """
    Generate human-readable rationale for decision.

    Args:
        analysis: Experiment analysis result
        definition: Experiment definition
        decision: Decision recommendation

    Returns:
        Rationale string
    """
    primary_metric = analysis.get("primary_metric", "metric")

    if decision == "scale":
        lift = analysis.get("relative_lift_percent", 0)
        p_value = analysis.get("statistical_test", {}).get("p_value")
        if p_value is not None:  # a p-value of 0.0 is valid and must not be treated as missing
            return (
                f"{primary_metric} improved by {lift:.1f}% with statistical significance "
                f"(p={p_value:.4f}). Effect exceeds minimum threshold and meets scale criteria."
            )
        else:
            return f"{primary_metric} improved by {lift:.1f}%. Effect exceeds minimum threshold."

    elif decision == "iterate":
        lift = analysis.get("relative_lift_percent", 0)
        p_value = analysis.get("statistical_test", {}).get("p_value")
        if p_value and p_value >= 0.05:
            return (
                f"{primary_metric} improved by {lift:.1f}% but statistical significance is uncertain "
                f"(p={p_value:.4f}). Continue experiment with refinements to increase confidence."
            )
        else:
            return (
                f"{primary_metric} improved by {lift:.1f}% but effect size is below scale threshold. "
                f"Continue experiment with optimizations."
            )

    elif decision == "retire":
        lift = analysis.get("relative_lift_percent", 0)
        return (
            f"{primary_metric} change ({lift:.1f}%) does not meet minimum effect threshold. "
            f"Experiment should be retired."
        )

    elif decision == "do_not_start":
        risk_notes = definition.get("risk_notes", "")
        return (
            f"Experiment design presents elevated risk: {risk_notes}. "
            f"Refine design and add guardrails before proceeding."
        )

    else:
        return "Decision evaluation completed."


def generate_recommended_action(
    decision: str,
    definition: Dict[str, Any],
    analysis: Dict[str, Any]
) -> str:
    """
    Generate recommended action based on decision.

    Args:
        decision: Decision recommendation
        definition: Experiment definition
        analysis: Experiment analysis result

    Returns:
        Recommended action string
    """
    experiment_name = definition.get("hypothesis", "experiment")

    if decision == "scale":
        return f"Roll out {experiment_name} to full population."
    elif decision == "iterate":
        return "Continue experiment with refinements and expanded monitoring."
    elif decision == "retire":
        return "End experiment and document learnings."
    elif decision == "do_not_start":
        return "Refine hypothesis, add guardrails, and re-submit for review."
    else:
        return "Review experiment status and determine next steps."


def estimate_expected_impact(
    analysis: Dict[str, Any],
    definition: Dict[str, Any],
    portfolio_entry: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """
    Estimate expected impact for decision.

    Args:
        analysis: Experiment analysis result
        definition: Experiment definition
        portfolio_entry: Optional portfolio entry

    Returns:
        Dictionary with expected impact estimates
    """
    primary_metric = analysis.get("primary_metric")
    relative_lift_percent = analysis.get("relative_lift_percent", 0)

    # Conservative estimate: use 70% of observed lift for scaling
    estimated_lift_percent = relative_lift_percent * 0.7 if relative_lift_percent > 0 else relative_lift_percent

    # Annual value would need business context - placeholder for now
    annual_value_usd = None

    return {
        "kpi": primary_metric,
        "estimated_lift_percent": round(estimated_lift_percent, 1) if estimated_lift_percent else None,
        "annual_value_usd": annual_value_usd
    }


def generate_reversal_triggers(
    decision: str,
    definition: Dict[str, Any],
    analysis: Dict[str, Any]
) -> list:
    """
    Generate reversal triggers for decision.

    Args:
        decision: Decision recommendation
        definition: Experiment definition
        analysis: Experiment analysis result

    Returns:
        List of reversal trigger strings
    """
    primary_metric = analysis.get("primary_metric", "metric")
    guardrail_metrics = definition.get("guardrail_metrics", [])

    triggers = []

    if decision == "scale":
        # If scaling, monitor primary metric
        triggers.append(f"{primary_metric} falls below control baseline for two consecutive weeks")

        # Add guardrail triggers
        for guardrail in guardrail_metrics:
            triggers.append(f"{guardrail} degrades below acceptable threshold")

    elif decision == "iterate":
        # If iterating, monitor for degradation
        triggers.append(f"{primary_metric} shows negative trend")
        for guardrail in guardrail_metrics:
            triggers.append(f"{guardrail} drops below baseline")

    return triggers


def evaluate_experiment_decision(
    experiment_id: str,
    definition: Dict[str, Any],
    analysis: Dict[str, Any],
    portfolio_entry: Optional[Dict[str, Any]],
    config: Any
) -> Dict[str, Any]:
    """
    Evaluate experiment and generate complete decision recommendation.

    Args:
        experiment_id: Experiment ID
        definition: Experiment definition
        analysis: Experiment analysis result
        portfolio_entry: Optional portfolio entry
        config: Config with thresholds

    Returns:
        Complete decision dictionary
    """
    # Get decision signal from analysis if available
    decision_signal = analysis.get("decision_signal")

    # Determine decision
    decision = determine_decision(analysis, definition, config, decision_signal)

    # Evaluate confidence and risk
    decision_confidence = evaluate_decision_confidence(analysis, config.confidence_threshold)
    decision_risk = evaluate_decision_risk(definition, analysis, portfolio_entry)

    # Generate rationale and action
    rationale = generate_decision_rationale(analysis, definition, decision)
    recommended_action = generate_recommended_action(decision, definition, analysis)

    # Get decision owner from definition
    decision_owner = definition.get("decision_owner", "unknown")

    # Estimate impact
    expected_impact = estimate_expected_impact(analysis, definition, portfolio_entry)

    # Generate reversal triggers
    reversal_triggers = generate_reversal_triggers(decision, definition, analysis)

    # Calculate next review date (30 days from now for scale, 60 for iterate, etc.)
    days_until_review = {
        "scale": 30,
        "iterate": 60,
        "retire": 90,
        "do_not_start": 90,
        "pause": 30
    }.get(decision, 60)

    # Read the clock once so both dates derive from the same instant
    now = datetime.now()
    next_review_date = (now + timedelta(days=days_until_review)).strftime("%Y-%m-%d")
    decision_date = now.strftime("%Y-%m-%d")

    return {
        "experiment_id": experiment_id,
        "decision": decision,
        "decision_confidence": decision_confidence,
        "decision_risk": decision_risk,
        "rationale": rationale,
        "recommended_action": recommended_action,
        "decision_owner": decision_owner,
        "expected_impact": expected_impact,
        "reversal_triggers": reversal_triggers,
        "next_review_date": next_review_date,
        "decision_date": decision_date
    }


def evaluate_experiments_needing_decisions(
    analyzed_experiments: list,
    definitions_lookup: Dict[str, Dict[str, Any]],
    analysis_lookup: Dict[str, Dict[str, Any]],
    portfolio_lookup: Dict[str, Dict[str, Any]],
    decisions_lookup: Dict[str, Dict[str, Any]],
    config: Any
) -> list:
    """
    Evaluate experiments that need decision recommendations.

    Args:
        analyzed_experiments: List of experiment status analyses
        definitions_lookup: Definitions lookup
        analysis_lookup: Analysis lookup (includes calculated_analyses)
        portfolio_lookup: Portfolio lookup
        decisions_lookup: Existing decisions lookup
        config: Config with thresholds

    Returns:
        List of newly generated decisions
    """
    generated_decisions = []

    for exp_status in analyzed_experiments:
        if not exp_status.get("needs_decision", False):
            continue

        experiment_id = exp_status.get("experiment_id")
        if not experiment_id:
            continue

        # Skip if decision already exists
        if experiment_id in decisions_lookup:
            continue

        definition = definitions_lookup.get(experiment_id)
        analysis = analysis_lookup.get(experiment_id)
        portfolio_entry = portfolio_lookup.get(experiment_id)

        if not definition or not analysis:
            continue

        # Generate decision
        decision = evaluate_experiment_decision(
            experiment_id=experiment_id,
            definition=definition,
            analysis=analysis,
            portfolio_entry=portfolio_entry,
            config=config
        )

        generated_decisions.append(decision)

    return generated_decisions
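

The utilities above read four attributes from `config`: `scale_threshold_lift`, `iterate_threshold_lift`, `retire_threshold_lift`, and `confidence_threshold`. The sketch below shows one way such a config could look and how the rule-based fallback in `determine_decision` responds to it. `DecisionConfig` and its threshold values are assumptions for illustration only, and `threshold_decision` is a condensed copy of the fallback ladder so the block runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionConfig:
    """Hypothetical config; attribute names match those read above, values are placeholders."""
    scale_threshold_lift: float = 10.0    # percent lift needed to scale
    iterate_threshold_lift: float = 3.0   # percent lift needed to keep iterating
    retire_threshold_lift: float = 0.0    # below this lift, retire
    confidence_threshold: float = 0.05    # p-value cutoff for "medium" confidence


def threshold_decision(lift: float, is_significant: bool,
                       meets_minimum_effect: bool, config: DecisionConfig) -> str:
    """Condensed copy of the rule-based fallback in determine_decision."""
    if not meets_minimum_effect:
        return "retire"
    if is_significant and lift >= config.scale_threshold_lift:
        return "scale"
    if is_significant and lift >= config.iterate_threshold_lift:
        return "iterate"
    if lift < config.retire_threshold_lift:
        return "retire"
    return "iterate"


config = DecisionConfig()
cases = [
    (12.0, True, True),    # large, significant lift
    (5.0, True, True),     # moderate, significant lift
    (12.0, False, True),   # large lift without significance
    (-2.0, False, True),   # negative lift
    (12.0, True, False),   # minimum effect not met
]
results = [threshold_decision(lift, sig, meets, config) for lift, sig, meets in cases]
for case, result in zip(cases, results):
    print(case, "->", result)
```

Note that a large lift without statistical significance falls through to "iterate" rather than "scale", and that failing the minimum-effect check retires the experiment regardless of lift.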
