<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/516_EPOv2_historicalTracking_utils.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is **very strong work**: architecturally sound, MVP-disciplined, and clearly aligned with the strategic goal we discussed, **turning the agent from point-in-time reporting into a learning system with memory**.

I'll break this into four parts:

1. **What you absolutely nailed**
2. **Small but important corrections**
3. **Where this fits in the agent lifecycle**
4. **One optional refinement that will pay off later**

No fluff: this is a senior-level review.

---

## 1. What You Nailed (This Is the Right Shape)

### ✅ You chose the right abstraction: *snapshots, not logs*

You correctly avoided:

* Per-event logging
* High-granularity experiment timelines
* Premature time-series complexity

Instead, you implemented:

* **Run-level portfolio snapshots**
* Stable, comparable metrics
* Append-only historical memory

This is exactly what executives need.

> CEOs don't want raw data; they want *direction over time*.

---

### ✅ Snapshot schema is executive-grade

Your snapshot captures the *right* categories:

* Portfolio structure (counts by state)
* ROI economics
* Decision throughput
* Analysis coverage
* Risk / opportunity signals
* Processing metadata

Importantly:

* You did **not** store raw experiment internals
* You stored **derived outcomes**

That makes snapshots:

* Small
* Durable
* Safe to evolve

This is how real production systems do it.

---

### ✅ Comparison logic is clean and explainable

The `compare_snapshots()` function is especially solid:

* Explicit metrics list (no magic)
* Absolute + percent deltas
* Directional classification
* Human-readable icons

This is **CEO-safe math**:

* No black boxes
* No spurious precision
* No misleading trends

The `<1% = stable` threshold is *exactly right*.

---

### ✅ Graceful degradation is handled properly

Two excellent examples:

* `load_latest_snapshot()` returns `None` cleanly
* `calculate_trend_significance()` falls back safely

This is consistent with your overall philosophy:

> "LLMs enhance, rules must never break the system."

Same principle applied to analytics.

---

## 2. Small but Important Corrections (Worth Fixing Now)

These are not structural issues, just polish to prevent future confusion.

---

### ⚠️ Issue 1: `portfolio_insights` shape mismatch

Earlier in your codebase, `portfolio_insights` is a **dictionary** with keys like:

```python
trends, risks, opportunities, recommendations
```

But in this snapshot code you treat it as a **list**:

```python
portfolio_insights = state.get("portfolio_insights", [])
len(portfolio_insights)
sum(1 for i in portfolio_insights if i.get("type") == "trend")
```

#### Fix (recommended)

Normalize this explicitly:

```python
portfolio_insights = state.get("portfolio_insights", {})

trends = portfolio_insights.get("trends", [])
risks = portfolio_insights.get("risks", [])
opportunities = portfolio_insights.get("opportunities", [])
recommendations = portfolio_insights.get("recommendations", [])
```

Then count each cleanly.

Why this matters:

* Prevents silent undercounting
* Keeps schema stable as insights grow richer
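
A minimal sketch of that normalization, assuming the dictionary shape described above (keys `trends`, `risks`, `opportunities`, `recommendations`) while still tolerating the list-of-typed-dicts shape the snapshot code currently expects. The helper name `count_insights` is hypothetical and not part of the codebase.

```python
from typing import Any, Dict, List, Union

# Hypothetical mapping: snapshot field prefix -> insight "type" value
_CATEGORIES = {
    "trends": "trend",
    "risks": "risk",
    "opportunities": "opportunity",
    "recommendations": "recommendation",
}


def count_insights(
    insights: Union[Dict[str, List[Any]], List[Dict[str, Any]], None]
) -> Dict[str, int]:
    """Count insights per category for either the dict or the list shape."""
    if isinstance(insights, dict):
        # Dict shape: one list per category
        counts = {f"{key}_count": len(insights.get(key) or []) for key in _CATEGORIES}
    else:
        # List shape: each item carries a "type" field
        items = insights or []
        counts = {
            f"{key}_count": sum(1 for i in items if i.get("type") == kind)
            for key, kind in _CATEGORIES.items()
        }
    # Total over the four known categories only
    counts["insights_count"] = sum(counts.values())
    return counts
```

The returned keys match the snapshot field names (`trends_count`, `risks_count`, and so on), so the result could be merged into the snapshot dict directly. Note that `insights_count` here covers only the four known categories, which is a design choice of this sketch.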

---

### ⚠️ Issue 2: `analysis_success_rate` semantics

You store:

```python
analysis_success_rate = performance_metrics.get("analysis_success_rate", 0.0)
```

But earlier:

* "Statistical Tests Performed: 0"
* Yet analyses may exist

That's not wrong, but it *will* confuse trend interpretation.

#### Suggested rename (optional but smart)

In snapshots:

* `analysis_coverage_rate`
* or `analysis_completion_rate`

Then reserve `analysis_success_rate` for:

> "Valid statistical test completed with sufficient data."

This avoids executives asking:

> "Why is success dropping if nothing failed?"
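
One way the two rates could be kept apart is sketched below. It is only an illustration: the `test_completed` flag on each analysis record and the function name `analysis_rates` are assumptions, not fields or functions confirmed by the codebase.

```python
from typing import Any, Dict, List


def analysis_rates(total_experiments: int, analyses: List[Dict[str, Any]]) -> Dict[str, float]:
    """Separate 'how much was analyzed' from 'how many tests were valid'."""
    analyzed = len(analyses)
    # Assumed flag: True when a statistical test ran with sufficient data
    valid_tests = sum(1 for a in analyses if a.get("test_completed", False))
    return {
        # Share of the portfolio that received any analysis
        "analysis_coverage_rate": analyzed / total_experiments if total_experiments else 0.0,
        # Share of analyses that produced a valid statistical test
        "analysis_success_rate": valid_tests / analyzed if analyzed else 0.0,
    }
```

With this split, a run that analyzes every experiment but completes no tests reports full coverage and zero success, which reads very differently from a single blended number.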

---

### ⚠️ Issue 3: Percent change when previous = 0

You currently do:

```python
percent_change = 100.0  # New metric
trend = "new"
```

This is fine internally, but **dangerous in reports**.

#### Recommendation

Keep `"trend": "new"`, but set:

```python
percent_change = None
```

Then display:

> "New metric: no historical baseline"

Executives *hate* fake percentages.
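
A sketch of how this recommendation might look in code, keeping the same 1% stability threshold that `compare_snapshots()` uses. The function names `percent_change_or_none` and `format_change` are hypothetical.

```python
from typing import Optional, Tuple


def percent_change_or_none(current: float, previous: float) -> Tuple[Optional[float], str]:
    """Return (percent_change, trend); percent_change is None without a baseline."""
    if previous == 0:
        if current == 0:
            return 0.0, "stable"
        return None, "new"  # no baseline, so no percentage is reported
    pct = (current - previous) / abs(previous) * 100
    if abs(pct) < 1.0:  # less than 1% change counts as stable
        return round(pct, 2), "stable"
    return round(pct, 2), "increasing" if pct > 0 else "decreasing"


def format_change(percent_change: Optional[float]) -> str:
    """Render a percent change for a report without inventing a number."""
    if percent_change is None:
        return "New metric: no historical baseline"
    return f"{percent_change:+.1f}%"
```

If `percent_change` can be `None`, any caller that passes it to `round(percent_change, 2)` or formats it with `:+.1f` needs a `None` guard first, which is what `format_change` provides for display.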

---

## 3. Where This Fits in the Agent Lifecycle (Very Important)

You've now added **memory**.

Your agent lifecycle is officially:

1. **Observe** (data loading)
2. **Analyze** (stats + insights)
3. **Decide** (recommendations)
4. **Evaluate** (ROI + performance)
5. **Remember** ← ✅ *you just built this*
6. **Improve** (future step)

This is the minimum loop for a **learning system**.

Many agent implementations stop at step 3.

---

## 4. One Optional Refinement (High Leverage, Low Cost)

You don't need this now, but if you add it, everything else gets easier.

### 🔑 Add a stable `run_id`

Right now you use timestamps in filenames (fine), but a formal ID helps later:

```python
"run_id": f"EPO_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
```

Why this matters later:

* Audit trails
* Decision execution linkage
* CEO questions like "what changed since *that* run?"

This unlocks #2 (Decision Execution) cleanly.
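
A minimal sketch of generating the ID once per run so the same value can go into both the snapshot body and the filename. The random suffix is an added assumption (it keeps two runs started in the same second distinct), and `make_run_id` is a hypothetical helper name.

```python
import uuid
from datetime import datetime


def make_run_id(prefix: str = "EPO") -> str:
    """Build a sortable run ID: prefix, timestamp, short random suffix."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Assumption: 8 hex characters are enough to separate same-second runs
    return f"{prefix}_{stamp}_{uuid.uuid4().hex[:8]}"


run_id = make_run_id()
snapshot = {"run_id": run_id}            # stored inside the snapshot
filename = f"snapshot_{run_id}.json"     # and reused in the filename
```

Generating the ID once and passing it around avoids the small drift that comes from calling `datetime.now()` separately for the snapshot timestamp and the filename.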

---

## Overall Verdict

**This is exactly the right next step, and it's executed at a professional level.**

You've:

* Chosen the right abstraction
* Avoided premature complexity
* Designed for executive trust
* Preserved system explainability

This is not "nice to have"; this is **foundational**.

---

## What I Recommend You Do Next (Concrete)

**Next build step (small):**

* Integrate snapshot saving into the orchestrator tail
* Load latest snapshot at report time
* Add a **"Historical Comparison"** section to the report
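
The ordering in that tail step matters: the previous snapshot has to be loaded before the current one is saved, otherwise the run compares against itself. The sketch below illustrates this with simplified stand-ins (`load_latest`, `save`, `finish_run` are hypothetical names, not the real utilities) and a temporary directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def load_latest(snap_dir: Path) -> Optional[Dict[str, Any]]:
    """Stand-in loader: newest snapshot by filename, or None if none exist."""
    files = sorted(snap_dir.glob("snapshot_*.json"))
    return json.loads(files[-1].read_text(encoding="utf-8")) if files else None


def save(snap_dir: Path, run_label: str, snapshot: Dict[str, Any]) -> Path:
    """Stand-in writer: one append-only JSON file per run."""
    snap_dir.mkdir(parents=True, exist_ok=True)
    path = snap_dir / f"snapshot_{run_label}.json"
    path.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    return path


def finish_run(snap_dir: Path, run_label: str, snapshot: Dict[str, Any]) -> Optional[float]:
    """Orchestrator tail: load the previous snapshot first, then save the current one."""
    previous = load_latest(snap_dir)  # must happen before save()
    save(snap_dir, run_label, snapshot)
    if previous is None:
        return None  # first run: nothing to compare against
    return snapshot["net_roi"] - previous["net_roi"]


with tempfile.TemporaryDirectory() as tmp:
    first = finish_run(Path(tmp), "001", {"net_roi": 12550.0})
    second = finish_run(Path(tmp), "002", {"net_roi": 15000.0})
```

Here `first` is `None` because no history exists yet, and `second` is the change in net ROI relative to the first run.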

**Then stop.**
Let it run.
Let it collect memory.

After 3 to 5 runs, you'll *feel* the system change.

You're building something unusually coherent.


In [None]:
"""Historical Tracking Utilities for Experimentation Portfolio Orchestrator

Functions to save, load, and compare historical report snapshots for trend analysis.
"""

import json
from pathlib import Path
from typing import Dict, Any, Optional, List
from datetime import datetime


def save_report_snapshot(
    state: Dict[str, Any],
    snapshots_dir: str = "output/experimentation_portfolio_snapshots"
) -> str:
    """
    Save a snapshot of key metrics from the current state for historical comparison.

    Args:
        state: Complete EPO state
        snapshots_dir: Directory to save snapshots

    Returns:
        Path to saved snapshot file
    """
    # Create snapshots directory if it doesn't exist
    snapshots_path = Path(snapshots_dir)
    snapshots_path.mkdir(parents=True, exist_ok=True)

    # Extract key metrics for snapshot
    portfolio_summary = state.get("portfolio_summary", {})
    portfolio_roi = state.get("portfolio_roi", {})
    performance_metrics = state.get("performance_metrics", {})
    analyzed_experiments = state.get("analyzed_experiments", [])
    generated_decisions = state.get("generated_decisions", [])
    calculated_analyses = state.get("calculated_analyses", [])
    portfolio_insights = state.get("portfolio_insights", [])

    # Build snapshot
    snapshot = {
        "timestamp": datetime.now().isoformat(),
        "experiment_id": state.get("experiment_id"),
        "scope": state.get("goal", {}).get("scope", "unknown"),

        # Portfolio metrics
        "total_experiments": portfolio_summary.get("total_experiments", 0),
        "completed_count": portfolio_summary.get("completed_count", 0),
        "running_count": portfolio_summary.get("running_count", 0),
        "planned_count": portfolio_summary.get("planned_count", 0),

        # ROI metrics
        "total_cost": portfolio_roi.get("total_cost", 0.0),
        "total_revenue_impact": portfolio_roi.get("total_revenue_impact", 0.0),
        "net_roi": portfolio_roi.get("net_roi", 0.0),
        "roi_percent": portfolio_roi.get("roi_percent", 0.0),
        "positive_roi_count": portfolio_roi.get("experiments_with_positive_roi", 0),
        "negative_roi_count": portfolio_roi.get("experiments_with_negative_roi", 0),

        # Performance metrics
        "experiments_analyzed": performance_metrics.get("total_experiments_analyzed", 0),
        "analysis_success_rate": performance_metrics.get("analysis_success_rate", 0.0),
        "statistical_tests_performed": performance_metrics.get("statistical_tests_performed", 0),
        "decisions_generated": performance_metrics.get("decisions_generated", 0),

        # Decision breakdown
        "decision_counts": _count_decisions(generated_decisions),

        # Analysis breakdown
        "significant_analyses": sum(1 for a in calculated_analyses if a.get("is_significant", False)),
        "total_analyses": len(calculated_analyses),

        # Insights breakdown
        "insights_count": len(portfolio_insights),
        "trends_count": sum(1 for i in portfolio_insights if i.get("type") == "trend"),
        "risks_count": sum(1 for i in portfolio_insights if i.get("type") == "risk"),
        "opportunities_count": sum(1 for i in portfolio_insights if i.get("type") == "opportunity"),
        "recommendations_count": sum(1 for i in portfolio_insights if i.get("type") == "recommendation"),

        # Processing metadata
        "processing_time": state.get("processing_time", 0.0),
        "errors_count": len(state.get("errors", []))
    }

    # Generate filename
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    scope = snapshot["scope"]
    experiment_id = snapshot.get("experiment_id")

    if experiment_id:
        filename = f"snapshot_{experiment_id}_{timestamp}.json"
    else:
        filename = f"snapshot_{scope}_{timestamp}.json"

    filepath = snapshots_path / filename

    # Save snapshot
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(snapshot, f, indent=2)

    return str(filepath)


def _count_decisions(decisions: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count decisions by type."""
    counts = {}
    for decision in decisions:
        decision_type = decision.get("decision", "unknown")
        counts[decision_type] = counts.get(decision_type, 0) + 1
    return counts


def load_latest_snapshot(
    experiment_id: Optional[str] = None,
    scope: str = "portfolio_wide",
    snapshots_dir: str = "output/experimentation_portfolio_snapshots"
) -> Optional[Dict[str, Any]]:
    """
    Load the most recent snapshot for comparison.

    Args:
        experiment_id: Optional experiment ID (for single experiment snapshots)
        scope: Analysis scope (for portfolio snapshots)
        snapshots_dir: Directory containing snapshots

    Returns:
        Latest snapshot dictionary or None if no snapshots exist
    """
    snapshots_path = Path(snapshots_dir)

    if not snapshots_path.exists():
        return None

    # Find matching snapshots
    if experiment_id:
        pattern = f"snapshot_{experiment_id}_*.json"
    else:
        pattern = f"snapshot_{scope}_*.json"

    matching_files = list(snapshots_path.glob(pattern))

    if not matching_files:
        return None

    # Get most recent (by file modification time)
    latest_file = max(matching_files, key=lambda p: p.stat().st_mtime)

    # Load snapshot
    with open(latest_file, 'r', encoding='utf-8') as f:
        snapshot = json.load(f)

    return snapshot


def compare_snapshots(
    current: Dict[str, Any],
    previous: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Compare current snapshot with previous snapshot to calculate trends.

    Args:
        current: Current snapshot
        previous: Previous snapshot

    Returns:
        Comparison dictionary with trends and changes
    """
    comparison = {
        "current_timestamp": current.get("timestamp"),
        "previous_timestamp": previous.get("timestamp"),
        "days_between": _calculate_days_between(
            previous.get("timestamp"),
            current.get("timestamp")
        ),
        "changes": {},
        "trends": {}
    }

    # Compare key metrics
    metrics_to_compare = [
        "total_experiments",
        "completed_count",
        "running_count",
        "planned_count",
        "total_cost",
        "total_revenue_impact",
        "net_roi",
        "roi_percent",
        "positive_roi_count",
        "negative_roi_count",
        "experiments_analyzed",
        "analysis_success_rate",
        "statistical_tests_performed",
        "decisions_generated",
        "significant_analyses",
        "total_analyses",
        "insights_count",
        "trends_count",
        "risks_count",
        "opportunities_count",
        "recommendations_count"
    ]

    for metric in metrics_to_compare:
        current_val = current.get(metric, 0)
        previous_val = previous.get(metric, 0)

        if previous_val == 0:
            if current_val > 0:
                percent_change = 100.0  # New metric
                trend = "new"
            else:
                percent_change = 0.0
                trend = "stable"
        else:
            percent_change = ((current_val - previous_val) / abs(previous_val)) * 100
            if abs(percent_change) < 1.0:  # Less than 1% change
                trend = "stable"
            elif percent_change > 0:
                trend = "increasing"
            else:
                trend = "decreasing"

        comparison["changes"][metric] = {
            "current": current_val,
            "previous": previous_val,
            "absolute_change": current_val - previous_val,
            "percent_change": round(percent_change, 2),
            "trend": trend
        }

        # Add trend indicator
        if trend == "increasing":
            trend_icon = "↑"
        elif trend == "decreasing":
            trend_icon = "↓"
        else:
            trend_icon = "→"

        comparison["trends"][metric] = {
            "direction": trend,
            "icon": trend_icon,
            "percent_change": round(percent_change, 2)
        }

    # Compare decision counts
    current_decisions = current.get("decision_counts", {})
    previous_decisions = previous.get("decision_counts", {})

    all_decision_types = set(current_decisions.keys()) | set(previous_decisions.keys())
    decision_changes = {}

    for decision_type in all_decision_types:
        current_count = current_decisions.get(decision_type, 0)
        previous_count = previous_decisions.get(decision_type, 0)
        change = current_count - previous_count

        decision_changes[decision_type] = {
            "current": current_count,
            "previous": previous_count,
            "change": change
        }

    comparison["decision_changes"] = decision_changes

    return comparison


def _calculate_days_between(timestamp1: Optional[str], timestamp2: Optional[str]) -> Optional[float]:
    """Calculate days between two ISO timestamps."""
    if not timestamp1 or not timestamp2:
        return None

    try:
        dt1 = datetime.fromisoformat(timestamp1.replace('Z', '+00:00'))
        dt2 = datetime.fromisoformat(timestamp2.replace('Z', '+00:00'))
        delta = dt2 - dt1
        return delta.total_seconds() / (24 * 3600)
    except Exception:
        return None


def calculate_trend_significance(
    values: List[float],
    confidence_level: float = 0.95
) -> Dict[str, Any]:
    """
    Calculate trend significance using toolshed statistics.

    Args:
        values: List of values over time (ordered chronologically)
        confidence_level: Confidence level for significance test

    Returns:
        Trend significance results
    """
    if len(values) < 3:
        return {
            "trend_direction": "stable",
            "is_significant": False,
            "reason": "Insufficient data for trend analysis (need at least 3 observations)"
        }

    try:
        from toolshed.statistics.kpi_roi_tests import test_trend_significance

        result = test_trend_significance(values, confidence_level)

        return {
            "trend_direction": result.get("trend_direction", "stable"),
            "is_significant": result.get("is_significant", False),
            "p_value": result.get("p_value"),
            "slope": result.get("slope", 0.0),
            "r_squared": result.get("r_squared", 0.0),
            "interpretation": result.get("interpretation", "")
        }
    except ImportError:
        # Fallback to simple trend detection: compare the mean of the last
        # three values with the mean of everything before them.
        recent = values[-3:]
        earlier = values[:-3] or values[:1]
        recent_avg = sum(recent) / len(recent)
        earlier_avg = sum(earlier) / len(earlier)

        # Use abs() so the 5% band also behaves correctly for negative baselines
        threshold = abs(earlier_avg) * 0.05
        if recent_avg - earlier_avg > threshold:  # 5% increase
            return {"trend_direction": "increasing", "is_significant": False}
        if earlier_avg - recent_avg > threshold:  # 5% decrease
            return {"trend_direction": "decreasing", "is_significant": False}

        return {"trend_direction": "stable", "is_significant": False}


# Testing

In [None]:
"""Test Historical Tracking for EPO Agent

Tests historical snapshot saving, loading, and comparison.
"""

import sys
from pathlib import Path
import time
import json

# Add project root to path
project_root = Path(__file__).parent
sys.path.insert(0, str(project_root))

from agents.epo import create_orchestrator
from agents.epo.utilities.historical_tracking import (
    save_report_snapshot,
    load_latest_snapshot,
    compare_snapshots,
    calculate_trend_significance,
)
from config import (
    ExperimentationPortfolioOrchestratorState,
    ExperimentationPortfolioOrchestratorConfig,
)


def test_historical_tracking():
    """Test historical tracking with two consecutive runs"""
    print("\n" + "="*70)
    print("Test: Historical Tracking (Two Consecutive Runs)")
    print("="*70)

    config = ExperimentationPortfolioOrchestratorConfig()
    orchestrator = create_orchestrator(config)

    # First run
    print("\n📊 Running first analysis...")
    initial_state: ExperimentationPortfolioOrchestratorState = {
        "experiment_id": None,
        "errors": []
    }

    state1 = orchestrator.invoke(initial_state)
    state1["processing_time"] = 0.0  # Set to 0 for consistency

    print("✅ First run complete")
    print(f"   - Net ROI: ${state1.get('portfolio_roi', {}).get('net_roi', 0):,.2f}")
    print(f"   - Total Experiments: {state1.get('portfolio_summary', {}).get('total_experiments', 0)}")

    # Wait a moment to ensure different timestamps
    time.sleep(1)

    # Second run (should compare with first)
    print("\n📊 Running second analysis...")
    initial_state2: ExperimentationPortfolioOrchestratorState = {
        "experiment_id": None,
        "errors": []
    }

    state2 = orchestrator.invoke(initial_state2)
    state2["processing_time"] = 0.0  # Set to 0 for consistency

    print("✅ Second run complete")
    print(f"   - Net ROI: ${state2.get('portfolio_roi', {}).get('net_roi', 0):,.2f}")
    print(f"   - Total Experiments: {state2.get('portfolio_summary', {}).get('total_experiments', 0)}")

    # Check if historical comparison was generated
    historical_comparison = state2.get("historical_comparison")

    if historical_comparison:
        print("\n✅ Historical comparison generated!")
        print(f"   - Days between: {historical_comparison.get('days_between', 'N/A')}")

        # Check trends
        trends = historical_comparison.get("trends", {})
        if "net_roi" in trends:
            roi_trend = trends["net_roi"]
            print(f"   - ROI Trend: {roi_trend.get('icon')} {roi_trend.get('direction')} ({roi_trend.get('percent_change', 0):+.1f}%)")

        # Check changes
        changes = historical_comparison.get("changes", {})
        if "net_roi" in changes:
            roi_change = changes["net_roi"]
            print(f"   - ROI Change: ${roi_change.get('absolute_change', 0):+,.2f}")

        # Verify report includes historical section
        report_content = state2.get("portfolio_report", "")
        if "Historical Comparison" in report_content:
            print("\n✅ Report includes Historical Comparison section")
        else:
            print("\n⚠️  Report missing Historical Comparison section")
    else:
        print("\n⚠️  No historical comparison generated (this is OK for first run)")

    # Verify snapshots were saved
    snapshots_dir = config.reports_dir.replace("reports", "snapshots")
    snapshots_path = Path(snapshots_dir)

    if snapshots_path.exists():
        snapshot_files = list(snapshots_path.glob("snapshot_*.json"))
        print(f"\n✅ Snapshots saved: {len(snapshot_files)} file(s)")
        if len(snapshot_files) >= 2:
            print("   - Multiple snapshots found (expected for two runs)")
    else:
        print("\n⚠️  Snapshots directory not found")

    print("\n✅ Historical tracking test complete!")


def test_snapshot_utilities():
    """Test snapshot utilities directly"""
    print("\n" + "="*70)
    print("Test: Snapshot Utilities")
    print("="*70)

    # Create test state
    test_state = {
        "experiment_id": None,
        "goal": {"scope": "portfolio_wide"},
        "portfolio_summary": {
            "total_experiments": 3,
            "completed_count": 1,
            "running_count": 1,
            "planned_count": 1
        },
        "portfolio_roi": {
            "total_cost": 2250.0,
            "total_revenue_impact": 14800.0,
            "net_roi": 12550.0,
            "roi_percent": 557.8,
            "experiments_with_positive_roi": 2,
            "experiments_with_negative_roi": 0
        },
        "performance_metrics": {
            "total_experiments_analyzed": 3,
            "analysis_success_rate": 0.667,
            "statistical_tests_performed": 0,
            "decisions_generated": 0
        },
        "analyzed_experiments": [],
        "generated_decisions": [],
        "calculated_analyses": [],
        "portfolio_insights": [],
        "processing_time": 0.05,
        "errors": []
    }

    # Test save
    print("\n📝 Testing snapshot save...")
    snapshot_path = save_report_snapshot(test_state)
    print(f"✅ Snapshot saved: {snapshot_path}")

    # Test load
    print("\n📖 Testing snapshot load...")
    loaded = load_latest_snapshot(scope="portfolio_wide")
    if loaded:
        print(f"✅ Snapshot loaded: {loaded.get('timestamp')}")
        print(f"   - Net ROI: ${loaded.get('net_roi', 0):,.2f}")
        print(f"   - Total Experiments: {loaded.get('total_experiments', 0)}")
    else:
        print("⚠️  No snapshot loaded")

    # Test comparison
    if loaded:
        print("\n📊 Testing snapshot comparison...")
        # Create a modified state for comparison; deep-copy so the nested
        # dicts in test_state are not mutated as a side effect
        from copy import deepcopy
        test_state2 = deepcopy(test_state)
        test_state2["portfolio_roi"]["net_roi"] = 15000.0  # Increased ROI
        test_state2["portfolio_summary"]["total_experiments"] = 4  # More experiments

        snapshot_path2 = save_report_snapshot(test_state2)
        with open(snapshot_path2, 'r') as f:
            current_snapshot = json.load(f)

        comparison = compare_snapshots(current_snapshot, loaded)

        if comparison:
            print("✅ Comparison generated")
            print(f"   - Days between: {comparison.get('days_between', 'N/A')}")

            roi_trend = comparison.get("trends", {}).get("net_roi", {})
            print(f"   - ROI Trend: {roi_trend.get('icon')} {roi_trend.get('direction')}")

            roi_change = comparison.get("changes", {}).get("net_roi", {})
            print(f"   - ROI Change: ${roi_change.get('absolute_change', 0):+,.2f} ({roi_change.get('percent_change', 0):+.1f}%)")

    print("\n✅ Snapshot utilities test complete!")


if __name__ == "__main__":
    print("\n" + "="*70)
    print("Historical Tracking Tests for EPO Agent")
    print("="*70)

    try:
        test_snapshot_utilities()
        test_historical_tracking()

        print("\n" + "="*70)
        print("✅ ALL HISTORICAL TRACKING TESTS PASSED!")
        print("="*70)

    except Exception as e:
        print(f"\n❌ Test failed: {str(e)}")
        import traceback
        traceback.print_exc()
        sys.exit(1)


# Test Results

In [None]:
(.venv) micahshull@Micahs-iMac AI_AGENTS_017_EPO_2.0 % python3 test_epo_e2e.py

======================================================================
End-to-End Integration Tests for EPO Agent
======================================================================

======================================================================
Test 1: Portfolio-Wide Analysis (Full Workflow)
======================================================================

📊 Starting portfolio-wide analysis...

⏱️  Total processing time: 0.05 seconds

✅ No errors in workflow

📈 Results Summary:
   - Experiments analyzed: 3
   - Statistical tests: 0
   - Decisions generated: 0
   - Portfolio status: 3 total
     - Completed: 1
     - Running: 1
     - Planned: 1

💰 Portfolio ROI:
   - Total Cost: $2,250.00
   - Total Revenue Impact: $14,800.00
   - Net ROI: $12,550.00
   - ROI %: 557.78%
   - Positive ROI experiments: 2

⚡ Performance Metrics:
   - Analysis success rate: 66.7%
   - Statistical tests performed: 0
   - Decisions generated: 0

📄 Report Generated:
   - Path: output/experimentation_portfolio_reports/epo_report_epo_report_portfolio_20260118_152607.md
   - File exists: ✅

✅ Portfolio-wide E2E test passed!

======================================================================
Test 2: Single Experiment Analysis (E001)
======================================================================

🔬 Starting single experiment analysis for E001...

⏱️  Total processing time: 0.00 seconds

✅ No errors in workflow

📈 Results Summary:

💰 ROI:
   - Total Cost: $850.00
   - Net ROI: $9,150.00
   - ROI %: 1076.47%

📄 Report Generated:
   - Path: output/experimentation_portfolio_reports/epo_report_epo_report_E001_20260118_152607.md
   - File exists: ✅

✅ Single experiment E2E test passed!

======================================================================
Test 3: State Progression Validation
======================================================================

✅ All required fields present in final state
✅ Data integrity validated: 3 experiments

✅ State progression test passed!

======================================================================
Test 4: Error Handling
======================================================================

🔍 Testing with non-existent experiment ID (E999)...
✅ Errors captured: 3
   - statistical_analysis_node: definitions_lookup and metrics_lookup required. Run data_loading_node first.
   - decision_evaluation_node: definitions_lookup required. Run data_loading_node first.
   - roi_calculation_node: analyzed_experiments or experiment_id with analysis required

✅ Error handling test passed!

======================================================================
✅ ALL END-TO-END TESTS PASSED!
======================================================================

The EPO agent workflow is fully functional and ready for use.
(.venv) micahshull@Micahs-iMac AI_AGENTS_017_EPO_2.0 %