# Phase Logger Workflow Demo

This notebook demonstrates **PhaseLogger**, an AgentRxiv-inspired logging system for multi-phase AI agent workflows that records the decisions, artifacts, and outcomes of each phase.

## What You'll Learn
1. Initializing PhaseLogger for workflow tracking
2. Phase lifecycle: `start_phase` → `log_decision` → `log_artifact` → `end_phase`
3. Working with different workflow phases (Planning, Data Collection, Execution, Validation)
4. Decision logging with reasoning, alternatives considered, and selection rationale
5. Artifact tracking with metadata (file type, path, phase association)
6. Error handling: recoverable vs fatal errors
7. Generating Mermaid workflow diagrams with `visualize_workflow()`
8. Summarizing workflow statistics with `get_phase_summary()`
9. Exporting complete workflow logs for compliance and debugging

## AgentRxiv Background
The PhaseLogger is inspired by AgentRxiv (agentrxiv.github.io), which introduced phase-based research workflow logging for autonomous research agents. This approach is particularly valuable for:
- **Compliance auditing**: Track every decision made during multi-step processes
- **Debugging**: Understand why an agent made specific choices
- **Reproducibility**: Document the reasoning behind workflow outcomes


In [None]:
# Setup imports
import sys
from pathlib import Path
from datetime import datetime, UTC
import json

# Add lesson-17 to path
sys.path.insert(0, str(Path.cwd().parent))

from backend.explainability.phase_logger import (
    PhaseLogger,
    WorkflowPhase,
    Decision,
    Artifact,
    PhaseOutcome,
    PhaseSummary,
)

# Initialize PhaseLogger for a research workflow
storage_path = Path.cwd().parent / "cache"
logger = PhaseLogger(
    workflow_id="research-workflow-001",
    storage_path=storage_path
)

print("=== PhaseLogger Initialized ===")
print(f"Workflow ID: {logger.workflow_id}")
print(f"Storage Path: {logger.storage_path}")
print(f"Logs Directory: {logger._logs_path}")
print()
print("Available Workflow Phases:")
for phase in WorkflowPhase:
    phase_type = "terminal" if phase in [WorkflowPhase.COMPLETED, WorkflowPhase.FAILED] else "active"
    print(f"  • {phase.value:20} ({phase_type})")


## 1. Phase Lifecycle

Each workflow phase follows a consistent lifecycle:

```
start_phase() → [log_decision() | log_artifact() | log_error()]* → end_phase()
```

**Rules:**
- Only one phase can be active at a time
- You cannot start a new phase while another is in progress
- Decisions, artifacts, and errors can only be logged during an active phase
- Each phase ends with a status: `success`, `failure`, `partial`, or `skipped`
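
The rules above amount to a small state machine. The following is a minimal, self-contained sketch of that idea. `MiniPhaseTracker` and its attributes are hypothetical names used only for illustration; they are not part of the PhaseLogger API, and the exception types are assumptions.

```python
from typing import Optional


class MiniPhaseTracker:
    """Hypothetical stand-in that enforces the lifecycle rules (not the real PhaseLogger)."""

    VALID_STATUSES = {"success", "failure", "partial", "skipped"}

    def __init__(self) -> None:
        self.current: Optional[str] = None       # name of the active phase, if any
        self.events: list[tuple[str, str]] = []  # (phase, message) pairs
        self.outcomes: dict[str, str] = {}       # phase -> final status

    def start_phase(self, phase: str) -> None:
        # Rule: only one phase can be active at a time.
        if self.current is not None:
            raise RuntimeError(f"Phase {self.current!r} is still in progress")
        self.current = phase

    def log(self, message: str) -> None:
        # Rule: logging requires an active phase.
        if self.current is None:
            raise RuntimeError("No active phase")
        self.events.append((self.current, message))

    def end_phase(self, status: str) -> None:
        if self.current is None:
            raise RuntimeError("No active phase")
        if status not in self.VALID_STATUSES:
            raise ValueError(f"Unknown status: {status!r}")
        self.outcomes[self.current] = status
        self.current = None


tracker = MiniPhaseTracker()
tracker.start_phase("planning")
tracker.log("chose extraction model")
try:
    tracker.start_phase("execution")  # rejected: planning is still active
except RuntimeError as exc:
    rejected = str(exc)
tracker.end_phase("success")
```

Raising on an out-of-order call, rather than silently ignoring it, keeps the recorded log trustworthy: every event is guaranteed to belong to exactly one phase.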

Let's demonstrate this with a **PLANNING** phase.


In [None]:
# Phase 1: PLANNING - Demonstrates the complete lifecycle

# Start the planning phase
logger.start_phase(WorkflowPhase.PLANNING, metadata={"objective": "research_analysis"})
print(f"✓ Started phase: {logger.get_current_phase().value}")

# Log a decision with full reasoning details
decision1 = logger.log_decision(
    decision="Use GPT-4 as primary extraction model",
    reasoning="Need high accuracy for financial document analysis",
    alternatives=["GPT-3.5-turbo", "Claude-3-Sonnet", "Llama-3-70B"],
    selected_because="GPT-4 shows 15% higher accuracy on financial benchmarks and supports structured output",
    confidence=0.92,
    agent_id="planning-agent",
    reversible=True,
)

print(f"\n📋 Decision Logged:")
print(f"   ID: {decision1.decision_id}")
print(f"   Decision: {decision1.decision}")
print(f"   Alternatives: {decision1.alternatives_considered}")
print(f"   Selected Because: {decision1.selected_because}")
print(f"   Confidence: {decision1.confidence:.0%}")
print(f"   Reversible: {decision1.reversible}")

# Log a second decision 
decision2 = logger.log_decision(
    decision="Set batch processing limit to 100 documents",
    reasoning="Balance throughput with memory constraints",
    alternatives=["50 docs", "200 docs", "500 docs"],
    selected_because="100 docs provides optimal memory usage while maintaining 2-hour processing deadline",
    confidence=0.85,
    agent_id="planning-agent",
)

# Log a planning artifact
plan_artifact = logger.log_artifact(
    artifact_name="research_plan",
    artifact_path=Path("outputs/research_plan_v1.json"),
    artifact_type="plan",
    metadata={"version": 1, "steps": 4, "estimated_duration_hours": 3},
)

print(f"\n📁 Artifact Logged:")
print(f"   ID: {plan_artifact.artifact_id}")
print(f"   Name: {plan_artifact.name}")
print(f"   Type: {plan_artifact.artifact_type}")
print(f"   Metadata: {plan_artifact.metadata}")

# End the phase
outcome = logger.end_phase(status="success")

print(f"\n✓ Phase Completed:")
print(f"   Status: {outcome.status}")
print(f"   Duration: {outcome.duration_ms}ms")
print(f"   Decisions Made: {outcome.decisions_made}")
print(f"   Artifacts Produced: {outcome.artifacts_produced}")


## 2. DATA_COLLECTION Phase

The **DATA_COLLECTION** phase is where agents gather inputs for processing. This phase typically involves:
- Source selection decisions
- Data quality assessments  
- Artifact tracking for downloaded/retrieved data

Let's demonstrate data collection with multiple artifact types.


In [None]:
# Phase 2: DATA_COLLECTION - Multiple artifacts and decisions

logger.start_phase(WorkflowPhase.DATA_COLLECTION)
print(f"✓ Started phase: {logger.get_current_phase().value}")

# Decision: Choose data sources
source_decision = logger.log_decision(
    decision="Use S3 bucket + PostgreSQL as data sources",
    reasoning="Comprehensive coverage of historical and real-time data",
    alternatives=["S3 only", "PostgreSQL only", "API endpoints", "Local filesystem"],
    selected_because="Combination provides historical archives (S3) + latest transactions (PostgreSQL)",
    confidence=0.88,
    agent_id="data-collector",
)

# Log multiple artifacts representing collected data
artifacts_info = [
    ("invoice_archive_2024", "data/invoices_2024.parquet", "dataset", {"records": 15420, "size_mb": 128}),
    ("vendor_master_data", "data/vendors.csv", "dataset", {"records": 892, "columns": 15}),
    ("transaction_log", "data/transactions_q4.jsonl", "log", {"records": 48230, "date_range": "2024-10-01 to 2024-12-31"}),
]

print(f"\n📁 Artifacts Collected:")
for name, path, art_type, meta in artifacts_info:
    artifact = logger.log_artifact(
        artifact_name=name,
        artifact_path=Path(path),
        artifact_type=art_type,
        metadata=meta,
    )
    print(f"   • {artifact.name}: {meta.get('records', 'N/A')} records ({art_type})")

# Log data quality decision
quality_decision = logger.log_decision(
    decision="Accept data with 2.3% null rate",
    reasoning="Null rate within acceptable threshold for analysis",
    alternatives=["Reject and re-collect", "Impute missing values", "Filter incomplete records"],
    selected_because="2.3% null rate is below 5% threshold; imputation would introduce bias",
    confidence=0.95,
    agent_id="quality-checker",
)

outcome = logger.end_phase(status="success")
print(f"\n✓ Phase Completed: {outcome.decisions_made} decisions, {len(outcome.artifacts_produced)} artifacts")


## 3. EXECUTION Phase

The **EXECUTION** phase is where the main processing happens. This is typically the longest phase and may involve:
- Model inference decisions
- Processing strategy choices
- Runtime parameter adjustments
- Intermediate result artifacts

This phase often has the most decisions logged because it's where agents actively problem-solve.


In [None]:
# Phase 3: EXECUTION - Main processing with multiple decisions

logger.start_phase(WorkflowPhase.EXECUTION)
print(f"✓ Started phase: {logger.get_current_phase().value}")

# Decision 1: Processing strategy
strategy_decision = logger.log_decision(
    decision="Use parallel processing with 4 workers",
    reasoning="Optimize throughput for large dataset",
    alternatives=["Sequential processing", "2 workers", "8 workers", "Async/await pattern"],
    selected_because="4 workers maximizes CPU utilization without memory pressure; 8 workers caused OOM in testing",
    confidence=0.90,
    agent_id="executor",
    reversible=True,  # Can restart with different config
)

# Decision 2: Model temperature setting (irreversible during run)
temp_decision = logger.log_decision(
    decision="Set temperature to 0.1 for deterministic extraction",
    reasoning="Financial data requires consistent, reproducible outputs",
    alternatives=["temperature=0.0", "temperature=0.3", "temperature=0.7"],
    selected_because="0.1 provides slight variation for edge cases while maintaining determinism; 0.0 too rigid",
    confidence=0.87,
    agent_id="model-controller",
    reversible=False,  # Applied to current batch
)

# Decision 3: Error handling strategy
error_decision = logger.log_decision(
    decision="Continue on soft errors, halt on schema violations",
    reasoning="Maximize data recovery while ensuring output integrity",
    alternatives=["Halt on any error", "Continue on all errors", "Queue for manual review"],
    selected_because="Schema violations indicate fundamental extraction failures requiring re-processing",
    confidence=0.93,
    agent_id="executor",
)

# Log intermediate results
extraction_artifact = logger.log_artifact(
    artifact_name="extraction_results",
    artifact_path=Path("outputs/extraction_batch_001.json"),
    artifact_type="intermediate_result",
    metadata={
        "records_processed": 15420,
        "success_rate": 0.982,
        "soft_errors": 278,
        "schema_violations": 0,
    },
)

print(f"\n📊 Execution Progress:")
print(f"   Records Processed: {extraction_artifact.metadata['records_processed']}")
print(f"   Success Rate: {extraction_artifact.metadata['success_rate']:.1%}")
print(f"   Soft Errors: {extraction_artifact.metadata['soft_errors']}")
print(f"   Schema Violations: {extraction_artifact.metadata['schema_violations']}")

outcome = logger.end_phase(status="success")
print(f"\n✓ Phase Completed: {outcome.decisions_made} decisions in {outcome.duration_ms}ms")


## 4. VALIDATION Phase with Error Handling

The **VALIDATION** phase verifies outputs meet quality standards. This phase demonstrates:
- **Recoverable errors**: Issues that can be worked around (logged but don't stop the workflow)
- **Fatal errors**: Critical failures that require immediate attention

The `log_error()` method accepts a `recoverable` flag that determines how the error is categorized:
- `recoverable=True`: Logged as `[recoverable]` - workflow continues
- `recoverable=False`: Logged as `[fatal]` - typically triggers workflow halt or escalation
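
As a rough illustration of the two categories, the sketch below tags each message and applies a simple halt policy. `format_error` and `should_halt` are hypothetical helpers, and the exact string layout and halt rule are assumptions; the real `log_error()` may store and act on errors differently.

```python
def format_error(message: str, recoverable: bool) -> str:
    """Prefix an error message with its category tag (hypothetical helper)."""
    tag = "recoverable" if recoverable else "fatal"
    return f"[{tag}] {message}"


def should_halt(errors: list[str]) -> bool:
    """Assumed policy: halt as soon as any fatal error has been logged."""
    return any(e.startswith("[fatal]") for e in errors)


errors = [
    format_error("Found 3 potential duplicate invoices", recoverable=True),
    format_error("Database connection lost during validation", recoverable=False),
]

print(errors[0])
print("halt workflow:", should_halt(errors))
```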


In [None]:
# Phase 4: VALIDATION - Demonstrates error handling

logger.start_phase(WorkflowPhase.VALIDATION)
print(f"✓ Started phase: {logger.get_current_phase().value}")

# Decision: Choose validation strategy
validation_decision = logger.log_decision(
    decision="Use schema validation + business rule checks",
    reasoning="Ensure both structural and semantic correctness",
    alternatives=["Schema only", "Business rules only", "ML-based anomaly detection"],
    selected_because="Combined approach catches format errors AND logical inconsistencies",
    confidence=0.91,
    agent_id="validator",
)

# Simulate validation checks with some errors
print("\n🔍 Running Validation Checks...")
validation_results = {
    "schema_valid": True,
    "amount_in_range": True,
    "vendor_exists": True,
    "duplicate_check": False,  # Found 3 potential duplicates
    "date_sequence": True,
    "currency_match": False,  # 12 records with mismatched currencies
}

# Log recoverable errors (issues that don't stop processing)
logger.log_error(
    "Found 3 potential duplicate invoices (INV-2024-4521, INV-2024-4522, INV-2024-4523)",
    recoverable=True,  # Can continue, flag for review
)
print("   ⚠️  [recoverable] Duplicate invoice warning logged")

logger.log_error(
    "12 records have source/target currency mismatch - auto-converted using daily rates",
    recoverable=True,  # Applied automatic fix
)
print("   ⚠️  [recoverable] Currency mismatch warning logged")

# In a real scenario, you might also encounter fatal errors:
# logger.log_error("Database connection lost during validation", recoverable=False)

# Decision to proceed despite warnings
proceed_decision = logger.log_decision(
    decision="Proceed with flagged records marked for manual review",
    reasoning="Overall data quality (99.9%) exceeds threshold",
    alternatives=["Halt and investigate", "Remove flagged records", "Re-run extraction"],
    selected_because="15 flagged records out of 15420 (0.1%) is within acceptable error margin",
    confidence=0.88,
    agent_id="validator",
)

# Log validation report artifact
validation_artifact = logger.log_artifact(
    artifact_name="validation_report",
    artifact_path=Path("outputs/validation_report.json"),
    artifact_type="report",
    metadata={
        "total_records": 15420,
        "passed": 15405,
        "flagged": 15,
        "error_rate": "0.1%",
        "checks_performed": list(validation_results.keys()),
    },
)

# End with partial status due to warnings
outcome = logger.end_phase(status="partial")

print(f"\n⚠️ Phase Completed with Warnings:")
print(f"   Status: {outcome.status}")
print(f"   Errors Logged: {len(outcome.errors)}")
for error in outcome.errors:
    print(f"      {error[:80]}...")


## 5. Workflow Visualization with Mermaid

The `visualize_workflow()` method generates a **Mermaid diagram** showing:
- All completed phases with their status (success, failure, or partial)
- Phase connections (flow order)
- Decision counts per phase
- Color-coded status: green=success, pink=failure, yellow=partial

This visualization is invaluable for workflow status overview and debugging.
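
To make the diagram format concrete, here is a minimal sketch that builds a comparable Mermaid flowchart from plain tuples. `build_mermaid` is a hypothetical function, and the node layout (decision counts placed inside each node label) is an assumption; the actual output of `visualize_workflow()` may be structured differently. The color codes are the ones listed above.

```python
STATUS_COLORS = {
    "success": "#90EE90",  # light green
    "failure": "#FFB6C1",  # light pink
    "partial": "#FFE4B5",  # moccasin/yellow
}


def build_mermaid(phases: list[tuple[str, str, int]]) -> str:
    """Build a Mermaid flowchart from (phase_name, status, decision_count) tuples."""
    lines = ["graph TD"]
    for name, status, decisions in phases:
        lines.append(f'    {name}["{name}: {status} ({decisions} decisions)"]')
    # Connect consecutive phases in execution order.
    for (prev, _, _), (nxt, _, _) in zip(phases, phases[1:]):
        lines.append(f"    {prev} --> {nxt}")
    # Color each node according to its status; unknown statuses stay unstyled.
    for name, status, _ in phases:
        color = STATUS_COLORS.get(status)
        if color:
            lines.append(f"    style {name} fill:{color}")
    return "\n".join(lines)


diagram = build_mermaid([
    ("planning", "success", 2),
    ("data_collection", "success", 2),
    ("execution", "success", 3),
    ("validation", "partial", 2),
])
print(diagram)
```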


In [None]:
# Generate Mermaid visualization of workflow phases

mermaid_diagram = logger.visualize_workflow()

print("=== Workflow Mermaid Diagram ===")
print()
print(mermaid_diagram)
print()
print("Copy the above diagram to mermaid.live or a Mermaid-enabled markdown viewer to render it.")
print()
print("=== Diagram Explanation ===")
print("- Each box represents a completed phase")
print("- Arrows show phase execution order")
print("- Decision counts are shown as connected notes")
print("- Colors indicate status:")
print("    #90EE90 (light green) = success")
print("    #FFB6C1 (light pink) = failure")
print("    #FFE4B5 (moccasin/yellow) = partial")


## 6. Workflow Summary Statistics

The `get_phase_summary()` method provides aggregated statistics across all phases:
- Total and completed phase counts
- Total decisions made
- Total duration
- Per-phase outcomes with details
- Overall workflow status

This is essential for monitoring and compliance reporting.
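
The aggregation itself is straightforward. The sketch below shows one way such a summary could be computed; `MiniOutcome` and `summarize` are hypothetical names, the durations are placeholder values, and the rule for deriving the overall status (failure outranks partial, which outranks success) is an assumption rather than PhaseLogger's documented behavior.

```python
from typing import NamedTuple


class MiniOutcome(NamedTuple):
    """Hypothetical per-phase record; fields loosely mirror those printed in this notebook."""
    phase: str
    status: str
    duration_ms: int
    decisions_made: int


def summarize(outcomes: list[MiniOutcome]) -> dict:
    """Aggregate per-phase outcomes into workflow-level statistics."""
    statuses = {o.status for o in outcomes}
    # Assumed roll-up rule: any failure wins, then any partial, otherwise success.
    if "failure" in statuses:
        overall = "failure"
    elif "partial" in statuses:
        overall = "partial"
    else:
        overall = "success"
    return {
        "total_phases": len(outcomes),
        "total_decisions": sum(o.decisions_made for o in outcomes),
        "total_duration_ms": sum(o.duration_ms for o in outcomes),
        "overall_status": overall,
    }


# Placeholder durations; decision counts match the four phases logged above.
stats = summarize([
    MiniOutcome("planning", "success", 12, 2),
    MiniOutcome("data_collection", "success", 30, 2),
    MiniOutcome("execution", "success", 45, 3),
    MiniOutcome("validation", "partial", 20, 2),
])
print(stats)
```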


In [None]:
# Get comprehensive workflow summary

summary = logger.get_phase_summary()

print("=" * 60)
print("WORKFLOW SUMMARY")
print("=" * 60)
print()
print(f"Workflow ID:      {summary.workflow_id}")
print(f"Overall Status:   {summary.overall_status.upper()}")
print(f"Total Phases:     {summary.total_phases}")
print(f"Completed:        {summary.completed_phases}")
print(f"Total Decisions:  {summary.total_decisions}")
print(f"Total Duration:   {summary.total_duration_ms}ms")
print()

print("-" * 60)
print("PHASE-BY-PHASE BREAKDOWN")
print("-" * 60)

for outcome in summary.phase_outcomes:
    status_icon = {"success": "✓", "failure": "✗", "partial": "⚠"}.get(outcome.status, "?")
    print(f"\n{status_icon} {outcome.phase.value.upper()}")
    print(f"    Status:     {outcome.status}")
    print(f"    Duration:   {outcome.duration_ms}ms")
    print(f"    Decisions:  {outcome.decisions_made}")
    print(f"    Artifacts:  {outcome.artifacts_produced}")
    if outcome.errors:
        print(f"    Errors:     {len(outcome.errors)}")
        for err in outcome.errors[:2]:  # Show first 2 errors
            print(f"                {err[:50]}...")


## 7. Querying Decisions and Artifacts by Phase

You can retrieve all decisions and artifacts for a specific phase using:
- `get_phase_decisions(phase)` - Returns list of Decision objects
- `get_phase_artifacts(phase)` - Returns list of Artifact objects

This is useful for auditing specific phases or investigating issues.
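
Conceptually, a per-phase query is a lookup into an index keyed by phase. The sketch below illustrates that idea with plain strings; `decision_log`, `by_phase`, and `get_decisions` are hypothetical names, and nothing here reflects how PhaseLogger stores its records internally.

```python
from collections import defaultdict

# Hypothetical flat log of (phase, decision) pairs, using decisions from this notebook.
decision_log = [
    ("planning", "Use GPT-4 as primary extraction model"),
    ("planning", "Set batch processing limit to 100 documents"),
    ("execution", "Use parallel processing with 4 workers"),
    ("execution", "Continue on soft errors, halt on schema violations"),
]

# Index decisions by phase so per-phase lookups avoid rescanning the whole log.
by_phase: dict[str, list[str]] = defaultdict(list)
for phase, decision in decision_log:
    by_phase[phase].append(decision)


def get_decisions(phase: str) -> list[str]:
    """Return the decisions recorded for a phase, or an empty list if none exist."""
    # .get() avoids creating an empty entry for phases that were never logged.
    return list(by_phase.get(phase, []))


execution_only = get_decisions("execution")
print(execution_only)
```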


In [None]:
# Query decisions from the EXECUTION phase

execution_decisions = logger.get_phase_decisions(WorkflowPhase.EXECUTION)

print("=== EXECUTION Phase Decisions ===")
print()
for dec in execution_decisions:
    print(f"📋 {dec.decision_id}")
    print(f"   Decision:   {dec.decision}")
    print(f"   Reasoning:  {dec.reasoning}")
    print(f"   Confidence: {dec.confidence:.0%}")
    print(f"   Reversible: {dec.reversible}")
    print()

# Query artifacts from DATA_COLLECTION phase
collection_artifacts = logger.get_phase_artifacts(WorkflowPhase.DATA_COLLECTION)

print("=== DATA_COLLECTION Phase Artifacts ===")
print()
for art in collection_artifacts:
    print(f"📁 {art.artifact_id}")
    print(f"   Name: {art.name}")
    print(f"   Type: {art.artifact_type}")
    print(f"   Path: {art.path}")
    print()


## 8. Exporting Workflow Logs

The `export_workflow_log()` method creates a comprehensive JSON export containing:
- Workflow summary
- All decisions organized by phase
- All artifacts organized by phase
- All errors organized by phase

This export is suitable for:
- **Compliance auditing** (HIPAA, SOX, GDPR)
- **Post-incident analysis**
- **Long-term archival**
- **Integration with external monitoring systems**
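
For orientation, the sketch below writes a JSON file with the same top-level keys that the export cell in this notebook reads back (`workflow_id`, `exported_at`, `summary`, `decisions`, `artifacts`, `errors`). `export_log` is a hypothetical function; the contents of each section and the way `total_phases` is counted are assumptions, not the real `export_workflow_log()` schema.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def export_log(path: Path, workflow_id: str, decisions: dict, artifacts: dict, errors: dict) -> dict:
    """Write a phase-keyed log to JSON and return the payload (illustrative layout)."""
    payload = {
        "workflow_id": workflow_id,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "summary": {
            # Assumption: every phase appears as a key in `decisions`.
            "total_phases": len(decisions),
            "total_decisions": sum(len(v) for v in decisions.values()),
        },
        "decisions": decisions,
        "artifacts": artifacts,
        "errors": errors,
    }
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing export directories
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "exports" / "workflow_log.json"
    export_log(
        out,
        workflow_id="research-workflow-001",
        decisions={"planning": ["Use GPT-4"], "validation": ["Proceed with flagged records"]},
        artifacts={"planning": ["research_plan"], "validation": ["validation_report"]},
        errors={"planning": [], "validation": ["[recoverable] duplicate invoices"]},
    )
    loaded = json.loads(out.read_text(encoding="utf-8"))

print(sorted(loaded))
```

Keying every section by phase keeps the export easy to filter: an auditor interested only in validation can read the same key in all three sections.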


In [None]:
# Export complete workflow log to JSON

export_path = Path.cwd().parent / "cache" / "exports" / "research_workflow_log.json"
logger.export_workflow_log(export_path)

print(f"✓ Exported workflow log to: {export_path}")
print()

# Show the structure of the exported file
with open(export_path) as f:
    export_data = json.load(f)

print("=== Export Structure ===")
print()
print(f"workflow_id: {export_data['workflow_id']}")
print(f"exported_at: {export_data['exported_at']}")
print()
print("summary:")
print(f"  - total_phases: {export_data['summary']['total_phases']}")
print(f"  - completed_phases: {export_data['summary']['completed_phases']}")
print(f"  - total_decisions: {export_data['summary']['total_decisions']}")
print(f"  - overall_status: {export_data['summary']['overall_status']}")
print()
print("decisions: (organized by phase)")
for phase, decisions in export_data['decisions'].items():
    print(f"  - {phase}: {len(decisions)} decisions")
print()
print("artifacts: (organized by phase)")
for phase, artifacts in export_data['artifacts'].items():
    print(f"  - {phase}: {len(artifacts)} artifacts")
print()
print("errors: (organized by phase)")
for phase, errors in export_data['errors'].items():
    if errors:
        print(f"  - {phase}: {len(errors)} errors")


## 9. Summary

### What We Covered

| Method | Purpose |
|--------|---------|
| `PhaseLogger(workflow_id, storage_path)` | Initialize logger for a workflow |
| `start_phase(phase, metadata)` | Begin a new workflow phase |
| `log_decision(decision, reasoning, alternatives, ...)` | Record a decision with full context |
| `log_artifact(artifact_name, artifact_path, artifact_type, metadata)` | Track produced artifacts |
| `log_error(error, recoverable)` | Log recoverable or fatal errors |
| `end_phase(status)` | Complete current phase with status |
| `get_current_phase()` | Get currently active phase |
| `get_phase_decisions(phase)` | Query decisions for a phase |
| `get_phase_artifacts(phase)` | Query artifacts for a phase |
| `get_phase_summary()` | Get aggregated workflow statistics |
| `visualize_workflow()` | Generate Mermaid diagram |
| `export_workflow_log(path)` | Export complete log to JSON |

### Workflow Phases Demonstrated

1. **PLANNING** - Initial decisions, strategy selection
2. **DATA_COLLECTION** - Source selection, data quality assessment
3. **EXECUTION** - Main processing, runtime decisions
4. **VALIDATION** - Quality checks, error handling

### Key Use Cases

- **Compliance Auditing**: Track every decision for regulatory requirements (HIPAA, SOX)
- **Debugging**: Understand why agents made specific choices
- **Reproducibility**: Document reasoning for workflow outcomes
- **Monitoring**: Real-time visibility into multi-phase workflows

### Next Steps

- Explore [Tutorial 01: Explainability Fundamentals](../tutorials/01_explainability_fundamentals.md) for the conceptual foundation
- See [BlackBoxRecorder Demo](01_black_box_recording_demo.ipynb) for event-level recording
- Check [AgentFacts Demo](02_agent_facts_verification.ipynb) for agent identity verification
- Review [GuardRails Demo](03_guardrails_validation_traces.ipynb) for validation patterns
