# Agent Failure Analysis (Lesson 14)

**Objective:** Analyze agent failure modes, diagnose root causes, and provide remediation strategies.

**Learning Goals:**
- Classify failures into Planning, Execution, and Efficiency categories
- Detect specific failure types (wrong tools, invalid args, timeouts, etc.)
- Calculate failure rates and efficiency metrics
- Generate actionable debugging recommendations

**Prerequisites:**
- `lesson-14/react_agent_implementation.ipynb` - ReAct agent basics
- `backend/agent_evaluation.py` - Validation functions
- `lesson-14/diagrams/agent_failure_modes_taxonomy.mmd` - Failure taxonomy

**Execution Modes:**
- **DEMO mode**: 5-7 failure cases, <$0.40, ~4 min execution
- **FULL mode**: 22 cases (the 7 DEMO cases plus 15 synthetic cases), <$2, ~7 min execution

---

## Setup and Configuration

In [1]:
# Cell 1: Imports and setup
import json
import time
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

# Execution mode
MODE = "DEMO"  # Change to "FULL" for comprehensive analysis

CONFIG = {
    "DEMO": {
        "num_cases": 7,
        "estimated_cost": "$0.30-0.40",
        "estimated_time": "3-5 minutes"
    },
    "FULL": {
        "num_cases": 22,  # 7 DEMO cases + 15 synthetic cases (see FULL_CASES)
        "estimated_cost": "$1.50-2.00",
        "estimated_time": "6-8 minutes"
    }
}

config = CONFIG[MODE]
print(f"🔧 Mode: {MODE}")
print(f"📊 Test Cases: {config['num_cases']}")
print(f"💰 Est. Cost: {config['estimated_cost']}")
print(f"⏱️  Est. Time: {config['estimated_time']}")

🔧 Mode: DEMO
📊 Test Cases: 7
💰 Est. Cost: $0.30-0.40
⏱️  Est. Time: 3-5 minutes


## Failure Classification System

Implement functions to classify and diagnose agent failures.

In [2]:
# Cell 2: Failure classification

@dataclass
class FailureAnalysis:
    """Results of failure mode analysis."""
    category: str  # PLANNING, EXECUTION, EFFICIENCY
    specific_type: str  # e.g., WRONG_TOOL, TIMEOUT, EXCESSIVE_STEPS
    severity: str  # LOW, MEDIUM, HIGH, CRITICAL
    description: str
    root_cause: str
    remediation: list[str]
    metrics: dict = field(default_factory=dict)

class FailureModeAnalyzer:
    """Analyzer for agent failure modes."""
    
    def __init__(self, tools: dict = None):
        self.tools = tools or {}
    
    def classify_failure(self, result: dict, expected: dict = None) -> str:
        """Classify a result into its main category.

        Args:
            result: Agent execution result with "trajectory" and "metrics" keys
            expected: Expected output (currently unused)

        Returns:
            Category: PLANNING, EXECUTION, EFFICIENCY, or SUCCESS
        """
        metrics = result.get("metrics", {})

        # Inspect error observations first. Only two patterns are recognized:
        # "not found" (wrong tool, a planning error) and "timeout" (an execution
        # error). Any other error text, such as "503" or a type error, falls
        # through to the completion check below and is reported as PLANNING.
        if metrics.get("errors", 0) > 0:
            for entry in result.get("trajectory", []):
                content = entry.get("content", "")
                if entry.get("type") == "observation" and "Error" in content:
                    if "not found" in content.lower():
                        return "PLANNING"  # Wrong tool selection
                    elif "timeout" in content.lower():
                        return "EXECUTION"  # Execution failure

        # An unfinished task with no recognized error counts as a planning failure
        if not metrics.get("completed", False):
            return "PLANNING"

        # Fixed step threshold, independent of optimal_steps: runs of 5 steps
        # or fewer are reported as SUCCESS even if they repeat actions
        if metrics.get("steps", 0) > 5:
            return "EFFICIENCY"

        return "SUCCESS"
    
    def diagnose_planning_failure(self, result: dict) -> list[str]:
        """Diagnose specific planning failure types."""
        issues = []
        trajectory = result.get("trajectory", [])
        
        for entry in trajectory:
            if entry.get("type") == "action":
                tool = entry.get("tool", "")
                args = entry.get("args", {})
                
                # Check if tool exists
                if tool not in self.tools:
                    issues.append(f"WRONG_TOOL: {tool} not available")
                
                # Check for type errors in args
                for key, value in args.items():
                    if isinstance(value, str) and value.isdigit():
                        issues.append(f"INVALID_ARG_TYPE: {key} should be int, got string")
        
        # Check for incomplete plans
        if result.get("metrics", {}).get("steps", 0) == 0:
            issues.append("INCOMPLETE_PLAN: No steps executed")
        
        return issues
    
    def diagnose_execution_failure(self, result: dict) -> list[str]:
        """Diagnose specific execution failure types."""
        issues = []
        
        for obs in result.get("observations", []):
            content = obs.get("observation", "").lower()
            
            if "timeout" in content:
                issues.append("TIMEOUT: Tool execution exceeded time limit")
            elif "503" in content or "unavailable" in content:
                issues.append("SERVICE_UNAVAILABLE: External service down")
            elif "429" in content or "rate limit" in content:
                issues.append("RATE_LIMIT: API rate limit exceeded")
            elif "invalid response" in content or "parse" in content:
                issues.append("INVALID_RESPONSE: Unexpected response format")
        
        return issues
    
    def diagnose_efficiency_issue(self, result: dict, optimal_steps: int = 2) -> list[str]:
        """Diagnose efficiency issues."""
        issues = []
        
        actual_steps = result.get("metrics", {}).get("steps", 0)
        
        # Excessive steps
        if actual_steps > optimal_steps * 2:
            issues.append(f"EXCESSIVE_STEPS: {actual_steps} steps (optimal: {optimal_steps})")
        
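        # Detect redundant actions. Calls are counted per tool name only, so
        # repeated calls with different arguments are flagged as well.
        action_counts = Counter()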
        for entry in result.get("trajectory", []):
            if entry.get("type") == "action":
                tool = entry.get("tool")
                action_counts[tool] += 1
        
        for tool, count in action_counts.items():
            if count > 1:
                issues.append(f"REDUNDANT_ACTION: {tool} called {count} times")
        
        return issues
    
    def analyze(self, result: dict, expected: dict = None, optimal_steps: int = 2) -> FailureAnalysis:
        """Complete failure analysis.
        
        Args:
            result: Agent execution result
            expected: Expected output (optional)
            optimal_steps: Optimal plan length
        
        Returns:
            FailureAnalysis object with diagnosis and remediation
        """
        category = self.classify_failure(result, expected)
        
        if category == "SUCCESS":
            return FailureAnalysis(
                category="SUCCESS",
                specific_type="NONE",
                severity="LOW",
                description="Task completed successfully",
                root_cause="N/A",
                remediation=[],
                metrics=result.get("metrics", {})
            )
        
        # Diagnose specific issues
        if category == "PLANNING":
            issues = self.diagnose_planning_failure(result)
            specific_type = issues[0].split(":")[0] if issues else "UNKNOWN"
            remediation = [
                "Add validator agent before execution",
                "Improve prompt with few-shot examples",
                "Add schema validation for tool arguments",
                "Use LLM-as-judge for goal alignment check"
            ]
            root_cause = "LLM reasoning error, insufficient context, or prompt ambiguity"
            
        elif category == "EXECUTION":
            issues = self.diagnose_execution_failure(result)
            specific_type = issues[0].split(":")[0] if issues else "UNKNOWN"
            remediation = [
                "Implement retry with exponential backoff",
                "Add fallback tools for critical services",
                "Use circuit breaker pattern",
                "Add graceful degradation with cached results"
            ]
            root_cause = "External service failure, network issues, or resource constraints"
            
        else:  # EFFICIENCY
            issues = self.diagnose_efficiency_issue(result, optimal_steps)
            specific_type = issues[0].split(":")[0] if issues else "UNKNOWN"
            remediation = [
                "Optimize plan to reduce redundant actions",
                "Add step count budget enforcement",
                "Reorder actions for better efficiency",
                "Cache repeated tool calls"
            ]
            root_cause = "Suboptimal planning or lack of optimization"
        
        # Determine severity. Planning failures other than WRONG_TOOL
        # (for example INVALID_ARG_TYPE) fall through to LOW.
        if category == "PLANNING" and "WRONG_TOOL" in specific_type:
            severity = "HIGH"
        elif category == "EXECUTION":
            severity = "CRITICAL"
        elif category == "EFFICIENCY":
            severity = "MEDIUM"
        else:
            severity = "LOW"
        
        description = "; ".join(issues) if issues else "Unknown failure"
        
        return FailureAnalysis(
            category=category,
            specific_type=specific_type,
            severity=severity,
            description=description,
            root_cause=root_cause,
            remediation=remediation,
            metrics=result.get("metrics", {})
        )

print("✅ FailureModeAnalyzer class defined")

✅ FailureModeAnalyzer class defined
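
`classify_failure` recognizes only "not found" and "timeout" in error observations, so an observation such as "Error 503: Service unavailable" reaches the completion check and is labeled PLANNING. One possible alternative is a table-driven lookup. The sketch below is an assumption, not part of the notebook: `ERROR_RULES`, `classify_error_text`, and `classify_trajectory` are hypothetical names, and the keyword lists are guesses based on the error strings used in this lesson.

```python
from typing import Optional

# Hypothetical keyword table (an assumption): the first matching rule wins.
ERROR_RULES = [
    ("PLANNING", ("not found", "expects")),
    ("EXECUTION", ("timeout", "503", "unavailable", "429", "rate limit")),
]


def classify_error_text(content: str) -> Optional[str]:
    """Return the category of an error observation, or None if it is not an error."""
    lowered = content.lower()
    # Skip non-error text so that "Found 503 recipes" is not mistaken for HTTP 503.
    if "error" not in lowered:
        return None
    for category, keywords in ERROR_RULES:
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def classify_trajectory(trajectory: list) -> Optional[str]:
    """Classify the first recognized error observation in a trajectory."""
    for entry in trajectory:
        if entry.get("type") != "observation":
            continue
        category = classify_error_text(entry.get("content", ""))
        if category is not None:
            return category
    return None


print(classify_error_text("Error 503: Service unavailable"))      # EXECUTION
print(classify_error_text("Error: Tool 'search_web' not found"))  # PLANNING
```

Keeping the keywords in one table makes it easier to add a pattern when a new error string shows up, at the cost of still depending on substring matching.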


## Test Cases: Failure Scenarios

Define diverse failure scenarios for analysis.

In [3]:
# Cell 3: Test failure cases

# Simulated agent results with different failure modes
DEMO_CASES = [
    {
        "id": "planning_001",
        "query": "Find vegan recipes",
        "result": {
            "answer": "No results found",
            "trajectory": [
                {"type": "thought", "content": "I should search for recipes", "step": 0},
                {"type": "action", "tool": "search_web", "args": {"query": "vegan recipes"}, "step": 0},
                {"type": "observation", "content": "Error: Tool 'search_web' not found. Available: ['search_recipes', 'get_recipe_details']", "step": 0}
            ],
            "observations": [{"step": 0, "observation": "Error: Tool not found", "status": "error"}],
            "metrics": {"steps": 1, "completed": False, "errors": 1, "execution_time": 2.1}
        },
        "expected_category": "PLANNING",
        "expected_type": "WRONG_TOOL",
        "optimal_steps": 1
    },
    {
        "id": "planning_002",
        "query": "Get recipe with ID 5",
        "result": {
            "answer": "Failed to retrieve recipe",
            "trajectory": [
                {"type": "action", "tool": "get_recipe_details", "args": {"recipe_id": "5"}, "step": 0},
                {"type": "observation", "content": "Error: recipe_id expects int, got str", "step": 0}
            ],
            "observations": [{"step": 0, "observation": "Type error", "status": "error"}],
            "metrics": {"steps": 1, "completed": False, "errors": 1, "execution_time": 1.5}
        },
        "expected_category": "PLANNING",
        "expected_type": "INVALID_ARG_TYPE",
        "optimal_steps": 1
    },
    {
        "id": "execution_001",
        "query": "Search for pasta recipes",
        "result": {
            "answer": "Search failed due to timeout",
            "trajectory": [
                {"type": "action", "tool": "search_recipes", "args": {"ingredients": ["pasta"]}, "step": 0},
                {"type": "observation", "content": "Error: Connection timeout after 30s", "step": 0}
            ],
            "observations": [{"step": 0, "observation": "timeout", "status": "error"}],
            "metrics": {"steps": 1, "completed": False, "errors": 1, "execution_time": 30.5}
        },
        "expected_category": "EXECUTION",
        "expected_type": "TIMEOUT",
        "optimal_steps": 1
    },
    {
        "id": "execution_002",
        "query": "Find Italian recipes",
        "result": {
            "answer": "Service temporarily unavailable",
            "trajectory": [
                {"type": "action", "tool": "search_recipes", "args": {"cuisine": "Italian"}, "step": 0},
                {"type": "observation", "content": "Error 503: Service unavailable", "step": 0}
            ],
            "observations": [{"step": 0, "observation": "503 unavailable", "status": "error"}],
            "metrics": {"steps": 1, "completed": False, "errors": 1, "execution_time": 5.2}
        },
        "expected_category": "EXECUTION",
        "expected_type": "SERVICE_UNAVAILABLE",
        "optimal_steps": 1
    },
    {
        "id": "efficiency_001",
        "query": "Find keto recipes",
        "result": {
            "answer": "Found 2 keto recipes",
            "trajectory": [
                {"type": "action", "tool": "search_recipes", "args": {}, "step": 0},
                {"type": "observation", "content": "Found 500 recipes", "step": 0},
                {"type": "action", "tool": "search_recipes", "args": {"dietary_restrictions": ["keto"]}, "step": 1},
                {"type": "observation", "content": "Found 2 recipes", "step": 1},
                {"type": "action", "tool": "search_recipes", "args": {"dietary_restrictions": ["keto"]}, "step": 2},
                {"type": "observation", "content": "Found 2 recipes", "step": 2}
            ],
            "observations": [{"step": i, "observation": "Success", "status": "success"} for i in range(3)],
            "metrics": {"steps": 3, "completed": True, "errors": 0, "execution_time": 8.5}
        },
        "expected_category": "EFFICIENCY",
        "expected_type": "REDUNDANT_ACTION",
        "optimal_steps": 1
    },
    {
        "id": "efficiency_002",
        "query": "Get Thai recipe details",
        "result": {
            "answer": "Found Thai Green Curry",
            "trajectory": [
                {"type": "action", "tool": "search_recipes", "args": {"cuisine": "Thai"}, "step": i}
                for i in range(8)
            ],
            "observations": [{"step": i, "observation": "Success", "status": "success"} for i in range(8)],
            "metrics": {"steps": 8, "completed": True, "errors": 0, "execution_time": 15.2}
        },
        "expected_category": "EFFICIENCY",
        "expected_type": "EXCESSIVE_STEPS",
        "optimal_steps": 2
    },
    {
        "id": "success_001",
        "query": "Find gluten-free recipes",
        "result": {
            "answer": "Found 1 gluten-free recipe: Gluten-Free Pizza",
            "trajectory": [
                {"type": "action", "tool": "search_recipes", "args": {"dietary_restrictions": ["gluten-free"]}, "step": 0},
                {"type": "observation", "content": "Found 1 recipe", "step": 0}
            ],
            "observations": [{"step": 0, "observation": "Success", "status": "success"}],
            "metrics": {"steps": 1, "completed": True, "errors": 0, "execution_time": 2.3}
        },
        "expected_category": "SUCCESS",
        "expected_type": "NONE",
        "optimal_steps": 1
    }
]

# Extended cases for FULL mode
FULL_CASES = DEMO_CASES + [
    # Additional planning failures
    {"id": f"planning_{i:03d}", "query": f"Test query {i}", "result": {"metrics": {"steps": 0, "completed": False, "errors": 1}}, "expected_category": "PLANNING", "optimal_steps": 1}
    for i in range(3, 8)
] + [
    # Additional execution failures
    {"id": f"execution_{i:03d}", "query": f"Test query {i}", "result": {"observations": [{"observation": "rate limit"}], "metrics": {"steps": 1, "completed": False, "errors": 1}}, "expected_category": "EXECUTION", "optimal_steps": 1}
    for i in range(3, 8)
] + [
    # Additional efficiency issues
    {"id": f"efficiency_{i:03d}", "query": f"Test query {i}", "result": {"trajectory": [], "metrics": {"steps": 10, "completed": True, "errors": 0}}, "expected_category": "EFFICIENCY", "optimal_steps": 2}
    for i in range(3, 8)
]

test_cases = DEMO_CASES if MODE == "DEMO" else FULL_CASES

print(f"📋 Loaded {len(test_cases)} test cases")
print(f"\nCategories:")
categories = Counter([c.get("expected_category") for c in test_cases])
for cat, count in categories.items():
    print(f"  {cat}: {count}")

📋 Loaded 7 test cases

Categories:
  PLANNING: 2
  EXECUTION: 2
  EFFICIENCY: 2
  SUCCESS: 1
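
Case `planning_002` passes `recipe_id` as the string `"5"`, and case `planning_001` calls a tool that does not exist. A pre-execution check along the following lines could catch both before any tool runs. This is a minimal sketch under stated assumptions: `TOOL_SCHEMAS` and `validate_action` are hypothetical, the argument types are inferred from the test cases rather than from the real tool signatures, and `UNKNOWN_ARG` is an illustrative label that is not part of the lesson's taxonomy.

```python
# Hypothetical schemas (an assumption): expected Python type per argument name.
TOOL_SCHEMAS = {
    "search_recipes": {
        "ingredients": list,
        "cuisine": str,
        "dietary_restrictions": list,
    },
    "get_recipe_details": {"recipe_id": int},
}


def validate_action(tool: str, args: dict) -> list:
    """Return a list of problems; an empty list means the call looks valid."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return [f"WRONG_TOOL: {tool} not available"]

    problems = []
    for name, value in args.items():
        expected_type = schema.get(name)
        if expected_type is None:
            # Illustrative label, not part of the lesson's failure taxonomy.
            problems.append(f"UNKNOWN_ARG: {name}")
        elif not isinstance(value, expected_type):
            problems.append(
                f"INVALID_ARG_TYPE: {name} should be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


print(validate_action("get_recipe_details", {"recipe_id": "5"}))
print(validate_action("search_web", {"query": "vegan recipes"}))
print(validate_action("search_recipes", {"cuisine": "Thai"}))
```

Unlike the `isdigit()` heuristic in `diagnose_planning_failure`, this check compares each argument against a declared type, so it does not flag legitimate numeric strings for arguments that are meant to be strings.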


## Execute Failure Analysis

Analyze all test cases and collect diagnostics.

In [4]:
# Cell 4: Execute analysis

print(f"🚀 Starting failure analysis ({MODE} mode)...\n")

TOOLS_MOCK = {
    "search_recipes": {"function": lambda **kwargs: []},
    "get_recipe_details": {"function": lambda **kwargs: {}},
    "add_to_shopping_list": {"function": lambda **kwargs: {}}
}

analyzer = FailureModeAnalyzer(tools=TOOLS_MOCK)
analyses = []
start_time = time.time()

for i, case in enumerate(test_cases, 1):
    print(f"\n{'='*80}")
    print(f"Case {i}/{len(test_cases)}: {case['id']}")
    print(f"Query: {case['query']}")
    print(f"{'='*80}")
    
    # Analyze
    analysis = analyzer.analyze(
        result=case["result"],
        optimal_steps=case.get("optimal_steps", 2)
    )
    
    analyses.append({
        "case_id": case["id"],
        "query": case["query"],
        "analysis": analysis,
        "expected": case.get("expected_category", "UNKNOWN")
    })
    
    # Display results
    print(f"\n📊 Analysis:")
    print(f"   Category: {analysis.category}")
    print(f"   Type: {analysis.specific_type}")
    print(f"   Severity: {analysis.severity}")
    print(f"   Description: {analysis.description}")
    print(f"\n🔍 Root Cause:\n   {analysis.root_cause}")
    print(f"\n💡 Remediation (top 2):")
    for j, rec in enumerate(analysis.remediation[:2], 1):
        print(f"   {j}. {rec}")
    
    # Verify accuracy
    expected = case.get("expected_category")
    match = "✅" if analysis.category == expected else "❌"
    print(f"\n{match} Expected: {expected}, Got: {analysis.category}")

total_time = time.time() - start_time

print(f"\n\n{'='*80}")
print(f"✅ Analysis Complete")
print(f"{'='*80}")
print(f"Total cases: {len(analyses)}")
print(f"Total time: {total_time:.2f}s")

🚀 Starting failure analysis (DEMO mode)...


Case 1/7: planning_001
Query: Find vegan recipes

📊 Analysis:
   Category: PLANNING
   Type: WRONG_TOOL
   Severity: HIGH
   Description: WRONG_TOOL: search_web not available

🔍 Root Cause:
   LLM reasoning error, insufficient context, or prompt ambiguity

💡 Remediation (top 2):
   1. Add validator agent before execution
   2. Improve prompt with few-shot examples

✅ Expected: PLANNING, Got: PLANNING

Case 2/7: planning_002
Query: Get recipe with ID 5

📊 Analysis:
   Category: PLANNING
   Type: INVALID_ARG_TYPE
   Severity: LOW
   Description: INVALID_ARG_TYPE: recipe_id should be int, got string

🔍 Root Cause:
   LLM reasoning error, insufficient context, or prompt ambiguity

💡 Remediation (top 2):
   1. Add validator agent before execution
   2. Improve prompt with few-shot examples

✅ Expected: PLANNING, Got: PLANNING

Case 3/7: execution_001
Query: Search for pasta recipes

📊 Analysis:
   Category: EXECUTION
 

## Generate Failure Report

Aggregate results and calculate failure rates.

In [5]:
# Cell 5: Generate report

print("📊 Failure Mode Analysis Report\n")

# Category distribution
categories = Counter([a["analysis"].category for a in analyses])
total = len(analyses)

print(f"Category Distribution:")
for cat, count in categories.most_common():
    pct = count / total * 100
    print(f"  {cat}: {count} ({pct:.1f}%)")

# Specific failure types
types = Counter([a["analysis"].specific_type for a in analyses if a["analysis"].category != "SUCCESS"])

print(f"\nTop Failure Types:")
for failure_type, count in types.most_common(5):
    print(f"  {failure_type}: {count}")

# Severity distribution
severities = Counter([a["analysis"].severity for a in analyses])

print(f"\nSeverity Distribution:")
for sev, count in sorted(severities.items(), key=lambda x: ["LOW", "MEDIUM", "HIGH", "CRITICAL"].index(x[0])):
    print(f"  {sev}: {count}")

# Classification accuracy
correct = sum(1 for a in analyses if a["analysis"].category == a["expected"])
accuracy = correct / total

print(f"\nClassification Accuracy:")
print(f"  Correct: {correct}/{total} ({accuracy:.1%})")

# Failure rates by category
planning_failures = categories.get("PLANNING", 0)
execution_failures = categories.get("EXECUTION", 0)
efficiency_issues = categories.get("EFFICIENCY", 0)
successes = categories.get("SUCCESS", 0)

print(f"\nFailure Rates:")
print(f"  Planning failure rate: {planning_failures/total:.1%}")
print(f"  Execution failure rate: {execution_failures/total:.1%}")
print(f"  Efficiency issue rate: {efficiency_issues/total:.1%}")
print(f"  Success rate: {successes/total:.1%}")

# Most common remediations
all_remediations = []
for a in analyses:
    if a["analysis"].category != "SUCCESS":
        all_remediations.extend(a["analysis"].remediation)

top_remediations = Counter(all_remediations).most_common(5)

print(f"\nTop Recommended Actions:")
for i, (action, count) in enumerate(top_remediations, 1):
    print(f"  {i}. {action} ({count} cases)")

📊 Failure Mode Analysis Report

Category Distribution:
  PLANNING: 3 (42.9%)
  SUCCESS: 2 (28.6%)
  EXECUTION: 1 (14.3%)
  EFFICIENCY: 1 (14.3%)

Top Failure Types:
  WRONG_TOOL: 1
  INVALID_ARG_TYPE: 1
  TIMEOUT: 1
  UNKNOWN: 1
  EXCESSIVE_STEPS: 1

Severity Distribution:
  LOW: 4
  MEDIUM: 1
  HIGH: 1
  CRITICAL: 1

Classification Accuracy:
  Correct: 5/7 (71.4%)

Failure Rates:
  Planning failure rate: 42.9%
  Execution failure rate: 14.3%
  Efficiency issue rate: 14.3%
  Success rate: 28.6%

Top Recommended Actions:
  1. Add validator agent before execution (3 cases)
  2. Improve prompt with few-shot examples (3 cases)
  3. Add schema validation for tool arguments (3 cases)
  4. Use LLM-as-judge for goal alignment check (3 cases)
  5. Implement retry with exponential backoff (1 cases)
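
The "Implement retry with exponential backoff" recommendation can be sketched as follows. Everything here is an assumption for illustration: `TransientToolError` and `call_with_backoff` are hypothetical names that do not appear in the notebook or in `backend/agent_evaluation.py`, and the delay values are arbitrary.

```python
import random
import time


class TransientToolError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 503, 429)."""


def call_with_backoff(func, *, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call func(); on TransientToolError, wait and retry with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientToolError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the failure to the caller
            # The delay cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying at the same instant
            sleep(random.uniform(0, delay))


# Usage: a simulated tool that fails twice, then succeeds
attempts = {"count": 0}


def flaky_search():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientToolError("Error 503: Service unavailable")
    return "ok"


# A no-op sleep keeps the demo fast; omit the argument to wait for real
print(call_with_backoff(flaky_search, sleep=lambda seconds: None))
print(f"attempts: {attempts['count']}")
```

Retrying only a dedicated exception type keeps planning errors, such as a wrong tool name, from being retried pointlessly.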


## Save Results

Export analysis results for dashboard integration.

In [6]:
# Cell 6: Save results

output_dir = Path("lesson-14/results")
output_dir.mkdir(parents=True, exist_ok=True)

# Prepare output data
output_data = {
    "metadata": {
        "mode": MODE,
        "num_cases": len(analyses),
        "execution_date": time.strftime("%Y-%m-%d %H:%M:%S"),
        "total_time": total_time
    },
    "summary": {
        "planning_failure_rate": planning_failures / total,
        "execution_failure_rate": execution_failures / total,
        "efficiency_issue_rate": efficiency_issues / total,
        "success_rate": successes / total,
        "classification_accuracy": accuracy
    },
    "category_distribution": dict(categories),
    "failure_types": dict(types),
    "severity_distribution": dict(severities),
    "top_remediations": [r[0] for r in top_remediations],
    "detailed_analyses": [
        {
            "case_id": a["case_id"],
            "query": a["query"],
            "category": a["analysis"].category,
            "specific_type": a["analysis"].specific_type,
            "severity": a["analysis"].severity,
            "description": a["analysis"].description,
            "root_cause": a["analysis"].root_cause,
            "remediation": a["analysis"].remediation
        }
        for a in analyses[:10]  # First 10 for brevity
    ]
}

output_path = output_dir / f"agent_failure_analysis_{MODE.lower()}.json"

with open(output_path, "w", encoding="utf-8") as f:
    json.dump(output_data, f, indent=2, ensure_ascii=False)

print(f"✅ Results saved to: {output_path}")
print(f"📁 File size: {output_path.stat().st_size / 1024:.1f} KB")

# Save agent_performance.json for dashboard
dashboard_data = {
    "version": "1.0",
    "created": time.strftime("%Y-%m-%d"),
    "mode": MODE,
    "performance_metrics": {
        "overall_success_rate": successes / total,
        "planning_accuracy": 1 - (planning_failures / total),
        "execution_reliability": 1 - (execution_failures / total),
        "efficiency_score": 1 - (efficiency_issues / total),
        "classification_accuracy": accuracy
    },
    "failure_breakdown": {
        "planning_failures": planning_failures,
        "execution_failures": execution_failures,
        "efficiency_issues": efficiency_issues,
        "successes": successes
    },
    "top_issues": [f"{t}: {c}" for t, c in types.most_common(5)],
    "recommended_actions": [r[0] for r in top_remediations[:5]]
}

dashboard_path = output_dir / "agent_performance.json"
with open(dashboard_path, "w", encoding="utf-8") as f:
    json.dump(dashboard_data, f, indent=2, ensure_ascii=False)

print(f"✅ Dashboard data saved to: {dashboard_path}")
print("\n🎉 Notebook execution complete!")

✅ Results saved to: lesson-14/results/agent_failure_analysis_demo.json
📁 File size: 4.3 KB
✅ Dashboard data saved to: lesson-14/results/agent_performance.json

🎉 Notebook execution complete!


## Validation

Verify analysis quality and accuracy.

In [7]:
# Cell 7: Validation

print("🔍 Validating analysis quality...\n")

checks = [
    ("All cases analyzed", len(analyses) == len(test_cases)),
    ("Classification accuracy ≥70%", accuracy >= 0.7),
    ("At least 3 failure categories detected", len(categories) >= 3),
    ("Severity levels assigned", len(severities) > 0),
    ("Remediations provided for failures", len(all_remediations) > 0),
    ("Execution time reasonable", total_time < 300),  # <5 minutes
    ("Results saved successfully", output_path.exists() and dashboard_path.exists())
]

passed = 0
for check_name, result in checks:
    status = "✅" if result else "❌"
    print(f"{status} {check_name}")
    if result:
        passed += 1

print(f"\n📊 Validation: {passed}/{len(checks)} checks passed ({passed/len(checks)*100:.1f}%)")

if passed == len(checks):
    print("\n🎉 All validation checks passed!")
elif passed >= len(checks) * 0.8:
    print("\n⚠️  Most checks passed, minor issues detected")
else:
    print("\n❌ Multiple validation failures - review results")

print("\n" + "="*80)
print("Analysis complete. Results saved to lesson-14/results/")
print("="*80)

🔍 Validating analysis quality...

✅ All cases analyzed
✅ Classification accuracy ≥70%
✅ At least 3 failure categories detected
✅ Severity levels assigned
✅ Remediations provided for failures
✅ Execution time reasonable
✅ Results saved successfully

📊 Validation: 7/7 checks passed (100.0%)

🎉 All validation checks passed!

Analysis complete. Results saved to lesson-14/results/
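
The accuracy check is a single aggregate number, so it does not show which categories are confused with each other. A per-category breakdown may help; the sketch below is one way to compute it. The `pairs` list is illustrative data chosen to be consistent with the DEMO report totals (5 of 7 correct), and `per_category_scores` is a hypothetical helper. In the notebook, the pairs could be built as `[(a["expected"], a["analysis"].category) for a in analyses]`.

```python
from collections import Counter

# Illustrative (expected, predicted) pairs, consistent with the DEMO report totals
pairs = [
    ("PLANNING", "PLANNING"),
    ("PLANNING", "PLANNING"),
    ("EXECUTION", "EXECUTION"),
    ("EXECUTION", "PLANNING"),
    ("EFFICIENCY", "SUCCESS"),
    ("EFFICIENCY", "EFFICIENCY"),
    ("SUCCESS", "SUCCESS"),
]


def per_category_scores(pairs):
    """Compute precision and recall per label from (expected, predicted) pairs."""
    confusion = Counter(pairs)
    labels = sorted({label for pair in pairs for label in pair})
    scores = {}
    for label in labels:
        true_positives = confusion[(label, label)]
        predicted = sum(n for (_, pred), n in confusion.items() if pred == label)
        actual = sum(n for (exp, _), n in confusion.items() if exp == label)
        scores[label] = {
            "precision": true_positives / predicted if predicted else 0.0,
            "recall": true_positives / actual if actual else 0.0,
        }
    return scores


for label, score in per_category_scores(pairs).items():
    print(f"{label}: precision={score['precision']:.2f} recall={score['recall']:.2f}")
```

Low recall for a category means the classifier misses real cases of that type, while low precision means other failures are being mislabeled as that category.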
