# NoETL Execution Validation Notebook

**Validate a playbook execution by inspecting server APIs, database tables, and local logs.**

This notebook has numbered cells to help diagnose where the distributed loop execution chain breaks:

## Cell Reference Guide:
- **Cell 1-3**: Configuration and setup
- **Cell 4-5**: Basic execution and event validation  
- **Cell 6**: LOOP EVENT VALIDATION - Check if loop events are emitted
- **Cell 6B**: STEP RESULT TRACKING - Verify step_name.result patterns and loop aggregation
- **Cell 7**: CHAIN ANALYSIS - Track complete loopâ†’childâ†’completion flow
- **Cell 8**: DATABASE VALIDATION - Query DB directly for loop events
- **Cell 9-10**: Manual intervention for child completion and loop completion
- **Cell 11**: TROUBLESHOOTING GUIDE - Common issues and fixes
- **Cell 12**: FINAL EXECUTION SUMMARY - Overall health check

## What Each Cell Tells You:
- Uses HTTP API to fetch execution summary and events
- Queries Postgres "noetl.error_log" for recent errors  
- Reads final table "weather_alert_summary" for inserted rows
- Analyzes distributed loop completion chain step-by-step
- Validates step result patterns (step_name.result) and aggregation logic
- Tracks loop execution and confirms last returned items for aggregated results
- Provides specific diagnostics for empty weather_alert_summary table

## Key Validation Points:
- **Step Results**: Ensures child executions return data using `return: {{ step_name.result }}` pattern
- **Loop Aggregation**: Verifies loop completion mechanism properly aggregates child results
- **Template Resolution**: Checks that `{{ city_loop.result }}` resolves correctly in subsequent steps
- **Data Flow**: Tracks weather data from child executions through aggregation to final storage

In [1]:
# Cell 1: Configuration
import os, json, time, pathlib

# Server
HOST = os.environ.get('NOETL_HOST', 'localhost')
PORT = int(os.environ.get('NOETL_PORT', '8082'))
BASE = f'http://{HOST}:{PORT}/api'

# Database parameters â€” updated with correct connection details
PGHOST = 'localhost'
PGPORT = 30543
PGUSER = 'demo'
PGPASSWORD = 'demo'
PGDATABASE = 'demo_noetl'

# If you already know the execution id, set it here (as string or int)
EXECUTION_ID = os.environ.get('NOETL_LAST_EXECUTION_ID') or ''

# Resolve logs directory robustly when running from notebooks/
LOGS_DIR_ENV = os.environ.get('NOETL_LOG_DIR')
LOGS_DIR_CANDIDATES = []
if LOGS_DIR_ENV: LOGS_DIR_CANDIDATES.append(pathlib.Path(LOGS_DIR_ENV))
LOGS_DIR_CANDIDATES += [pathlib.Path('logs'), pathlib.Path('../logs'), pathlib.Path('../../logs')]
LOGS_DIR = next((p for p in LOGS_DIR_CANDIDATES if p.exists()), pathlib.Path('logs'))

print('Server:', BASE)
print('DB (updated):', PGHOST, PGPORT, PGUSER, PGDATABASE)
print('Logs dir:', str(LOGS_DIR), 'exists:', LOGS_DIR.exists())

Server: http://localhost:8082/api
DB (updated): localhost 30543 demo demo_noetl
Logs dir: ../logs exists: True


In [3]:
# Cell 2: Helper HTTP GET with stdlib fallback
def http_get_json(url: str):
    try:
        import requests  # type: ignore
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        return r.json()
    except Exception:
        import urllib.request, urllib.error
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.loads(resp.read().decode('utf-8'))
        except Exception as e:
            print('HTTP error:', e)
            return None

def pretty(obj):
    print(json.dumps(obj, indent=2, ensure_ascii=False))

In [2]:
# Cell 3: Try to infer EXECUTION_ID if not provided
if not EXECUTION_ID:
    # 1) Try status.json in repo root
    st = pathlib.Path('status.json')
    if st.exists():
        try:
            data = json.loads(st.read_text(encoding='utf-8'))
            eid = data.get('id') or data.get('execution_id')
            if eid:
                EXECUTION_ID = str(eid)
        except Exception:
            pass

# 2) Try server /executions to pick the most recent
if not EXECUTION_ID:
    ex_list = http_get_json(f'{BASE}/executions')
    if isinstance(ex_list, list) and ex_list:
        # Items are dicts; keep first
        eid = ex_list[0].get('execution_id') or ex_list[0].get('id')
        if eid:
            EXECUTION_ID = str(eid)

print('EXECUTION_ID =', EXECUTION_ID or '(set me)')

NameError: name 'http_get_json' is not defined

In [70]:
# Cell 4: Fetch execution summary
if EXECUTION_ID:
    summary = http_get_json(f'{BASE}/executions/{EXECUTION_ID}')
    print('Execution summary:')
    pretty(summary)
    # Extract DB connection from the execution's workload (execution_start event)
    try:
        evs = (summary or {}).get('events') or []
        start = next((e for e in evs if e.get('event_type')=='execution_start'), None)
        if start:
            wl = ((start.get('input_context') or {}).get('workload')) or {}
            PGHOST = wl.get('pg_host') or PGHOST
            PGPORT = int(wl.get('pg_port') or PGPORT or 0)
            PGUSER = wl.get('pg_user') or PGUSER
            PGPASSWORD = wl.get('pg_password') or PGPASSWORD
            PGDATABASE = wl.get('pg_db') or PGDATABASE
    except Exception as e:
        print('Failed to extract DB params from workload:', e)
    print('DB resolved:', PGHOST or '(unset)', PGPORT or '(unset)', PGUSER or '(unset)', PGDATABASE or '(unset)')
else:
    print('Please set EXECUTION_ID above.')

Execution summary:
{
  "id": "222504588622692352",
  "playbook_id": "",
  "playbook_name": "Unknown",
  "status": "completed",
  "start_time": "2025-09-05T18:53:47.866730",
  "end_time": "2025-09-05T18:53:51.270330",
  "duration": 3.4036,
  "progress": 100,
  "result": {
    "id": "548ea210-91fe-453f-af42-93cafa7ee842",
    "status": "success",
    "data": {
      "global_alert": false,
      "summary": {
        "alert_cities": [],
        "count": 0
      }
    }
  },
  "error": null,
  "events": [
    {
      "event_id": "222504588639469568",
      "event_type": "execution_start",
      "node_id": "222504588639469568",
      "node_name": "weather_loop_example",
      "node_type": "playbook",
      "status": "in_progress",
      "duration": 0.0,
      "timestamp": "2025-09-05T18:53:47.866730",
      "input_context": {
        "path": "examples/weather/weather_loop_example",
        "version": "0.1.0",
        "workload": {
          "jobId": "{{ job.uuid }}",
          "state": "read

In [71]:
# Cell 6: LOOP EVENT VALIDATION - Check if end_loop and execution_complete events exist
# DIAGNOSTIC: This identifies missing events that indicate where the chain breaks

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    print("LOOP EVENT VALIDATION")
    print("=" * 50)
    
    # Check for end_loop events
    end_loop_events = [e for e in ev if e.get('event_type') == 'end_loop']
    print(f"End_loop events found: {len(end_loop_events)}")
    
    if len(end_loop_events) == 0:
        print("ISSUE: No end_loop events found - loop completion mechanism not triggered!")
        print("   â†’ Loop iterations may have completed but didn't aggregate results")
        print("   â†’ Child executions may be missing execution_complete events")
        print("   â†’ Run Cell 7 for detailed chain analysis")
    else:
        for e in end_loop_events:
            node_name = e.get('node_name')
            status = e.get('status') 
            result_available = bool(e.get('output_result'))
            print(f"   Loop: {node_name}, Status: {status}, Has result: {result_available}")
    
    # Check for execution_complete events from child executions
    execution_complete_events = [e for e in ev if e.get('event_type') == 'execution_complete']
    print(f"\nExecution_complete events found: {len(execution_complete_events)}")
    
    if len(execution_complete_events) == 0:
        print("ISSUE: No execution_complete events found!")
        print("   â†’ Child executions may have finished but didn't emit completion events")
        print("   â†’ This prevents loop completion mechanism from triggering")
        print("   â†’ Run Cell 9 to manually trigger child completion")
    else:
        for e in execution_complete_events:
            exec_id = e.get('execution_id')
            status = e.get('status')
            return_value = e.get('output_result')
            print(f"   Execution {exec_id}: {status}")
            if return_value:
                print(f"      Return value: {json.dumps(return_value, indent=2)}")
    
    # Check for city_loop specific completion
    city_loop_completed = [e for e in ev 
                          if e.get('event_type') == 'action_completed' 
                          and e.get('node_name') == 'city_loop']
    
    print(f"\nCity_loop completion events: {len(city_loop_completed)}")
    
    if len(city_loop_completed) == 0:
        print("ISSUE: No city_loop completion events found!")
        print("   â†’ Loop didn't complete successfully")
        print("   â†’ Aggregated results not available for next steps")
        print("   â†’ Run Cell 10 to manually trigger loop completion")
    else:
        for e in city_loop_completed:
            status = e.get('status')
            result = e.get('output_result')
            print(f"   Status: {status}")
            if result:
                print(f"   Aggregated result: {json.dumps(result, indent=2)}")
    
    # Overall assessment
    print(f"\nOVERALL ASSESSMENT:")
    if len(end_loop_events) > 0 and len(execution_complete_events) > 0 and len(city_loop_completed) > 0:
        print("SUCCESS: All critical loop events found - distributed loop completed correctly")
    else:
        missing = []
        if len(execution_complete_events) == 0: missing.append("execution_complete")
        if len(end_loop_events) == 0: missing.append("end_loop") 
        if len(city_loop_completed) == 0: missing.append("city_loop completion")
        print(f"ISSUES: Missing events: {', '.join(missing)}")
        print("        â†’ This explains why weather_alert_summary table is empty")
        print("        â†’ Use manual intervention cells (9-10) to fix")

else:
    print('Events not loaded - cannot validate loop events.')

LOOP EVENT VALIDATION
End_loop events found: 1
   Loop: city_loop, Status: TRACKING, Has result: False

Execution_complete events found: 0
ISSUE: No execution_complete events found!
   â†’ Child executions may have finished but didn't emit completion events
   â†’ This prevents loop completion mechanism from triggering
   â†’ Run Cell 9 to manually trigger child completion

City_loop completion events: 3
   Status: COMPLETED
   Status: COMPLETED
   Status: COMPLETED

OVERALL ASSESSMENT:
ISSUES: Missing events: execution_complete
        â†’ This explains why weather_alert_summary table is empty
        â†’ Use manual intervention cells (9-10) to fix


In [72]:
# Quick inline validation:
# - Count loop iterations for city_loop
# - Ensure at least one COMPLETED action exists
if events and isinstance(events, dict):
    ev = events.get('events') or []
    loop_iters = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    completed = [e for e in ev if (e.get('status') or '').lower() in ('completed','success')]
    print('city_loop.iterations =', len(loop_iters))
    print('completed events =', len(completed))
else:
    print('Events not loaded.')


city_loop.iterations = 3
completed events = 16


In [73]:
# Cell 6B: STEP RESULT TRACKING - Verify step_name.result patterns and loop aggregation
# DIAGNOSTIC: Ensures steps return proper .result values and loops aggregate correctly

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    print("STEP RESULT TRACKING AND LOOP AGGREGATION ANALYSIS")
    print("=" * 60)
    
    # 1. Track child step results (should follow step_name.result pattern)
    print("Step 1 - Child Step Result Analysis:")
    print("-" * 40)
    
    # Find child executions and their returned results
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    child_execution_ids = []
    for e in loop_iterations:
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
    
    print(f"Found {len(child_execution_ids)} child executions to analyze")
    
    child_results = {}
    for child_id in child_execution_ids:
        # Look for execution_complete event with return value
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        
        if child_complete:
            latest_complete = child_complete[-1]
            return_value = latest_complete.get('output_result')
            child_results[child_id] = return_value
            
            print(f"   Child {child_id}:")
            if return_value:
                print(f"      Return value: {json.dumps(return_value, indent=6)}")
                
                # Check if it follows step_name.result pattern
                if isinstance(return_value, dict):
                    step_keys = [k for k in return_value.keys() if k.endswith('.result') or '.' not in k]
                    if step_keys:
                        print(f"      Step result keys: {step_keys}")
                        for key in step_keys:
                            step_result = return_value.get(key)
                            if step_result:
                                print(f"         {key}: {json.dumps(step_result, indent=8)}")
                    else:
                        print("      WARNING: No step.result pattern found in return value")
                else:
                    print(f"      WARNING: Return value is not dict: {type(return_value)}")
            else:
                print("      ERROR: No return value found")
        else:
            print(f"   Child {child_id}: ERROR - No execution_complete event")
            child_results[child_id] = None
    
    # 2. Track loop aggregation process
    print(f"\nStep 2 - Loop Aggregation Process:")
    print("-" * 40)
    
    # Find end_loop events that should contain aggregated results
    end_loop_events = [e for e in ev if e.get('event_type') == 'end_loop' and e.get('node_name') == 'city_loop']
    
    if end_loop_events:
        for i, e in enumerate(end_loop_events):
            print(f"   End_loop event {i+1}:")
            status = e.get('status')
            result = e.get('output_result')
            
            print(f"      Status: {status}")
            if result:
                print(f"      Aggregated result: {json.dumps(result, indent=6)}")
                
                # Analyze aggregation structure
                if isinstance(result, list):
                    print(f"      Aggregation type: List with {len(result)} items")
                    for j, item in enumerate(result):
                        print(f"         Item {j+1}: {json.dumps(item, indent=8)}")
                elif isinstance(result, dict):
                    print(f"      Aggregation type: Dict with keys: {list(result.keys())}")
                    for key, value in result.items():
                        print(f"         {key}: {json.dumps(value, indent=8)}")
                else:
                    print(f"      Aggregation type: {type(result)} - {result}")
            else:
                print("      ERROR: No aggregated result found")
    else:
        print("   ERROR: No end_loop events found - aggregation not performed")
    
    # 3. Verify final city_loop result aggregation
    print(f"\nStep 3 - Final Loop Result Verification:")
    print("-" * 40)
    
    city_loop_completed = [e for e in ev 
                          if e.get('event_type') == 'action_completed' 
                          and e.get('node_name') == 'city_loop'
                          and e.get('status') == 'COMPLETED']
    
    if city_loop_completed:
        final_event = city_loop_completed[-1]
        final_result = final_event.get('output_result')
        
        print("   Final city_loop completion result:")
        if final_result:
            print(f"      Final result: {json.dumps(final_result, indent=6)}")
            
            # Verify aggregation matches child results
            print(f"      Aggregation verification:")
            if isinstance(final_result, list):
                print(f"         Expected {len(child_results)} items, got {len(final_result)}")
                if len(final_result) == len(child_results):
                    print("         SUCCESS: Aggregation count matches child count")
                else:
                    print("         ERROR: Aggregation count mismatch")
            
            # Check if results contain weather data
            weather_data_found = False
            if isinstance(final_result, list):
                for item in final_result:
                    if isinstance(item, dict):
                        # Look for weather-related fields
                        weather_fields = ['temperature', 'city', 'alert', 'weather']
                        if any(field in str(item).lower() for field in weather_fields):
                            weather_data_found = True
                            break
            
            if weather_data_found:
                print("         SUCCESS: Weather data found in aggregated result")
            else:
                print("         WARNING: No weather data detected in aggregated result")
                
        else:
            print("      ERROR: No final result found")
    else:
        print("   ERROR: No completed city_loop action_completed event")
    
    # 4. Template variable resolution check
    print(f"\nStep 4 - Template Variable Resolution:")
    print("-" * 40)
    
    # Check if subsequent steps properly reference city_loop.result
    aggregate_events = [e for e in ev if e.get('node_name') == 'aggregate_alerts_task']
    
    for e in aggregate_events:
        if e.get('event_type') == 'action_started':
            ctx = e.get('input_context', {})
            task_with = ctx.get('task', {}).get('with', {})
            alerts_param = task_with.get('alerts')
            
            print(f"   aggregate_alerts_task input:")
            print(f"      alerts parameter: {alerts_param}")
            
            # Check if it's properly resolved from city_loop.result
            if alerts_param:
                if isinstance(alerts_param, str) and 'city_loop' in alerts_param:
                    print("      WARNING: Template not resolved - still contains city_loop reference")
                elif isinstance(alerts_param, (list, dict)):
                    print("      SUCCESS: Template resolved to actual data")
                    print(f"         Data type: {type(alerts_param)}")
                    if isinstance(alerts_param, list):
                        print(f"         Item count: {len(alerts_param)}")
                else:
                    print(f"      INFO: Parameter type: {type(alerts_param)}")
            else:
                print("      ERROR: alerts parameter is empty or missing")
    
    # 5. Summary and recommendations
    print(f"\nStep 5 - Summary and Recommendations:")
    print("-" * 40)
    
    issues_found = []
    
    # Check child result pattern
    valid_child_results = sum(1 for result in child_results.values() if result is not None)
    if valid_child_results < len(child_results):
        issues_found.append(f"Missing child results: {len(child_results) - valid_child_results}/{len(child_results)}")
    
    # Check aggregation
    if not end_loop_events:
        issues_found.append("No loop aggregation performed")
    
    # Check final result
    if not city_loop_completed:
        issues_found.append("No final loop completion")
    
    if issues_found:
        print("   ISSUES FOUND:")
        for issue in issues_found:
            print(f"      â†’ {issue}")
        print("   RECOMMENDATIONS:")
        print("      â†’ Ensure child executions return data using 'return: {{ step_name.result }}' pattern")
        print("      â†’ Verify loop completion mechanism aggregates child results properly")
        print("      â†’ Check template resolution for {{ city_loop.result }} in subsequent steps")
    else:
        print("   SUCCESS: All step results and loop aggregation working correctly")

else:
    print('Events not loaded - cannot analyze step results.')

STEP RESULT TRACKING AND LOOP AGGREGATION ANALYSIS
Step 1 - Child Step Result Analysis:
----------------------------------------
Found 3 child executions to analyze
   Child 222504588958236672: ERROR - No execution_complete event
   Child 222504589000179712: ERROR - No execution_complete event
   Child 222504589050511360: ERROR - No execution_complete event

Step 2 - Loop Aggregation Process:
----------------------------------------
   End_loop event 1:
      Status: TRACKING
      ERROR: No aggregated result found

Step 3 - Final Loop Result Verification:
----------------------------------------
   Final city_loop completion result:
      ERROR: No final result found

Step 4 - Template Variable Resolution:
----------------------------------------
   aggregate_alerts_task input:
      alerts parameter: 
      ERROR: alerts parameter is empty or missing
   aggregate_alerts_task input:
      alerts parameter: 
      ERROR: alerts parameter is empty or missing
   aggregate_alerts_task inp

In [74]:
# Cell 6C: CHILD PLAYBOOK RESULT PATTERN VALIDATION
# DIAGNOSTIC: Specifically checks if child executions follow "return: {{ step_name.result }}" pattern

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    print("CHILD PLAYBOOK RESULT PATTERN VALIDATION")
    print("=" * 55)
    
    # Get child execution IDs
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    child_execution_ids = [e.get('input_context', {}).get('child_execution_id') 
                          for e in loop_iterations if e.get('input_context', {}).get('child_execution_id')]
    
    print(f"Analyzing {len(child_execution_ids)} child executions for result patterns:")
    
    for i, child_id in enumerate(child_execution_ids):
        print(f"\nChild Execution {i+1}: {child_id}")
        print("-" * 50)
        
        # Find all events for this child execution
        child_events = [e for e in ev if e.get('execution_id') == child_id]
        
        # Look for step execution patterns
        step_results = {}
        step_names = set()
        
        for e in child_events:
            if e.get('event_type') == 'action_completed' and e.get('status') == 'COMPLETED':
                node_name = e.get('node_name')
                if node_name:
                    step_names.add(node_name)
                    result = e.get('output_result')
                    if result:
                        step_results[node_name] = result
                        print(f"   Step '{node_name}' completed with result:")
                        print(f"      Result: {json.dumps(result, indent=6)}")
        
        # Check execution_complete event for proper return pattern
        execution_complete = [e for e in child_events if e.get('event_type') == 'execution_complete']
        
        if execution_complete:
            final_event = execution_complete[-1]
            return_value = final_event.get('output_result')
            
            print(f"\n   Final return value from child:")
            if return_value:
                print(f"      Return: {json.dumps(return_value, indent=6)}")
                
                # Analyze return pattern
                expected_patterns = []
                issues = []
                
                # Check if return follows step_name.result pattern
                if isinstance(return_value, dict):
                    for step_name in step_names:
                        step_result_key = f"{step_name}.result"
                        direct_key = step_name
                        
                        if step_result_key in return_value:
                            expected_patterns.append(f"âœ“ {step_result_key}")
                            print(f"      SUCCESS: Found {step_result_key} pattern")
                        elif direct_key in return_value:
                            expected_patterns.append(f"âœ“ {direct_key}")
                            print(f"      SUCCESS: Found {direct_key} pattern")
                        else:
                            issues.append(f"Missing {step_name}.result or {step_name}")
                
                # Special check for weather evaluation step
                weather_step_found = False
                weather_data_found = False
                
                for key, value in return_value.items():
                    if 'weather' in key.lower() or 'evaluate' in key.lower():
                        weather_step_found = True
                        print(f"      Weather step result found: {key}")
                        
                        # Check if it contains actual weather data
                        if isinstance(value, dict):
                            weather_fields = ['temperature', 'city', 'alert']
                            found_fields = [field for field in weather_fields if field in str(value).lower()]
                            if found_fields:
                                weather_data_found = True
                                print(f"         Contains weather data: {found_fields}")
                            else:
                                print(f"         WARNING: No weather data detected in result")
                
                if not weather_step_found:
                    issues.append("No weather evaluation step result found")
                
                if not weather_data_found:
                    issues.append("No weather data found in results")
                
                # Summary for this child
                print(f"\n   Child {i+1} Analysis Summary:")
                if issues:
                    print(f"      ISSUES: {', '.join(issues)}")
                    print(f"      RECOMMENDATION: Check child playbook 'return' statement")
                    print(f"      EXPECTED: return: {{ evaluate_weather_step.result }}")
                else:
                    print(f"      SUCCESS: Proper result pattern detected")
            else:
                print(f"      ERROR: No return value found")
                print(f"      ISSUE: Child execution didn't return any data")
                print(f"      FIX: Add 'return: {{ step_name.result }}' to child playbook end step")
        else:
            print(f"   ERROR: No execution_complete event found for child")
    
    # Overall pattern analysis
    print(f"\nOVERALL PATTERN ANALYSIS:")
    print("=" * 30)
    
    # Count successful patterns
    successful_children = 0
    total_children = len(child_execution_ids)
    
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if child_complete and child_complete[-1].get('output_result'):
            successful_children += 1
    
    success_rate = (successful_children / total_children * 100) if total_children > 0 else 0
    
    print(f"Success Rate: {successful_children}/{total_children} ({success_rate:.1f}%)")
    
    if success_rate == 100:
        print("EXCELLENT: All child executions returning proper step results")
    elif success_rate >= 75:
        print("GOOD: Most child executions returning results - check failed ones")
    elif success_rate >= 50:
        print("WARNING: Some child executions missing results")
    else:
        print("CRITICAL: Most child executions not returning proper results")
        print("IMMEDIATE ACTION REQUIRED:")
        print("  1. Check child playbook (city_process.yaml) 'return' statement")
        print("  2. Ensure format: 'return: {{ step_name.result }}'")
        print("  3. Verify step names match between playbook and return statement")
        print("  4. Test child playbook individually")

else:
    print('Events not loaded - cannot validate child result patterns.')

CHILD PLAYBOOK RESULT PATTERN VALIDATION
Analyzing 3 child executions for result patterns:

Child Execution 1: 222504588958236672
--------------------------------------------------
   ERROR: No execution_complete event found for child

Child Execution 2: 222504589000179712
--------------------------------------------------
   ERROR: No execution_complete event found for child

Child Execution 3: 222504589050511360
--------------------------------------------------
   ERROR: No execution_complete event found for child

OVERALL PATTERN ANALYSIS:
Success Rate: 0/3 (0.0%)
CRITICAL: Most child executions not returning proper results
IMMEDIATE ACTION REQUIRED:
  1. Check child playbook (city_process.yaml) 'return' statement
  2. Ensure format: 'return: {{ step_name.result }}'
  3. Verify step names match between playbook and return statement
  4. Test child playbook individually


In [None]:
# Cell 9A: MANUAL LOOP COMPLETION TRIGGER - Fixed execution logic
# DIAGNOSTIC: Force loop completion with enhanced child result detection

import requests

print("MANUAL LOOP COMPLETION WITH ENHANCED DIAGNOSTICS")
print("=" * 60)

if not EXECUTION_ID:
    print("ERROR: EXECUTION_ID not set - run Cell 3 first")
elif not events:
    print("ERROR: Events not loaded - run Cell 5 first")
else:
    # Trigger loop completion check manually
    print("Step 1 - Triggering enhanced loop completion check...")
    
    try:
        # Use the broker evaluation endpoint which includes loop completion
        response = requests.post(f'{BASE}/broker/evaluate/{EXECUTION_ID}')
        
        if response.status_code == 200:
            result = response.json()
            print("SUCCESS: Broker evaluation triggered (includes loop completion)")
            print(f"   Response: {result}")
        else:
            print(f"ERROR: Broker evaluation failed: {response.status_code}")
            print(f"   Error: {response.text}")
    except Exception as e:
        print(f"ERROR: Exception during broker evaluation: {e}")
    
    print("\nStep 2 - Checking for child executions that need completion...")
    
    # Check child executions manually
    ev = events.get('events') or []
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    
    child_execution_ids = []
    for e in loop_iterations:
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
    
    print(f"Found {len(child_execution_ids)} child executions:")
    
    for i, child_id in enumerate(child_execution_ids):
        print(f"\n   Child {i+1}: {child_id}")
        
        # Check if child has execution_complete event
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        
        if child_complete:
            print(f"      Status: COMPLETED")
            result = child_complete[-1].get('output_result')
            if result:
                print(f"      Result: {json.dumps(result, indent=8)}")
            else:
                print(f"      Result: None")
        else:
            print(f"      Status: NOT COMPLETED - needs manual trigger")
            
            # Trigger completion for this child
            try:
                child_response = requests.post(f'{BASE}/broker/evaluate/{child_id}')
                if child_response.status_code == 200:
                    print(f"      Manual trigger: SUCCESS")
                else:
                    print(f"      Manual trigger: FAILED ({child_response.status_code})")
            except Exception as e:
                print(f"      Manual trigger: EXCEPTION - {e}")
    
    print(f"\nStep 3 - Verification:")
    print("   â†’ Wait 2-3 seconds for processing")
    print("   â†’ Re-run Cell 5 to reload events")
    print("   â†’ Re-run Cell 6 to check for new end_loop events")
    print("   â†’ Check if city_loop action_completed event appears with aggregated results")
    
    print(f"\nStep 4 - Expected Results After Fix:")
    print("   â†’ All child executions should have execution_complete events")
    print("   â†’ city_loop should have end_loop event with COMPLETED status")
    print("   â†’ city_loop should have action_completed event with aggregated weather data")
    print("   â†’ aggregate_alerts_task should receive non-empty alerts parameter")
    print("   â†’ weather_alert_summary table should get populated with data")

In [None]:
# Cell 8: DATABASE VALIDATION - Check event_log directly for debugging
# DIAGNOSTIC: This queries the database directly to bypass API issues

try:
    import psycopg2
    from psycopg2.extras import RealDictCursor
    
    print("DIRECT DATABASE EVENT VALIDATION")
    print("=" * 50)
    
    # Connect to database directly using correct parameters
    conn = psycopg2.connect(
        host=PGHOST,
        port=PGPORT, 
        database=PGDATABASE,
        user=PGUSER,
        password=PGPASSWORD
    )
    
    cursor = conn.cursor(cursor_factory=RealDictCursor)
    
    # Query 1: Count total events for this execution
    cursor.execute("""
        SELECT event_type, COUNT(*) as count 
        FROM event_log 
        WHERE execution_id = %s 
        GROUP BY event_type 
        ORDER BY count DESC
    """, (EXECUTION_ID,))
    
    event_counts = cursor.fetchall()
    print("Event type counts:")
    for row in event_counts:
        print(f"   {row['event_type']}: {row['count']}")
    
    # Query 2: Loop-specific events with details
    cursor.execute("""
        SELECT event_type, node_name, status, 
               SUBSTRING(input_context::text, 1, 100) as context_preview,
               created_at
        FROM event_log 
        WHERE execution_id = %s 
        AND (event_type IN ('loop_iteration', 'end_loop', 'action_completed', 'execution_complete')
             OR node_name = 'city_loop')
        ORDER BY created_at
    """, (EXECUTION_ID,))

Database connection failed: No module named 'psycopg2'
   â†’ Check if PostgreSQL is running on localhost:5432
   â†’ Verify database credentials (noetl/noetl@noetl)
   â†’ Use Cell 6 for API-based validation instead


In [76]:
# Cell 7: CHAIN ANALYSIS - Track complete loopâ†’childâ†’completion flow  
# DIAGNOSTIC: This shows exactly where the distributed loop chain breaks!

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    print("CITY_LOOP COMPLETION CHAIN ANALYSIS")
    print("=" * 50)
    
    # Step 1: Count loop_iteration events (should be 3 for London, Paris, Berlin)
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    print(f"Step 1 - Loop iterations spawned: {len(loop_iterations)} (Expected: 3)")
    
    if len(loop_iterations) != 3:
        print(f"BREAK POINT: Expected 3 loop iterations, got {len(loop_iterations)}")
        print("   â†’ Check if city_loop step has distribution: true and correct cities list")
    
    child_execution_ids = []
    for i, e in enumerate(loop_iterations):
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
        print(f"   Iteration {i+1}: Child execution {child_id}")
    
    print(f"\nStep 2 - Child executions to track: {child_execution_ids}")
    
    # Step 3: Check which children completed
    completed_children = []
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if child_complete:
            completed_children.append(child_id)
            print(f"   COMPLETED: Child {child_id} completed")
        else:
            print(f"   NOT COMPLETED: Child {child_id} NOT completed - BREAK POINT!")
            print(f"      â†’ Run: evaluate_broker_for_execution('{child_id}') to trigger completion")
    
    print(f"\nStep 3 - Completed children: {len(completed_children)}/{len(child_execution_ids)}")
    
    if len(completed_children) < len(child_execution_ids):
        print("BREAK POINT: Not all child executions completed")
        print("   â†’ Child executions finished but didn't emit execution_complete events")
        print("   â†’ Need manual broker evaluation for missing children")
    
    # Step 4: Check for end_loop tracking and completion
    city_loop_end_events = [e for e in ev if e.get('event_type') == 'end_loop' and e.get('node_name') == 'city_loop']
    print(f"\nStep 4 - End_loop events for city_loop: {len(city_loop_end_events)}")
    
    if len(city_loop_end_events) == 0:
        print("BREAK POINT: No end_loop events - loop completion mechanism not started")
        print("   â†’ Run: check_and_process_completed_loops(EXECUTION_ID) manually")
    
    for e in city_loop_end_events:
        status = e.get('status')
        result = e.get('output_result')
        print(f"   Status: {status}, Result available: {bool(result)}")
    
    # Step 5: Check final city_loop action_completed event
    city_loop_completed = [e for e in ev 
                          if e.get('event_type') == 'action_completed' 
                          and e.get('node_name') == 'city_loop' 
                          and e.get('status') == 'COMPLETED']
    
    print(f"\nStep 5 - Final city_loop completion: {len(city_loop_completed)} event(s)")
    
    if len(city_loop_completed) == 0:
        print("BREAK POINT: No final city_loop completion event")
        print("   â†’ Loop completion mechanism didn't emit final aggregated result")
    
    for e in city_loop_completed:
        result = e.get('output_result')
        if result:
            print(f"   Final aggregated result: {json.dumps(result, indent=4)}")
    
    # Step 6: Check if subsequent steps received the aggregated data
    aggregate_events = [e for e in ev if e.get('node_name') == 'aggregate_alerts_task']
    print(f"\nStep 6 - Aggregate alerts events: {len(aggregate_events)}")
    
    empty_input_found = False
    for e in aggregate_events:
        if e.get('event_type') == 'action_started':
            ctx = e.get('input_context', {})
            alerts_param = ctx.get('task', {}).get('with', {}).get('alerts')
            print(f"   Alerts input to aggregate_alerts_task: {alerts_param}")
            if not alerts_param or alerts_param == "":
                empty_input_found = True
    
    if empty_input_found:
        print("BREAK POINT: aggregate_alerts_task received empty input")
        print("   â†’ city_loop results not passed to next step - template resolution issue")
    
    # Step 7: Check postgres task execution
    postgres_events = [e for e in ev if e.get('node_name') == 'store_summary_postgres_task']
    postgres_errors = [e for e in postgres_events if e.get('event_type') == 'action_error']
    
    print(f"\nStep 7 - Postgres storage events: {len(postgres_events)} (errors: {len(postgres_errors)})")
    
    if len(postgres_errors) > 0:
        print("BREAK POINT: Postgres task failed")
        for e in postgres_errors:
            error = e.get('error', 'Unknown error')
            print(f"   Error: {error}")
        print("   â†’ Check database parameters and template variable resolution")
    
    if len(postgres_events) == 0:
        print("BREAK POINT: Postgres task never executed")
        print("   â†’ Previous steps failed, postgres task not reached")
        
else:
    print('Events not loaded - cannot analyze loop completion chain.')

CITY_LOOP COMPLETION CHAIN ANALYSIS
Step 1 - Loop iterations spawned: 3 (Expected: 3)
   Iteration 1: Child execution 222504588958236672
   Iteration 2: Child execution 222504589000179712
   Iteration 3: Child execution 222504589050511360

Step 2 - Child executions to track: ['222504588958236672', '222504589000179712', '222504589050511360']
   NOT COMPLETED: Child 222504588958236672 NOT completed - BREAK POINT!
      â†’ Run: evaluate_broker_for_execution('222504588958236672') to trigger completion
   NOT COMPLETED: Child 222504589000179712 NOT completed - BREAK POINT!
      â†’ Run: evaluate_broker_for_execution('222504589000179712') to trigger completion
   NOT COMPLETED: Child 222504589050511360 NOT completed - BREAK POINT!
      â†’ Run: evaluate_broker_for_execution('222504589050511360') to trigger completion

Step 3 - Completed children: 0/3
BREAK POINT: Not all child executions completed
   â†’ Child executions finished but didn't emit execution_complete events
   â†’ Need manua

In [77]:
# Cell 9: MANUAL CHILD COMPLETION - Force child execution completion events
# DIAGNOSTIC: Use this when Cell 7/8 shows missing execution_complete events

print("MANUAL CHILD EXECUTION COMPLETION")
print("=" * 50)

# First get child execution IDs from loop_iteration events
if events and isinstance(events, dict):
    ev = events.get('events') or []
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    
    child_execution_ids = []
    for e in loop_iterations:
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
    
    print(f"Found {len(child_execution_ids)} child executions to check:")
    for i, child_id in enumerate(child_execution_ids):
        print(f"   {i+1}. {child_id}")
    
    # Check which ones need completion
    incomplete_children = []
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if not child_complete:
            incomplete_children.append(child_id)
    
    print(f"\nChildren needing completion: {len(incomplete_children)}")
    
    if len(incomplete_children) == 0:
        print("All children already completed - no manual intervention needed")
    else:
        print("The following children need manual completion:")
        for child_id in incomplete_children:
            print(f"   â†’ {child_id}")
        
        # Manual completion
        print(f"\nTriggering manual completion for {len(incomplete_children)} children...")
        
        for child_id in incomplete_children:
            try:
                print(f"\n   Processing child: {child_id}")
                
                # Call the broker evaluation function directly
                response = requests.post(f'{BASE}/broker/evaluate/{child_id}')
                
                if response.status_code == 200:
                    result = response.json()
                    print(f"   Manual completion triggered successfully")
                    print(f"      Response: {result}")
                else:
                    print(f"   Failed to trigger completion: {response.status_code}")
                    print(f"      Error: {response.text}")
                    
            except Exception as e:
                print(f"   Exception during manual completion: {e}")
        
        print(f"\nManual completion attempts finished")
        print("   â†’ Wait 2-3 seconds then re-run Cell 6 to check for new execution_complete events")
        print("   â†’ If successful, you should see end_loop and final action_completed events")

else:
    print("Events not available - run Cell 2 first to load execution data")

MANUAL CHILD EXECUTION COMPLETION
Found 3 child executions to check:
   1. 222504588958236672
   2. 222504589000179712
   3. 222504589050511360

Children needing completion: 3
The following children need manual completion:
   â†’ 222504588958236672
   â†’ 222504589000179712
   â†’ 222504589050511360

Triggering manual completion for 3 children...

   Processing child: 222504588958236672
   Exception during manual completion: name 'requests' is not defined

   Processing child: 222504589000179712
   Exception during manual completion: name 'requests' is not defined

   Processing child: 222504589050511360
   Exception during manual completion: name 'requests' is not defined

Manual completion attempts finished
   â†’ Wait 2-3 seconds then re-run Cell 6 to check for new execution_complete events
   â†’ If successful, you should see end_loop and final action_completed events


In [78]:
# Cell 10: MANUAL LOOP COMPLETION - Force loop completion mechanism
# DIAGNOSTIC: Use this when Cell 6/7 shows missing end_loop events

print("MANUAL LOOP COMPLETION TRIGGER")
print("=" * 50)

print("Step 1 - Triggering manual loop completion check...")

try:
    # Call the loop completion check function directly
    response = requests.post(f'{BASE}/broker/check-loops/{EXECUTION_ID}')
    
    if response.status_code == 200:
        result = response.json()
        print("Manual loop completion check triggered successfully")
        print(f"   Response: {result}")
        
        # Check if any loops were processed
        if 'processed_loops' in result:
            processed = result['processed_loops']
            print(f"   Processed {len(processed)} loop(s):")
            for loop in processed:
                print(f"      â†’ {loop}")
        
    elif response.status_code == 404:
        print("Loop completion endpoint not available")
        print("   â†’ Falling back to broker evaluation...")
        
        # Fallback: trigger broker evaluation which includes loop completion
        response = requests.post(f'{BASE}/broker/evaluate/{EXECUTION_ID}')
        
        if response.status_code == 200:
            result = response.json()
            print("Broker evaluation triggered (includes loop completion)")
            print(f"   Response: {result}")
        else:
            print(f"Broker evaluation failed: {response.status_code}")
            print(f"   Error: {response.text}")
            
    else:
        print(f"Manual loop completion failed: {response.status_code}")
        print(f"   Error: {response.text}")

except Exception as e:
    print(f"Exception during manual loop completion: {e}")
    print("   â†’ Check if NoETL server is running")
    print("   â†’ Verify BASE url is correct")

print(f"\nStep 2 - Post-completion validation:")
print("   â†’ Wait 2-3 seconds then re-run Cell 6 to check results")
print("   â†’ Look for new end_loop events with aggregated results")
print("   â†’ Check if city_loop action_completed event appears")
print("   â†’ Verify if subsequent steps (aggregate_alerts_task) now have input data")

print(f"\nStep 3 - If loop completion worked, you should see:")
print("   end_loop event for city_loop with aggregated weather data")
print("   action_completed event for city_loop with COMPLETED status") 
print("   action_started event for aggregate_alerts_task with alerts input")
print("   Events for store_summary_postgres_task execution")

MANUAL LOOP COMPLETION TRIGGER
Step 1 - Triggering manual loop completion check...
Exception during manual loop completion: name 'requests' is not defined
   â†’ Check if NoETL server is running
   â†’ Verify BASE url is correct

Step 2 - Post-completion validation:
   â†’ Wait 2-3 seconds then re-run Cell 6 to check results
   â†’ Look for new end_loop events with aggregated results
   â†’ Check if city_loop action_completed event appears
   â†’ Verify if subsequent steps (aggregate_alerts_task) now have input data

Step 3 - If loop completion worked, you should see:
   end_loop event for city_loop with aggregated weather data
   action_completed event for city_loop with COMPLETED status
   action_started event for aggregate_alerts_task with alerts input
   Events for store_summary_postgres_task execution


In [79]:
# Cell 11: TROUBLESHOOTING SUMMARY - Common issues and fixes
# DIAGNOSTIC: Reference guide for distributed loop execution problems

print("DISTRIBUTED LOOP TROUBLESHOOTING GUIDE")
print("=" * 60)

print("QUICK DIAGNOSIS CHECKLIST:")
print("Step 1 - Run Cell 2: Load execution data")
print("Step 2 - Run Cell 6: Check for missing end_loop/execution_complete events")  
print("Step 3 - Run Cell 7: Analyze complete chain from loopâ†’childâ†’completion")
print("Step 4 - Run Cell 8: Direct database validation (bypasses API)")
print("Step 5 - Run Cell 9: Manual child completion (if needed)")
print("Step 6 - Run Cell 10: Manual loop completion (if needed)")

print(f"\nCOMMON BREAK POINTS & FIXES:")

print(f"\nBREAK POINT 1: No loop_iteration events")
print("   â†’ Problem: city_loop step not configured for distribution")
print("   â†’ Fix: Add 'distribution: true' to city_loop step")
print("   â†’ Check: examples/weather/weather_loop_example.yaml")

print(f"\nBREAK POINT 2: Child executions not completing")
print("   â†’ Problem: Child executions finish but don't emit execution_complete events")
print("   â†’ Fix: Run Cell 9 to manually trigger child completion")
print("   â†’ API: POST /api/broker/evaluate/{child_execution_id}")

print(f"\nBREAK POINT 3: No end_loop events")
print("   â†’ Problem: Loop completion mechanism not triggered")
print("   â†’ Fix: Run Cell 10 to manually trigger loop completion")
print("   â†’ API: POST /api/broker/check-loops/{execution_id}")

print(f"\nBREAK POINT 4: Empty input to aggregate_alerts_task")
print("   â†’ Problem: city_loop results not passed to next step")
print("   â†’ Fix: Check template variable resolution in playbook")
print("   â†’ Look for: '{{ city_loop }}' in aggregate_alerts_task")

print(f"\nBREAK POINT 5: Postgres task errors")
print("   â†’ Problem: Database connection or template variable issues")
print("   â†’ Fix: Check hardcoded database parameters in workbook task")
print("   â†’ Parameters: host=localhost, port=5432, database=noetl, user=noetl, password=noetl")

print(f"\nBREAK POINT 6: Empty weather_alert_summary table")
print("   â†’ Problem: Postgres INSERT failed silently")
print("   â†’ Fix: Check Cell 8 for postgres task errors")
print("   â†’ Verify: INSERT statement and data format")

print(f"\nMANUAL INTERVENTION SEQUENCE:")
print("   1. Run Cell 6 â†’ Identify missing events")
print("   2. If missing execution_complete â†’ Run Cell 9")
print("   3. If missing end_loop â†’ Run Cell 10") 
print("   4. Wait 2-3 seconds â†’ Re-run Cell 6")
print("   5. If still issues â†’ Run Cell 8 for database check")
print("   6. Check weather_alert_summary table for final results")

print(f"\nSUCCESS INDICATORS:")
print("   3 loop_iteration events (London, Paris, Berlin)")
print("   3 execution_complete events (one per child)")
print("   1 end_loop event with aggregated weather data")
print("   1 city_loop action_completed event")
print("   aggregate_alerts_task with non-empty alerts input")
print("   store_summary_postgres_task completed successfully")
print("   weather_alert_summary table contains temperature data")

print(f"\nESCALATION:")
print("   If all manual fixes fail:")
print("   â†’ Check NoETL server logs: logs/server.log")
print("   â†’ Check worker logs: logs/worker_*.log") 
print("   â†’ Verify child playbook registration: city_process.yaml")
print("   â†’ Test child playbook individually: noetl execute playbook city_process.yaml")

print(f"\nKEY FUNCTIONS:")
print("   evaluate_broker_for_execution() â†’ Triggers child completion")
print("   check_and_process_completed_loops() â†’ Processes loop aggregation")
print("   Event emission chain â†’ loop_iteration â†’ execution_complete â†’ end_loop â†’ action_completed")

DISTRIBUTED LOOP TROUBLESHOOTING GUIDE
QUICK DIAGNOSIS CHECKLIST:
Step 1 - Run Cell 2: Load execution data
Step 2 - Run Cell 6: Check for missing end_loop/execution_complete events
Step 3 - Run Cell 7: Analyze complete chain from loopâ†’childâ†’completion
Step 4 - Run Cell 8: Direct database validation (bypasses API)
Step 5 - Run Cell 9: Manual child completion (if needed)
Step 6 - Run Cell 10: Manual loop completion (if needed)

COMMON BREAK POINTS & FIXES:

BREAK POINT 1: No loop_iteration events
   â†’ Problem: city_loop step not configured for distribution
   â†’ Fix: Add 'distribution: true' to city_loop step
   â†’ Check: examples/weather/weather_loop_example.yaml

BREAK POINT 2: Child executions not completing
   â†’ Problem: Child executions finish but don't emit execution_complete events
   â†’ Fix: Run Cell 9 to manually trigger child completion
   â†’ API: POST /api/broker/evaluate/{child_execution_id}

BREAK POINT 3: No end_loop events
   â†’ Problem: Loop completion mechan

In [80]:
# Cell 12: FINAL EXECUTION SUMMARY - Overall status and next steps
# DIAGNOSTIC: Complete execution health check and recommendations

print("FINAL EXECUTION HEALTH CHECK")
print("=" * 50)

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    # Overall execution status
    execution_status = "Unknown"
    execution_result = None
    
    # Check for final execution status
    final_events = [e for e in ev if e.get('event_type') == 'execution_complete' and e.get('execution_id') == EXECUTION_ID]
    if final_events:
        latest_event = final_events[-1]
        execution_status = latest_event.get('status', 'Unknown')
        execution_result = latest_event.get('output_result')
    
    status_indicator = "SUCCESS" if execution_status == "COMPLETED" else "ERROR" if execution_status == "ERROR" else "IN_PROGRESS"
    print(f"Overall Execution Status: {status_indicator} - {execution_status}")
    
    # Health check scores
    health_scores = {
        "Child Spawning": 0,
        "Child Completion": 0, 
        "Loop Aggregation": 0,
        "Data Pipeline": 0,
        "Final Storage": 0
    }
    
    # 1. Child Spawning Check
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    health_scores["Child Spawning"] = min(100, len(loop_iterations) * 33)  # 3 children = 100%
    
    # 2. Child Completion Check
    child_execution_ids = []
    for e in loop_iterations:
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
    
    completed_children = 0
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if child_complete:
            completed_children += 1
    
    if len(child_execution_ids) > 0:
        health_scores["Child Completion"] = int((completed_children / len(child_execution_ids)) * 100)
    
    # 3. Loop Aggregation Check
    end_loop_events = [e for e in ev if e.get('event_type') == 'end_loop' and e.get('node_name') == 'city_loop']
    city_loop_completed = [e for e in ev if e.get('event_type') == 'action_completed' and e.get('node_name') == 'city_loop']
    
    aggregation_score = 0
    if end_loop_events:
        aggregation_score += 50
    if city_loop_completed:
        aggregation_score += 50
    health_scores["Loop Aggregation"] = aggregation_score
    
    # 4. Data Pipeline Check
    aggregate_events = [e for e in ev if e.get('node_name') == 'aggregate_alerts_task']
    pipeline_score = 0
    for e in aggregate_events:
        if e.get('event_type') == 'action_started':
            ctx = e.get('input_context', {})
            alerts_param = ctx.get('task', {}).get('with', {}).get('alerts')
            if alerts_param and alerts_param != "":
                pipeline_score = 100
                break
    health_scores["Data Pipeline"] = pipeline_score
    
    # 5. Final Storage Check  
    postgres_events = [e for e in ev if e.get('node_name') == 'store_summary_postgres_task']
    postgres_completed = [e for e in postgres_events if e.get('event_type') == 'action_completed' and e.get('status') == 'COMPLETED']
    health_scores["Final Storage"] = 100 if postgres_completed else 0
    
    print(f"\nHEALTH SCORES:")
    overall_health = sum(health_scores.values()) // len(health_scores)
    
    for component, score in health_scores.items():
        indicator = "GOOD" if score >= 80 else "WARNING" if score >= 50 else "ERROR"
        print(f"   {indicator:7s} {component:15s}: {score:3d}%")
    
    health_indicator = "EXCELLENT" if overall_health >= 80 else "FAIR" if overall_health >= 50 else "POOR"
    print(f"\nOverall Health: {health_indicator} - {overall_health}%")
    
    # Recommendations based on health scores
    print(f"\nRECOMMENDATIONS:")
    
    if health_scores["Child Spawning"] < 100:
        print("   Child Spawning Issue:")
        print("      â†’ Check city_loop step has distribution: true")
        print("      â†’ Verify cities list in playbook")
        print("      â†’ Reference: Cell 7 for detailed analysis")
    
    if health_scores["Child Completion"] < 100:
        print("   Child Completion Issue:")
        print("      â†’ Run Cell 9 to manually complete children")
        print("      â†’ Check child playbook execution logs")
        print("      â†’ Verify child execution broker evaluation")
    
    if health_scores["Loop Aggregation"] < 100:
        print("   Loop Aggregation Issue:")
        print("      â†’ Run Cell 10 to manually trigger loop completion")
        print("      â†’ Check comprehensive loop completion mechanism")
        print("      â†’ Verify event emission in evaluate_broker_for_execution()")
    
    if health_scores["Data Pipeline"] < 100:
        print("   Data Pipeline Issue:")
        print("      â†’ Check template variable resolution: {{ city_loop }}")
        print("      â†’ Verify aggregate_alerts_task input parameters")
        print("      â†’ Review playbook step dependencies")
    
    if health_scores["Final Storage"] < 100:
        print("   Storage Issue:")
        print("      â†’ Run Cell 8 for database validation")
        print("      â†’ Check postgres task hardcoded parameters")
        print("      â†’ Verify weather_alert_summary table schema")
    
    if overall_health == 100:
        print("   EXECUTION PERFECT - All systems working correctly!")
        print("      â†’ weather_alert_summary should contain weather data")
        print("      â†’ Distributed loop execution completed successfully")
    elif overall_health >= 80:
        print("   EXECUTION MOSTLY SUCCESSFUL - Minor issues detected")
        print("      â†’ Address specific component issues above")
    else:
        print("   EXECUTION HAS MAJOR ISSUES - Multiple failures detected")
        print("      â†’ Follow manual intervention sequence in Cell 11")
        print("      â†’ Consider re-running entire playbook after fixes")

    # Quick action plan
    if overall_health < 100:
        print(f"\nQUICK ACTION PLAN:")
        if health_scores["Child Completion"] < 100:
            print("   1. Run Cell 9 â†’ Manual child completion")
        if health_scores["Loop Aggregation"] < 100:
            print("   2. Run Cell 10 â†’ Manual loop completion")
        print("   3. Wait 2-3 seconds")
        print("   4. Re-run Cell 12 â†’ Check improved health scores")
        print("   5. If still issues â†’ Follow Cell 11 troubleshooting guide")

else:
    print("Cannot perform health check - events not loaded")
    print("   â†’ Run Cell 2 first to load execution data")
    print("   â†’ Verify EXECUTION_ID is set correctly")

FINAL EXECUTION HEALTH CHECK
Overall Execution Status: IN_PROGRESS - Unknown

HEALTH SCORES:
   GOOD    Child Spawning :  99%
   ERROR   Child Completion:   0%
   GOOD    Loop Aggregation: 100%
   ERROR   Data Pipeline  :   0%
   ERROR   Final Storage  :   0%

Overall Health: POOR - 39%

RECOMMENDATIONS:
   Child Spawning Issue:
      â†’ Check city_loop step has distribution: true
      â†’ Verify cities list in playbook
      â†’ Reference: Cell 7 for detailed analysis
   Child Completion Issue:
      â†’ Run Cell 9 to manually complete children
      â†’ Check child playbook execution logs
      â†’ Verify child execution broker evaluation
   Data Pipeline Issue:
      â†’ Check template variable resolution: {{ city_loop }}
      â†’ Verify aggregate_alerts_task input parameters
      â†’ Review playbook step dependencies
   Storage Issue:
      â†’ Run Cell 8 for database validation
      â†’ Check postgres task hardcoded parameters
      â†’ Verify weather_alert_summary table sche

## Loop Completion Troubleshooting Summary

Based on the validation results above, here are the key issues to check:

### Expected Flow for Working Loop:
1. **Loop Iterations**: 3 `loop_iteration` events for city_loop (London, Paris, Berlin)
2. **Child Executions**: 3 child executions spawned with unique execution IDs
3. **Child Completion**: 3 `execution_complete` events from child executions with weather results
4. **Loop Tracking**: `end_loop` event with status `TRACKING` to monitor progress
5. **Loop Completion**: `action_completed` event for city_loop with aggregated results
6. **Next Steps**: aggregate_alerts_task receives aggregated data, then store_summary_postgres_task

### Common Issues:
- **Empty weather_alert_summary**: Usually caused by postgres task failing due to template variable resolution or missing database parameters
- **Missing execution_complete events**: Child executions finish but don't emit completion events - requires manual broker evaluation
- **No aggregated results**: Loop completion mechanism not triggered or children not properly tracked
- **Template resolution errors**: Database parameters like `{{ workload.pg_host }}` resolving to empty strings

### Manual Fixes:
```python
# If child executions completed but no execution_complete events:
from noetl.server.api.event import evaluate_broker_for_execution
await evaluate_broker_for_execution('child_execution_id')

# If loop completion not triggered:
from noetl.server.api.event import check_and_process_completed_loops  
await check_and_process_completed_loops('parent_execution_id')
```

In [5]:
# Cell: WEATHER DATA VERIFICATION - Check if aggregated weather data was stored
# DIAGNOSTIC: Query weather_alert_summary table directly with correct connection

try:
    import psycopg2
    from psycopg2.extras import RealDictCursor
    
    print("WEATHER DATA VERIFICATION")
    print("=" * 40)
    
    # Connect with correct parameters
    conn = psycopg2.connect(
        host=PGHOST,
        port=PGPORT,
        database=PGDATABASE,
        user=PGUSER,
        password=PGPASSWORD
    )
    
    cursor = conn.cursor(cursor_factory=RealDictCursor)
    
    # Check if weather_alert_summary table exists in public schema
    cursor.execute("""
        SELECT EXISTS (
            SELECT FROM information_schema.tables 
            WHERE table_schema = 'public' AND table_name = 'weather_alert_summary'
        )
    """)
    table_exists = cursor.fetchone()[0]
    
    if table_exists:
        print("âœ“ public.weather_alert_summary table exists")
        
        # Count total rows
        cursor.execute("SELECT COUNT(*) as total_rows FROM public.weather_alert_summary")
        total_rows = cursor.fetchone()['total_rows']
        print(f"Total rows in weather_alert_summary: {total_rows}")
        
        if total_rows > 0:
            # Show recent entries
            cursor.execute("""
                SELECT * FROM public.weather_alert_summary 
                ORDER BY id DESC 
                LIMIT 5
            """)
            recent_rows = cursor.fetchall()
            
            print(f"\nRecent weather alert entries ({len(recent_rows)}):")
            for i, row in enumerate(recent_rows):
                row_dict = dict(row)
                print(f"  {i+1}. ID: {row_dict.get('id')}")
                for key, value in row_dict.items():
                    if key != 'id':
                        print(f"     {key}: {value}")
                print()
                
            # Check for entries that might be related to our execution
            cursor.execute("""
                SELECT COUNT(*) as recent_rows 
                FROM public.weather_alert_summary 
                WHERE id > (SELECT COALESCE(MAX(id) - 100, 0) FROM public.weather_alert_summary)
            """)
            recent_count = cursor.fetchone()['recent_rows']
            print(f"Recent entries (last 100 IDs): {recent_count}")
            
            if recent_count > 0:
                print("SUCCESS: Weather data has been stored recently!")
                print("   â†’ Loop completion mechanism appears to be working")
                print("   â†’ Aggregated weather data is being inserted")
            else:
                print("INFO: Weather data exists but may be older")
        else:
            print("ISSUE: weather_alert_summary table is empty")
            print("   â†’ No weather data has been stored")
            print("   â†’ Loop completion mechanism likely failed")
            print("   â†’ Use manual completion cells to trigger aggregation")
    else:
        print("ERROR: public.weather_alert_summary table does not exist")
        print("   â†’ Database schema may not be initialized")
        print("   â†’ Check playbook setup and table creation")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"Database error: {e}")
    import traceback
    traceback.print_exc()
    print("\nConnection parameters used:")
    print(f"   Host: {PGHOST}")
    print(f"   Port: {PGPORT}")
    print(f"   Database: {PGDATABASE}")
    print(f"   User: {PGUSER}")

WEATHER DATA VERIFICATION
Database error: 0

Connection parameters used:
   Host: localhost
   Port: 30543
   Database: demo_noetl
   User: demo


Traceback (most recent call last):
  File "/var/folders/xm/zwpf18217zd758ds84n46r9h0000gn/T/ipykernel_88105/4213864212.py", line 29, in <module>
    table_exists = cursor.fetchone()[0]
                   ~~~~~~~~~~~~~~~~~^^^
KeyError: 0


In [2]:
# Set execution ID manually
EXECUTION_ID = '222516664267177984'  # Weather loop execution we've been testing
print(f"Using execution ID: {EXECUTION_ID}")

Using execution ID: 222516664267177984


In [4]:
# Database exploration - check what tables and schemas exist
import psycopg2
from psycopg2.extras import RealDictCursor

try:
    print("DATABASE EXPLORATION")
    print("=" * 30)
    
    conn = psycopg2.connect(
        host=PGHOST,
        port=PGPORT,
        database=PGDATABASE,
        user=PGUSER,
        password=PGPASSWORD
    )
    
    cursor = conn.cursor(cursor_factory=RealDictCursor)
    
    # List all schemas
    cursor.execute("""
        SELECT schema_name 
        FROM information_schema.schemata 
        WHERE schema_name NOT IN ('information_schema', 'pg_catalog', 'pg_toast')
        ORDER BY schema_name
    """)
    schemas = cursor.fetchall()
    print("Available schemas:")
    for schema in schemas:
        print(f"  - {schema['schema_name']}")
    
    # List all tables in all schemas
    cursor.execute("""
        SELECT table_schema, table_name 
        FROM information_schema.tables 
        WHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'pg_toast')
        ORDER BY table_schema, table_name
    """)
    tables = cursor.fetchall()
    print(f"\nAvailable tables ({len(tables)}):")
    for table in tables:
        print(f"  - {table['table_schema']}.{table['table_name']}")
    
    # Look for weather-related tables specifically
    cursor.execute("""
        SELECT table_schema, table_name 
        FROM information_schema.tables 
        WHERE table_name LIKE '%weather%' OR table_name LIKE '%alert%'
        ORDER BY table_schema, table_name
    """)
    weather_tables = cursor.fetchall()
    print(f"\nWeather-related tables ({len(weather_tables)}):")
    for table in weather_tables:
        print(f"  - {table['table_schema']}.{table['table_name']}")
    
    # Check event_log table specifically
    cursor.execute("""
        SELECT table_schema, table_name 
        FROM information_schema.tables 
        WHERE table_name = 'event_log'
    """)
    event_tables = cursor.fetchall()
    print(f"\nEvent log tables ({len(event_tables)}):")
    for table in event_tables:
        print(f"  - {table['table_schema']}.{table['table_name']}")
        
        # Count events for our execution
        try:
            cursor.execute(f"""
                SELECT COUNT(*) as count 
                FROM {table['table_schema']}.event_log 
                WHERE execution_id = %s
            """, (EXECUTION_ID,))
            count = cursor.fetchone()['count']
            print(f"    Events for execution {EXECUTION_ID}: {count}")
        except Exception as e:
            print(f"    Error querying event_log: {e}")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"Database error: {e}")
    import traceback
    traceback.print_exc()

DATABASE EXPLORATION
Available schemas:
  - noetl
  - public

Available tables (21):
  - noetl.attachment
  - noetl.catalog
  - noetl.chat
  - noetl.credential
  - noetl.error_log
  - noetl.event_log
  - noetl.label
  - noetl.member
  - noetl.message
  - noetl.profile
  - noetl.queue
  - noetl.resource
  - noetl.role
  - noetl.runtime
  - noetl.schedule
  - noetl.session
  - noetl.transition
  - noetl.workbook
  - noetl.workflow
  - noetl.workload
  - public.weather_alert_summary

Weather-related tables (1):
  - public.weather_alert_summary

Event log tables (1):
  - noetl.event_log
    Events for execution 222516664267177984: 86887


In [6]:
# INFINITE LOOP ANALYSIS - 86,887 events indicates a problem!
# DIAGNOSTIC: Analyze what's causing the infinite loop

print("INFINITE LOOP ANALYSIS")
print("=" * 50)
print(f"Total events for execution {EXECUTION_ID}: 86,887")
print("This indicates an infinite loop - normal executions should have < 100 events")
print()

try:
    # Analyze event patterns
    cursor.execute("""
        SELECT event_type, COUNT(*) as count 
        FROM noetl.event_log 
        WHERE execution_id = %s 
        GROUP BY event_type 
        ORDER BY count DESC
        LIMIT 10
    """, (EXECUTION_ID,))
    
    event_counts = cursor.fetchall()
    print("TOP EVENT TYPES (showing infinite loop pattern):")
    for row in event_counts:
        count = row['count']
        event_type = row['event_type']
        print(f"   {event_type:20s}: {count:8,} events")
        
        # Highlight problematic patterns
        if count > 1000:
            if event_type == 'loop_iteration':
                print(f"      *** INFINITE LOOP: {count:,} loop iterations! ***")
            elif event_type in ['action_started', 'action_completed']:
                print(f"      *** EXCESSIVE ACTIONS: {count:,} repetitions ***")
    
    print()
    
    # Check loop completion events specifically
    cursor.execute("""
        SELECT COUNT(*) as loop_end_count
        FROM noetl.event_log 
        WHERE execution_id = %s 
        AND event_type = 'end_loop'
    """, (EXECUTION_ID,))
    
    loop_end_count = cursor.fetchone()['loop_end_count']
    print(f"Loop end events: {loop_end_count}")
    
    if loop_end_count == 0:
        print("*** PROBLEM: No end_loop events found! ***")
        print("   â†’ Loop completion mechanism never triggered")
        print("   â†’ Loop keeps iterating indefinitely")
        print("   â†’ This is the root cause of the infinite loop")
    
    # Check execution_complete events from child executions
    cursor.execute("""
        SELECT COUNT(*) as exec_complete_count
        FROM noetl.event_log 
        WHERE execution_id = %s 
        AND event_type = 'execution_complete'
    """, (EXECUTION_ID,))
    
    exec_complete_count = cursor.fetchone()['exec_complete_count']
    print(f"Child execution_complete events: {exec_complete_count}")
    
    if exec_complete_count == 0:
        print("*** PROBLEM: No execution_complete events! ***")
        print("   â†’ Child executions not completing properly")
        print("   â†’ This prevents loop aggregation from triggering")
    
    # Sample recent loop iterations to see the pattern
    cursor.execute("""
        SELECT 
            created_at,
            node_name,
            input_context::json->>'child_execution_id' as child_id
        FROM noetl.event_log 
        WHERE execution_id = %s 
        AND event_type = 'loop_iteration'
        ORDER BY created_at DESC
        LIMIT 5
    """, (EXECUTION_ID,))
    
    recent_iterations = cursor.fetchall()
    print(f"\nRecent loop iterations (last 5 of {event_counts[0]['count']:,}):")
    for iter_event in recent_iterations:
        timestamp = iter_event['created_at'].strftime('%H:%M:%S.%f')[:-3]
        child_id = iter_event['child_id']
        node = iter_event['node_name']
        print(f"   {timestamp} | {node} | Child: {child_id}")
    
    print()
    print("DIAGNOSIS:")
    print("   â†’ The loop is running infinitely because:")
    print("   â†’ 1. Child executions aren't completing (no execution_complete events)")
    print("   â†’ 2. Without child completion, loop aggregation never triggers") 
    print("   â†’ 3. Loop continues spawning new iterations indefinitely")
    print("   â†’ 4. This creates exponential event growth")
    print()
    print("IMMEDIATE ACTION REQUIRED:")
    print("   â†’ STOP the execution immediately to prevent resource exhaustion")
    print("   â†’ Fix the child execution completion mechanism")
    print("   â†’ Investigate why child playbooks aren't emitting execution_complete events")

except Exception as e:
    print(f"Analysis error: {e}")
    import traceback
    traceback.print_exc()

INFINITE LOOP ANALYSIS
Total events for execution 222516664267177984: 86,887
This indicates an infinite loop - normal executions should have < 100 events

TOP EVENT TYPES (showing infinite loop pattern):
   action_completed    :   87,005 events
      *** EXCESSIVE ACTIONS: 87,005 repetitions ***
   end_loop            :        4 events
   loop_iteration      :        3 events
   execution_start     :        1 events

Loop end events: 4
Child execution_complete events: 0
*** PROBLEM: No execution_complete events! ***
   â†’ Child executions not completing properly
   â†’ This prevents loop aggregation from triggering
Analysis error: column "created_at" does not exist
LINE 3:             created_at,
                    ^



Traceback (most recent call last):
  File "/var/folders/xm/zwpf18217zd758ds84n46r9h0000gn/T/ipykernel_88105/1721811430.py", line 71, in <module>
    cursor.execute("""
  File "./.venv/lib/python3.12/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.UndefinedColumn: column "created_at" does not exist
LINE 3:             created_at,
                    ^



In [7]:
# IDENTIFY THE PROBLEMATIC ACTION - Which action completed 87,005 times?
# DIAGNOSTIC: Find the specific action causing the infinite completion loop

print("PROBLEMATIC ACTION IDENTIFICATION")
print("=" * 45)

try:
    # Find which action is completing excessively
    cursor.execute("""
        SELECT 
            node_name,
            status,
            COUNT(*) as completion_count
        FROM noetl.event_log 
        WHERE execution_id = %s 
        AND event_type = 'action_completed'
        GROUP BY node_name, status
        ORDER BY completion_count DESC
        LIMIT 10
    """, (EXECUTION_ID,))
    
    action_counts = cursor.fetchall()
    print("ACTION COMPLETION COUNTS:")
    for row in action_counts:
        node_name = row['node_name']
        status = row['status']
        count = row['completion_count']
        print(f"   {node_name:25s} | {status:10s} | {count:8,} completions")
        
        if count > 1000:
            print(f"      *** INFINITE ACTION: {node_name} is stuck! ***")
    
    # Get the top problematic action
    worst_action = action_counts[0] if action_counts else None
    
    if worst_action:
        problematic_node = worst_action['node_name']
        print(f"\nMOST PROBLEMATIC ACTION: {problematic_node}")
        
        # Get sample of recent completions for this action
        cursor.execute("""
            SELECT 
                SUBSTR(CAST(timestamp AS TEXT), 12, 12) as time_part,
                status,
                error,
                SUBSTR(input_context::text, 1, 200) as context_preview
            FROM noetl.event_log 
            WHERE execution_id = %s 
            AND event_type = 'action_completed'
            AND node_name = %s
            ORDER BY timestamp DESC
            LIMIT 5
        """, (EXECUTION_ID, problematic_node))
        
        recent_completions = cursor.fetchall()
        print(f"\nRecent completions of {problematic_node}:")
        for comp in recent_completions:
            time_part = comp['time_part']
            status = comp['status']
            error = comp['error']
            context = comp['context_preview']
            print(f"   {time_part} | {status:10s} | Error: {error or 'None'}")
            if context:
                print(f"     Context: {context}...")
        
        # Check what triggers this action repeatedly
        cursor.execute("""
            SELECT COUNT(*) as start_count
            FROM noetl.event_log 
            WHERE execution_id = %s 
            AND event_type = 'action_started'
            AND node_name = %s
        """, (EXECUTION_ID, problematic_node))
        
        start_count = cursor.fetchone()['start_count']
        completion_count = worst_action['completion_count']
        
        print(f"\nACTION LIFECYCLE:")
        print(f"   Started: {start_count:,} times")
        print(f"   Completed: {completion_count:,} times")
        
        if start_count != completion_count:
            print(f"   *** MISMATCH: Different start/completion counts! ***")
            print(f"   â†’ This indicates actions are completing multiple times")
            print(f"   â†’ There's likely a bug in the action completion mechanism")
        else:
            print(f"   *** REPEATED EXECUTION: Action is being started {start_count:,} times! ***")
            print(f"   â†’ Something is triggering this action repeatedly")
            print(f"   â†’ Check what condition is causing the re-execution")
    
    print(f"\nROOT CAUSE ANALYSIS:")
    print(f"   â†’ Instead of an infinite loop, we have an action that's:")
    print(f"     1. Either completing multiple times per execution")
    print(f"     2. Or being executed {action_counts[0]['completion_count']:,} separate times")
    print(f"   â†’ This is a different problem than loop iteration infinite loops")
    print(f"   â†’ The fix needs to address why '{action_counts[0]['node_name']}' keeps running")

except Exception as e:
    print(f"Analysis error: {e}")
    import traceback
    traceback.print_exc()

PROBLEMATIC ACTION IDENTIFICATION
Analysis error: current transaction is aborted, commands ignored until end of transaction block



Traceback (most recent call last):
  File "/var/folders/xm/zwpf18217zd758ds84n46r9h0000gn/T/ipykernel_88105/1404613995.py", line 9, in <module>
    cursor.execute("""
  File "./.venv/lib/python3.12/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block



In [8]:
# Reset database connection and identify problematic action
import psycopg2
from psycopg2.extras import RealDictCursor

print("RESET CONNECTION AND IDENTIFY PROBLEMATIC ACTION")
print("=" * 55)

try:
    # Close old connection if exists
    if 'conn' in globals():
        conn.close()
    
    # Create fresh connection
    conn = psycopg2.connect(
        host=PGHOST,
        port=PGPORT,
        database=PGDATABASE,
        user=PGUSER,
        password=PGPASSWORD
    )
    cursor = conn.cursor(cursor_factory=RealDictCursor)
    
    print("âœ“ Fresh database connection established")
    
    # Find which action is completing excessively
    cursor.execute("""
        SELECT 
            node_name,
            status,
            COUNT(*) as completion_count
        FROM noetl.event_log 
        WHERE execution_id = %s 
        AND event_type = 'action_completed'
        GROUP BY node_name, status
        ORDER BY completion_count DESC
        LIMIT 10
    """, (EXECUTION_ID,))
    
    action_counts = cursor.fetchall()
    print(f"\nACTION COMPLETION COUNTS (Total: {sum(row['completion_count'] for row in action_counts):,}):")
    
    for i, row in enumerate(action_counts):
        node_name = row['node_name']
        status = row['status']
        count = row['completion_count']
        print(f"   {i+1:2d}. {node_name:25s} | {status:10s} | {count:8,} completions")
        
        if count > 1000:
            print(f"      *** RUNAWAY ACTION: {node_name} completed {count:,} times! ***")
    
    # Focus on the worst offender
    if action_counts:
        worst_action = action_counts[0]
        problematic_node = worst_action['node_name']
        completion_count = worst_action['completion_count']
        
        print(f"\nWORST OFFENDER: {problematic_node}")
        print(f"   Completed {completion_count:,} times!")
        
        # Check how many times it was started
        cursor.execute("""
            SELECT COUNT(*) as start_count
            FROM noetl.event_log 
            WHERE execution_id = %s 
            AND event_type = 'action_started'
            AND node_name = %s
        """, (EXECUTION_ID, problematic_node))
        
        start_result = cursor.fetchone()
        start_count = start_result['start_count'] if start_result else 0
        
        print(f"   Started {start_count:,} times")
        
        if start_count == completion_count:
            print(f"   â†’ This action was executed {start_count:,} separate times")
            print(f"   â†’ Something is causing it to be triggered repeatedly")
        else:
            print(f"   â†’ Start/completion mismatch indicates multiple completions per start")
            print(f"   â†’ This suggests a bug in the completion mechanism")
        
        # Check what triggers this action (look for preceding events)
        cursor.execute("""
            SELECT event_type, COUNT(*) as count
            FROM noetl.event_log 
            WHERE execution_id = %s 
            AND node_name = %s
            GROUP BY event_type
            ORDER BY count DESC
        """, (EXECUTION_ID, problematic_node))
        
        event_types = cursor.fetchall()
        print(f"\n   Event types for {problematic_node}:")
        for et in event_types:
            print(f"      {et['event_type']:20s}: {et['count']:8,}")
        
        # ROOT CAUSE DIAGNOSIS
        print(f"\nROOT CAUSE DIAGNOSIS:")
        if problematic_node == 'city_loop':
            print(f"   â†’ The city_loop itself is the problem")
            print(f"   â†’ It's completing but immediately restarting")
            print(f"   â†’ Loop completion mechanism is broken")
        elif 'evaluate' in problematic_node.lower():
            print(f"   â†’ Weather evaluation step is stuck in infinite execution")
            print(f"   â†’ Child execution completion may be broken")
        else:
            print(f"   â†’ Action '{problematic_node}' has a runaway execution pattern")
        
        print(f"\nIMMADIATE FIXES NEEDED:")
        print(f"   1. âœ… DONE: Stopped the server to prevent further damage")
        print(f"   2. ðŸ”§ TODO: Fix the completion mechanism for '{problematic_node}'")
        print(f"   3. ðŸ”§ TODO: Add safeguards to prevent infinite action execution")
        print(f"   4. ðŸ”§ TODO: Investigate why this action doesn't properly complete and stop")

except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()

RESET CONNECTION AND IDENTIFY PROBLEMATIC ACTION
âœ“ Fresh database connection established

ACTION COMPLETION COUNTS (Total: 87,050):
    1. city_loop                 | COMPLETED  |   87,049 completions
      *** RUNAWAY ACTION: city_loop completed 87,049 times! ***
    2. start                     | COMPLETED  |        1 completions

WORST OFFENDER: city_loop
   Completed 87,049 times!
   Started 0 times
   â†’ Start/completion mismatch indicates multiple completions per start
   â†’ This suggests a bug in the completion mechanism

   Event types for city_loop:
      action_completed    :   87,049
      end_loop            :        4
      loop_iteration      :        3

ROOT CAUSE DIAGNOSIS:
   â†’ The city_loop itself is the problem
   â†’ It's completing but immediately restarting
   â†’ Loop completion mechanism is broken

IMMADIATE FIXES NEEDED:
   1. âœ… DONE: Stopped the server to prevent further damage
   2. ðŸ”§ TODO: Fix the completion mechanism for 'city_loop'
   3. ðŸ”§ TO