# NoETL Execution Validation Notebook

**Validate a playbook execution by inspecting server APIs, database tables, and local logs.**

This notebook has numbered cells to help diagnose where the distributed loop execution chain breaks:

## Cell Reference Guide:
- **Cell 1-3**: Configuration and setup
- **Cell 4-5**: Basic execution and event validation  
- **Cell 6**: LOOP EVENT VALIDATION - Check if loop events are emitted
- **Cell 7**: CHAIN ANALYSIS - Track complete loop→child→completion flow
- **Cell 8**: DATABASE VALIDATION - Query DB directly for loop events
- **Cell 9-10**: Error logs and weather_alert_summary table status
- **Cell 11**: TROUBLESHOOTING GUIDE - Common issues and fixes

## What Each Cell Tells You:
- Uses HTTP API to fetch execution summary and events
- Queries Postgres "noetl.error_log" for recent errors  
- Reads final table "weather_alert_summary" for inserted rows
- Analyzes distributed loop completion chain step-by-step
- Provides specific diagnostics for empty weather_alert_summary table

In [55]:
# Cell 1: Configuration
import os, json, time, pathlib

# Server
HOST = os.environ.get('NOETL_HOST', 'localhost')
PORT = int(os.environ.get('NOETL_PORT', '8082'))
BASE = f'http://{HOST}:{PORT}/api'

# Database parameters — will be populated from the execution's workload; no env fallbacks
PGHOST = ''
PGPORT = 0
PGUSER = ''
PGPASSWORD = ''
PGDATABASE = ''

# If you already know the execution id, set it here (as string or int)
EXECUTION_ID = os.environ.get('NOETL_LAST_EXECUTION_ID') or ''

# Resolve logs directory robustly when running from notebooks/
LOGS_DIR_ENV = os.environ.get('NOETL_LOG_DIR')
LOGS_DIR_CANDIDATES = []
if LOGS_DIR_ENV: LOGS_DIR_CANDIDATES.append(pathlib.Path(LOGS_DIR_ENV))
LOGS_DIR_CANDIDATES += [pathlib.Path('logs'), pathlib.Path('../logs'), pathlib.Path('../../logs')]
LOGS_DIR = next((p for p in LOGS_DIR_CANDIDATES if p.exists()), pathlib.Path('logs'))

print('Server:', BASE)
print('DB (from execution workload):', PGHOST or '(unset)', PGPORT or '(unset)', PGUSER or '(unset)', PGDATABASE or '(unset)')
print('Logs dir:', str(LOGS_DIR), 'exists:', LOGS_DIR.exists())

Server: http://localhost:8082/api
DB (from execution workload): (unset) (unset) (unset) (unset)
Logs dir: ../logs exists: True


In [56]:
# Cell 2: Helper HTTP GET with stdlib fallback
def http_get_json(url: str):
    try:
        import requests  # type: ignore
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        return r.json()
    except Exception:
        import urllib.request, urllib.error
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.loads(resp.read().decode('utf-8'))
        except Exception as e:
            print('HTTP error:', e)
            return None

def pretty(obj):
    print(json.dumps(obj, indent=2, ensure_ascii=False))

In [57]:
# Cell 3: Try to infer EXECUTION_ID if not provided
if not EXECUTION_ID:
    # 1) Try status.json in repo root
    st = pathlib.Path('status.json')
    if st.exists():
        try:
            data = json.loads(st.read_text(encoding='utf-8'))
            eid = data.get('id') or data.get('execution_id')
            if eid:
                EXECUTION_ID = str(eid)
        except Exception:
            pass

# 2) Try server /executions to pick the most recent
if not EXECUTION_ID:
    ex_list = http_get_json(f'{BASE}/executions')
    if isinstance(ex_list, list) and ex_list:
        # Items are dicts; keep first
        eid = ex_list[0].get('execution_id') or ex_list[0].get('id')
        if eid:
            EXECUTION_ID = str(eid)

print('EXECUTION_ID =', EXECUTION_ID or '(set me)')

EXECUTION_ID = 222504588622692352


In [58]:
# Cell 4: Fetch execution summary
if EXECUTION_ID:
    summary = http_get_json(f'{BASE}/executions/{EXECUTION_ID}')
    print('Execution summary:')
    pretty(summary)
    # Extract DB connection from the execution's workload (execution_start event)
    try:
        evs = (summary or {}).get('events') or []
        start = next((e for e in evs if e.get('event_type')=='execution_start'), None)
        if start:
            wl = ((start.get('input_context') or {}).get('workload')) or {}
            PGHOST = wl.get('pg_host') or PGHOST
            PGPORT = int(wl.get('pg_port') or PGPORT or 0)
            PGUSER = wl.get('pg_user') or PGUSER
            PGPASSWORD = wl.get('pg_password') or PGPASSWORD
            PGDATABASE = wl.get('pg_db') or PGDATABASE
    except Exception as e:
        print('Failed to extract DB params from workload:', e)
    print('DB resolved:', PGHOST or '(unset)', PGPORT or '(unset)', PGUSER or '(unset)', PGDATABASE or '(unset)')
else:
    print('Please set EXECUTION_ID above.')

Execution summary:
{
  "id": "222504588622692352",
  "playbook_id": "",
  "playbook_name": "Unknown",
  "status": "completed",
  "start_time": "2025-09-05T18:53:47.866730",
  "end_time": "2025-09-05T18:53:51.270330",
  "duration": 3.4036,
  "progress": 100,
  "result": {
    "id": "548ea210-91fe-453f-af42-93cafa7ee842",
    "status": "success",
    "data": {
      "global_alert": false,
      "summary": {
        "alert_cities": [],
        "count": 0
      }
    }
  },
  "error": null,
  "events": [
    {
      "event_id": "222504588639469568",
      "event_type": "execution_start",
      "node_id": "222504588639469568",
      "node_name": "weather_loop_example",
      "node_type": "playbook",
      "status": "in_progress",
      "duration": 0.0,
      "timestamp": "2025-09-05T18:53:47.866730",
      "input_context": {
        "path": "examples/weather/weather_loop_example",
        "version": "0.1.0",
        "workload": {
          "jobId": "{{ job.uuid }}",
          "state": "read

In [59]:
# Cell 6: LOOP EVENT VALIDATION - Check if end_loop and execution_complete events exist
# DIAGNOSTIC: This identifies missing events that indicate where the chain breaks

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    print("LOOP EVENT VALIDATION")
    print("=" * 50)
    
    # Check for end_loop events
    end_loop_events = [e for e in ev if e.get('event_type') == 'end_loop']
    print(f"End_loop events found: {len(end_loop_events)}")
    
    if len(end_loop_events) == 0:
        print("ISSUE: No end_loop events found - loop completion mechanism not triggered!")
        print("   → Loop iterations may have completed but didn't aggregate results")
        print("   → Child executions may be missing execution_complete events")
        print("   → Run Cell 7 for detailed chain analysis")
    else:
        for e in end_loop_events:
            node_name = e.get('node_name')
            status = e.get('status') 
            result_available = bool(e.get('output_result'))
            print(f"   Loop: {node_name}, Status: {status}, Has result: {result_available}")
    
    # Check for execution_complete events from child executions
    execution_complete_events = [e for e in ev if e.get('event_type') == 'execution_complete']
    print(f"\nExecution_complete events found: {len(execution_complete_events)}")
    
    if len(execution_complete_events) == 0:
        print("ISSUE: No execution_complete events found!")
        print("   → Child executions may have finished but didn't emit completion events")
        print("   → This prevents loop completion mechanism from triggering")
        print("   → Run Cell 9 to manually trigger child completion")
    else:
        for e in execution_complete_events:
            exec_id = e.get('execution_id')
            status = e.get('status')
            return_value = e.get('output_result')
            print(f"   Execution {exec_id}: {status}")
            if return_value:
                print(f"      Return value: {json.dumps(return_value, indent=2)}")
    
    # Check for city_loop specific completion
    city_loop_completed = [e for e in ev 
                          if e.get('event_type') == 'action_completed' 
                          and e.get('node_name') == 'city_loop']
    
    print(f"\nCity_loop completion events: {len(city_loop_completed)}")
    
    if len(city_loop_completed) == 0:
        print("ISSUE: No city_loop completion events found!")
        print("   → Loop didn't complete successfully")
        print("   → Aggregated results not available for next steps")
        print("   → Run Cell 10 to manually trigger loop completion")
    else:
        for e in city_loop_completed:
            status = e.get('status')
            result = e.get('output_result')
            print(f"   Status: {status}")
            if result:
                print(f"   Aggregated result: {json.dumps(result, indent=2)}")
    
    # Overall assessment
    print(f"\nOVERALL ASSESSMENT:")
    if len(end_loop_events) > 0 and len(execution_complete_events) > 0 and len(city_loop_completed) > 0:
        print("SUCCESS: All critical loop events found - distributed loop completed correctly")
    else:
        missing = []
        if len(execution_complete_events) == 0: missing.append("execution_complete")
        if len(end_loop_events) == 0: missing.append("end_loop") 
        if len(city_loop_completed) == 0: missing.append("city_loop completion")
        print(f"ISSUES: Missing events: {', '.join(missing)}")
        print("        → This explains why weather_alert_summary table is empty")
        print("        → Use manual intervention cells (9-10) to fix")

else:
    print('Events not loaded - cannot validate loop events.')

LOOP EVENT VALIDATION
End_loop events found: 1
   Loop: city_loop, Status: TRACKING, Has result: False

Execution_complete events found: 0
ISSUE: No execution_complete events found!
   → Child executions may have finished but didn't emit completion events
   → This prevents loop completion mechanism from triggering
   → Run Cell 9 to manually trigger child completion

City_loop completion events: 3
   Status: COMPLETED
   Status: COMPLETED
   Status: COMPLETED

OVERALL ASSESSMENT:
ISSUES: Missing events: execution_complete
        → This explains why weather_alert_summary table is empty
        → Use manual intervention cells (9-10) to fix


In [60]:
# Quick inline validation:
# - Count loop iterations for city_loop
# - Ensure at least one COMPLETED action exists
if events and isinstance(events, dict):
    ev = events.get('events') or []
    loop_iters = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    completed = [e for e in ev if (e.get('status') or '').lower() in ('completed','success')]
    print('city_loop.iterations =', len(loop_iters))
    print('completed events =', len(completed))
else:
    print('Events not loaded.')


city_loop.iterations = 3
completed events = 16


In [61]:
# Cell 8: DATABASE VALIDATION - Check event_log directly for debugging
# DIAGNOSTIC: This queries the database directly to bypass API issues

try:
    import psycopg2
    from psycopg2.extras import RealDictCursor
    
    print("DIRECT DATABASE EVENT VALIDATION")
    print("=" * 50)
    
    # Connect to database directly
    conn = psycopg2.connect(
        host="localhost",
        port="5432", 
        database="noetl",
        user="noetl",
        password="noetl"
    )
    
    cursor = conn.cursor(cursor_factory=RealDictCursor)
    
    # Query 1: Count total events for this execution
    cursor.execute("""
        SELECT event_type, COUNT(*) as count 
        FROM event_log 
        WHERE execution_id = %s 
        GROUP BY event_type 
        ORDER BY count DESC
    """, (EXECUTION_ID,))
    
    event_counts = cursor.fetchall()
    print("Event type counts:")
    for row in event_counts:
        print(f"   {row['event_type']}: {row['count']}")
    
    # Query 2: Loop-specific events with details
    cursor.execute("""
        SELECT event_type, node_name, status, 
               SUBSTRING(input_context::text, 1, 100) as context_preview,
               created_at
        FROM event_log 
        WHERE execution_id = %s 
        AND (event_type IN ('loop_iteration', 'end_loop', 'action_completed', 'execution_complete')
             OR node_name = 'city_loop')
        ORDER BY created_at
    """, (EXECUTION_ID,))
    
    loop_events = cursor.fetchall()
    print(f"\nLoop-related events ({len(loop_events)}):")
    
    for row in loop_events:
        timestamp = row['created_at'].strftime('%H:%M:%S')
        print(f"   {timestamp} | {row['event_type']:15s} | {row['node_name']:20s} | {row['status'] or 'N/A':10s}")
        if row['context_preview']:
            print(f"              Context: {row['context_preview']}...")
    
    # Query 3: Check for child execution completion
    cursor.execute("""
        SELECT DISTINCT 
            (input_context::json)->>'child_execution_id' as child_id,
            COUNT(*) as event_count
        FROM event_log 
        WHERE execution_id = %s 
        AND event_type = 'loop_iteration'
        AND input_context::json ? 'child_execution_id'
        GROUP BY (input_context::json)->>'child_execution_id'
    """, (EXECUTION_ID,))
    
    child_info = cursor.fetchall()
    print(f"\nChild execution tracking ({len(child_info)} children):")
    
    for row in child_info:
        child_id = row['child_id']
        print(f"   Child {child_id}:")
        
        # Check if this child has execution_complete event
        cursor.execute("""
            SELECT COUNT(*) as complete_count
            FROM event_log 
            WHERE execution_id = %s 
            AND event_type = 'execution_complete'
        """, (child_id,))
        
        complete_result = cursor.fetchone()
        complete_count = complete_result['complete_count'] if complete_result else 0
        
        if complete_count > 0:
            print(f"      COMPLETED ({complete_count} events)")
        else:
            print(f"      NOT completed - MISSING execution_complete event!")
            print(f"         → Run: evaluate_broker_for_execution('{child_id}')")
    
    # Query 4: Check postgres task events
    cursor.execute("""
        SELECT event_type, status, error, 
               SUBSTRING(COALESCE(output_result::text, error, 'No output'), 1, 150) as result_preview
        FROM event_log 
        WHERE execution_id = %s 
        AND node_name = 'store_summary_postgres_task'
        ORDER BY created_at
    """, (EXECUTION_ID,))
    
    postgres_events = cursor.fetchall()
    print(f"\nPostgres task validation ({len(postgres_events)} events):")
    
    if len(postgres_events) == 0:
        print("   NO postgres task events found!")
        print("      → Postgres task never executed - previous steps failed")
    else:
        for row in postgres_events:
            status_indicator = "COMPLETED" if row['status'] == 'COMPLETED' else "ERROR" if row['status'] == 'ERROR' else "IN_PROGRESS"
            print(f"   {status_indicator} {row['event_type']} | Status: {row['status'] or 'N/A'}")
            if row['result_preview']:
                print(f"      Result/Error: {row['result_preview']}")
    
    # Query 5: Check actual weather_alert_summary table
    cursor.execute("""
        SELECT COUNT(*) as row_count 
        FROM weather_alert_summary
    """)
    
    table_result = cursor.fetchone()
    summary_count = table_result['row_count'] if table_result else 0
    
    print(f"\nFinal table validation:")
    print(f"   weather_alert_summary rows: {summary_count}")
    
    if summary_count == 0:
        print("   TABLE IS EMPTY - postgres task failed to insert data")
        print("      → Check postgres task errors above")
        print("      → Verify database parameters in workbook task")
    else:
        print("   Table has data - execution successful!")
        
        # Show sample data
        cursor.execute("""
            SELECT * FROM weather_alert_summary 
            ORDER BY created_at DESC 
            LIMIT 3
        """)
        sample_data = cursor.fetchall()
        print("      Sample rows:")
        for row in sample_data:
            print(f"        {dict(row)}")
    
    cursor.close()
    conn.close()
    
except Exception as e:
    print(f"Database connection failed: {e}")
    print("   → Check if PostgreSQL is running on localhost:5432")
    print("   → Verify database credentials (noetl/noetl@noetl)")
    print("   → Use Cell 6 for API-based validation instead")

Database connection failed: No module named 'psycopg2'
   → Check if PostgreSQL is running on localhost:5432
   → Verify database credentials (noetl/noetl@noetl)
   → Use Cell 6 for API-based validation instead


In [62]:
# Cell 7: CHAIN ANALYSIS - Track complete loop→child→completion flow  
# DIAGNOSTIC: This shows exactly where the distributed loop chain breaks!

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    print("CITY_LOOP COMPLETION CHAIN ANALYSIS")
    print("=" * 50)
    
    # Step 1: Count loop_iteration events (should be 3 for London, Paris, Berlin)
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    print(f"Step 1 - Loop iterations spawned: {len(loop_iterations)} (Expected: 3)")
    
    if len(loop_iterations) != 3:
        print(f"BREAK POINT: Expected 3 loop iterations, got {len(loop_iterations)}")
        print("   → Check if city_loop step has distribution: true and correct cities list")
    
    child_execution_ids = []
    for i, e in enumerate(loop_iterations):
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
        print(f"   Iteration {i+1}: Child execution {child_id}")
    
    print(f"\nStep 2 - Child executions to track: {child_execution_ids}")
    
    # Step 3: Check which children completed
    completed_children = []
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if child_complete:
            completed_children.append(child_id)
            print(f"   COMPLETED: Child {child_id} completed")
        else:
            print(f"   NOT COMPLETED: Child {child_id} NOT completed - BREAK POINT!")
            print(f"      → Run: evaluate_broker_for_execution('{child_id}') to trigger completion")
    
    print(f"\nStep 3 - Completed children: {len(completed_children)}/{len(child_execution_ids)}")
    
    if len(completed_children) < len(child_execution_ids):
        print("BREAK POINT: Not all child executions completed")
        print("   → Child executions finished but didn't emit execution_complete events")
        print("   → Need manual broker evaluation for missing children")
    
    # Step 4: Check for end_loop tracking and completion
    city_loop_end_events = [e for e in ev if e.get('event_type') == 'end_loop' and e.get('node_name') == 'city_loop']
    print(f"\nStep 4 - End_loop events for city_loop: {len(city_loop_end_events)}")
    
    if len(city_loop_end_events) == 0:
        print("BREAK POINT: No end_loop events - loop completion mechanism not started")
        print("   → Run: check_and_process_completed_loops(EXECUTION_ID) manually")
    
    for e in city_loop_end_events:
        status = e.get('status')
        result = e.get('output_result')
        print(f"   Status: {status}, Result available: {bool(result)}")
    
    # Step 5: Check final city_loop action_completed event
    city_loop_completed = [e for e in ev 
                          if e.get('event_type') == 'action_completed' 
                          and e.get('node_name') == 'city_loop' 
                          and e.get('status') == 'COMPLETED']
    
    print(f"\nStep 5 - Final city_loop completion: {len(city_loop_completed)} event(s)")
    
    if len(city_loop_completed) == 0:
        print("BREAK POINT: No final city_loop completion event")
        print("   → Loop completion mechanism didn't emit final aggregated result")
    
    for e in city_loop_completed:
        result = e.get('output_result')
        if result:
            print(f"   Final aggregated result: {json.dumps(result, indent=4)}")
    
    # Step 6: Check if subsequent steps received the aggregated data
    aggregate_events = [e for e in ev if e.get('node_name') == 'aggregate_alerts_task']
    print(f"\nStep 6 - Aggregate alerts events: {len(aggregate_events)}")
    
    empty_input_found = False
    for e in aggregate_events:
        if e.get('event_type') == 'action_started':
            ctx = e.get('input_context', {})
            alerts_param = ctx.get('task', {}).get('with', {}).get('alerts')
            print(f"   Alerts input to aggregate_alerts_task: {alerts_param}")
            if not alerts_param or alerts_param == "":
                empty_input_found = True
    
    if empty_input_found:
        print("BREAK POINT: aggregate_alerts_task received empty input")
        print("   → city_loop results not passed to next step - template resolution issue")
    
    # Step 7: Check postgres task execution
    postgres_events = [e for e in ev if e.get('node_name') == 'store_summary_postgres_task']
    postgres_errors = [e for e in postgres_events if e.get('event_type') == 'action_error']
    
    print(f"\nStep 7 - Postgres storage events: {len(postgres_events)} (errors: {len(postgres_errors)})")
    
    if len(postgres_errors) > 0:
        print("BREAK POINT: Postgres task failed")
        for e in postgres_errors:
            error = e.get('error', 'Unknown error')
            print(f"   Error: {error}")
        print("   → Check database parameters and template variable resolution")
    
    if len(postgres_events) == 0:
        print("BREAK POINT: Postgres task never executed")
        print("   → Previous steps failed, postgres task not reached")
        
else:
    print('Events not loaded - cannot analyze loop completion chain.')

CITY_LOOP COMPLETION CHAIN ANALYSIS
Step 1 - Loop iterations spawned: 3 (Expected: 3)
   Iteration 1: Child execution 222504588958236672
   Iteration 2: Child execution 222504589000179712
   Iteration 3: Child execution 222504589050511360

Step 2 - Child executions to track: ['222504588958236672', '222504589000179712', '222504589050511360']
   NOT COMPLETED: Child 222504588958236672 NOT completed - BREAK POINT!
      → Run: evaluate_broker_for_execution('222504588958236672') to trigger completion
   NOT COMPLETED: Child 222504589000179712 NOT completed - BREAK POINT!
      → Run: evaluate_broker_for_execution('222504589000179712') to trigger completion
   NOT COMPLETED: Child 222504589050511360 NOT completed - BREAK POINT!
      → Run: evaluate_broker_for_execution('222504589050511360') to trigger completion

Step 3 - Completed children: 0/3
BREAK POINT: Not all child executions completed
   → Child executions finished but didn't emit execution_complete events
   → Need manual broker e

In [63]:
# Cell 9: MANUAL CHILD COMPLETION - Force child execution completion events
# DIAGNOSTIC: Use this when Cell 7/8 shows missing execution_complete events

print("MANUAL CHILD EXECUTION COMPLETION")
print("=" * 50)

# First get child execution IDs from loop_iteration events
if events and isinstance(events, dict):
    ev = events.get('events') or []
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    
    child_execution_ids = []
    for e in loop_iterations:
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
    
    print(f"Found {len(child_execution_ids)} child executions to check:")
    for i, child_id in enumerate(child_execution_ids):
        print(f"   {i+1}. {child_id}")
    
    # Check which ones need completion
    incomplete_children = []
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if not child_complete:
            incomplete_children.append(child_id)
    
    print(f"\nChildren needing completion: {len(incomplete_children)}")
    
    if len(incomplete_children) == 0:
        print("All children already completed - no manual intervention needed")
    else:
        print("The following children need manual completion:")
        for child_id in incomplete_children:
            print(f"   → {child_id}")
        
        # Manual completion
        print(f"\nTriggering manual completion for {len(incomplete_children)} children...")
        
        for child_id in incomplete_children:
            try:
                print(f"\n   Processing child: {child_id}")
                
                # Call the broker evaluation function directly
                response = requests.post(f'{BASE}/broker/evaluate/{child_id}')
                
                if response.status_code == 200:
                    result = response.json()
                    print(f"   Manual completion triggered successfully")
                    print(f"      Response: {result}")
                else:
                    print(f"   Failed to trigger completion: {response.status_code}")
                    print(f"      Error: {response.text}")
                    
            except Exception as e:
                print(f"   Exception during manual completion: {e}")
        
        print(f"\nManual completion attempts finished")
        print("   → Wait 2-3 seconds then re-run Cell 6 to check for new execution_complete events")
        print("   → If successful, you should see end_loop and final action_completed events")

else:
    print("Events not available - run Cell 2 first to load execution data")

MANUAL CHILD EXECUTION COMPLETION
Found 3 child executions to check:
   1. 222504588958236672
   2. 222504589000179712
   3. 222504589050511360

Children needing completion: 3
The following children need manual completion:
   → 222504588958236672
   → 222504589000179712
   → 222504589050511360

Triggering manual completion for 3 children...

   Processing child: 222504588958236672
   Exception during manual completion: name 'requests' is not defined

   Processing child: 222504589000179712
   Exception during manual completion: name 'requests' is not defined

   Processing child: 222504589050511360
   Exception during manual completion: name 'requests' is not defined

Manual completion attempts finished
   → Wait 2-3 seconds then re-run Cell 6 to check for new execution_complete events
   → If successful, you should see end_loop and final action_completed events


In [64]:
# Cell 10: MANUAL LOOP COMPLETION - Force loop completion mechanism
# DIAGNOSTIC: Use this when Cell 6/7 shows missing end_loop events

print("MANUAL LOOP COMPLETION TRIGGER")
print("=" * 50)

print("Step 1 - Triggering manual loop completion check...")

try:
    # Call the loop completion check function directly
    response = requests.post(f'{BASE}/broker/check-loops/{EXECUTION_ID}')
    
    if response.status_code == 200:
        result = response.json()
        print("Manual loop completion check triggered successfully")
        print(f"   Response: {result}")
        
        # Check if any loops were processed
        if 'processed_loops' in result:
            processed = result['processed_loops']
            print(f"   Processed {len(processed)} loop(s):")
            for loop in processed:
                print(f"      → {loop}")
        
    elif response.status_code == 404:
        print("Loop completion endpoint not available")
        print("   → Falling back to broker evaluation...")
        
        # Fallback: trigger broker evaluation which includes loop completion
        response = requests.post(f'{BASE}/broker/evaluate/{EXECUTION_ID}')
        
        if response.status_code == 200:
            result = response.json()
            print("Broker evaluation triggered (includes loop completion)")
            print(f"   Response: {result}")
        else:
            print(f"Broker evaluation failed: {response.status_code}")
            print(f"   Error: {response.text}")
            
    else:
        print(f"Manual loop completion failed: {response.status_code}")
        print(f"   Error: {response.text}")

except Exception as e:
    print(f"Exception during manual loop completion: {e}")
    print("   → Check if NoETL server is running")
    print("   → Verify BASE url is correct")

print(f"\nStep 2 - Post-completion validation:")
print("   → Wait 2-3 seconds then re-run Cell 6 to check results")
print("   → Look for new end_loop events with aggregated results")
print("   → Check if city_loop action_completed event appears")
print("   → Verify if subsequent steps (aggregate_alerts_task) now have input data")

print(f"\nStep 3 - If loop completion worked, you should see:")
print("   end_loop event for city_loop with aggregated weather data")
print("   action_completed event for city_loop with COMPLETED status") 
print("   action_started event for aggregate_alerts_task with alerts input")
print("   Events for store_summary_postgres_task execution")

MANUAL LOOP COMPLETION TRIGGER
Step 1 - Triggering manual loop completion check...
Exception during manual loop completion: name 'requests' is not defined
   → Check if NoETL server is running
   → Verify BASE url is correct

Step 2 - Post-completion validation:
   → Wait 2-3 seconds then re-run Cell 6 to check results
   → Look for new end_loop events with aggregated results
   → Check if city_loop action_completed event appears
   → Verify if subsequent steps (aggregate_alerts_task) now have input data

Step 3 - If loop completion worked, you should see:
   end_loop event for city_loop with aggregated weather data
   action_completed event for city_loop with COMPLETED status
   action_started event for aggregate_alerts_task with alerts input
   Events for store_summary_postgres_task execution


In [65]:
# Cell 11: TROUBLESHOOTING SUMMARY - Common issues and fixes
# DIAGNOSTIC: Reference guide for distributed loop execution problems

print("DISTRIBUTED LOOP TROUBLESHOOTING GUIDE")
print("=" * 60)

print("QUICK DIAGNOSIS CHECKLIST:")
print("Step 1 - Run Cell 2: Load execution data")
print("Step 2 - Run Cell 6: Check for missing end_loop/execution_complete events")  
print("Step 3 - Run Cell 7: Analyze complete chain from loop→child→completion")
print("Step 4 - Run Cell 8: Direct database validation (bypasses API)")
print("Step 5 - Run Cell 9: Manual child completion (if needed)")
print("Step 6 - Run Cell 10: Manual loop completion (if needed)")

print(f"\nCOMMON BREAK POINTS & FIXES:")

print(f"\nBREAK POINT 1: No loop_iteration events")
print("   → Problem: city_loop step not configured for distribution")
print("   → Fix: Add 'distribution: true' to city_loop step")
print("   → Check: examples/weather/weather_loop_example.yaml")

print(f"\nBREAK POINT 2: Child executions not completing")
print("   → Problem: Child executions finish but don't emit execution_complete events")
print("   → Fix: Run Cell 9 to manually trigger child completion")
print("   → API: POST /api/broker/evaluate/{child_execution_id}")

print(f"\nBREAK POINT 3: No end_loop events")
print("   → Problem: Loop completion mechanism not triggered")
print("   → Fix: Run Cell 10 to manually trigger loop completion")
print("   → API: POST /api/broker/check-loops/{execution_id}")

print(f"\nBREAK POINT 4: Empty input to aggregate_alerts_task")
print("   → Problem: city_loop results not passed to next step")
print("   → Fix: Check template variable resolution in playbook")
print("   → Look for: '{{ city_loop }}' in aggregate_alerts_task")

print(f"\nBREAK POINT 5: Postgres task errors")
print("   → Problem: Database connection or template variable issues")
print("   → Fix: Check hardcoded database parameters in workbook task")
print("   → Parameters: host=localhost, port=5432, database=noetl, user=noetl, password=noetl")

print(f"\nBREAK POINT 6: Empty weather_alert_summary table")
print("   → Problem: Postgres INSERT failed silently")
print("   → Fix: Check Cell 8 for postgres task errors")
print("   → Verify: INSERT statement and data format")

print(f"\nMANUAL INTERVENTION SEQUENCE:")
print("   1. Run Cell 6 → Identify missing events")
print("   2. If missing execution_complete → Run Cell 9")
print("   3. If missing end_loop → Run Cell 10") 
print("   4. Wait 2-3 seconds → Re-run Cell 6")
print("   5. If still issues → Run Cell 8 for database check")
print("   6. Check weather_alert_summary table for final results")

print(f"\nSUCCESS INDICATORS:")
print("   3 loop_iteration events (London, Paris, Berlin)")
print("   3 execution_complete events (one per child)")
print("   1 end_loop event with aggregated weather data")
print("   1 city_loop action_completed event")
print("   aggregate_alerts_task with non-empty alerts input")
print("   store_summary_postgres_task completed successfully")
print("   weather_alert_summary table contains temperature data")

print(f"\nESCALATION:")
print("   If all manual fixes fail:")
print("   → Check NoETL server logs: logs/server.log")
print("   → Check worker logs: logs/worker_*.log") 
print("   → Verify child playbook registration: city_process.yaml")
print("   → Test child playbook individually: noetl execute playbook city_process.yaml")

print(f"\nKEY FUNCTIONS:")
print("   evaluate_broker_for_execution() → Triggers child completion")
print("   check_and_process_completed_loops() → Processes loop aggregation")
print("   Event emission chain → loop_iteration → execution_complete → end_loop → action_completed")

DISTRIBUTED LOOP TROUBLESHOOTING GUIDE
QUICK DIAGNOSIS CHECKLIST:
Step 1 - Run Cell 2: Load execution data
Step 2 - Run Cell 6: Check for missing end_loop/execution_complete events
Step 3 - Run Cell 7: Analyze complete chain from loop→child→completion
Step 4 - Run Cell 8: Direct database validation (bypasses API)
Step 5 - Run Cell 9: Manual child completion (if needed)
Step 6 - Run Cell 10: Manual loop completion (if needed)

COMMON BREAK POINTS & FIXES:

BREAK POINT 1: No loop_iteration events
   → Problem: city_loop step not configured for distribution
   → Fix: Add 'distribution: true' to city_loop step
   → Check: examples/weather/weather_loop_example.yaml

BREAK POINT 2: Child executions not completing
   → Problem: Child executions finish but don't emit execution_complete events
   → Fix: Run Cell 9 to manually trigger child completion
   → API: POST /api/broker/evaluate/{child_execution_id}

BREAK POINT 3: No end_loop events
   → Problem: Loop completion mechanism not triggered


In [66]:
# Cell 12: FINAL EXECUTION SUMMARY - Overall status and next steps
# DIAGNOSTIC: Complete execution health check and recommendations

print("FINAL EXECUTION HEALTH CHECK")
print("=" * 50)

if events and isinstance(events, dict):
    ev = events.get('events') or []
    
    # Overall execution status
    execution_status = "Unknown"
    execution_result = None
    
    # Check for final execution status
    final_events = [e for e in ev if e.get('event_type') == 'execution_complete' and e.get('execution_id') == EXECUTION_ID]
    if final_events:
        latest_event = final_events[-1]
        execution_status = latest_event.get('status', 'Unknown')
        execution_result = latest_event.get('output_result')
    
    status_indicator = "SUCCESS" if execution_status == "COMPLETED" else "ERROR" if execution_status == "ERROR" else "IN_PROGRESS"
    print(f"Overall Execution Status: {status_indicator} - {execution_status}")
    
    # Health check scores
    health_scores = {
        "Child Spawning": 0,
        "Child Completion": 0, 
        "Loop Aggregation": 0,
        "Data Pipeline": 0,
        "Final Storage": 0
    }
    
    # 1. Child Spawning Check
    loop_iterations = [e for e in ev if e.get('event_type') == 'loop_iteration' and e.get('node_name') == 'city_loop']
    health_scores["Child Spawning"] = min(100, len(loop_iterations) * 33)  # 3 children = 100%
    
    # 2. Child Completion Check
    child_execution_ids = []
    for e in loop_iterations:
        ctx = e.get('input_context', {})
        child_id = ctx.get('child_execution_id')
        if child_id:
            child_execution_ids.append(child_id)
    
    completed_children = 0
    for child_id in child_execution_ids:
        child_complete = [e for e in ev if e.get('execution_id') == child_id and e.get('event_type') == 'execution_complete']
        if child_complete:
            completed_children += 1
    
    if len(child_execution_ids) > 0:
        health_scores["Child Completion"] = int((completed_children / len(child_execution_ids)) * 100)
    
    # 3. Loop Aggregation Check
    end_loop_events = [e for e in ev if e.get('event_type') == 'end_loop' and e.get('node_name') == 'city_loop']
    city_loop_completed = [e for e in ev if e.get('event_type') == 'action_completed' and e.get('node_name') == 'city_loop']
    
    aggregation_score = 0
    if end_loop_events:
        aggregation_score += 50
    if city_loop_completed:
        aggregation_score += 50
    health_scores["Loop Aggregation"] = aggregation_score
    
    # 4. Data Pipeline Check
    aggregate_events = [e for e in ev if e.get('node_name') == 'aggregate_alerts_task']
    pipeline_score = 0
    for e in aggregate_events:
        if e.get('event_type') == 'action_started':
            ctx = e.get('input_context', {})
            alerts_param = ctx.get('task', {}).get('with', {}).get('alerts')
            if alerts_param and alerts_param != "":
                pipeline_score = 100
                break
    health_scores["Data Pipeline"] = pipeline_score
    
    # 5. Final Storage Check  
    postgres_events = [e for e in ev if e.get('node_name') == 'store_summary_postgres_task']
    postgres_completed = [e for e in postgres_events if e.get('event_type') == 'action_completed' and e.get('status') == 'COMPLETED']
    health_scores["Final Storage"] = 100 if postgres_completed else 0
    
    print(f"\nHEALTH SCORES:")
    overall_health = sum(health_scores.values()) // len(health_scores)
    
    for component, score in health_scores.items():
        indicator = "GOOD" if score >= 80 else "WARNING" if score >= 50 else "ERROR"
        print(f"   {indicator:7s} {component:15s}: {score:3d}%")
    
    health_indicator = "EXCELLENT" if overall_health >= 80 else "FAIR" if overall_health >= 50 else "POOR"
    print(f"\nOverall Health: {health_indicator} - {overall_health}%")
    
    # Recommendations based on health scores
    print(f"\nRECOMMENDATIONS:")
    
    if health_scores["Child Spawning"] < 100:
        print("   Child Spawning Issue:")
        print("      → Check city_loop step has distribution: true")
        print("      → Verify cities list in playbook")
        print("      → Reference: Cell 7 for detailed analysis")
    
    if health_scores["Child Completion"] < 100:
        print("   Child Completion Issue:")
        print("      → Run Cell 9 to manually complete children")
        print("      → Check child playbook execution logs")
        print("      → Verify child execution broker evaluation")
    
    if health_scores["Loop Aggregation"] < 100:
        print("   Loop Aggregation Issue:")
        print("      → Run Cell 10 to manually trigger loop completion")
        print("      → Check comprehensive loop completion mechanism")
        print("      → Verify event emission in evaluate_broker_for_execution()")
    
    if health_scores["Data Pipeline"] < 100:
        print("   Data Pipeline Issue:")
        print("      → Check template variable resolution: {{ city_loop }}")
        print("      → Verify aggregate_alerts_task input parameters")
        print("      → Review playbook step dependencies")
    
    if health_scores["Final Storage"] < 100:
        print("   Storage Issue:")
        print("      → Run Cell 8 for database validation")
        print("      → Check postgres task hardcoded parameters")
        print("      → Verify weather_alert_summary table schema")
    
    if overall_health == 100:
        print("   EXECUTION PERFECT - All systems working correctly!")
        print("      → weather_alert_summary should contain weather data")
        print("      → Distributed loop execution completed successfully")
    elif overall_health >= 80:
        print("   EXECUTION MOSTLY SUCCESSFUL - Minor issues detected")
        print("      → Address specific component issues above")
    else:
        print("   EXECUTION HAS MAJOR ISSUES - Multiple failures detected")
        print("      → Follow manual intervention sequence in Cell 11")
        print("      → Consider re-running entire playbook after fixes")

    # Quick action plan
    if overall_health < 100:
        print(f"\nQUICK ACTION PLAN:")
        if health_scores["Child Completion"] < 100:
            print("   1. Run Cell 9 → Manual child completion")
        if health_scores["Loop Aggregation"] < 100:
            print("   2. Run Cell 10 → Manual loop completion")
        print("   3. Wait 2-3 seconds")
        print("   4. Re-run Cell 12 → Check improved health scores")
        print("   5. If still issues → Follow Cell 11 troubleshooting guide")

else:
    print("Cannot perform health check - events not loaded")
    print("   → Run Cell 2 first to load execution data")
    print("   → Verify EXECUTION_ID is set correctly")

FINAL EXECUTION HEALTH CHECK
Overall Execution Status: IN_PROGRESS - Unknown

HEALTH SCORES:
   GOOD    Child Spawning :  99%
   ERROR   Child Completion:   0%
   GOOD    Loop Aggregation: 100%
   ERROR   Data Pipeline  :   0%
   ERROR   Final Storage  :   0%

Overall Health: POOR - 39%

RECOMMENDATIONS:
   Child Spawning Issue:
      → Check city_loop step has distribution: true
      → Verify cities list in playbook
      → Reference: Cell 7 for detailed analysis
   Child Completion Issue:
      → Run Cell 9 to manually complete children
      → Check child playbook execution logs
      → Verify child execution broker evaluation
   Data Pipeline Issue:
      → Check template variable resolution: {{ city_loop }}
      → Verify aggregate_alerts_task input parameters
      → Review playbook step dependencies
   Storage Issue:
      → Run Cell 8 for database validation
      → Check postgres task hardcoded parameters
      → Verify weather_alert_summary table schema
   EXECUTION HAS MAJO

## Loop Completion Troubleshooting Summary

Based on the validation results above, here are the key issues to check:

### Expected Flow for Working Loop:
1. **Loop Iterations**: 3 `loop_iteration` events for city_loop (London, Paris, Berlin)
2. **Child Executions**: 3 child executions spawned with unique execution IDs
3. **Child Completion**: 3 `execution_complete` events from child executions with weather results
4. **Loop Tracking**: `end_loop` event with status `TRACKING` to monitor progress
5. **Loop Completion**: `action_completed` event for city_loop with aggregated results
6. **Next Steps**: aggregate_alerts_task receives aggregated data, then store_summary_postgres_task

### Common Issues:
- **Empty weather_alert_summary**: Usually caused by postgres task failing due to template variable resolution or missing database parameters
- **Missing execution_complete events**: Child executions finish but don't emit completion events - requires manual broker evaluation
- **No aggregated results**: Loop completion mechanism not triggered or children not properly tracked
- **Template resolution errors**: Database parameters like `{{ workload.pg_host }}` resolving to empty strings

### Manual Fixes:
```python
# If child executions completed but no execution_complete events:
from noetl.api.event import evaluate_broker_for_execution
await evaluate_broker_for_execution('child_execution_id')

# If loop completion not triggered:
from noetl.api.event import check_and_process_completed_loops  
await check_and_process_completed_loops('parent_execution_id')
```