# Pipeline Progress Visualizers with Anywidget

This notebook demonstrates different ways to visualize DAG-based pipeline progress using anywidget.

We'll build two interactive widgets:
1. **Graph (DAG) View** - Shows the dependency structure with real-time node updates
2. **Progress Bar View** - High-level summary with node-by-node breakdown

A **Waterfall (Timeline) View** (task duration and parallelism) and a **Multi-Run Matrix View** (batch processing of multiple examples) are natural extensions, but they are not implemented in this notebook.

In [10]:
import asyncio  # used by the simulators' `await asyncio.sleep(...)` calls
import anywidget
import traitlets
import time
import random
from typing import Dict, Any, List

## 1. Graph (DAG) View Widget

This widget visualizes the dependency graph with real-time status updates.

In [11]:
class GraphViewWidget(anywidget.AnyWidget):
    _esm = """
    function render({ model, el }) {
        el.innerHTML = `
            <style>
                .graph-container {
                    padding: 20px;
                    background: #f9fafb;
                    border-radius: 8px;
                    border: 1px solid #e5e7eb;
                }
                .nodes-grid {
                    display: grid;
                    grid-template-columns: repeat(4, 1fr);
                    gap: 40px 20px;
                    align-items: center;
                    min-width: 600px;
                }
                .node {
                    padding: 16px;
                    border-radius: 8px;
                    border: 2px solid;
                    text-align: center;
                    font-weight: 600;
                    min-width: 100px;
                    transition: all 0.3s ease;
                }
                .status-queued { background: #d1d5db; border-color: #9ca3af; color: #374151; }
                .status-running { 
                    background: #93c5fd; 
                    border-color: #3b82f6; 
                    color: #1e3a8a;
                    animation: pulse 2s cubic-bezier(0.4, 0, 0.6, 1) infinite;
                }
                .status-success { background: #a7f3d0; border-color: #10b981; color: #065f46; }
                .status-failed { background: #fecaca; border-color: #ef4444; color: #991b1b; }
                .status-skipped { background: #e5e7eb; border-color: #9ca3af; border-style: dashed; color: #6b7280; }
                
                @keyframes pulse {
                    50% { opacity: .5; }
                }
                
                .node-label { font-size: 14px; margin-bottom: 4px; }
                .node-status { font-size: 12px; text-transform: capitalize; opacity: 0.8; }
            </style>
            <div class="graph-container">
                <div class="nodes-grid" id="nodes-container"></div>
            </div>
        `;
        
        const container = el.querySelector('#nodes-container');
        
        function updateGraph() {
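            const nodes = model.get('nodes') || {};
            const nodeIds = ['A', 'B', 'C', 'D', 'E'];
            
            container.innerHTML = '';
            
            // The layout below is hardcoded for nodes A-E; render nothing until all are present
            if (!nodeIds.every((id) => id in nodes)) return;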
            
            // Layer 1: A
            const nodeA = createNode(nodes['A'], 'A');
            container.appendChild(nodeA);
            
            // Layer 2: B and C in a vertical container
            const layer2 = document.createElement('div');
            layer2.style.display = 'flex';
            layer2.style.flexDirection = 'column';
            layer2.style.gap = '20px';
            layer2.appendChild(createNode(nodes['B'], 'B'));
            layer2.appendChild(createNode(nodes['C'], 'C'));
            container.appendChild(layer2);
            
            // Layer 3: D
            const nodeD = createNode(nodes['D'], 'D');
            container.appendChild(nodeD);
            
            // Layer 4: E
            const nodeE = createNode(nodes['E'], 'E');
            container.appendChild(nodeE);
        }
        
        function createNode(nodeData, id) {
            const node = document.createElement('div');
            node.className = `node status-${nodeData.status}`;
            node.innerHTML = `
                <div class="node-label">${id}: ${nodeData.label}</div>
                <div class="node-status">${nodeData.status}</div>
            `;
            return node;
        }
        
        updateGraph();
        model.on('change:nodes', updateGraph);
    }
    export default { render };
    """

    nodes = traitlets.Dict({}).tag(sync=True)

## 2. Progress Bar View Widget

Shows overall progress with detailed metrics and node-by-node breakdown.

In [12]:
class ProgressBarWidget(anywidget.AnyWidget):
    _esm = """
    function render({ model, el }) {
        el.innerHTML = `
            <style>
                .progress-container {
                    padding: 30px;
                    background: #f9fafb;
                    border-radius: 8px;
                    border: 1px solid #e5e7eb;
                }
                .progress-bar-wrapper {
                    margin-bottom: 30px;
                }
                .progress-label {
                    display: flex;
                    justify-content: space-between;
                    margin-bottom: 8px;
                    font-weight: 600;
                    color: #1e40af;
                }
                .progress-bar-bg {
                    width: 100%;
                    height: 16px;
                    background: #e5e7eb;
                    border-radius: 9999px;
                    overflow: hidden;
                }
                .progress-bar-fill {
                    height: 100%;
                    background: #2563eb;
                    transition: width 0.3s ease;
                    border-radius: 9999px;
                }
                .metrics-grid {
                    display: grid;
                    grid-template-columns: repeat(3, 1fr);
                    gap: 16px;
                    margin-bottom: 30px;
                }
                .metric-card {
                    background: white;
                    padding: 20px;
                    border-radius: 8px;
                    box-shadow: 0 1px 3px rgba(0,0,0,0.1);
                    text-align: center;
                }
                .metric-value {
                    font-size: 28px;
                    font-weight: bold;
                    color: #1f2937;
                }
                .metric-label {
                    font-size: 14px;
                    color: #6b7280;
                    margin-top: 4px;
                }
                .node-breakdown {
                    border-top: 1px solid #e5e7eb;
                    padding-top: 20px;
                }
                .breakdown-title {
                    font-size: 18px;
                    font-weight: 600;
                    color: #374151;
                    margin-bottom: 16px;
                }
                .node-item {
                    margin-bottom: 16px;
                }
                .node-header {
                    display: flex;
                    justify-content: space-between;
                    margin-bottom: 8px;
                    font-size: 14px;
                }
                .node-name {
                    font-weight: 600;
                    color: #1f2937;
                }
                .node-stats {
                    color: #6b7280;
                    font-size: 12px;
                }
                .node-bar-bg {
                    height: 12px;
                    background: #e5e7eb;
                    border-radius: 6px;
                    border: 1px solid #d1d5db;
                    overflow: hidden;
                    display: flex;
                }
                .bar-success { background: #10b981; }
                .bar-failed { background: #ef4444; }
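                .bar-running { background: #3b82f6; animation: pulse 2s infinite; }
                .bar-skipped { background: #9ca3af; }
                .bar-queued { background: #d1d5db; }
                
                /* Defined here so the animation does not depend on another widget's styles */
                @keyframes pulse {
                    50% { opacity: .5; }
                }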
            </style>
            <div class="progress-container">
                <div class="progress-bar-wrapper">
                    <div class="progress-label">
                        <span>Overall Progress</span>
                        <span id="progress-percent">0%</span>
                    </div>
                    <div class="progress-bar-bg">
                        <div class="progress-bar-fill" id="progress-fill" style="width: 0%"></div>
                    </div>
                </div>
                
                <div class="metrics-grid">
                    <div class="metric-card">
                        <div class="metric-value" id="tasks-value">0 / 0</div>
                        <div class="metric-label">Tasks Completed</div>
                    </div>
                    <div class="metric-card">
                        <div class="metric-value" id="running-value" style="color: #2563eb;">0</div>
                        <div class="metric-label">Active Tasks</div>
                    </div>
                    <div class="metric-card">
                        <div class="metric-value" id="failed-value" style="color: #dc2626;">0</div>
                        <div class="metric-label">Failed Tasks</div>
                    </div>
                </div>
                
                <div class="node-breakdown">
                    <div class="breakdown-title">Node-by-Node Breakdown</div>
                    <div id="node-items"></div>
                </div>
            </div>
        `;
        
        function updateProgress() {
            const stats = model.get('stats') || {};
            const nodes = model.get('nodes') || {};
            
            // Update overall progress. Elements are looked up within this view's
            // root (el) so that several displayed views don't collide on ids.
            const percent = Math.floor(stats.percent || 0);
            el.querySelector('#progress-percent').textContent = percent + '%';
            el.querySelector('#progress-fill').style.width = percent + '%';
            
            // Update metrics
            el.querySelector('#tasks-value').textContent = 
                `${stats.completed || 0} / ${stats.total || 0}`;
            el.querySelector('#running-value').textContent = stats.running || 0;
            el.querySelector('#failed-value').textContent = stats.failed || 0;
            
            // Update node breakdown
            const container = el.querySelector('#node-items');
            container.innerHTML = '';
            
            for (const [id, nodeData] of Object.entries(nodes)) {
                const item = document.createElement('div');
                item.className = 'node-item';
                
                const statusCounts = {
                    success: nodeData.success || 0,
                    failed: nodeData.failed || 0,
                    running: nodeData.running || 0,
                    skipped: nodeData.skipped || 0,
                    queued: nodeData.queued || 0
                };
                
                const total = Object.values(statusCounts).reduce((a, b) => a + b, 0);
                
                item.innerHTML = `
                    <div class="node-header">
                        <span class="node-name">${id}: ${nodeData.label}</span>
                        <span class="node-stats">
                            ${statusCounts.success}/${total} Done
                            ${statusCounts.failed > 0 ? ` | <span style="color: #dc2626; font-weight: bold;">${statusCounts.failed} Failed</span>` : ''}
                        </span>
                    </div>
                    <div class="node-bar-bg">
                        ${createBar('success', statusCounts.success, total)}
                        ${createBar('failed', statusCounts.failed, total)}
                        ${createBar('running', statusCounts.running, total)}
                        ${createBar('skipped', statusCounts.skipped, total)}
                        ${createBar('queued', statusCounts.queued, total)}
                    </div>
                `;
                
                container.appendChild(item);
            }
        }
        
        function createBar(status, count, total) {
            if (count === 0) return '';
            const width = (count / total) * 100;
            return `<div class="bar-${status}" style="width: ${width}%; height: 100%;" 
                         title="${status}: ${count}"></div>`;
        }
        
        updateProgress();
        model.on('change:stats', updateProgress);
        model.on('change:nodes', updateProgress);
    }
    export default { render };
    """

    stats = traitlets.Dict({}).tag(sync=True)
    nodes = traitlets.Dict({}).tag(sync=True)

## 3. Pipeline Simulator

This class simulates a pipeline execution with configurable behavior.

In [13]:
class PipelineSimulator:
    """Simulates pipeline execution with DAG dependencies"""

    def __init__(self):
        self.dag = {
            "A": {"deps": [], "duration": 1.0, "label": "Load Data"},
            "B": {"deps": ["A"], "duration": 1.5, "label": "Cleanse Data"},
            "C": {"deps": ["A"], "duration": 2.0, "label": "Featurize User"},
            "D": {"deps": ["B", "C"], "duration": 1.0, "label": "Join Features"},
            "E": {"deps": ["D"], "duration": 0.5, "label": "Generate Report"},
        }
        self.node_ids = list(self.dag.keys())

    def create_empty_run_state(self):
        """Create initial state for a single run"""
        nodes = {}
        for node_id in self.node_ids:
            nodes[node_id] = {
                "status": "queued",
                "start_time": None,
                "end_time": None,
                "label": self.dag[node_id]["label"],
            }
        return {
            "nodes": nodes,
            "completed_count": 0,
            "running_count": 0,
            "failed_count": 0,
            "is_done": False,
        }

    def can_run(self, node_id: str, run_state: Dict) -> bool:
        """Check if a node's dependencies are satisfied"""
        deps = self.dag[node_id]["deps"]
        return all(run_state["nodes"][dep_id]["status"] == "success" for dep_id in deps)

    def skip_downstream(self, failed_node_id: str, run_state: Dict):
        """Skip all downstream nodes after a failure"""
        to_skip = set()
        queue = [failed_node_id]

        while queue:
            current = queue.pop(0)
            for node_id in self.node_ids:
                if current in self.dag[node_id]["deps"]:
                    to_skip.add(node_id)
                    queue.append(node_id)

        for node_id in to_skip:
            node = run_state["nodes"][node_id]
            if node["status"] == "queued":
                node["status"] = "skipped"
                run_state["completed_count"] += 1

    async def simulate_single_run(self, graph_widget=None, trigger_failure=False):
        """Simulate a single pipeline run with real-time updates"""
        run_state = self.create_empty_run_state()
        start_time = time.time()

        while not run_state["is_done"]:
            current_time = time.time()
            has_progress = False

            for node_id in self.node_ids:
                node = run_state["nodes"][node_id]

                # Skip finished nodes
                if node["status"] in ["success", "failed", "skipped"]:
                    continue

                # Handle running nodes
                if node["status"] == "running":
                    duration = self.dag[node_id]["duration"]
                    if current_time >= node["start_time"] + duration:
                        # Task finishes
                        if (
                            trigger_failure
                            and run_state["failed_count"] == 0
                            and random.random() < 0.3
                        ):
                            node["status"] = "failed"
                            run_state["failed_count"] += 1
                            self.skip_downstream(node_id, run_state)
                        else:
                            node["status"] = "success"

                        node["end_time"] = current_time
                        run_state["running_count"] -= 1
                        run_state["completed_count"] += 1
                        has_progress = True

                # Check for queued nodes ready to run
                elif node["status"] == "queued" and self.can_run(node_id, run_state):
                    node["status"] = "running"
                    node["start_time"] = current_time
                    run_state["running_count"] += 1
                    has_progress = True

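            # Update widget if provided. Assign a fresh copy each time: traitlets
            # skips the change notification when the new value compares equal to
            # the old one, which is always the case when the same dict is mutated
            # in place and then reassigned.
            if graph_widget is not None:
                graph_widget.nodes = {
                    node_id: dict(node) for node_id, node in run_state["nodes"].items()
                }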

            # Check if done
            all_finished = all(
                node["status"] in ["success", "failed", "skipped"]
                for node in run_state["nodes"].values()
            )

            if all_finished:
                run_state["is_done"] = True

            # Small delay for visualization
            await asyncio.sleep(0.1)

        return run_state

## Demo: Graph View

Let's visualize a single pipeline run with the graph view.

In [14]:
import asyncio

# Create simulator and widget
simulator = PipelineSimulator()
graph_widget = GraphViewWidget()

# Initialize with empty state
initial_state = simulator.create_empty_run_state()
graph_widget.nodes = initial_state["nodes"]

# Display widget
graph_widget

GraphViewWidget(nodes={'A': {'status': 'queued', 'start_time': None, 'end_time': None, 'label': 'Load Data'}, …

In [15]:
# Run the simulation (watch the widget above update in real-time!)
await simulator.simulate_single_run(graph_widget, trigger_failure=False)

{'nodes': {'A': {'status': 'success',
   'start_time': 1761128071.724747,
   'end_time': 1761128072.737689,
   'label': 'Load Data'},
  'B': {'status': 'success',
   'start_time': 1761128072.737689,
   'end_time': 1761128074.256681,
   'label': 'Cleanse Data'},
  'C': {'status': 'success',
   'start_time': 1761128072.737689,
   'end_time': 1761128074.763059,
   'label': 'Featurize User'},
  'D': {'status': 'success',
   'start_time': 1761128074.763059,
   'end_time': 1761128075.784725,
   'label': 'Join Features'},
  'E': {'status': 'success',
   'start_time': 1761128075.784725,
   'end_time': 1761128076.290553,
   'label': 'Generate Report'}},
 'completed_count': 5,
 'running_count': 0,
 'failed_count': 0,
 'is_done': True}

## Demo: Progress Bar View

Now let's visualize multiple runs with aggregate statistics.

In [16]:
class MultiRunSimulator:
    """Simulate multiple pipeline runs and aggregate results"""

    def __init__(self, num_runs=10):
        self.simulator = PipelineSimulator()
        self.num_runs = num_runs
        self.runs = []

    def aggregate_stats(self):
        """Calculate aggregate statistics across all runs"""
        total_nodes = self.num_runs * len(self.simulator.node_ids)
        completed = sum(run["completed_count"] for run in self.runs)
        running = sum(run["running_count"] for run in self.runs)
        failed = sum(run["failed_count"] for run in self.runs)

        return {
            "total": total_nodes,
            "completed": completed,
            "running": running,
            "failed": failed,
            "percent": (completed / total_nodes * 100) if total_nodes > 0 else 0,
        }

    def aggregate_nodes(self):
        """Calculate per-node statistics across all runs"""
        node_stats = {}

        for node_id in self.simulator.node_ids:
            counts = {
                "success": 0,
                "failed": 0,
                "running": 0,
                "skipped": 0,
                "queued": 0,
            }

            for run in self.runs:
                status = run["nodes"][node_id]["status"]
                counts[status] += 1

            node_stats[node_id] = {
                "label": self.simulator.dag[node_id]["label"],
                **counts,
            }

        return node_stats

    async def simulate_batch(self, progress_widget, trigger_failure=False):
        """Simulate multiple runs concurrently by interleaving them in one polling loop"""
        # Initialize runs
        self.runs = [
            self.simulator.create_empty_run_state() for _ in range(self.num_runs)
        ]

        # Update widget with initial state
        progress_widget.stats = self.aggregate_stats()
        progress_widget.nodes = self.aggregate_nodes()

        all_done = False
        start_time = time.time()

        while not all_done:
            current_time = time.time()

            # Process each run
            for run_idx, run_state in enumerate(self.runs):
                if run_state["is_done"]:
                    continue

                # Process nodes in this run
                for node_id in self.simulator.node_ids:
                    node = run_state["nodes"][node_id]

                    if node["status"] in ["success", "failed", "skipped"]:
                        continue

                    # Handle running nodes
                    if node["status"] == "running":
                        duration = self.simulator.dag[node_id]["duration"]
                        if current_time >= node["start_time"] + duration:
                            # Task finishes
                            if (
                                trigger_failure
                                and run_state["failed_count"] == 0
                                and random.random() < 0.2
                            ):
                                node["status"] = "failed"
                                run_state["failed_count"] += 1
                                self.simulator.skip_downstream(node_id, run_state)
                            else:
                                node["status"] = "success"

                            node["end_time"] = current_time
                            run_state["running_count"] -= 1
                            run_state["completed_count"] += 1

                    # Check for queued nodes ready to run
                    elif node["status"] == "queued" and self.simulator.can_run(
                        node_id, run_state
                    ):
                        node["status"] = "running"
                        node["start_time"] = current_time
                        run_state["running_count"] += 1

                # Check if run is done
                if all(
                    node["status"] in ["success", "failed", "skipped"]
                    for node in run_state["nodes"].values()
                ):
                    run_state["is_done"] = True

            # Update widget
            progress_widget.stats = self.aggregate_stats()
            progress_widget.nodes = self.aggregate_nodes()

            # Check if all runs are done
            all_done = all(run["is_done"] for run in self.runs)

            await asyncio.sleep(0.1)

        return self.runs

In [17]:
# Create multi-run simulator and progress widget
multi_sim = MultiRunSimulator(num_runs=10)
progress_widget = ProgressBarWidget()

# Initialize
progress_widget.stats = {
    "total": 50,
    "completed": 0,
    "running": 0,
    "failed": 0,
    "percent": 0,
}
progress_widget.nodes = {
    node_id: {
        "label": multi_sim.simulator.dag[node_id]["label"],
        "success": 0,
        "failed": 0,
        "running": 0,
        "skipped": 0,
        "queued": 10,
    }
    for node_id in multi_sim.simulator.node_ids
}

progress_widget

ProgressBarWidget(nodes={'A': {'label': 'Load Data', 'success': 0, 'failed': 0, 'running': 0, 'skipped': 0, 'q…

In [18]:
# Run 10 pipelines in parallel! Watch the progress bar and node breakdowns update
await multi_sim.simulate_batch(progress_widget, trigger_failure=True)

[{'nodes': {'A': {'status': 'success',
    'start_time': 1761128092.501773,
    'end_time': 1761128093.5134661,
    'label': 'Load Data'},
   'B': {'status': 'success',
    'start_time': 1761128093.5134661,
    'end_time': 1761128095.04279,
    'label': 'Cleanse Data'},
   'C': {'status': 'success',
    'start_time': 1761128093.5134661,
    'end_time': 1761128095.551743,
    'label': 'Featurize User'},
   'D': {'status': 'success',
    'start_time': 1761128095.551743,
    'end_time': 1761128096.562281,
    'label': 'Join Features'},
   'E': {'status': 'success',
    'start_time': 1761128096.562281,
    'end_time': 1761128097.074081,
    'label': 'Generate Report'}},
  'completed_count': 5,
  'running_count': 0,
  'failed_count': 0,
  'is_done': True},
 {'nodes': {'A': {'status': 'success',
    'start_time': 1761128092.501773,
    'end_time': 1761128093.5134661,
    'label': 'Load Data'},
   'B': {'status': 'success',
    'start_time': 1761128093.5134661,
    'end_time': 1761128095.0427

## Summary

We've created **interactive pipeline visualizers** using anywidget! Here's what we built:

### 1. Graph View Widget ✓
- Real-time DAG visualization
- Color-coded node status (queued, running, success, failed, skipped)
- Shows dependency structure
- Updates smoothly as pipeline executes

### 2. Progress Bar Widget ✓
- Overall progress indicator
- Key metrics (completed, running, failed)
- **Node-by-node breakdown** with visual bars
- Supports aggregate multi-run statistics

### Key Features:
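- ✓ **Python → JavaScript state sync** using traitlets tagged `sync=True` (the same mechanism also supports JavaScript → Python, though these widgets don't use it)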
- ✓ **Real-time updates** as pipeline state changes
- ✓ **Smooth animations** (running nodes pulse)
- ✓ **Responsive design** with modern UI
- ✓ **Single & batch mode** support

### How It Works:
1. **Python side**: Manages state using traitlets (synced dictionaries)
2. **JavaScript side**: Renders HTML/CSS and listens for state changes
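3. **Updates**: When Python assigns a new value to `widget.nodes` (or `widget.stats`), the matching `change:nodes` / `change:stats` handler fires and JavaScript re-renders. Mutating an already-assigned dict in place does not trigger an update.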
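
The difference between assigning a new value and mutating in place can be seen without any front end. The sketch below uses a plain `traitlets.HasTraits` class as a stand-in for a widget; `State` and `snapshot` are hypothetical names introduced for illustration, not part of the widgets above. The first observer should record one change event and the second two.

```python
import traitlets


class State(traitlets.HasTraits):
    """Stand-in for a widget: a single Dict trait and no front end."""

    nodes = traitlets.Dict()


def snapshot(nodes):
    """Hypothetical helper: copy the outer dict and each per-node dict."""
    return {node_id: dict(node) for node_id, node in nodes.items()}


# Case 1: mutate one dict in place and reassign the same object.
in_place_events = []
in_place = State()
in_place.observe(lambda change: in_place_events.append(change["new"]), names="nodes")

shared = {"A": {"status": "queued"}}
in_place.nodes = shared            # differs from the default {}, so an event fires
shared["A"]["status"] = "running"
in_place.nodes = shared            # old value == new value, so no event fires

# Case 2: assign a fresh copy after each mutation.
copied_events = []
copied = State()
copied.observe(lambda change: copied_events.append(change["new"]), names="nodes")

state = {"A": {"status": "queued"}}
copied.nodes = snapshot(state)     # event fires
state["A"]["status"] = "running"
copied.nodes = snapshot(state)     # differs from the previous copy, so an event fires

print(len(in_place_events), len(copied_events))
```

Copying on every update costs a little for large states, but it keeps change detection predictable.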

### Advantages of anywidget:
- No build step required (inline ESM)
- Works in Jupyter, VS Code, Colab
- Clean separation of concerns
- Easy to iterate and debug

## Bonus: Run Again with Different Settings

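You can easily re-run the simulations with different parameters. With `trigger_failure=True`, each finishing task in a single run fails with probability 0.3 until the first failure occurs, so some runs (like the one below) still succeed end to end.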

In [None]:
# Create a fresh graph widget for a run with failures enabled
graph_widget2 = GraphViewWidget()
graph_widget2.nodes = simulator.create_empty_run_state()["nodes"]
graph_widget2

GraphViewWidget(nodes={'A': {'status': 'queued', 'start_time': None, 'end_time': None, 'label': 'Load Data'}, …

In [20]:
# Run with failure mode enabled
await simulator.simulate_single_run(graph_widget2, trigger_failure=True)

{'nodes': {'A': {'status': 'success',
   'start_time': 1761128129.806762,
   'end_time': 1761128130.81708,
   'label': 'Load Data'},
  'B': {'status': 'success',
   'start_time': 1761128130.81708,
   'end_time': 1761128132.33359,
   'label': 'Cleanse Data'},
  'C': {'status': 'success',
   'start_time': 1761128130.81708,
   'end_time': 1761128132.840123,
   'label': 'Featurize User'},
  'D': {'status': 'success',
   'start_time': 1761128132.840123,
   'end_time': 1761128133.852437,
   'label': 'Join Features'},
  'E': {'status': 'success',
   'start_time': 1761128133.852437,
   'end_time': 1761128134.3595462,
   'label': 'Generate Report'}},
 'completed_count': 5,
 'running_count': 0,
 'failed_count': 0,
 'is_done': True}

## Integration Guide

Want to use these widgets with your own pipeline? Here's how:

### For Graph View:
```python
# Create the widget
graph_widget = GraphViewWidget()

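# Update it with your pipeline state.
# As written, the widget's layout is hardcoded to five nodes keyed 'A' through 'E',
# so all five keys must be present.
graph_widget.nodes = {
    'A': {
        'status': 'running',  # or 'queued', 'success', 'failed', 'skipped'
        'label': 'Node Description'
    },
    # ... nodes 'B' through 'E' in the same shape
}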
```
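
Because the graph view hardcodes its four layers, adapting it to another pipeline means deciding which nodes share a column. One possible approach, sketched below, is to place each node one layer after its deepest dependency. The `layer_nodes` function is a hypothetical helper, not part of the widget; the resulting layers could be passed to the front end as an extra synced trait.

```python
from typing import Dict, List


def layer_nodes(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group node ids into layers; a node sits one layer after its deepest dependency."""
    depth: Dict[str, int] = {}

    def resolve(node_id: str, path: tuple = ()) -> int:
        if node_id in path:
            raise ValueError(f"cycle detected at {node_id!r}")
        if node_id not in depth:
            # Roots have no parents, so max() falls back to -1 and their depth is 0.
            depth[node_id] = 1 + max(
                (resolve(parent, path + (node_id,)) for parent in deps[node_id]),
                default=-1,
            )
        return depth[node_id]

    for node_id in deps:
        resolve(node_id)

    layers: List[List[str]] = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for node_id in deps:  # dict order keeps siblings in declaration order
        layers[depth[node_id]].append(node_id)
    return layers


# Same dependency structure as the simulated pipeline in this notebook.
pipeline_deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
layers = layer_nodes(pipeline_deps)
print(layers)
```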

### For Progress Bar View:
```python
# Create the widget
progress_widget = ProgressBarWidget()

# Update overall stats
progress_widget.stats = {
    'total': 50,
    'completed': 25,
    'running': 5,
    'failed': 2,
    'percent': 50.0
}

# Update per-node stats
progress_widget.nodes = {
    'A': {
        'label': 'Load Data',
        'success': 8,
        'failed': 1,
        'running': 1,
        'skipped': 0,
        'queued': 0
    },
    # ... more nodes
}
```
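
If your pipeline tracks a status per node per run, both dicts can be derived from that data. The sketch below assumes each run is a plain `{node_id: status}` mapping; `summarize_runs` is a hypothetical helper, and it follows this notebook's convention that failed and skipped nodes count as completed.

```python
from collections import Counter
from typing import Dict, List, Tuple

STATUSES = ("success", "failed", "running", "skipped", "queued")
FINISHED = ("success", "failed", "skipped")


def summarize_runs(
    runs: List[Dict[str, str]], labels: Dict[str, str]
) -> Tuple[dict, dict]:
    """Build the `stats` and `nodes` dicts from per-run node statuses."""
    nodes = {}
    for node_id, label in labels.items():
        counts = Counter(run[node_id] for run in runs)
        nodes[node_id] = {"label": label, **{s: counts.get(s, 0) for s in STATUSES}}

    total = len(runs) * len(labels)
    completed = sum(node[s] for node in nodes.values() for s in FINISHED)
    stats = {
        "total": total,
        "completed": completed,
        "running": sum(node["running"] for node in nodes.values()),
        "failed": sum(node["failed"] for node in nodes.values()),
        "percent": completed / total * 100 if total else 0.0,
    }
    return stats, nodes


labels = {"A": "Load Data", "B": "Cleanse Data"}
runs = [
    {"A": "success", "B": "running"},
    {"A": "failed", "B": "skipped"},
]
stats, nodes = summarize_runs(runs, labels)
print(stats)
```

The two return values can then be assigned to `progress_widget.stats` and `progress_widget.nodes`.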

### Tips:
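- Update widget state in your pipeline execution loop, assigning a new dict each time (traitlets ignores in-place mutation of an already-assigned dict)
- Use `await asyncio.sleep(0.1)` between updates so the event loop can send them to the front end
- Can be used alongside `tqdm` or custom progress tracking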
- Can track single runs or aggregate multi-run statistics
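
As a rough illustration of the first two tips, the sketch below runs a linear chain of async steps (a simplification of the DAG used in this notebook) and publishes a fresh copy of the state after every change. `RecordingWidget` is a hypothetical stand-in that only records assignments; in practice you would pass a `GraphViewWidget` whose node keys match its layout. `run_steps`, `ok`, and `boom` are likewise illustrative names.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], Awaitable[None]]]


class RecordingWidget:
    """Hypothetical stand-in for a widget: records each value assigned to `nodes`."""

    def __init__(self) -> None:
        self.history: List[Dict[str, dict]] = []

    @property
    def nodes(self) -> Dict[str, dict]:
        return self.history[-1] if self.history else {}

    @nodes.setter
    def nodes(self, value: Dict[str, dict]) -> None:
        self.history.append(value)


async def run_steps(widget, steps: List[Step]) -> Dict[str, dict]:
    """Run steps in order, publishing state copies; skip everything after a failure."""
    state = {name: {"status": "queued", "label": name} for name, _ in steps}

    def publish() -> None:
        # Assign a new dict so the change is detected and sent.
        widget.nodes = {name: dict(node) for name, node in state.items()}

    publish()
    failed = False
    for name, work in steps:
        if failed:
            state[name]["status"] = "skipped"
            continue
        state[name]["status"] = "running"
        publish()
        try:
            await work()
            state[name]["status"] = "success"
        except Exception:
            state[name]["status"] = "failed"
            failed = True
        publish()
        await asyncio.sleep(0)  # yield to the event loop between updates
    publish()
    return state


async def ok() -> None:
    await asyncio.sleep(0.01)


async def boom() -> None:
    raise RuntimeError("simulated failure")


widget = RecordingWidget()
# In a notebook cell, use `final = await run_steps(...)` instead of asyncio.run.
final = asyncio.run(run_steps(widget, [("load", ok), ("clean", boom), ("report", ok)]))
print({name: node["status"] for name, node in final.items()})
```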