# Quick Start - Interactive Sankey Library

Get started with interactive Sankey diagrams in just a few lines of code!

## 🔄 Important: Restart Kernel

If you see errors about unexpected keyword arguments (like `animation_duration`), please **restart the kernel** to reload the latest library changes:
- Click **Kernel → Restart** or use the restart button in the toolbar
- Then run all cells from the beginning
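
Before restarting, it can help to confirm whether the loaded `render` accepts the keyword in question. The sketch below uses only the standard `inspect` module. `accepts_keyword` and `render_stub` are hypothetical names for illustration; with the library loaded you would pass `SankeyDiagram.render` instead of the stub.

```python
import inspect

def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# Hypothetical stand-in for an outdated render() without animation_duration
def render_stub(source, target, value, title=None, show_timeline=False):
    pass

print(accepts_keyword(render_stub, 'title'))
print(accepts_keyword(render_stub, 'animation_duration'))
```

If the check returns `False` for a keyword the current source defines, the kernel is most likely still holding an older version of the module.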

In [1]:
# Setup
import sys
sys.path.insert(0, '../src')

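# Reload the package if it was already imported. This does not reload its
# submodules, so restart the kernel if changes still do not appear.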
import importlib
if 'sankey_interactive' in sys.modules:
    importlib.reload(sys.modules['sankey_interactive'])

import pandas as pd
import numpy as np
from sankey_interactive import SankeyDiagram

## Example 1: Simple Flow Diagram

In [2]:
# Create simple data with multiple stages
data = pd.DataFrame({
    'source': ['Raw Materials', 'Raw Materials', 'Raw Materials', 
               'Processing A', 'Processing A', 'Processing B', 
               'Assembly', 'Assembly', 'QA Check'],
    'target': ['Processing A', 'Processing B', 'Processing C',
               'Assembly', 'Quality Check', 'Assembly',
               'QA Check', 'Packaging', 'Shipping'],
    'value': [45, 30, 25, 40, 5, 28, 35, 33, 35]
})

# Create and display
diagram = SankeyDiagram(data)
diagram.render('source', 'target', 'value', 
               title='Manufacturing Process Flow').show()
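
A quick way to sanity-check flow data before rendering is to compare each node's total inflow and outflow. The sketch below does this with plain Python on the same triples as the cell above. `node_balance` is a hypothetical helper, not part of the library; the three-column `data` frame could be passed as `list(data.itertuples(index=False))`.

```python
from collections import defaultdict

def node_balance(flows):
    """Return {node: (inflow, outflow)} for (source, target, value) triples."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in flows:
        outflow[source] += value
        inflow[target] += value
    # Keep nodes in order of first appearance
    nodes = list(dict.fromkeys(n for s, t, _ in flows for n in (s, t)))
    return {n: (inflow[n], outflow[n]) for n in nodes}

flows = [
    ('Raw Materials', 'Processing A', 45),
    ('Raw Materials', 'Processing B', 30),
    ('Raw Materials', 'Processing C', 25),
    ('Processing A', 'Assembly', 40),
    ('Processing A', 'Quality Check', 5),
    ('Processing B', 'Assembly', 28),
    ('Assembly', 'QA Check', 35),
    ('Assembly', 'Packaging', 33),
    ('QA Check', 'Shipping', 35),
]

balance = node_balance(flows)
for node, (total_in, total_out) in balance.items():
    print(f"{node:<14} in={total_in:>5.0f}  out={total_out:>5.0f}")
```

In this data, `Processing C`, `Quality Check`, `Packaging`, and `Shipping` have no outflow, so they render as end nodes. `Processing B` receives 30 and passes on 28.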

## Example 2: Time Series with Animation

In [3]:
# Create comprehensive multi-stage energy system data
np.random.seed(42)
dates = pd.date_range('2024-01', periods=6, freq='ME')

# Define a 5-stage energy system
sources = ['Solar', 'Wind', 'Hydro', 'Coal', 'Nuclear']  # Stage 1: Generation
generation_types = ['Peak', 'Base Load', 'Variable']      # Stage 2: Generation Type
storage = ['Battery', 'Pumped Hydro', 'Direct Grid']      # Stage 3: Storage
distribution = ['Urban Grid', 'Rural Grid', 'Industrial'] # Stage 4: Distribution
consumers = ['Residential', 'Commercial', 'Industrial', 'Transport']  # Stage 5: End Use

data = []
for date in dates:
    month = date.month
    # Seasonal factors
    solar_factor = 1 + 0.4 * np.sin(2 * np.pi * month / 12)
    wind_factor = 1 + 0.3 * np.cos(2 * np.pi * month / 12)
    
    # Stage 1 → Stage 2: Sources to Generation Types
    for source in sources:
        for gen_type in generation_types:
            if source == 'Solar':
                base = 80 * solar_factor if gen_type == 'Variable' else 20
            elif source == 'Wind':
                base = 70 * wind_factor if gen_type == 'Variable' else 15
            elif source in ['Coal', 'Nuclear']:
                base = 90 if gen_type == 'Base Load' else 30
            else:  # Hydro
                base = 60 if gen_type == 'Peak' else 40
            
            value = base * np.random.uniform(0.8, 1.2)
            if value > 10:  # Only add significant flows
                data.append({
                    'date': date, 'source': source, 'target': gen_type,
                    'value': value, 'stage': '1-2'
                })
    
    # Stage 2 → Stage 3: Generation Types to Storage
    for gen_type in generation_types:
        for store in storage:
            if gen_type == 'Variable' and store == 'Battery':
                base = 60
            elif gen_type == 'Base Load' and store == 'Direct Grid':
                base = 100
            elif gen_type == 'Peak' and store == 'Pumped Hydro':
                base = 50
            else:
                base = 30
            
            value = base * np.random.uniform(0.7, 1.3)
            data.append({
                'date': date, 'source': gen_type, 'target': store,
                'value': value, 'stage': '2-3'
            })
    
    # Stage 3 → Stage 4: Storage to Distribution
    for store in storage:
        for dist in distribution:
            base = np.random.uniform(40, 90)
            data.append({
                'date': date, 'source': store, 'target': dist,
                'value': base, 'stage': '3-4'
            })
    
    # Stage 4 → Stage 5: Distribution to Consumers
    for dist in distribution:
        for consumer in consumers:
            if dist == 'Industrial' and consumer == 'Industrial':
                base = 80
            elif dist == 'Urban Grid' and consumer in ['Residential', 'Commercial']:
                base = 70
            elif dist == 'Rural Grid' and consumer == 'Residential':
                base = 50
            else:
                base = 30
            
            value = base * np.random.uniform(0.8, 1.2)
            if value > 20:  # Only significant flows
                # 'Industrial' names both a distribution node and a consumer;
                # relabel the consumer so these flows do not loop back to stage 4
                target = 'Industrial Use' if consumer == 'Industrial' else consumer
                data.append({
                    'date': date, 'source': dist, 'target': target,
                    'value': value, 'stage': '4-5'
                })

df = pd.DataFrame(data)

# Create animated diagram with smooth transitions
diagram = SankeyDiagram(df, time_column='date')
diagram.render('source', 'target', 'value', 
               title="5-Stage Energy System: Source → Generation Type → Storage → Distribution → End Use",
               show_timeline=True,
               animation_duration=1200,  # Slower, smoother transitions (in ms)
               transition_easing='cubic-in-out').show()  # Smooth easing

print(f"✅ Created {len(df)} flow records across 5 stages")

✅ Created 270 flow records across 5 stages
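
The seasonal factors and the record count from the cell above can be reproduced by hand. This sketch uses only the standard `math` module. `solar_factor` and `wind_factor` are written here as small functions for illustration; the cell computes the same expressions inline.

```python
import math

def solar_factor(month):
    return 1 + 0.4 * math.sin(2 * math.pi * month / 12)

def wind_factor(month):
    return 1 + 0.3 * math.cos(2 * math.pi * month / 12)

months = range(1, 7)  # January through June 2024, the six month-end dates

for month in months:
    print(f"month {month}: solar={solar_factor(month):.2f}, wind={wind_factor(month):.2f}")

# Candidate flows per month, one per pair of adjacent-stage nodes:
# 5 sources x 3 types + 3 types x 3 storage + 3 storage x 3 grids + 3 grids x 4 consumers
per_month = 5 * 3 + 3 * 3 + 3 * 3 + 3 * 4

# The smallest possible values are 15 * 0.8 = 12 (stage 1-2, threshold 10)
# and 30 * 0.8 = 24 (stage 4-5, threshold 20), so no row is ever dropped.
total_records = per_month * len(months)
print(total_records)
```

Solar peaks in March (factor 1.4) and wind is lowest in June (factor 0.7) over this window. The total matches the 270 records reported above.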


### Animation Speed Options

You can control the transition speed and smoothness:
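- `animation_duration`: Transition duration in milliseconds (300 = fast, 800 = normal, 1500 = slow); larger values mean slower transitions
- `transition_easing`: Easing style that shapes how a transition accelerates and decelerates (`'linear'`, `'cubic-in-out'`, `'elastic'`, etc.)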
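
To see what an easing style means in practice, the sketch below compares a linear ramp with the commonly used cubic ease-in-out curve. These are the standard textbook formulas, not necessarily what the library runs internally. The function names are hypothetical.

```python
def linear(t):
    """Constant-speed progress for t in [0, 1]."""
    return t

def cubic_in_out(t):
    """Standard cubic ease-in-out: slow start, fast middle, slow end."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - (-2 * t + 2) ** 3 / 2

def interpolate(start, end, t, easing):
    """Value shown at progress t when animating from start to end."""
    return start + (end - start) * easing(t)

# A link width animating from 40 to 80, sampled at five points in time
for step in range(5):
    t = step / 4
    print(t, interpolate(40, 80, t, linear), interpolate(40, 80, t, cubic_in_out))
```

Both curves start and end at the same values. The cubic curve moves less near the ends and more in the middle, which is why it tends to look smoother at longer durations.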

In [4]:
# Example: Fast animation
diagram_fast = SankeyDiagram(df, time_column='date')
fig_fast = diagram_fast.render('source', 'target', 'value', 
                                title="⚡ Fast Animation (300ms)",
                                show_timeline=True,
                                animation_duration=300,  # Fast!
                                transition_easing='linear')
fig_fast.show()

# Example: Very smooth, slow animation
diagram_slow = SankeyDiagram(df, time_column='date')
fig_slow = diagram_slow.render('source', 'target', 'value', 
                                title="🐢 Smooth & Slow Animation (1500ms)",
                                show_timeline=True,
                                animation_duration=1500,  # Slow and smooth
                                transition_easing='cubic-in-out')
fig_slow.show()

## Example 3: With Filters and Histogram Coloring

In [5]:
# Add filters
diagram = SankeyDiagram(df, time_column='date')
diagram.add_filter('high_value', lambda d: d[d['value'] > 80])

# Render with histogram coloring and smooth animation
diagram.render('source', 'target', 'value',
               title="High-Value Flows with Color Gradient",
               show_histogram=True,
               show_timeline=True,
               animation_duration=1000,  # Smooth 1-second transitions
               transition_easing='cubic-in-out').show()
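
Value-based coloring generally means mapping each flow's value onto a color scale. The sketch below shows one simple way to do that, a linear blend between two RGB colors. It is an illustrative assumption, not the library's implementation; `value_to_color` and the two default colors are hypothetical.

```python
def value_to_color(value, vmin, vmax, low=(198, 219, 239), high=(8, 81, 156)):
    """Blend `low` and `high` by where `value` falls within [vmin, vmax]."""
    if vmax == vmin:
        frac = 0.0
    else:
        # Clamp so out-of-range values map to the end colors
        frac = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    rgb = tuple(round(a + (b - a) * frac) for a, b in zip(low, high))
    return f"rgb({rgb[0]}, {rgb[1]}, {rgb[2]})"

# Flows above the filter threshold of 80, colored on an 80-130 scale
for value in (80, 95, 110, 130):
    print(value, value_to_color(value, 80, 130))
```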

## That's it! 🎉

You now have interactive Sankey diagrams with:
- ⏱️ Timeline animation with adjustable speed
- 🌈 Value-based coloring
- 🔍 Dynamic filters
- 🎮 Interactive controls
- ✨ Smooth transitions with customizable easing
- 📏 **Horizontal node spacing based on custom metrics** (e.g., wait time, delay)

### Animation Speed Tips:
- **Fast (300ms)**: Quick transitions, good for presentations
- **Normal (800ms)**: Balanced speed and smoothness (default)
- **Slow (1200-1500ms)**: Very smooth, easy to follow changes

### Node Spacing Tips:
- Use `set_node_spacing_metric()` to control **horizontal spacing** between node stages
- Perfect for visualizing temporal metrics like wait times, processing delays, or duration
- Higher metric values = more horizontal distance to the next stage
- Great for showing bottlenecks or time-intensive processes
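
The idea that a higher metric value yields more horizontal distance can be sketched as a cumulative sum scaled to the plot width. How the library actually converts metric values into positions is not shown here, so treat this as one plausible approach. `stage_positions` is a hypothetical helper.

```python
def stage_positions(gaps):
    """Turn the gap after each stage into x positions scaled to [0, 1]."""
    total = sum(gaps)
    if total <= 0:
        raise ValueError("gaps must sum to a positive number")
    positions = [0.0]
    running = 0.0
    for gap in gaps:
        running += gap
        positions.append(running / total)
    return positions

# Three gaps give four stage positions; the largest gap takes the most width
print(stage_positions([1, 1, 2]))
```

With gaps of 1, 1, and 2, the last transition occupies half of the horizontal range.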

Check out `interactive_demos.ipynb` for more advanced examples!

## Example 4: Horizontal Node Spacing Based on Metrics

Control the **horizontal distance** between node stages based on custom metrics like "average wait time" or "processing delay". Stages with longer wait times will have more horizontal space!

In [6]:
# Create sample data with wait times at each stage
np.random.seed(42)
process_data = pd.DataFrame({
    'source': ['Intake', 'Intake', 'Triage', 'Triage', 'Treatment', 'Treatment'],
    'target': ['Triage', 'Registration', 'Treatment', 'Consultation', 'Recovery', 'Discharge'],
    'value': [100, 50, 80, 20, 60, 40],
    'wait_time': [2.5, 1.0, 15.3, 8.2, 45.0, 5.5]  # Average wait time in minutes
})

# Define a spacing metric based on wait time
# This controls HORIZONTAL distance between stages
def wait_time_spacing(df):
    """Map each target node name to its mean wait time."""
    if 'wait_time' not in df.columns:
        return {}
    # groupby gives a true mean when a node appears in several rows;
    # repeatedly halving a running value would weight later rows more heavily
    return df.groupby('target')['wait_time'].mean().to_dict()

# Create diagram with horizontal node spacing based on wait time
diagram = SankeyDiagram(process_data)
diagram.set_node_spacing_metric(wait_time_spacing)

# Render with custom spacing
diagram.render('source', 'target', 'value',
               title="Hospital Process Flow - Horizontal Spacing by Wait Time",
               show_timeline=False).show()

# Show the wait times
print("\n📊 Wait Times (affects horizontal spacing between stages):")
for _, row in process_data.iterrows():
    print(f"   {row['target']}: {row['wait_time']} minutes")
print("\nℹ️  Stages with longer wait times have more horizontal distance!")
print("   (e.g., 'Recovery' with 45 min wait → larger gap before next stage)")


📊 Wait Times (affects horizontal spacing between stages):
   Triage: 2.5 minutes
   Registration: 1.0 minutes
   Treatment: 15.3 minutes
   Consultation: 8.2 minutes
   Recovery: 45.0 minutes
   Discharge: 5.5 minutes

ℹ️  Stages with longer wait times have more horizontal distance!
   (e.g., 'Recovery' with 45 min wait → larger gap before next stage)


### Visual Impact

The horizontal spacing feature makes it easy to spot:
- ⏰ **Bottlenecks**: Stages with long wait times are visually stretched out
- ⚡ **Fast processes**: Short delays appear compact
- 🎯 **Critical paths**: Identify where time is being spent in your process

Perfect for process optimization, patient flow analysis, manufacturing throughput, and more!