# Sankey Diagram Showcase

**Note:** Seaborn has no native Sankey diagram support. The examples in this notebook use **Plotly** (interactive); **matplotlib** offers a static alternative through its built-in `matplotlib.sankey` module.

## What are Sankey Diagrams?
Sankey diagrams show flows between categories as links whose width is proportional to the flow quantity. They are well suited to visualizing energy flows, customer journeys, and process workflows.


In [5]:
# Import required libraries
import pandas as pd
import numpy as np

try:
    import plotly
    import plotly.graph_objects as go
    import plotly.express as px
    # Fall back to "unknown" if this Plotly build does not expose __version__
    plotly_version = getattr(plotly, "__version__", "unknown")
    plotly_available = True
except ModuleNotFoundError:
    plotly_available = False
    plotly_version = None
    print("Warning: Plotly is not installed. Plotly-based Sankey diagrams will be unavailable.")

import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

# Configure display settings
plt.rcParams['figure.figsize'] = (14, 8)
sns.set_theme(style="whitegrid")

print("Libraries loaded successfully!")
if plotly_available:
    print(f"Plotly version: {plotly_version}")
else:
    print("Plotly is not available.")
print(f"Pandas version: {pd.__version__}")


Libraries loaded successfully!
Plotly version: 6.3.1
Pandas version: 2.3.3


## Example 1: Customer Journey Sankey Diagram

Track how customers move through different stages of a sales funnel.


In [6]:
# Create customer journey data
customer_journey = {
    'source': [
        'Website Visit', 'Website Visit', 'Website Visit',
        'Product View', 'Product View', 'Product View',
        'Add to Cart', 'Add to Cart',
        'Checkout', 'Checkout'
    ],
    'target': [
        'Product View', 'Leave Site', 'Support',
        'Add to Cart', 'Leave Site', 'Support',
        'Checkout', 'Abandon',
        'Purchase', 'Abandon'
    ],
    'value': [450, 150, 100, 380, 70, 50, 320, 60, 280, 40]
}

df_journey = pd.DataFrame(customer_journey)
print("Customer Journey Data:")
print(df_journey)

# Create mapping for nodes
all_nodes = pd.concat([df_journey['source'], df_journey['target']]).unique()
node_dict = {node: idx for idx, node in enumerate(all_nodes)}

# Create source and target indices
source_indices = df_journey['source'].map(node_dict).tolist()
target_indices = df_journey['target'].map(node_dict).tolist()

# Node color palette, cycled when there are more nodes than colors
colors = ['#1A6BFF', '#88A943', '#A80C7C', '#DBAE06', '#008C4A', '#36A5CC']
node_colors = [colors[i % len(colors)] for i in range(len(all_nodes))]

# Create Sankey diagram
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color='black', width=0.5),
        label=all_nodes,
        color=node_colors
    ),
    link=dict(
        source=source_indices,
        target=target_indices,
        value=df_journey['value'].tolist(),
        # Blue links for flows under 200, green links for flows of 200 or more
        color=['rgba(26, 107, 255, 0.4)' if x < 200 else 'rgba(136, 169, 67, 0.4)' for x in df_journey['value']]
    )
)])

fig.update_layout(
    title='Customer Journey Funnel - Sankey Diagram',
    font=dict(size=12, family='Arial, sans-serif'),
    height=600,
    width=1000,
    plot_bgcolor='white'
)

fig.show()


Customer Journey Data:
          source        target  value
0  Website Visit  Product View    450
1  Website Visit    Leave Site    150
2  Website Visit       Support    100
3   Product View   Add to Cart    380
4   Product View    Leave Site     70
5   Product View       Support     50
6    Add to Cart      Checkout    320
7    Add to Cart       Abandon     60
8       Checkout      Purchase    280
9       Checkout       Abandon     40
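
The node-indexing steps in the cell above recur in every example in this notebook, so they could be factored into a small helper. The sketch below is one possible way to do that in plain Python; `build_sankey_inputs` is a hypothetical name, not part of Plotly or pandas. It numbers labels in order of first appearance, scanning all sources before all targets, which should match the ordering produced by `pd.concat(...).unique()` above.

```python
def build_sankey_inputs(sources, targets, values):
    """Return (labels, source_idx, target_idx, values) for a Sankey trace.

    Hypothetical helper: labels are ordered by first appearance,
    scanning all sources first and then all targets.
    """
    if not (len(sources) == len(targets) == len(values)):
        raise ValueError("sources, targets and values must have equal length")

    # dict.fromkeys keeps insertion order and drops duplicates
    labels = list(dict.fromkeys(list(sources) + list(targets)))
    index = {label: i for i, label in enumerate(labels)}

    source_idx = [index[s] for s in sources]
    target_idx = [index[t] for t in targets]
    return labels, source_idx, target_idx, list(values)


# First three rows of the customer journey data
labels, src, tgt, vals = build_sankey_inputs(
    ['Website Visit', 'Website Visit', 'Product View'],
    ['Product View', 'Leave Site', 'Add to Cart'],
    [450, 150, 380],
)
print(labels)
print(src, tgt)
```

The returned lists can be passed to `go.Sankey` as `node=dict(label=labels)` and `link=dict(source=src, target=tgt, value=vals)`.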


## Example 2: Energy Flow Sankey Diagram

Visualize how energy flows from primary sources, through the electricity grid and heat, to end-use sectors.


In [7]:
# Create energy flow data
energy_data = {
    'source': [
        'Solar', 'Wind', 'Coal', 'Natural Gas',
        'Renewable', 'Renewable', 'Fossil Fuels', 'Fossil Fuels',
        'Electricity Grid', 'Electricity Grid', 'Electricity Grid'
    ],
    'target': [
        'Renewable', 'Renewable', 'Fossil Fuels', 'Fossil Fuels',
        'Electricity Grid', 'Heat', 'Electricity Grid', 'Heat',
        'Industrial', 'Commercial', 'Residential'
    ],
    'value': [150, 120, 200, 180, 270, 40, 380, 20, 400, 180, 150]
}

df_energy = pd.DataFrame(energy_data)

# Create mapping for nodes
all_nodes_energy = pd.concat([df_energy['source'], df_energy['target']]).unique()
node_dict_energy = {node: idx for idx, node in enumerate(all_nodes_energy)}

# Create indices
source_indices_e = df_energy['source'].map(node_dict_energy).tolist()
target_indices_e = df_energy['target'].map(node_dict_energy).tolist()

# Color mapping for energy types
energy_colors = {
    'Solar': '#DBAE06',
    'Wind': '#1A6BFF',
    'Coal': '#666666',
    'Natural Gas': '#88A943',
    'Renewable': '#008C4A',
    'Fossil Fuels': '#A80C7C',
    'Electricity Grid': '#36A5CC',
    'Heat': '#F2510A',
    'Industrial': '#1A6BFF',
    'Commercial': '#88A943',
    'Residential': '#008C4A'
}

node_colors_energy = [energy_colors.get(node, '#cccccc') for node in all_nodes_energy]

# Create energy Sankey
fig2 = go.Figure(data=[go.Sankey(
    node=dict(
        pad=20,
        thickness=25,
        line=dict(color='darkgray', width=1),
        label=all_nodes_energy,
        color=node_colors_energy
    ),
    link=dict(
        source=source_indices_e,
        target=target_indices_e,
        value=df_energy['value'].tolist(),
        # One translucent gold for every link, matching the Solar node color (#DBAE06)
        color=['rgba(219, 174, 6, 0.3)'] * len(df_energy)
    )
)])

fig2.update_layout(
    title='Energy Flow Distribution - Sankey Diagram',
    font=dict(size=12, family='Arial, sans-serif'),
    height=700,
    width=1000,
    plot_bgcolor='white'
)

fig2.show()
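
Plotly draws whatever link values it is given and does not require the flow into a node to equal the flow out of it. A quick balance check can therefore be useful before plotting. The sketch below is a minimal plain-Python version; `flow_imbalances` and its `tol` parameter are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def flow_imbalances(sources, targets, values, tol=1e-9):
    """Map each intermediate node to (inflow, outflow) when the two differ.

    Hypothetical helper. Nodes that only send or only receive flow
    (pure sources and sinks) are skipped.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for s, t, v in zip(sources, targets, values):
        outflow[s] += v
        inflow[t] += v

    # Only nodes with both incoming and outgoing links can be unbalanced
    intermediate = set(inflow) & set(outflow)
    return {
        node: (inflow[node], outflow[node])
        for node in sorted(intermediate)
        if abs(inflow[node] - outflow[node]) > tol
    }


# Same flows as the energy example above
sources = ['Solar', 'Wind', 'Coal', 'Natural Gas',
           'Renewable', 'Renewable', 'Fossil Fuels', 'Fossil Fuels',
           'Electricity Grid', 'Electricity Grid', 'Electricity Grid']
targets = ['Renewable', 'Renewable', 'Fossil Fuels', 'Fossil Fuels',
           'Electricity Grid', 'Heat', 'Electricity Grid', 'Heat',
           'Industrial', 'Commercial', 'Residential']
values = [150, 120, 200, 180, 270, 40, 380, 20, 400, 180, 150]

imbalances = flow_imbalances(sources, targets, values)
for node, (total_in, total_out) in imbalances.items():
    print(f"{node}: in={total_in:g}, out={total_out:g}")
```

For the energy data this reports three nodes: Electricity Grid (650 in, 730 out), Fossil Fuels (380 in, 400 out), and Renewable (270 in, 310 out).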


## Example 3: Product Category Flow Sankey

Show how product counts split from category to price tier, and from price tier to stock status.


In [8]:
# Create product category data
product_data = {
    'source': [
        'Electronics', 'Electronics', 'Electronics',
        'Clothing', 'Clothing', 'Clothing',
        'Home & Garden', 'Home & Garden', 'Home & Garden',
        'Budget', 'Budget', 'Mid-Range', 'Mid-Range', 'Premium', 'Premium'
    ],
    'target': [
        'Budget', 'Mid-Range', 'Premium',
        'Budget', 'Mid-Range', 'Premium',
        'Budget', 'Mid-Range', 'Premium',
        'In Stock', 'Out of Stock',
        'In Stock', 'Out of Stock',
        'In Stock', 'Discontinued'
    ],
    'value': [45, 85, 70, 120, 95, 35, 60, 75, 30, 100, 25, 150, 30, 130, 20]
}

df_products = pd.DataFrame(product_data)

# Create mapping for nodes
all_nodes_prod = pd.concat([df_products['source'], df_products['target']]).unique()
node_dict_prod = {node: idx for idx, node in enumerate(all_nodes_prod)}

# Create indices
source_indices_p = df_products['source'].map(node_dict_prod).tolist()
target_indices_p = df_products['target'].map(node_dict_prod).tolist()

# Color mapping for product tiers
product_colors = {
    'Electronics': '#1A6BFF',
    'Clothing': '#88A943',
    'Home & Garden': '#008C4A',
    'Budget': '#FFC21A',
    'Mid-Range': '#F2510A',
    'Premium': '#A80C7C',
    'In Stock': '#43a047',
    'Out of Stock': '#f4511e',
    'Discontinued': '#666666'
}

node_colors_prod = [product_colors.get(node, '#cccccc') for node in all_nodes_prod]

# Create product Sankey
fig3 = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color='gray', width=0.5),
        label=all_nodes_prod,
        color=node_colors_prod
    ),
    link=dict(
        source=source_indices_p,
        target=target_indices_p,
        value=df_products['value'].tolist(),
        hovertemplate='%{source.label} → %{target.label}<br>Quantity: %{value}<extra></extra>'
    )
)])

fig3.update_layout(
    title='Product Category & Price Tier Distribution',
    font=dict(size=11, family='Arial, sans-serif'),
    height=600,
    width=1000,
    plot_bgcolor='white'
)

fig3.show()
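
The links in this example fall back to Plotly's default color because `link.color` is not set. One option is to tint each link with a translucent version of its source node's color. The sketch below converts a `#RRGGBB` string into an `rgba(...)` string; `hex_to_rgba` is a hypothetical helper, and the default alpha of 0.4 is an arbitrary choice.

```python
def hex_to_rgba(hex_color, alpha=0.4):
    """Convert '#RRGGBB' to an 'rgba(r, g, b, a)' string."""
    h = hex_color.lstrip('#')
    if len(h) != 6:
        raise ValueError(f"expected '#RRGGBB', got {hex_color!r}")
    # Parse each pair of hex digits as one 0-255 channel
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


# Subset of the product color mapping used above
product_colors = {
    'Electronics': '#1A6BFF',
    'Clothing': '#88A943',
    'Home & Garden': '#008C4A',
}
link_sources = ['Electronics', 'Clothing', 'Home & Garden']
link_colors = [hex_to_rgba(product_colors[s]) for s in link_sources]
print(link_colors)
```

In the notebook, the same comprehension over `df_products['source']` would give one color per link, which could then be passed as `color=link_colors` inside `link=dict(...)`.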


## Key Takeaways for Sankey Diagrams

✅ **Best Use Cases:**
- Process flows and workflows
- Customer journeys
- Energy or resource distribution
- Data migrations or transformations
- Supply chain tracking

✅ **Libraries Compared:**
- **Plotly** (used here): Interactive, with hover details and draggable nodes
- **Matplotlib (via the built-in `matplotlib.sankey` module)**: Static, lightweight
- **Altair**: Declarative grammar-of-graphics API, but no built-in Sankey chart type
- **HoloViews**: Great for linked visualizations

✅ **Design Tips:**
- Use distinct colors for different node types
- Order nodes logically (left to right flow)
- Limit the number of nodes for clarity (typically < 20)
- Make link thickness proportional to flow values
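
To keep the node count down, one approach is to fold small flows into a single catch-all node before plotting. The sketch below is a minimal plain-Python version; `collapse_small_flows`, `min_value`, and the `'Other'` label are hypothetical names. It assumes the retargeted nodes are terminal, since redirecting a flow away from a node that has outgoing links would leave those links without matching input.

```python
def collapse_small_flows(flows, min_value, other_label='Other'):
    """Redirect flows smaller than min_value to a shared catch-all target.

    `flows` is an iterable of (source, target, value) tuples. Flows that
    end up with the same (source, target) pair are summed.
    """
    merged = {}
    for source, target, value in flows:
        if value < min_value:
            target = other_label
        key = (source, target)
        merged[key] = merged.get(key, 0) + value
    # Dicts keep insertion order, so the output follows the input order
    return [(s, t, v) for (s, t), v in merged.items()]


# First three rows of the customer journey data
flows = [
    ('Website Visit', 'Product View', 450),
    ('Website Visit', 'Leave Site', 150),
    ('Website Visit', 'Support', 100),
]
collapsed = collapse_small_flows(flows, min_value=200)
print(collapsed)
```

With a threshold of 200, the `Leave Site` and `Support` flows merge into one `Other` link of 250, reducing four nodes to three.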
