# Week 6: Advanced Visualization & Dashboards

## Overview
Master creating production-ready dashboards and visualizations for large-scale marketing data using Plotly, Dash, and Redshift.

## Learning Objectives
- Create interactive dashboards with large datasets
- Implement efficient aggregation before visualization
- Build real-time monitoring dashboards
- Optimize visualization performance
- Design executive-level dashboard templates
- Deploy production dashboards

## Prerequisites
- Redshift cluster access
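- Large marketing dataset loaded into Redshift (the examples assume a `marketing_events` table with `timestamp`, `user_id`, `channel`, `revenue`, and `converted` columns)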
- Understanding of data aggregation
- Basic web development knowledge (helpful)

## Table of Contents
1. [Setup and Environment](#setup)
2. [Visualization Best Practices for Large Data](#best-practices)
3. [Aggregation Strategies](#aggregation)
4. [Interactive Visualizations with Plotly](#plotly)
5. [Real-Time Monitoring Dashboards](#realtime)
6. [Redshift → Pandas → Visualization Pipeline](#pipeline)
7. [Performance Optimization](#performance)
8. [Executive Dashboard Templates](#templates)
9. [Real-World Project: Live Marketing Dashboard](#project)
10. [Exercises](#exercises)

## 1. Setup and Environment <a name="setup"></a>

In [None]:
# Install required packages
!pip install -q plotly dash pandas numpy redshift_connector
!pip install -q dash-bootstrap-components kaleido
!pip install -q sqlalchemy psycopg2-binary
# Dash 2.11+ supports notebooks natively; jupyter-dash is only needed for older Dash versions
!pip install -q jupyter-dash

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import dash
from dash import dcc, html, Input, Output, State
import dash_bootstrap_components as dbc
from jupyter_dash import JupyterDash
import redshift_connector
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Configure Plotly
import plotly.io as pio
pio.templates.default = "plotly_white"

print("✓ Libraries imported successfully")

### Redshift Connection Setup

In [None]:
import os
from getpass import getpass

# Database configuration
REDSHIFT_CONFIG = {
    'host': os.getenv('REDSHIFT_HOST', 'your-cluster.redshift.amazonaws.com'),
    'port': int(os.getenv('REDSHIFT_PORT', '5439')),
    'database': os.getenv('REDSHIFT_DB', 'marketing_db'),
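    # Prompt only when the environment variable is missing. Passing input()/getpass()
    # as the getenv default would prompt every time, because defaults are evaluated eagerly.
    'user': os.getenv('REDSHIFT_USER') or input('Redshift username: '),
    'password': os.getenv('REDSHIFT_PASSWORD') or getpass('Redshift password: ')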
}

class DataLoader:
    """Efficient data loading for visualizations"""
    
    def __init__(self, config):
        self.config = config
        self.conn = None
    
    def connect(self):
        """Establish database connection"""
        self.conn = redshift_connector.connect(**self.config)
        return self.conn
    
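    def query(self, sql):
        """Execute query and return DataFrame (empty for statements with no result set)"""
        if not self.conn:
            self.connect()
        
        cursor = self.conn.cursor()
        try:
            cursor.execute(sql)
            
            # DDL statements (CREATE, DROP, ...) produce no result set
            if cursor.description is None:
                self.conn.commit()
                return pd.DataFrame()
            
            result = cursor.fetchall()
            columns = [desc[0] for desc in cursor.description]
        finally:
            cursor.close()
        
        return pd.DataFrame(result, columns=columns)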
    
    def load_for_viz(self, table, agg_cols, metrics, filters=None, 
                     limit=10000, order_by=None):
        """
        Load pre-aggregated data optimized for visualization
        
        Args:
            table: Source table name
            agg_cols: Columns to group by
            metrics: Dict of {alias: aggregation}
            filters: WHERE clause conditions
            limit: Maximum rows to return
            order_by: Order clause
        """
        # Build aggregations
        agg_exprs = [f"{expr} as {alias}" for alias, expr in metrics.items()]
        
        # Build query
        query = f"""
        SELECT 
            {', '.join(agg_cols)},
            {', '.join(agg_exprs)}
        FROM {table}
        """
        
        if filters:
            query += f" WHERE {filters}"
        
        query += f" GROUP BY {', '.join(agg_cols)}"
        
        if order_by:
            query += f" ORDER BY {order_by}"
        
        if limit:
            query += f" LIMIT {limit}"
        
        return self.query(query)
    
    def close(self):
        if self.conn:
            self.conn.close()

# Initialize data loader
loader = DataLoader(REDSHIFT_CONFIG)
print("✓ Data loader initialized")
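
`load_for_viz` interpolates table and column names straight into the SQL string, so any caller-supplied name can inject arbitrary SQL. The sketch below shows one possible guard, assuming identifiers are plain `table` or `schema.table` names. `validate_identifier` and `build_group_by_query` are hypothetical helpers introduced for illustration; they are not part of `redshift_connector`.

```python
import re

# Hypothetical helper: accepts names such as "events" or "analytics.events".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def validate_identifier(name):
    """Return name unchanged if it looks like a plain SQL identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unsafe SQL identifier: {name!r}")
    return name


def build_group_by_query(table, agg_cols, limit=10000):
    """Build a COUNT(*) GROUP BY query from validated identifiers only."""
    table = validate_identifier(table)
    cols = ", ".join(validate_identifier(c) for c in agg_cols)
    return (
        f"SELECT {cols}, COUNT(*) AS events "
        f"FROM {table} GROUP BY {cols} LIMIT {int(limit)}"
    )


print(build_group_by_query("marketing_events", ["channel", "campaign"]))
```

Filter values such as dates or channel names are better passed as bound query parameters than formatted into the string; the check above only covers identifiers.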

## 2. Visualization Best Practices for Large Data <a name="best-practices"></a>

### Key Principles for Large-Scale Visualizations

1. **Aggregate Before Visualizing**
   - Never load raw data for visualization
   - Pre-aggregate in database (Redshift)
   - Keep viz data < 10,000 points

2. **Choose Appropriate Chart Types**
   - Time series: Line charts with aggregation
   - Categories: Bar charts (limit to top N)
   - Distributions: Histograms with binning
   - Relationships: Scatter plots with sampling

3. **Optimize Rendering**
   - Use WebGL for large scatter plots
   - Implement progressive loading
   - Cache aggregated data

4. **Ensure Interactivity**
   - Drill-down capabilities
   - Dynamic filtering
   - Responsive design
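
The "limit to top N" guideline for category charts can be sketched as follows. This is a minimal pandas example, assuming the data has already been reduced to one row per event or per category; `top_n_with_other` is a hypothetical helper name. It keeps the N largest categories and rolls the remainder into a single "Other" bar so the chart total still matches the data.

```python
import pandas as pd


def top_n_with_other(df, category_col, value_col, n=5):
    """Keep the n largest categories and roll the rest into 'Other'."""
    totals = df.groupby(category_col)[value_col].sum().sort_values(ascending=False)
    top = totals.head(n)
    if len(totals) > n:
        # Everything past the first n categories becomes one bucket
        remainder = totals.iloc[n:].sum()
        top = pd.concat([top, pd.Series({"Other": remainder})])
    return top.rename_axis(category_col).reset_index(name=value_col)


sample = pd.DataFrame({
    "channel": ["a", "a", "b", "c", "d"],
    "revenue": [6, 4, 7, 2, 1],
})
print(top_n_with_other(sample, "channel", "revenue", n=2))
```

For very large tables the same rollup is better done in Redshift, with only the N+1 result rows transferred to pandas.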

In [None]:
def demonstrate_aggregation_need():
    """
    Demonstrate why aggregation is critical for large datasets
    """
    
    # Simulate large dataset
    np.random.seed(42)
    n_points = 1_000_000
    
    dates = pd.date_range('2023-01-01', periods=365, freq='D')
    df_raw = pd.DataFrame({
        'date': np.random.choice(dates, n_points),
        'revenue': np.random.lognormal(3, 1, n_points),
        'channel': np.random.choice(['A', 'B', 'C'], n_points)
    })
    
    # BAD: Try to plot 1M points (slow, unusable)
    # fig = px.scatter(df_raw, x='date', y='revenue')  # DON'T DO THIS!
    
    # GOOD: Aggregate first
    df_agg = df_raw.groupby('date').agg({
        'revenue': ['sum', 'mean', 'count']
    }).reset_index()
    df_agg.columns = ['date', 'total_revenue', 'avg_revenue', 'transactions']
    
    # Now plot aggregated data (fast, clear)
    fig = make_subplots(
        rows=2, cols=1,
        subplot_titles=('Daily Total Revenue', 'Daily Avg Revenue')
    )
    
    fig.add_trace(
        go.Scatter(x=df_agg['date'], y=df_agg['total_revenue'], 
                   mode='lines', name='Total Revenue'),
        row=1, col=1
    )
    
    fig.add_trace(
        go.Scatter(x=df_agg['date'], y=df_agg['avg_revenue'], 
                   mode='lines', name='Avg Revenue'),
        row=2, col=1
    )
    
    fig.update_layout(height=600, showlegend=False)
    fig.show()
    
    print(f"Raw data: {len(df_raw):,} points")
    print(f"Aggregated data: {len(df_agg):,} points")
    print(f"Reduction: {(1 - len(df_agg)/len(df_raw))*100:.1f}%")

# demonstrate_aggregation_need()

## 3. Aggregation Strategies <a name="aggregation"></a>

In [None]:
class AggregationStrategy:
    """
    Smart aggregation strategies for different visualization types
    """
    
    @staticmethod
    def time_series_daily(loader, table, date_col, metric_col, filters=None):
        """
        Aggregate time series data by day
        """
        query = f"""
        SELECT 
            DATE({date_col}) as date,
            COUNT(*) as events,
            SUM({metric_col}) as total,
            AVG({metric_col}) as average,
            PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY {metric_col}) as median,
            STDDEV({metric_col}) as std_dev
        FROM {table}
        """
        
        if filters:
            query += f" WHERE {filters}"
        
        query += f"""
        GROUP BY DATE({date_col})
        ORDER BY date
        """
        
        return loader.query(query)
    
    @staticmethod
    def time_series_hourly(loader, table, timestamp_col, metric_col, 
                           last_n_days=7, filters=None):
        """
        Aggregate time series data by hour (for recent data)
        """
        query = f"""
        SELECT 
            DATE_TRUNC('hour', {timestamp_col}) as hour,
            COUNT(*) as events,
            SUM({metric_col}) as total,
            AVG({metric_col}) as average
        FROM {table}
        WHERE {timestamp_col} >= CURRENT_DATE - {last_n_days}
        """
        
        if filters:
            query += f" AND {filters}"
        
        query += f"""
        GROUP BY DATE_TRUNC('hour', {timestamp_col})
        ORDER BY hour
        """
        
        return loader.query(query)
    
    @staticmethod
    def top_n_categories(loader, table, category_col, metric_col, 
                        n=10, filters=None):
        """
        Get top N categories by metric
        """
        query = f"""
        SELECT 
            {category_col},
            COUNT(*) as count,
            SUM({metric_col}) as total,
            AVG({metric_col}) as average,
            SUM({metric_col}) * 100.0 / SUM(SUM({metric_col})) OVER () as percentage
        FROM {table}
        """
        
        if filters:
            query += f" WHERE {filters}"
        
        query += f"""
        GROUP BY {category_col}
        ORDER BY total DESC
        LIMIT {n}
        """
        
        return loader.query(query)
    
    @staticmethod
    def distribution_bins(loader, table, metric_col, bins=30, filters=None):
        """
        Create histogram bins for distribution visualization
        """
        query = f"""
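        WITH bounds AS (
            SELECT 
                MIN({metric_col}) as min_val,
                MAX({metric_col}) as max_val,
                -- * 1.0 avoids integer division when the metric is an integer column
                (MAX({metric_col}) - MIN({metric_col})) * 1.0 / {bins} as bin_width
            FROM {table}
            WHERE {metric_col} IS NOT NULL
            {f'AND {filters}' if filters else ''}
        ),
        binned AS (
            SELECT 
                -- LEAST clamps the maximum value into the last bin (otherwise it gets
                -- an extra bin of its own); NULLIF avoids dividing by a zero bin width
                LEAST(FLOOR(({metric_col} - b.min_val) / NULLIF(b.bin_width, 0)), {bins} - 1) as bin_num,
                b.min_val + LEAST(FLOOR(({metric_col} - b.min_val) / NULLIF(b.bin_width, 0)), {bins} - 1) * b.bin_width as bin_start
            FROM {table}, bounds b
            WHERE {metric_col} IS NOT NULL
            {f'AND {filters}' if filters else ''}
        )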
        SELECT 
            bin_start,
            COUNT(*) as frequency
        FROM binned
        GROUP BY bin_start
        ORDER BY bin_start
        """
        
        return loader.query(query)
    
    @staticmethod
    def multi_dimensional(loader, table, dimensions, metrics, filters=None, limit=1000):
        """
        Multi-dimensional aggregation
        
        Args:
            dimensions: List of dimension columns
            metrics: Dict of {name: aggregation_expression}
        """
        metric_exprs = [f"{expr} as {name}" for name, expr in metrics.items()]
        
        query = f"""
        SELECT 
            {', '.join(dimensions)},
            {', '.join(metric_exprs)}
        FROM {table}
        """
        
        if filters:
            query += f" WHERE {filters}"
        
        query += f"""
        GROUP BY {', '.join(dimensions)}
        ORDER BY {list(metrics.keys())[0]} DESC
        LIMIT {limit}
        """
        
        return loader.query(query)

# Example usage
# agg = AggregationStrategy()
# daily_data = agg.time_series_daily(loader, 'marketing_events', 'timestamp', 'revenue')
# top_channels = agg.top_n_categories(loader, 'marketing_events', 'channel', 'revenue', n=10)
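
Choosing between `time_series_hourly` and `time_series_daily` by hand does not scale to arbitrary date ranges. One possible approach, sketched below, is to pick the finest `DATE_TRUNC` unit that keeps the series under a point budget. `choose_granularity` and the 500-point default are assumptions for illustration, and a month is approximated as 30 days.

```python
from datetime import datetime, timedelta

# Candidate bucket sizes, finest first. Month is approximated as 30 days.
GRANULARITIES = [
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
    ("week", timedelta(weeks=1)),
    ("month", timedelta(days=30)),
]


def choose_granularity(start, end, max_points=500):
    """Return the finest DATE_TRUNC unit that keeps the series under max_points."""
    span = end - start
    for unit, width in GRANULARITIES:
        # timedelta / timedelta yields the (float) number of buckets
        if span / width <= max_points:
            return unit
    return GRANULARITIES[-1][0]


unit = choose_granularity(datetime(2024, 1, 1), datetime(2024, 3, 31))
print(f"DATE_TRUNC('{unit}', timestamp)")
```

The returned unit can be interpolated into the `DATE_TRUNC` expression of an aggregation query, so a one-week view is bucketed by hour and a multi-year view by week.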

## 4. Interactive Visualizations with Plotly <a name="plotly"></a>

In [None]:
class MarketingVisualizations:
    """
    Library of marketing-specific visualizations
    All optimized for large datasets
    """
    
    @staticmethod
    def revenue_trend(df, date_col='date', revenue_col='total_revenue'):
        """
        Revenue trend with moving average
        """
        # Calculate moving average
        df = df.copy()
        df['ma_7'] = df[revenue_col].rolling(window=7, min_periods=1).mean()
        df['ma_30'] = df[revenue_col].rolling(window=30, min_periods=1).mean()
        
        fig = go.Figure()
        
        # Daily revenue
        fig.add_trace(go.Scatter(
            x=df[date_col],
            y=df[revenue_col],
            mode='lines',
            name='Daily Revenue',
            line=dict(color='lightgray', width=1),
            opacity=0.5
        ))
        
        # 7-day MA
        fig.add_trace(go.Scatter(
            x=df[date_col],
            y=df['ma_7'],
            mode='lines',
            name='7-Day MA',
            line=dict(color='blue', width=2)
        ))
        
        # 30-day MA
        fig.add_trace(go.Scatter(
            x=df[date_col],
            y=df['ma_30'],
            mode='lines',
            name='30-Day MA',
            line=dict(color='red', width=2)
        ))
        
        fig.update_layout(
            title='Revenue Trend Analysis',
            xaxis_title='Date',
            yaxis_title='Revenue ($)',
            hovermode='x unified',
            height=500
        )
        
        return fig
    
    @staticmethod
    def channel_performance_comparison(df, channel_col='channel', 
                                       metrics=('total', 'average', 'count')):
        """
        Multi-metric channel comparison
        """
        n_metrics = len(metrics)
        
        fig = make_subplots(
            rows=1, cols=n_metrics,
            subplot_titles=[m.replace('_', ' ').title() for m in metrics]
        )
        
        for i, metric in enumerate(metrics, 1):
            fig.add_trace(
                go.Bar(
                    x=df[channel_col],
                    y=df[metric],
                    name=metric,
                    showlegend=False
                ),
                row=1, col=i
            )
        
        fig.update_layout(
            title='Channel Performance Comparison',
            height=400,
            showlegend=False
        )
        
        return fig
    
    @staticmethod
    def conversion_funnel(stages_df, stage_col='stage', count_col='count'):
        """
        Conversion funnel visualization
        """
        fig = go.Figure(go.Funnel(
            y=stages_df[stage_col],
            x=stages_df[count_col],
            textinfo="value+percent initial",
            textposition="inside"
        ))
        
        fig.update_layout(
            title='Conversion Funnel',
            height=500
        )
        
        return fig
    
    @staticmethod
    def cohort_retention_heatmap(cohort_df, cohort_col='cohort', 
                                 period_col='period', retention_col='retention'):
        """
        Cohort retention heatmap
        """
        # Pivot data for heatmap
        pivot_df = cohort_df.pivot(
            index=cohort_col,
            columns=period_col,
            values=retention_col
        )
        
        fig = go.Figure(data=go.Heatmap(
            z=pivot_df.values,
            x=pivot_df.columns,
            y=pivot_df.index,
            colorscale='Blues',
            text=pivot_df.values,
            texttemplate='%{text:.1f}%',
            textfont={"size": 10},
            colorbar=dict(title="Retention %")
        ))
        
        fig.update_layout(
            title='Cohort Retention Analysis',
            xaxis_title='Period',
            yaxis_title='Cohort',
            height=600
        )
        
        return fig
    
    @staticmethod
    def kpi_cards(metrics_dict):
        """
        Create KPI indicator cards
        
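        Args:
            metrics_dict: {title: {'value': X, 'delta': Y, 'prefix': '$'}}
                'delta' is the reference (e.g. previous-period) value; the card
                shows the relative change of 'value' against it. 'prefix' is optional.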
        """
        n_metrics = len(metrics_dict)
        cols = min(4, n_metrics)
        rows = (n_metrics + cols - 1) // cols
        
        fig = make_subplots(
            rows=rows, cols=cols,
            specs=[[{'type': 'indicator'}] * cols for _ in range(rows)],
            subplot_titles=list(metrics_dict.keys())
        )
        
        for i, (title, data) in enumerate(metrics_dict.items(), 1):
            row = (i - 1) // cols + 1
            col = (i - 1) % cols + 1
            
            fig.add_trace(
                go.Indicator(
                    mode="number+delta",
                    value=data['value'],
                    delta={'reference': data.get('delta', data['value']),
                           'relative': True},
                    number={'prefix': data.get('prefix', '')},
                    domain={'x': [0, 1], 'y': [0, 1]}
                ),
                row=row, col=col
            )
        
        fig.update_layout(height=200 * rows)
        
        return fig

# Example usage
# viz = MarketingVisualizations()
# Sample data
# df_daily = pd.DataFrame({
#     'date': pd.date_range('2023-01-01', periods=90),
#     'total_revenue': np.random.lognormal(10, 0.5, 90)
# })
# fig = viz.revenue_trend(df_daily)
# fig.show()
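
`conversion_funnel` expects a small DataFrame with one row per stage. A minimal sketch of how that input might be prepared from event-level data is shown below, assuming each event row carries a `user_id` and a `stage` label. The stage names in `STAGES` and the helper `build_funnel` are hypothetical.

```python
import pandas as pd

STAGES = ["visit", "signup", "add_to_cart", "purchase"]  # assumed funnel order


def build_funnel(events, stage_col="stage", user_col="user_id"):
    """Count distinct users per stage and the conversion from the previous stage."""
    counts = (
        events.groupby(stage_col)[user_col].nunique()
        .reindex(STAGES, fill_value=0)  # enforce funnel order, fill missing stages
    )
    stages_df = counts.rename("count").rename_axis("stage").reset_index()
    # Step conversion: users at this stage divided by users at the previous one
    stages_df["step_conversion_pct"] = (
        stages_df["count"] / stages_df["count"].shift(1) * 100
    ).round(1)
    return stages_df


events = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 1, 2, 1, 2, 1],
    "stage": ["visit"] * 4 + ["signup"] * 2 + ["add_to_cart"] * 2 + ["purchase"],
})
print(build_funnel(events))
```

The resulting `stage` and `count` columns match the default arguments of `conversion_funnel`. On production volumes the distinct-user count per stage would be computed in Redshift rather than in pandas.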

## 5. Real-Time Monitoring Dashboards <a name="realtime"></a>

In [None]:
def create_realtime_dashboard():
    """
    Create a real-time monitoring dashboard using Dash
    Updates every 30 seconds
    """
    
    # Initialize Dash app
    app = JupyterDash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
    
    # Layout
    app.layout = dbc.Container([
        dbc.Row([
            dbc.Col([
                html.H1("Marketing Performance Dashboard"),
                html.P("Real-time monitoring", className="text-muted")
            ])
        ], className="mb-4"),
        
        # KPI Cards
        dbc.Row([
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.H4("Total Revenue", className="card-title"),
                        html.H2(id="revenue-kpi", children="$0"),
                        html.P(id="revenue-delta", className="text-success")
                    ])
                ])
            ], width=3),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.H4("Conversions", className="card-title"),
                        html.H2(id="conversions-kpi", children="0"),
                        html.P(id="conversions-delta", className="text-success")
                    ])
                ])
            ], width=3),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.H4("Active Users", className="card-title"),
                        html.H2(id="users-kpi", children="0"),
                        html.P(id="users-delta", className="text-info")
                    ])
                ])
            ], width=3),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.H4("Avg Order Value", className="card-title"),
                        html.H2(id="aov-kpi", children="$0"),
                        html.P(id="aov-delta", className="text-warning")
                    ])
                ])
            ], width=3),
        ], className="mb-4"),
        
        # Charts
        dbc.Row([
            dbc.Col([
                dcc.Graph(id="revenue-chart")
            ], width=8),
            
            dbc.Col([
                dcc.Graph(id="channel-chart")
            ], width=4),
        ], className="mb-4"),
        
        dbc.Row([
            dbc.Col([
                dcc.Graph(id="hourly-chart")
            ], width=12),
        ]),
        
        # Auto-refresh interval
        dcc.Interval(
            id='interval-component',
            interval=30*1000,  # 30 seconds
            n_intervals=0
        )
    ], fluid=True)
    
    # Callbacks for real-time updates
    @app.callback(
        [
            Output('revenue-kpi', 'children'),
            Output('conversions-kpi', 'children'),
            Output('users-kpi', 'children'),
            Output('aov-kpi', 'children'),
            Output('revenue-chart', 'figure'),
            Output('channel-chart', 'figure'),
            Output('hourly-chart', 'figure'),
        ],
        Input('interval-component', 'n_intervals')
    )
    def update_dashboard(n):
        """
        Update all dashboard components
        In production, this would query Redshift
        """
        
        # Simulate data fetch (replace with actual Redshift queries)
        current_time = datetime.now()
        
        # KPIs
        revenue = f"${np.random.randint(50000, 100000):,}"
        conversions = f"{np.random.randint(1000, 2000):,}"
        users = f"{np.random.randint(5000, 10000):,}"
        aov = f"${np.random.randint(40, 80)}"
        
        # Revenue trend (last 24 hours)
        hours = pd.date_range(end=current_time, periods=24, freq='h')  # 'H' is deprecated since pandas 2.2
        revenue_data = pd.DataFrame({
            'hour': hours,
            'revenue': np.random.lognormal(9, 0.3, 24)
        })
        
        revenue_fig = go.Figure()
        revenue_fig.add_trace(go.Scatter(
            x=revenue_data['hour'],
            y=revenue_data['revenue'],
            mode='lines+markers',
            fill='tozeroy',
            name='Revenue'
        ))
        revenue_fig.update_layout(
            title='Revenue (Last 24 Hours)',
            xaxis_title='Time',
            yaxis_title='Revenue ($)',
            height=300
        )
        
        # Channel distribution
        channels = ['Google', 'Facebook', 'Email', 'Organic', 'Direct']
        values = np.random.dirichlet(np.ones(5)) * 100
        
        channel_fig = go.Figure(data=[go.Pie(
            labels=channels,
            values=values,
            hole=.3
        )])
        channel_fig.update_layout(
            title='Channel Distribution',
            height=300
        )
        
        # Hourly conversions
        hourly_fig = go.Figure(data=[
            go.Bar(
                x=revenue_data['hour'],
                y=np.random.poisson(50, 24),
                name='Conversions'
            )
        ])
        hourly_fig.update_layout(
            title='Hourly Conversions',
            xaxis_title='Hour',
            yaxis_title='Conversions',
            height=300
        )
        
        return revenue, conversions, users, aov, revenue_fig, channel_fig, hourly_fig
    
    return app

# Create and run dashboard
# app = create_realtime_dashboard()
# app.run_server(mode='inline', port=8050)
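
Once `update_dashboard` queries Redshift instead of simulating data, every open browser tab triggers a full set of queries every 30 seconds. One way to limit that load is a small time-to-live cache shared by all callbacks in the process. The sketch below is an assumption about how this could be done, not part of Dash; `TTLCache` is a hypothetical class and is not thread-safe as written.

```python
import time


class TTLCache:
    """Cache one value per key for ttl_seconds; recompute after expiry."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=30)
calls = []


def fetch_kpis():
    calls.append(1)  # stands in for a Redshift query
    return {"revenue": 12345}


cache.get_or_compute("kpis", fetch_kpis)
cache.get_or_compute("kpis", fetch_kpis)
print(f"Queries executed: {len(calls)}")
```

Inside the callback, each query would be wrapped in `cache.get_or_compute(...)`. With several worker processes each process keeps its own cache, so a shared store would be needed to deduplicate across workers.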

## 6. Redshift → Pandas → Visualization Pipeline <a name="pipeline"></a>

In [None]:
class VisualizationPipeline:
    """
    Complete pipeline from Redshift to interactive visualizations
    Optimized for performance and scalability
    """
    
    def __init__(self, loader):
        self.loader = loader
        self.cache = {}  # Simple in-memory cache
        self.viz = MarketingVisualizations()
    
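    def get_daily_metrics(self, table, date_from=None, date_to=None, cache_key=None):
        """
        Get daily aggregated metrics from Redshift
        """
        # Key the cache on the query parameters so that different tables or
        # date ranges never return each other's cached results
        cache_key = cache_key or ('daily', table, date_from, date_to)
        if cache_key in self.cache:
            return self.cache[cache_key]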
        
        filters = []
        if date_from:
            filters.append(f"timestamp >= '{date_from}'")
        if date_to:
            filters.append(f"timestamp <= '{date_to}'")
        
        filter_str = ' AND '.join(filters) if filters else None
        
        query = f"""
        SELECT 
            DATE(timestamp) as date,
            COUNT(*) as events,
            COUNT(DISTINCT user_id) as unique_users,
            SUM(CASE WHEN converted = 1 THEN 1 ELSE 0 END) as conversions,
            SUM(revenue) as total_revenue,
            AVG(revenue) as avg_revenue,
            SUM(page_views) as total_pageviews,
            AVG(time_on_site) as avg_time_on_site
        FROM {table}
        {f'WHERE {filter_str}' if filter_str else ''}
        GROUP BY DATE(timestamp)
        ORDER BY date
        """
        
        df = self.loader.query(query)
        self.cache[cache_key] = df
        return df
    
    def get_channel_performance(self, table, date_from=None):
        """
        Get channel performance metrics
        """
        filter_str = f"timestamp >= '{date_from}'" if date_from else None
        
        query = f"""
        SELECT 
            channel,
            COUNT(*) as events,
            COUNT(DISTINCT user_id) as unique_users,
            SUM(revenue) as total_revenue,
            AVG(revenue) as avg_revenue,
            SUM(CASE WHEN converted = 1 THEN 1 ELSE 0 END) as conversions,
            SUM(CASE WHEN converted = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) as conversion_rate,
            SUM(revenue) / COUNT(DISTINCT user_id) as revenue_per_user
        FROM {table}
        {f'WHERE {filter_str}' if filter_str else ''}
        GROUP BY channel
        ORDER BY total_revenue DESC
        """
        
        return self.loader.query(query)
    
    def create_executive_dashboard(self, table, lookback_days=30):
        """
        Create comprehensive executive dashboard
        """
        date_from = (datetime.now() - timedelta(days=lookback_days)).strftime('%Y-%m-%d')
        
        # Get data
        print("Loading daily metrics...")
        daily_df = self.get_daily_metrics(table, date_from=date_from)
        
        print("Loading channel performance...")
        channel_df = self.get_channel_performance(table, date_from=date_from)
        
        # Create visualizations
        print("Creating visualizations...")
        
        # 1. Revenue trend
        fig_revenue = self.viz.revenue_trend(daily_df, 'date', 'total_revenue')
        
        # 2. Channel comparison
        fig_channels = self.viz.channel_performance_comparison(
            channel_df, 'channel', ['total_revenue', 'conversions', 'conversion_rate']
        )
        
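        # 3. KPIs
        # Redshift NUMERIC/SUM results can arrive as decimal.Decimal, which cannot
        # be multiplied by a float, so convert to float first
        total_revenue = float(daily_df['total_revenue'].sum())
        total_conversions = float(daily_df['conversions'].sum())
        # Caution: summing daily uniques counts a user once per active day; an exact
        # figure needs COUNT(DISTINCT user_id) over the whole period
        total_users = float(daily_df['unique_users'].sum())
        avg_aov = total_revenue / total_conversions if total_conversions > 0 else 0
        
        # The 'delta' references below are placeholder values; replace them
        # with totals from the previous period
        kpis = {
            'Total Revenue': {
                'value': total_revenue,
                'delta': total_revenue * 0.9,  # placeholder reference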
                'prefix': '$'
            },
            'Conversions': {
                'value': total_conversions,
                'delta': total_conversions * 0.95
            },
            'Unique Users': {
                'value': total_users,
                'delta': total_users * 0.88
            },
            'Avg Order Value': {
                'value': avg_aov,
                'delta': avg_aov * 1.05,
                'prefix': '$'
            }
        }
        
        fig_kpis = self.viz.kpi_cards(kpis)
        
        print("Dashboard ready!")
        
        return {
            'kpis': fig_kpis,
            'revenue_trend': fig_revenue,
            'channel_performance': fig_channels,
            'data': {
                'daily': daily_df,
                'channels': channel_df
            }
        }
    
    def export_dashboard_html(self, dashboard, filename='dashboard.html'):
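        """
        Export dashboard to standalone HTML file
        """
        # "plotly-latest" on the CDN is frozen at an old 1.x release, so load the
        # plotly.js version bundled with the installed plotly package instead
        from plotly.offline import get_plotlyjs_version
        plotlyjs_url = f"https://cdn.plot.ly/plotly-{get_plotlyjs_version()}.min.js"
        
        html_content = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <title>Marketing Performance Dashboard</title>
            <script src="{plotlyjs_url}"></script>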
            <style>
                body {{ font-family: Arial, sans-serif; margin: 20px; }}
                h1 {{ color: #333; }}
                .chart {{ margin: 20px 0; }}
            </style>
        </head>
        <body>
            <h1>Marketing Performance Dashboard</h1>
            <div class="chart" id="kpis"></div>
            <div class="chart" id="revenue"></div>
            <div class="chart" id="channels"></div>
            <script>
                var kpis = {dashboard['kpis'].to_json()};
                var revenue = {dashboard['revenue_trend'].to_json()};
                var channels = {dashboard['channel_performance'].to_json()};
                
                Plotly.newPlot('kpis', kpis.data, kpis.layout);
                Plotly.newPlot('revenue', revenue.data, revenue.layout);
                Plotly.newPlot('channels', channels.data, channels.layout);
            </script>
        </body>
        </html>
        """
        
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(html_content)
        
        print(f"Dashboard exported to {filename}")

# Example usage
# pipeline = VisualizationPipeline(loader)
# dashboard = pipeline.create_executive_dashboard('marketing_events', lookback_days=30)
# dashboard['revenue_trend'].show()
# pipeline.export_dashboard_html(dashboard, 'my_dashboard.html')
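
The KPI references in `create_executive_dashboard` are fixed multiples of the current value rather than real previous-period totals. A minimal sketch of computing both windows from the daily data is shown below, assuming `daily_df` covers at least twice the lookback window and has one row per day. `period_over_period` is a hypothetical helper.

```python
import pandas as pd


def period_over_period(daily_df, value_col, days, date_col="date"):
    """Return (current_total, previous_total) for two adjacent windows of `days`."""
    df = daily_df.sort_values(date_col)
    dates = pd.to_datetime(df[date_col])
    end = dates.max()
    current_start = end - pd.Timedelta(days=days - 1)
    previous_start = current_start - pd.Timedelta(days=days)
    # astype(float) also handles decimal.Decimal values returned by the database
    values = df[value_col].astype(float)
    current = values[dates >= current_start].sum()
    previous = values[(dates >= previous_start) & (dates < current_start)].sum()
    return float(current), float(previous)


daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=14),
    "total_revenue": range(1, 15),
})
current, previous = period_over_period(daily, "total_revenue", days=7)
print(current, previous)
```

The two values map directly onto the `'value'` and `'delta'` entries that `kpi_cards` expects, with `'delta'` acting as the reference for the relative change.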

## 7. Performance Optimization <a name="performance"></a>

In [None]:
"""
PERFORMANCE OPTIMIZATION TECHNIQUES

1. Database-Side Optimizations:
   - Always aggregate in Redshift, not in Pandas
   - Use materialized views for common queries
   - Add appropriate sort/dist keys
   - Use UNLOAD for large data exports

2. Data Transfer Optimizations:
   - Minimize data transferred from database
   - Use compression (Parquet)
   - Batch queries when possible
   - Implement connection pooling

3. Visualization Optimizations:
   - Limit data points (< 10k per chart)
   - Use WebGL for scatter plots
   - Implement progressive loading
   - Cache aggregated results

4. Dashboard Optimizations:
   - Load data asynchronously
   - Implement lazy loading
   - Use callbacks efficiently
   - Minimize re-renders
"""

class PerformanceOptimizer:
    """
    Tools for optimizing visualization performance
    """
    
    @staticmethod
    def downsample_timeseries(df, date_col, value_cols, target_points=500):
        """
        Intelligently downsample time series data
        Preserves trends while reducing data points
        """
        if len(df) <= target_points:
            return df
        
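        # Calculate bin size (round up so the result never exceeds target_points)
        bin_size = int(np.ceil(len(df) / target_points))
        
        # Create bins by row position, so a non-default index still works
        df = df.copy()
        df['bin'] = np.arange(len(df)) // bin_size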
        
        # Aggregate by bin
        agg_dict = {date_col: 'first'}
        for col in value_cols:
            agg_dict[col] = 'mean'
        
        downsampled = df.groupby('bin').agg(agg_dict).reset_index(drop=True)
        
        print(f"Downsampled from {len(df):,} to {len(downsampled):,} points")
        return downsampled
    
    @staticmethod
    def create_materialized_view(loader, view_name, query):
        """
        Create materialized view in Redshift for faster access
        """
        create_query = f"""
        CREATE MATERIALIZED VIEW {view_name} AS
        {query}
        """
        
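        # DDL returns no result set, so run it on a cursor directly
        # instead of through loader.query(), then commit
        if not loader.conn:
            loader.connect()
        cursor = loader.conn.cursor()
        cursor.execute(f"DROP MATERIALIZED VIEW IF EXISTS {view_name}")
        cursor.execute(create_query)
        loader.conn.commit()
        cursor.close()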
        
        print(f"Materialized view {view_name} created")
    
    @staticmethod
    def benchmark_query(loader, query, iterations=3):
        """
        Benchmark query performance
        """
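        import time
        
        # Note: Redshift may serve repeat runs from its result cache,
        # so the first iteration is usually the most representative
        times = []
        for i in range(iterations):
            # perf_counter is a monotonic, high-resolution clock suited to timing
            start = time.perf_counter()
            result = loader.query(query)
            end = time.perf_counter()
            times.append(end - start)
        
        print("Query performance:")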
        print(f"  Avg time: {np.mean(times):.2f}s")
        print(f"  Min time: {np.min(times):.2f}s")
        print(f"  Max time: {np.max(times):.2f}s")
        print(f"  Rows returned: {len(result):,}")
        
        return result

# Example usage
# optimizer = PerformanceOptimizer()
# 
# # Create materialized view for daily metrics
# daily_metrics_query = '''
# SELECT 
#     DATE(timestamp) as date,
#     channel,
#     COUNT(*) as events,
#     SUM(revenue) as revenue
# FROM marketing_events
# GROUP BY DATE(timestamp), channel
# '''
# optimizer.create_materialized_view(loader, 'daily_channel_metrics', daily_metrics_query)
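
`downsample_timeseries` averages each bin, which smooths away short spikes. Where peaks matter (for example, anomaly monitoring), an alternative is min/max bucket downsampling, which keeps the lowest and highest point of every bucket. The sketch below is a different technique from the one in `PerformanceOptimizer`, shown for comparison; `minmax_downsample` is a hypothetical helper.

```python
import numpy as np


def minmax_downsample(values, n_buckets):
    """Return sorted indices of the min and max point in each bucket."""
    values = np.asarray(values, dtype=float)
    if len(values) <= 2 * n_buckets:
        return np.arange(len(values))  # already small enough
    keep = set()
    # array_split tolerates lengths that are not a multiple of n_buckets
    for bucket in np.array_split(np.arange(len(values)), n_buckets):
        keep.add(int(bucket[np.argmin(values[bucket])]))
        keep.add(int(bucket[np.argmax(values[bucket])]))
    return np.array(sorted(keep))


series = np.zeros(1000)
series[123] = 100.0   # a single spike that mean-binning would flatten
idx = minmax_downsample(series, n_buckets=50)
print(f"Kept {len(idx)} of {len(series)} points; spike kept: {123 in idx}")
```

The returned positions can be applied with `df.iloc[idx]`, so the date column and the values stay aligned. At most two points per bucket are kept.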

## 8. Executive Dashboard Templates <a name="templates"></a>

In [None]:
def create_executive_template():
    """
    Production-ready executive dashboard template
    Clean, professional design with key metrics
    """
    
    app = JupyterDash(__name__, external_stylesheets=[dbc.themes.LUX])
    
    app.layout = dbc.Container([
        # Header
        dbc.Row([
            dbc.Col([
                html.H1("Executive Marketing Dashboard", 
                       style={'color': '#2c3e50', 'fontWeight': 'bold'}),
                # Evaluated once, when the layout is built; update it from a callback for a live timestamp
                html.P(f"Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M')}",
                      className="text-muted")
            ], width=8),
            dbc.Col([
                dbc.ButtonGroup([
                    dbc.Button("Today", id="btn-today", size="sm"),
                    dbc.Button("Week", id="btn-week", size="sm"),
                    dbc.Button("Month", id="btn-month", size="sm", active=True),
                    dbc.Button("Quarter", id="btn-quarter", size="sm"),
                ])
            ], width=4, className="text-end")
        ], className="mb-4", style={'borderBottom': '2px solid #ecf0f1', 'paddingBottom': '20px'}),
        
        # KPI Row
        dbc.Row([
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.P("Total Revenue", className="text-muted mb-2"),
                        html.H2("$1.2M", className="mb-0", style={'color': '#27ae60'}),
                        html.Small("↑ 12.5% vs last period", style={'color': '#27ae60'})
                    ])
                ], className="shadow-sm")
            ], width=3),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.P("Conversions", className="text-muted mb-2"),
                        html.H2("24,567", className="mb-0", style={'color': '#3498db'}),
                        html.Small("↑ 8.3% vs last period", style={'color': '#27ae60'})
                    ])
                ], className="shadow-sm")
            ], width=3),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.P("Conversion Rate", className="text-muted mb-2"),
                        html.H2("3.2%", className="mb-0", style={'color': '#e74c3c'}),
                        html.Small("↓ 0.2% vs last period", style={'color': '#e74c3c'})
                    ])
                ], className="shadow-sm")
            ], width=3),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardBody([
                        html.P("ROI", className="text-muted mb-2"),
                        html.H2("245%", className="mb-0", style={'color': '#9b59b6'}),
                        html.Small("↑ 15.7% vs last period", style={'color': '#27ae60'})
                    ])
                ], className="shadow-sm")
            ], width=3),
        ], className="mb-4"),
        
        # Main Charts
        dbc.Row([
            dbc.Col([
                dbc.Card([
                    dbc.CardHeader("Revenue Trend"),
                    dbc.CardBody([
                        dcc.Graph(id="main-revenue-chart", config={'displayModeBar': False})
                    ])
                ], className="shadow-sm")
            ], width=8),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardHeader("Channel Mix"),
                    dbc.CardBody([
                        dcc.Graph(id="channel-pie-chart", config={'displayModeBar': False})
                    ])
                ], className="shadow-sm")
            ], width=4),
        ], className="mb-4"),
        
        # Secondary Charts
        dbc.Row([
            dbc.Col([
                dbc.Card([
                    dbc.CardHeader("Channel Performance"),
                    dbc.CardBody([
                        dcc.Graph(id="channel-bar-chart", config={'displayModeBar': False})
                    ])
                ], className="shadow-sm")
            ], width=6),
            
            dbc.Col([
                dbc.Card([
                    dbc.CardHeader("Device Breakdown"),
                    dbc.CardBody([
                        dcc.Graph(id="device-chart", config={'displayModeBar': False})
                    ])
                ], className="shadow-sm")
            ], width=6),
        ])
    ], fluid=True, style={'backgroundColor': '#f8f9fa', 'padding': '20px'})
    
    # Add callbacks here to populate the dcc.Graph figures and handle the
    # period buttons; without them the charts render empty
    
    return app

# Create template
# app = create_executive_template()
# app.run_server(mode='inline', port=8051)
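
The template leaves the callbacks and the KPI values as placeholders. As a rough sketch of the plain-Python logic such callbacks could call, the helpers below map a period button to a date range and format a period-over-period change. `period_to_range` and `format_delta` are hypothetical names, the trailing-window lengths (1, 7, 30, and 90 days) are assumptions, and the gray fallback color is not from the template; the button ids and the green and red colors are taken from the layout above.

```python
from datetime import date, timedelta

# Assumption: each button selects a trailing window that ends today
PERIOD_DAYS = {"btn-today": 1, "btn-week": 7, "btn-month": 30, "btn-quarter": 90}


def period_to_range(button_id, today=None):
    """Map a period button id to an inclusive (start, end) date range."""
    today = today or date.today()
    days = PERIOD_DAYS[button_id]
    return today - timedelta(days=days - 1), today


def format_delta(current, previous):
    """Return (label, color) describing the change versus the previous period."""
    if previous == 0:
        # No baseline to compare against; gray is an assumed neutral color
        return "n/a vs last period", "#7f8c8d"
    change = (current - previous) / previous * 100
    arrow = "↑" if change >= 0 else "↓"
    color = "#27ae60" if change >= 0 else "#e74c3c"
    return f"{arrow} {abs(change):.1f}% vs last period", color


start, end = period_to_range("btn-week", today=date(2024, 3, 15))
print(start, end)
print(format_delta(1_125_000, 1_000_000))
```

Inside a Dash callback, recent Dash releases expose the id of the clicked button as `dash.ctx.triggered_id`, which could be passed to `period_to_range`. Keeping the logic in plain functions also makes it testable without starting the app.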

## 9. Real-World Project: Live Marketing Dashboard <a name="project"></a>

In [None]:
"""
PROJECT: Build Production Marketing Performance Dashboard

Requirements:
1. Connect to Redshift database with 10M+ marketing events
2. Create real-time dashboard updating every 60 seconds
3. Display key metrics:
   - Revenue (total, daily trend, YoY comparison)
   - Conversions (total, rate, funnel)
   - Channel performance (ROI, spend, revenue)
   - User engagement (sessions, bounce rate, time on site)
4. Enable filtering by:
   - Date range
   - Channel
   - Device
   - Country
5. Export capabilities (PDF, CSV)
6. Mobile-responsive design
7. Load time under 3 seconds

Deliverables:
- Functional Dash application
- Deployment instructions
- Performance benchmarks
- User documentation
"""

# Your implementation here
# This is a complete project - students should build this from scratch
# using all the techniques learned in this notebook

class ProductionMarketingDashboard:
    """
    Production-ready marketing dashboard
    Student project template
    """
    
    def __init__(self, redshift_config):
        self.loader = DataLoader(redshift_config)
        self.app = None
    
    def build_app(self):
        """Build the Dash application"""
        # TODO: Implement dashboard
        pass
    
    def setup_callbacks(self):
        """Setup all dashboard callbacks"""
        # TODO: Implement callbacks
        pass
    
    def run(self, host='0.0.0.0', port=8050):
        """Run the dashboard (host '0.0.0.0' listens on all network interfaces)"""
        # TODO: Implement run logic
        pass

# Usage:
# dashboard = ProductionMarketingDashboard(REDSHIFT_CONFIG)
# dashboard.build_app()
# dashboard.run()

## 10. Exercises <a name="exercises"></a>

### Exercise 1: Channel Performance Dashboard

**Task:** Create an interactive dashboard that:
1. Shows performance of all marketing channels
2. Allows filtering by date range
3. Displays metrics: Revenue, ROI, Conversions, CPA
4. Updates visualizations based on filters
5. Handles 1M+ events efficiently
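
As a starting point for the metrics, here is a minimal sketch of ROI and CPA using common definitions: ROI as `(revenue - spend) / spend` and CPA as `spend / conversions`. The function name `channel_metrics` is hypothetical, and the `spend` figure is an assumption, since a spend column is not shown in this section. Undefined ratios are returned as `None` rather than raising.

```python
def channel_metrics(revenue, spend, conversions):
    """Compute ROI (%) and CPA for one channel; None where a ratio is undefined."""
    roi = (revenue - spend) / spend * 100 if spend else None
    cpa = spend / conversions if conversions else None
    return {"revenue": revenue, "conversions": conversions, "roi_pct": roi, "cpa": cpa}


# Illustrative numbers only, not real data
rows = [
    {"channel": "email", "revenue": 5000.0, "spend": 1000.0, "conversions": 200},
    {"channel": "social", "revenue": 1200.0, "spend": 0.0, "conversions": 0},
]
report = {
    r["channel"]: channel_metrics(r["revenue"], r["spend"], r["conversions"])
    for r in rows
}
print(report["email"])
```

To handle 1M+ events, the per-channel totals fed into a helper like this would normally come from an aggregated Redshift query rather than from raw event rows.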

In [None]:
# Your solution here


### Exercise 2: Real-Time Monitoring

**Task:** Build a real-time monitoring dashboard that:
1. Updates every 30 seconds
2. Shows current hour's performance
3. Alerts on anomalies (revenue drop > 20% against a baseline you define, e.g., the same hour on the previous day)
4. Displays rolling 24-hour metrics
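
The anomaly rule can be prototyped separately from the dashboard. The sketch below assumes you have already chosen a baseline (the task leaves this open) and flags a drop strictly greater than the threshold. `revenue_drop_alert` is a hypothetical helper name.

```python
def revenue_drop_alert(current, baseline, threshold=0.20):
    """Return an alert message if revenue fell more than threshold vs baseline, else None."""
    if baseline <= 0:
        return None  # no meaningful baseline to compare against
    drop = (baseline - current) / baseline
    if drop > threshold:
        return f"ALERT: revenue down {drop:.0%} vs baseline"
    return None


print(revenue_drop_alert(current=700.0, baseline=1000.0))
print(revenue_drop_alert(current=900.0, baseline=1000.0))
```

A periodic callback could call this on each refresh and show the returned message in an alert component when it is not `None`.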

In [None]:
# Your solution here


### Exercise 3: Cohort Analysis Visualization

**Task:** Create a cohort retention visualization:
1. Query cohort data from Redshift
2. Calculate retention rates
3. Create interactive heatmap
4. Add drill-down capabilities
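
For step 2, one way to turn cohort counts into retention rates is sketched below. It assumes the Redshift query has already produced, for each cohort, the number of active users in period 0, 1, 2, and so on; the input shape and the name `retention_matrix` are assumptions for illustration.

```python
def retention_matrix(cohort_counts):
    """
    Convert {cohort: [active users in period 0, 1, 2, ...]} into retention
    percentages relative to each cohort's period-0 size.
    """
    cohorts = sorted(cohort_counts)
    width = max(len(counts) for counts in cohort_counts.values())
    matrix = []
    for cohort in cohorts:
        counts = cohort_counts[cohort]
        size = counts[0]
        row = [round(c / size * 100, 1) if size else None for c in counts]
        # Pad younger cohorts, which have fewer observed periods
        row += [None] * (width - len(row))
        matrix.append(row)
    return cohorts, matrix


# Illustrative counts only, not real data
counts = {
    "2024-01": [1000, 400, 250],
    "2024-02": [800, 360],
    "2024-03": [500],
}
cohorts, matrix = retention_matrix(counts)
for cohort, row in zip(cohorts, matrix):
    print(cohort, row)
```

The resulting rows could be passed as `z` to `go.Heatmap`, with `cohorts` as the `y` labels, to produce the triangular retention heatmap.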

In [None]:
# Your solution here


### Exercise 4: Performance Optimization

**Task:** Optimize a slow dashboard:
1. Benchmark current performance
2. Identify bottlenecks
3. Implement optimizations (caching, materialized views, etc.)
4. Measure improvements
5. Document optimizations
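
Caching is one of the optimizations worth measuring. The sketch below is a minimal time-to-live cache keyed by SQL text, with a stand-in `slow_fetch` function in place of a real database call. `TTLCache` and `slow_fetch` are hypothetical names, and in practice the `fetch` argument would be something like `loader.query`.

```python
import time


class TTLCache:
    """Cache query results for ttl_seconds, keyed by the SQL text."""

    def __init__(self, fetch, ttl_seconds=60, clock=time.monotonic):
        self.fetch = fetch      # callable that runs the query
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._store = {}        # sql -> (expires_at, result)
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        now = self.clock()
        entry = self._store.get(sql)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = self.fetch(sql)
        self._store[sql] = (now + self.ttl, result)
        return result


def slow_fetch(sql):
    """Stand-in for a slow database call (assumption: replace with loader.query)."""
    time.sleep(0.05)
    return [("email", 42)]


cache = TTLCache(slow_fetch, ttl_seconds=60)
for label in ("cold", "warm"):
    start = time.perf_counter()
    cache.query("SELECT channel, COUNT(*) FROM marketing_events GROUP BY channel")
    print(f"{label}: {time.perf_counter() - start:.3f}s")
```

Recording the cold and warm timings, along with the hit and miss counts, gives you concrete before-and-after numbers for steps 1 and 4.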

In [None]:
# Your solution here


## Summary

In this notebook, you learned:

1. **Visualization Best Practices**
   - Always aggregate before visualizing
   - Choose appropriate chart types
   - Optimize for large datasets

2. **Dashboard Development**
   - Building with Plotly and Dash
   - Real-time updates
   - Interactive filtering
   - Professional templates

3. **Performance Optimization**
   - Database-side aggregation
   - Caching strategies
   - Progressive loading
   - Materialized views

4. **Production Deployment**
   - Complete pipelines
   - Error handling
   - Export capabilities
   - Mobile responsiveness

## Next Steps

- Week 7: Advanced Statistics for Big Data
- Deploy your dashboard to production
- Add more advanced features
- Implement automated reporting

## Additional Resources

- [Plotly Documentation](https://plotly.com/python/)
- [Dash Documentation](https://dash.plotly.com/)
- [Redshift Performance Tuning](https://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html)
- [Dashboard Design Best Practices](https://www.tableau.com/learn/articles/dashboard-design-principles)