# Week 8: Advanced A/B Testing Platform

## Overview
Build a production-ready, scalable A/B testing platform using Redshift, Python, and modern statistical methods.

## Learning Objectives
- Build scalable A/B testing infrastructure
- Implement multi-variant testing at scale
- Design sequential testing systems
- Create automated monitoring and alerting
- Generate automated reports
- Deploy production A/B testing pipelines

## Prerequisites
- Redshift cluster access
- Working knowledge of statistics (hypothesis tests, confidence intervals, statistical power)
- Understanding of A/B testing principles
- Python programming skills

## Table of Contents
1. [Setup and Environment](#setup)
2. [A/B Testing Platform Architecture](#architecture)
3. [Experiment Configuration System](#config)
4. [Multi-Variant Testing at Scale](#multivariant)
5. [Sequential Testing Implementation](#sequential)
6. [Automated Monitoring System](#monitoring)
7. [Automated Reporting](#reporting)
8. [Redshift-Based Analytics](#analytics)
9. [Real-World Project: Production Platform](#project)
10. [Exercises](#exercises)

## 1. Setup and Environment <a name="setup"></a>

In [None]:
# Install required packages
!pip install -q pandas numpy scipy statsmodels
!pip install -q redshift_connector sqlalchemy
!pip install -q plotly dash dash-bootstrap-components
!pip install -q pyyaml python-dateutil
!pip install -q apscheduler  # For scheduling
!pip install -q jinja2  # For report templates

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from scipy import stats
from scipy.stats import norm, beta
import statsmodels.api as sm
from statsmodels.stats.power import TTestIndPower
from statsmodels.stats.proportion import proportion_effectsize
import redshift_connector
import yaml
import json
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass, field, asdict
from enum import Enum
import warnings
warnings.filterwarnings('ignore')

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

print("✓ Libraries imported successfully")

### Database Connection

In [None]:
import os
from getpass import getpass

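# Read credentials from the environment; prompt only when a variable is unset.
# A default passed to os.getenv is evaluated eagerly, so calling input() or
# getpass() there would prompt even when the variable is already set.
REDSHIFT_CONFIG = {
    'host': os.getenv('REDSHIFT_HOST', 'your-cluster.redshift.amazonaws.com'),
    'port': int(os.getenv('REDSHIFT_PORT', '5439')),
    'database': os.getenv('REDSHIFT_DB', 'marketing_db'),
    'user': os.getenv('REDSHIFT_USER') or input('Redshift username: '),
    'password': os.getenv('REDSHIFT_PASSWORD') or getpass('Redshift password: ')
}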

class DatabaseConnection:
    """Managed database connection"""
    
    def __init__(self, config):
        self.config = config
        self.conn = None
    
    def connect(self):
        if not self.conn:
            self.conn = redshift_connector.connect(**self.config)
        return self.conn
    
    def query(self, sql, params=None):
        """Execute query and return DataFrame"""
        conn = self.connect()
        cursor = conn.cursor()
        
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        
        result = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
        
        return pd.DataFrame(result, columns=columns)
    
    def execute(self, sql, params=None):
        """Execute non-query SQL"""
        conn = self.connect()
        cursor = conn.cursor()
        
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        
        conn.commit()
    
    def close(self):
        if self.conn:
            self.conn.close()
            self.conn = None

db = DatabaseConnection(REDSHIFT_CONFIG)
print("✓ Database connection initialized")
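
The `DatabaseConnection` class above leaves closing the connection to the caller. One possible way to guarantee cleanup is a small context manager. The sketch below is an assumption about how one might wrap it: `managed_connection` and `FakeConnection` are hypothetical names used only for illustration, and the fake stands in for a real Redshift connection so the example runs anywhere.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in exposing the same close() method as DatabaseConnection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def managed_connection(db_conn):
    """Yield db_conn and always call its close() method afterwards."""
    try:
        yield db_conn
    finally:
        db_conn.close()


# Usage: the connection is closed even if the body raises
fake = FakeConnection()
with managed_connection(fake) as conn:
    pass  # run conn.query(...) / conn.execute(...) here
print(fake.closed)
```

With the real class, `with managed_connection(DatabaseConnection(REDSHIFT_CONFIG)) as db:` would follow the same pattern.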

## 2. A/B Testing Platform Architecture <a name="architecture"></a>

### Platform Components

```
┌─────────────────────────────────────────────────────────┐
│                  A/B Testing Platform                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  1. Experiment Configuration                           │
│     - Metadata storage                                 │
│     - Variant definitions                              │
│     - Success metrics                                  │
│                                                         │
│  2. Data Collection                                    │
│     - Event tracking                                   │
│     - Assignment logging                               │
│     - Metric calculation                               │
│                                                         │
│  3. Statistical Analysis                               │
│     - Frequentist tests                                │
│     - Bayesian analysis                                │
│     - Sequential testing                               │
│                                                         │
│  4. Monitoring & Alerting                              │
│     - Real-time dashboards                             │
│     - Anomaly detection                                │
│     - Automated alerts                                 │
│                                                         │
│  5. Reporting                                          │
│     - Automated reports                                │
│     - Visualizations                                   │
│     - Decision recommendations                         │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

### Database Schema

In [None]:
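def create_ab_testing_schema(db_conn):
    """
    Create database schema for A/B testing platform.
    
    Note: Redshift treats PRIMARY KEY and FOREIGN KEY constraints as
    informational only. They are not enforced, so the application must
    prevent duplicate or orphaned rows itself.
    """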
    
    # Experiments table
    db_conn.execute("""
    CREATE TABLE IF NOT EXISTS experiments (
        experiment_id VARCHAR(255) PRIMARY KEY,
        name VARCHAR(500),
        description VARCHAR(65535),  -- Redshift converts TEXT to VARCHAR(256)
        hypothesis VARCHAR(65535),
        status VARCHAR(50),
        start_date TIMESTAMP,
        end_date TIMESTAMP,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        created_by VARCHAR(255),
        primary_metric VARCHAR(255),
        secondary_metrics VARCHAR(65535),  -- JSON array
        target_sample_size INTEGER,
        min_detectable_effect FLOAT,
        confidence_level FLOAT DEFAULT 0.95,
        power FLOAT DEFAULT 0.80
    )
    DISTSTYLE ALL
    """)
    
    # Variants table
    db_conn.execute("""
    CREATE TABLE IF NOT EXISTS experiment_variants (
        variant_id VARCHAR(255) PRIMARY KEY,
        experiment_id VARCHAR(255),
        variant_name VARCHAR(255),
        traffic_allocation FLOAT,
        is_control BOOLEAN,
        configuration VARCHAR(65535),  -- JSON; Redshift converts TEXT to VARCHAR(256)
        FOREIGN KEY (experiment_id) REFERENCES experiments(experiment_id)
    )
    DISTSTYLE ALL
    """)
    
    # Assignments table
    db_conn.execute("""
    CREATE TABLE IF NOT EXISTS experiment_assignments (
        assignment_id BIGINT IDENTITY(1,1),
        experiment_id VARCHAR(255),
        variant_id VARCHAR(255),
        user_id VARCHAR(255),
        session_id VARCHAR(255),
        assigned_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (assignment_id),
        FOREIGN KEY (experiment_id) REFERENCES experiments(experiment_id),
        FOREIGN KEY (variant_id) REFERENCES experiment_variants(variant_id)
    )
    DISTKEY (user_id)
    SORTKEY (assigned_at, experiment_id)
    """)
    
    # Events table
    db_conn.execute("""
    CREATE TABLE IF NOT EXISTS experiment_events (
        event_id BIGINT IDENTITY(1,1),
        experiment_id VARCHAR(255),
        variant_id VARCHAR(255),
        user_id VARCHAR(255),
        session_id VARCHAR(255),
        event_type VARCHAR(255),
        event_value FLOAT,
        event_metadata VARCHAR(65535),  -- JSON; Redshift converts TEXT to VARCHAR(256)
        occurred_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (event_id)
    )
    DISTKEY (user_id)
    SORTKEY (occurred_at, experiment_id)
    """)
    
    # Results table (cached analysis results)
    db_conn.execute("""
    CREATE TABLE IF NOT EXISTS experiment_results (
        result_id BIGINT IDENTITY(1,1),
        experiment_id VARCHAR(255),
        analysis_date DATE,
        variant_id VARCHAR(255),
        metric_name VARCHAR(255),
        sample_size INTEGER,
        mean_value FLOAT,
        std_dev FLOAT,
        ci_lower FLOAT,
        ci_upper FLOAT,
        PRIMARY KEY (result_id)
    )
    DISTSTYLE ALL
    """)
    
    print("✓ A/B testing schema created")

# create_ab_testing_schema(db)
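
The `target_sample_size`, `min_detectable_effect`, `confidence_level`, and `power` columns are related: given a baseline rate, the last three determine the first. A rough sketch using the standard normal-approximation formula for two proportions follows. The 10% baseline rate and the reading of `min_detectable_effect` as a relative lift are assumptions made for illustration, and `required_sample_size` is a hypothetical helper, not part of the platform.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, relative_mde,
                         confidence_level=0.95, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate under the assumed relative lift
    alpha = 1 - confidence_level
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


# Assumed inputs: 10% baseline conversion, 5% relative lift
n_per_variant = required_sample_size(baseline_rate=0.10, relative_mde=0.05)
print(f"Users needed per variant: {n_per_variant:,}")
```

Under these assumptions the requirement is on the order of 58,000 users per variant, so it is worth checking a configured `target_sample_size` against the baseline rate before launch.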

## 3. Experiment Configuration System <a name="config"></a>

In [None]:
class ExperimentStatus(Enum):
    """Experiment lifecycle states"""
    DRAFT = "draft"
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETED = "completed"
    ARCHIVED = "archived"

@dataclass
class Variant:
    """Experiment variant configuration"""
    variant_id: str
    name: str
    traffic_allocation: float
    is_control: bool = False
    configuration: Dict = field(default_factory=dict)
    
    def __post_init__(self):
        if not 0 <= self.traffic_allocation <= 1:
            raise ValueError("Traffic allocation must be between 0 and 1")

@dataclass
class Metric:
    """Experiment metric definition"""
    name: str
    metric_type: str  # 'conversion', 'revenue', 'engagement'
    aggregation: str  # 'mean', 'sum', 'rate'
    is_primary: bool = False
    
@dataclass
class Experiment:
    """Complete experiment configuration"""
    experiment_id: str
    name: str
    hypothesis: str
    variants: List[Variant]
    primary_metric: Metric
    secondary_metrics: List[Metric] = field(default_factory=list)
    status: ExperimentStatus = ExperimentStatus.DRAFT
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None
    target_sample_size: int = 10000
    min_detectable_effect: float = 0.05
    confidence_level: float = 0.95
    power: float = 0.80
    created_by: str = "system"
    description: str = ""
    
    def __post_init__(self):
        # Validate traffic allocation sums to 1
        total_traffic = sum(v.traffic_allocation for v in self.variants)
        if not np.isclose(total_traffic, 1.0):
            raise ValueError(f"Traffic allocations must sum to 1, got {total_traffic}")
        
        # Ensure exactly one control
        controls = sum(1 for v in self.variants if v.is_control)
        if controls != 1:
            raise ValueError("Must have exactly one control variant")
    
    def to_dict(self):
        """Convert to dictionary for storage"""
        return {
            'experiment_id': self.experiment_id,
            'name': self.name,
            'description': self.description,
            'hypothesis': self.hypothesis,
            'status': self.status.value,
            'start_date': self.start_date,
            'end_date': self.end_date,
            'created_by': self.created_by,
            'primary_metric': self.primary_metric.name,
            'secondary_metrics': json.dumps([m.name for m in self.secondary_metrics]),
            'target_sample_size': self.target_sample_size,
            'min_detectable_effect': self.min_detectable_effect,
            'confidence_level': self.confidence_level,
            'power': self.power
        }

class ExperimentManager:
    """
    Manages experiment lifecycle and configuration
    """
    
    def __init__(self, db_conn):
        self.db = db_conn
    
    def create_experiment(self, experiment: Experiment) -> str:
        """
        Create new experiment in database
        """
        # Insert experiment
        exp_data = experiment.to_dict()
        
        cols = ', '.join(exp_data.keys())
        placeholders = ', '.join(['%s'] * len(exp_data))
        
        sql = f"""
        INSERT INTO experiments ({cols})
        VALUES ({placeholders})
        """
        
        self.db.execute(sql, list(exp_data.values()))
        
        # Insert variants
        for variant in experiment.variants:
            self.db.execute("""
            INSERT INTO experiment_variants 
            (variant_id, experiment_id, variant_name, traffic_allocation, 
             is_control, configuration)
            VALUES (%s, %s, %s, %s, %s, %s)
            """, (
                variant.variant_id,
                experiment.experiment_id,
                variant.name,
                variant.traffic_allocation,
                variant.is_control,
                json.dumps(variant.configuration)
            ))
        
        print(f"✓ Experiment {experiment.experiment_id} created")
        return experiment.experiment_id
    
    def get_experiment(self, experiment_id: str) -> Experiment:
        """
        Load experiment configuration
        """
        # Load experiment
        exp_df = self.db.query("""
        SELECT * FROM experiments WHERE experiment_id = %s
        """, (experiment_id,))
        
        if exp_df.empty:
            raise ValueError(f"Experiment {experiment_id} not found")
        
        # Load variants
        var_df = self.db.query("""
        SELECT * FROM experiment_variants WHERE experiment_id = %s
        """, (experiment_id,))
        
        # Reconstruct experiment object
        # (Implementation details omitted for brevity)
        
        return None  # Would return reconstructed Experiment object
    
    def start_experiment(self, experiment_id: str):
        """
        Activate experiment
        """
        self.db.execute("""
        UPDATE experiments
        SET status = %s, start_date = CURRENT_TIMESTAMP
        WHERE experiment_id = %s
        """, (ExperimentStatus.ACTIVE.value, experiment_id))
        
        print(f"✓ Experiment {experiment_id} started")
    
    def stop_experiment(self, experiment_id: str):
        """
        Complete experiment
        """
        self.db.execute("""
        UPDATE experiments
        SET status = %s, end_date = CURRENT_TIMESTAMP
        WHERE experiment_id = %s
        """, (ExperimentStatus.COMPLETED.value, experiment_id))
        
        print(f"✓ Experiment {experiment_id} stopped")
    
    def list_active_experiments(self) -> pd.DataFrame:
        """
        Get all active experiments
        """
        return self.db.query("""
        SELECT 
            experiment_id,
            name,
            status,
            start_date,
            primary_metric,
            target_sample_size
        FROM experiments
        WHERE status = 'active'
        ORDER BY start_date DESC
        """)

# Example usage
# manager = ExperimentManager(db)
# 
# # Create experiment
# experiment = Experiment(
#     experiment_id='exp_001',
#     name='Homepage CTA Test',
#     hypothesis='Changing CTA color to green will increase conversions',
#     variants=[
#         Variant('control', 'Blue CTA (Control)', 0.5, is_control=True),
#         Variant('treatment', 'Green CTA', 0.5)
#     ],
#     primary_metric=Metric('conversion_rate', 'conversion', 'rate', is_primary=True),
#     target_sample_size=10000,
#     min_detectable_effect=0.05
# )
# 
# manager.create_experiment(experiment)
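
The configuration stores a `traffic_allocation` per variant but does not show how a user is mapped to a variant. One common approach is deterministic hash-based bucketing, sketched below. This is an assumption about how assignment could work, not the platform's actual method. `assign_variant` is a hypothetical helper, and variants are passed as plain `(variant_id, traffic_allocation)` pairs to keep the example self-contained.

```python
import hashlib


def assign_variant(experiment_id, user_id, allocations):
    """
    Deterministically map a user to a variant.

    allocations: list of (variant_id, traffic_allocation) pairs summing to 1.
    """
    # Hash experiment and user together so buckets differ across experiments
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    # Convert the first 8 hex digits to a number in [0, 1)
    bucket = int(digest[:8], 16) / 16**8

    cumulative = 0.0
    for variant_id, share in allocations:
        cumulative += share
        if bucket < cumulative:
            return variant_id
    # Guard against floating-point rounding in the cumulative sum
    return allocations[-1][0]


allocations = [('control', 0.5), ('treatment', 0.5)]
counts = {'control': 0, 'treatment': 0}
for i in range(10000):
    counts[assign_variant('exp_001', f'user_{i}', allocations)] += 1
print(counts)
```

Because the result depends only on the experiment and user IDs, the same user always sees the same variant without a lookup, and the assignment can still be logged to `experiment_assignments` for analysis.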

## 4. Multi-Variant Testing at Scale <a name="multivariant"></a>

In [None]:
class MultiVariantAnalyzer:
    """
    Statistical analysis for multi-variant experiments
    Handles A/B/n testing with multiple comparisons correction
    """
    
    def __init__(self, db_conn):
        self.db = db_conn
    
    def get_variant_metrics(self, experiment_id: str, metric_name: str) -> pd.DataFrame:
        """
        Get aggregated metrics for all variants
        """
        # Events are restricted to assigned users with EXISTS rather than a JOIN,
        # so a user with several assignment rows (e.g. one per session) does not
        # duplicate event rows and distort the aggregates.
        query = """
        WITH variant_data AS (
            SELECT 
                ev.variant_id,
                v.variant_name,
                v.is_control,
                COUNT(DISTINCT ev.user_id) as users,
                COUNT(*) as events,
                AVG(ev.event_value) as mean_value,
                STDDEV(ev.event_value) as std_value,
                SUM(ev.event_value) as total_value
            FROM experiment_events ev
            JOIN experiment_variants v
                ON ev.variant_id = v.variant_id
            WHERE ev.experiment_id = %s
                AND ev.event_type = %s
                AND EXISTS (
                    SELECT 1
                    FROM experiment_assignments ea
                    WHERE ea.experiment_id = ev.experiment_id
                        AND ea.user_id = ev.user_id
                )
            GROUP BY ev.variant_id, v.variant_name, v.is_control
        )
        SELECT * FROM variant_data
        ORDER BY is_control DESC, variant_name
        """
        
        return self.db.query(query, (experiment_id, metric_name))
    
    def pairwise_comparisons(self, experiment_id: str, metric_name: str,
                            correction='bonferroni') -> List[Dict]:
        """
        Perform pairwise comparisons between all variants
        
        Args:
            correction: 'bonferroni', 'holm', or 'none'
        """
        # Get variant metrics
        df = self.get_variant_metrics(experiment_id, metric_name)
        
        # Identify control
        control = df[df['is_control'] == True].iloc[0]
        treatments = df[df['is_control'] == False]
        
        # Number of comparisons
        n_comparisons = len(treatments)
        
        results = []
        
        for idx, treatment in treatments.iterrows():
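            # Welch's t-test from summary statistics.
            # Note: mean_value and std_value are computed over event rows, while
            # the sample size used below is the distinct-user count. The two
            # agree only when each user contributes at most one event; otherwise
            # aggregate to one value per user before testing.
            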
            # Calculate t-statistic
            mean_diff = treatment['mean_value'] - control['mean_value']
            se = np.sqrt(
                control['std_value']**2 / control['users'] +
                treatment['std_value']**2 / treatment['users']
            )
            t_stat = mean_diff / se
            
            # Degrees of freedom (Welch's t-test)
            df_welch = (
                (control['std_value']**2/control['users'] + 
                 treatment['std_value']**2/treatment['users'])**2 /
                (
                    (control['std_value']**2/control['users'])**2/(control['users']-1) +
                    (treatment['std_value']**2/treatment['users'])**2/(treatment['users']-1)
                )
            )
            
            # Two-sided p-value; sf() keeps precision for small p-values,
            # unlike 1 - cdf()
            p_value = 2 * stats.t.sf(abs(t_stat), df_welch)
            
            # Effect size (Cohen's d)
            pooled_std = np.sqrt(
                (control['std_value']**2 + treatment['std_value']**2) / 2
            )
            cohens_d = mean_diff / pooled_std
            
            # Confidence interval
            ci_95 = (
                mean_diff - 1.96 * se,
                mean_diff + 1.96 * se
            )
            
            # Relative lift
            relative_lift = mean_diff / control['mean_value'] * 100
            
            results.append({
                'control': control['variant_name'],
                'treatment': treatment['variant_name'],
                'control_mean': control['mean_value'],
                'treatment_mean': treatment['mean_value'],
                'mean_diff': mean_diff,
                'relative_lift_pct': relative_lift,
                'ci_95_lower': ci_95[0],
                'ci_95_upper': ci_95[1],
                'p_value': p_value,
                'cohens_d': cohens_d,
                't_statistic': t_stat
            })
        
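        # Apply multiple testing correction
        alpha = 0.05
        if correction == 'bonferroni':
            for r in results:
                r['adjusted_p_value'] = min(r['p_value'] * n_comparisons, 1.0)
                r['significant_adjusted'] = r['adjusted_p_value'] < alpha
        
        elif correction == 'holm':
            # Holm step-down: walk p-values in ascending order; once one
            # comparison fails its threshold, all later ones fail too
            results_sorted = sorted(results, key=lambda x: x['p_value'])
            rejecting = True
            for i, r in enumerate(results_sorted, 1):
                adjusted_alpha = alpha / (n_comparisons - i + 1)
                rejecting = rejecting and r['p_value'] < adjusted_alpha
                r['adjusted_alpha'] = adjusted_alpha
                r['significant_adjusted'] = rejecting
        
        elif correction == 'none':
            for r in results:
                r['significant_adjusted'] = r['p_value'] < alpha
        
        else:
            raise ValueError(f"Unknown correction: {correction!r}")
        
        return results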
    
    def visualize_multivariant_results(self, results: List[Dict]):
        """
        Visualize multi-variant test results
        """
        df_results = pd.DataFrame(results)
        
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=(
                'Relative Lift vs Control',
                'Effect Sizes (Cohen\'s d)',
                'P-values',
                'Confidence Intervals'
            )
        )
        
        # 1. Relative lift
        fig.add_trace(
            go.Bar(
                x=df_results['treatment'],
                y=df_results['relative_lift_pct'],
                name='Lift %',
                marker_color=['green' if x > 0 else 'red' 
                             for x in df_results['relative_lift_pct']]
            ),
            row=1, col=1
        )
        
        # 2. Effect sizes
        fig.add_trace(
            go.Bar(
                x=df_results['treatment'],
                y=df_results['cohens_d'],
                name="Cohen's d",
                marker_color='blue'
            ),
            row=1, col=2
        )
        
        # 3. P-values
        fig.add_trace(
            go.Bar(
                x=df_results['treatment'],
                y=df_results['p_value'],
                name='P-value',
                marker_color=['green' if x < 0.05 else 'red' 
                             for x in df_results['p_value']]
            ),
            row=2, col=1
        )
        fig.add_hline(y=0.05, line_dash="dash", row=2, col=1,
                     annotation_text="α=0.05")
        
        # 4. Confidence intervals
        for i, row in df_results.iterrows():
            fig.add_trace(
                go.Scatter(
                    x=[row['treatment'], row['treatment']],
                    y=[row['ci_95_lower'], row['ci_95_upper']],
                    mode='lines+markers',
                    name=row['treatment'],
                    showlegend=False
                ),
                row=2, col=2
            )
        fig.add_hline(y=0, line_dash="dash", row=2, col=2)
        
        fig.update_layout(
            title='Multi-Variant Test Results',
            height=800,
            showlegend=False
        )
        
        fig.show()

# Example usage
# mva = MultiVariantAnalyzer(db)
# results = mva.pairwise_comparisons('exp_001', 'conversion', correction='bonferroni')
# mva.visualize_multivariant_results(results)
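
To make the difference between the two corrections concrete, the following standalone sketch computes adjusted p-values for both. The helper names and the three p-values are made up for illustration and do not come from a real experiment.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is scaled by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone
        running_max = max(running_max, min(p_values[idx] * (m - rank), 1.0))
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # made-up p-values for three treatments vs control
print("Bonferroni:", bonferroni_adjust(raw))
print("Holm:      ", holm_adjust(raw))
```

Holm never produces a larger adjusted p-value than Bonferroni while still controlling the family-wise error rate, which is why it is often preferred when several treatments share one control.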

## 5. Sequential Testing Implementation <a name="sequential"></a>

In [None]:
class SequentialTester:
    """
    Sequential testing framework
    Allows continuous monitoring with controlled error rates
    """
    
    def __init__(self, alpha=0.05, power=0.80, max_n=100000):
        self.alpha = alpha
        self.power = power
        self.max_n = max_n
        self.boundaries = self._calculate_boundaries()
    
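    def _calculate_boundaries(self, n_looks=20):
        """
        Calculate O'Brien-Fleming-shaped sequential testing boundaries.
        
        This is an approximation: the boundary at information fraction t is
        z_{alpha/2} / sqrt(t). An exact O'Brien-Fleming design uses a slightly
        larger constant, chosen for the number of looks, so that the overall
        type I error equals alpha.
        """
        # Information fractions
        info_fractions = np.linspace(0.05, 1.0, n_looks)
        
        # Two-sided critical value used at the final look
        z_alpha = stats.norm.ppf(1 - self.alpha/2)
        
        boundaries = []
        for t in info_fractions:
            # Boundary for this information fraction
            z_boundary = z_alpha / np.sqrt(t)
            p_boundary = 2 * stats.norm.sf(z_boundary)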
            
            boundaries.append({
                'info_fraction': t,
                'n': int(t * self.max_n),
                'z_boundary': z_boundary,
                'p_boundary': p_boundary
            })
        
        return pd.DataFrame(boundaries)
    
    def check_stopping_condition(self, n_observed, p_value, z_stat):
        """
        Check if experiment can be stopped
        
        Returns:
            dict with decision and reasoning
        """
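        # Information fraction, capped at 1.0 so that samples beyond max_n
        # use the final boundary instead of raising IndexError
        info_fraction = min(n_observed / self.max_n, 1.0)
        
        # Use the first pre-computed look at or after the observed fraction
        boundary = self.boundaries[
            self.boundaries['info_fraction'] >= info_fraction
        ].iloc[0]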
        
        # Check if we can stop
        can_stop = abs(z_stat) >= boundary['z_boundary']
        
        decision = {
            'can_stop': can_stop,
            'n_observed': n_observed,
            'info_fraction': info_fraction,
            'p_value': p_value,
            'z_stat': z_stat,
            'z_boundary': boundary['z_boundary'],
            'p_boundary': boundary['p_boundary'],
            'reason': self._get_reason(can_stop, z_stat, boundary)
        }
        
        return decision
    
    def _get_reason(self, can_stop, z_stat, boundary):
        """Generate human-readable decision reason"""
        if can_stop:
            direction = "positive" if z_stat > 0 else "negative"
            return f"Significant {direction} effect detected. "\
                   f"Z-statistic ({z_stat:.3f}) exceeds boundary ({boundary['z_boundary']:.3f})."
        else:
            return f"Continue testing. "\
                   f"Z-statistic ({z_stat:.3f}) has not crossed boundary ({boundary['z_boundary']:.3f})."
    
    def visualize_sequential_test(self, test_history: pd.DataFrame):
        """
        Visualize sequential test progress
        
        Args:
            test_history: DataFrame with columns [n, z_stat, p_value]
        """
        fig = go.Figure()
        
        # Plot boundaries
        fig.add_trace(go.Scatter(
            x=self.boundaries['n'],
            y=self.boundaries['z_boundary'],
            mode='lines',
            name='Upper Boundary',
            line=dict(color='red', dash='dash')
        ))
        
        fig.add_trace(go.Scatter(
            x=self.boundaries['n'],
            y=-self.boundaries['z_boundary'],
            mode='lines',
            name='Lower Boundary',
            line=dict(color='red', dash='dash')
        ))
        
        # Plot test statistic over time
        fig.add_trace(go.Scatter(
            x=test_history['n'],
            y=test_history['z_stat'],
            mode='lines+markers',
            name='Z-statistic',
            line=dict(color='blue', width=2)
        ))
        
        fig.update_layout(
            title='Sequential Testing Progress',
            xaxis_title='Sample Size',
            yaxis_title='Z-statistic',
            hovermode='x unified',
            height=500
        )
        
        fig.show()

# Example usage
# seq_tester = SequentialTester(alpha=0.05, power=0.80, max_n=100000)
# 
# # Check at current sample size
# decision = seq_tester.check_stopping_condition(
#     n_observed=25000,
#     p_value=0.03,
#     z_stat=2.17
# )
# 
# print(f"Can stop: {decision['can_stop']}")
# print(f"Reason: {decision['reason']}")
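
The boundary rule used by `SequentialTester` can be checked by hand. The sketch below recomputes it with the standard library only; `approx_obf_boundary` is a hypothetical helper that mirrors the `z_alpha / sqrt(t)` rule, which approximates an O'Brien-Fleming design rather than reproducing it exactly.

```python
import math
from statistics import NormalDist


def approx_obf_boundary(info_fraction, alpha=0.05):
    """Approximate O'Brien-Fleming z boundary: z_{alpha/2} / sqrt(t)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return z_alpha / math.sqrt(info_fraction)


# Boundaries tighten toward the fixed-sample critical value as data accumulates
for t in (0.25, 0.50, 1.00):
    print(f"t={t:.2f}  z_boundary={approx_obf_boundary(t):.3f}")

# The example check above: 25,000 of 100,000 users observed, z = 2.17
boundary = approx_obf_boundary(25000 / 100000)
can_stop = abs(2.17) >= boundary
print(f"boundary={boundary:.3f}, can_stop={can_stop}")
```

A z-statistic of 2.17 would pass a fixed-sample test at α = 0.05 (critical value about 1.96), but at a quarter of the planned sample the boundary is about 3.92, so the test continues.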

## 6. Automated Monitoring System <a name="monitoring"></a>

In [None]:
class ExperimentMonitor:
    """
    Automated monitoring and alerting for running experiments
    """
    
    def __init__(self, db_conn):
        self.db = db_conn
        self.alerts = []
    
    def check_all_experiments(self) -> List[Dict]:
        """
        Run health checks on all active experiments
        """
        # Get active experiments
        experiments = self.db.query("""
        SELECT experiment_id, name, start_date, target_sample_size
        FROM experiments
        WHERE status = 'active'
        """)
        
        all_checks = []
        
        for _, exp in experiments.iterrows():
            checks = self.check_experiment(exp['experiment_id'])
            checks['experiment_name'] = exp['name']
            all_checks.append(checks)
        
        return all_checks
    
    def check_experiment(self, experiment_id: str) -> Dict:
        """
        Comprehensive health check for single experiment
        """
        checks = {
            'experiment_id': experiment_id,
            'timestamp': datetime.now(),
            'issues': [],
            'warnings': [],
            'health_score': 100
        }
        
        # 1. Check sample size
        sample_check = self._check_sample_size(experiment_id)
        if sample_check['has_issue']:
            checks['issues'].append(sample_check)
            checks['health_score'] -= 20
        
        # 2. Check traffic distribution
        traffic_check = self._check_traffic_distribution(experiment_id)
        if traffic_check['has_issue']:
            checks['issues'].append(traffic_check)
            checks['health_score'] -= 15
        
        # 3. Check metric anomalies
        anomaly_check = self._check_metric_anomalies(experiment_id)
        if anomaly_check['has_issue']:
            checks['warnings'].append(anomaly_check)
            checks['health_score'] -= 10
        
        # 4. Check SRM (Sample Ratio Mismatch)
        srm_check = self._check_srm(experiment_id)
        if srm_check['has_issue']:
            checks['issues'].append(srm_check)
            checks['health_score'] -= 25
        
        # 5. Check data freshness
        freshness_check = self._check_data_freshness(experiment_id)
        if freshness_check['has_issue']:
            checks['warnings'].append(freshness_check)
            checks['health_score'] -= 5
        
        return checks
    
    def _check_sample_size(self, experiment_id: str) -> Dict:
        """Check if sample size is on track"""
        
        # Get target and actual
        result = self.db.query("""
        SELECT 
            e.target_sample_size,
            DATEDIFF(day, e.start_date, CURRENT_DATE) as days_running,
            COUNT(DISTINCT ea.user_id) as current_sample
        FROM experiments e
        LEFT JOIN experiment_assignments ea ON e.experiment_id = ea.experiment_id
        WHERE e.experiment_id = %s
        GROUP BY e.target_sample_size, e.start_date
        """, (experiment_id,))
        
        if result.empty:
            return {'has_issue': True, 'message': 'No data found'}
        
        row = result.iloc[0]
        
        # Check if on pace
        expected_rate = row['target_sample_size'] / 14  # Assume 2-week test
        actual_rate = row['current_sample'] / max(row['days_running'], 1)
        
        if actual_rate < expected_rate * 0.5:
            return {
                'has_issue': True,
                'type': 'sample_size',
                'message': f'Sample size growing too slowly. '
                          f'Current: {row["current_sample"]}, '
                          f'Target: {row["target_sample_size"]}'
            }
        
        return {'has_issue': False}
    
    def _check_traffic_distribution(self, experiment_id: str) -> Dict:
        """Check if traffic is distributed as expected"""
        
        # NULLIF avoids a division-by-zero error before any users are assigned
        result = self.db.query("""
        SELECT 
            v.variant_id,
            v.traffic_allocation as expected,
            COUNT(DISTINCT ea.user_id) * 1.0 / 
                NULLIF(SUM(COUNT(DISTINCT ea.user_id)) OVER (), 0) as actual
        FROM experiment_variants v
        LEFT JOIN experiment_assignments ea 
            ON v.variant_id = ea.variant_id
        WHERE v.experiment_id = %s
        GROUP BY v.variant_id, v.traffic_allocation
        """, (experiment_id,))
        
        # Check if any variant deviates by more than 5 percentage points.
        # Cast to float first: the driver may return Decimal or NULL values.
        max_deviation = (
            result['actual'].astype(float) - result['expected'].astype(float)
        ).abs().max()
        
        if max_deviation > 0.05:
            return {
                'has_issue': True,
                'type': 'traffic_distribution',
                'message': f'Traffic distribution off by {max_deviation*100:.1f} percentage points'
            }
        
        return {'has_issue': False}
    
    def _check_srm(self, experiment_id: str) -> Dict:
        """
        Check for Sample Ratio Mismatch
        Critical data quality issue
        """
        
        result = self.db.query("""
        SELECT 
            v.variant_id,
            v.traffic_allocation,
            COUNT(DISTINCT ea.user_id) as observed
        FROM experiment_variants v
        LEFT JOIN experiment_assignments ea 
            ON v.variant_id = ea.variant_id
        WHERE v.experiment_id = %s
        GROUP BY v.variant_id, v.traffic_allocation
        """, (experiment_id,))
        
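        # Chi-square goodness-of-fit test (assumes traffic_allocation sums to 1)
        total = result['observed'].sum()
        if total == 0:
            return {'has_issue': True, 'type': 'srm', 'message': 'No assignments found'}
        
        expected = result['traffic_allocation'] * total
        observed = result['observed']
        
        chi2_stat = ((observed - expected)**2 / expected).sum()
        dof = len(result) - 1
        # sf() is the survival function (1 - cdf); it stays accurate for very small p-values
        p_value = stats.chi2.sf(chi2_stat, dof)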
        
        if p_value < 0.001:  # Strict threshold for SRM
            return {
                'has_issue': True,
                'type': 'srm',
                'message': f'Sample Ratio Mismatch detected (p={p_value:.6f}). '
                          f'Investigate randomization!',
                'severity': 'critical'
            }
        
        return {'has_issue': False}
    
    def _check_metric_anomalies(self, experiment_id: str) -> Dict:
        """Check for unusual metric values"""
        # Implementation would check for outliers, sudden drops, etc.
        return {'has_issue': False}
    
    def _check_data_freshness(self, experiment_id: str) -> Dict:
        """Check if data is up to date"""
        
        result = self.db.query("""
        SELECT MAX(occurred_at) as latest_event
        FROM experiment_events
        WHERE experiment_id = %s
        """, (experiment_id,))
        
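        # MAX() over zero rows returns NULL, which may arrive as None or NaT
        if result.empty or pd.isna(result['latest_event'].iloc[0]):
            return {'has_issue': True, 'type': 'data_freshness', 'message': 'No recent events'}
        
        latest = pd.to_datetime(result['latest_event'].iloc[0])
        # Assumes occurred_at is a naive timestamp in the same timezone as this host
        hours_old = (datetime.now() - latest).total_seconds() / 3600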
        
        if hours_old > 6:
            return {
                'has_issue': True,
                'type': 'data_freshness',
                'message': f'No events in {hours_old:.1f} hours'
            }
        
        return {'has_issue': False}
    
    def send_alert(self, check_result: Dict):
        """
        Send alert for issues
        In production: email, Slack, PagerDuty, etc.
        """
        if check_result['health_score'] < 80:
            print(f"⚠️  ALERT: {check_result['experiment_name']}")
            print(f"   Health Score: {check_result['health_score']}/100")
            for issue in check_result['issues']:
                print(f"   - {issue['message']}")

# Example usage
# monitor = ExperimentMonitor(db)
# checks = monitor.check_all_experiments()
# for check in checks:
#     monitor.send_alert(check)
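
The SRM check in `_check_srm` can be tried out without a database. The sketch below is a minimal standalone version that uses only the standard library so it runs anywhere; `chi2_sf` and `srm_check` are hypothetical helper names, and the user counts are made up for illustration. Inside the notebook, `stats.chi2.sf` from SciPy does the same job as `chi2_sf`.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    y = x / 2.0
    # Closed-form starting points: dof=2 gives exp(-y), dof=1 gives erfc(sqrt(y))
    if dof % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
    while s < dof / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1.0)
        s += 1.0
    return q


def srm_check(observed, allocations, alpha=0.001):
    """Chi-square goodness-of-fit test of observed users against planned traffic shares."""
    total = sum(observed)
    expected = [share * total for share in allocations]
    chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2_sf(chi2_stat, dof=len(observed) - 1)
    return {'chi2': chi2_stat, 'p_value': p_value, 'has_srm': p_value < alpha}


# Illustrative counts for a planned 50/50 split
skewed = srm_check([50_000, 48_500], [0.5, 0.5])
balanced = srm_check([50_000, 50_150], [0.5, 0.5])
print(skewed['has_srm'], balanced['has_srm'])  # prints: True False
```

A 1,500-user gap on roughly 100,000 users looks small but is far outside what random assignment produces, which is why the strict 0.001 threshold still flags it.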

## 7. Automated Reporting <a name="reporting"></a>

In [None]:
class ExperimentReporter:
    """
    Generate automated experiment reports
    """
    
    def __init__(self, db_conn):
        self.db = db_conn
    
    def generate_report(self, experiment_id: str) -> Dict:
        """
        Generate comprehensive experiment report
        """
        report = {
            'experiment_id': experiment_id,
            'generated_at': datetime.now(),
            'metadata': self._get_metadata(experiment_id),
            'summary': self._get_summary(experiment_id),
            'variant_performance': self._get_variant_performance(experiment_id),
            'statistical_tests': self._run_statistical_tests(experiment_id),
            'visualizations': self._create_visualizations(experiment_id),
            'recommendation': self._generate_recommendation(experiment_id)
        }
        
        return report
    
    def _get_metadata(self, experiment_id: str) -> Dict:
        """Get experiment configuration"""
        result = self.db.query("""
        SELECT * FROM experiments WHERE experiment_id = %s
        """, (experiment_id,))
        
        return result.to_dict('records')[0] if not result.empty else {}
    
    def _get_summary(self, experiment_id: str) -> Dict:
        """
        Get high-level summary statistics.
        Note: total_events counts rows in experiment_assignments,
        not rows in experiment_events.
        """
        result = self.db.query("""
        SELECT 
            COUNT(DISTINCT user_id) as total_users,
            COUNT(DISTINCT session_id) as total_sessions,
            COUNT(*) as total_events,
            MIN(assigned_at) as first_assignment,
            MAX(assigned_at) as last_assignment
        FROM experiment_assignments
        WHERE experiment_id = %s
        """, (experiment_id,))
        
        return result.to_dict('records')[0] if not result.empty else {}
    
    def _get_variant_performance(self, experiment_id: str) -> pd.DataFrame:
        """Get detailed variant metrics"""
        return self.db.query("""
        SELECT 
            v.variant_name,
            v.is_control,
            COUNT(DISTINCT ea.user_id) as users,
            COUNT(DISTINCT CASE WHEN ee.event_type = 'conversion' 
                  THEN ea.user_id END) as conversions,
            AVG(CASE WHEN ee.event_type = 'revenue' 
                THEN ee.event_value END) as avg_revenue,
            SUM(CASE WHEN ee.event_type = 'revenue' 
                THEN ee.event_value ELSE 0 END) as total_revenue
        FROM experiment_variants v
        JOIN experiment_assignments ea ON v.variant_id = ea.variant_id
        LEFT JOIN experiment_events ee 
            ON ea.user_id = ee.user_id 
            AND ea.experiment_id = ee.experiment_id
        WHERE v.experiment_id = %s
        GROUP BY v.variant_name, v.is_control
        """, (experiment_id,))
    
    def _run_statistical_tests(self, experiment_id: str) -> Dict:
        """Run statistical analysis"""
        # Use MultiVariantAnalyzer
        mva = MultiVariantAnalyzer(self.db)
        results = mva.pairwise_comparisons(experiment_id, 'conversion')
        
        return {
            'pairwise_comparisons': results
        }
    
    def _create_visualizations(self, experiment_id: str) -> Dict:
        """Create report visualizations"""
        # Implementation would create and save plots
        return {}
    
    def _generate_recommendation(self, experiment_id: str) -> Dict:
        """
        Generate data-driven recommendation
        """
        # Get test results
        tests = self._run_statistical_tests(experiment_id)
        
        # Simple decision logic
        best_variant = None
        max_lift = 0
        
        for result in tests['pairwise_comparisons']:
            if result['p_value'] < 0.05 and result['relative_lift_pct'] > max_lift:
                best_variant = result['treatment']
                max_lift = result['relative_lift_pct']
        
        if best_variant:
            recommendation = {
                'decision': 'ship_treatment',
                'variant': best_variant,
                'confidence': 'high',
                'expected_lift': max_lift,
                'reasoning': f'{best_variant} shows {max_lift:.1f}% lift with statistical significance'
            }
        else:
            recommendation = {
                'decision': 'keep_control',
                'confidence': 'medium',
                'reasoning': 'No treatment variant shows significant improvement'
            }
        
        return recommendation
    
    def export_to_html(self, report: Dict, filepath: str):
        """
        Export report to HTML
        """
        html_template = """
        <!DOCTYPE html>
        <html>
        <head>
            <title>Experiment Report: {experiment_id}</title>
            <style>
                body {{ font-family: Arial, sans-serif; margin: 40px; }}
                h1 {{ color: #2c3e50; }}
                .metric {{ padding: 20px; background: #ecf0f1; margin: 10px 0; }}
                .recommendation {{ padding: 20px; background: #2ecc71; color: white; }}
                table {{ border-collapse: collapse; width: 100%; }}
                th, td {{ border: 1px solid #ddd; padding: 8px; text-align: left; }}
                th {{ background-color: #3498db; color: white; }}
            </style>
        </head>
        <body>
            <h1>Experiment Report</h1>
            <h2>{name}</h2>
            <p><strong>Generated:</strong> {generated_at}</p>
            
            <h3>Summary</h3>
            <div class="metric">
                <p>Total Users: {total_users:,}</p>
                <p>Total Events: {total_events:,}</p>
            </div>
            
            <h3>Recommendation</h3>
            <div class="recommendation">
                <h4>{decision}</h4>
                <p>{reasoning}</p>
            </div>
        </body>
        </html>
        """
        
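        # Local imports keep this method self-contained
        from html import escape
        from pathlib import Path
        
        # Format template, escaping free-text fields so they cannot break the markup
        rendered = html_template.format(
            experiment_id=escape(str(report['experiment_id'])),
            name=escape(str(report['metadata'].get('name', 'Unknown'))),
            generated_at=report['generated_at'].strftime('%Y-%m-%d %H:%M:%S'),
            total_users=report['summary'].get('total_users', 0),
            total_events=report['summary'].get('total_events', 0),
            decision=escape(report['recommendation']['decision'].upper()),
            reasoning=escape(report['recommendation']['reasoning'])
        )
        
        # Create the output directory if needed, then write the file
        Path(filepath).parent.mkdir(parents=True, exist_ok=True)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(rendered)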
        
        print(f"✓ Report exported to {filepath}")

# Example usage
# reporter = ExperimentReporter(db)
# report = reporter.generate_report('exp_001')
# reporter.export_to_html(report, 'experiment_report.html')
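
`_generate_recommendation` compares each p-value against 0.05 directly. With several treatments that raises the chance of shipping a false positive, unless `pairwise_comparisons` already returns corrected p-values (not shown in this section). The sketch below applies a Holm step-down correction before picking a winner. `holm_adjust` and `pick_winner` are hypothetical helpers, the result keys mirror the ones used above, and the numbers are illustrative only.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def pick_winner(comparisons, alpha=0.05):
    """Pick the significant treatment with the largest positive lift, or None."""
    adjusted = holm_adjust([c['p_value'] for c in comparisons])
    best = None
    for comp, p_adj in zip(comparisons, adjusted):
        if p_adj < alpha and comp['relative_lift_pct'] > 0:
            if best is None or comp['relative_lift_pct'] > best['relative_lift_pct']:
                best = {**comp, 'adjusted_p_value': p_adj}
    return best


# Illustrative results for three treatments compared against one control
comparisons = [
    {'treatment': 'variant_a', 'p_value': 0.012, 'relative_lift_pct': 4.1},
    {'treatment': 'variant_b', 'p_value': 0.030, 'relative_lift_pct': 6.3},
    {'treatment': 'variant_c', 'p_value': 0.400, 'relative_lift_pct': 0.5},
]
winner = pick_winner(comparisons)
print(winner['treatment'])  # prints: variant_a
```

In this example the uncorrected rule would pick `variant_b` (raw p = 0.030, larger lift), but its Holm-adjusted p-value is 0.060, so only `variant_a` remains significant.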

## 8. Real-World Project: Production A/B Testing Pipeline <a name="project"></a>

In [None]:
"""
PROJECT: Build Production A/B Testing Platform

Requirements:
1. Complete experiment lifecycle management
2. Automated daily analysis and reporting
3. Real-time monitoring with alerts
4. Sequential testing support
5. Multi-variant testing
6. Data quality checks (SRM, etc.)
7. Automated decision recommendations
8. Integration with Redshift

Deliverables:
- Working platform code
- Database schema
- Monitoring dashboard
- Automated reports
- Documentation
- Deployment guide
"""

class ABTestingPlatform:
    """
    Complete A/B testing platform
    Integrates all components
    """
    
    def __init__(self, db_config: Dict):
        self.db = DatabaseConnection(db_config)
        self.manager = ExperimentManager(self.db)
        self.analyzer = MultiVariantAnalyzer(self.db)
        self.monitor = ExperimentMonitor(self.db)
        self.reporter = ExperimentReporter(self.db)
        self.seq_tester = SequentialTester()
    
    def initialize_platform(self):
        """Set up platform infrastructure"""
        print("Initializing A/B testing platform...")
        create_ab_testing_schema(self.db)
        print("✓ Platform initialized")
    
    def daily_analysis_job(self):
        """
        Daily automated analysis job
        Run via cron/airflow
        """
        print(f"Starting daily analysis: {datetime.now()}")
        
        # Get active experiments
        active_experiments = self.manager.list_active_experiments()
        
        for _, exp in active_experiments.iterrows():
            exp_id = exp['experiment_id']
            
            print(f"\nAnalyzing: {exp['name']}")
            
            # Run health checks
            health = self.monitor.check_experiment(exp_id)
            
            if health['health_score'] < 80:
                self.monitor.send_alert(health)
            
            # Run statistical analysis
            results = self.analyzer.pairwise_comparisons(exp_id, 'conversion')
            
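            # Check if can stop early. Only the first comparison is checked, and its
            # t-statistic is passed as an approximate z-statistic, which is a
            # reasonable substitute at large sample sizes.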
            if results:
                decision = self.seq_tester.check_stopping_condition(
                    n_observed=health.get('current_sample', 0),
                    p_value=results[0]['p_value'],
                    z_stat=results[0]['t_statistic']
                )
                
                if decision['can_stop']:
                    print(f"  → Can stop early: {decision['reason']}")
            
            # Generate report
            report = self.reporter.generate_report(exp_id)
            self.reporter.export_to_html(
                report, 
                f"reports/{exp_id}_{datetime.now().strftime('%Y%m%d')}.html"
            )
        
        print("\n✓ Daily analysis complete")
    
    def create_and_launch_experiment(
        self, 
        name: str,
        hypothesis: str,
        variants: List[Variant],
        primary_metric: Metric,
        **kwargs
    ) -> str:
        """
        End-to-end experiment creation and launch
        """
        # Create experiment object
        experiment = Experiment(
            experiment_id=f"exp_{datetime.now().strftime('%Y%m%d%H%M%S')}",
            name=name,
            hypothesis=hypothesis,
            variants=variants,
            primary_metric=primary_metric,
            **kwargs
        )
        
        # Create in database
        exp_id = self.manager.create_experiment(experiment)
        
        # Start experiment
        self.manager.start_experiment(exp_id)
        
        print(f"✓ Experiment {exp_id} launched")
        return exp_id

# Example usage
# platform = ABTestingPlatform(REDSHIFT_CONFIG)
# platform.initialize_platform()
# 
# # Launch experiment
# exp_id = platform.create_and_launch_experiment(
#     name='CTA Color Test',
#     hypothesis='Green CTA increases conversions',
#     variants=[
#         Variant('control', 'Blue', 0.5, is_control=True),
#         Variant('treatment', 'Green', 0.5)
#     ],
#     primary_metric=Metric('conversion', 'conversion', 'rate', is_primary=True)
# )
# 
# # Run daily analysis
# platform.daily_analysis_job()
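
As written, `daily_analysis_job` stops at the first experiment that raises an exception, so one bad experiment blocks reports for all the others. One possible way to isolate failures is sketched below. `run_isolated` and `fake_analyze` are hypothetical names, and the logger name is an assumption; in the platform, the `analyze` callable would wrap the body of the per-experiment loop.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("ab_platform.daily_job")  # hypothetical logger name


def run_isolated(
    experiment_ids: Iterable[str],
    analyze: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Run analyze() for each experiment; record failures instead of aborting the batch."""
    outcome: Dict[str, List[str]] = {'succeeded': [], 'failed': []}
    for exp_id in experiment_ids:
        try:
            analyze(exp_id)
            outcome['succeeded'].append(exp_id)
        except Exception:
            # logger.exception records the traceback so the failure can be debugged later
            logger.exception("Analysis failed for %s", exp_id)
            outcome['failed'].append(exp_id)
    return outcome


def fake_analyze(exp_id: str) -> None:
    """Stand-in for the real per-experiment analysis; fails for one experiment."""
    if exp_id == 'exp_002':
        raise ValueError('no assignments yet')


summary = run_isolated(['exp_001', 'exp_002', 'exp_003'], fake_analyze)
print(summary)  # prints: {'succeeded': ['exp_001', 'exp_003'], 'failed': ['exp_002']}
```

The returned summary can feed the same alerting path as the health checks, so a failed analysis is reported rather than silently skipped.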

## 9. Exercises <a name="exercises"></a>

### Exercise 1: Implement SRM Detection

**Task:** Build robust SRM (Sample Ratio Mismatch) detection:
1. Implement chi-square test for SRM
2. Add daily SRM monitoring
3. Create alerts for SRM issues
4. Build debugging tools to investigate SRM causes

In [None]:
# Your solution here


### Exercise 2: Multi-Metric Decision Framework

**Task:** Build a system that makes decisions using multiple metrics:
1. Define primary and guardrail metrics
2. Implement decision logic considering all metrics
3. Handle trade-offs (e.g., higher conversions but lower revenue)
4. Generate nuanced recommendations

In [None]:
# Your solution here


### Exercise 3: Experiment Scheduling System

**Task:** Build experiment scheduling and conflict detection:
1. Detect overlapping experiments
2. Identify metric pollution risks
3. Recommend optimal scheduling
4. Implement traffic allocation across multiple experiments

In [None]:
# Your solution here


### Exercise 4: Production Deployment

**Task:** Deploy the platform to production:
1. Set up automated daily jobs
2. Configure monitoring and alerting
3. Create user documentation
4. Build admin dashboard
5. Implement access controls

In [None]:
# Your solution here


## Summary

In this notebook, you learned:

1. **Platform Architecture**
   - Complete system design
   - Database schema
   - Component integration

2. **Experiment Management**
   - Configuration system
   - Lifecycle management
   - Variant allocation

3. **Advanced Testing**
   - Multi-variant analysis
   - Sequential testing
   - Multiple comparison correction

4. **Automation**
   - Automated monitoring
   - Health checks
   - Alerting systems
   - Report generation

5. **Production Readiness**
   - Error handling
   - Data quality checks
   - Scalability
   - Documentation

## Next Steps

- Deploy platform to production
- Build custom integrations
- Add advanced features
- Train team on platform usage

## Additional Resources

- [Trustworthy Online Controlled Experiments](https://experimentguide.com/)
- [GrowthBook - Open Source Feature Flags](https://www.growthbook.io/)
- [Statsig Documentation](https://docs.statsig.com/)
- [Sample Ratio Mismatch](https://dl.acm.org/doi/10.1145/3292500.3330722)
- [Sequential Testing Methods](https://www.optimizely.com/optimization-glossary/sequential-testing/)