# Discrete Event Simulation with spark-bestfit (RayBackend)

This notebook demonstrates how to use **distribution fitting** to power realistic
**discrete event simulations** for operational decision-making.

## What You'll Learn

1. **Fit distributions** to real operational data (arrivals, service times)
2. **Validate assumptions** - is it really Poisson? Test with KS!
3. **Build queue simulations** using fitted distributions
4. **Run what-if scenarios** to guide staffing decisions
5. **Quantify uncertainty** with confidence intervals

## Business Context

Operations teams often need to answer questions like:
- "If we add 2 more agents, how much will wait times decrease?"
- "What happens to service levels if call volume increases 20%?"
- "Is investing in training (reducing handle time) worth it?"

**The Problem**: These decisions are expensive and hard to reverse. You can't
easily experiment with real operations.

**The Solution**: Fit distributions to historical data, then simulate scenarios
to predict outcomes before committing resources.

## Prerequisites

```bash
pip install "spark-bestfit[ray]" pandas numpy matplotlib scipy
```

## Setup

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

import ray
from spark_bestfit import DistributionFitter, DiscreteDistributionFitter
from spark_bestfit.backends.ray import RayBackend

# Initialize Ray (skip if already initialized)
if not ray.is_initialized():
    ray.init(ignore_reinit_error=True)

# Create RayBackend
backend = RayBackend()
print(f"RayBackend initialized with {backend.get_parallelism()} CPUs")

## Part 1: Generate Realistic Call Center Data

In production, this would be historical data from your systems. We'll simulate
realistic call center operations with:

- **Arrivals**: Time-varying rate (mid-morning and mid-afternoon peaks, quieter weekends) with Poisson hourly counts
- **Service times**: Lognormal (right-skewed, some long calls)
- **Abandonment**: Customers who hang up after waiting too long
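
As a quick sanity check on the service-time parameters used by the generator in this notebook (`s=0.6`, `scale=300`): a lognormal's median equals `scale`, while its mean is `scale * exp(s**2 / 2)`, so the right skew pulls the mean above the median. A minimal sketch with plain SciPy:

```python
from scipy import stats

# Same parameters as the service-time generator in this notebook
service = stats.lognorm(s=0.6, scale=300)

median_s = service.median()   # equals scale for a lognormal with loc=0
mean_s = service.mean()       # scale * exp(s**2 / 2)
p95_s = service.ppf(0.95)     # where the long-call tail starts

print(f"median={median_s:.0f}s  mean={mean_s:.0f}s  p95={p95_s:.0f}s")
```

A mean noticeably above the median in your own data is an early hint that an exponential or normal service-time model may be a poor fit.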

In [None]:
np.random.seed(42)

def generate_call_center_data(n_days=30, base_rate=50):
    """
    Generate realistic call center data.
    
    Returns DataFrame with: day, hour, day_of_week, inter_arrival_seconds,
    service_time_seconds, abandoned
    """
    records = []
    
    for day in range(n_days):
        day_of_week = day % 7  # 0=Monday, 6=Sunday
        
        # Simulate each hour of operation (8 AM to 8 PM)
        for hour in range(8, 20):
            # Arrival rate varies by hour (peak at 10-11 AM and 2-3 PM)
            hour_factor = 1.0
            if hour in [10, 11]:
                hour_factor = 1.8  # Morning peak
            elif hour in [14, 15]:
                hour_factor = 1.5  # Afternoon peak
            elif hour in [8, 19]:
                hour_factor = 0.6  # Start/end of day slower
            
            # Weekend factor
            if day_of_week >= 5:
                hour_factor *= 0.4  # Much lower on weekends
            
            # Number of calls this hour (Poisson)
            hourly_rate = base_rate * hour_factor
            n_calls = np.random.poisson(hourly_rate)
            
            # Generate inter-arrival times in seconds (near-exponential, slightly bursty)
            # Real data often shows slight overdispersion
            if n_calls > 0:
                # Weibull with shape c < 1 is more variable than a pure exponential;
                # scale is the base gap in seconds (3600 s per hour / calls per hour)
                inter_arrivals = stats.weibull_min.rvs(c=0.9, scale=3600 / hourly_rate, size=n_calls)
                
                # Service times: lognormal (right-skewed)
                # Average ~5 minutes, but some calls much longer
                service_times = stats.lognorm.rvs(s=0.6, scale=300, size=n_calls)  # seconds
                
                # Abandonment probability rises with hourly load, used here as a proxy
                # for expected wait (simplified - in reality depends on queue state)
                abandon_prob = 0.05 + 0.02 * (hourly_rate / base_rate - 1)
                abandoned = np.random.binomial(1, abandon_prob, n_calls)
                
                for i in range(n_calls):
                    records.append({
                        'day': day,
                        'hour': hour,
                        'day_of_week': day_of_week,
                        'inter_arrival_seconds': inter_arrivals[i],
                        'service_time_seconds': service_times[i],
                        'abandoned': abandoned[i]
                    })
    
    return pd.DataFrame(records)

# Generate 30 days of call center data
call_data = generate_call_center_data(n_days=30, base_rate=50)

print(f"Generated {len(call_data):,} call records over 30 days")
print(f"\nSample data:")
print(call_data.head(10))

In [None]:
# Summary statistics
print("Call Center Data Summary:")
print(f"\nTotal calls: {len(call_data):,}")
print(f"Abandoned calls: {call_data['abandoned'].sum():,} ({call_data['abandoned'].mean():.1%})")
print(f"\nInter-arrival times (seconds):")
print(f"  Mean: {call_data['inter_arrival_seconds'].mean():.1f}")
print(f"  Median: {call_data['inter_arrival_seconds'].median():.1f}")
print(f"  Std: {call_data['inter_arrival_seconds'].std():.1f}")
print(f"\nService times (seconds):")
print(f"  Mean: {call_data['service_time_seconds'].mean():.1f} ({call_data['service_time_seconds'].mean()/60:.1f} min)")
print(f"  Median: {call_data['service_time_seconds'].median():.1f} ({call_data['service_time_seconds'].median()/60:.1f} min)")
print(f"  Std: {call_data['service_time_seconds'].std():.1f}")

## Part 2: Exploratory Data Analysis

Before fitting distributions, let's understand our data patterns.

In [None]:
# Visualize patterns
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Calls by hour
hourly_calls = call_data.groupby('hour').size()
axes[0, 0].bar(hourly_calls.index, hourly_calls.values, color='steelblue', edgecolor='black')
axes[0, 0].set_xlabel('Hour of Day')
axes[0, 0].set_ylabel('Total Calls')
axes[0, 0].set_title('Call Volume by Hour')
axes[0, 0].set_xticks(range(8, 20))

# Calls by day of week
daily_calls = call_data.groupby('day_of_week').size()
day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
axes[0, 1].bar(range(7), daily_calls.values, color='coral', edgecolor='black')
axes[0, 1].set_xlabel('Day of Week')
axes[0, 1].set_ylabel('Total Calls')
axes[0, 1].set_title('Call Volume by Day of Week')
axes[0, 1].set_xticks(range(7))
axes[0, 1].set_xticklabels(day_names)

# Inter-arrival time distribution
axes[1, 0].hist(call_data['inter_arrival_seconds'], bins=50, density=True, 
                alpha=0.7, color='green', edgecolor='black')
axes[1, 0].set_xlabel('Inter-Arrival Time (seconds)')
axes[1, 0].set_ylabel('Density')
axes[1, 0].set_title('Inter-Arrival Time Distribution')
axes[1, 0].set_xlim(0, 200)

# Service time distribution
axes[1, 1].hist(call_data['service_time_seconds'] / 60, bins=50, density=True,
                alpha=0.7, color='purple', edgecolor='black')
axes[1, 1].set_xlabel('Service Time (minutes)')
axes[1, 1].set_ylabel('Density')
axes[1, 1].set_title('Service Time Distribution')
axes[1, 1].axvline(call_data['service_time_seconds'].mean()/60, color='red', 
                   linestyle='--', label=f"Mean: {call_data['service_time_seconds'].mean()/60:.1f} min")
axes[1, 1].legend()

plt.suptitle('Call Center Data Exploration', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## Part 3: Fit Distributions to Operational Processes

Now we'll use spark-bestfit to identify the best distributions for:

1. **Inter-arrival times** (continuous) - How long between calls?
2. **Service times** (continuous) - How long does each call take?
3. **Hourly call volume** (discrete) - How many calls per hour?
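
To make the ranking metrics concrete, here is a minimal SciPy-only sketch of what fitting and scoring a few candidates involves: maximum-likelihood fit, AIC = 2k - 2 ln L, and the KS statistic. This is an illustration under stated assumptions (synthetic data, `loc` fixed at 0, a hypothetical helper named `fit_and_score`); it is not a description of spark-bestfit's internals, which may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for a service-time column (illustrative, not the notebook's data)
sample = stats.lognorm.rvs(s=0.6, scale=300, size=2000, random_state=rng)

def fit_and_score(data, dist_name):
    """MLE fit with loc fixed at 0, then AIC and KS for one candidate."""
    dist = getattr(stats, dist_name)
    params = dist.fit(data, floc=0)
    loglik = np.sum(dist.logpdf(data, *params))
    k = len(params) - 1                     # loc was fixed, not estimated
    aic = 2 * k - 2 * loglik
    ks_stat, ks_p = stats.kstest(data, dist_name, args=params)
    return {"aic": aic, "ks": ks_stat, "p": ks_p}

candidates = ["expon", "gamma", "lognorm", "weibull_min"]
scores = {name: fit_and_score(sample, name) for name in candidates}

# Lower AIC and lower KS statistic both indicate a better fit
for name, s in sorted(scores.items(), key=lambda kv: kv[1]["aic"]):
    print(f"{name:<12} AIC={s['aic']:.1f}  KS={s['ks']:.4f}  p={s['p']:.4f}")
```

Note that KS p-values computed against parameters estimated from the same sample tend to be optimistic, so they are best read as a relative ranking rather than a strict hypothesis test.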

In [None]:
# Create hourly aggregates for discrete fitting
hourly_volumes = call_data.groupby(['day', 'hour']).size().reset_index(name='call_count')

print(f"Call-level data: {len(call_data):,} records")
print(f"Hourly aggregates: {len(hourly_volumes):,} records")

In [None]:
# Fit continuous distributions to inter-arrival and service times
cont_fitter = DistributionFitter(backend=backend)

# Inter-arrival times
interarrival_results = cont_fitter.fit(
    call_data,
    column='inter_arrival_seconds',
    max_distributions=20,
    lazy_metrics=False  # We want KS for validation
)

# Service times
service_results = cont_fitter.fit(
    call_data,
    column='service_time_seconds',
    max_distributions=20,
    lazy_metrics=False
)

print(f"Fitted {interarrival_results.count()} distributions to inter-arrival times")
print(f"Fitted {service_results.count()} distributions to service times")

In [None]:
# Best fits for inter-arrival times
print("INTER-ARRIVAL TIMES - Best Distributions:")
print("\nBy AIC (prediction):")
for i, fit in enumerate(interarrival_results.best(n=5, metric='aic'), 1):
    print(f"  {i}. {fit.distribution}: AIC={fit.aic:.1f}")

print("\nBy KS (goodness-of-fit):")
for i, fit in enumerate(interarrival_results.best(n=5, metric='ks_statistic'), 1):
    pval = fit.pvalue if fit.pvalue else 0
    print(f"  {i}. {fit.distribution}: KS={fit.ks_statistic:.4f}, p={pval:.4f}")

In [None]:
# Best fits for service times
print("SERVICE TIMES - Best Distributions:")
print("\nBy AIC (prediction):")
for i, fit in enumerate(service_results.best(n=5, metric='aic'), 1):
    print(f"  {i}. {fit.distribution}: AIC={fit.aic:.1f}")

print("\nBy KS (goodness-of-fit):")
for i, fit in enumerate(service_results.best(n=5, metric='ks_statistic'), 1):
    pval = fit.pvalue if fit.pvalue else 0
    print(f"  {i}. {fit.distribution}: KS={fit.ks_statistic:.4f}, p={pval:.4f}")

In [None]:
# Fit discrete distributions to hourly call volumes
disc_fitter = DiscreteDistributionFitter(backend=backend)

volume_results = disc_fitter.fit(
    hourly_volumes,
    column='call_count'
)

print("HOURLY CALL VOLUME - Best Distributions:")
print("\nBy AIC:")
for i, fit in enumerate(volume_results.best(n=5, metric='aic'), 1):
    print(f"  {i}. {fit.distribution}: AIC={fit.aic:.1f}")
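
A quick check of the Poisson assumption for counts is the index of dispersion (variance divided by mean), which equals 1 for a Poisson distribution. Pooling hours with different rates pushes it well above 1, which is one reason a single Poisson may fit pooled hourly volumes poorly. The sketch below uses illustrative rates and a hypothetical helper `dispersion_index`; it does not use the fitted results.

```python
import numpy as np

rng = np.random.default_rng(1)

def dispersion_index(counts):
    """Sample variance divided by sample mean (1.0 for Poisson counts)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# Constant rate: counts are Poisson, so the index should be near 1
flat = rng.poisson(50, size=5000)

# Rate varies by hour (illustrative factors): pooled counts are overdispersed
rates = 50 * np.array([0.6, 1.0, 1.8, 1.5])
mixed = rng.poisson(np.repeat(rates, 1250))

print(f"constant rate: {dispersion_index(flat):.2f}")
print(f"mixed rates:   {dispersion_index(mixed):.2f}")
```

If the pooled index is far above 1, fitting per-hour volumes separately, or using an overdispersed family, is usually worth considering.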

## Part 4: Validate Common Assumptions

Operations research often assumes:
- Arrivals are **Poisson** -> inter-arrivals are **Exponential**
- Service times are **Exponential** (for M/M/c queues)

Let's test these assumptions with our data!
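
A fast diagnostic before any formal test: an exponential distribution has a coefficient of variation (std / mean) of exactly 1. The parameters below mirror the synthetic generator in this notebook (a 72-second scale corresponds to the base rate of 50 calls/hour), so they are known here; with real data you would compute the CV from the sample instead.

```python
from scipy import stats

def cv(dist):
    """Coefficient of variation of a frozen scipy distribution."""
    return dist.std() / dist.mean()

candidates = {
    "exponential": stats.expon(scale=72),
    "weibull c=0.9 (inter-arrivals)": stats.weibull_min(c=0.9, scale=72),
    "lognormal s=0.6 (service)": stats.lognorm(s=0.6, scale=300),
}

for name, dist in candidates.items():
    print(f"{name:<32} CV = {cv(dist):.2f}")
```

A CV above 1 suggests burstier-than-exponential behavior, and a CV below 1 suggests more regular behavior; either way, an M/M/c formula would misstate waiting times.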

In [None]:
# Find exponential fit for inter-arrivals
exp_interarrival = None
for fit in interarrival_results.best(n=50, metric='aic'):
    if fit.distribution == 'expon':
        exp_interarrival = fit
        break

# Find exponential fit for service times
exp_service = None
for fit in service_results.best(n=50, metric='aic'):
    if fit.distribution == 'expon':
        exp_service = fit
        break

# Get best overall fits
best_interarrival = interarrival_results.best(n=1, metric='aic')[0]
best_service = service_results.best(n=1, metric='aic')[0]

print("ASSUMPTION TESTING")
print("="*60)

print("\n1. Inter-Arrival Times - Is Exponential appropriate?")
if exp_interarrival:
    print(f"   Exponential: AIC={exp_interarrival.aic:.1f}, KS p-value={exp_interarrival.pvalue:.4f}")
    print(f"   Best fit ({best_interarrival.distribution}): AIC={best_interarrival.aic:.1f}")
    aic_diff = exp_interarrival.aic - best_interarrival.aic
    if aic_diff < 10:
        print(f"   VERDICT: Exponential is ACCEPTABLE (AIC diff={aic_diff:.1f})")
    else:
        print(f"   VERDICT: Exponential is POOR (AIC diff={aic_diff:.1f})")
        print(f"   RECOMMENDATION: Use {best_interarrival.distribution} instead")

print("\n2. Service Times - Is Exponential appropriate?")
if exp_service:
    print(f"   Exponential: AIC={exp_service.aic:.1f}, KS p-value={exp_service.pvalue:.4f}")
    print(f"   Best fit ({best_service.distribution}): AIC={best_service.aic:.1f}")
    aic_diff = exp_service.aic - best_service.aic
    if aic_diff < 10:
        print(f"   VERDICT: Exponential is ACCEPTABLE (AIC diff={aic_diff:.1f})")
    else:
        print(f"   VERDICT: Exponential is POOR (AIC diff={aic_diff:.1f})")
        print(f"   RECOMMENDATION: Use {best_service.distribution} instead")

In [None]:
# Visualize assumption testing
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Inter-arrival comparison
x_ia = np.linspace(0, 150, 200)
ia_data = call_data['inter_arrival_seconds']
ia_data_plot = ia_data[ia_data < 150]

axes[0].hist(ia_data_plot, bins=40, density=True, alpha=0.6, 
             color='steelblue', edgecolor='black', label='Observed')

if exp_interarrival:
    exp_dist = exp_interarrival.get_scipy_dist()
    axes[0].plot(x_ia, exp_dist.pdf(x_ia), 'r--', lw=2, label=f'Exponential (assumed)')

best_ia_dist = best_interarrival.get_scipy_dist()
axes[0].plot(x_ia, best_ia_dist.pdf(x_ia), 'g-', lw=2, label=f'{best_interarrival.distribution} (best fit)')

axes[0].set_xlabel('Inter-Arrival Time (seconds)')
axes[0].set_ylabel('Density')
axes[0].set_title('Inter-Arrival Times: Assumption vs Reality')
axes[0].legend()

# Service time comparison
x_st = np.linspace(0, 1000, 200)
st_data = call_data['service_time_seconds']
st_data_plot = st_data[st_data < 1000]

axes[1].hist(st_data_plot, bins=40, density=True, alpha=0.6,
             color='coral', edgecolor='black', label='Observed')

if exp_service:
    exp_svc_dist = exp_service.get_scipy_dist()
    axes[1].plot(x_st, exp_svc_dist.pdf(x_st), 'r--', lw=2, label=f'Exponential (assumed)')

best_st_dist = best_service.get_scipy_dist()
axes[1].plot(x_st, best_st_dist.pdf(x_st), 'g-', lw=2, label=f'{best_service.distribution} (best fit)')

axes[1].set_xlabel('Service Time (seconds)')
axes[1].set_ylabel('Density')
axes[1].set_title('Service Times: Assumption vs Reality')
axes[1].legend()

plt.suptitle('Testing Common Queueing Theory Assumptions', fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

## Part 5: Build Queue Simulation

Now we'll use our fitted distributions to simulate a call center queue.
This is a simple G/G/c queue (general arrivals, general service, c servers) with
first-come, first-served dispatch and no abandonment.
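
Before simulating, it helps to confirm the queue is stable. For a G/G/c queue the utilization is ρ = E[S] / (c · E[A]), where E[S] is the mean service time and E[A] the mean inter-arrival time; waits grow without bound when ρ ≥ 1. The distributions below are hypothetical stand-ins rather than the fitted results, and `utilization` is a helper introduced only for this sketch.

```python
from scipy import stats

def utilization(arrival_dist, service_dist, n_agents):
    """Offered load per agent: mean service time / (agents * mean inter-arrival time)."""
    return service_dist.mean() / (n_agents * arrival_dist.mean())

# Hypothetical stand-ins: one call every 80 s on average, ~6 min mean handle time
arrival = stats.expon(scale=80)
service = stats.lognorm(s=0.6, scale=300)

for c in range(4, 8):
    rho = utilization(arrival, service, c)
    status = "stable" if rho < 1 else "UNSTABLE"
    print(f"{c} agents: utilization = {rho:.2f} ({status})")
```

The same function accepts any frozen SciPy distribution with a finite mean, so it could be applied to the fitted arrival and service distributions before running the simulation.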

In [None]:
def simulate_queue(n_agents, n_calls, arrival_dist, service_dist, seed=None):
    """
    Simulate a G/G/c queue using fitted distributions.
    
    Args:
        n_agents: Number of agents (servers)
        n_calls: Number of calls to simulate
        arrival_dist: Frozen scipy distribution for inter-arrival times
        service_dist: Frozen scipy distribution for service times
        seed: Random seed
    
    Returns:
        dict with simulation results
    """
    if seed is not None:
        np.random.seed(seed)
    
    # Generate arrivals and service times from fitted distributions
    inter_arrivals = arrival_dist.rvs(size=n_calls)
    service_times = service_dist.rvs(size=n_calls)
    
    # Ensure positive values
    inter_arrivals = np.maximum(inter_arrivals, 0.1)
    service_times = np.maximum(service_times, 1.0)
    
    # Calculate arrival times
    arrival_times = np.cumsum(inter_arrivals)
    
    # Track when each agent becomes free
    agent_free_at = np.zeros(n_agents)
    
    wait_times = []
    
    for i in range(n_calls):
        arrival = arrival_times[i]
        service = service_times[i]
        
        # Find the agent who becomes free first
        next_free_agent = np.argmin(agent_free_at)
        free_time = agent_free_at[next_free_agent]
        
        # Wait time = time until agent is free (0 if agent already free)
        wait = max(0, free_time - arrival)
        wait_times.append(wait)
        
        # Update when this agent will be free
        start_service = max(arrival, free_time)
        agent_free_at[next_free_agent] = start_service + service
    
    wait_times = np.array(wait_times)
    
    return {
        'n_agents': n_agents,
        'n_calls': n_calls,
        'mean_wait': wait_times.mean(),
        'median_wait': np.median(wait_times),
        'p95_wait': np.percentile(wait_times, 95),
        'pct_immediate': (wait_times == 0).mean() * 100,
        'pct_under_60s': (wait_times < 60).mean() * 100,
        'wait_times': wait_times
    }

# Get fitted distributions
arrival_dist = best_interarrival.get_scipy_dist()
service_dist = best_service.get_scipy_dist()

print(f"Using distributions:")
print(f"  Arrivals: {best_interarrival.distribution}")
print(f"  Service: {best_service.distribution}")

In [None]:
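# Run baseline simulation (5 agents, average load)
# The inter-arrival fit pools all hours and days, so 2000 calls spans far more than
# 2 peak hours (the generator's peak is 50 * 1.8 = 90 calls/hour)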
baseline = simulate_queue(
    n_agents=5,
    n_calls=2000,
    arrival_dist=arrival_dist,
    service_dist=service_dist,
    seed=42
)

print("BASELINE SIMULATION (5 Agents)")
print("="*50)
print(f"Calls simulated: {baseline['n_calls']:,}")
print(f"\nWait Time Metrics:")
print(f"  Mean wait: {baseline['mean_wait']:.1f} seconds ({baseline['mean_wait']/60:.1f} min)")
print(f"  Median wait: {baseline['median_wait']:.1f} seconds")
print(f"  95th percentile: {baseline['p95_wait']:.1f} seconds ({baseline['p95_wait']/60:.1f} min)")
print(f"\nService Level:")
print(f"  Answered immediately: {baseline['pct_immediate']:.1f}%")
print(f"  Answered within 60s: {baseline['pct_under_60s']:.1f}%")

## Part 6: What-If Scenario Analysis

Now the powerful part: simulate different scenarios to guide decisions.
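
Scenarios that change volume or handle time can be expressed as a multiplier on sampled times. A 20% increase in call volume multiplies the arrival rate by 1.2, so inter-arrival times are multiplied by 1/1.2 ≈ 0.83; 10% faster service multiplies service times by 0.9. A small check of the first conversion, using an exponential stand-in for the arrival distribution (an assumption for illustration):

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(7)
# Stand-in inter-arrival gaps in seconds (72 s mean = 50 calls/hour)
base_gaps = stats.expon(scale=72).rvs(size=50_000, random_state=rng)

volume_increase = 0.20
arrival_scale = 1 / (1 + volume_increase)   # shorter gaps mean more calls
scaled_gaps = base_gaps * arrival_scale

base_rate = 3600 / base_gaps.mean()         # calls per hour
new_rate = 3600 / scaled_gaps.mean()

print(f"arrival_scale = {arrival_scale:.3f}")
print(f"rate: {base_rate:.1f} -> {new_rate:.1f} calls/hour ({new_rate / base_rate - 1:+.0%})")
```

Scaling samples this way preserves the shape of the distribution and changes only its time scale, which is a reasonable first approximation when growth is spread evenly across the day.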

In [None]:
def run_scenario_analysis(arrival_dist, service_dist, n_calls=2000, n_simulations=20):
    """
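    Run each scenario n_simulations times and summarize across replications.

    The reported intervals are the 5th-95th percentiles of the per-run results.
    arrival_scale multiplies inter-arrival times, so 0.83 (about 1/1.2) means
    20% more calls; service_scale multiplies service times.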
    """
    scenarios = {
        'Current (5 agents)': {'n_agents': 5, 'arrival_scale': 1.0, 'service_scale': 1.0},
        'Add 1 agent (6)': {'n_agents': 6, 'arrival_scale': 1.0, 'service_scale': 1.0},
        'Add 2 agents (7)': {'n_agents': 7, 'arrival_scale': 1.0, 'service_scale': 1.0},
        '20% more calls': {'n_agents': 5, 'arrival_scale': 0.83, 'service_scale': 1.0},
        '10% faster service': {'n_agents': 5, 'arrival_scale': 1.0, 'service_scale': 0.9},
    }
    
    results = {}

    class ScaledDist:
        """Scale samples at draw time so the seed set in simulate_queue governs every draw."""
        def __init__(self, dist, scale):
            self.dist = dist
            self.scale = scale
        def rvs(self, size):
            return self.dist.rvs(size=size) * self.scale

    for name, config in scenarios.items():
        scenario_waits = []
        scenario_svc_levels = []

        # A scale of 1.0 leaves the samples unchanged
        arr_dist = ScaledDist(arrival_dist, config['arrival_scale'])
        svc_dist = ScaledDist(service_dist, config['service_scale'])

        for sim in range(n_simulations):
            
            result = simulate_queue(
                n_agents=config['n_agents'],
                n_calls=n_calls,
                arrival_dist=arr_dist,
                service_dist=svc_dist,
                seed=sim * 100 + 42
            )
            scenario_waits.append(result['mean_wait'])
            scenario_svc_levels.append(result['pct_under_60s'])
        
        results[name] = {
            'mean_wait': np.mean(scenario_waits),
            'wait_ci': (np.percentile(scenario_waits, 5), np.percentile(scenario_waits, 95)),
            'service_level': np.mean(scenario_svc_levels),
            'sl_ci': (np.percentile(scenario_svc_levels, 5), np.percentile(scenario_svc_levels, 95))
        }
    
    return results

print("Running scenario analysis (20 simulations each)...")
scenarios = run_scenario_analysis(arrival_dist, service_dist)
print("Done!")

In [None]:
# Display scenario comparison
print("\n" + "="*80)
print("WHAT-IF SCENARIO ANALYSIS")
print("="*80)
print(f"\n{'Scenario':<25} {'Mean Wait (s)':<20} {'Service Level (<60s)':<25}")
print("-"*70)

baseline_wait = scenarios['Current (5 agents)']['mean_wait']
baseline_sl = scenarios['Current (5 agents)']['service_level']

for name, result in scenarios.items():
    wait_str = f"{result['mean_wait']:.1f} ({result['wait_ci'][0]:.1f}-{result['wait_ci'][1]:.1f})"
    sl_str = f"{result['service_level']:.1f}% ({result['sl_ci'][0]:.1f}-{result['sl_ci'][1]:.1f}%)"
    
    # Calculate change from baseline
    if name != 'Current (5 agents)':
        wait_change = ((result['mean_wait'] - baseline_wait) / baseline_wait) * 100
        sl_change = result['service_level'] - baseline_sl
        change_str = f"  [Wait: {wait_change:+.0f}%, SL: {sl_change:+.1f}pp]"
    else:
        change_str = "  [BASELINE]"
    
    print(f"{name:<25} {wait_str:<20} {sl_str:<25}{change_str}")

In [None]:
# Visualize scenario comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

scenario_names = list(scenarios.keys())
x = np.arange(len(scenario_names))

# Mean wait times with CI
waits = [scenarios[s]['mean_wait'] for s in scenario_names]
wait_errs = [[scenarios[s]['mean_wait'] - scenarios[s]['wait_ci'][0] for s in scenario_names],
             [scenarios[s]['wait_ci'][1] - scenarios[s]['mean_wait'] for s in scenario_names]]

colors = ['steelblue', 'green', 'green', 'red', 'orange']
bars1 = axes[0].bar(x, waits, color=colors, edgecolor='black', alpha=0.7)
axes[0].errorbar(x, waits, yerr=wait_errs, fmt='none', color='black', capsize=5)
axes[0].axhline(waits[0], color='gray', linestyle='--', alpha=0.5, label='Baseline')
axes[0].set_xlabel('Scenario')
axes[0].set_ylabel('Mean Wait Time (seconds)')
axes[0].set_title('Wait Time by Scenario')
axes[0].set_xticks(x)
axes[0].set_xticklabels([s.replace(' ', '\n') for s in scenario_names], fontsize=9)

# Service levels with CI
sls = [scenarios[s]['service_level'] for s in scenario_names]
sl_errs = [[scenarios[s]['service_level'] - scenarios[s]['sl_ci'][0] for s in scenario_names],
           [scenarios[s]['sl_ci'][1] - scenarios[s]['service_level'] for s in scenario_names]]

bars2 = axes[1].bar(x, sls, color=colors, edgecolor='black', alpha=0.7)
axes[1].errorbar(x, sls, yerr=sl_errs, fmt='none', color='black', capsize=5)
axes[1].axhline(80, color='red', linestyle='--', alpha=0.7, label='80% Target')
axes[1].set_xlabel('Scenario')
axes[1].set_ylabel('Service Level (% < 60s)')
axes[1].set_title('Service Level by Scenario')
axes[1].set_xticks(x)
axes[1].set_xticklabels([s.replace(' ', '\n') for s in scenario_names], fontsize=9)
axes[1].legend()

plt.suptitle('What-If Scenario Analysis Results', fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

## Part 7: Decision Support Report

In [None]:
# Generate decision support report
print("="*70)
print("OPERATIONS DECISION SUPPORT REPORT")
print("="*70)

print(f"\n1. DATA FOUNDATION")
print(f"   Historical data: {len(call_data):,} calls over 30 days")
print(f"   Inter-arrival model: {best_interarrival.distribution} (selected by AIC)")
print(f"   Service time model: {best_service.distribution} (selected by AIC)")

print(f"\n2. CURRENT STATE ANALYSIS")
current = scenarios['Current (5 agents)']
print(f"   Agents: 5")
print(f"   Mean wait: {current['mean_wait']:.1f}s")
print(f"   Service level (<60s): {current['service_level']:.1f}%")

print(f"\n3. SCENARIO RECOMMENDATIONS")

# Hiring recommendation
add1 = scenarios['Add 1 agent (6)']
add2 = scenarios['Add 2 agents (7)']
wait_reduction_1 = (current['mean_wait'] - add1['mean_wait']) / current['mean_wait'] * 100
wait_reduction_2 = (current['mean_wait'] - add2['mean_wait']) / current['mean_wait'] * 100

print(f"\n   OPTION A: Hire 1 Additional Agent")
print(f"   - Wait time reduction: {wait_reduction_1:.0f}%")
print(f"   - Service level change: {add1['service_level'] - current['service_level']:+.1f}pp")

print(f"\n   OPTION B: Hire 2 Additional Agents")
print(f"   - Wait time reduction: {wait_reduction_2:.0f}%")
print(f"   - Service level change: {add2['service_level'] - current['service_level']:+.1f}pp")

# Training recommendation
training = scenarios['10% faster service']
print(f"\n   OPTION C: Invest in Training (10% handle time reduction)")
print(f"   - Wait time reduction: {(current['mean_wait'] - training['mean_wait']) / current['mean_wait'] * 100:.0f}%")
print(f"   - Service level change: {training['service_level'] - current['service_level']:+.1f}pp")
print(f"   - No additional headcount required")

# Risk assessment
growth = scenarios['20% more calls']
print(f"\n4. RISK ASSESSMENT")
print(f"   If call volume increases 20%:")
print(f"   - Mean wait would change to {growth['mean_wait']:.0f}s ({(growth['mean_wait']/current['mean_wait']-1)*100:+.0f}%)")
print(f"   - Service level would drop to {growth['service_level']:.1f}%")

print(f"\n5. RECOMMENDATION")
if current['service_level'] < 80:
    print("   Current service level is BELOW 80% target.")
    print("   RECOMMEND: Hire 1-2 additional agents or invest in training.")
else:
    print("   Current service level MEETS 80% target.")
    print("   RECOMMEND: Monitor call volume growth; plan for contingency staffing.")

print("\n" + "="*70)

## Summary

This notebook demonstrated the full workflow for simulation-based decision making using RayBackend:

1. **Data Collection**: Generated synthetic operational data (arrivals, service times) as a stand-in for historical records
2. **Distribution Fitting**: Used spark-bestfit to identify the best distributions
3. **Assumption Validation**: Tested whether common assumptions (exponential) hold
4. **Queue Simulation**: Built a realistic G/G/c queue using fitted distributions
5. **Scenario Analysis**: Ran what-if simulations with 5th-95th percentile intervals across replications
6. **Decision Support**: Generated actionable recommendations

### Key spark-bestfit Features Used

| Feature | Purpose |
|---------|----------|
| `RayBackend` | Distributed parallel processing |
| `DistributionFitter` | Fit service times, inter-arrival times |
| `DiscreteDistributionFitter` | Fit hourly call volumes |
| `lazy_metrics=False` | Validate with KS/AD tests |
| `get_scipy_dist()` | Sample from fitted distributions |
| `metric='aic'` | Select best predictive model |

### Business Value

- **Avoid costly experiments**: Test scenarios in simulation first
- **Quantify uncertainty**: Confidence intervals on predictions
- **Validate assumptions**: Don't blindly assume exponential distributions
- **Data-driven decisions**: Replace intuition with evidence
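
On quantifying uncertainty: the scenario analysis in this notebook summarizes replications with 5th and 95th percentiles, which describes the spread of outcomes across runs. If the goal is instead a confidence interval for the long-run mean, a t-interval over the replication means is one common option. A minimal sketch with made-up replication results (the numbers are placeholders, not notebook output, and `mean_ci` is a hypothetical helper):

```python
import numpy as np
from scipy import stats

def mean_ci(values, confidence=0.95):
    """Two-sided t-interval for the mean of independent replication results."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)               # standard error of the mean
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return mean, mean - t_crit * sem, mean + t_crit * sem

rng = np.random.default_rng(3)
# Placeholder: mean wait (seconds) from 20 independent simulation runs
replication_means = rng.normal(loc=45.0, scale=8.0, size=20)

mean, lo, hi = mean_ci(replication_means)
spread = np.percentile(replication_means, [5, 95])

print(f"mean wait: {mean:.1f}s, 95% CI for the mean: ({lo:.1f}, {hi:.1f})")
print(f"5th-95th percentile of runs: ({spread[0]:.1f}, {spread[1]:.1f})")
```

The t-interval narrows as more replications are added, while the percentile range does not; which one to report depends on whether the audience cares about the expected outcome or the variability of a single period.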

In [None]:
# Cleanup
# ray.shutdown()  # Uncomment to shutdown Ray when done
print("Discrete event simulation complete!")