# Statistical Analysis & Hypothesis Testing - Example Usage

This notebook demonstrates the practical application of statistical analysis tools for business decision-making.

## Scenarios Covered:
1. **Marketing A/B Test** - Compare two email campaigns
2. **Regression Analysis** - Identify sales drivers
3. **Power Analysis** - Sample size planning for experiments

In [1]:
# Import libraries
import sys
sys.path.append('./src')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from hypothesis_tests import two_sample_ttest, one_way_anova, chi_square_test
from regression_utils import linear_regression_analysis, check_regression_assumptions
from power_analysis import power_ttest, sample_size_ttest

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("✅ All modules imported successfully!")

✅ All modules imported successfully!


## Scenario 1: Marketing A/B Test

**Business Question**: Did the new email campaign perform better than the old one?

**Data**: 
- Campaign A (control): 500 recipients, 65 conversions
- Campaign B (variant): 500 recipients, 85 conversions

In [2]:
# Generate sample data for A/B test
np.random.seed(42)

# Campaign A (control) - 13% conversion rate
campaign_a = np.concatenate([
    np.ones(65),   # conversions
    np.zeros(435)  # no conversions
])

# Campaign B (variant) - 17% conversion rate
campaign_b = np.concatenate([
    np.ones(85),   # conversions
    np.zeros(415)  # no conversions
])

# Perform two-sample t-test
result = two_sample_ttest(campaign_a, campaign_b)

result.summary()  # summary() prints the report itself and returns None
print("\n📊 Interpretation:")
# The effect size is reported as A minus B, so flip the sign to express B relative to A
print(f"Campaign B showed a {-result.effect_size:.2f} standard deviation improvement")
print(f"Conversion lift: {(campaign_b.mean() - campaign_a.mean()) * 100:.1f} percentage points")
print(f"Statistical significance: {'YES ✅' if result.p_value < 0.05 else 'NO ❌'}")


Two-Sample T-Test
Test Statistic: -1.7722
P-value: 0.0767
Effect Size: -0.1121
95% CI: (-0.0843, 0.0043)

Conclusion: No significant difference (p=0.0767 >= 0.05)

📊 Interpretation:
Campaign B showed a 0.11 standard deviation improvement
Conversion lift: 4.0 percentage points
Statistical significance: NO ❌
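
Because each observation is a 0/1 conversion flag, a two-proportion z-test is a common alternative to the t-test used above. The sketch below is not part of the `hypothesis_tests` module: `two_proportion_ztest` is a hypothetical helper written with the standard library only, fed with the conversion counts from this scenario.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error).

    Hypothetical helper for illustration; not part of hypothesis_tests.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled proportion under the null hypothesis of equal conversion rates
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Campaign A: 65 of 500 converted; Campaign B: 85 of 500 converted
z_stat, p_val = two_proportion_ztest(65, 500, 85, 500)
print(f"z = {z_stat:.4f}, p-value = {p_val:.4f}")
```

With 500 recipients per group, this leads to the same conclusion as the t-test: the 4-point lift is not significant at the 5% level.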


## Scenario 2: Regression Analysis - Sales Drivers

**Business Question**: What factors drive product sales?

**Variables**:
- Price ($)
- Advertising Spend ($1000s)
- Competitor Price ($)
- Sales (units)

In [3]:
# Generate sales data
np.random.seed(42)
n_samples = 100

# Create realistic sales data
price = np.random.uniform(20, 50, n_samples)
ad_spend = np.random.uniform(5, 50, n_samples)
competitor_price = np.random.uniform(25, 55, n_samples)

# Sales = base + price_effect + ad_effect + competitor_effect + noise
sales = (500
         - 8 * price
         + 12 * ad_spend
         + 5 * competitor_price
         + np.random.normal(0, 50, n_samples))

# Create DataFrame
df = pd.DataFrame({
    'price': price,
    'ad_spend': ad_spend,
    'competitor_price': competitor_price,
    'sales': sales
})

# Run regression analysis
X = df[['price', 'ad_spend', 'competitor_price']]
y = df['sales']

results = linear_regression_analysis(X, y)

results.summary()  # summary() prints the report itself and returns None
print("\n📊 Key Insights:")
# Negate the (negative) price coefficient so the sentence reads as a decrease
print(f"- $1 price increase → {-results.coefficients['price']:.1f} unit decrease in sales")
print(f"- $1K ad spend increase → {results.coefficients['ad_spend']:.1f} unit increase in sales")
print(f"- $1 competitor price increase → {results.coefficients['competitor_price']:.1f} unit increase in sales")
print(f"\nModel explains {results.r_squared*100:.1f}% of sales variation")


REGRESSION ANALYSIS SUMMARY

R²: 0.9304
Adjusted R²: 0.9282
F-statistic: 427.5064 (p-value: 0.0000e+00)
RMSE: 48.1739
MAE: 37.5939

----------------------------------------------------------------------
COEFFICIENTS
----------------------------------------------------------------------
Feature               Coefficient      p-value                    95% CI
----------------------------------------------------------------------
price                     -7.3068   0.0000e+00 [ -8.1866,  -6.4269] ***
ad_spend                  12.2317   0.0000e+00 [ 11.5525,  12.9109] ***
competitor_price           5.5489   0.0000e+00 [  4.8035,   6.2944] ***
----------------------------------------------------------------------
Significance codes: *** p<0.001, ** p<0.01, * p<0.05

📊 Key Insights:
- $1 price increase → 7.3 unit decrease in sales
- $1K ad spend increase → 12.2 unit increase in sales
- $1 competitor price increase → 5.5 unit increase in sales

Model explains 93.0% of sales variation
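
As a cross-check on the coefficients reported above, an ordinary least-squares fit can be reproduced with NumPy alone. This is a minimal sketch that assumes `linear_regression_analysis` fits OLS with an intercept (its implementation is not shown in this notebook). It regenerates the same synthetic data with the same seed and draw order.

```python
import numpy as np

# Regenerate the synthetic data in the same order as the cell above
np.random.seed(42)
n_samples = 100
price = np.random.uniform(20, 50, n_samples)
ad_spend = np.random.uniform(5, 50, n_samples)
competitor_price = np.random.uniform(25, 55, n_samples)
sales = (500 - 8 * price + 12 * ad_spend + 5 * competitor_price
         + np.random.normal(0, 50, n_samples))

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n_samples), price, ad_spend, competitor_price])

# Least-squares solution of X @ beta ≈ sales
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

# R² = 1 - SS_residual / SS_total
residuals = sales - X @ beta
r_squared = 1 - (residuals @ residuals) / np.sum((sales - sales.mean()) ** 2)

for name, coef in zip(["intercept", "price", "ad_spend", "competitor_price"], beta):
    print(f"{name:>18}: {coef:9.4f}")
print(f"R²: {r_squared:.4f}")
```

If the assumption holds, the printed slopes should agree with the summary table, and each should sit close to the true values (-8, 12, 5) used to generate the data.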

## Scenario 3: Power Analysis - Sample Size Planning

**Business Question**: How many customers do we need for our next experiment?

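**Goal**: Detect a 5% conversion rate improvement with 80% power. Note that the cell below uses a standardized effect size of 0.5 (Cohen's d) as a stand-in; a 5-percentage-point lift in conversion rate corresponds to a smaller standardized effect and therefore a larger required sample.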
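
To express a conversion-rate lift as a standardized effect size, one common choice is Cohen's h, the difference between arcsine-transformed proportions. The sketch below uses the Scenario 1 control rate of 13% and a 5-percentage-point lift as illustrative inputs. The baseline for the next experiment is an assumption, since this notebook does not state it, and `cohens_h` is a hypothetical helper rather than part of `power_analysis`.

```python
from math import asin, sqrt


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))


baseline = 0.13            # assumed baseline, borrowed from Campaign A
target = baseline + 0.05   # assumed 5-percentage-point lift

h = cohens_h(baseline, target)
print(f"Cohen's h for {baseline:.0%} -> {target:.0%}: {h:.3f}")
```

Under these assumptions h comes out well below 0.5, so passing `effect_size=0.5` plans for a much larger effect than a 5-point lift.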

In [5]:
# Calculate required sample size
effect_size = 0.5  # Medium effect (Cohen's d)
alpha = 0.05       # Significance level
power = 0.80       # Desired power

result = sample_size_ttest(effect_size=effect_size, alpha=alpha, power=power)

result.summary()  # summary() prints the report itself and returns None
print("\n📊 Planning Recommendations:")
# result.sample_size is the total across both groups (see the summary output)
print(f"- Minimum sample size per group: {result.sample_size // 2} customers")
print(f"- Total experiment size: {result.sample_size} customers")
print(f"- With this sample size, we have {result.power*100:.0f}% power to detect the effect")
print("\n💡 Business Impact:")
# NOTE: the recorded run printed nan here; check the arguments power_ttest expects
smaller_power = power_ttest(100, effect_size, alpha)
print(f"  Running a smaller experiment (n=100) would only have {smaller_power*100:.0f}% power")


Two-Sample T-Test - Power Analysis
Effect Size: 0.500
Significance Level (α): 0.050
Statistical Power: 0.801
Required Sample Size: 128

Recommendation: Large effect size (0.50). Smaller sample sufficient.

Total participants needed: 128 (64 per group)

📊 Planning Recommendations:
- Minimum sample size per group: 64 customers
- Total experiment size: 128 customers
- With this sample size, we have 80% power to detect the effect

💡 Business Impact:
  Running a smaller experiment (n=100) would only have nan% power
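
The recorded run above prints `nan` for the smaller experiment, so an independent estimate is useful. The sketch below uses the normal approximation to the two-sample t-test with the standard library only. `approx_power` and `approx_n_per_group` are hypothetical helpers, not part of `power_analysis`, and reading n=100 as 100 participants in total (50 per group) is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate two-sided power of a two-sample test with equal group sizes."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis
    shift = effect_size * sqrt(n_per_group / 2)
    return _norm.cdf(shift - z_crit) + _norm.cdf(-shift - z_crit)


def approx_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size needed to reach the requested power."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    z_power = _norm.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


print(f"n per group for d=0.5: {approx_n_per_group(0.5)}")
print(f"power at 64 per group: {approx_power(64, 0.5):.3f}")
print(f"power at 50 per group: {approx_power(50, 0.5):.3f}")
```

The normal approximation gives a slightly smaller sample size than the t-based calculation (63 versus the 64 per group reported above), so treat these numbers as a sanity check rather than a replacement for `sample_size_ttest`. At 50 per group the approximate power drops to roughly 70%.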
