# Smart-Charging A/B Test — Notebook

This notebook is a guided skeleton for reproducing the A/B test analysis from the repository. It is organized into sections: setup, data generation, exploratory data analysis (EDA), KPI calculation, statistical tests (two-proportion z-test, Welch's t-test), bootstrap confidence intervals, visualization, and next steps (segmentation and optimization).

## 1) Setup (install dependencies and imports)

If running locally, make sure you have installed the dependencies from `requirements.txt` in a virtual environment.


In [None]:
# Imports
import os
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Add the repo root to the path so `generate_data` can be imported below
sys.path.append('..')

print('Imports ok')

## 2) Generate synthetic data (or load existing CSV)

You can either run the provided script from the terminal or call the functions directly from `generate_data.py`.

In [None]:
# Option A: call the script to generate CSV (shell)
# !python ../generate_data.py --output ../data/ab_test_data.csv --n_users 5000 --seed 42

# Option B: import generator functions directly (recommended in notebook)
from generate_data import generate_users, generate_sessions, assign_treatment, simulate_pricing_and_charging
users = generate_users(2000, seed=42)
sessions = generate_sessions(users, sessions_per_user_mean=3, seed=42)
sessions = assign_treatment(sessions, seed=42)
data = simulate_pricing_and_charging(sessions, seed=42)
os.makedirs('../data', exist_ok=True)
data.to_csv('../data/ab_test_data.csv', index=False)
print('Data generated:', data.shape)
data.head()

## 3) Basic EDA

Look at distributions, class balance, and summary statistics for primary KPIs.

In [None]:
df = pd.read_csv('../data/ab_test_data.csv', parse_dates=['session_ts'])
df.info()
display(df.describe(include='all'))
display(df.groupby('group').agg(sessions=('user_id','count'), low_share=('charged_in_low','mean'), avg_kwh=('energy_kwh','mean')))


## 4) KPI definitions

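Define the primary metrics used for the experiment analysis. Per group, these are the number of sessions, `low_share` (the share of sessions with `charged_in_low` set), and `avg_kwh` (the mean `energy_kwh` per session).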

In [None]:
def compute_kpis(df):
    grouped = df.groupby('group').agg(
        sessions=('user_id','count'),
        low_share=('charged_in_low','mean'),
        avg_kwh=('energy_kwh','mean')
    ).reset_index()
    return grouped

compute_kpis(df)


## 5) Statistical tests

Run a two-sample z-test for proportions (`charged_in_low`) and Welch's t-test for average kWh (`energy_kwh`).
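
As a cross-check on what the proportion test computes, the pooled two-proportion z statistic can be written out by hand. The sketch below is illustrative only: the function name `pooled_two_proportion_z` and the counts in the usage lines are hypothetical and not taken from the repository. It assumes the pooled-variance form of the test and uses only the standard library.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for the difference p_a - p_b (hypothetical helper)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis both groups share one proportion,
    # estimated by pooling successes across the two groups.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 420 of 1000 treatment sessions vs. 380 of 1000 control sessions.
z, p = pooled_two_proportion_z(420, 1000, 380, 1000)
print(f"z={z:.3f}, p={p:.4f}")
```

Given the same counts, this is expected to agree with `proportions_ztest` when that function is left at its default pooled-variance settings, which makes it a convenient sanity check for the cell below.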

In [None]:
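def z_test_proportions(df, metric_col='charged_in_low'):
    """Two-sided z-test for the difference in proportions (treatment vs. control)."""
    cont = df[df['group']=='control'][metric_col].astype(int)
    treat = df[df['group']=='treatment'][metric_col].astype(int)
    # Treatment is listed first, so a positive statistic means a higher treatment share
    count = np.array([treat.sum(), cont.sum()])
    nobs = np.array([len(treat), len(cont)])
    stat, pval = sm.stats.proportions_ztest(count, nobs, alternative='two-sided')
    return stat, pval, treat.mean(), cont.mean(), treat.mean()-cont.mean()

def t_test_continuous(df, metric_col='energy_kwh'):
    """Welch's t-test (unequal variances) for the difference in means (treatment vs. control)."""
    cont = df[df['group']=='control'][metric_col]
    treat = df[df['group']=='treatment'][metric_col]
    # equal_var=False selects Welch's test instead of the pooled-variance Student's t-test
    stat, pval = stats.ttest_ind(treat, cont, equal_var=False)
    return stat, pval, treat.mean(), cont.mean(), treat.mean()-cont.mean()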

z_stat, z_p, t_mean, c_mean, diff = z_test_proportions(df)
print('Proportion z-test: stat={:.3f}, p={:.4f}, treat_mean={:.4f}, control_mean={:.4f}, diff={:.4f}'.format(z_stat, z_p, t_mean, c_mean, diff))
t_stat, t_p, t_tmean, t_cmean, t_diff = t_test_continuous(df)
print('T-test energy_kwh: stat={:.3f}, p={:.4f}, treat_mean={:.4f}, control_mean={:.4f}, diff={:.4f}'.format(t_stat, t_p, t_tmean, t_cmean, t_diff))


## 6) Bootstrap confidence intervals

Bootstrap the difference in group means as a robustness check and to obtain non-parametric percentile confidence intervals (CIs).

In [None]:
def bootstrap_diff(df, metric_col, n_boot=2000, seed=1):
    rng = np.random.default_rng(seed)
    treat = df[df['group']=='treatment'][metric_col].values
    cont = df[df['group']=='control'][metric_col].values
    diffs = []
    for _ in range(n_boot):
        s1 = rng.choice(treat, size=len(treat), replace=True)
        s0 = rng.choice(cont, size=len(cont), replace=True)
        diffs.append(s1.mean() - s0.mean())
    diffs = np.array(diffs)
    lower = np.percentile(diffs, 2.5)
    upper = np.percentile(diffs, 97.5)
    return lower, upper, diffs

lower, upper, diffs = bootstrap_diff(df, 'energy_kwh', n_boot=2000, seed=2)
print(f'Bootstrap 95% CI for energy_kwh diff: ({lower:.4f}, {upper:.4f})')
plt.hist(diffs, bins=40)
plt.title('Bootstrap distribution of mean difference (energy_kwh)')
plt.show()


## 7) Visualization & simple dashboards

Create quick figures that could be exported to a dashboard (bar charts, time-series).

In [None]:
import matplotlib.dates as mdates
df['date'] = pd.to_datetime(df['session_ts']).dt.date
daily = df.groupby(['date','group']).agg(low_share=('charged_in_low','mean'), avg_kwh=('energy_kwh','mean')).reset_index()
pivot = daily.pivot(index='date', columns='group', values='low_share')
pivot.plot(figsize=(10,4), marker='o')
plt.title('Daily share of low-price charging sessions by group')
plt.ylabel('Share in low-price window')
plt.tight_layout()
plt.show()


## 8) Next steps / extensions

- Segmentation: Heterogeneous Treatment Effects by propensity decile
- Power analysis / sample size calculation
- Integrate with BigQuery and create Looker Studio / Power BI dashboard
- Optimization notebook: constrained charging schedule using `cvxpy` or `scipy.optimize`
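
For the power analysis item, a rough per-group sample size for detecting a given lift in the low-price share can be sketched with the standard normal-approximation formula for two proportions. This is a minimal sketch and not repository code: the function name `sessions_per_group` and the example shares (0.30 and 0.35) are hypothetical placeholders. It assumes equal group sizes, a two-sided test, and sessions treated as independent observations.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sessions needed per group for a two-sided two-proportion z-test."""
    if p_control == p_treatment:
        raise ValueError("The two proportions must differ.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    p_bar = (p_control + p_treatment) / 2
    # Standard deviation terms under the null (pooled) and under the alternative.
    term_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_power * sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )
    return ceil(((term_null + term_alt) / (p_treatment - p_control)) ** 2)


# Hypothetical scenario: baseline low-price share of 30%, hoped-for share of 35%.
n = sessions_per_group(0.30, 0.35)
print(f"Sessions needed per group: {n}")
```

Smaller expected lifts or higher target power increase the required sample size, so it is worth trying a few plausible scenarios before fixing the experiment duration.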
