# A/B Test Design & Analysis

Objective:
Evaluate whether a new product recommendation feature improves
conversion and revenue metrics using a simulated A/B test.

This notebook covers:
- Experiment setup
- Random user assignment
- Metric comparison
- Statistical significance testing


### Experiment Context

A new recommendation experience is proposed to improve user conversion.

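Because this is historical data, we simulate an A/B test by randomly
assigning users to control and variant groups and comparing outcomes.
No variant experience was actually served, so any difference between the
groups reflects random assignment alone: the exercise behaves like an A/A
test and validates the analysis pipeline rather than measuring the
feature's real effect.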
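Because assignment here is random and no treatment is applied, both groups share the same true conversion rate. A quick simulation shows what that implies: the pooled z-test should flag roughly 5% of such A/A comparisons as significant at α = 0.05. This is a sketch with made-up parameters; the 10,000 users per group, the 8% conversion rate, and the function name are assumptions, not values from this dataset.

```python
import numpy as np
from scipy.stats import norm

def aa_false_positive_rate(n_per_group=10_000, p=0.08, n_sims=2_000, alpha=0.05, seed=0):
    """Share of simulated A/A tests whose pooled z-test gives p < alpha."""
    rng = np.random.default_rng(seed)
    # Both groups draw purchase counts from the same Binomial(n, p): no true effect
    x1 = rng.binomial(n_per_group, p, size=n_sims)
    x2 = rng.binomial(n_per_group, p, size=n_sims)
    p1, p2 = x1 / n_per_group, x2 / n_per_group
    p_pool = (x1 + x2) / (2 * n_per_group)
    se = np.sqrt(p_pool * (1 - p_pool) * (2 / n_per_group))
    z = (p2 - p1) / se
    p_values = 2 * norm.sf(np.abs(z))  # two-sided p-values
    return float(np.mean(p_values < alpha))

fpr = aa_false_positive_rate()
print(f"A/A runs with p < 0.05: {fpr:.3f}")  # should land near 0.05
```

If this rate came out far from α, the test or the assignment would be miscalibrated.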


### Hypotheses

Null Hypothesis (H₀):
There is no difference in conversion rate between control and variant.

Alternative Hypothesis (H₁):
The conversion rate differs between control and variant (two-sided test,
matching the p-value computed below). A positive difference would favor
the variant.

Significance level (α): 0.05
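Before running a test, the significance level is usually paired with a target power to size the experiment. The sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name, the 80% power default, the 8% baseline (close to the conversion rate observed later in this notebook), and the 0.4 percentage-point lift are illustrative assumptions.

```python
import math
from scipy.stats import norm

def sample_size_per_group(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Users needed per group to detect an absolute lift of mde_abs (two-sided test)."""
    p1, p2 = p_baseline, p_baseline + mde_abs
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # standard normal quantile for the target power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)

# Illustrative: 8% baseline, detect a 0.4 percentage-point absolute lift
n_needed = sample_size_per_group(0.08, 0.004)
print(n_needed)
```

Because the required size scales with 1 / MDE², halving the detectable lift roughly quadruples the users needed per group.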


In [2]:
import pandas as pd
import numpy as np
from pathlib import Path

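# Local path to the monthly event exports; adjust for your environment.
BASE_PATH = Path("/Users/sabarish/Desktop/Project_1/eCommerce behavior")

# nrows keeps only the first 500,000 rows of each file: a truncation, not a random sample.
oct_df = pd.read_csv(BASE_PATH / "2019-Oct.csv", nrows=500_000)
nov_df = pd.read_csv(BASE_PATH / "2019-Nov.csv", nrows=500_000)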

events = pd.concat([oct_df, nov_df], ignore_index=True)
events["event_time"] = pd.to_datetime(events["event_time"])

events.shape


(1000000, 9)

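Because the experiment randomizes users, we build a user-level table
(one row per user, with the first timestamp of each event type).
Computing metrics on raw events would overweight highly active users
and treat correlated events from the same user as independent observations.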
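To see why the unit of analysis matters, here is a toy example with invented data (not from this dataset): one very active buyer and four single-view users. Because the active user contributes most of the events, 30% of events are purchases while only 20% of users converted.

```python
import pandas as pd

# Invented toy events: user 1 is very active and buys three times; users 2-5 view once
toy_events = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1, 2, 3, 4, 5],
    "event_type": ["view", "purchase"] * 3 + ["view"] * 4,
})

# Event level: share of all events that are purchases (dominated by user 1)
event_level = (toy_events["event_type"] == "purchase").mean()

# User level: share of users with at least one purchase (the conversion rate)
user_level = (
    toy_events.assign(purchased=toy_events["event_type"] == "purchase")
    .groupby("user_id")["purchased"]
    .any()
    .mean()
)
print(event_level, user_level)
```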


In [3]:
# One row per user: first timestamp of each event type (NaT if it never occurred)
user_funnel = (
    events
    .pivot_table(
        index="user_id",
        columns="event_type",
        values="event_time",
        aggfunc="min"
    )
    .reset_index()
)

user_funnel = user_funnel.rename(
    columns={
        "view": "view_time",
        "cart": "cart_time",
        "purchase": "purchase_time"
    }
)

user_funnel.head()


| event_type | user_id | cart_time | purchase_time | view_time |
|---|---|---|---|---|
| 0 | 244951053 | NaT | NaT | 2019-10-01 08:47:35+00:00 |
| 1 | 274969076 | NaT | NaT | 2019-11-01 06:18:48+00:00 |
| 2 | 275256741 | NaT | NaT | 2019-11-01 02:23:03+00:00 |
| 3 | 295643776 | NaT | NaT | 2019-11-01 03:12:38+00:00 |
| 4 | 296465302 | NaT | NaT | 2019-11-01 04:10:21+00:00 |


Each user is randomly assigned to control or variant with probability 0.5.
The unit of randomization is user_id, so all events from a user fall in the same group.
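The notebook draws assignments from a seeded random generator, which works for a one-off analysis. Live experiments often assign users by hashing instead, so a user lands in the same group on every visit without storing a lookup table. A minimal sketch, assuming SHA-256 over a hypothetical experiment salt ("reco_test_v1") and 10,000 buckets; the helper name is also illustrative.

```python
import hashlib

def assign_group(user_id, salt="reco_test_v1", variant_share=0.5, n_buckets=10_000):
    """Deterministically map a user to 'control' or 'variant' (hypothetical helper)."""
    key = f"{salt}:{user_id}".encode("utf-8")
    # Interpret the first 8 bytes of the SHA-256 digest as an integer, reduced to [0, n_buckets)
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_buckets
    return "variant" if bucket < variant_share * n_buckets else "control"

groups = [assign_group(uid) for uid in range(100_000)]
share = groups.count("variant") / len(groups)
print(f"variant share: {share:.3f}")
```

Changing the salt reshuffles users for a new experiment, which keeps assignments independent across tests.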


In [4]:
np.random.seed(42)  # fixed seed so the assignment is reproducible

users = user_funnel[["user_id"]].copy()
users["ab_group"] = np.where(
    np.random.rand(len(users)) < 0.5,
    "control",
    "variant"
)

users["ab_group"].value_counts()


ab_group
variant    89002
control    88639
Name: count, dtype: int64
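The two groups differ by a few hundred users. A chi-square goodness-of-fit test against the intended 50/50 split is a common sample ratio mismatch (SRM) check: a very small p-value would point to a broken assignment or logging step rather than a treatment effect. A sketch using the counts printed above:

```python
from scipy.stats import chisquare

# Group sizes taken from the value_counts output above
observed = [88639, 89002]  # control, variant

# chisquare defaults to equal expected counts, i.e. the intended 50/50 split
stat, p_value_srm = chisquare(observed)
print(f"chi2 = {stat:.3f}, p = {p_value_srm:.3f}")
```

SRM checks often use a strict threshold such as 0.001; a p-value above it means the observed split is consistent with the 50/50 design.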

In [5]:
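# Attach group labels; validate raises if user_id is unexpectedly duplicated.
user_funnel_ab = user_funnel.merge(users, on="user_id", how="left", validate="one_to_one")
events_ab = events.merge(users, on="user_id", how="left", validate="many_to_one")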


We compute each metric separately for the control and variant groups.
Conversion rate is the share of users in a group with at least one purchase event.


In [6]:
conversion_by_group = (
    user_funnel_ab
    .assign(purchased=user_funnel_ab["purchase_time"].notna())
    .groupby("ab_group")["purchased"]
    .mean()
)

conversion_by_group


ab_group
control    0.081499
variant    0.082695
Name: purchased, dtype: float64

In [7]:
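# Mean revenue among purchasing users only: users with no purchase have no
# purchase rows, so they are excluded rather than counted as zero.
rpu_by_group = (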
    events_ab[events_ab["event_type"] == "purchase"]
    .groupby(["ab_group", "user_id"])["price"]
    .sum()
    .groupby("ab_group")
    .mean()
)

rpu_by_group


ab_group
control    412.252825
variant    412.373826
Name: price, dtype: float64
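Because non-purchasers drop out of the purchase rows, the metric above averages spend over buyers only. Revenue per user across all assigned users needs those users filled in as zero. A sketch on invented toy data; the toy_users and toy_purchases frames are hypothetical stand-ins for users and the purchase rows of events_ab.

```python
import pandas as pd

# Hypothetical toy data: four users per group, only some of whom purchase
toy_users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "ab_group": ["control"] * 4 + ["variant"] * 4,
})
toy_purchases = pd.DataFrame({
    "user_id": [1, 1, 5, 6],
    "price": [100.0, 50.0, 80.0, 40.0],
})

# Total spend per buyer, then left-join onto every assigned user
buyer_revenue = toy_purchases.groupby("user_id", as_index=False)["price"].sum()
revenue = (
    toy_users.merge(buyer_revenue, on="user_id", how="left")
    .rename(columns={"price": "revenue"})
)
revenue["revenue"] = revenue["revenue"].fillna(0.0)  # non-buyers contribute zero

rpu_all = revenue.groupby("ab_group")["revenue"].mean()  # over all assigned users
rpu_buyers = revenue[revenue["revenue"] > 0].groupby("ab_group")["revenue"].mean()  # buyers only
print(rpu_all, rpu_buyers, sep="\n\n")
```

Here control leads under both definitions, but by 7.5 per user versus 90 per buyer, so the choice of denominator matters when reporting a revenue guardrail.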

We test whether the observed differences are statistically significant.

- Conversion rate: two-sample z-test for proportions (manual implementation with pooled variance)
- Revenue per purchasing user: Welch’s t-test (does not assume equal variances)
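The pooled two-proportion z-test is equivalent to a chi-square test of independence on the 2x2 table of purchasers and non-purchasers, run without Yates' continuity correction: the chi-square statistic equals z², and the p-values match. The sketch below checks that identity with hypothetical counts; the helper name is illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

def z_vs_chi2(x1, n1, x2, n2):
    """Return (z**2, chi2, p_z, p_chi2) for the same 2x2 conversion table."""
    # Pooled two-proportion z-test, as in the manual implementation
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    z = (p2 - p1) / np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    p_z = 2 * norm.sf(abs(z))

    # Rows: groups; columns: purchasers, non-purchasers. No continuity correction.
    table = np.array([[x1, n1 - x1], [x2, n2 - x2]])
    chi2, p_chi2, _, _ = chi2_contingency(table, correction=False)
    return z ** 2, chi2, p_z, p_chi2

# Hypothetical counts: 800 and 860 purchasers out of 10,000 users per group
print(z_vs_chi2(800, 10_000, 860, 10_000))
```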


In [9]:
from scipy.stats import norm

# Purchasers (sum of booleans) and users (count) per group
conv_summary = (
    user_funnel_ab
    .assign(purchased=user_funnel_ab["purchase_time"].notna())
    .groupby("ab_group")["purchased"]
    .agg(["sum", "count"])
)

x1, n1 = conv_summary.loc["control", "sum"], conv_summary.loc["control", "count"]
x2, n2 = conv_summary.loc["variant", "sum"], conv_summary.loc["variant", "count"]

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled rate under H₀

# Pooled z statistic and two-sided p-value
z_stat = (p2 - p1) / np.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
p_value_conversion = 2 * (1 - norm.cdf(abs(z_stat)))

z_stat, p_value_conversion


(np.float64(0.917872367892537), np.float64(0.3586856895185506))
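A p-value alone does not show how large an effect the data can rule out. An unpooled (Wald) 95% confidence interval for the difference in conversion rates complements the z-test. The helper below is an illustrative sketch and the example counts are hypothetical; in the notebook it could be called with the x1, n1, x2, n2 values from the z-test cell.

```python
import numpy as np
from scipy.stats import norm

def diff_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """Unpooled (Wald) confidence interval for p2 - p1."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: no assumption that the two rates are equal
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = norm.ppf(1 - alpha / 2)
    diff = p2 - p1
    return diff, diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 800 and 860 purchasers out of 10,000 users per group
diff, lo, hi = diff_proportion_ci(800, 10_000, 860, 10_000)
print(f"difference = {diff:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

The z-test pools the two rates because it is computed under H₀; the interval is not, so it uses the unpooled standard error.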

In [10]:
from scipy.stats import ttest_ind

# Per-user revenue among purchasers in each group (non-purchasers are not included)
rev_control = (
    events_ab[(events_ab["ab_group"] == "control") & (events_ab["event_type"] == "purchase")]
    .groupby("user_id")["price"]
    .sum()
)

rev_variant = (
    events_ab[(events_ab["ab_group"] == "variant") & (events_ab["event_type"] == "purchase")]
    .groupby("user_id")["price"]
    .sum()
)

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_stat, p_value_rpu = ttest_ind(
    rev_control,
    rev_variant,
    equal_var=False,
    nan_policy="omit"
)

t_stat, p_value_rpu


(np.float64(-0.011036252565462094), np.float64(0.991194674201797))
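Per-user revenue is typically right-skewed, so a percentile bootstrap of the difference in means is a common cross-check on Welch's t-test. The sketch below uses simulated lognormal samples as hypothetical stand-ins for rev_control and rev_variant; the function name, the 2,000 resamples, and the seeds are assumptions.

```python
import numpy as np

def bootstrap_mean_diff_ci(a, b, n_boot=2_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a), resampling each group independently."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Draw with replacement, keeping each group's original size
        diffs[i] = rng.choice(b, size=b.size).mean() - rng.choice(a, size=a.size).mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return b.mean() - a.mean(), lo, hi

# Hypothetical skewed revenue samples standing in for rev_control / rev_variant
sim_rng = np.random.default_rng(1)
toy_control = sim_rng.lognormal(mean=5.0, sigma=1.0, size=1_000)
toy_variant = sim_rng.lognormal(mean=5.0, sigma=1.0, size=1_000)

diff, lo, hi = bootstrap_mean_diff_ci(toy_control, toy_variant)
print(f"difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With the notebook's data the call would be bootstrap_mean_diff_ci(rev_control, rev_variant); an interval that comfortably spans zero would agree with the Welch result.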

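### Experiment Results Summary

- Conversion rate: 8.15% (control) vs 8.27% (variant), a difference of about 0.12 percentage points; z ≈ 0.92, p ≈ 0.36
- Revenue per purchasing user (guardrail): 412.25 (control) vs 412.37 (variant); Welch t ≈ -0.011, p ≈ 0.99
- At α = 0.05 neither difference is statistically significant, so we fail to reject H₀ for both metrics

Results from this notebook will be used to make
a final product decision in the next step.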
