✅ Scenario:

You are a product manager for an e-commerce website. You want to test whether changing the "Add to Cart" button color from blue to red increases the conversion rate (percentage of visitors who make a purchase).


In [1]:
import pandas as pd
import numpy as np

# Simulate data for 10,000 users
np.random.seed(42)

# Group A: Control group (Blue button)
control_group = pd.DataFrame({
    'group': 'A',
    'converted': np.random.choice([0, 1], size=5000, p=[0.90, 0.10])  # 10% conversion rate
})

# Group B: Test group (Red button)
test_group = pd.DataFrame({
    'group': 'B',
    'converted': np.random.choice([0, 1], size=5000, p=[0.88, 0.12])  # 12% conversion rate
})

# Combine both groups
ab_test_data = pd.concat([control_group, test_group], ignore_index=True)

# Preview the dataset
print(ab_test_data.head())


  group  converted
0     A          0
1     A          1
2     A          0
3     A          0
4     A          0


In [2]:
# Check the number of users in each group
print(ab_test_data['group'].value_counts())

# Calculate the conversion rate for each group
conversion_rates = ab_test_data.groupby('group')['converted'].mean()
print("\nConversion Rates:\n", conversion_rates)


group
A    5000
B    5000
Name: count, dtype: int64

Conversion Rates:
 group
A    0.0958
B    0.1134
Name: converted, dtype: float64


Formulate Hypotheses

    Null Hypothesis (H₀): There is no difference in conversion rates between the control group (blue button) and the test group (red button).
    Alternative Hypothesis (H₁): The conversion rate for the test group (red button) differs from that of the control group (a two-sided alternative).
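
In symbols, the two hypotheses above can be written as follows. The names $p_A$ and $p_B$ are notation introduced here (not in the original notebook) for the true conversion rates of the blue-button and red-button groups:

```latex
$$
H_0:\; p_A = p_B
\qquad\text{versus}\qquad
H_1:\; p_A \neq p_B
$$
```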
    

In [3]:
from scipy.stats import chi2_contingency

# Create a contingency table
contingency_table = pd.crosstab(ab_test_data['group'], ab_test_data['converted'])

# Perform a chi-square test of independence
# (for a 2x2 table, SciPy applies Yates' continuity correction by default)
chi2, p_value, _, _ = chi2_contingency(contingency_table)

print("\nChi-Square Test Statistic:", chi2)
print("P-value:", p_value)



Chi-Square Test Statistic: 8.081458194442725
P-value: 0.004472044654175628


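Interpret the Results

The p-value (about 0.0045) is below the 0.05 significance level, so we reject the null hypothesis: the difference in conversion rates between the two groups is statistically significant.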
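
As a cross-check on the chi-square result, the same comparison can be sketched as a pooled two-proportion z-test. This is a minimal sketch using only the standard library, not part of the original analysis; the helper name `two_proportion_ztest` is hypothetical, and the counts 479 and 567 are the conversions implied by the rates printed earlier (0.0958 and 0.1134 of 5,000 users each).

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for the difference of two proportions.

    Hypothetical helper for illustration; not from the original notebook.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled proportion under H0 (both groups share one conversion rate)
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Conversion counts implied by the printed rates: 0.0958 * 5000 and 0.1134 * 5000
z, p_value = two_proportion_ztest(479, 5000, 567, 5000)
print("z = {:.3f}, z^2 = {:.3f}, p = {:.4f}".format(z, z * z, p_value))
```

For a 2x2 table, z squared equals the chi-square statistic computed without a continuity correction. Here z squared is about 8.27, slightly above the 8.08 reported by `chi2_contingency`, because SciPy applies Yates' correction by default. Both versions lead to the same conclusion at the 5% level.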


In [4]:
control_conversion_rate = conversion_rates['A']
test_conversion_rate = conversion_rates['B']

# Calculate relative lift: percentage change of the test rate over the control rate
lift = (test_conversion_rate - control_conversion_rate) / control_conversion_rate * 100
print("\nLift: {:.2f}%".format(lift))



Lift: 18.37%


In [5]:
# Number of users in each group
n_control = control_group.shape[0]
n_test = test_group.shape[0]

# Conversion counts
conv_control = control_group['converted'].sum()
conv_test = test_group['converted'].sum()

# Proportions
p_control = conv_control / n_control
p_test = conv_test / n_test

# Standard error of the difference in proportions (unpooled)
se = np.sqrt(p_control * (1 - p_control) / n_control + p_test * (1 - p_test) / n_test)

# 95% confidence interval for the difference (1.96 is the normal critical value)
lower_bound = (p_test - p_control) - 1.96 * se
upper_bound = (p_test - p_control) + 1.96 * se

print("\n95% Confidence Interval: [{:.4f}, {:.4f}]".format(lower_bound, upper_bound))



95% Confidence Interval: [0.0056, 0.0296]
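
The interval computed in the cell above is the normal-approximation (Wald) interval for the difference in conversion rates. Written out as a formula, with $\hat{p}_A$, $\hat{p}_B$ as the observed rates and $n_A$, $n_B$ as the group sizes (notation introduced here for illustration):

```latex
$$
(\hat{p}_B - \hat{p}_A) \;\pm\; 1.96
\sqrt{\frac{\hat{p}_A (1 - \hat{p}_A)}{n_A} + \frac{\hat{p}_B (1 - \hat{p}_B)}{n_B}}
$$

$$
0.0176 \;\pm\; 1.96 \times 0.0061 \;\approx\; [0.0056,\; 0.0296]
$$
```

The interval is in absolute terms: the red button's conversion rate is estimated to be between roughly 0.56 and 2.96 percentage points higher than the blue button's. Because the interval excludes zero, it agrees with the significant test result.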
