# A/B Testing for Business Decision Making


#### A/B Testing: Comparing proportions to assess business strategies.

Example Problem: You are testing two versions of a webpage (A and B) to compare their conversion rates. Webpage A had 2000 visitors with 300 conversions (15.0%), while Webpage B had 1800 visitors with 330 conversions (18.3%). Is Webpage B significantly better?

```python
from statsmodels.stats.proportion import proportions_ztest

# Conversions and visitors for each page
success_a, success_b = 300, 330
n_a, n_b = 2000, 1800

# Two-sided, pooled two-proportion z-test of p_A - p_B
stat, p_value = proportions_ztest([success_a, success_b], [n_a, n_b])
print(f"Z-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

```text
Z-Statistic: -2.76, P-Value: 0.0058
```
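To see what `proportions_ztest` computes, the sketch below reproduces the statistic by hand. It uses the pooled-proportion formula, which is what statsmodels applies to two samples under its defaults (`value=None`, `prop_var=False`): `z = (p_A - p_B) / sqrt(p_pool * (1 - p_pool) * (1/n_A + 1/n_B))`. The statistic is negative because A's rate is subtracted from B's, following the order in which the counts were passed. The helper name `pooled_two_proportion_z` is hypothetical.

```python
import math
from scipy.stats import norm

def pooled_two_proportion_z(success_a, n_a, success_b, n_b):
    """Hypothetical helper: two-sided z-test of p_a - p_b with a pooled proportion."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion, estimated under the null hypothesis p_a == p_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

z, p = pooled_two_proportion_z(300, 2000, 330, 1800)
print(f"Z-Statistic: {z:.2f}, P-Value: {p:.4f}")
```

Running this should reproduce the values printed by statsmodels above.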


```python
if p_value < 0.05:
    print("Reject the null hypothesis: Webpage B performs significantly better.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")
```

```text
Reject the null hypothesis: Webpage B performs significantly better.
```
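The question asks whether B is *better*, which is a directional hypothesis. The test above is two-sided. `proportions_ztest` accepts an `alternative` argument. Because the counts are passed as `[success_a, success_b]`, the test concerns p_A - p_B, so `alternative="smaller"` tests H1: p_A < p_B. This is a minimal sketch, and it assumes the direction was chosen before the data were seen.

```python
from statsmodels.stats.proportion import proportions_ztest

success_a, success_b = 300, 330
n_a, n_b = 2000, 1800

# One-sided test: the alternative hypothesis is p_A < p_B (B converts better)
stat, p_one_sided = proportions_ztest(
    [success_a, success_b], [n_a, n_b], alternative="smaller"
)
print(f"Z-Statistic: {stat:.2f}, One-sided P-Value: {p_one_sided:.4f}")
```

The z-statistic is unchanged. The one-sided p-value is half the two-sided one, because the observed difference lies in the hypothesized direction.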


# One-Sample t-Test for Mean Comparison


#### One-Sample t-Test: Validating claims about a population mean.

Example Problem: Your company claims that the average delivery time for online orders is 30 minutes. A random sample of 50 deliveries has an average time of 32 minutes with a standard deviation of 5 minutes. Is the claim accurate?

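```python
import numpy as np
from scipy.stats import ttest_1samp

# Raw delivery times are not given, so simulate 50 deliveries with
# mean 32 and standard deviation 5. No seed is set, so results vary per run.
sample_times = np.random.normal(32, 5, 50)
population_mean = 30  # the claimed average delivery time (minutes)

stat, p_value = ttest_1samp(sample_times, population_mean)
print(f"T-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

```text
T-Statistic: 4.79, P-Value: 0.0000
```

Because the sample is random, these numbers change on every run. A P-Value shown as 0.0000 is below 0.00005 and has been rounded.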


```python
if p_value < 0.05:
    print("Reject the null hypothesis: The average delivery time differs significantly from 30 minutes.")
else:
    print("Fail to reject the null hypothesis: No evidence that the average delivery time differs from 30 minutes.")
```

```text
Reject the null hypothesis: The average delivery time differs significantly from 30 minutes.
```
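The problem gives summary statistics (n = 50, mean 32, standard deviation 5) rather than raw delivery times. The test can therefore be run directly from those figures, which avoids the randomness of simulating a sample. The sketch below computes `t = (x̄ - μ0) / (s / √n)` and a two-sided p-value from the t distribution with n - 1 degrees of freedom. It assumes the 5-minute figure is the sample standard deviation.

```python
import math
from scipy.stats import t

sample_mean, sample_std, n = 32, 5, 50
claimed_mean = 30

# Standard error of the mean, using the sample standard deviation
se = sample_std / math.sqrt(n)
t_stat = (sample_mean - claimed_mean) / se
# Two-sided p-value with n - 1 degrees of freedom
p_value = 2 * t.sf(abs(t_stat), df=n - 1)
print(f"T-Statistic: {t_stat:.2f}, P-Value: {p_value:.4f}")
```

With these figures t is about 2.83 and the p-value falls below 0.01. The 30-minute claim is therefore rejected at the 5% level without any simulation.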


# Two-Sample t-Test for Comparing Means


#### Two-Sample t-Test: Comparing means between two independent groups.

Example Problem: You want to compare the average sales of two stores (Store A and Store B). Store A’s sales data has a mean of $5000 with a standard deviation of $700 (50 observations), while Store B’s sales data has a mean of $5200 with a standard deviation of $750 (45 observations). Are the sales significantly different?

```python
import numpy as np
from scipy.stats import ttest_ind

# Summary statistics from the problem: mean, standard deviation, sample size
mean_a, std_a, n_a = 5000, 700, 50
mean_b, std_b, n_b = 5200, 750, 45

# Simulate sales observations for each store from those summaries
np.random.seed(42)
sales_a = np.random.normal(mean_a, std_a, n_a)
sales_b = np.random.normal(mean_b, std_b, n_b)

# Student's t-test (ttest_ind assumes equal variances by default)
stat, p_value = ttest_ind(sales_a, sales_b)
print(f"T-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

```text
T-Statistic: -2.88, P-Value: 0.0049
```


```python
if p_value < 0.05:
    print("Reject the null hypothesis: The average sales are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference in sales.")
```

```text
Reject the null hypothesis: The average sales are significantly different.
```
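The test above runs on a single simulated draw, so its result depends on the seed. `scipy.stats.ttest_ind_from_stats` works directly from the stated means, standard deviations, and sample sizes. This sketch also sets `equal_var=False` (Welch's t-test), which does not assume that the two stores share a variance. By contrast, `ttest_ind` defaults to `equal_var=True`. Treating 700 and 750 as sample standard deviations is an assumption.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the problem statement
mean_a, std_a, n_a = 5000, 700, 50
mean_b, std_b, n_b = 5200, 750, 45

# Welch's t-test: does not assume equal variances in the two stores
stat, p_value = ttest_ind_from_stats(
    mean_a, std_a, n_a, mean_b, std_b, n_b, equal_var=False
)
print(f"T-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

With these summaries the statistic is about -1.34 and the p-value is well above 0.05. This suggests the significant result above comes from sampling variation in the simulated data, not from the stated figures themselves.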


# Chi-Square Test for Independence


#### Chi-Square Test: Assessing associations between categorical variables.

Example Problem: You are analyzing customer preferences based on two variables: Gender (Male/Female) and Preferred Product (Product A/Product B). Is there a significant association between gender and product preference?

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Contingency table of observed counts: rows are gender, columns are product
data = {'Product A': [50, 60], 'Product B': [30, 40]}
df = pd.DataFrame(data, index=['Male', 'Female'])

# Returns the statistic, p-value, degrees of freedom and expected counts
stat, p_value, dof, expected = chi2_contingency(df)
print(f"Chi-Square Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

```text
Chi-Square Statistic: 0.04, P-Value: 0.8508
```


```python
if p_value < 0.05:
    print("Reject the null hypothesis: Gender and product preference are associated.")
else:
    print("Fail to reject the null hypothesis: No significant association.")
```

```text
Fail to reject the null hypothesis: No significant association.
```
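For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`). The sketch below reproduces the statistic by hand:

- Each expected count is row total × column total / grand total.
- The corrected statistic sums `(|O - E| - 0.5)² / E`, with each gap shrunk by at most 0.5 and never below zero.

```python
import numpy as np
from scipy.stats import chi2

# Rows: Male, Female; columns: Product A, Product B
observed = np.array([[50, 30],
                     [60, 40]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Expected counts under independence: row total * column total / grand total
expected = row_totals * col_totals / grand_total

# Yates' continuity correction: shrink each |O - E| by 0.5, but not below zero
adjusted = np.maximum(np.abs(observed - expected) - 0.5, 0)
stat = (adjusted ** 2 / expected).sum()

dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)
print(expected.round(2))
print(f"Chi-Square Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

Passing `correction=False` to `chi2_contingency` drops the adjustment and gives a larger statistic (about 0.12 for this table). The conclusion is unchanged.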


# ANOVA for Comparing Multiple Groups


#### ANOVA: Comparing means across multiple groups.


Example Problem: You are comparing the average monthly sales of three regions (North, South, and West). Generate sales data and check if there is a significant difference in sales across regions.

```python
import numpy as np
from scipy.stats import f_oneway

# Simulate 30 monthly sales figures per region (seeded for reproducibility)
np.random.seed(42)
north_sales = np.random.normal(5000, 500, 30)
south_sales = np.random.normal(5200, 600, 30)
west_sales = np.random.normal(4800, 400, 30)

# One-way ANOVA: the null hypothesis is that all three regional means are equal
stat, p_value = f_oneway(north_sales, south_sales, west_sales)
print(f"F-Statistic: {stat:.2f}, P-Value: {p_value:.4f}")
```

```text
F-Statistic: 3.64, P-Value: 0.0304
```


```python
if p_value < 0.05:
    print("Reject the null hypothesis: At least one region has significantly different sales.")
else:
    print("Fail to reject the null hypothesis: No significant difference in sales across regions.")
```

```text
Reject the null hypothesis: At least one region has significantly different sales.
```
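The F-statistic compares variation between the group means with variation within the groups: `F = (SS_between / (k - 1)) / (SS_within / (N - k))`, where k is the number of groups and N the total number of observations. The sketch below computes F by hand on the same simulated data (same seed and draw order) and checks it against `f_oneway`. The helper name `one_way_anova_f` is hypothetical.

```python
import numpy as np
from scipy import stats

def one_way_anova_f(*groups):
    """Hypothetical helper: one-way ANOVA F-statistic and p-value."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n_total = len(groups), all_values.size
    # Between-group sum of squares: group size times squared gap to grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)
    return f_stat, p_value

# Regenerate the same simulated regional sales
np.random.seed(42)
north_sales = np.random.normal(5000, 500, 30)
south_sales = np.random.normal(5200, 600, 30)
west_sales = np.random.normal(4800, 400, 30)

f_manual, p_manual = one_way_anova_f(north_sales, south_sales, west_sales)
f_scipy, p_scipy = stats.f_oneway(north_sales, south_sales, west_sales)
print(f"Manual F: {f_manual:.2f}, SciPy F: {f_scipy:.2f}")
```

A large F means the group means are spread out relative to the noise within each region. The test does not say *which* region differs.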
