## Introduction

# A/B Testing Overview

## 1. What is A/B Testing?
- **Definition**: A/B testing is a method of comparing two versions of a variable (A and B) to determine which one performs better.
- **Purpose**: Used to test changes to a web page, app, or product to optimize for a desired outcome (e.g., click-through rate, conversion rate).

## 2. Key Terminology
- **Hypothesis**: A prediction about which version (A or B) will perform better.
- **Control (A)**: The original version that serves as a baseline.
- **Variant (B)**: The modified version being tested against the control.
- **Conversion**: The desired action (e.g., signing up, purchasing).
- **Conversion Rate**: Percentage of users who complete the conversion out of total visitors.

## 3. Steps in A/B Testing
1. **Define Goal**: What metric are you trying to improve?
2. **Identify Hypothesis**: State what you think will improve with the change.
3. **Determine Sample Size**: How many participants are needed for statistically valid results?
4. **Divide Sample**: Randomly assign users to either group A or group B.
5. **Run Test**: Display each version and collect data on user actions.
6. **Analyze Results**: Compare conversion rates and use statistical tests to determine significance.
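
Step 4 is often implemented with deterministic hashing so that a returning user always lands in the same group. The following is a minimal sketch, assuming string user IDs and a 50/50 split; the function name `assign_group` and the experiment label `button_color` are hypothetical.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "button_color") -> str:
    """Deterministically assign a user to 'A' (control) or 'B' (variant)."""
    # Hash the experiment name together with the user ID so that
    # different experiments produce independent splits.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the hash to a bucket in [0, 100) and split 50/50.
    bucket = int(digest, 16) % 100
    return "A" if bucket < 50 else "B"

# Hypothetical user IDs, used only to check that the split is roughly even.
groups = [assign_group(f"user_{i}") for i in range(10_000)]
share_b = groups.count("B") / len(groups)
print(f"Share assigned to B: {share_b:.3f}")
```

Because the assignment depends only on the ID and the experiment label, no lookup table is needed to keep a user's experience consistent across visits.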

## 4. Statistical Analysis
- **Significance Level (α)**: Commonly set at 0.05; represents the probability of rejecting the null hypothesis when it's true.
- **P-Value**: The probability of observing a result at least as extreme as the one measured, assuming the null hypothesis is true. A p-value < α suggests the result is statistically significant.
- **Confidence Interval (CI)**: Range within which the true effect size is expected to lie with a given confidence level (e.g., 95%).

## 5. Types of Hypothesis Tests
- **Two-Tailed Test**: Used when you want to detect any difference in performance.
- **One-Tailed Test**: Used when you expect a specific direction of improvement.
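
The choice between the two tests changes the p-value for the same data. A small sketch, assuming a z statistic has already been computed (the value 1.80 is hypothetical):

```python
from scipy.stats import norm

z = 1.80  # hypothetical test statistic (variant minus control, in standard errors)

# One-tailed: probability of a statistic at least this large in the expected direction.
p_one_tailed = norm.sf(z)
# Two-tailed: probability of a statistic at least this extreme in either direction.
p_two_tailed = 2 * norm.sf(abs(z))

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

With this value the one-tailed p-value falls below 0.05 while the two-tailed one does not, which is why the direction of the test should be fixed before looking at the data.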

## 6. Metrics to Measure
- **Absolute Conversion Rate**: (Conversions / Total visitors) x 100%
- **Relative Conversion Rate**: (Conversion Rate B - Conversion Rate A) / Conversion Rate A x 100%
- **Lift**: Improvement seen in variant B over control A, usually reported as the relative conversion rate above.
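
The formulas above can be sketched as two small helpers. The function names and the visitor counts are hypothetical and only illustrate the arithmetic:

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Absolute conversion rate as a percentage."""
    return conversions / visitors * 100

def relative_lift(rate_a: float, rate_b: float) -> float:
    """Relative change of B over A, as a percentage of A's rate."""
    return (rate_b - rate_a) / rate_a * 100

# Hypothetical counts for illustration only.
rate_a = conversion_rate(200, 10_000)  # control
rate_b = conversion_rate(230, 10_000)  # variant

print(f"A: {rate_a:.2f}%  B: {rate_b:.2f}%")
print(f"Absolute difference: {rate_b - rate_a:.2f} percentage points")
print(f"Relative lift: {relative_lift(rate_a, rate_b):.1f}%")
```

Keeping the absolute difference (percentage points) and the relative lift (percent of the baseline) as separate quantities avoids a common reporting mix-up.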

## 7. Tools for A/B Testing
- **Common Tools**: Optimizely, VWO, Google Optimize (discontinued by Google in September 2023), and custom implementations in Python, R, etc.

## 8. Best Practices
- **Only Test One Variable at a Time**: Isolate the variable for reliable results.
- **Ensure Randomization**: Randomly assign users to prevent bias.
- **Monitor Duration**: Run tests long enough to capture meaningful data.
- **Avoid Peeking**: Checking results too early can lead to incorrect conclusions.
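
To see why peeking is risky, the sketch below simulates A/A tests (both groups share the same true rate) and compares a single final analysis with stopping at the first "significant" interim look. All parameters (true rate, number of looks, sample sizes, seed) are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, p_true = 0.05, 0.2          # both groups share the same true rate (A/A test)
n_sims, n_per_group, n_looks = 1000, 5000, 10
z_crit = norm.ppf(1 - alpha / 2)
# Sample sizes at which the experimenter checks the results.
looks = np.linspace(n_per_group // n_looks, n_per_group, n_looks, dtype=int)

def z_stat(a: np.ndarray, b: np.ndarray) -> float:
    """Pooled two-proportion z statistic for two 0/1 arrays of equal length."""
    n = len(a)
    p_pool = (a.sum() + b.sum()) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
    return (b.mean() - a.mean()) / se if se > 0 else 0.0

false_pos_final = 0    # reject only at the planned sample size
false_pos_peeking = 0  # reject if any look (including the last) crosses the threshold
for _ in range(n_sims):
    a = rng.binomial(1, p_true, n_per_group)
    b = rng.binomial(1, p_true, n_per_group)
    z_at_looks = [abs(z_stat(a[:n], b[:n])) for n in looks]
    false_pos_final += z_at_looks[-1] > z_crit
    false_pos_peeking += max(z_at_looks) > z_crit

print("False-positive rate, single final look:", false_pos_final / n_sims)
print("False-positive rate, stopping at first significant look:", false_pos_peeking / n_sims)
```

Since the final look is one of the looks, the peeking rate can never be lower than the single-look rate; in practice it is usually noticeably higher than α.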

## 9. Limitations
- **Sample Bias**: Ensure sample represents the population for accurate results.
- **External Factors**: Seasonality, marketing campaigns, or other external influences may affect results.
- **Limited Scope**: Results may not generalize beyond the tested population or time frame.

---

### Example Hypothesis
- **Hypothesis**: "Changing the button color to green will increase the click-through rate by 5%."

---

### Sample Code for A/B Test in Python
```python
from scipy.stats import ttest_ind

# Assume each list holds conversion counts per period (e.g., per day)
# for control (A) and variant (B)
control_conversions = [10, 12, 15, 20]
variant_conversions = [12, 14, 18, 25]

# Two-sample t-test to check if there's a significant difference in means
t_stat, p_value = ttest_ind(control_conversions, variant_conversions)
print("P-Value:", p_value)
```

## Concepts from LunarTech

### 1. A/B Basics:
Also called split testing. Used in business to test:
1. new UX features
2. new versions of a product
3. new versions of an algorithm


### 2. Definitions
- **Control Group**: exposed to one version, typically the current version, of the product
- **Experimental Group**: exposed to the second, or new, version of the product


## 3. A/B Testing Process
1. Hypothesis of A/B Test
2. A/B Test Design / Power Analysis
3. Run the A/B Test
4. Result Analysis / Statistical Significance
5. Result Analysis / Practical Significance

## 4. Hypothesis and Primary Metric (Step 1)
The business hypothesis describes which two products are being compared and what impact or difference is desired for the business:
- how to fix a potential issue in the product
- how the solution will influence the key performance indicators (KPIs)

Example: changing the color of a button to improve customer engagement.

### Primary Metric
Measures the performance of the product being tested in the A/B test for the experimental and control groups. It is used to identify whether there is a **statistically significant difference** between these two groups.

**NB**
- There should be a single primary metric.
- It must answer the metric validity question.

### Choosing Primary Metric
**Metric Validity Question**: If this chosen metric were to increase significantly while everything else stays constant, would we achieve our goal and address the problem?
- higher revenue?
- higher engagement?
- more views?

### Revenue Primary Metric

**Conversion Rate** = (Number of conversions (purchases made) / Number of total visitors) x 100%

### Engagement Primary Metric

**CTR (Click-Through Rate)** = (Number of clicks / Number of impressions) x 100%

### Statistical Hypothesis / Hypothesis Testing
A statistical procedure used to determine whether there is a significant difference between the observed data and the expected data:
- test the results of an experiment
- establish statistical significance
- state the hypothesis you aim to reject as the Null Hypothesis (H_0)
- state the hypothesis you aim to accept as the Alternative Hypothesis (H_1)

**Example**
- Null: the CTR of the blue **Learn More** button is equal to the CTR of the green button
- Alternative: the CTR of the blue **Learn More** button is larger than the CTR of the green button
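
Written formally, the example is a one-sided test. The symbols $p_{\text{blue}}$ and $p_{\text{green}}$ are introduced here for illustration and denote the true click-through rates of the two buttons:

$$
H_0:\; p_{\text{blue}} = p_{\text{green}}
\qquad\text{vs.}\qquad
H_1:\; p_{\text{blue}} > p_{\text{green}}
$$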

## 5. A/B Test Design (Step 2)
### Power Analysis
1. Determine the **power** of the test
   - probability of correctly rejecting the null hypothesis when it is false
   - probability of not making a Type II error (1 - beta)
   - beta: probability of a Type II error
   - common to pick 80% as the **power** of the A/B test

**(1 - beta)**: power of the test; **beta**: probability of a Type II error
### Significance Level
2. Determine the **significance level** of the test
   - probability of rejecting the null hypothesis while the null is true
   - detecting statistical significance when there is none
   - probability of making a Type I error (alpha)
   - common to pick 5% as the **significance level** of the test

**alpha**: probability of a Type I error, i.e., the significance level
### Minimum Detectable Effect
3. Determine the **minimum detectable effect** of the test
   - What counts as practical (substantive) significance for the business, as opposed to statistical significance?
   - A proxy for the smallest effect that would matter in practice
   - No common level; it depends on the business ask

**delta**: minimum detectable effect
## Calculating Min Sample Size
1. The primary metric of the A/B test is in the form of a binary variable
2. The primary metric of the test is in the form of proportions or averages

See: "Complete Guide to A/B testing Design, Implementation and Pitfalls"
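
For a binary primary metric, power, significance level, and minimum detectable effect combine into a per-group sample size. The sketch below uses the standard normal-approximation formula for a two-sided two-proportion z-test; it is not taken from the guide cited above, and the function name and the baseline and delta values are assumptions.

```python
import math
from scipy.stats import norm

def min_sample_size(p_con: float, delta: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    p_exp = p_con + delta                  # rate under the alternative
    z_alpha = norm.ppf(1 - alpha / 2)      # critical value for the significance level
    z_beta = norm.ppf(power)               # quantile matching the desired power
    variance_sum = p_con * (1 - p_con) + p_exp * (1 - p_exp)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / delta ** 2)

# Hypothetical inputs: 20% baseline click rate, 2-percentage-point MDE.
n_per_group = min_sample_size(p_con=0.20, delta=0.02)
print("Minimum users per group:", n_per_group)
```

Halving delta roughly quadruples the required sample, which is why the minimum detectable effect should be agreed with the business before the test starts.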
 

## 6. Run the A/B Test (Step 3)
## 7. Results Analysis (Step 4)
**Calculating the min sample size**

- $H_0: \mu_{con} = \mu_{exp}$
- $H_1: \mu_{con} \neq \mu_{exp}$

**A/B Test Duration**

#### Duration = N / (# visitors per day)
- Test duration too short: novelty effects
- Test duration too long: maturation effects
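
The duration formula can be sketched as a small helper that rounds up to whole days. The function name and the traffic figures are hypothetical, and N is assumed to be the total sample across both groups.

```python
import math

def duration_in_days(n_required: int, visitors_per_day: int) -> int:
    """Days needed to collect n_required users at a given daily traffic."""
    return math.ceil(n_required / visitors_per_day)

# Hypothetical numbers: 20,000 users in total, 1,500 eligible visitors per day.
days = duration_in_days(n_required=20_000, visitors_per_day=1_500)
print("Planned duration (days):", days)
```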



## Practice
```python
import numpy as np
import pandas as pd
from scipy.stats import norm


# ----------------------------- Simulating Click Data for A/B Testing ------------------------------#
N_exp = 10000
N_con = 10000

# Generating Click Data
click_exp = pd.Series(np.random.binomial(1, 0.5, size=N_exp))
click_con = pd.Series(np.random.binomial(1, 0.2, size=N_con))

# Generate Group Identifier
exp_id = pd.Series(np.repeat("exp", N_exp))
con_id = pd.Series(np.repeat("con", N_con))

df_exp = pd.concat([click_exp, exp_id], axis=1)
df_con = pd.concat([click_con, con_id], axis=1)

df_exp.columns = ["click", "group"]
df_con.columns = ["click", "group"]

df_ab_test = pd.concat([df_exp, df_con], axis=0).reset_index(drop=True)
print(df_ab_test)
```


```
       click group
0          0   exp
1          0   exp
2          1   exp
3          1   exp
4          1   exp
...      ...   ...
19995      1   con
19996      1   con
19997      1   con
19998      1   con
19999      1   con

[20000 rows x 2 columns]
```


```python
df_ab_test['group'].unique()
```

```
array(['exp', 'con'], dtype=object)
```

```python
df_ab_test['click'].unique()
```

```python
# ----------------------------- Statistical Significance in A/B Testing ------------------------------#
# calculating the total number of clicks per group by summing 1's
X_con = df_ab_test.groupby("group")["click"].sum().loc["con"]
X_exp = df_ab_test.groupby("group")["click"].sum().loc["exp"]

# printing this for visibility
print(df_ab_test.groupby("group")["click"].sum())
print("Number of CLicks in Control: ", X_con)
print("Number of CLicks in Experimental: ", X_exp)

# statistical significance level of the test
alpha = 0.05
print("Alpha: significance level is:", alpha )

# computing the estimate of click probability per group
p_con_hat = X_con/N_con
p_exp_hat = X_exp/N_exp
print("Click Probability in Control Group:", p_con_hat)
print("Click Probability in Experimental Group:", p_exp_hat)

# computing the estimate of pooled click probability
p_pooled_hat = (X_con+X_exp)/(N_con + N_exp)

# computing the estimate of pooled variance
pooled_variance = p_pooled_hat * (1-p_pooled_hat) * (1/N_con + 1/N_exp)
print("p^_pooled is: ", p_pooled_hat)
print("pooled_variance is: ", pooled_variance)

# computing the standard error of the test
SE = np.sqrt(pooled_variance)
print("Standard Error is: ", SE)

# computing the test statistic of the Z-test
# (control minus experimental, so it is negative when the experimental group does better)
Test_stat = (p_con_hat - p_exp_hat)/SE
print("Test Statistics for 2-sample Z-test is:", Test_stat)

# critical value of the two-sided Z-test at significance level alpha
Z_crit = norm.ppf(1-alpha/2)
print("Z-critical value from Standard Normal distribution: ", Z_crit)

# two-sided p-value, based on the absolute value of the test statistic
p_value = 2 * norm.sf(abs(Test_stat))
print("P-value of the 2-sample Z-test: ",round(p_value,3))

# confidence interval for the difference (experimental minus control), reusing the pooled SE
CI = [round((p_exp_hat - p_con_hat) - SE*Z_crit,3), round((p_exp_hat - p_con_hat) + SE*Z_crit,3)]
print("Confidence Interval of the 2 sample Z-test is: ", CI)
```




```
group
con    2037
exp    4922
Name: click, dtype: int32
Number of CLicks in Control:  2037
Number of CLicks in Experimental:  4922
Alpha: significance level is: 0.05
Click Probability in Control Group: 0.2037
Click Probability in Experimental Group: 0.4922
p^_pooled is:  0.34795
pooled_variance is:  4.53761595e-05
Standard Error is:  0.006736182858266245
Test Statistics for 2-sample Z-test is: -42.82840980867524
Z-critical value from Standard Normal distribution:  1.959963984540054
P-value of the 2-sample Z-test:  0.0
Confidence Interval of the 2 sample Z-test is:  [0.275, 0.302]
```


## A/B Test Insights

### 1. Significant Increase in Click-Through Rate
- The experimental group’s click-through rate (49.22%) is significantly higher than the control group’s rate (20.37%).
- This difference indicates that the change implemented in the experimental setup (e.g., a new feature, design, or wording) positively impacted user engagement.

### 2. Strong Statistical Significance
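- The test yielded a Z-score of -42.83, whose absolute value far exceeds the Z-critical threshold of 1.96 for a 5% significance level. The sign is negative only because the statistic was computed as control minus experimental.
- The p-value rounds to 0.0, well below the 0.05 threshold, meaning a difference this large would be very unlikely if both groups had the same underlying click rate.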

### 3. Robust Confidence Interval
- The 95% confidence interval for the difference in click rates is [0.275, 0.302], suggesting with high confidence that the experimental setup improves the click rate by 27.5 to 30.2 percentage points.
- This entirely positive interval supports the reliability of the experimental group's improved performance.
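
The interval reported above reuses the pooled standard error from the hypothesis test. A common alternative for the interval uses the unpooled standard error, which does not assume equal click rates. The sketch below is a cross-check under that assumption, not the notebook's method; the counts are copied from the output above.

```python
import numpy as np
from scipy.stats import norm

# Counts reported in the notebook output above.
X_con, N_con = 2037, 10000
X_exp, N_exp = 4922, 10000
alpha = 0.05

p_con_hat = X_con / N_con
p_exp_hat = X_exp / N_exp
diff = p_exp_hat - p_con_hat

# Unpooled standard error: each group contributes its own variance.
se_unpooled = np.sqrt(p_con_hat * (1 - p_con_hat) / N_con
                      + p_exp_hat * (1 - p_exp_hat) / N_exp)
z_crit = norm.ppf(1 - alpha / 2)
ci_unpooled = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print("Difference in click rates:", round(diff, 4))
print("95% CI with unpooled SE:", tuple(round(x, 4) for x in ci_unpooled))
```

With samples this large the two intervals differ only slightly, so the conclusion does not change.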

### 4. Practical Impact
- The experimental setup not only shows statistical significance but also a meaningful practical impact, with a click-through rate more than double (about 2.4 times) that of the control group.
- Implementing the change could lead to substantial improvements in user engagement, conversions, or other relevant metrics.
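
One way to make the practical-significance check explicit is to compare the lower bound of the confidence interval with the minimum detectable effect. In the sketch below the interval comes from the analysis above, while the threshold `delta` and the helper name are hypothetical; the real threshold depends on the business ask.

```python
# Interval taken from the analysis above; delta is a hypothetical business threshold.
ci_lower, ci_upper = 0.275, 0.302   # 95% CI for the difference in click rates
delta = 0.10                        # assumed minimum detectable effect (10 percentage points)

def is_practically_significant(ci_lower: float, delta: float) -> bool:
    """True when even the CI's lower bound meets the minimum effect that matters."""
    return ci_lower >= delta

print("Practically significant:", is_practically_significant(ci_lower, delta))
```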

## Recommendations
- **Roll Out the Change**: Given the strong statistical and practical significance, it’s advisable to adopt the experimental setup widely.
- **Further Testing**: Consider testing variations of the successful experimental setup to refine and optimize its effectiveness further.
- **Monitor Performance**: After deployment, continuously monitor click-through rates to ensure sustained improvement, adjusting for any seasonal or external influences over time.
