### 🎯 The goal of this exercise is to perform an internal validity check (sanity check) to detect whether there is a sample ratio mismatch (SRM). To do this, we will run a chi-square goodness-of-fit test.

#### About AdSmartABdata dataframe

**Columns Description:**

- **auction_id**: the unique id of the online user who has been presented the BIO. In standard terminology this is called an impression id. The user may see the BIO questionnaire but choose not to respond. In that case both the yes and no columns are zero.
- **experiment**: which group the user belongs to, control or exposed.
  - **control**: users who have been shown a dummy ad.
  - **exposed**: users who have been shown a creative, an online interactive ad, with the SmartAd brand.
- **date**: the date in YYYY-MM-DD format
- **hour**: the hour of the day in HH format.
- **device_make**: the name of the type of device the user has e.g. Samsung
- **platform_os**: the id of the OS the user has.
- **browser**: the name of the browser the user uses to see the BIO questionnaire.
- **yes**: 1 if the user chooses the “Yes” radio button for the BIO questionnaire.
- **no**: 1 if the user chooses the “No” radio button for the BIO questionnaire.

_link:_ https://www.kaggle.com/datasets/osuolaleemmanuel/ad-ab-testing

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
df = pd.read_csv('AdSmartABdata.csv')
df.head()

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0


#### Verify user uniqueness in the dataframe to make sure we are not counting the same client multiple times.

In [5]:
print('are there any duplicates?', df['auction_id'].count() != df['auction_id'].nunique())

are there any duplicates? False


#### Check whether the control and treatment groups follow the expected allocation ratio (50%-50%).

In [6]:
experiment_allocation = df.value_counts('experiment', normalize=True)
experiment_allocation

experiment
control    0.504024
exposed    0.495976
Name: proportion, dtype: float64

In [7]:
control_perc = round(experiment_allocation.iloc[0],4)
exposed_perc = round(experiment_allocation.iloc[1],4)
control_n = df.loc[df['experiment']=='control']['experiment'].count()
exposed_n = df.loc[df['experiment']=='exposed']['experiment'].count()
total_n = control_n + exposed_n

# The allocation values are proportions (0.504), so format them as percentages
print(f'control: {control_n} -> {control_perc:.2%}')
print(f'exposed: {exposed_n} -> {exposed_perc:.2%}')

control: 4071 -> 50.40%
exposed: 4006 -> 49.60%


#### There is a small difference in size between the control and exposed groups, which could indicate a sample ratio mismatch (SRM), so we run a chi-square goodness-of-fit test to determine whether the difference is statistically significant.

In [8]:
# H0: The observed traffic allocation matches the designed 50/50 split (no SRM)
# H1: The observed traffic allocation differs from the designed split (SRM)

from scipy.stats import chisquare

observed = [control_n, exposed_n]
expected = [total_n/2, total_n/2]
alpha = 0.01

In [9]:
chi = chisquare(observed, f_exp=expected)

print(chi)
if chi[1] < alpha:
    print("SRM may be present")
else:
    print("SRM likely not present")

Power_divergenceResult(statistic=np.float64(0.5230902562832735), pvalue=np.float64(0.4695264353014863))
SRM likely not present


#### ✅ We fail to reject the null hypothesis, meaning that the observed distribution does not differ significantly from the expected one.
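
To make the test result easier to audit, the statistic can be reproduced by hand. The following is a minimal sketch using only the standard library and the group sizes printed earlier (4071 and 4006). It relies on the fact that, with two groups, the test has one degree of freedom, so the chi-square survival function reduces to `erfc(sqrt(x / 2))`. It is a cross-check of the SciPy call, not a replacement for it.

```python
import math

# Group sizes reported earlier in this notebook.
control_n, exposed_n = 4071, 4006
total_n = control_n + exposed_n

observed = [control_n, exposed_n]
expected = [total_n / 2, total_n / 2]  # designed 50/50 split

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two groups give 1 degree of freedom, where the chi-square survival
# function equals erfc(sqrt(x / 2)), so SciPy is not needed here.
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"statistic = {statistic:.4f}, p-value = {p_value:.4f}")
```

Both values should agree with the `chisquare` output above (statistic of about 0.5231, p-value of about 0.4695), which is well above `alpha = 0.01`.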

#### A/A tests:
- Presents an identical experience to two groups of users
- Reveals bugs in experimental setup
- No statistically significant differences between the metrics are expected
- False positives can still happen at the specified _alpha_ (e.g. 5% of the time when _alpha_ = 0.05)
- Reveals imbalances in distributions across groups (e.g. browsers, devices, etc.)
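
The point about false positives can be illustrated with a small simulation. The sketch below is hypothetical and not part of the original analysis: it simulates many A/A tests with a correct 50/50 randomizer and counts how often the SRM check would still reject at `alpha = 0.05`. The number of users per test, the number of tests, and the seed are arbitrary choices made for illustration. The value 3.841 is the chi-square critical value for one degree of freedom at the 5% level.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

CRITICAL = 3.841  # chi-square critical value for alpha = 0.05, 1 degree of freedom
N_USERS = 1000    # hypothetical number of users per simulated A/A test
N_TESTS = 2000    # hypothetical number of simulated A/A tests


def srm_statistic(control_n, exposed_n):
    """Chi-square goodness-of-fit statistic against a 50/50 split."""
    expected = (control_n + exposed_n) / 2
    return ((control_n - expected) ** 2 + (exposed_n - expected) ** 2) / expected


false_positives = 0
for _ in range(N_TESTS):
    # Assign each user to control with probability 0.5 (a correct randomizer).
    control_n = sum(random.random() < 0.5 for _ in range(N_USERS))
    if srm_statistic(control_n, N_USERS - control_n) > CRITICAL:
        false_positives += 1

false_positive_rate = false_positives / N_TESTS
print(f"false positive rate: {false_positive_rate:.3f}")
```

Even though nothing is wrong with the randomizer, the rejection rate should land close to 5%, which is why a single significant A/A result is not, on its own, proof of a broken setup.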

The browser breakdown by group below shows clear distribution imbalances. For example, Chrome Mobile WebView accounts for about 7% of control users but about 30% of exposed users, and Facebook for about 14% versus 5%. This could indicate improper randomization or a bug in the experimental setup. Because such an imbalance may invalidate the results, the analysis should not proceed with this data, and it may be necessary to regenerate the groups.
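
One way to quantify such an imbalance is a chi-square test of independence between group and browser. The sketch below is a simplified illustration rather than the original analysis: it tests a single browser (Chrome Mobile WebView) against all others in a 2x2 table. The user counts are an assumption, back-calculated from the proportions in the browser breakdown of this notebook (0.071727 and 0.298802) and the group sizes (4071 and 4006). It uses the Pearson statistic without continuity correction, and a full check would compare all browser categories rather than one picked after looking at the data.

```python
import math

# Group sizes from this notebook.
control_n, exposed_n = 4071, 4006

# Chrome Mobile WebView users per group, back-calculated (assumption) from
# the printed proportions and rounded to whole users.
control_webview = round(0.071727 * control_n)
exposed_webview = round(0.298802 * exposed_n)

# 2x2 contingency table: rows = group, columns = (WebView, other browser).
table = [
    [control_webview, control_n - control_webview],
    [exposed_webview, exposed_n - exposed_webview],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Pearson chi-square test of independence (no continuity correction):
# expected count = row total * column total / grand total.
statistic = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        statistic += (observed - expected) ** 2 / expected

# A 2x2 table has 1 degree of freedom, so the p-value is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"statistic = {statistic:.1f}, p-value = {p_value:.3g}")
```

A p-value far below the chosen alpha would support the reading that the browser mix differs between groups by more than chance allows.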

In [11]:
df.groupby('experiment')['browser'].value_counts(normalize=True)

experiment  browser                   
control     Chrome Mobile                 0.591992
            Facebook                      0.137804
            Samsung Internet              0.120855
            Chrome Mobile WebView         0.071727
            Mobile Safari                 0.060427
            Chrome Mobile iOS             0.008352
            Mobile Safari UI/WKWebView    0.007369
            Pinterest                     0.000491
            Android                       0.000246
            Chrome                        0.000246
            Opera Mini                    0.000246
            Puffin                        0.000246
exposed     Chrome Mobile                 0.535197
            Chrome Mobile WebView         0.298802
            Samsung Internet              0.082876
            Facebook                      0.050674
            Mobile Safari                 0.022716
            Chrome Mobile iOS             0.004244
            Mobile Safari UI/WKWebView    0