*In God we trust; everyone else must bring data.*

The importance of data is unquestionable. Evolution and change are the essence of life, and A/B testing is handy for ensuring that we are heading in the right direction rather than taking one step forward and two steps backward.

A/B testing (also known as split testing or bucket testing) is a methodology for comparing two versions of a webpage or app against each other to determine which one performs better.

We aim to get reliable information on how a change in experience or strategy affects the dependent variable.

## Task
Marketing companies want to run successful campaigns, but the market is complex and several options can work. So they normally run A/B tests: a randomized experimentation process in which two or more versions of a variable (web page, page element, banner, etc.) are shown to different segments of people at the same time to determine which version has the greatest impact and drives business metrics.

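The companies are interested in answering two questions:

1. Would the campaign be successful?
2. If the campaign was successful, how much of that success could be attributed to the ads?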

The majority of the people will be exposed to ads (the experimental group). And a small portion of people (the control group) would instead see a Public Service Announcement (PSA) (or nothing) in the exact size and place the ad would normally be.
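A minimal sketch of how such a random split could be produced. The function name `assign_group`, the 90% ad share, and the seed are assumptions chosen for illustration; the dataset does not state how its assignment was done.

```python
import random

def assign_group(user_ids, ad_share=0.9, seed=42):
    """Randomly assign each user to 'ad' (experimental) or 'psa' (control)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return {uid: ("ad" if rng.random() < ad_share else "psa") for uid in user_ids}

# Hypothetical user IDs 0..9999
assignment = assign_group(range(10_000))
n_ad = sum(1 for group in assignment.values() if group == "ad")
print(f"ad: {n_ad}, psa: {len(assignment) - n_ad}")
```

Because the assignment depends only on a random draw, and not on anything about the user, differences in conversion between the two groups can be attributed to the ad exposure rather than to who was selected.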

The idea of the dataset is to analyze the groups, find if the ads were successful, how much the company can make from the ads, and if the difference between the groups is statistically significant.
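As a rough illustration of what "analyzing the groups" means in practice, the sketch below computes the conversion rate per group and the relative lift of the ad group over the PSA group. It uses a tiny made-up frame that only borrows the dataset's column names (`test group`, `converted`); the numbers are not taken from the real data.

```python
import pandas as pd

# Toy data with the same column names as the dataset; the values are made up.
toy = pd.DataFrame({
    "test group": ["ad"] * 6 + ["psa"] * 4,
    "converted": [True, False, True, False, False, True, False, True, False, False],
})

# The mean of a boolean column is the share of True values, i.e. the conversion rate.
summary = toy.groupby("test group")["converted"].agg(
    users="count", conversions="sum", rate="mean"
)

# Relative lift of the ad group over the psa (control) group.
lift = summary.loc["ad", "rate"] / summary.loc["psa", "rate"] - 1

print(summary)
print(f"Relative lift: {lift:.1%}")
```

The lift alone says nothing about whether the difference could be due to chance, which is why a significance test is still needed.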

*Data dictionary:*

    Index: Row index
    user id: User ID (unique)
    test group: "ad" if the person saw the advertisement, "psa" if they only saw the public service announcement
    converted: True if the person bought the product, else False
    total ads: Number of ads seen by the person
    most ads day: Day on which the person saw the most ads
    most ads hour: Hour of the day in which the person saw the most ads
    
*Test key characteristics:*

    Controlled, deliberate experiment
    Based on a randomized set of users
    Defined hypothesis and ability to measure success
    There is a control group with the default experience and at least one test group with a new experience

*Test Process:*

    1. Import packages
    2. EDA
    3. Statistical tests implementation
    4. Analyse the results
    5. Recommend relevant decisions to run the business

### Import necessary Libraries

In [2]:
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv("marketing_AB.csv")
df.head()

,Unnamed: 0,user id,test group,converted,total ads,most ads day,most ads hour
0,0,1069124,ad,False,130,Monday,20
1,1,1119715,ad,False,93,Tuesday,22
2,2,1144181,ad,False,21,Tuesday,18
3,3,1435133,ad,False,355,Tuesday,10
4,4,1015700,ad,False,276,Friday,14


In [4]:
df.info()  # inspect dtypes and non-null counts (this does not check whether 'user id' is unique)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 588101 entries, 0 to 588100
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Unnamed: 0     588101 non-null  int64 
 1   user id        588101 non-null  int64 
 2   test group     588101 non-null  object
 3   converted      588101 non-null  bool  
 4   total ads      588101 non-null  int64 
 5   most ads day   588101 non-null  object
 6   most ads hour  588101 non-null  int64 
dtypes: bool(1), int64(4), object(2)
memory usage: 27.5+ MB
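`df.info()` reports dtypes and non-null counts, but it does not say whether the IDs repeat. One possible way to check the uniqueness claimed in the data dictionary is sketched below on a small made-up frame; only the column name `user id` comes from the dataset.

```python
import pandas as pd

# Toy frame using the dataset's "user id" column name; the IDs are made up.
toy = pd.DataFrame({
    "user id": [101, 102, 103, 103],
    "converted": [False, True, False, False],
})

ids_unique = toy["user id"].is_unique              # True only if no ID repeats
n_duplicates = toy["user id"].duplicated().sum()   # rows whose ID appeared earlier

print(f"user id unique: {ids_unique}, duplicated rows: {n_duplicates}")
```

On the real data, the same two expressions applied to `df["user id"]` (before the column is dropped) would confirm or refute the uniqueness assumption.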


In [5]:
# Drop the leftover index column and the unneeded user id column
df.drop(columns = ['Unnamed: 0', 'user id'], inplace = True)
df.head()

,test group,converted,total ads,most ads day,most ads hour
0,ad,False,130,Monday,20
1,ad,False,93,Tuesday,22
2,ad,False,21,Tuesday,18
3,ad,False,355,Tuesday,10
4,ad,False,276,Friday,14


### Exploratory Data Analysis

In [6]:
df.shape

(588101, 5)

In [7]:
df.dtypes

test group       object
converted          bool
total ads         int64
most ads day     object
most ads hour     int64
dtype: object

In [8]:
df.isna().sum()

test group       0
converted        0
total ads        0
most ads day     0
most ads hour    0
dtype: int64

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 588101 entries, 0 to 588100
Data columns (total 5 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   test group     588101 non-null  object
 1   converted      588101 non-null  bool  
 2   total ads      588101 non-null  int64 
 3   most ads day   588101 non-null  object
 4   most ads hour  588101 non-null  int64 
dtypes: bool(1), int64(2), object(2)
memory usage: 18.5+ MB


In [10]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
total ads,588101.0,24.820876,43.715181,1.0,4.0,13.0,27.0,2065.0
most ads hour,588101.0,14.469061,4.834634,0.0,11.0,14.0,18.0,23.0


In [11]:
# Unique values in each categorical column
cat_column = ["test group", "most ads day", "converted"]
for col in cat_column:
    coll = col.capitalize()
    print(f"{coll}:" ,df[col].unique())

Test group: ['ad' 'psa']
Most ads day: ['Monday' 'Tuesday' 'Friday' 'Saturday' 'Wednesday' 'Sunday' 'Thursday']
Converted: [False  True]


In [12]:
# convert 'Converted' to integer
df['converted'] = df['converted'].astype(int)
df.dtypes

test group       object
converted         int32
total ads         int64
most ads day     object
most ads hour     int64
dtype: object

## Statistical Testing and Interpretation 

Since we expect the test group to have a higher conversion rate than the control group, a one-tailed test is appropriate. In this case I am only interested in knowing whether the ads are a better alternative to the PSA; if not, we stop the ads and channel our resources into other investments.

In cases where it is paramount to know whether the test group performs worse or better (that is, whether there is any significant difference between the two), a two-tailed test is appropriate, i.e. H0: M1 = M2.

Here M1 is the conversion rate of the test group (ad) and M2 is the conversion rate of the control group (psa).

*H0*: M1 <= M2. The test group's conversion rate is less than or equal to the control group's.

*H1*: M1 > M2. The test group performs significantly better.

In [13]:
# Split the data into the control (psa) and test (ad) groups
control_group = df[df['test group'] == 'psa']
test_group = df[df['test group'] == 'ad']

t_stat, p_val = stats.ttest_ind(test_group['converted'], control_group['converted'], alternative='greater')
print(f"Test Group Analysis - T-statistic: {t_stat}, P-value:{p_val}")

Test Group Analysis - T-statistic: 7.37040597428566, P-value:8.516526313915632e-14
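Because `converted` is a 0/1 outcome, comparing the group means amounts to comparing two proportions. A two-proportion z-test is a common alternative to the t-test in that setting; it is not what was run above. The sketch below implements the one-sided version by hand. The function name and the counts are hypothetical and are not taken from the dataset.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H1: rate_a > rate_b, using the pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # rate under H0 (no difference)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 1 - NormalDist().cdf(z)                  # upper-tail probability
    return z, p_value

# Hypothetical counts: 260 of 10,000 converted in group A, 180 of 10,000 in group B.
z, p = two_proportion_ztest(conv_a=260, n_a=10_000, conv_b=180, n_b=10_000)
print(f"z = {z:.3f}, one-sided p-value = {p:.2e}")
```

With samples as large as the ones in this dataset, the z-test and the t-test would usually be expected to lead to the same decision.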


##### Acceptance criteria

In [14]:
if p_val < 0.05:
    print("Reject the null hypothesis: the 'ad' group converts at a significantly higher rate than the 'psa' group")
else:
    print("Fail to reject the null hypothesis: no evidence that the 'ad' group converts at a higher rate than the 'psa' group")

Reject the null hypothesis: the 'ad' group converts at a significantly higher rate than the 'psa' group
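A p-value says whether a difference is detectable, not how large it is. One way to complement the decision rule is a confidence interval for the difference in conversion rates. The sketch below uses a simple Wald interval with an unpooled standard error. The function name and the counts are hypothetical, and the interval is two-sided even though the test above is one-sided.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald confidence interval for rate_a - rate_b (unpooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for 95%
    diff = p_a - p_b
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 260 of 10,000 converted in group A, 180 of 10,000 in group B.
low, high = diff_confidence_interval(260, 10_000, 180, 10_000)
print(f"95% CI for the difference in conversion rate: [{low:.4f}, {high:.4f}]")
```

An interval that excludes zero agrees with rejecting the null hypothesis, and its width shows how precisely the uplift is estimated.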


## One-way ANOVA test

### Most ads Day

A one-way ANOVA (Analysis of Variance) test is appropriate when you want to compare the means of three or more independent groups to determine if at least one group mean is significantly different from the others.

>Assumptions of ANOVA:

*Normality*: The data within each group (each day) should be approximately normally distributed.

*Homogeneity of Variances*: The variances across the groups (days) should be roughly equal. When this assumption is violated (heteroscedasticity), tests that assume equal variances, such as the one-way ANOVA, may give misleading results, in particular an increased chance of a Type I error (falsely detecting a difference when there isn't one). This can be checked with tests like Levene's test.

*Independence*: The observations should be independent of each other.
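For reference, this is roughly what a one-way ANOVA looks like when its assumptions do hold. The sketch uses three synthetic, normally distributed groups with equal variance; the group names, sizes, and parameters are made up and have nothing to do with the marketing data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Three synthetic groups drawn from normal distributions with equal variance.
# The third group's mean is shifted so there is a real difference to detect.
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=10.0, scale=2.0, size=200)
group_c = rng.normal(loc=12.0, scale=2.0, size=200)

# Check the equal-variance assumption first, then run the ANOVA itself.
lev_stat, lev_p = stats.levene(group_a, group_b, group_c)
f_stat, p_val = stats.f_oneway(group_a, group_b, group_c)

print(f"Levene: W = {lev_stat:.3f}, p = {lev_p:.3f}")
print(f"ANOVA:  F = {f_stat:.3f}, p = {p_val:.3e}")
```

A small ANOVA p-value only says that at least one group mean differs; it does not say which one.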

#### Check the normality of the data using Shapiro-Wilk

In [15]:
normality_results = {
    day: stats.shapiro(df[df['most ads day'] == day]['converted'])
    for day in df['most ads day'].unique()
}

# Print normality test results
for day, result in normality_results.items():
    print(f"Normality test for {day} - W-statistic: {result.statistic}, P-value: {result.pvalue}")




Normality test for Monday - W-statistic: 0.16918176412582397, P-value: 0.0
Normality test for Tuesday - W-statistic: 0.15851646661758423, P-value: 0.0
Normality test for Friday - W-statistic: 0.12890702486038208, P-value: 0.0
Normality test for Saturday - W-statistic: 0.12406885623931885, P-value: 0.0
Normality test for Wednesday - W-statistic: 0.13990962505340576, P-value: 0.0
Normality test for Sunday - W-statistic: 0.13806015253067017, P-value: 0.0
Normality test for Thursday - W-statistic: 0.1262480616569519, P-value: 0.0



*P-value of 0.0*: A p-value < 0.05 indicates strong evidence against the null hypothesis of normality. This suggests that the conversion data for each day is not normally distributed, which is expected because `converted` only takes the values 0 and 1. Note that SciPy documents that the Shapiro-Wilk p-value may not be accurate for samples larger than 5000, so the exact value should not be over-interpreted.

*W-statistic*: The W-statistic values (ranging from about 0.12 to 0.17) are quite low, which further indicates that the data deviates significantly from a normal distribution.

*Conclusion*: Given the results of the Shapiro-Wilk test, it would be advisable to either switch to a non-parametric test like the Kruskal-Wallis test or apply data transformations if we wish to stick with ANOVA. We can use Levene's test to check equality of variance.

#### Levene's test  (check equality of variance)

Null Hypothesis (H0): The variances across the different groups (days of the week) are equal.

Alternative Hypothesis (H1): At least one group has a variance different from the others.

In [16]:

# Group the data by 'most ads day' and extract the 'converted' values
groups = [group['converted'].values for name, group in df.groupby('most ads day')]

# Performing Levene's test
levene_result = stats.levene(*groups)

# Print the results
print(f"Levene's Test - W-statistic: {levene_result.statistic}, P-value: {levene_result.pvalue}")

Levene's Test - W-statistic: 68.38818386898062, P-value: 1.8032007655359558e-85


The p-value is extremely small (much less than 0.05), which indicates strong evidence against the null hypothesis. Thus, we reject the null hypothesis and conclude that the variances differ significantly across at least some of the groups.

Since the assumptions of ANOVA are not met, *it is recommended that we proceed with a non-parametric alternative, the Kruskal-Wallis test, to compare the conversion rates across the days of the week.*

#### Kruskal-Wallis test
Null Hypothesis (H0): The medians of the conversion rates across Most ads Days are equal.

Alternative Hypothesis (H1): At least one group has a different median.

In [17]:
# Performing Kruskal-Wallis test
kruskal_result = stats.kruskal(*groups)

# Print the results
print(f"Kruskal-Wallis Test - H-statistic: {kruskal_result.statistic}, P-value: {kruskal_result.pvalue}")


Kruskal-Wallis Test - H-statistic: 410.0471885500196, P-value: 1.9328515571262067e-85


*P-value*: The extremely small p-value (much less than 0.05) indicates strong evidence against the null hypothesis. Therefore, we reject the null hypothesis: there are significant differences in the medians of the conversion rates across the days on which the most ads were seen. The day on which the most ads were seen significantly affects the conversion rate.
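Since `converted` is binary, another commonly used option for comparing conversion across categories is a chi-square test of independence on the day-by-outcome contingency table. This is an alternative technique, not the test run above. The sketch below uses made-up counts for three days and borrows only the column names from the dataset.

```python
import pandas as pd
from scipy import stats

# Toy data: 100 users per day with made-up conversion counts (10, 12 and 30).
toy = pd.DataFrame({
    "most ads day": ["Monday"] * 100 + ["Tuesday"] * 100 + ["Friday"] * 100,
    "converted": [1] * 10 + [0] * 90 + [1] * 12 + [0] * 88 + [1] * 30 + [0] * 70,
})

# Rows are days, columns are the outcome (0 = not converted, 1 = converted).
table = pd.crosstab(toy["most ads day"], toy["converted"])

chi2, p_val, dof, expected = stats.chi2_contingency(table)

print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_val:.2e}")
```

A per-day conversion rate table (`toy.groupby("most ads day")["converted"].mean()`) is a natural follow-up to see which days drive the difference.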

### Most Ads Hours

Before using the one-way ANOVA test, we first check its three assumptions.

#### Normality Test (Shapiro Wilk)

    H0: The distribution of the conversion rate follows a normal distribution
    H1: The distribution of the conversion rate does not follow a normal distribution

In [18]:

normality_results = {hour: stats.shapiro(df[df['most ads hour'] == hour]['converted']) for hour in range(24)}

# Print results
for hour, result in normality_results.items():
    print(f"Hour {hour} - W-statistic: {result.statistic}, P-value: {result.pvalue}")




Hour 0 - W-statistic: 0.11272329092025757, P-value: 0.0
Hour 1 - W-statistic: 0.08687299489974976, P-value: 0.0
Hour 2 - W-statistic: 0.056504905223846436, P-value: 0.0
Hour 3 - W-statistic: 0.0741877555847168, P-value: 0.0
Hour 4 - W-statistic: 0.09831768274307251, P-value: 0.0
Hour 5 - W-statistic: 0.12354159355163574, P-value: 0.0
Hour 6 - W-statistic: 0.12899833917617798, P-value: 0.0
Hour 7 - W-statistic: 0.11133384704589844, P-value: 0.0
Hour 8 - W-statistic: 0.11750525236129761, P-value: 0.0
Hour 9 - W-statistic: 0.11610382795333862, P-value: 0.0
Hour 10 - W-statistic: 0.12603217363357544, P-value: 0.0
Hour 11 - W-statistic: 0.1285075545310974, P-value: 0.0
Hour 12 - W-statistic: 0.13546788692474365, P-value: 0.0
Hour 13 - W-statistic: 0.1388574242591858, P-value: 0.0
Hour 14 - W-statistic: 0.15190786123275757, P-value: 0.0
Hour 15 - W-statistic: 0.15781790018081665, P-value: 0.0
Hour 16 - W-statistic: 0.1618971824645996, P-value: 0.0
Hour 17 - W-statistic: 0.15245842933654785, 

With p-value < 0.05, we reject the null hypothesis. The distribution does not follow a normal distribution.


#### Categorizing the hour of the day

In [19]:
# Define a function to categorize hours into time slots
def categorize_hour(hour):
    if 5 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 21:
        return 'Evening'
    else:
        return 'Night'

# Apply the function to create a new column in the DataFrame
df['time_slot'] = df['most ads hour'].apply(categorize_hour)

# Display the updated DataFrame (pandas truncates the printout to the first and last rows)
print(df)

       test group  converted  total ads most ads day  most ads hour  time_slot
0              ad          0        130       Monday             20    Evening
1              ad          0         93      Tuesday             22      Night
2              ad          0         21      Tuesday             18    Evening
3              ad          0        355      Tuesday             10    Morning
4              ad          0        276       Friday             14  Afternoon
...           ...        ...        ...          ...            ...        ...
588096         ad          0          1      Tuesday             23      Night
588097         ad          0          1      Tuesday             23      Night
588098         ad          0          3      Tuesday             23      Night
588099         ad          0          1      Tuesday             23      Night
588100         ad          0          1      Tuesday             23      Night

[588101 rows x 6 columns]
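On a frame of this size, a row-by-row `apply` can be slow. A vectorized variant of the same bucketing is sketched below with `np.select`. The sample hours are made up, and the boundaries mirror those in `categorize_hour` above for integer hours.

```python
import numpy as np
import pandas as pd

# Made-up sample of hours covering every boundary of the time slots.
hours = pd.Series([0, 4, 5, 11, 12, 16, 17, 20, 21, 23], name="most ads hour")

# Series.between is inclusive on both ends, so between(5, 11) matches 5 <= hour < 12
# for integer hours.
conditions = [
    hours.between(5, 11),
    hours.between(12, 16),
    hours.between(17, 20),
]
labels = ["Morning", "Afternoon", "Evening"]

# Any hour matching none of the conditions falls back to 'Night'.
time_slot = pd.Series(
    np.select(conditions, labels, default="Night"),
    index=hours.index,
    name="time_slot",
)
print(pd.concat([hours, time_slot], axis=1))
```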


In [None]:
# Run Shapiro-Wilk once per distinct time slot (iterate over the unique values, not over every row)
normality_results = {slot: stats.shapiro(df[df['time_slot'] == slot]['converted']) for slot in df['time_slot'].unique()}

# Print results
for slot, result in normality_results.items():
    print(f"Time slot {slot} - W-statistic: {result.statistic}, P-value: {result.pvalue}")

