
# A/B Testing for ShoeFly.com

__Project Description:__
Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform and on each day of the week. Help them analyze the data using aggregate measures.

In [None]:
import pandas as pd

ad_clicks = pd.read_csv('ad_clicks.csv')
ad_clicks.head(5)

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B


__Q1:__ Which ad platform is getting the most views?

In [None]:
clicks_by_source = ad_clicks.groupby('utm_source').user_id.count().reset_index()
print(clicks_by_source)

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215


> __A1:__ Google had the largest number of ad views, with a total of 680.
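
As a cross-check on the grouped count above, the same tally can be sketched with `value_counts()`, which also sorts the platforms from most to least viewed. The `toy_views` frame below is made-up illustration data, not the real `ad_clicks.csv`.

```python
import pandas as pd

# Toy data, invented for illustration only (not the real ad_clicks.csv).
toy_views = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "utm_source": ["google", "email", "google", "facebook", "google", "facebook"],
})

# value_counts() counts rows per platform, sorted in descending order.
views_by_source = toy_views["utm_source"].value_counts()

# idxmax() returns the label (platform) with the largest count.
top_source = views_by_source.idxmax()

print(views_by_source)
print("Most views:", top_source)
```

On the real data, the equivalent call would presumably be `ad_clicks.utm_source.value_counts()`.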

__Q2:__ Create a new column `is_click` that shows True if `ad_click_timestamp` is not null and False otherwise.

In [None]:
ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()
ad_clicks.head(5)

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A,True
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
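
An equivalent way to build the flag, shown here as a small sketch, is `notna()`, which avoids negating `isnull()` with `~`. The `toy` frame and its timestamps are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not the real ad_clicks.csv).
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "ad_click_timestamp": ["7:18", np.nan, "14:02"],
})

# notna() is True wherever a timestamp exists, i.e. wherever the user clicked.
toy["is_click"] = toy["ad_click_timestamp"].notna()

# Confirm that it matches the negated-isnull() form used in the notebook.
same = toy["is_click"].equals(~toy["ad_click_timestamp"].isnull())

print(toy)
print("Matches ~isnull():", same)
```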


__Q3:__ We now want to know the percentage of people who clicked on the ads from each source. First, we need to group by `utm_source` and `is_click`.

In [None]:
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()
clicks_by_source

Unnamed: 0,utm_source,is_click,user_id
0,email,False,175
1,email,True,80
2,facebook,False,324
3,facebook,True,180
4,google,False,441
5,google,True,239
6,twitter,False,149
7,twitter,True,66


> Now, let's pivot the table to make it easier to compare the values.

In [None]:
clicks_pivot = clicks_by_source.pivot(columns='is_click', index='utm_source', values='user_id')
clicks_pivot

utm_source,False,True
email,175,80
facebook,324,180
google,441,239
twitter,149,66


> Let's find the percentage of clicks out of the total ads shown.

In [None]:
clicks_pivot['percent_clicked'] = round(clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False]) * 100, 2)
clicks_pivot

utm_source,False,True,percent_clicked
email,175,80,31.37
facebook,324,180,35.71
google,441,239,35.15
twitter,149,66,30.7
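
Because `is_click` is boolean, its mean within each group is already the fraction of users who clicked. The sketch below uses that shortcut instead of the group, pivot, and divide steps above. The `toy` frame is invented illustration data.

```python
import pandas as pd

# Toy data, invented for illustration only (not the real ad_clicks.csv).
toy = pd.DataFrame({
    "utm_source": ["email", "email", "email", "email", "google", "google"],
    "is_click": [True, False, False, False, True, True],
})

# The mean of a boolean column treats True as 1 and False as 0,
# so it equals clicks / (clicks + non-clicks) for each source.
percent_clicked = (toy.groupby("utm_source")["is_click"].mean() * 100).round(2)

print(percent_clicked)
```

The pivot approach in the notebook has the advantage of keeping the raw click and non-click counts visible next to the percentage, which this shortcut drops.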


__Q4:__ Let's assess how many users were shown Ad A or Ad B.

In [None]:
clicks_by_group = ad_clicks.groupby(['experimental_group']).user_id.count().reset_index()
clicks_by_group

Unnamed: 0,experimental_group,user_id
0,A,827
1,B,827


In [None]:
clicks_by_group = ad_clicks.groupby(['experimental_group', 'utm_source']).user_id.count().reset_index()
clicks_by_group_pivot = clicks_by_group.pivot(columns='experimental_group', index='utm_source', values='user_id')
clicks_by_group_pivot

utm_source,A,B
email,121,134
facebook,254,250
google,349,331
twitter,103,112


> __A4:__ Roughly the same number of people were shown Ad A and Ad B on each source. Overall, 827 people were shown Ad A and 827 people were shown Ad B.
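
The per-source and overall counts can also be produced in a single table. The sketch below uses `pd.crosstab` with `margins=True`, which appends an `All` row and column holding the totals. The `toy` frame is invented illustration data.

```python
import pandas as pd

# Toy data, invented for illustration only (not the real ad_clicks.csv).
toy = pd.DataFrame({
    "utm_source": ["email", "email", "google", "google", "google", "twitter"],
    "experimental_group": ["A", "B", "A", "A", "B", "B"],
})

# Rows are sources, columns are experimental groups, cells are user counts.
# margins=True adds an "All" row and column with the totals.
group_counts = pd.crosstab(
    toy["utm_source"], toy["experimental_group"], margins=True
)

print(group_counts)
```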

__Q5:__ Check the percentage of users who clicked on each ad, A or B.

In [None]:
clicks_by_group = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
clicks_by_group
clicks_by_group_pivot = clicks_by_group.pivot(columns='is_click', index='experimental_group', values='user_id').reset_index()
clicks_by_group_pivot

is_click,experimental_group,False,True
0,A,517,310
1,B,572,255


In [None]:
clicks_by_group_pivot['percentage_clicks'] = round(clicks_by_group_pivot[True] / (clicks_by_group_pivot[True] + clicks_by_group_pivot[False]) * 100, 2)
clicks_by_group_pivot

is_click,experimental_group,False,True,percentage_clicks
0,A,517,310,37.48
1,B,572,255,30.83


> __A5:__ It appears that users who were shown Ad A clicked more often (37.48%) than those who were shown Ad B (30.83%).

__Q6:__ Check how each ad performs on the different days of the week.

In [None]:
# Split the data by experimental group so each ad can be analyzed separately
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']

In [None]:
a_clicks_pivot = a_clicks.groupby(['is_click','day']).user_id.count().reset_index().pivot(columns='is_click', index='day',values='user_id').reset_index()
a_clicks_pivot['percent_clicked'] = round(a_clicks_pivot[True] / (a_clicks_pivot[True] + a_clicks_pivot[False]) * 100, 2)
a_clicks_pivot

is_click,day,False,True,percent_clicked
0,1 - Monday,70,43,38.05
1,2 - Tuesday,76,43,36.13
2,3 - Wednesday,86,38,30.65
3,4 - Thursday,69,47,40.52
4,5 - Friday,77,51,39.84
5,6 - Saturday,73,45,38.14
6,7 - Sunday,66,43,39.45


In [None]:
b_clicks_pivot = b_clicks.groupby(['is_click','day']).user_id.count().reset_index().pivot(columns='is_click', index='day',values='user_id').reset_index()
b_clicks_pivot['percent_clicked'] = round(b_clicks_pivot[True] / (b_clicks_pivot[True] + b_clicks_pivot[False]) * 100, 2)
b_clicks_pivot

is_click,day,False,True,percent_clicked
0,1 - Monday,81,32,28.32
1,2 - Tuesday,74,45,37.82
2,3 - Wednesday,89,35,28.23
3,4 - Thursday,87,29,25.0
4,5 - Friday,90,38,29.69
5,6 - Saturday,76,42,35.59
6,7 - Sunday,75,34,31.19
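
To compare the two ads day by day without switching between two tables, the click rates can be placed side by side. The sketch below groups by day and experimental group together and then unstacks the group level into columns. The `toy` frame and the `A_minus_B` column name are invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration only (not the real ad_clicks.csv).
toy = pd.DataFrame({
    "day": ["1 - Monday"] * 4 + ["2 - Tuesday"] * 4,
    "experimental_group": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "is_click": [True, True, True, False, False, False, True, False],
})

# Mean of the boolean flag per (day, group) is the click rate;
# unstack() moves the group level into columns A and B.
rate_by_day = (
    toy.groupby(["day", "experimental_group"])["is_click"]
    .mean()
    .unstack("experimental_group")
    * 100
)

# A positive difference means Ad A had the higher click rate that day.
rate_by_day["A_minus_B"] = rate_by_day["A"] - rate_by_day["B"]

print(rate_by_day)
```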


## Conclusion:
After comparing how Ad A and Ad B performed, Ad A looks like the stronger ad. Overall, 37.48% of the users shown Ad A clicked, versus 30.83% for Ad B, and Ad A had the higher click rate on every day of the week *(except for Tuesday)*. Ad A therefore performed better overall and should be considered the better option for ShoeFly.com.