ShoeFly.com A/B Testing

Our favorite online shoe store, ShoeFly.com, is running an A/B test. They have two versions of an ad, which they have placed in emails and in banner ads on Facebook, Twitter, and Google. They want to know how the two ads perform on each platform and on each day of the week. Help them analyze the data using aggregate measures.

In [10]:
# Import pandas and load the data

import pandas as pd

ad_clicks = pd.read_csv("ad_clicks.csv")
ad_clicks.head()

                                user_id  utm_source           day  ad_click_timestamp  experimental_group
0  008b7c6c-7272-471e-b90e-930d548bd8d7      google  6 - Saturday                7:18                   A
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557    facebook    7 - Sunday                 NaN                   B
2  00f5d532-ed58-4570-b6d2-768df5f41aed     twitter   2 - Tuesday                 NaN                   A
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439      google   2 - Tuesday                 NaN                   B
4  012137e6-7ae7-4649-af68-205b4702169c    facebook    7 - Sunday                 NaN                   B


Task 1 -> Your manager wants to know which ad platform is getting you the most views.

In [11]:
ad_platform_views = ad_clicks.groupby('utm_source').user_id.count().reset_index()
print(ad_platform_views)

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215


Task 2 -> If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed. Create a new column which is True if ad_click_timestamp is not null and False otherwise.

In [12]:
# notnull() is True when a timestamp exists, i.e. the user clicked the ad
ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()
print(ad_clicks.groupby('is_click').user_id.count().reset_index())

   is_click  user_id
0     False     1089
1      True      565
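
The difference between notnull() and isnull() is easy to check on a tiny frame. The sketch below uses a hypothetical hand-made table named toy (its rows are made up for illustration and are not taken from ad_clicks.csv) and builds both flags side by side.

```python
import pandas as pd

# Hypothetical toy rows for illustration, not taken from ad_clicks.csv
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "ad_click_timestamp": ["7:18", None, None, "12:05"],
})

# notnull() is True where a timestamp exists, i.e. the user clicked
toy["is_click"] = toy.ad_click_timestamp.notnull()

# isnull() is the exact opposite: True where there is no timestamp
toy["no_click"] = toy.ad_click_timestamp.isnull()

print(toy)
```

Only the notnull() column matches the definition in Task 2: True when the ad was clicked.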


Task 3 -> Your manager wants to know the percent of people who clicked on ads from each utm_source.

In [13]:
# Create Clicks Table
clicks_by_source = ad_clicks.groupby(['utm_source','is_click']).user_id.count().reset_index().pivot(index = 'utm_source', columns = 'is_click', values = "user_id")
print(clicks_by_source)

is_click    False  True 
utm_source              
email         175     80
facebook      324    180
google        441    239
twitter       149     66


In [15]:
# Calculate Percentage
clicks_by_source['Percent Clicked'] = clicks_by_source[True] / (clicks_by_source[True] + clicks_by_source[False])
clicks_by_source

is_click    False  True  Percent Clicked
utm_source                              
email         175    80         0.313725
facebook      324   180         0.357143
google        441   239         0.351471
twitter       149    66         0.306977
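
The share of users who clicked can also be computed without a pivot, because the mean of a boolean column is the fraction of True values. This is a minimal sketch on a hypothetical toy table (made-up rows, not from ad_clicks.csv); the column name percent_clicked is an assumption chosen for the example.

```python
import pandas as pd

# Hypothetical toy rows for illustration, not taken from ad_clicks.csv
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "utm_source": ["email", "email", "google", "google", "google", "google"],
    "ad_click_timestamp": ["7:18", None, "9:30", None, None, None],
})
toy["is_click"] = toy.ad_click_timestamp.notnull()

# The mean of a boolean column equals the fraction of rows that are True
percent_clicked = (
    toy.groupby("utm_source").is_click.mean()
    .reset_index()
    .rename(columns={"is_click": "percent_clicked"})
)
print(percent_clicked)
```

On the toy rows, email has 1 click out of 2 views and google has 1 out of 4, so the two rates are 0.5 and 0.25.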


Task 4 -> The column experimental_group tells us whether the user was shown Ad A or Ad B. Were approximately the same number of people shown both ads?

In [16]:
print(ad_clicks.groupby('experimental_group').user_id.count().reset_index())
# Yes - The same number of people were shown both ads.

  experimental_group  user_id
0                  A      827
1                  B      827


Task 5 -> Using the column is_click that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

In [21]:
A_or_B = ad_clicks.groupby(['is_click','experimental_group']).user_id.count().reset_index().pivot(index = 'experimental_group', columns = 'is_click',values = 'user_id')
print(A_or_B)

is_click            False  True 
experimental_group              
A                     517    310
B                     572    255


In [23]:
A_or_B['Percentage'] = A_or_B[True] / (A_or_B[True] + A_or_B[False])
print(A_or_B)

is_click            False  True  Percentage
experimental_group                         
A                     517   310    0.374849
B                     572   255    0.308343
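
As an alternative to building the count table and dividing by hand, pd.crosstab with normalize="index" returns row shares directly. The sketch below assumes a hypothetical toy table (made-up rows, not from ad_clicks.csv) with the same column names as the real data.

```python
import pandas as pd

# Hypothetical toy rows for illustration, not taken from ad_clicks.csv
toy = pd.DataFrame({
    "experimental_group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "ad_click_timestamp": ["7:18", "8:02", None, None, "9:30", None, None, None],
})
toy["is_click"] = toy.ad_click_timestamp.notnull()

# normalize="index" divides each row by its row total,
# so every row of the result sums to 1
shares = pd.crosstab(toy.experimental_group, toy.is_click, normalize="index")
print(shares)
```

The True column of shares is the fraction of users in each group who clicked.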


Task 6 -> The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.

In [24]:
# Split the data into ad A and ad B
a_clicks = ad_clicks[ad_clicks['experimental_group'] == "A"]
b_clicks = ad_clicks[ad_clicks['experimental_group'] == "B"]

In [27]:
# Analyze ad A by Day
a_clicks_by_day = a_clicks.groupby(['day','is_click']).user_id.count().reset_index()
print(a_clicks_by_day.pivot(index = 'is_click', columns = 'day', values = 'user_id'))

day       1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  5 - Friday  \
is_click                                                                     
False             70           76             86            69          77   
True              43           43             38            47          51   

day       6 - Saturday  7 - Sunday  
is_click                            
False               73          66  
True                45          43  


In [28]:
# Analyze ad B by Day
b_clicks_by_day = b_clicks.groupby(['day','is_click']).user_id.count().reset_index()
print(b_clicks_by_day.pivot(index = 'is_click', columns = 'day', values = 'user_id'))

day       1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  5 - Friday  \
is_click                                                                     
False             81           74             89            87          90   
True              32           45             35            29          38   

day       6 - Saturday  7 - Sunday  
is_click                            
False               76          75  
True                42          34  
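
Raw counts per day are hard to compare between the two ads, so a per-day click rate may be easier to read. This is a minimal sketch, assuming a hypothetical toy table (made-up rows, not from ad_clicks.csv): it groups by day and experimental group, takes the mean of is_click, and puts the two ads side by side.

```python
import pandas as pd

# Hypothetical toy rows for illustration, not taken from ad_clicks.csv
toy = pd.DataFrame({
    "experimental_group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "day": ["1 - Monday", "1 - Monday", "2 - Tuesday", "2 - Tuesday",
            "1 - Monday", "1 - Monday", "2 - Tuesday", "2 - Tuesday"],
    "ad_click_timestamp": ["7:18", None, "8:02", "8:40", None, None, "9:30", None],
})
toy["is_click"] = toy.ad_click_timestamp.notnull()

# Mean of the boolean flag = click rate per (day, group);
# unstack moves the groups into columns so A and B sit side by side
rate_by_day = (
    toy.groupby(["day", "experimental_group"]).is_click.mean()
    .unstack("experimental_group")
)
print(rate_by_day)
```

Each row of rate_by_day is one day, with one column per ad, so the two rates can be compared directly.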
