# A/B Testing for ShoeFly.com

An online shoe store, ShoeFly.com, is running an A/B test. They have two versions of an ad, placed in emails and in banner ads on Facebook, Twitter, and Google. They want to know how the two ads perform on each platform and on each day of the week. Help them analyze the data using aggregate measures.


In [46]:
import pandas as pd

## Importing the dataset
Let's start by importing the dataset from the `ad_clicks.csv` file into a DataFrame.

In [47]:
ad_clicks = pd.read_csv('ad_clicks.csv')

## Analyzing ad sources

1. Examine the first few rows of `ad_clicks`.

In [48]:
ad_clicks.head()

|   | user_id | utm_source | day | ad_click_timestamp | experimental_group |
|---|---|---|---|---|---|
| 0 | 008b7c6c-7272-471e-b90e-930d548bd8d7 | google | 6 - Saturday | 7:18 | A |
| 1 | 009abb94-5e14-4b6c-bb1c-4f4df7aa7557 | facebook | 7 - Sunday | NaN | B |
| 2 | 00f5d532-ed58-4570-b6d2-768df5f41aed | twitter | 2 - Tuesday | NaN | A |
| 3 | 011adc64-0f44-4fd9-a0bb-f1506d2ad439 | google | 2 - Tuesday | NaN | B |
| 4 | 012137e6-7ae7-4649-af68-205b4702169c | facebook | 7 - Sunday | NaN | B |


2. Your manager wants to know which ad platform is getting the most views. How many views (i.e., rows of the table) came from each `utm_source`?

In [49]:
platform_most_views = ad_clicks.groupby('utm_source').user_id.count().sort_values(ascending=False)
platform_most_views

utm_source
google      680
facebook    504
email       255
twitter     215
Name: user_id, dtype: int64
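
Because each row is one view, the same tally can be sketched with `Series.value_counts()`, which counts each distinct value and sorts in descending order by default. The toy frame below is hypothetical and only mimics two columns of `ad_clicks`. Note that `.user_id.count()` skips null `user_id` values while `value_counts()` on `utm_source` counts rows, so the two agree only when `user_id` has no missing values (assumed here).

```python
import pandas as pd

# Hypothetical toy data standing in for ad_clicks (not the real dataset)
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'utm_source': ['google', 'google', 'google', 'facebook', 'facebook', 'email'],
})

# Approach used in the notebook: group, count non-null user_ids, sort descending
via_groupby = toy.groupby('utm_source').user_id.count().sort_values(ascending=False)

# Shortcut: value_counts() counts rows per distinct value, sorted descending
via_value_counts = toy.utm_source.value_counts()

print(via_value_counts)
```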

3. If the column `ad_click_timestamp` is not null, then someone actually clicked on the ad that was displayed. Create a new column called `is_click`, which is True if `ad_click_timestamp` is not null and False otherwise.

In [50]:
ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()
ad_clicks

|   | user_id | utm_source | day | ad_click_timestamp | experimental_group | is_click |
|---|---|---|---|---|---|---|
| 0 | 008b7c6c-7272-471e-b90e-930d548bd8d7 | google | 6 - Saturday | 7:18 | A | True |
| 1 | 009abb94-5e14-4b6c-bb1c-4f4df7aa7557 | facebook | 7 - Sunday | NaN | B | False |
| 2 | 00f5d532-ed58-4570-b6d2-768df5f41aed | twitter | 2 - Tuesday | NaN | A | False |
| 3 | 011adc64-0f44-4fd9-a0bb-f1506d2ad439 | google | 2 - Tuesday | NaN | B | False |
| 4 | 012137e6-7ae7-4649-af68-205b4702169c | facebook | 7 - Sunday | NaN | B | False |
| ... | ... | ... | ... | ... | ... | ... |
| 1649 | fe8b5236-78f6-4192-9da6-a76bba67cfe6 | twitter | 7 - Sunday | NaN | A | False |
| 1650 | fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1 | facebook | 5 - Friday | NaN | B | False |
| 1651 | ff3a22ff-521c-478c-87ca-7dc7b8f34372 | twitter | 3 - Wednesday | NaN | B | False |
| 1652 | ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732 | google | 1 - Monday | 22:57 | A | True |
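
The blank timestamps are null because `pd.read_csv` parses empty fields as `NaN` by default, which is what makes a null check a valid click test. The sketch below reads a hypothetical two-row CSV (it mirrors the real file's column names, not its data) and builds `is_click` with `Series.notnull()`, a vectorized equivalent of an element-wise `pd.isna` lambda.

```python
import io
import pandas as pd

# Hypothetical two-row CSV with the ad_clicks.csv columns; the second row
# leaves ad_click_timestamp empty, i.e. the ad was shown but not clicked
csv_text = """user_id,utm_source,day,ad_click_timestamp,experimental_group
u1,google,6 - Saturday,7:18,A
u2,facebook,7 - Sunday,,B
"""

toy = pd.read_csv(io.StringIO(csv_text))

# Empty CSV fields become NaN, so notnull() marks exactly the clicked rows
toy['is_click'] = toy.ad_click_timestamp.notnull()

# Element-wise version for comparison; it yields the same booleans
via_apply = toy.ad_click_timestamp.apply(lambda ts: not pd.isna(ts))

print(toy[['user_id', 'is_click']])
```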


4. We want to know the percent of people who clicked on ads from each `utm_source`. Start by grouping by `utm_source` and `is_click` and counting the `user_id` values in each group. Save your answer to the variable `clicks_by_source`.

In [51]:
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()
clicks_by_source

|   | utm_source | is_click | user_id |
|---|---|---|---|
| 0 | email | False | 175 |
| 1 | email | True | 80 |
| 2 | facebook | False | 324 |
| 3 | facebook | True | 180 |
| 4 | google | False | 441 |
| 5 | google | True | 239 |
| 6 | twitter | False | 149 |
| 7 | twitter | True | 66 |


5. Now let’s pivot the data so that the columns are `is_click` (either True or False), the index is `utm_source`, and the values are `user_id`. Save your results to the variable `clicks_pivot`.

In [52]:
clicks_pivot = clicks_by_source.pivot(index='utm_source', columns='is_click', values='user_id')
clicks_pivot

| utm_source | False | True |
|---|---|---|
| email | 175 | 80 |
| facebook | 324 | 180 |
| google | 441 | 239 |
| twitter | 149 | 66 |
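
Steps 4 and 5 can also be collapsed into one chain with `Series.unstack()`, which moves the inner index level (`is_click`) into columns without the intermediate `reset_index()` and `pivot()`. This is a sketch on hypothetical toy data; `fill_value=0` is an added assumption that guards against a source with no clicks at all, which would otherwise leave `NaN` in the `True` column.

```python
import pandas as pd

# Hypothetical toy rows with the columns that matter for this step
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5'],
    'utm_source': ['email', 'email', 'google', 'google', 'google'],
    'is_click': [True, False, True, True, False],
})

# groupby + count gives a Series indexed by (utm_source, is_click);
# unstack() pivots the is_click level into columns in one step
clicks_pivot = (
    toy.groupby(['utm_source', 'is_click'])
    .user_id.count()
    .unstack(fill_value=0)
)
print(clicks_pivot)
```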


6. Create a new column in `clicks_pivot` called `percent_clicked` which is equal to the percent of users who clicked on the ad from each `utm_source`. Was there a difference in click rates for each source?

In [53]:
clicks_pivot['percent_clicked'] = (clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])) * 100
clicks_pivot

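| utm_source | False | True | percent_clicked |
|---|---|---|---|
| email | 175 | 80 | 31.372549 |
| facebook | 324 | 180 | 35.714286 |
| google | 441 | 239 | 35.147059 |
| twitter | 149 | 66 | 30.697674 |

The click rates do differ by source: Facebook (about 35.7%) and Google (about 35.1%) are close to each other and a few percentage points above email (about 31.4%) and Twitter (about 30.7%).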
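
Since `is_click` is boolean, its mean within a group is the fraction of `True` values, so the same percentages can be sketched in one line with `groupby(...).is_click.mean()`. The toy rows are hypothetical. On the real `ad_clicks` frame this should match `percent_clicked`, assuming no `user_id` is null (the pivot counts `user_id` values, while the mean counts rows).

```python
import pandas as pd

# Hypothetical toy rows: email has 1 click out of 4, twitter has 2 out of 4
toy = pd.DataFrame({
    'utm_source': ['email'] * 4 + ['twitter'] * 4,
    'is_click': [True, False, False, False, True, True, False, False],
})

# The mean of a boolean column is the share of True values
percent_clicked = toy.groupby('utm_source').is_click.mean() * 100
print(percent_clicked)
```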


## Analyzing an A/B Test

7. The column `experimental_group` tells us whether the user was shown Ad A or Ad B. Were approximately the same number of people shown both ads?

In [54]:
both_ads = ad_clicks.groupby('experimental_group').user_id.count()
both_ads

experimental_group
A    827
B    827
Name: user_id, dtype: int64

8. Using the column `is_click` that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

In [55]:
percentage_clicked = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
percentage_clicked

|   | experimental_group | is_click | user_id |
|---|---|---|---|
| 0 | A | False | 517 |
| 1 | A | True | 310 |
| 2 | B | False | 572 |
| 3 | B | True | 255 |


In [56]:
pivot_percentage = percentage_clicked.pivot(index='experimental_group', columns='is_click', values='user_id')
pivot_percentage

| experimental_group | False | True |
|---|---|---|
| A | 517 | 310 |
| B | 572 | 255 |


In [57]:
pivot_percentage['percentage clicked'] = (pivot_percentage[True] / (pivot_percentage[True] + pivot_percentage[False])) * 100
pivot_percentage

| experimental_group | False | True | percentage clicked |
|---|---|---|---|
| A | 517 | 310 | 37.484885 |
| B | 572 | 255 | 30.834341 |

Ad A had the higher click rate: about 37.5% of the users who saw it clicked, compared with about 30.8% for Ad B.
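
`pd.crosstab` can build both the count table and the row percentages directly from the two columns: with `normalize='index'`, each row is divided by its total, so the `True` column becomes the click rate. A sketch on hypothetical toy data; the counts are made up and are not the experiment's results.

```python
import pandas as pd

# Hypothetical toy rows: group A has 3 clicks out of 4, group B has 1 out of 4
toy = pd.DataFrame({
    'experimental_group': ['A'] * 4 + ['B'] * 4,
    'is_click': [True, True, True, False, True, False, False, False],
})

# Raw counts: rows are groups, columns are is_click values
counts = pd.crosstab(toy.experimental_group, toy.is_click)

# normalize='index' turns each row into proportions that sum to 1
rates = pd.crosstab(toy.experimental_group, toy.is_click, normalize='index') * 100
print(rates)
```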


9. The Product Manager for the A/B test thinks that the clicks might have changed by day of the week. Start by creating two DataFrames: `a_clicks` and `b_clicks`, which contain only the results for group A and group B, respectively.

In [64]:
a_b_clicks = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
a_b_clicks

|   | experimental_group | is_click | user_id |
|---|---|---|---|
| 0 | A | False | 517 |
| 1 | A | True | 310 |
| 2 | B | False | 572 |
| 3 | B | True | 255 |


In [72]:
a_clicks = a_b_clicks[a_b_clicks.experimental_group == 'A'].pivot(index='experimental_group', columns='is_click', values='user_id')
a_clicks

| experimental_group | False | True |
|---|---|---|
| A | 517 | 310 |


In [73]:
b_clicks = a_b_clicks[a_b_clicks.experimental_group == 'B'].pivot(index='experimental_group', columns='is_click', values='user_id')
b_clicks

| experimental_group | False | True |
|---|---|---|
| B | 572 | 255 |
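
The step asks for `a_clicks` and `b_clicks` as the subsets of rows belonging to each group; the cells above instead filter counts that were already aggregated, which leaves one summary row per group and no `day` column. A row-level split, sketched here with a boolean mask on hypothetical toy data, keeps `day` available for the per-day analysis.

```python
import pandas as pd

# Hypothetical toy rows with the columns used in steps 9 and 10
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5'],
    'day': ['1 - Monday', '1 - Monday', '2 - Tuesday', '1 - Monday', '2 - Tuesday'],
    'experimental_group': ['A', 'A', 'A', 'B', 'B'],
    'is_click': [True, False, True, False, True],
})

# Boolean masks keep every original row (and every column) for each group
a_clicks = toy[toy.experimental_group == 'A']
b_clicks = toy[toy.experimental_group == 'B']
print(a_clicks)
```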


10. For each group (`a_clicks` and `b_clicks`), calculate the percent of users who clicked on the ad by day.

In [74]:
a_clicks['click %'] = (a_clicks[True] / (a_clicks[False] + a_clicks[True])) * 100
a_clicks

| experimental_group | False | True | click % |
|---|---|---|---|
| A | 517 | 310 | 37.484885 |


In [75]:
b_clicks['click %'] = (b_clicks[True] / (b_clicks[False] + b_clicks[True])) * 100
b_clicks

| experimental_group | False | True | click % |
|---|---|---|---|
| B | 572 | 255 | 30.834341 |
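
Because the cells above work from group totals, their `click %` values repeat the overall rates from step 8 (37.48% and 30.83%) instead of varying by day. A per-day rate needs `day` in the grouping. A minimal sketch, assuming `a_clicks` holds the row-level data for group A (the toy rows below are hypothetical); the same code applies unchanged to `b_clicks`.

```python
import pandas as pd

# Hypothetical row-level data for one experimental group
a_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'day': ['1 - Monday'] * 4 + ['2 - Tuesday'] * 2,
    'is_click': [True, False, False, False, True, True],
})

# Count users per (day, is_click), then pivot is_click into columns;
# fill_value=0 covers days where nobody (or everybody) clicked
a_clicks_pivot = (
    a_clicks.groupby(['day', 'is_click'])
    .user_id.count()
    .unstack(fill_value=0)
)

# Percent of that day's users who clicked
a_clicks_pivot['percent_clicked'] = (
    a_clicks_pivot[True] / (a_clicks_pivot[True] + a_clicks_pivot[False]) * 100
)
print(a_clicks_pivot)
```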
