# A/B Testing for ShoeFly

ShoeFly.com is performing an A/B Test. They have two different versions of an ad, which they have placed in emails, as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each of the different platforms on each day of the week.

## Instructions
***


### Analyzing Ad Sources


1. The manager wants to know which ad platform is getting the most views.<br>**How many views came from each `utm_source`**?



2. If the column `ad_click_timestamp` is not null, then someone actually clicked on the ad that was displayed.<br>**Create a new column called `is_click`, which is True if `ad_click_timestamp` is not null and False otherwise.**



3. We want to know the percent of people who clicked on ads from each `utm_source`.<br>**Start by grouping by `utm_source` and `is_click` and counting the number of `user_id`s in each of those groups**.



4. **Pivot the data so that the columns are `is_click` (either True or False), the index is `utm_source`, and the values are `user_id`**.<br>Save the result to a variable called `clicks_pivot`.



5. **Create a new column in `clicks_pivot` called `percent_clicked`** which is equal to the percent of users who clicked on the ad from each `utm_source`.
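Steps 2 through 5 can be cross-checked with a shortcut that is not the method the exercise asks for: because `is_click` is boolean, its mean within each group is the fraction of users who clicked. The sketch below runs on a few invented rows (the `toy` DataFrame and its values are hypothetical, not taken from `ad_clicks.csv`) and assumes only the column names used in the instructions.

```python
import pandas as pd

# Hypothetical toy rows, invented for illustration; not taken from ad_clicks.csv.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "utm_source": ["email", "email", "google", "google", "google", "twitter"],
    "ad_click_timestamp": ["7:18", None, "11:02", None, None, "9:45"],
})

# Step 2: True where a click timestamp exists, False where it is null.
toy["is_click"] = toy["ad_click_timestamp"].notnull()

# Steps 3-5 in one pass: the mean of a boolean column is the share of True values.
click_rate = toy.groupby("utm_source")["is_click"].mean()
print(click_rate)
```

With the group-and-pivot route, a source that has no non-clicks (like `twitter` in the toy rows) gets a NaN in the `False` column, so `percent_clicked` would be NaN unless the gap is filled with 0. The mean-based shortcut does not have that problem.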



### Analyzing an A/B Test

6. The column `experimental_group` tells us whether the user was shown Ad A or Ad B. **Were approximately the same number of people shown both ads?**


7. Using the column `is_click` that we defined earlier, **check to see if a greater percentage of users clicked on Ad A or Ad B.**


8. The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.<br>**Start by creating two DataFrames**: `a_clicks` and `b_clicks` which contain only the results for A group and B group, respectively.


9. For each group (`a_clicks` and `b_clicks`), **calculate the percent of users who clicked on the ad by day**.


10. Compare the results for A and B. What happened over the course of the week?<br>Do you recommend that your company use Ad A or Ad B?

## Practice
****

In [56]:
import pandas as pd
from IPython.display import display

ad_clicks = pd.read_csv('ad_clicks.csv')

In [74]:
#1 views from each utm_source
utm_source_max = ad_clicks.groupby('utm_source').user_id.count().reset_index()

#2 is_click is True when ad_click_timestamp is not null, False otherwise
ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()

#3 Group by utm_source and is_click and count the user_ids in each group
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()

#4 PIVOT
clicks_pivot = clicks_by_source.pivot(
    columns='is_click', 
    index='utm_source',
    values='user_id')

display(clicks_pivot)

| utm_source | is_click = False | is_click = True |
|---|---|---|
| email | 175 | 80 |
| facebook | 324 | 180 |
| google | 441 | 239 |
| twitter | 149 | 66 |


In [78]:
#5 percent of users who clicked on the ad from each utm_source
clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])

#6 Number of people shown each ad
count = ad_clicks.groupby('experimental_group').user_id.count().reset_index()

#7 percentage of users clicked on Ad A or Ad B
experimental  = ad_clicks.groupby(['experimental_group','is_click']).user_id.count().reset_index()
experimental_pivot = experimental.pivot(columns='is_click', index='experimental_group',  values='user_id')
experimental_pivot['percent_clicked'] = experimental_pivot[True] / (experimental_pivot[True] + experimental_pivot[False])

display(experimental_pivot)

| experimental_group | is_click = False | is_click = True | percent_clicked |
|---|---|---|---|
| A | 517 | 310 | 0.374849 |
| B | 572 | 255 | 0.308343 |
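Questions 6 and 7 can both be read off the counts displayed above. As a minimal sketch, the snippet below re-enters those counts, sums each row to get the number of users shown each ad, and takes the difference in click rates. The names `groups`, `shown`, and `rate_gap` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Counts copied from the experimental_pivot output above.
groups = pd.DataFrame(
    {"not_clicked": [517, 572], "clicked": [310, 255]},
    index=["A", "B"],
)

# Question 6: users shown each ad = clicks + non-clicks.
groups["shown"] = groups["not_clicked"] + groups["clicked"]

# Question 7: click rate per ad, and how far A is ahead of B.
groups["percent_clicked"] = groups["clicked"] / groups["shown"]
rate_gap = groups.loc["A", "percent_clicked"] - groups.loc["B", "percent_clicked"]

print(groups)
print(round(rate_gap, 4))
```

Both ads were shown to 827 users, so the groups are the same size, and Ad A's click rate is higher than Ad B's by roughly 0.067.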


In [82]:
# 8 the results for A group and B group
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']

#9 calculate the percent of users who clicked on the ad by day
aclicks = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
bclicks= b_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()

aclicks_pivot = aclicks.pivot(columns='is_click',index='day',values='user_id')
aclicks_pivot['percent_clicked'] = aclicks_pivot[True] / (aclicks_pivot[True] +  aclicks_pivot[False])

bclicks_pivot = bclicks.pivot(columns='is_click',index='day',values='user_id')
bclicks_pivot['percent_clicked'] = bclicks_pivot[True] / (bclicks_pivot[True] +  bclicks_pivot[False])


display(aclicks_pivot)
display(bclicks_pivot)

Ad A (`aclicks_pivot`):

| day | is_click = False | is_click = True | percent_clicked |
|---|---|---|---|
| 1 - Monday | 70 | 43 | 0.380531 |
| 2 - Tuesday | 76 | 43 | 0.361345 |
| 3 - Wednesday | 86 | 38 | 0.306452 |
| 4 - Thursday | 69 | 47 | 0.405172 |
| 5 - Friday | 77 | 51 | 0.398438 |
| 6 - Saturday | 73 | 45 | 0.381356 |
| 7 - Sunday | 66 | 43 | 0.394495 |


Ad B (`bclicks_pivot`):

| day | is_click = False | is_click = True | percent_clicked |
|---|---|---|---|
| 1 - Monday | 81 | 32 | 0.283186 |
| 2 - Tuesday | 74 | 45 | 0.378151 |
| 3 - Wednesday | 89 | 35 | 0.282258 |
| 4 - Thursday | 87 | 29 | 0.250000 |
| 5 - Friday | 90 | 38 | 0.296875 |
| 6 - Saturday | 76 | 42 | 0.355932 |
| 7 - Sunday | 75 | 34 | 0.311927 |
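Step 10 asks for a comparison of the two groups across the week. One way to sketch it is to put the daily click rates side by side and count the days on which A is ahead. The counts below are copied from the two per-day outputs above (A first, then B, following the `display` order). The names `a`, `b`, `comparison`, and `days_a_ahead` are illustrative.

```python
import pandas as pd

days = ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday",
        "5 - Friday", "6 - Saturday", "7 - Sunday"]

# Per-day counts copied from the two outputs above (A first, then B).
a = pd.DataFrame({"clicked": [43, 43, 38, 47, 51, 45, 43],
                  "not_clicked": [70, 76, 86, 69, 77, 73, 66]}, index=days)
b = pd.DataFrame({"clicked": [32, 45, 35, 29, 38, 42, 34],
                  "not_clicked": [81, 74, 89, 87, 90, 76, 75]}, index=days)

# Daily click rate for each ad, side by side.
comparison = pd.DataFrame({
    "a_rate": a["clicked"] / (a["clicked"] + a["not_clicked"]),
    "b_rate": b["clicked"] / (b["clicked"] + b["not_clicked"]),
})
comparison["a_minus_b"] = comparison["a_rate"] - comparison["b_rate"]

# Number of days on which Ad A had the higher click rate.
days_a_ahead = int((comparison["a_minus_b"] > 0).sum())

print(comparison.round(3))
print(days_a_ahead)
```

On these numbers, Ad A has the higher click rate on six of the seven days; Tuesday is the only day on which Ad B is ahead. Together with A's higher overall rate, that points toward recommending Ad A.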
