# A/B Testing for a Shoe Store Website


Our favorite online shoe store, ShoeFly.com, is running an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform and on each day of the week.


### Analyzing Ad Sources

In [14]:
import pandas as pd

In [15]:
ad_clicks = pd.read_csv('ad_clicks.csv')
ad_clicks.head()

,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B


***

**1.** We want to know which ad platform is getting the most views.
How many views came from each utm_source?

In [21]:
view_count = ad_clicks.groupby('utm_source').user_id.count().reset_index().sort_values(by=['user_id'], ascending=False)
view_count.rename(columns={"user_id":"view_counts"})

,utm_source,view_counts
2,google,680
1,facebook,504
0,email,255
3,twitter,215
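
As a cross-check on the groupby above, the same ranking can be sketched with `Series.value_counts()`. The snippet rebuilds a stand-in `utm_source` column from the view counts shown in the output, since the CSV itself is not reproduced here; the names `counts`, `sources`, and `views` are illustrative only, and it assumes every row of the data is one ad view.

```python
import pandas as pd

# View counts copied from the output table above (assumed: one row per view).
counts = {"email": 255, "facebook": 504, "google": 680, "twitter": 215}

# Stand-in for ad_clicks["utm_source"]: one entry per view.
sources = pd.Series(
    [source for source, n in counts.items() for _ in range(n)],
    name="utm_source",
)

# value_counts() counts rows per value and sorts from largest to smallest.
views = sources.value_counts()
print(views)
```

On the real data this would be `ad_clicks["utm_source"].value_counts()`, which should match the groupby result as long as `user_id` has no missing values.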


***

### Percent of People Who Clicked on Ads from Each utm_source

If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

We create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

In [22]:
# True where a click timestamp exists, i.e. the user clicked the ad.
ad_clicks["is_click"] = ad_clicks["ad_click_timestamp"].notnull()
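
To see what this flag looks like on a few rows, here is a minimal sketch on a made-up three-row frame. The `sample` DataFrame and its values are hypothetical and only mirror the two relevant columns of `ad_clicks`.

```python
import pandas as pd

# Hypothetical sample: the second user has no timestamp, so did not click.
sample = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "ad_click_timestamp": ["7:18", None, "12:05"],
})

# notnull() is True wherever a timestamp is present.
sample["is_click"] = sample["ad_click_timestamp"].notnull()
print(sample)
```

`notnull()` gives the same result as negating `isnull()` with `~`.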

***

Start by grouping by utm_source and is_click and counting the number of user_id values in each of those groups.

In [23]:
clicks_by_source = ad_clicks.groupby(['utm_source' , 'is_click']).user_id.count().reset_index()

In [24]:
clicks_by_source

,utm_source,is_click,user_id
0,email,False,175
1,email,True,80
2,facebook,False,324
3,facebook,True,180
4,google,False,441
5,google,True,239
6,twitter,False,149
7,twitter,True,66


Now we pivot the data so that each utm_source becomes a row and the False/True click counts become columns.

In [33]:
clicks_pivot = clicks_by_source.pivot(columns='is_click', index='utm_source', values='user_id')

In [34]:
clicks_pivot.head()

utm_source,is_click=False,is_click=True
email,175,80
facebook,324,180
google,441,239
twitter,149,66


### Difference in Click Rates for Each Source

We create a new column in clicks_pivot called percent_clicked, which holds the share of users from each utm_source who clicked on the ad, expressed as a fraction between 0 and 1.

In [35]:
clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])

In [36]:
clicks_pivot.head()

utm_source,is_click=False,is_click=True,percent_clicked
email,175,80,0.313725
facebook,324,180,0.357143
google,441,239,0.351471
twitter,149,66,0.306977


### Analyzing an A/B Test

The column experimental_group tells us whether the user was shown Ad A or Ad B.

Were approximately the same number of people shown both ads?

In [39]:
experimental_group_count = ad_clicks.groupby('experimental_group').user_id.count().reset_index()
experimental_group_count

,experimental_group,user_id
0,A,827
1,B,827


Using the column is_click that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

Group by both experimental_group and is_click and count the number of user_id values.

You might want to use a pivot table like we did for the utm_source exercises.

In [48]:
unpivoted = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
# unpivoted.rename(columns={"user_id":"count"})
unpivoted

,experimental_group,is_click,user_id
0,A,False,517
1,A,True,310
2,B,False,572
3,B,True,255


In [53]:
unpivoted.pivot(index = "experimental_group", columns = "is_click", values="user_id")

experimental_group,is_click=False,is_click=True
A,517,310
B,572,255
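
The pivot shows raw counts only; to answer the percentage question, one possible follow-up is to add a percent_clicked column the same way as for utm_source. This sketch re-enters the counts from the output above rather than reading the CSV; `ab_counts` and `ab_pivot` are illustrative names.

```python
import pandas as pd

# Counts copied from the unpivoted output above.
ab_counts = pd.DataFrame({
    "experimental_group": ["A", "A", "B", "B"],
    "is_click": [False, True, False, True],
    "user_id": [517, 310, 572, 255],
})

ab_pivot = ab_counts.pivot(
    index="experimental_group", columns="is_click", values="user_id")

# Fraction of users in each group who clicked.
ab_pivot["percent_clicked"] = ab_pivot[True] / (ab_pivot[True] + ab_pivot[False])
print(ab_pivot)
```

With these counts, roughly 37.5% of users shown Ad A clicked, compared with roughly 30.8% for Ad B.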


We think that the clicks might have changed by day of the week.

Start by creating two DataFrames: a_clicks and b_clicks, which contain only the results for A group and B group, respectively.

In [57]:
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
a_clicks_count = a_clicks.groupby(['day' ,'is_click']).user_id.count().reset_index()

a_clicks_count_pivot = a_clicks_count.pivot(
  columns = 'is_click',
  index = 'day',
  values = 'user_id')
a_clicks_count_pivot

day,is_click=False,is_click=True
1 - Monday,70,43
2 - Tuesday,76,43
3 - Wednesday,86,38
4 - Thursday,69,47
5 - Friday,77,51
6 - Saturday,73,45
7 - Sunday,66,43


For each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.

In [58]:
a_clicks_count_pivot['percent'] = a_clicks_count_pivot[True]/(a_clicks_count_pivot[True] + a_clicks_count_pivot[False])

In [59]:
a_clicks_count_pivot

day,is_click=False,is_click=True,percent
1 - Monday,70,43,0.380531
2 - Tuesday,76,43,0.361345
3 - Wednesday,86,38,0.306452
4 - Thursday,69,47,0.405172
5 - Friday,77,51,0.398438
6 - Saturday,73,45,0.381356
7 - Sunday,66,43,0.394495


***

In [60]:
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']
b_clicks_count = b_clicks.groupby(['day' ,'is_click']).user_id.count().reset_index()

b_clicks_count_pivot = b_clicks_count.pivot(
  columns = 'is_click',
  index = 'day',
  values = 'user_id')
b_clicks_count_pivot

day,is_click=False,is_click=True
1 - Monday,81,32
2 - Tuesday,74,45
3 - Wednesday,89,35
4 - Thursday,87,29
5 - Friday,90,38
6 - Saturday,76,42
7 - Sunday,75,34


In [61]:
b_clicks_count_pivot['percent'] = b_clicks_count_pivot[True]/(b_clicks_count_pivot[True] + b_clicks_count_pivot[False])

In [62]:
b_clicks_count_pivot

day,is_click=False,is_click=True,percent
1 - Monday,81,32,0.283186
2 - Tuesday,74,45,0.378151
3 - Wednesday,89,35,0.282258
4 - Thursday,87,29,0.25
5 - Friday,90,38,0.296875
6 - Saturday,76,42,0.355932
7 - Sunday,75,34,0.311927
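
To compare the two ads day by day, the two percent columns can be placed side by side. The following is a sketch under the assumption that the daily counts are exactly those shown in the two pivot outputs above; `daily`, `comparison`, and the column names are illustrative and do not appear in the original notebook.

```python
import pandas as pd

days = ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday",
        "5 - Friday", "6 - Saturday", "7 - Sunday"]

# Click / no-click counts copied from the A and B pivot outputs above.
daily = pd.DataFrame({
    "a_clicked": [43, 43, 38, 47, 51, 45, 43],
    "a_not_clicked": [70, 76, 86, 69, 77, 73, 66],
    "b_clicked": [32, 45, 35, 29, 38, 42, 34],
    "b_not_clicked": [81, 74, 89, 87, 90, 76, 75],
}, index=days)

# Daily click rate for each ad, plus the gap between them.
comparison = pd.DataFrame({
    "a_percent": daily["a_clicked"] / (daily["a_clicked"] + daily["a_not_clicked"]),
    "b_percent": daily["b_clicked"] / (daily["b_clicked"] + daily["b_not_clicked"]),
})
comparison["a_minus_b"] = comparison["a_percent"] - comparison["b_percent"]
print(comparison.round(3))
```

With these counts, Ad A has the higher click rate on every day except Tuesday.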
