### A/B Testing for ShoeFly.com
Our favorite online shoe store, ShoeFly.com, is running an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads perform on each platform and on each day of the week. Help them analyze the data using aggregate measures.


Examine the first few rows of ad_clicks.

In [1]:
import pandas as pd

ad_clicks = pd.read_csv('ad_clicks.csv')

print(ad_clicks.head(10))

                                user_id utm_source            day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google   6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook     7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter    2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google    2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook     7 - Sunday   
5  013b0072-7b72-40e7-b698-98b4d0c9967f   facebook     1 - Monday   
6  0153d85b-7660-4c39-92eb-1e1acd023280     google   4 - Thursday   
7  01555297-d6e6-49ae-aeba-1b196fdbb09f     google  3 - Wednesday   
8  018cea61-19ea-4119-895b-1a4309ccb148      email     1 - Monday   
9  01a210c3-fde0-4e6f-8efd-4f0e38730ae6      email    2 - Tuesday   

  ad_click_timestamp experimental_group  
0               7:18                  A  
1                NaN                  B  
2                NaN                  A  
3                NaN                  B  
4                NaN          

Your manager wants to know which ad platform is getting you the most views.

How many views (i.e., rows of the table) came from each utm_source?

In [2]:
views_by_source = ad_clicks.groupby('utm_source').user_id.count().reset_index()
print(views_by_source)

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215
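As a cross-check, `value_counts()` produces the same per-source tallies in a single call. The small frame below is made-up stand-in data (not the real ad_clicks.csv), so only the pattern matters, not the numbers.

```python
import pandas as pd

# Hypothetical stand-in for ad_clicks: four views from three sources
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4'],
    'utm_source': ['google', 'email', 'google', 'twitter'],
})

# Rows per source; sort_index() gives alphabetical order like groupby does
views_by_source = ad_clicks.utm_source.value_counts().sort_index()
print(views_by_source)

# The groupby version from the cell above gives identical counts
grouped = ad_clicks.groupby('utm_source').user_id.count()
print((views_by_source == grouped).all())
```

One small difference: `groupby(...).user_id.count()` counts non-null `user_id` values, while `value_counts()` counts rows with a non-null `utm_source`; the two agree whenever neither column has missing values.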


If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

In [3]:
# True when a click timestamp was recorded, False when it is missing (NaN)
ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()

We want to know the percent of people who clicked on ads from each utm_source.

Start by grouping by utm_source and is_click and counting the number of user_id values in each of those groups. Save your answer to the variable clicks_by_source.

In [4]:
clicks_by_source = (ad_clicks.groupby(['utm_source', 'is_click'])
                    .user_id.count().reset_index())

Now let's pivot the data so that the columns are is_click (either True or False), the index is utm_source, and the values are user_id.

Save your results to the variable clicks_pivot.

In [5]:
clicks_pivot = clicks_by_source.pivot(columns='is_click',
                                      index='utm_source',
                                      values='user_id').reset_index()

Create a new column in clicks_pivot called percent_clicked which is equal to the percent of users who clicked on the ad from each utm_source.

Did the click rate differ between sources?

In [6]:
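# Share of each source's views that turned into clicks, expressed as a percent
clicks_pivot['percent_clicked'] = 100 * clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])
print(clicks_pivot)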
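Because is_click is boolean, its mean within each group is the fraction of True values, so the same click rate can be computed without the intermediate count and pivot. The sketch below uses a few hypothetical rows rather than the real CSV.

```python
import pandas as pd

# Hypothetical toy data: a missing timestamp means the ad was viewed but not clicked
ad_clicks = pd.DataFrame({
    'utm_source': ['email', 'email', 'email', 'google', 'google'],
    'ad_click_timestamp': ['7:18', None, '9:02', None, '10:30'],
})
ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()

# Mean of a boolean column = share of True values; multiply by 100 for a percent
percent_clicked = ad_clicks.groupby('utm_source').is_click.mean() * 100
print(percent_clicked)
```

Here email has 2 clicks out of 3 views (about 66.7%) and google has 1 out of 2 (50%). The pivot approach is still useful when you also want the raw True/False counts side by side.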

The column experimental_group tells us whether the user was shown Ad A or Ad B.

Were approximately the same number of people shown each ad?

In [7]:
print(ad_clicks.groupby('experimental_group').count())
print(ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index())

                    user_id  utm_source  day  ad_click_timestamp  is_click
experimental_group                                                        
A                       827         827  827                 310       827
B                       827         827  827                 255       827
  experimental_group  is_click  user_id
0                  A     False      517
1                  A      True      310
2                  B     False      572
3                  B      True      255


Using the is_click column defined earlier, check whether a greater percentage of users clicked on Ad A or on Ad B.

In [8]:
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']
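Since each group was shown 827 times, the counts printed by In [7] can be turned into click rates directly. The arithmetic below only reuses those printed numbers.

```python
# Click and view counts copied from the In [7] output above
clicks = {'A': 310, 'B': 255}
views = {'A': 827, 'B': 827}

# Percent of views in each group that led to a click
click_rates = {group: 100 * clicks[group] / views[group] for group in clicks}
for group, rate in click_rates.items():
    print(f'Ad {group}: {rate:.1f}% clicked')
```

This works out to about 37.5% for Ad A and 30.8% for Ad B. On the full DataFrame, `ad_clicks.groupby('experimental_group').is_click.mean()` should give the same rates as fractions.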

For each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.

In [9]:
a_day_count = a_clicks.groupby(['is_click','day']).user_id.count().reset_index()

b_day_count = b_clicks.groupby(['is_click','day']).user_id.count().reset_index()
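The cell above yields click and non-click counts per day rather than percentages. One way to get the percentage asked for is to pivot those counts the same way clicks_pivot was built. The sketch below runs that pattern on a hypothetical handful of rows standing in for a_clicks; b_clicks would be handled identically.

```python
import pandas as pd

# Hypothetical stand-in for a_clicks: two days, a few views each
a_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5'],
    'day': ['1 - Monday', '1 - Monday', '1 - Monday', '2 - Tuesday', '2 - Tuesday'],
    'is_click': [True, False, False, True, True],
})

# Count views per (is_click, day), then spread is_click into True/False columns
a_day_count = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
a_day_pivot = a_day_count.pivot(index='day', columns='is_click',
                                values='user_id').fillna(0)

# Share of that day's views that turned into clicks, as a percent
a_day_pivot['percent_clicked'] = (
    100 * a_day_pivot[True] / (a_day_pivot[True] + a_day_pivot[False]))
print(a_day_pivot)
```

The `fillna(0)` matters when a day has no rows for one of the outcomes (here, Tuesday has no non-clicks), since the pivot would otherwise leave NaN and the percentage would come out as NaN too.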

Compare the results for A and B. What happened over the course of the week?

Do you recommend that your company use Ad A or Ad B?

In [10]:
# Keep only the rows that count clicks (is_click is True)
a_day_count = a_day_count[a_day_count.is_click].reset_index(drop=True)
b_day_count = b_day_count[b_day_count.is_click].reset_index(drop=True)

# Both frames list the seven days in the same sorted order, so they align by index
day_click_list = pd.DataFrame(
    {'Day': a_day_count.day,
     'Ad A': a_day_count.user_id,
     'Ad B': b_day_count.user_id},
    columns=['Day', 'Ad A', 'Ad B'])
print(day_click_list)
print('Ad A Mean: ',  day_click_list['Ad A'].mean())
print('Ad B Mean: ',  day_click_list['Ad B'].mean())

             Day  Ad A  Ad B
0     1 - Monday    43    32
1    2 - Tuesday    43    45
2  3 - Wednesday    38    35
3   4 - Thursday    47    29
4     5 - Friday    51    38
5   6 - Saturday    45    42
6     7 - Sunday    43    34
Ad A Mean:  44.285714285714285
Ad B Mean:  36.42857142857143
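A day-by-day difference makes the weekly pattern easier to read. The numbers below are copied from the printed table; note they are raw click counts, which compare fairly in total only because each ad was shown 827 times (daily view counts per group were not checked here).

```python
import pandas as pd

# Daily click counts copied from the printed day_click_list above
day_click_list = pd.DataFrame({
    'Day': ['1 - Monday', '2 - Tuesday', '3 - Wednesday', '4 - Thursday',
            '5 - Friday', '6 - Saturday', '7 - Sunday'],
    'Ad A': [43, 43, 38, 47, 51, 45, 43],
    'Ad B': [32, 45, 35, 29, 38, 42, 34],
})

# Positive difference means Ad A got more clicks that day
day_click_list['A minus B'] = day_click_list['Ad A'] - day_click_list['Ad B']
print(day_click_list)

days_a_won = (day_click_list['A minus B'] > 0).sum()
print('Days Ad A out-clicked Ad B:', days_a_won, 'of', len(day_click_list))
```

On these counts Ad A leads on every day except Tuesday, and the daily differences sum to 55, matching the gap between the 310 and 255 total clicks.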
