## A/B Testing for ShoeFly.com

Our favorite online shoe store, ShoeFly.com, is performing an A/B test.  
They have two different versions of an ad, which they have placed in emails, as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each of the different platforms on each day of the week. Help them analyze the data using aggregate measures.

### Complete all the tasks below!

**1: Import ```ad_clicks.csv``` and save it to a variable called ```ad_clicks```.**

**Examine the first few rows of ad_clicks.**

In [1]:
# code for task1
import pandas as pd

ad_clicks = pd.read_csv('ad_clicks.csv')
print(ad_clicks.head(5))

                                user_id utm_source           day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google  6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook    7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter   2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google   2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook    7 - Sunday   

  ad_click_timestamp experimental_group  
0               7:18                  A  
1                NaN                  B  
2                NaN                  A  
3                NaN                  B  
4                NaN                  B  


**2: Your manager wants to know which ad platform is getting you the most views.**

**How many views (i.e., rows of the table) came from each utm_source?**

In [2]:
# code for task2
views_each_utm_source = ad_clicks.groupby('utm_source').user_id.count().reset_index()
print(views_each_utm_source)

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215


**3: If the column ```ad_click_timestamp``` is not null, then someone actually clicked on the ad that was displayed.**

**Create a new column called ```is_click```, which is True if ```ad_click_timestamp``` is not null and False otherwise.**

In [3]:
# code for task3
ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()
print(ad_clicks.head(5))

                                user_id utm_source           day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google  6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook    7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter   2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google   2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook    7 - Sunday   

  ad_click_timestamp experimental_group  is_click  
0               7:18                  A      True  
1                NaN                  B     False  
2                NaN                  A     False  
3                NaN                  B     False  
4                NaN                  B     False  
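
The same column can also be built without the ```~``` negation, since pandas provides ```notnull()``` as the direct counterpart of ```isnull()```. The snippet below is a small sketch on a hypothetical five-row frame (```demo``` is an assumed name, with timestamps retyped from the rows printed above), not the full ```ad_clicks``` data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ad_clicks: the five timestamps shown above.
demo = pd.DataFrame({'ad_click_timestamp': ['7:18', np.nan, np.nan, np.nan, np.nan]})

# notnull() is True wherever a timestamp is present, i.e. the ad was clicked.
demo['is_click'] = demo.ad_click_timestamp.notnull()
print(demo)
```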


**4: We want to know the percent of people who clicked on ads from each ```utm_source```.**

**Start by grouping by ```utm_source``` and ```is_click``` and counting the number of ```user_id``` values in each of those groups. Save your answer to the variable ```clicks_by_source```.**

In [4]:
# code for task4
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()
print(clicks_by_source)

  utm_source  is_click  user_id
0      email     False      175
1      email      True       80
2   facebook     False      324
3   facebook      True      180
4     google     False      441
5     google      True      239
6    twitter     False      149
7    twitter      True       66


**5: Now let’s pivot the data so that the columns are ```is_click``` (either True or False), the index is ```utm_source```, and the values are ```user_id```.**

**Save your results to the variable ```clicks_pivot```.**

In [5]:
# code for task5
clicks_pivot = clicks_by_source.pivot(index='utm_source', columns='is_click', values='user_id').reset_index()
print(clicks_pivot)

is_click utm_source  False  True
0             email    175    80
1          facebook    324   180
2            google    441   239
3           twitter    149    66


**6: Create a new column in ```clicks_pivot``` called ```percent_clicked``` which is equal to the percent of users who clicked on the ad from each ```utm_source```.**

**Was there a difference in click rates for each source?**

In [6]:
# code for task6
clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[False] + clicks_pivot[True])
print(clicks_pivot)

is_click utm_source  False  True  percent_clicked
0             email    175    80         0.313725
1          facebook    324   180         0.357143
2            google    441   239         0.351471
3           twitter    149    66         0.306977
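
As a cross-check on the pivot-based calculation, the mean of a boolean column is the share of True values, so the click rate per source can be computed in one step. The sketch below is a minimal, self-contained illustration: it rebuilds a stand-in for ```ad_clicks``` from the counts printed in task 4 rather than reading ```ad_clicks.csv```, and the names ```counts```, ```demo_clicks```, and ```click_rate``` are assumptions made for this example.

```python
import pandas as pd

# (not clicked, clicked) counts per source, copied from the task 4 output.
counts = {
    'email': (175, 80),
    'facebook': (324, 180),
    'google': (441, 239),
    'twitter': (149, 66),
}

# Rebuild one row per view, keeping only the two columns needed here.
rows = []
for source, (not_clicked, clicked) in counts.items():
    rows += [(source, False)] * not_clicked + [(source, True)] * clicked
demo_clicks = pd.DataFrame(rows, columns=['utm_source', 'is_click'])

# The mean of a boolean column is the fraction of True values.
click_rate = demo_clicks.groupby('utm_source').is_click.mean().reset_index()
click_rate = click_rate.rename(columns={'is_click': 'percent_clicked'})
print(click_rate)
```

The values should agree with the ```percent_clicked``` column above. Note that they are fractions between 0 and 1; multiply by 100 if a true percentage is wanted.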


**7: The column ```experimental_group``` tells us whether the user was shown Ad A or B.**

**Were approximately the same number of people shown each ad?**

In [7]:
# code for task7
a_b_count = ad_clicks.groupby('experimental_group').user_id.count().reset_index()
print(a_b_count)

  experimental_group  user_id
0                  A      827
1                  B      827


**8: Using the column ```is_click``` that we defined earlier, check to see if a greater percentage of users clicked on Ad A or B.**

In [8]:
# code for task8
clicks_percentage = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
print(clicks_percentage)

  experimental_group  is_click  user_id
0                  A     False      517
1                  A      True      310
2                  B     False      572
3                  B      True      255
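
The table above gives counts, while the task asks for percentages. One way to finish the comparison is to reuse the pivot approach from tasks 5 and 6. The sketch below is self-contained: it retypes the counts printed above into a small DataFrame instead of reading ```ad_clicks.csv```, and ```group_counts``` and ```group_pivot``` are names assumed for this example.

```python
import pandas as pd

# Counts copied from the printed output of task 8.
group_counts = pd.DataFrame({
    'experimental_group': ['A', 'A', 'B', 'B'],
    'is_click': [False, True, False, True],
    'user_id': [517, 310, 572, 255],
})

# One row per group, with the False and True counts as columns.
group_pivot = group_counts.pivot(
    index='experimental_group', columns='is_click', values='user_id'
).reset_index()

# Share of users in each group who clicked.
group_pivot['percent_clicked'] = \
    group_pivot[True] / (group_pivot[True] + group_pivot[False])
print(group_pivot)
```

With these counts, Ad A comes out at roughly 37.5% (310 of 827) and Ad B at roughly 30.8% (255 of 827), so a greater percentage of users clicked on Ad A.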


**9: The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.**

**Start by creating two DataFrames: ```a_clicks``` and ```b_clicks```, which contain only the results for A group and B group, respectively.**

In [9]:
# code for task9
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']

print(a_clicks.head(5))
print(b_clicks.head(5))

                                user_id utm_source            day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google   6 - Saturday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter    2 - Tuesday   
5  013b0072-7b72-40e7-b698-98b4d0c9967f   facebook     1 - Monday   
6  0153d85b-7660-4c39-92eb-1e1acd023280     google   4 - Thursday   
7  01555297-d6e6-49ae-aeba-1b196fdbb09f     google  3 - Wednesday   

  ad_click_timestamp experimental_group  is_click  
0               7:18                  A      True  
2                NaN                  A     False  
5                NaN                  A     False  
6                NaN                  A     False  
7                NaN                  A     False  
                                 user_id utm_source            day  \
1   009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook     7 - Sunday   
3   011adc64-0f44-4fd9-a0bb-f1506d2ad439     google    2 - Tuesday   
4   012137e6-7ae7-4649-af68-205b4702169c   facebook     7 - Sun

**10: For each group (```a_clicks``` and ```b_clicks```), calculate the percent of users who clicked on the ad by day.**

In [10]:
# code for task10
a_clicks_everyday = a_clicks.groupby(['day', 'is_click']).user_id.count().reset_index()
b_clicks_everyday = b_clicks.groupby(['day', 'is_click']).user_id.count().reset_index()

a_clicks_everyday_pivot = \
    a_clicks_everyday.pivot(index='day', columns='is_click', values='user_id').reset_index()
b_clicks_everyday_pivot = \
    b_clicks_everyday.pivot(index='day', columns='is_click', values='user_id').reset_index()

a_clicks_everyday_pivot['clicked_percentage'] = \
    a_clicks_everyday_pivot[True] / (a_clicks_everyday_pivot[True] + a_clicks_everyday_pivot[False])

b_clicks_everyday_pivot['clicked_percentage'] = \
    b_clicks_everyday_pivot[True] / (b_clicks_everyday_pivot[True] + b_clicks_everyday_pivot[False])

print(a_clicks_everyday_pivot)
print(b_clicks_everyday_pivot)

is_click            day  False  True  clicked_percentage
0            1 - Monday     70    43            0.380531
1           2 - Tuesday     76    43            0.361345
2         3 - Wednesday     86    38            0.306452
3          4 - Thursday     69    47            0.405172
4            5 - Friday     77    51            0.398438
5          6 - Saturday     73    45            0.381356
6            7 - Sunday     66    43            0.394495
is_click            day  False  True  clicked_percentage
0            1 - Monday     81    32            0.283186
1           2 - Tuesday     74    45            0.378151
2         3 - Wednesday     89    35            0.282258
3          4 - Thursday     87    29            0.250000
4            5 - Friday     90    38            0.296875
5          6 - Saturday     76    42            0.355932
6            7 - Sunday     75    34            0.311927


**11: Compare the results for A and B. What happened over the course of the week?**

**Do you recommend that your company use Ad A or B?**

In [11]:
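# code for task11
# Ratio of Ad A clicks to Ad B clicks for each day. Both groups have the same
# number of views on every day, so this also equals the ratio of click rates.
for i in range(0, len(a_clicks_everyday_pivot)):
    print(a_clicks_everyday_pivot[True][i] / b_clicks_everyday_pivot[True][i])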

1.34375
0.9555555555555556
1.0857142857142856
1.6206896551724137
1.3421052631578947
1.0714285714285714
1.2647058823529411
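
To make the day-by-day comparison easier to read, the two ```clicked_percentage``` columns can be placed side by side. The following is a minimal sketch, not the notebook's own method: it retypes the counts printed in task 10 rather than reusing the live pivots, and ```rates```, ```a_rates```, ```b_rates```, ```comparison```, and the column suffixes are assumed names.

```python
import pandas as pd

days = ['1 - Monday', '2 - Tuesday', '3 - Wednesday', '4 - Thursday',
        '5 - Friday', '6 - Saturday', '7 - Sunday']

# (not clicked, clicked) counts per day, copied from the task 10 output.
a_counts = [(70, 43), (76, 43), (86, 38), (69, 47), (77, 51), (73, 45), (66, 43)]
b_counts = [(81, 32), (74, 45), (89, 35), (87, 29), (90, 38), (76, 42), (75, 34)]


def rates(counts):
    """Return a DataFrame with the click rate for each day."""
    return pd.DataFrame({
        'day': days,
        'clicked_percentage': [clicked / (not_clicked + clicked)
                               for not_clicked, clicked in counts],
    })


a_rates = rates(a_counts)
b_rates = rates(b_counts)

# Line the two groups up by day and take the difference in click rate.
comparison = a_rates.merge(b_rates, on='day', suffixes=('_a', '_b'))
comparison['a_minus_b'] = \
    comparison.clicked_percentage_a - comparison.clicked_percentage_b
print(comparison)
```

On these figures, Ad A has the higher click rate on every day except Tuesday, which supports recommending Ad A.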
