# A/B Testing Project

<p><b>In this project, I am going to analyze data from the <code>ad_clicks.csv</code> file and perform A/B testing.</b></p>


In [1]:
# import pandas for data analytics
import pandas as pd

# read csv file
ad_clicks = pd.read_csv('ad_clicks.csv')

# print the first 5 rows (the default for head())
print(ad_clicks.head())


                                user_id utm_source           day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google  6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook    7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter   2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google   2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook    7 - Sunday   

  ad_click_timestamp experimental_group  
0               7:18                  A  
1                NaN                  B  
2                NaN                  A  
3                NaN                  B  
4                NaN                  B  


<p><b>Next, I am going to count how many views came from each <code>utm_source</code>.</b></p>

In [3]:
views_by_source = ad_clicks.groupby('utm_source').user_id.count().reset_index()
print(views_by_source.head())

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215


<p><b>Next, I am going to create a new column, <code>is_click</code>, which is <code>True</code> if <code>ad_click_timestamp</code> is not null and <code>False</code> otherwise.</b></p>

In [5]:
ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()
print(ad_clicks.head())

                                user_id utm_source           day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google  6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook    7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter   2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google   2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook    7 - Sunday   

  ad_click_timestamp experimental_group  is_click  
0               7:18                  A      True  
1                NaN                  B     False  
2                NaN                  A     False  
3                NaN                  B     False  
4                NaN                  B     False  
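The same flag can probably be written without the negation by using `notna()`. The sketch below shows this on a tiny hand-made DataFrame; the rows and the `toy` name are illustrative assumptions, not data from `ad_clicks.csv`.

```python
import pandas as pd

# Toy rows standing in for ad_clicks.csv (illustrative values only).
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "ad_click_timestamp": ["7:18", None, None],
})

# notna() is True exactly where a timestamp exists, so no "~" is needed.
toy["is_click"] = toy["ad_click_timestamp"].notna()
print(toy)
```

Both spellings should give the same column; `notna()` simply reads a little closer to the intent of "a timestamp is present".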


<p><b>Next, to find the percentage of users who clicked the ad from each <code>utm_source</code>, I am going to count clicks and non-clicks per source.</b></p>

In [6]:
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()
print(clicks_by_source.head())

  utm_source  is_click  user_id
0      email     False      175
1      email      True       80
2   facebook     False      324
3   facebook      True      180
4     google     False      441


<p><b>Next, I am going to pivot the data so that each <code>utm_source</code> is a row and the <code>is_click</code> values are columns.</b></p>

In [8]:
clicks_pivot = clicks_by_source.pivot(columns='is_click', index='utm_source', values='user_id').reset_index()
print(clicks_pivot.head())

is_click utm_source  False  True
0             email    175    80
1          facebook    324   180
2            google    441   239
3           twitter    149    66
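As an alternative to grouping and then pivoting, `pd.crosstab` can build the same kind of count table in one call. This is a minimal sketch on made-up rows (the `toy` DataFrame is an assumption for illustration), not a replacement for the cells above.

```python
import pandas as pd

# Toy data: one row per ad view (illustrative values only).
toy = pd.DataFrame({
    "utm_source": ["email", "email", "google", "google", "google"],
    "is_click": [True, False, True, True, False],
})

# Rows are sources, columns are the is_click values, cells are view counts.
counts = pd.crosstab(toy["utm_source"], toy["is_click"])
print(counts)
```

Unlike the pivot above, the result keeps `utm_source` as the index; calling `reset_index()` would turn it back into a regular column.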


<p><b>Next, I am going to create a new column, <code>percent_clicked</code>, equal to the fraction of users who clicked the ad from each <code>utm_source</code> (for example, 0.31 means 31%).</b></p>

In [9]:
clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])
print(clicks_pivot.head())

is_click utm_source  False  True  percent_clicked
0             email    175    80         0.313725
1          facebook    324   180         0.357143
2            google    441   239         0.351471
3           twitter    149    66         0.306977
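Because `is_click` is boolean, its mean within a group is the fraction of `True` values, so the click rate can also be computed directly with a grouped mean. The sketch below uses a small made-up DataFrame (`toy` is a hypothetical name) to show the idea.

```python
import pandas as pd

# Toy data: one row per ad view (illustrative values only).
toy = pd.DataFrame({
    "utm_source": ["email", "email", "google", "google", "google"],
    "is_click": [True, False, True, True, False],
})

# The mean of a boolean column counts True as 1 and False as 0,
# so this is clicks / total views for each source.
rates = toy.groupby("utm_source")["is_click"].mean()
print(rates)
```

This skips the intermediate count table, which is convenient when only the rate is needed and not the raw click and non-click counts.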


<p><b>Next, I am going to count how many users were shown Ad A and how many were shown Ad B.</b></p>

In [10]:
print(ad_clicks.groupby('experimental_group').user_id.count().reset_index())

  experimental_group  user_id
0                  A      827
1                  B      827


<p><b>Next, I am going to check whether a greater percentage of users clicked on Ad A or Ad B.</b></p>

In [11]:
print(ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index())

  experimental_group  is_click  user_id
0                  A     False      517
1                  A      True      310
2                  B     False      572
3                  B      True      255
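The counts above can be turned into click rates, and one common way to judge whether the gap between A and B is larger than chance would explain is a two-proportion z-test. The notebook itself does not run such a test; the sketch below is an illustration that assumes users are independent and that the normal approximation is reasonable at this sample size.

```python
import math

# Counts taken from the grouped output above.
clicks_a, users_a = 310, 827
clicks_b, users_b = 255, 827

rate_a = clicks_a / users_a
rate_b = clicks_b / users_b

# Pooled click rate under the null hypothesis that both ads perform equally.
pooled = (clicks_a + clicks_b) / (users_a + users_b)
std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))

z = (rate_a - rate_b) / std_err
# Two-sided p-value from the standard normal distribution, via math.erfc.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"A: {rate_a:.4f}  B: {rate_b:.4f}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value would suggest that the difference in click rates is unlikely to be due to chance alone, under the assumptions stated above.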


<p><b>Next, I am going to create two DataFrames, <code>a_clicks</code> and <code>b_clicks</code>, which contain only the results for group A and group B, respectively.</b></p>

In [20]:
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']
print('\nA clicks:\n')
print(a_clicks.head())
print('\nB clicks:\n')
print(b_clicks.head())


A clicks:

                                user_id utm_source            day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google   6 - Saturday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter    2 - Tuesday   
5  013b0072-7b72-40e7-b698-98b4d0c9967f   facebook     1 - Monday   
6  0153d85b-7660-4c39-92eb-1e1acd023280     google   4 - Thursday   
7  01555297-d6e6-49ae-aeba-1b196fdbb09f     google  3 - Wednesday   

  ad_click_timestamp experimental_group  is_click  
0               7:18                  A      True  
2                NaN                  A     False  
5                NaN                  A     False  
6                NaN                  A     False  
7                NaN                  A     False  

B clicks:

                                 user_id utm_source            day  \
1   009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook     7 - Sunday   
3   011adc64-0f44-4fd9-a0bb-f1506d2ad439     google    2 - Tuesday   
4   012137e6-7ae7-4649-af68-205b4702169

<p><b>Next, for each group, I am going to count clicks and non-clicks by day, as the first step toward the percentage of users who clicked on the ad each day.</b></p>

In [21]:
a_clicks_pivot = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index().pivot(columns='is_click', index='day', values='user_id').reset_index()

b_clicks_pivot = b_clicks.groupby(['is_click', 'day']).user_id.count().reset_index().pivot(columns='is_click', index='day', values='user_id').reset_index()

print('\nA clicks:\n')
print(a_clicks_pivot.head())
print('\nB clicks:\n')
print(b_clicks_pivot.head())


A clicks:

is_click            day  False  True
0            1 - Monday     70    43
1           2 - Tuesday     76    43
2         3 - Wednesday     86    38
3          4 - Thursday     69    47
4            5 - Friday     77    51

B clicks:

is_click            day  False  True
0            1 - Monday     81    32
1           2 - Tuesday     74    45
2         3 - Wednesday     89    35
3          4 - Thursday     87    29
4            5 - Friday     90    38


<p><b>Next, I am going to compute the fraction of users who clicked by day for each group and compare the results for A and B.</b></p>

In [22]:
a_clicks_pivot['percent_clicked'] = a_clicks_pivot[True] / (a_clicks_pivot[True] + a_clicks_pivot[False])

b_clicks_pivot['percent_clicked'] = b_clicks_pivot[True] / (b_clicks_pivot[True] + b_clicks_pivot[False])

print('\nA clicks:\n')
print(a_clicks_pivot.head())
print('\nB clicks:\n')
print(b_clicks_pivot.head())


A clicks:

is_click            day  False  True  percent_clicked
0            1 - Monday     70    43         0.380531
1           2 - Tuesday     76    43         0.361345
2         3 - Wednesday     86    38         0.306452
3          4 - Thursday     69    47         0.405172
4            5 - Friday     77    51         0.398438

B clicks:

is_click            day  False  True  percent_clicked
0            1 - Monday     81    32         0.283186
1           2 - Tuesday     74    45         0.378151
2         3 - Wednesday     89    35         0.282258
3          4 - Thursday     87    29         0.250000
4            5 - Friday     90    38         0.296875
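To make the comparison easier to read, the two per-day tables can be joined on `day` and the rates subtracted. The sketch below re-enters the five rows printed above by hand (only the days shown by `head()`), and the names `a`, `b`, and `comparison` are assumptions for illustration.

```python
import pandas as pd

days = ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday", "5 - Friday"]

# percent_clicked values copied from the printed output above.
a = pd.DataFrame({"day": days,
                  "percent_clicked": [0.380531, 0.361345, 0.306452, 0.405172, 0.398438]})
b = pd.DataFrame({"day": days,
                  "percent_clicked": [0.283186, 0.378151, 0.282258, 0.250000, 0.296875]})

# Join on day; the suffixes keep the two rate columns apart.
comparison = a.merge(b, on="day", suffixes=("_a", "_b"))
comparison["a_minus_b"] = comparison["percent_clicked_a"] - comparison["percent_clicked_b"]
print(comparison)
```

On the five days shown, a positive `a_minus_b` means Ad A had the higher click rate that day; in this output that holds for every day except Tuesday.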


<p><b>End of project.</b></p>