Analyzing A/B Test Results with Python

A/B testing is an experiment that compares the results of a test group and a control group. The test group receives a particular change, and we compare it against the control group to see whether the change produces a statistically significant difference in the metric of interest.

We often run A/B tests in marketing campaigns, including web landing pages, email marketing templates, and paid search campaigns, to test different variables against a control group. Fortune 500 companies like Amazon and Google run as many as 10,000 A/B tests every year. They are constantly testing web design changes, customer order flows, or color changes to see whether customer conversion rates, or other metrics used to track the results of the A/B test campaign, improve.

Here we will analyze a marketing campaign that was run as an A/B test to see what information we can gain from the analysis and what recommendations we can make moving forward.

We will begin by importing the necessary Python packages and libraries.

In [3]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt


We will read the CSV file and explore the dataset.

In [4]:
df = pd.read_csv('marketing_new.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,user_id,date_served,marketing_channel,variant,converted,language_displayed,language_preferred,age_group,date_subscribed,date_canceled,subscribing_channel,is_retained,DoW,channel_code,is_correct_lang
0,0,a100000029,2018-01-01,House Ads,personalization,True,English,English,0-18 years,2018-01-01,,House Ads,True,0.0,1.0,Yes
1,1,a100000030,2018-01-01,House Ads,personalization,True,English,English,19-24 years,2018-01-01,,House Ads,True,0.0,1.0,Yes
2,2,a100000031,2018-01-01,House Ads,personalization,True,English,English,24-30 years,2018-01-01,,House Ads,True,0.0,1.0,Yes
3,3,a100000032,2018-01-01,House Ads,personalization,True,English,English,30-36 years,2018-01-01,,House Ads,True,0.0,1.0,Yes
4,4,a100000033,2018-01-01,House Ads,personalization,True,English,English,36-45 years,2018-01-01,,House Ads,True,0.0,1.0,Yes


Explore the dataset columns

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10037 entries, 0 to 10036
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Unnamed: 0           10037 non-null  int64  
 1   user_id              10037 non-null  object 
 2   date_served          10021 non-null  object 
 3   marketing_channel    10022 non-null  object 
 4   variant              10037 non-null  object 
 5   converted            10037 non-null  bool   
 6   language_displayed   10037 non-null  object 
 7   language_preferred   10037 non-null  object 
 8   age_group            10037 non-null  object 
 9   date_subscribed      1856 non-null   object 
 10  date_canceled        577 non-null    object 
 11  subscribing_channel  1856 non-null   object 
 12  is_retained          10037 non-null  bool   
 13  DoW                  1856 non-null   float64
 14  channel_code         1856 non-null   float64
 15  is_correct_lang      10037 non-null 

In [6]:
df.variant.unique()

array(['personalization', 'control'], dtype=object)

In [9]:
df.user_id.shape[0]

10037

We have an 'Unnamed: 0' column that we need to drop. Setting inplace=True modifies the DataFrame df directly instead of returning a modified copy; the underlying CSV file is not changed.

In [22]:
df.drop(columns='Unnamed: 0', inplace=True)

What is the number of unique users in the dataset?

In [10]:
df.user_id.nunique()

7309

What is the overall conversion rate of the marketing dataset?

In [13]:
df.converted.mean()

0.10869781807312942
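
Note that df.converted.mean() averages over rows, and the dataset has 10,037 rows but only 7,309 unique users, so some users are counted more than once. The sketch below shows one way a per-user rate could be computed. It assumes a user counts as converted if any of their rows has converted == True, which is an assumption about the data and not something the dataset confirms. The small toy frame stands in for df.

```python
import pandas as pd

# Toy stand-in for df: user "a2" appears twice (hypothetical rows, not real data)
toy = pd.DataFrame({
    "user_id":   ["a1", "a2", "a2", "a3", "a4"],
    "converted": [True, False, True, False, False],
})

# Row-level rate: every row counts once, so repeat users are over-weighted
row_rate = toy["converted"].mean()

# User-level rate: a user counts as converted if any of their rows converted
user_rate = toy.groupby("user_id")["converted"].any().mean()

print(f"row-level: {row_rate:.3f}, user-level: {user_rate:.3f}")
```

The same two lines can be run against df to see how far the row-level and user-level rates differ on the real data.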

What is the conversion rate by marketing channel?

In [27]:
df.groupby(['marketing_channel'])['converted'].mean()

marketing_channel
Email        0.341593
Facebook     0.127419
House Ads    0.062962
Instagram    0.141635
Push         0.083585
Name: converted, dtype: float64

Now we can see the conversion rate of each marketing channel and compare the performance of the ads across channels. Since we are interested in comparing the conversion rates of the A/B test groups, we also want to group by the variant column.

In [26]:
df.groupby(['marketing_channel', 'variant'])['converted'].mean()

marketing_channel  variant        
Email              control            0.291971
                   personalization    0.388316
Facebook           control            0.058166
                   personalization    0.191511
House Ads          control            0.067398
                   personalization    0.057772
Instagram          control            0.058559
                   personalization    0.216684
Push               control            0.032051
                   personalization    0.129524
Name: converted, dtype: float64

Now we want to isolate the control group and store its conversion rate by marketing channel in df1.

In [43]:
df1 = df.query('variant == "control"').groupby(['marketing_channel'])['converted'].mean()
df1_test = pd.DataFrame(df1)  # convert the Series to a one-column DataFrame
df1_test

                   converted
marketing_channel           
Email               0.291971
Facebook            0.058166
House Ads           0.067398
Instagram           0.058559
Push                0.032051

We also want to do the same for the personalization group and store the result in df2.

In [33]:
df2 = df.query('variant == "personalization"').groupby(['marketing_channel'])['converted'].mean()
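
With both groups isolated, one natural comparison is the relative lift of personalization over control in each channel. The sketch below rebuilds the two sets of rates from the grouped output printed earlier so that it runs on its own; on the real data the same calculation would be (df2 - df1) / df1. The variable names control, personalization, and lift are chosen here for illustration.

```python
import pandas as pd

channels = ["Email", "Facebook", "House Ads", "Instagram", "Push"]

# Conversion rates copied from the grouped output printed earlier
control = pd.Series([0.291971, 0.058166, 0.067398, 0.058559, 0.032051],
                    index=channels)
personalization = pd.Series([0.388316, 0.191511, 0.057772, 0.216684, 0.129524],
                            index=channels)

# Relative lift: positive means personalization converted better than control
lift = (personalization - control) / control
print(lift.round(3))
```

House Ads is the only channel where the lift is negative, meaning the control converted slightly better than personalization there.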

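In [None]:
# Quick pandas alternative: one group of bars per channel, one bar per variant
df.groupby(['marketing_channel', 'variant'])['converted'].mean().unstack().plot(kind='bar')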

In [44]:
from matplotlib import pyplot as plt

# Set the width of the bars and one x position per marketing channel
wd = 0.3
x_pos = np.arange(len(df1))

# Plot the control and personalization bars side by side on the same figure.
# df1 and df2 are Series indexed by marketing_channel, so we use .values directly.
plt.bar(x_pos, df1.values, color='r', width=wd, edgecolor='k',
        label='control')
plt.bar(x_pos + wd, df2.values, color='y', width=wd, edgecolor='k',
        label='personalization')

# Center the channel names under each pair of bars
plt.xticks(x_pos + wd / 2, df1.index, fontsize=15)
plt.yticks(fontsize=15)
plt.title('A/B Test Conversion Rates', fontsize=20)
plt.xlabel('Marketing Channel', fontsize=17)
plt.ylabel('Conversion Rate', fontsize=17)

plt.legend(loc='upper center', fontsize=15)
plt.show()