# Analyzing an A/B Test

A company has been running an A/B test with three variants (groups A, B, and C), and they want us to analyze the results so they can make some important business decisions.

The dataset they shared has the following columns:

- user_id: a unique id for each visitor to their site
- group: either 'A', 'B', or 'C' depending on which group the visitor was assigned to
- is_purchase: either 'Yes' if the visitor made a purchase or 'No' if they did not.

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
# Note: binom_test was removed in SciPy 1.12; newer versions provide scipy.stats.binomtest instead
from scipy.stats import binom_test

abdata = pd.read_csv('clicks.csv')
abdata.head()

Unnamed: 0,user_id,group,is_purchase
0,8e27bf9a,A,No
1,eb89e6f0,A,No
2,7119106a,A,No
3,e53781ff,A,No
4,02d48cf1,A,Yes


Note that we have two categorical variables: group and is_purchase. We are interested in whether visitors are more likely to make a purchase if they are in any one group compared to the others. Because we want to know if there is an association between two categorical variables, we’ll start by using a Chi-Square test.

In order to run a Chi-Square test, we first need to create a contingency table of the variables group and is_purchase.

In [2]:
Xtab = pd.crosstab(abdata.group, abdata.is_purchase)
Xtab

is_purchase,No,Yes
group,,
A,1350,316
B,1483,183
C,1583,83


In [3]:
chi2, pval, dof, expected = chi2_contingency(Xtab)
print(pval)

2.4126213546684264e-35
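
To see what drives such a small p-value, it can help to look at the purchase rate in each group next to the counts the test would expect if group and purchase were unrelated. The following is a minimal sketch that rebuilds the contingency table by hand from the counts shown above instead of reading clicks.csv; the names `counts`, `rates`, and `expected_df` are illustrative and not part of the original notebook.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the contingency table above (rows: group, columns: is_purchase)
counts = pd.DataFrame(
    {"No": [1350, 1483, 1583], "Yes": [316, 183, 83]},
    index=pd.Index(["A", "B", "C"], name="group"),
)

# Observed purchase rate in each group
rates = counts["Yes"] / counts.sum(axis=1)
print(rates.round(4))

# Chi-Square test of independence; `expected` holds the counts we would see
# if the purchase rate were identical in all three groups
chi2, pval, dof, expected = chi2_contingency(counts)
expected_df = pd.DataFrame(expected, index=counts.index, columns=counts.columns)
print(expected_df)
print(dof, pval)
```

Because every group has the same number of visitors, the expected counts are identical across rows, which makes the gap between observed and expected purchases easy to read off.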


After asking the company about the test they were running, we learned the following:

They were trying to get users to purchase a small upgrade package, but they were not sure how much to charge for it, so they tested three price points: $0.99 (group 'A'), $1.99 (group 'B'), and $4.99 (group 'C'). Significantly more people bought the upgrade package at $0.99, so they assume that is what they should charge.

With that context, the Chi-Square test turns out not to answer the right question. It’s true that more people purchased the upgrade at $0.99, but what we really want to know is whether each price point brings in enough revenue to exceed some target. To set that target, we need to know how much it cost to build this feature.

We estimate that the feature needs to generate at least $1,000 in revenue per week to justify the project.

To check whether it can, we will calculate the purchase rate needed at each price point. Let’s start by calculating the number of visitors to the site in a week.

Because the test ran over the course of one week, we treat the number of visitors in abdata as the number of visitors in a typical week.

In [4]:
num_visits = len(abdata)

Next, we need to calculate the number of visitors who would have to purchase the upgrade package at each price point ($0.99, $1.99, $4.99) in order to reach the minimum revenue target of $1,000 per week.

In [6]:
# Calculate the number of sales needed at 0.99
num_sales_needed_099 = 1000/0.99

Now that we know how many sales we need at a $0.99 price point, calculate the proportion of weekly visitors who would need to make a purchase in order to meet that goal.

In [11]:
p_sales_needed_099 = num_sales_needed_099/num_visits
p_sales_needed_099

0.20210104243717691

Repeat these steps for the other price points ($1.99 and $4.99).

In [12]:
# Calculate the purchase rate needed at 1.99
num_sales_needed_199 = 1000/1.99
p_sales_needed_199 = num_sales_needed_199/num_visits
p_sales_needed_199

0.10054272965467594

In [13]:
# Calculate the purchase rate needed at 4.99
num_sales_needed_499 = 1000/4.99
p_sales_needed_499 = num_sales_needed_499/num_visits
p_sales_needed_499

0.040096198800161346

As expected, at higher price points we need to sell fewer upgrade packages to meet our minimum revenue target, so the required proportions decrease as the price increases.
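
The three calculations above follow the same pattern, so they can be wrapped in a small helper. This is only a sketch: `required_purchase_rate` is a hypothetical helper name, and `num_visits` is hard-coded to 4998, which assumes that the three groups of 1,666 visitors shown in the contingency table make up the whole dataset.

```python
def required_purchase_rate(revenue_target, price, num_visits):
    """Proportion of visitors who must buy at `price` to reach `revenue_target`."""
    num_sales_needed = revenue_target / price
    return num_sales_needed / num_visits


num_visits = 4998      # assumption: 3 groups x 1666 visitors, instead of len(abdata)
revenue_target = 1000  # weekly revenue target in dollars

# Required purchase rate for each tested price point
needed_rates = {
    price: required_purchase_rate(revenue_target, price, num_visits)
    for price in (0.99, 1.99, 4.99)
}

for price, rate in needed_rates.items():
    print(f"${price:.2f}: {rate:.4f}")
```

Keeping the target and the visitor count as parameters makes it easy to see how the required rates shift if the weekly revenue goal changes.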

We want to know if the proportion of Group A (the $0.99 price point) that purchased an upgrade package is significantly greater than p_sales_needed_099 (the proportion of visitors who need to buy an upgrade package at $0.99 in order to reach our minimum revenue target of $1,000).

To answer this question, we want to focus on just the visitors in group A. Then, we want to compare the purchase rate in that group to p_sales_needed_099.

Since we have a single sample of categorical data and want to compare it to a hypothetical population value, a binomial test is appropriate. In order to run a binomial test for group A, we need to know two pieces of information:

- The number of visitors in group A (the number of visitors who were offered the $0.99 price point)
- The number of visitors in Group A who made a purchase

In [20]:
# Calculate samp size & sales for 0.99 price point
samp_size_099 = np.sum(abdata.group == 'A')
sales_099 = np.sum((abdata.group == 'A') & (abdata.is_purchase == 'Yes'))
# Print samp size & sales for 0.99 price point
print(samp_size_099)
print(sales_099)

1666
316


In [21]:
# Calculate samp size & sales for 1.99 price point
samp_size_199 = np.sum(abdata.group == 'B')
sales_199 = np.sum((abdata.group == 'B') & (abdata.is_purchase == 'Yes'))
# Print samp size & sales for 1.99 price point
print(samp_size_199)
print(sales_199)

1666
183


In [24]:
# Calculate samp size & sales for 4.99 price point
samp_size_499 = np.sum(abdata.group == 'C')
sales_499 = np.sum((abdata.group == 'C') & (abdata.is_purchase == 'Yes'))
# Print samp size & sales for 4.99 price point
print(samp_size_499)
print(sales_499)

1666
83
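
The same sample sizes and sales counts can also be obtained for all groups in one pass with a groupby. The sketch below is an alternative to the three cells above, not the notebook's own method: since clicks.csv is not available here, it rebuilds a stand-in frame called `demo` from the contingency-table counts, and the names `demo`, `purchased`, and `summary` are illustrative.

```python
import pandas as pd

# Rebuild a stand-in for abdata from the contingency table: group -> (No, Yes)
counts = {"A": (1350, 316), "B": (1483, 183), "C": (1583, 83)}
rows = []
for group, (n_no, n_yes) in counts.items():
    rows += [(group, "No")] * n_no + [(group, "Yes")] * n_yes
demo = pd.DataFrame(rows, columns=["group", "is_purchase"])

# One row per group: number of visitors and number of purchases
summary = (
    demo.assign(purchased=demo["is_purchase"].eq("Yes"))
        .groupby("group")["purchased"]
        .agg(samp_size="size", sales="sum")
)
print(summary)
```

With the real data, replacing `demo` with `abdata` should give the same table, assuming the columns are named as described at the top of the notebook.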


Now perform a binomial test using binom_test() to see whether the observed purchase rate is significantly greater than p_sales_needed at each price point.

In [26]:
# Calculate the p-value for Group A
pvalueA = binom_test(sales_099, n=samp_size_099, p=p_sales_needed_099, alternative='greater')

# Print the p-value for Group A
print(pvalueA)

0.9028081076188985


In [27]:
# Calculate the p-value for Group B
pvalueB = binom_test(sales_199, n=samp_size_199, p=p_sales_needed_199, alternative='greater')

# Print the p-value for Group B
print(pvalueB)

0.11184562623739903


In [28]:
# Calculate the p-value for Group C
pvalueC = binom_test(sales_499, n=samp_size_499, p=p_sales_needed_499, alternative='greater')

# Print the p-value for Group C
print(pvalueC)

0.027944826659907135
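
For readers on a recent SciPy release, the three tests can be reproduced in a loop with `scipy.stats.binomtest`, which returns a result object rather than a bare p-value. This is a sketch under stated assumptions, not the notebook's original code: the sample sizes and sales counts are copied from the printed outputs above, `num_visits` is hard-coded to 4998 (3 x 1666), and the names `groups` and `pvalues` are illustrative.

```python
from scipy.stats import binomtest

num_visits = 4998      # assumption: total visitors, 3 groups x 1666
revenue_target = 1000  # weekly revenue target in dollars

# price -> (sample size, number of purchases), copied from the outputs above
groups = {0.99: (1666, 316), 1.99: (1666, 183), 4.99: (1666, 83)}

pvalues = {}
for price, (n, sales) in groups.items():
    # Purchase rate needed at this price to reach the revenue target
    p_needed = revenue_target / price / num_visits
    # One-sided test: is the observed purchase rate greater than p_needed?
    result = binomtest(sales, n=n, p=p_needed, alternative="greater")
    pvalues[price] = result.pvalue
    print(f"${price:.2f}: observed {sales / n:.4f}, "
          f"needed {p_needed:.4f}, p-value {result.pvalue:.4f}")
```

Printing the observed and needed rates side by side shows why group A fails the test: its observed rate is below the required rate, so a one-sided "greater" test cannot be significant.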


pvalueC is the only p-value below the significance threshold of 0.05, so group C is the only group where we can conclude that the purchase rate is significantly higher than the rate needed to reach $1,000 in revenue per week. Therefore, the company should charge $4.99 for the upgrade.

In [30]:
# Set the correct value for the final answer variable
final_answer = '4.99'

# Print the chosen price group
print(final_answer)

4.99
