### Analyzing FarmBurg's A/B Test
We're analyzing an A/B test with three groups (A, B, and C), each of which was offered the upgrade package at a different price: $0.99, $1.99, and $4.99, respectively. The data has the following columns:
- `user_id`: a unique id for each visitor to the FarmBurg site
- `group`: either 'A', 'B', or 'C', depending on which group the visitor was assigned to
- `is_purchase`: 'Yes' if the visitor made a purchase, 'No' if they did not

In [29]:
# Import libraries
import pandas as pd
import numpy as np

from scipy.stats import chi2_contingency
# Note: binom_test was removed in SciPy 1.12; newer releases provide binomtest instead
from scipy.stats import binom_test

# Load the dataset and print out a sample
abdata = pd.read_csv("clicks.csv")
print(abdata.head())

    user_id group is_purchase
0  8e27bf9a     A          No
1  eb89e6f0     A          No
2  7119106a     A          No
3  e53781ff     A          No
4  02d48cf1     A         Yes


We have two categorical variables: `group` and `is_purchase`. We are interested in whether visitors are more likely to make a purchase if they are in any one group compared to the others. Because we want to know if there is an association between two categorical variables, we’ll start by using a Chi-Square test to address our question.

In [8]:
# Create a contingency table
Xtab = pd.crosstab(abdata.group, abdata.is_purchase)
# Print the result
print(Xtab)

# Run Chi-Square test and print the result
chi2, pval, dof, expected = chi2_contingency(Xtab)
print(pval)

is_purchase    No  Yes
group                 
A            1350  316
B            1483  183
C            1583   83
2.4126213546684264e-35


The p-value is roughly 2.4 × 10^-35, far below the 0.05 significance threshold, so we can conclude that the purchase rate differs significantly among groups A, B, and C.
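
To see which way the difference runs, the purchase rate in each group can be computed directly from the counts in the contingency table above. This is a minimal sketch in plain Python; the dictionary `counts` and the name `purchase_rates` are illustrative and not part of the original notebook.

```python
# Counts copied from the contingency table printed above: (No, Yes) per group
counts = {"A": (1350, 316), "B": (1483, 183), "C": (1583, 83)}

# Purchase rate = purchases / all visitors in the group
purchase_rates = {
    group: yes / (no + yes) for group, (no, yes) in counts.items()
}

for group, rate in purchase_rates.items():
    print(f"Group {group}: {rate:.1%} of visitors purchased")
```

With these counts, the rates come out to roughly 19%, 11%, and 5% for groups A, B, and C.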

It is true that more visitors purchased the upgrade at $0.99, but that is what we would expect at the lowest price. What we really want to know is whether each price point brings in enough revenue to exceed a target.

In order to justify a new feature, we will need to calculate the necessary purchase rate for each price point. Let’s start by calculating the number of visitors to the site this week.

In [9]:
# Calculate the number of visitors
num_visits = len(abdata)
# Print the result
print(num_visits)

4998


Now that we know how many visitors we generally get each week (`num_visits`), we need to calculate how many of them would need to purchase the upgrade package at each price point ($0.99, $1.99, $4.99) in order to reach a minimum revenue target of $1,000 per week.

To start, we calculate the number of $0.99 sales needed to reach $1,000 of revenue, and then the proportion of weekly visitors that number represents.

In [21]:
# Calculate the number of sales needed for the $0.99 package
num_sales_needed_099 = 1000 / 0.99

# Calculate the proportion of visitors for $0.99
p_sales_needed_099 = num_sales_needed_099 / num_visits
# Print the result
print(p_sales_needed_099)

0.20210104243717691


In [22]:
# Calculate the number of sales needed for the $1.99 package
num_sales_needed_199 = 1000 / 1.99
# Calculate the proportion of visitors for $1.99
p_sales_needed_199 = num_sales_needed_199 / num_visits
# Print the result
print(p_sales_needed_199)

# Calculate the number of sales needed for the $4.99 package
num_sales_needed_499 = 1000 / 4.99

# Calculate the proportion of visitors for $4.99
p_sales_needed_499 = num_sales_needed_499 / num_visits
# Print the result
print(p_sales_needed_499)


0.10054272965467594
0.040096198800161346
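
The three calculations above follow the same pattern, so they can also be written as a single loop. This is a minimal sketch, assuming the same weekly visitor count of 4998 and revenue target of $1,000; the names `revenue_target`, `prices`, and `required_rates` are illustrative and do not appear in the original notebook.

```python
num_visits = 4998        # weekly visitors, from len(abdata) above
revenue_target = 1000    # minimum weekly revenue in dollars

prices = [0.99, 1.99, 4.99]
required_rates = {}
for price in prices:
    # Sales needed to hit the target, then expressed as a share of all visitors
    num_sales_needed = revenue_target / price
    required_rates[price] = num_sales_needed / num_visits
    print(f"${price:.2f}: {required_rates[price]:.4f}")
```

Keeping the rates in a dictionary keyed by price makes it easy to look up the threshold for each group in the tests that follow.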


Now we want to know if the percent of Group A (the $0.99 price point) that purchased an upgrade package is significantly greater than `p_sales_needed_099` (the percent of visitors who need to buy an upgrade package at $0.99 in order to make our minimum revenue target of $1,000).

To answer this question, we want to focus on just the visitors in group A. Then, we want to compare the proportion of visitors in that group who made a purchase to `p_sales_needed_099`.

Since we have a single sample of categorical data and want to compare it to a hypothetical population value, a binomial test is appropriate. In order to run a binomial test for group A, we need to know two pieces of information:
- the number of visitors in group A (the number of visitors who were offered the $0.99 price point)
- the number of visitors in Group A who made a purchase

In [27]:
# Calculate samp size & sales for 0.99 price point
samp_size_099 = np.sum(abdata.group == 'A')
sales_099 = np.sum((abdata.group == 'A') & (abdata.is_purchase == 'Yes'))

# Print samp size & sales for 0.99 price point
print(samp_size_099)
print(sales_099)

1666
316


In [28]:
# Calculate samp size & sales for 1.99 price point
samp_size_199 = np.sum(abdata.group == 'B')
sales_199 = np.sum((abdata.group == 'B') & (abdata.is_purchase == 'Yes'))

# Print samp size & sales for 1.99 price point
print(samp_size_199)
print(sales_199)

# Calculate samp size & sales for 4.99 price point
samp_size_499 = np.sum(abdata.group == 'C')
sales_499 = np.sum((abdata.group == 'C') & (abdata.is_purchase == 'Yes'))

# Print samp size & sales for 4.99 price point
print(samp_size_499)
print(sales_499)

1666
183
1666
83


In [31]:
# Calculate the p-value for Group A
pvalueA = binom_test(sales_099, n=samp_size_099, p=p_sales_needed_099, alternative="greater")
# Print the result
print(pvalueA)

# Calculate the p-value for Group B
pvalueB = binom_test(sales_199, n=samp_size_199, p=p_sales_needed_199, alternative="greater")
# Print the result
print(pvalueB)

# Calculate the p-value for Group C
pvalueC = binom_test(sales_499, n=samp_size_499, p=p_sales_needed_499, alternative="greater")
# Print the result
print(pvalueC)


0.9028081076188554
0.11184562623740604
0.027944826659830634
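
For a one-sided test with `alternative="greater"`, the p-value is the probability of seeing at least the observed number of purchases if the true purchase rate were exactly the required rate. The sketch below recomputes that upper-tail probability using only the standard library, as a cross-check on the values above. The helper `binom_upper_tail` is hypothetical and not part of SciPy, and the counts are copied from the earlier output.

```python
import math

def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing the pmf in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1 - p)
        )
        total += math.exp(log_pmf)
    return total

num_visits = 4998
# (purchases, group size, price) for groups A, B, and C, from the output above
groups = {"A": (316, 1666, 0.99), "B": (183, 1666, 1.99), "C": (83, 1666, 4.99)}

tail_pvalues = {}
for name, (sales, samp_size, price) in groups.items():
    # Purchase rate needed to reach $1,000 per week at this price
    p_needed = (1000 / price) / num_visits
    tail_pvalues[name] = binom_upper_tail(sales, samp_size, p_needed)
    print(name, tail_pvalues[name])
```

The results should agree with `pvalueA`, `pvalueB`, and `pvalueC` up to floating-point rounding. Working in log space avoids overflow in the binomial coefficients for a sample of this size.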


`pvalueC` is the only p-value below the significance threshold of 0.05. Group C is therefore the only group whose purchase rate is significantly higher than the rate needed to reach $1,000 in weekly revenue, so we should charge $4.99 for the upgrade.