#### Analyzing FarmBurg's A/B Test
Brian is a Product Manager at FarmBurg, a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops. Brian has been conducting an A/B test with three different variants, and he wants you to help him analyze the results. Using the Python modules pandas and SciPy, you will help him make some important business decisions!

Brian ran an A/B test with three different groups: A, B, and C. He has provided us with a CSV file of his results named clicks.csv. It has the following columns:

- user_id: a unique id for each visitor to the FarmBurg site
- group: either 'A', 'B', or 'C' depending on which group the visitor was assigned to
- is_purchase: either 'Yes' if the visitor made a purchase or 'No' if they did not.

In [1]:
#Importing libraries
import pandas as pd
import numpy as np

In [3]:
# Read in the `clicks.csv` file as `abdata`
abdata = pd.read_csv('/Users/elorm/Documents/Repos/Datasets/clicks.csv')
print(abdata.head())

    user_id group is_purchase
0  8e27bf9a     A          No
1  eb89e6f0     A          No
2  7119106a     A          No
3  e53781ff     A          No
4  02d48cf1     A         Yes


Note that we have two categorical variables: group and is_purchase. We are interested in whether visitors are more likely to make a purchase if they are in any one group compared to the others. Because we want to know if there is an association between two categorical variables, we’ll start by using a Chi-Square test to address our question.

In [4]:
Xtab = pd.crosstab(abdata.group, abdata.is_purchase)
print(Xtab)

is_purchase    No  Yes
group                 
A            1350  316
B            1483  183
C            1583   83


Running a Chi-Square test on this contingency table gives a p-value of 2.4126213546684264e-35, which is effectively zero. Because the p-value is far below 0.05, we can conclude that group and purchase are significantly associated; that is, the purchase rate differs across groups A, B, and C.
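The notebook prints the contingency table but not the test itself. A minimal sketch of how the p-value quoted above can be reproduced with `scipy.stats.chi2_contingency` follows; the table is rebuilt by hand from the printed counts (an assumption made so the snippet runs without `clicks.csv`), and the name `xtab` is illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Contingency table rebuilt from the counts printed above
# (rows: groups A, B, C; columns: is_purchase No / Yes).
xtab = pd.DataFrame(
    {"No": [1350, 1483, 1583], "Yes": [316, 183, 83]},
    index=pd.Index(["A", "B", "C"], name="group"),
)

# chi2_contingency returns the test statistic, the p-value, the degrees
# of freedom, and the expected counts under independence.
chi2, pval, dof, expected = chi2_contingency(xtab)
print(pval)
```

With the real data, passing `Xtab` from the cell above in place of `xtab` should give the same result.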

Our day is a little less busy than expected, so we decide to ask Brian about his test.

Us: Hey Brian! What was that test you were running anyway?

Brian: We are trying to get users to purchase a small FarmBurg upgrade package. It’s called a microtransaction. We’re not sure how much to charge for it, so we tested three different price points: `$0.99` (group 'A'), `$1.99` (group 'B'), and `$4.99` (group 'C'). It looks like significantly more people bought the upgrade package for `$0.99`, so I guess that’s what we’ll charge.

Us: Oh no! We should have asked you this before we did that Chi-Square test. That wasn’t the right test at all. It’s true that more people wanted to purchase the upgrade at `$0.99`; you probably expected that. What we really want to know is whether each price point allows us to make enough money that we can exceed some target goal. Brian, how much do you think it cost to build this feature?

Brian: Hmm. I guess that we need to generate a minimum of `$1000` in revenue per week in order to justify this project.

Us: We have some work to do!

In order to justify this feature, we will need to calculate the necessary purchase rate for each price point. Let’s start by calculating the number of visitors to the site this week.

It turns out that Brian ran his original test over the course of a week, so the number of visitors in abdata is equal to the number of visitors in a typical week. Calculate the number of visitors in the data and save the value in a variable named num_visits. Make sure to print the value.

Now that we know how many visitors we generally get each week (num_visits), we need to calculate the number of visitors who would need to purchase the upgrade package at each price point (`$0.99`, `$1.99`, and `$4.99`) in order to generate Brian’s minimum revenue target of `$1,000` per week.

To start, calculate the number of sales that would be needed to reach `$1,000` of revenue at a price point of `$0.99`.

In [5]:
num_visits = len(abdata)
print(num_visits)

4998


In [7]:
num_sales_needed_099 = 1000/0.99
print(num_sales_needed_099)

1010.1010101010102


Now that we know how many sales we need at a $0.99 price point, calculate the proportion of weekly visitors who would need to make a purchase in order to meet that goal. Remember that the number of weekly visitors is saved as num_visits.

In [8]:
p_sales_needed_099 = num_sales_needed_099/num_visits
print(p_sales_needed_099)

0.20210104243717691


In [9]:
# Number of sales needed for the 1.99 price point
num_sales_needed_199 = 1000/1.99
print(num_sales_needed_199)
# Proportion of weekly visitors who must purchase to reach that number of sales
p_sales_needed_199 = num_sales_needed_199/num_visits
print(p_sales_needed_199)

502.51256281407035
0.10054272965467594


In [10]:
# Number of sales needed for the 4.99 price point
num_sales_needed_499 = 1000/4.99
print(num_sales_needed_499)
# Proportion of weekly visitors who must purchase to reach that number of sales
p_sales_needed_499 = num_sales_needed_499/num_visits
print(p_sales_needed_499)

200.40080160320642
0.040096198800161346
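The three cells above repeat the same arithmetic: sales needed = target revenue / price, and proportion needed = sales needed / weekly visitors. As a sketch, the calculation could be wrapped in one small helper. The function name `sales_needed` and the constant `revenue_target` are illustrative and not part of the original notebook, and `num_visits` is hard-coded to the value printed earlier so the snippet runs on its own.

```python
num_visits = 4998        # weekly visitors, as printed by len(abdata)
revenue_target = 1000    # Brian's minimum weekly revenue, in dollars


def sales_needed(price, target=revenue_target, visits=num_visits):
    """Return (number of sales, proportion of visitors) needed to hit the target."""
    num_sales = target / price
    return num_sales, num_sales / visits


# Required sales and purchase rate for each tested price point
needed = {price: sales_needed(price) for price in (0.99, 1.99, 4.99)}
for price, (num_sales, proportion) in needed.items():
    print(price, num_sales, proportion)
```

A higher price needs fewer purchases to reach the same revenue, which is why the required purchase rate drops from about 20% at `$0.99` to about 4% at `$4.99`.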


In [12]:
group_a = abdata[abdata['group'] == 'A']
samp_size_099 = len(group_a)
print(samp_size_099)

1666


In [16]:
sales_099 = len(group_a[group_a['is_purchase'] == 'Yes'])
print(sales_099)

316


Calculate the sample size and number of purchases in group B (the `$1.99` price point) and save them as samp_size_199 and sales_199, respectively. Then do the same for group C (the `$4.99` price point) and save them as samp_size_499 and sales_499, respectively.

In [17]:
#Group B
group_b = abdata[abdata['group'] == 'B']
samp_size_199 = len(group_b)
print(samp_size_199)
sales_199 = len(group_b[group_b['is_purchase'] == 'Yes'])
print(sales_199)

1666
183


In [18]:
# Group C
group_c = abdata[abdata['group'] == 'C']
samp_size_499 = len(group_c)
print(samp_size_499)
sales_499 = len(group_c[group_c['is_purchase'] == 'Yes'])
print(sales_499)

1666
83
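The per-group sample sizes and purchase counts computed above can also be obtained in a single `groupby` pass. The sketch below is one possible approach rather than the notebook's own; the frame `demo` is a stand-in rebuilt from the crosstab counts so the snippet runs without `clicks.csv`, and the column names `samp_size` and `sales` are illustrative.

```python
import pandas as pd

# Stand-in for abdata, rebuilt from the crosstab counts (No, Yes) per group.
counts = {"A": (1350, 316), "B": (1483, 183), "C": (1583, 83)}
rows = []
for group, (n_no, n_yes) in counts.items():
    rows += [(group, "No")] * n_no + [(group, "Yes")] * n_yes
demo = pd.DataFrame(rows, columns=["group", "is_purchase"])

# Boolean Series: True where the visitor made a purchase.
purchased = demo["is_purchase"].eq("Yes")

# 'size' counts visitors per group; 'sum' counts the True values (purchases).
summary = (
    purchased.groupby(demo["group"])
    .agg(["size", "sum"])
    .rename(columns={"size": "samp_size", "sum": "sales"})
)
print(summary)
```

With the real data, replacing `demo` with `abdata` should give the same sample sizes and sales counts as the three separate cells.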


For Group A (`$0.99` price point), perform a binomial test using binom_test() to see if the observed purchase rate is significantly greater than p_sales_needed_099. Remember that there are four inputs to binom_test():

- x will be the number of purchases for Group A
- n will be the total number of visitors assigned to Group A
- p will be the target proportion of purchases for the `$0.99` price point
- alternative will indicate the alternative hypothesis for this test; in this case, we want to know if the observed purchase rate is significantly 'greater' than the purchase rate that results in the minimum revenue target.

Save the result to pvalueA and print its value, then repeat the test for Groups B and C. Note that you’ll first need to import the binom_test() function from scipy.stats using the following line of code:

In [19]:
# Import the binom_test function
from scipy.stats import binom_test

# Calculate the p-value for Group A
pvalueA = binom_test(sales_099, n=samp_size_099, p=p_sales_needed_099, alternative='greater')

# Print the p-value for Group A
print(pvalueA)

# Calculate the p-value for Group B
pvalueB = binom_test(sales_199, n=samp_size_199, p=p_sales_needed_199, alternative='greater')

# Print the p-value for Group B
print(pvalueB)

# Calculate the p-value for Group C
pvalueC = binom_test(sales_499, n=samp_size_499, p=p_sales_needed_499, alternative='greater')

# Print the p-value for Group C
print(pvalueC)

# Set the correct value for the final answer variable
final_answer = '4.99'

# Print the chosen price group
print(final_answer)

0.9028081076188985
0.11184562623739903
0.027944826659907135
4.99


pvalueC is the only p-value below the threshold of 0.05, so group C is the only group for which we can conclude that the purchase rate is significantly higher than the rate needed to reach `$1000` in revenue per week. Brian should therefore charge `$4.99` for the upgrade.
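One practical note: `binom_test()` was deprecated and has been removed from recent SciPy releases (it is no longer available as of SciPy 1.12), with `binomtest()` as its replacement. The sketch below repeats the three one-sided tests with `binomtest()`, assuming SciPy 1.7 or later. The counts are hard-coded from the outputs above so the snippet runs on its own, and the names `groups` and `pvalues` are illustrative.

```python
from scipy.stats import binomtest

num_visits = 4998        # weekly visitors, from len(abdata)
revenue_target = 1000    # minimum weekly revenue, in dollars

# group -> (number of purchases, sample size, price), from the outputs above
groups = {
    "A": (316, 1666, 0.99),
    "B": (183, 1666, 1.99),
    "C": (83, 1666, 4.99),
}

pvalues = {}
for name, (sales, samp_size, price) in groups.items():
    # Purchase rate needed to reach the revenue target at this price
    p_needed = (revenue_target / price) / num_visits
    # One-sided test: is the observed rate greater than the needed rate?
    result = binomtest(sales, n=samp_size, p=p_needed, alternative="greater")
    pvalues[name] = result.pvalue
    print(name, result.pvalue)
```

Unlike `binom_test()`, which returns the p-value directly, `binomtest()` returns a result object, so the p-value is read from its `pvalue` attribute.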