# Analyzing FarmBurg's A/B Test
Brian is a Product Manager at FarmBurg, a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops. Brian has been conducting an A/B Test with three different variants, and he wants us to help him analyze the results. Using the Python modules pandas and SciPy, we will help him make some important business decisions!

In [21]:
import pandas as pd
from scipy.stats import chi2_contingency,binom_test

In [5]:
df = pd.read_csv('clicks.csv')
print(df.info())
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4998 entries, 0 to 4997
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   user_id      4998 non-null   object
 1   group        4998 non-null   object
 2   is_purchase  4998 non-null   object
dtypes: object(3)
memory usage: 117.3+ KB
None


    user_id group is_purchase
0  8e27bf9a     A          No
1  eb89e6f0     A          No
2  7119106a     A          No
3  e53781ff     A          No
4  02d48cf1     A         Yes


In [3]:
group_a = df[df.group == 'A']
group_b = df[df.group == 'B']
group_c = df[df.group == 'C']

# Hypothesis 1
We are interested in whether visitors are more likely to make a purchase if they are in any one group compared to the others. Because we want to know if there is an association between two categorical variables, we’ll start by using a Chi-Square test to address our question.

In [7]:
crosstab = pd.crosstab(df.group,df.is_purchase)
print(crosstab)

is_purchase    No  Yes
group                 
A            1350  316
B            1483  183
C            1583   83
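To make the counts above easier to compare, a quick sketch can turn them into purchase rates. The dictionary below copies the crosstab counts by hand; the names `counts` and `purchase_rates` are illustrative and not part of the original notebook.

```python
# Counts copied from the crosstab output above: group -> (No, Yes)
counts = {"A": (1350, 316), "B": (1483, 183), "C": (1583, 83)}

purchase_rates = {}
for group, (no, yes) in counts.items():
    # Purchase rate = purchasers / all visitors in the group
    purchase_rates[group] = yes / (no + yes)
    print(f"Group {group}: {purchase_rates[group]:.1%} of {no + yes} visitors purchased")
```

The rates come out to roughly 19%, 11%, and 5% for groups A, B, and C, which is the difference the Chi-Square test evaluates.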


In [8]:
chi2,pval,dof,expected = chi2_contingency(crosstab)
print(pval)

2.4126213546684264e-35


The p-value is about 2.41 × 10^-35, which is effectively zero and far below the 0.05 significance threshold. We can therefore conclude that the purchase rate differs significantly among groups A, B, and C. The test shows only that the rates are not all equal; it does not say which group differs from which.
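To see what the test computes, here is a minimal sketch that rebuilds the Chi-Square statistic by hand from the counts printed in the crosstab. It assumes those counts are copied correctly, and it uses a shortcut that holds only for 2 degrees of freedom, where the chi-square survival function is exactly exp(-x/2). The variable names are illustrative.

```python
import math

# Observed counts copied from the crosstab above:
# rows are groups A, B, C; columns are (No, Yes).
observed = [[1350, 316], [1483, 183], [1583, 83]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if group and purchase were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid only when dof == 2: P(chi2 > x) = exp(-x / 2).
assert dof == 2
p_value = math.exp(-chi2_stat / 2)

print(expected)
print(chi2_stat, dof, p_value)
```

Because every group has the same number of visitors, the expected counts are identical in each row, and the statistic is driven mostly by how far groups A and C sit from that shared expectation.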

# Hypothesis 2
Brian: We are trying to get users to purchase a small FarmBurg upgrade package. It’s called a microtransaction. We’re not sure how much to charge for it, so we tested three different price points: $0.99 for Group A, $1.99 for Group B, and $4.99 for Group C. It looks like significantly more people bought the upgrade package for $0.99, so I guess that’s what we’ll charge.

We should have asked Brian this before we did that Chi-Square test. That wasn’t the right test at all. It’s true that more people wanted to purchase the upgrade at $0.99; we probably expected that. What we really want to know is whether each price point allows us to make enough money that we can exceed some target goal. 

Brian: Hmm. I guess that we need to generate a minimum of $1000 in revenue per week in order to justify this project.

In [18]:
# Assumption: the rows in df represent one week of visitors
num_visits = len(df)
print('--------------------0------------------------')
# Number of sales needed to reach $1000 at $0.99
num_sales_needed_099 = 1000/0.99
print(num_sales_needed_099)
print('--------------------------------------------')

# Percentage of visitors who would need to buy at $0.99
p_sales_needed_099  = num_sales_needed_099/num_visits *100
print(p_sales_needed_099)
print('------------------1--------------------------')

# Same calculation at $1.99
num_sales_needed_199 = 1000/1.99
print(num_sales_needed_199)
print('--------------------------------------------')

p_sales_needed_199  = num_sales_needed_199/num_visits *100
print(p_sales_needed_199)
print('-------------------4-------------------------')

# Same calculation at $4.99
num_sales_needed_499 = 1000/4.99
print(num_sales_needed_499)
print('--------------------------------------------')

p_sales_needed_499  = num_sales_needed_499/num_visits *100
print(p_sales_needed_499)



--------------------0------------------------
1010.1010101010102
--------------------------------------------
20.21010424371769
------------------1--------------------------
502.51256281407035
--------------------------------------------
10.054272965467593
-------------------4-------------------------
200.40080160320642
--------------------------------------------
4.0096198800161345
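The three calculations above follow one formula: required purchase rate = (revenue target / price) / number of visitors. A compact sketch of that formula is shown here. The function name `required_purchase_rate` and its defaults are hypothetical, and the default of 4998 weekly visitors assumes, as the notebook does, that the data set covers one week of traffic.

```python
def required_purchase_rate(price, weekly_target=1000.0, weekly_visits=4998):
    """Fraction of weekly visitors who must buy at `price` to hit the target."""
    sales_needed = weekly_target / price
    return sales_needed / weekly_visits


# Required rate for each tested price point
required_rates = {price: required_purchase_rate(price) for price in (0.99, 1.99, 4.99)}
for price, rate in required_rates.items():
    print(f"${price:.2f}: {rate:.2%} of visitors must purchase")
```

These fractions correspond to the percentages printed above (about 20.2%, 10.1%, and 4.0%) and serve as the null purchase rates for the binomial tests.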


In [22]:
print('----------------binom test----------------------')
# Group A was shown the $0.99 price
T099 = df[df.group == 'A']
samp_size_099 = len(T099)
print(samp_size_099)
sales_099 = len(T099[T099.is_purchase == 'Yes'])
print(sales_099)
print('--------------------------------------------')
# Group B was shown the $1.99 price
T199 = df[df.group == 'B']
samp_size_199 = len(T199)
print(samp_size_199)
sales_199 = len(T199[T199.is_purchase == 'Yes'])
print(sales_199)

print('--------------------------------------------')
# Group C was shown the $4.99 price
T499 = df[df.group == 'C']
samp_size_499 = len(T499)
print(samp_size_499)
sales_499 = len(T499[T499.is_purchase == 'Yes'])
print(sales_499)
print('--------------------------------------------')

# One-sided tests: is each observed purchase rate greater than the target rate?
# Target rates are rounded from the percentages computed earlier
# (about 20.2%, 10.1%, 4.0%). The first is entered as 0.212 rather than 0.202,
# and the printed output reflects 0.212.
pval1 = binom_test(sales_099, samp_size_099, 0.212, alternative='greater')
print(pval1)
pval2 = binom_test(sales_199, samp_size_199, 0.10, alternative='greater')
print(pval2)
pval3 = binom_test(sales_499, samp_size_499, 0.04, alternative='greater')
print(pval3)


----------------binom test----------------------
1666
316
--------------------------------------------
1666
183
--------------------------------------------
1666
83
--------------------------------------------
0.9888254602240276
0.0982588983603735
0.02663954665996981


`pval3` (Group C) is the only p-value below the 0.05 threshold. Group C is therefore the only group where we can conclude that the purchase rate is significantly higher than the rate needed to reach $1000 in revenue per week, so Brian should charge $4.99 for the upgrade.
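For readers who want to see what a one-sided binomial test computes, the sketch below sums the upper tail P(X >= k) directly using only the standard library. The helper `binom_upper_tail` is a hypothetical name, not a SciPy function, and the inputs are the purchase counts, sample sizes, and null rates used in the tests above.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return total


# (purchases, visitors, null purchase rate) for each group
cases = {"A": (316, 1666, 0.212), "B": (183, 1666, 0.10), "C": (83, 1666, 0.04)}
tail_pvalues = {group: binom_upper_tail(*args) for group, args in cases.items()}
for group, pv in tail_pvalues.items():
    print(group, pv, "significant" if pv < 0.05 else "not significant")
```

The results should agree with the `binom_test` output above. In recent SciPy releases, `scipy.stats.binomtest(k, n, p, alternative='greater').pvalue` is the maintained way to get the same number, since `binom_test` was deprecated and later removed.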