#### HYPOTHESIS TESTING

<br>

## Analyzing FarmBurg's A/B Test
<hr>

FarmBurg is a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops. We analyze an A/B test with three price variants (Group A: \$0.99, Group B: \$1.99, Group C: \$4.99) to determine which price point, if any, can bring in a weekly revenue of \$1,000.

The `clicks.csv` file has the following columns:
- `user_id`: a unique id for each visitor to the FarmBurg site
- `group`: either `A`, `B`, or `C`
- `is_purchase`: either `Yes` if the visitor made a purchase, or `No` if they did not

```python
import pandas as pd
import numpy as np
# binomtest replaces binom_test, which was deprecated in SciPy 1.10 and removed in 1.12
from scipy.stats import chi2_contingency, binomtest
```

```python
# load the A/B test data and preview the first five rows
abdata = pd.read_csv('clicks.csv')
abdata.head()
```

|   | user_id  | group | is_purchase |
|---|----------|-------|-------------|
| 0 | 8e27bf9a | A     | No          |
| 1 | eb89e6f0 | A     | No          |
| 2 | 7119106a | A     | No          |
| 3 | e53781ff | A     | No          |
| 4 | 02d48cf1 | A     | Yes         |


```python
# count purchases (Yes) and non-purchases (No) in each price group
Xtab = pd.crosstab(abdata['group'], abdata['is_purchase'])
print(Xtab)
# Group A ($0.99) has the most purchases
```

```
is_purchase    No  Yes
group                 
A            1350  316
B            1483  183
C            1583   83
```
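As a quick cross-check of which group converts best, the purchase rate per group can be computed from the counts in the crosstab. This sketch hard-codes those counts so it runs without `clicks.csv`; with the real data, `pd.crosstab(abdata['group'], abdata['is_purchase'], normalize='index')` should return the same proportions directly.

```python
import pandas as pd

# Purchase counts copied from the crosstab output above
counts = pd.DataFrame(
    {"No": [1350, 1483, 1583], "Yes": [316, 183, 83]},
    index=pd.Index(["A", "B", "C"], name="group"),
)

# Purchase rate = purchases / visitors in each group
visitors = counts.sum(axis=1)
rates = counts["Yes"] / visitors
print(rates.round(4))
```

Each group has 1,666 visitors, and the rates come out to roughly 19.0% (A), 11.0% (B), and 5.0% (C).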


```python
# chi-square test of independence: does the purchase rate depend on the group?
chi2, pval, dof, expected = chi2_contingency(Xtab)
print("{:.38f}".format(float(pval)))
print("The purchase rate differs significantly across the three groups (p < 0.05).")
```

```
0.00000000000000000000000000000000002413
The purchase rate differs significantly across the three groups (p < 0.05).
```
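The chi-square test compares the observed counts with the counts expected if the purchase rate did not depend on the group, where each expected cell is row total × column total ÷ grand total. A minimal sketch, reusing the counts from the crosstab, computes the statistic by hand and checks it against `chi2_contingency`.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

# Observed counts from the crosstab: rows = groups A, B, C; columns = No, Yes
observed = np.array([[1350, 316],
                     [1483, 183],
                     [1583, 83]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic, degrees of freedom, and upper-tail p-value
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
pval = chi2_dist.sf(stat, dof)

# chi2_contingency applies Yates' correction only when dof == 1, so the results agree here
stat_sp, pval_sp, dof_sp, expected_sp = chi2_contingency(observed)
print(f"manual: chi2={stat:.2f}, dof={dof}, p={pval:.3e}")
print(f"scipy:  chi2={stat_sp:.2f}, dof={dof_sp}, p={pval_sp:.3e}")
```

Because every group has 1,666 visitors, the expected number of purchases is 194 in each group; Group A sits well above that and Group C well below it, which drives the large statistic.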


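```python
# total number of visitors in the dataset; the calculations below treat this as one week of traffic
num_visits = len(abdata)
print("Total Rows: " + str(num_visits))
```

```
Total Rows: 4998
```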


```python
# number of sales needed to reach $1,000 at a price point of $0.99
num_sales_needed_099 = 1000 / 0.99
print("Number of Sales Needed for $0.99: " + str(num_sales_needed_099))

# proportion of weekly visitors who would need to make a purchase to reach that goal
p_sales_needed_099 = num_sales_needed_099 / num_visits
print("Required Purchase Rate for $0.99: " + str(p_sales_needed_099))
```

```
Number of Sales Needed for $0.99: 1010.1010101010102
Required Purchase Rate for $0.99: 0.20210104243717691
```


```python
# number of sales needed to reach $1,000 at a price point of $1.99
num_sales_needed_199 = 1000 / 1.99
print("Number of Sales Needed for $1.99: " + str(num_sales_needed_199))

# proportion of weekly visitors who would need to make a purchase
p_sales_needed_199 = num_sales_needed_199 / num_visits
print("Required Purchase Rate for $1.99: " + str(p_sales_needed_199))
```

```
Number of Sales Needed for $1.99: 502.51256281407035
Required Purchase Rate for $1.99: 0.10054272965467594
```


```python
# number of sales needed to reach $1,000 at a price point of $4.99
num_sales_needed_499 = 1000 / 4.99
print("Number of Sales Needed for $4.99: " + str(num_sales_needed_499))

# proportion of weekly visitors who would need to make a purchase
p_sales_needed_499 = num_sales_needed_499 / num_visits
print("Required Purchase Rate for $4.99: " + str(p_sales_needed_499))
```

```
Number of Sales Needed for $4.99: 200.40080160320642
Required Purchase Rate for $4.99: 0.040096198800161346
```
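The three cells above repeat one calculation: sales needed = target revenue ÷ price, and required purchase rate = sales needed ÷ weekly visitors. A compact sketch folds them into one hypothetical helper, `required_purchase_rate`; its default visitor count of 4,998 assumes, as the notebook does, that the rows in `clicks.csv` represent one week of traffic.

```python
# Hypothetical helper (not part of the original notebook).
# weekly_visitors defaults to the 4,998 rows in clicks.csv, assumed to be one week of traffic.
def required_purchase_rate(price, target_revenue=1000, weekly_visitors=4998):
    """Return (sales needed, share of visitors who must buy) to reach target_revenue."""
    sales_needed = target_revenue / price
    return sales_needed, sales_needed / weekly_visitors

for price in (0.99, 1.99, 4.99):
    sales, rate = required_purchase_rate(price)
    print(f"${price:.2f}: {sales:.1f} sales needed, target purchase rate {rate:.4f}")
```

A higher price lowers the bar: at \$4.99 only about 4% of visitors need to buy, versus about 20% at \$0.99.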


#### Group A

<b>Null: </b> p = 0.202 <br>
<b>Alternative: </b> p > 0.202

```python
# sample size: visitors in group A (offered the $0.99 price)
samp_size_099 = np.sum(abdata['group'] == 'A')

# number of group A visitors who made a purchase
sales_099 = np.sum((abdata['group'] == 'A') & (abdata['is_purchase'] == 'Yes'))

# one-sided exact binomial test: is group A's purchase rate greater than the target rate?
pvalueA = binomtest(k=sales_099, n=samp_size_099, p=p_sales_needed_099, alternative='greater').pvalue
print("Group A p-value: " + str(pvalueA))
```

```
Group A p-value: 0.9028081076188985
```
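The one-sided test asks how likely it would be to see at least $k$ purchases out of $n$ visitors if the true purchase rate were exactly the target $p_0$, i.e. the p-value is $P(X \ge k)$ for $X \sim \text{Binomial}(n, p_0)$. This sketch computes that upper tail directly with `binom.sf(k - 1, n, p0)` for Group A (316 purchases out of 1,666 visitors, counts taken from the crosstab) and compares it with `binomtest`.

```python
from scipy.stats import binom, binomtest

n_A, k_A = 1666, 316            # group A visitors and purchases, from the crosstab
p0_A = 1000 / 0.99 / 4998       # target purchase rate for $0.99

# Upper-tail probability P(X >= k) under the null rate; sf(k - 1) = P(X > k - 1)
p_manual = binom.sf(k_A - 1, n_A, p0_A)
p_scipy = binomtest(k_A, n_A, p0_A, alternative="greater").pvalue

print(f"observed rate: {k_A / n_A:.4f}, target: {p0_A:.4f}")
print(f"manual p-value: {p_manual:.4f}, binomtest p-value: {p_scipy:.4f}")
```

Group A's observed rate (about 0.190) is already below its target (about 0.202), so a large p-value is expected.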


#### Group B

<b>Null: </b> p = 0.101 <br>
<b>Alternative: </b> p > 0.101

```python
# sample size: visitors in group B (offered the $1.99 price)
samp_size_199 = np.sum(abdata['group'] == 'B')

# number of group B visitors who made a purchase
sales_199 = np.sum((abdata['group'] == 'B') & (abdata['is_purchase'] == 'Yes'))

# one-sided exact binomial test: is group B's purchase rate greater than the target rate?
pvalueB = binomtest(k=sales_199, n=samp_size_199, p=p_sales_needed_199, alternative='greater').pvalue
print("Group B p-value: " + str(pvalueB))
```

```
Group B p-value: 0.11184562623739903
```
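Group B's observed rate is above its target, yet the test is not significant. One way to see why is a one-sided confidence bound on the purchase rate: `BinomTestResult.proportion_ci` (exact Clopper-Pearson by default) gives an interval of the form (lower bound, 1.0) when `alternative="greater"`. A sketch using the crosstab counts for Group B:

```python
from scipy.stats import binomtest

n_B, k_B = 1666, 183            # group B visitors and purchases, from the crosstab
p0_B = 1000 / 1.99 / 4998       # target purchase rate for $1.99

result = binomtest(k_B, n_B, p0_B, alternative="greater")
# With alternative="greater", the interval is one-sided: (lower bound, 1.0)
ci = result.proportion_ci(confidence_level=0.95)

print(f"observed rate: {result.statistic:.4f}, target: {p0_B:.4f}")
print(f"95% lower bound: {ci.low:.4f}, p-value: {result.pvalue:.4f}")
```

The lower bound falls below the target, so rates at or under the target remain plausible; this agrees with the exact test failing to reject at the 0.05 level.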


#### Group C

<b>Null: </b> p = 0.04 <br>
<b>Alternative: </b> p > 0.04

```python
# sample size: visitors in group C (offered the $4.99 price)
samp_size_499 = np.sum(abdata['group'] == 'C')

# number of group C visitors who made a purchase
sales_499 = np.sum((abdata['group'] == 'C') & (abdata['is_purchase'] == 'Yes'))

# one-sided exact binomial test: is group C's purchase rate greater than the target rate?
pvalueC = binomtest(k=sales_499, n=samp_size_499, p=p_sales_needed_499, alternative='greater').pvalue
print("Group C p-value: " + str(pvalueC))
```

```
Group C p-value: 0.027944826659907135
```


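Group C's p-value is the only one below the 0.05 significance threshold, so Group C is the only group for which we reject the null hypothesis: its purchase rate (83 of 1,666 visitors, about 5.0%) is significantly higher than its target rate of about 4.0%. Groups A and B do not provide enough evidence that their purchase rates exceed their targets. Therefore, \$4.99 (Group C) is the only price point at which we can conclude that the purchase rate is high enough to reach \$1,000 in weekly revenue.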
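The three group-level tests can be condensed into a single loop that applies the 0.05 threshold. This is a sketch under the notebook's own assumptions (4,998 weekly visitors, a \$1,000 weekly target), with the purchase and visitor counts copied from the crosstab rather than read from `clicks.csv`.

```python
from scipy.stats import binomtest

ALPHA = 0.05             # significance threshold used in the text
WEEKLY_VISITORS = 4998   # assumed weekly traffic (rows in clicks.csv)
TARGET = 1000            # weekly revenue goal in dollars

# group -> (price, purchases, visitors), copied from the crosstab
groups = {"A": (0.99, 316, 1666), "B": (1.99, 183, 1666), "C": (4.99, 83, 1666)}

decisions = {}
for name, (price, k, n) in groups.items():
    p0 = TARGET / price / WEEKLY_VISITORS          # required purchase rate
    pval = binomtest(k, n, p0, alternative="greater").pvalue
    decisions[name] = bool(pval < ALPHA)           # True means reject the null
    verdict = "reject null" if decisions[name] else "fail to reject"
    print(f"Group {name} (${price:.2f}): rate {k / n:.4f} vs target {p0:.4f}, "
          f"p = {pval:.4f} -> {verdict}")
```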