<center><h1>FarmBurg's A/B Test</h1></center>

Brian is a Product Manager at FarmBurg, a company that makes a farming simulation social network game. In the FarmBurg game, you can plow, plant, and harvest different crops.

Today, you will be acting as Brian’s data analyst for an A/B Test that he has been conducting.

Brian tells you that he ran an A/B test with three different groups: A, B, and C. You’re kind of busy today, so you don’t ask too many questions about the differences between A, B, and C. Maybe they were shown three different versions of an ad. Who cares?

(HINT: you will care later)

In [1]:
import pandas as pd

df = pd.read_csv("clicks.csv")
df.head()

Unnamed: 0,user_id,group,click_day
0,8e27bf9a-5b6e-41ed-801a-a59979c0ca98,A,
1,eb89e6f0-e682-4f79-99b1-161cc1c096f1,A,
2,7119106a-7a95-417b-8c4c-092c12ee5ef7,A,
3,e53781ff-ff7a-4fcd-af1a-adba02b2b954,A,
4,02d48cf1-1ae6-40b3-9d8b-8208884a0904,A,Saturday


We need to help Brian determine whether or not there is a significant difference in the percent of users who purchased the upgrade package among groups A, B, and C.

Define a new column called <code>is_purchase</code> which is "Purchase" if <code>click_day</code> is not null and "Not Purchase" if <code>click_day</code> is null (missing). This will tell us whether each visitor clicked on the Purchase link.

In [2]:
df["is_purchase"] = df.click_day.apply(lambda x: "Purchase" if pd.notnull(x) else "Not Purchase")
df.head()

Unnamed: 0,user_id,group,click_day,is_purchase
0,8e27bf9a-5b6e-41ed-801a-a59979c0ca98,A,,Not Purchase
1,eb89e6f0-e682-4f79-99b1-161cc1c096f1,A,,Not Purchase
2,7119106a-7a95-417b-8c4c-092c12ee5ef7,A,,Not Purchase
3,e53781ff-ff7a-4fcd-af1a-adba02b2b954,A,,Not Purchase
4,02d48cf1-1ae6-40b3-9d8b-8208884a0904,A,Saturday,Purchase


We want to count the number of users who made a purchase from each group. Use <code>groupby</code> to count the number of "Purchase" and "Not Purchase" rows in each group. Save your answer to the variable <code>purchase_counts</code>.

In [3]:
purchase_counts = df.groupby(['group', 'is_purchase']).user_id.count().reset_index()
purchase_counts

Unnamed: 0,group,is_purchase,user_id
0,A,Not Purchase,1350
1,A,Purchase,316
2,B,Not Purchase,1483
3,B,Purchase,183
4,C,Not Purchase,1583
5,C,Purchase,83
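
As a cross-check, the same counts can be laid out as a group-by-outcome table with `pd.crosstab`. This is only a sketch: it rebuilds a small stand-in DataFrame (`demo_df`, a hypothetical name) from the counts shown above instead of reading `clicks.csv`, and it leaves out the `user_id` column.

```python
import pandas as pd

# Stand-in data rebuilt from the counts above (assumption: the real
# analysis would use the df loaded from clicks.csv instead).
counts = {
    "A": {"Purchase": 316, "Not Purchase": 1350},
    "B": {"Purchase": 183, "Not Purchase": 1483},
    "C": {"Purchase": 83, "Not Purchase": 1583},
}
rows = [
    {"group": group, "is_purchase": label}
    for group, outcome in counts.items()
    for label, n in outcome.items()
    for _ in range(n)
]
demo_df = pd.DataFrame(rows)

# One row per group, one column per outcome
purchase_table = pd.crosstab(demo_df["group"], demo_df["is_purchase"])
print(purchase_table)
```

A table in this shape can be passed to a contingency-table test directly, without typing the counts by hand.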


#### Performing a significance test.

The data from this A/B test is categorical. Why? Because a user's response can only be "Purchase" or "Not Purchase". We are also comparing more than two groups (A, B, and C). This leads us to a chi-square test, which checks whether two categorical variables (here, group and purchase decision) are associated.

In [4]:
from scipy.stats import chi2_contingency

"""
[[A_Purchases,A_not_purchases],
 [B_Purchases,B_not_purchases],
 [C_Purchases,C_not_purchases]]

"""

contingency = [[316, 1350],
               [183, 1483],
               [83, 1583]]


_, pvalue, _, _ = chi2_contingency(contingency)

if pvalue < 0.05:
    is_significant = True
else:
    is_significant = False

print(is_significant)

True


This tells us that there is a statistically significant association between group and purchase decision. From the contingency table we can see that users in group A were the most likely to purchase. So we can conclude that the group a user was assigned to affects whether they purchase.
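
To make the chi-square test less of a black box, here is a minimal sketch that recomputes the statistic by hand from the same contingency table and compares it with SciPy's result. The variable names are illustrative. Because the table is 3x2 (two degrees of freedom), `chi2_contingency` applies no continuity correction, so the two values should agree.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Rows: groups A, B, C. Columns: purchases, non-purchases.
observed = np.array([[316, 1350],
                     [183, 1483],
                     [83, 1583]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Counts we would expect if every group had the same purchase rate
expected = row_totals * col_totals / grand_total

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_stat, dof)

# Compare with SciPy's built-in test
stat_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)
print(chi2_stat, stat_scipy)
print(p_manual, p_scipy)
```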

Your day is a little less busy than you expected, so you decide to ask Brian about his test.

**You**: Hey Brian! What was that test you were running anyway?

**Brian**: It was awesome! We are trying to get users to purchase a small FarmBurg upgrade package. It’s called a microtransaction. We’re not sure how much to charge for it, so we tested three different price points: \\$0.99, \\$1.99, and \\$4.99. It looks like significantly more people bought the upgrade package for \\$0.99, so I guess that’s what we’ll charge.

**You**: Oh no! I should have asked you this before we did that chi-squared test. I don’t think that this was the right test at all. It’s true that more people wanted to purchase the upgrade at \\$0.99; you probably expected that. What we really want to know is if each price point allows us to make enough money that we can exceed some target goal. Brian, how much do you think it cost to build this feature?

**Brian**: Hmm. I guess that we need to generate a minimum of \\$1000 per week in order to justify this project.

**You**: We have some work to do!

Let’s assume that <code>num_visits</code> is how many visitors we generally get each week. Given that, calculate the percent of visitors who would need to purchase the upgrade package at each price point (\\$0.99, \\$1.99, \\$4.99) in order to generate Brian’s target of \\$1,000 per week.

Save the results to:

* p_clicks_099
* p_clicks_199
* p_clicks_499

Note that for higher price points, you’ll need to sell fewer upgrade packages in order to meet your target.
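
The arithmetic behind these targets can be written as a single formula. The symbols are introduced only for this illustration: $T$ is the weekly revenue target, $c$ is the price, and $N$ is the number of weekly visitors (<code>num_visits</code>).

$$p_{\text{required}} = \frac{T / c}{N}$$

For example, with $T = 1000$, $c = 0.99$, and $N = 4998$ (the number of users in this dataset, three groups of 1666), about 1010 purchases are needed, which is roughly 20.2% of visitors.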

In [5]:
num_visits = len(df)

p_clicks_099 = (1000 / 0.99) / num_visits
p_clicks_199 = (1000 / 1.99) / num_visits
p_clicks_499 = (1000 / 4.99) / num_visits

print("Percentage of users required to purchase at a $0.99 price point to meet $1000 target: ", p_clicks_099)
print("Percentage of users required to purchase at a $1.99 price point to meet $1000 target: ", p_clicks_199)
print("Percentage of users required to purchase at a $4.99 price point to meet $1000 target: ", p_clicks_499)

Percentage of users required to purchase at a $0.99 price point to meet $1000 target:  0.20210104243717691
Percentage of users required to purchase at a $1.99 price point to meet $1000 target:  0.10054272965467594
Percentage of users required to purchase at a $4.99 price point to meet $1000 target:  0.040096198800161346


#### Performing a significance test II.

We want to see if the percent of Groups A, B, and C that purchased an upgrade package is *significantly* greater than <code>p_clicks_099</code>, <code>p_clicks_199</code>, and <code>p_clicks_499</code> respectively (the percent of visitors who need to buy an upgrade package at \\$0.99, \\$1.99 and \\$4.99 in order to make our target of \\$1,000).

In [6]:
purchase_pivot = purchase_counts.pivot(columns='is_purchase', index='group', values='user_id').reset_index()
purchase_pivot['Total users exposed to price point'] = purchase_pivot['Not Purchase'] + purchase_pivot['Purchase']
purchase_pivot

is_purchase,group,Not Purchase,Purchase,Total users exposed to price point
0,A,1350,316,1666
1,B,1483,183,1666
2,C,1583,83,1666


In [7]:
316/1666 > p_clicks_099  # Oh well: group A falls short of its target

False

In [8]:
183/1666 > p_clicks_199 #Next question: Is it significantly greater?

True

In [9]:
83/1666 > p_clicks_499 #Next question: Is it significantly greater?

True
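
The three comparisons above can also be collected into one table. The following is a sketch under stated assumptions: the counts are typed in from the pivot table, `num_visits` is hard-coded to 4998 (three groups of 1666), and the `summary` DataFrame and its column names are illustrative.

```python
import pandas as pd

num_visits = 4998  # 3 groups x 1666 users, the size of the dataset
target = 1000      # Brian's weekly revenue target in dollars

summary = pd.DataFrame({
    "group": ["A", "B", "C"],
    "price": [0.99, 1.99, 4.99],
    "purchases": [316, 183, 83],
    "exposed": [1666, 1666, 1666],
})

# Share of each group that actually purchased
summary["observed_rate"] = summary["purchases"] / summary["exposed"]
# Share of visitors that must purchase to reach the target at that price
summary["required_rate"] = (target / summary["price"]) / num_visits
summary["exceeds_target"] = summary["observed_rate"] > summary["required_rate"]
print(summary)
```

This only says whether each observed rate is above its target, not whether the gap is statistically significant.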

We are comparing a *single set* of samples to a *target*. Our data is still *categorical*.

We should use a binomial test on each group to see if the observed purchase rate is significantly greater than what we need in order to generate at least \\$1,000 per week.

In [10]:
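from scipy.stats import binomtest  # replaces binom_test, which is deprecated and missing from recent SciPy releases

# Two-sided test by default; pass alternative='greater' for a one-sided test
pvalueA = binomtest(316, 1666, p_clicks_099).pvalue
pvalueB = binomtest(183, 1666, p_clicks_199).pvalue
pvalueC = binomtest(83, 1666, p_clicks_499).pvalue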

print("Group A ($0.99): p-value = {} for the observed purchase rate vs. the required rate \n".format(pvalueA))
print("Group B ($1.99): p-value = {} for the observed purchase rate vs. the required rate \n".format(pvalueB))
print("Group C ($4.99): p-value = {} for the observed purchase rate vs. the required rate".format(pvalueC))

Group A ($0.99): p-value = 0.2111287299402726 for the observed purchase rate vs. the required rate 

Group B ($1.99): p-value = 0.20660209246555486 for the observed purchase rate vs. the required rate 

Group C ($4.99): p-value = 0.045623672477172125 for the observed purchase rate vs. the required rate
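
Since the question is whether each purchase rate is *greater* than its target, a one-sided test may be a closer fit than a two-sided one. The sketch below assumes SciPy 1.7 or later, where `binomtest` is available, hard-codes `num_visits` to 4998 (three groups of 1666), and uses illustrative names such as `one_sided`.

```python
from scipy.stats import binomtest

num_visits = 4998  # 3 groups x 1666 users, the size of the dataset
n_exposed = 1666   # users shown each price point

# group -> (number of purchases, price shown)
groups = {"A": (316, 0.99), "B": (183, 1.99), "C": (83, 4.99)}

one_sided = {}
for name, (purchases, price) in groups.items():
    # Purchase rate needed to reach $1000 per week at this price
    required_rate = (1000 / price) / num_visits
    # H0: true rate equals required_rate; H1: true rate is greater
    result = binomtest(purchases, n_exposed, required_rate, alternative="greater")
    one_sided[name] = result.pvalue
    print(name, result.pvalue)
```

A small p-value here means the observed purchase rate would be unlikely if the true rate were only at the break-even level.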


What price should Brian charge for the upgrade package? Save your answer to the variable final_answer

In [11]:
final_answer = 4.99  # C-Package: the only group whose purchase rate is above its target with p < 0.05