# Farmburg A/B Test Analysis

## Project Overview

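In this project, we analyze the results of an A/B test conducted by Farmburg, a farming simulation social network game. The goal is to choose a price for a new microtransaction (an upgrade package) by comparing purchase behavior across three price points: **\$0.99** (Group A), **\$1.99** (Group B), and **\$4.99** (Group C). The decision criterion is whether a price point's purchase rate is high enough to generate **\$1000** in revenue from the number of visitors in the test.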

```python
import pandas as pd
from scipy.stats import chi2_contingency
```

## Step 1: Loading and Inspecting the Data

```python
# Load the dataset
abdata = pd.read_csv('clicks.csv')

# Display the first few rows of the dataset
abdata.head()
```

|   | user_id  | group | is_purchase |
|---|----------|-------|-------------|
| 0 | 8e27bf9a | A     | No          |
| 1 | eb89e6f0 | A     | No          |
| 2 | 7119106a | A     | No          |
| 3 | e53781ff | A     | No          |
| 4 | 02d48cf1 | A     | Yes         |


## Step 2: Creating a Contingency Table

```python
# Create a contingency table for group and is_purchase
Xtab = pd.crosstab(abdata['group'], abdata['is_purchase'])

# Display the contingency table
Xtab
```

| group / is_purchase | No   | Yes |
|---------------------|------|-----|
| A                   | 1350 | 316 |
| B                   | 1483 | 183 |
| C                   | 1583 | 83  |


### Explanation of Results

The contingency table shows how purchases are distributed across the three groups (A, B, C). Each group received 1,666 visitors, so the counts can be compared directly. Group A (**\$0.99**) has the most purchases (**316**, about 19.0%). It is followed by Group B (**\$1.99**, **183**, about 11.0%) and Group C (**\$4.99**, **83**, about 5.0%). A lower price therefore attracts more buyers. However, fewer purchases at a higher price may still bring in more revenue, so further analysis is required.
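The purchase rates quoted above come from dividing each row's "Yes" count by its row total. The sketch below rebuilds the table from its printed counts, so it runs without `clicks.csv`, and then normalizes each row. With the original DataFrame, `pd.crosstab(abdata['group'], abdata['is_purchase'], normalize='index')` should give the same proportions.

```python
# Counts copied from the contingency table above: group -> (No, Yes)
counts = {"A": (1350, 316), "B": (1483, 183), "C": (1583, 83)}

# Purchase rate = purchases / visitors in that group
purchase_rates = {
    group: yes / (no + yes) for group, (no, yes) in counts.items()
}

for group, rate in purchase_rates.items():
    print(f"Group {group}: {rate:.1%} purchased")
```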


## Step 3: Performing the Chi-Square Test

```python
# Perform Chi-Square test to see if there is a significant difference in purchase rates
chi2, pval, dof, expected = chi2_contingency(Xtab)

# Print the p-value
pval
```

2.4126213546684264e-35

### Explanation of Results
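The chi-square test of independence checks whether purchase rates differ among the groups by more than chance alone would explain. The p-value (about $2.4 \times 10^{-35}$) is far below any conventional significance level such as 0.05. We therefore reject the null hypothesis that purchase behavior is independent of group. The test shows only that the groups differ. It does not show which price meets the revenue target, and the remaining steps address that question.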
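The sketch below reproduces the chi-square statistic by hand from the contingency-table counts. It shows where the p-value comes from. Expected counts under independence are row total × column total / grand total, and the statistic sums $(O - E)^2 / E$ over all cells. The closed form `exp(-x / 2)` for the p-value is specific to 2 degrees of freedom, which is what a 3×2 table has. Tables of other shapes need a general chi-square survival function.

```python
import math

# Observed counts: rows = groups A, B, C; columns = (No, Yes)
observed = [[1350, 316], [1483, 183], [1583, 83]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Expected count in each cell if purchasing were independent of group
expected = [[r * c / n_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table
dof = (len(observed) - 1) * (len(col_totals) - 1)
if dof != 2:
    raise ValueError("exp(-x/2) is the chi-square survival function only for 2 dof")

# Survival function of the chi-square distribution with 2 dof
p_value = math.exp(-chi2_stat / 2)

print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.3e}")
```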

## Step 4: Calculating the Number of Visitors

```python
# Calculate the number of visitors
num_visits = len(abdata)

# Print the number of visitors
num_visits
```

4998

## Step 5: Calculating the Number of Sales Needed for Each Price Point

```python
# Calculate the number of sales needed to reach $1000 in revenue
num_sales_needed_099 = 1000 / 0.99
num_sales_needed_199 = 1000 / 1.99
num_sales_needed_499 = 1000 / 4.99

# Print the number of sales needed
num_sales_needed_099, num_sales_needed_199, num_sales_needed_499
```

(1010.1010101010102, 502.51256281407035, 200.40080160320642)

## Step 6: Calculating the Proportion of Visits Needed to Meet the Revenue Target

```python
# Calculate the proportion of visits needed to meet the revenue target
p_sales_needed_099 = num_sales_needed_099 / num_visits
p_sales_needed_199 = num_sales_needed_199 / num_visits
p_sales_needed_499 = num_sales_needed_499 / num_visits

# Print the proportions needed
p_sales_needed_099, p_sales_needed_199, p_sales_needed_499
```

(0.20210104243717691, 0.10054272965467594, 0.040096198800161346)
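Steps 5 and 6 combine into a single formula. Here $T$ is the revenue target (\$1000) and $N$ is the number of visitors (4998). These symbols are introduced for illustration only.

$$
n_{\text{needed}} = \frac{T}{\text{price}}, \qquad
p_{\text{needed}} = \frac{n_{\text{needed}}}{N} = \frac{T}{\text{price} \cdot N}
$$

For example, at \$4.99 the threshold is $1000 / (4.99 \times 4998) = 1000 / 24940.02 \approx 0.0401$, which matches `p_sales_needed_499`. Because $N$ is fixed, the required purchase rate falls in inverse proportion to the price.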

## Step 7: Calculating Sample Sizes and Number of Purchases for Each Group

```python
# Calculate the sample sizes and number of purchases for each group
samp_size_099 = sum(abdata['group'] == 'A')
sales_099 = sum((abdata['group'] == 'A') & (abdata['is_purchase'] == 'Yes'))

samp_size_199 = sum(abdata['group'] == 'B')
sales_199 = sum((abdata['group'] == 'B') & (abdata['is_purchase'] == 'Yes'))

samp_size_499 = sum(abdata['group'] == 'C')
sales_499 = sum((abdata['group'] == 'C') & (abdata['is_purchase'] == 'Yes'))

# Print the sample sizes and number of purchases
(samp_size_099, sales_099), (samp_size_199, sales_199), (samp_size_499, sales_499)
```

((1666, 316), (1666, 183), (1666, 83))
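Comparing each group's observed purchase rate with its required rate shows why a formal test is still needed. In the sample, Groups B and C exceed their thresholds while Group A does not. The binomial tests in Step 8 ask whether those excesses are larger than sampling noise. In this sketch, the prices come from the conclusion and the counts come from the outputs above.

```python
num_visits = 4998
revenue_target = 1000
# (price, sample size, purchases) per group, copied from the outputs above
groups = {"A": (0.99, 1666, 316), "B": (1.99, 1666, 183), "C": (4.99, 1666, 83)}

comparison = {}
for name, (price, n, sales) in groups.items():
    observed_rate = sales / n
    # Same formula as Steps 5 and 6: (target / price) / visitors
    needed_rate = revenue_target / price / num_visits
    comparison[name] = (observed_rate, needed_rate)
    print(f"Group {name}: observed {observed_rate:.4f} vs needed {needed_rate:.4f}")
```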

## Step 8: Performing Binomial Tests for Each Group

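```python
# binomtest replaces binom_test, which was deprecated and then removed in SciPy 1.12
from scipy.stats import binomtest

# One-sided binomial tests: is each group's purchase rate greater than the rate needed for $1000?
pvalueA = binomtest(k=sales_099, n=samp_size_099, p=p_sales_needed_099, alternative='greater').pvalue
pvalueB = binomtest(k=sales_199, n=samp_size_199, p=p_sales_needed_199, alternative='greater').pvalue
pvalueC = binomtest(k=sales_499, n=samp_size_499, p=p_sales_needed_499, alternative='greater').pvalue

# Print the p-values
pvalueA, pvalueB, pvalueC
```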

(0.9028081076188554, 0.11184562623740596, 0.027944826659830616)
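With `alternative='greater'`, each p-value is the probability of seeing at least the observed number of purchases if the true purchase rate were exactly the required threshold. In other words, it is $P(X \ge k)$ for $X \sim \text{Binomial}(n, p_{\text{needed}})$. The pure-Python sketch below computes that upper tail directly. `binom_upper_tail` is a hypothetical helper, not a SciPy function. It works in log space with `math.lgamma` because binomial coefficients for n = 1666 are too large to convert to floats.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing pmf terms in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)  # far-tail terms underflow harmlessly to 0.0
    return total


num_visits = 4998
revenue_target = 1000
# (price, sample size, purchases) per group, copied from the outputs above
groups = {"A": (0.99, 1666, 316), "B": (1.99, 1666, 183), "C": (4.99, 1666, 83)}

p_values = {}
for name, (price, n, sales) in groups.items():
    p_needed = (revenue_target / price) / num_visits
    p_values[name] = binom_upper_tail(sales, n, p_needed)
    print(f"Group {name}: p = {p_values[name]:.4f}")
```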

### Conclusion

Based on the analysis:

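- The **\$0.99** price point (Group A) had an observed purchase rate of about 19.0%, below the 20.2% needed to generate \$1000 in revenue (p ≈ 0.90). There is no evidence that it meets the target.
- The **\$1.99** price point (Group B) had an observed rate of about 11.0%, slightly above the 10.1% needed, but the difference is not statistically significant (p ≈ 0.11).
- The **\$4.99** price point (Group C) had an observed rate of about 5.0%, significantly above the 4.0% needed (p ≈ 0.028, below the conventional 0.05 level).

Therefore, we recommend that Brian charge **\$4.99** for the upgrade package. It is the only price point with statistically significant evidence of reaching the \$1000 revenue target.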
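As a complementary point estimate, we can project each group's revenue by assuming its observed purchase rate applied to all 4998 visitors. This is an illustrative assumption that ignores sampling uncertainty. The projection agrees with the recommendation: \$4.99 gives the highest figure. The \$1.99 estimate also clears \$1000, but its binomial test was not significant, which is why the recommendation rests on \$4.99.

```python
num_visits = 4998
# (price, sample size, purchases) per group, copied from the outputs above
groups = {"A": (0.99, 1666, 316), "B": (1.99, 1666, 183), "C": (4.99, 1666, 83)}

# Revenue = price * observed purchase rate * number of visitors
projected_revenue = {
    name: price * (sales / n) * num_visits
    for name, (price, n, sales) in groups.items()
}

for name, revenue in projected_revenue.items():
    print(f"Group {name}: ${revenue:,.2f}")
```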
