# Assignment 4
## Econ 8310 - Business Forecasting

This assignment will make use of the Bayesian statistical models covered in Lessons 10 to 12.

A/B Testing is a critical concept in data science, and for many companies one of the most relevant applications of data-driven decision-making. In order to improve product offerings, marketing campaigns, user interfaces, and many other user-facing interactions, scientists and engineers create experiments to determine the efficacy of proposed changes. Users are then randomly assigned to either the treatment or control group, and their behavior is recorded.
If the changes shown to the treatment group measurably improve the metric of interest, then those changes are scaled up and rolled out across all interactions.
Below is a short video detailing the A/B Testing process, in case you want to learn a bit more:
[https://youtu.be/DUNk4GPZ9bw](https://youtu.be/DUNk4GPZ9bw)

For this assignment, you will use an A/B test data set, which was pulled from the Kaggle website (https://www.kaggle.com/datasets/yufengsui/mobile-games-ab-testing). I have added the data from the page into Codio for you. It can be found in the cookie_cats.csv file in the file tree. It can also be found at [https://github.com/dustywhite7/Econ8310/raw/master/AssignmentData/cookie_cats.csv](https://github.com/dustywhite7/Econ8310/raw/master/AssignmentData/cookie_cats.csv)

The variables are defined as follows:

| Variable Name  | Definition |
|----------------|----|
| userid         | A unique number that identifies each player  |
| version        | Whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40) |
| sum_gamerounds | The number of game rounds played by the player during the first 14 days after install.  |
| retention_1    | Did the player come back and play 1 day after installing?     |
| retention_7    | Did the player come back and play 7 days after installing?    |

### The questions

You will be asked to answer the following questions in a small quiz on Canvas:
1. What was the effect of moving the gate from level 30 to level 40 on 1-day retention rates?
2. What was the effect of moving the gate from level 30 to level 40 on 7-day retention rates?
3. What was the biggest challenge for you in completing this assignment?

You will also be asked to submit a URL to your forked GitHub repository containing your code used to answer these questions.
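
For questions 1 and 2, one way to read "effect" is the difference between the two groups' retention rates. The sketch below is only an illustration under stated assumptions: it uses made-up counts (not the cookie_cats data), a uniform Beta(1, 1) prior on each rate, and a hypothetical helper named `posterior_diff_draws`. With that prior and Bernoulli outcomes, each posterior is again a Beta distribution, so draws can be taken directly with the standard library.

```python
import random


def posterior_diff_draws(success_a, n_a, success_b, n_b, draws=20000, seed=42):
    """Draw from the posterior of (rate_b - rate_a).

    Assumes independent Beta(1, 1) priors and Bernoulli outcomes, so each
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        rate_a = rng.betavariate(1 + success_a, 1 + n_a - success_a)
        rate_b = rng.betavariate(1 + success_b, 1 + n_b - success_b)
        diffs.append(rate_b - rate_a)
    return diffs


# Hypothetical counts for illustration only (NOT the cookie_cats data):
# group A retains 450 of 1000 players, group B retains 430 of 1000.
diffs = posterior_diff_draws(success_a=450, n_a=1000, success_b=430, n_b=1000)

mean_diff = sum(diffs) / len(diffs)
prob_b_better = sum(d > 0 for d in diffs) / len(diffs)

print(f"Posterior mean of (rate_b - rate_a): {mean_diff:.4f}")
print(f"P(rate_b > rate_a): {prob_b_better:.3f}")
```

A negative mean difference together with a small probability that B beats A would suggest that the change lowered retention. The same reading applies to draws produced by an MCMC sampler.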

In [None]:
import pandas as pd
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt
import numpy as np

url = "https://github.com/dustywhite7/Econ8310/raw/master/AssignmentData/cookie_cats.csv"
ab_test = pd.read_csv(url)

# Preprocessing: Separate data by version (gate_30 and gate_40)
gate_30_data = ab_test[ab_test['version'] == 'gate_30']
gate_40_data = ab_test[ab_test['version'] == 'gate_40']

# Observed (sample) retention rates for 1-day and 7-day retention, useful as a sanity check on the posterior means
retention_1_gate_30 = gate_30_data['retention_1'].mean()
retention_1_gate_40 = gate_40_data['retention_1'].mean()

retention_7_gate_30 = gate_30_data['retention_7'].mean()
retention_7_gate_40 = gate_40_data['retention_7'].mean()

# Convert retention data to arrays for Bayesian modeling
retention_1_gate_30_obs = gate_30_data['retention_1'].astype(int).values
retention_1_gate_40_obs = gate_40_data['retention_1'].astype(int).values

retention_7_gate_30_obs = gate_30_data['retention_7'].astype(int).values
retention_7_gate_40_obs = gate_40_data['retention_7'].astype(int).values

# Bayesian A/B testing for 1-day retention
with pm.Model() as model_1_day:
    p_gate_30 = pm.Beta('p_gate_30', alpha=1, beta=1)
    p_gate_40 = pm.Beta('p_gate_40', alpha=1, beta=1)

    obs_gate_30 = pm.Bernoulli('obs_gate_30', p=p_gate_30, observed=retention_1_gate_30_obs)
    obs_gate_40 = pm.Bernoulli('obs_gate_40', p=p_gate_40, observed=retention_1_gate_40_obs)

    step = pm.Metropolis()
    trace_1_day = pm.sample(2000, tune=1000, step=step, return_inferencedata=True, random_seed=42)

# Bayesian A/B testing for 7-day retention
with pm.Model() as model_7_day:
    p_gate_30 = pm.Beta('p_gate_30', alpha=1, beta=1)
    p_gate_40 = pm.Beta('p_gate_40', alpha=1, beta=1)

    obs_gate_30 = pm.Bernoulli('obs_gate_30', p=p_gate_30, observed=retention_7_gate_30_obs)
    obs_gate_40 = pm.Bernoulli('obs_gate_40', p=p_gate_40, observed=retention_7_gate_40_obs)

    step = pm.Metropolis()
    trace_7_day = pm.sample(2000, tune=1000, step=step, return_inferencedata=True, random_seed=42)

# Summarize and visualize results for both tests
# I got help from Copilot to develop part of this code
az_summary_1_day = az.summary(trace_1_day, var_names=['p_gate_30', 'p_gate_40'])
az_summary_7_day = az.summary(trace_7_day, var_names=['p_gate_30', 'p_gate_40'])

az.plot_posterior(trace_1_day, var_names=['p_gate_30', 'p_gate_40'], hdi_prob=0.95)
# suptitle labels the whole figure; plt.title would only label the last subplot
plt.suptitle('Posterior Distribution of 1-Day Retention Probabilities')
plt.show()

az.plot_posterior(trace_7_day, var_names=['p_gate_30', 'p_gate_40'], hdi_prob=0.95)
plt.suptitle('Posterior Distribution of 7-Day Retention Probabilities')
plt.show()

# Return the summaries
# I got help from Copilot to develop part of this code
az_summary_1_day, az_summary_7_day
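
The summaries above describe each group's retention probability separately, while the quiz questions ask about the effect of moving the gate. A minimal sketch of one way to get there is to subtract the control draws from the treatment draws and summarize the result. The helper `summarize_difference` is hypothetical (it is not part of PyMC or ArviZ), the example draws are made up, and the interval is a simple equal-tailed one rather than the HDI that ArviZ reports.

```python
def summarize_difference(control_draws, treatment_draws, interval_mass=0.95):
    """Summarize posterior draws of (treatment - control).

    Draws are paired index by index, which is appropriate when both
    parameters come from the same sampling run.
    """
    diffs = sorted(t - c for c, t in zip(control_draws, treatment_draws))
    n = len(diffs)
    if n == 0:
        raise ValueError("need at least one pair of draws")

    # Equal-tailed credible interval from the sorted differences
    lower = diffs[int((1 - interval_mass) / 2 * (n - 1))]
    upper = diffs[int((1 + interval_mass) / 2 * (n - 1))]

    return {
        "mean_diff": sum(diffs) / n,
        "prob_treatment_better": sum(d > 0 for d in diffs) / n,
        "interval": (lower, upper),
    }


# Made-up draws for illustration only (NOT output from the models above)
example_control = [0.40, 0.42, 0.44, 0.46]
example_treatment = [0.41, 0.41, 0.45, 0.45]
example_summary = summarize_difference(example_control, example_treatment)
print(example_summary)
```

With the traces from this notebook, the inputs could be taken as `trace_1_day.posterior['p_gate_30'].values.ravel()` for the control group and `trace_1_day.posterior['p_gate_40'].values.ravel()` for the treatment group, and likewise for `trace_7_day`.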