<a href="https://colab.research.google.com/github/larixgomex/python-practice/blob/main/Home_Page_AB_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Home Page A/B test

### Here are the two versions:

[Variant A](https://drive.google.com/file/d/1LqPXgeOJ8QQ1ZfcO4_Mz26lehmyOkles/view) - Slider with a white design

[Variant B](https://drive.google.com/file/d/1rBydNNlrg5d1AmGXo8-9DsfrbE-tuAox/view) - Static page with a green design

### We need to split the users

Before we can run the A/B test, we need to split our users into two groups. Let's start by importing the user data from the customers tab in [this spreadsheet](https://docs.google.com/spreadsheets/d/1lpyAhs6Yh2WZ-zqKrpfxKN08fZ3PTISvS2ajl3L6Avk/edit#gid=386045473).

In [18]:
# Import the data
import pandas as pd
customers = pd.read_csv("Greenweez Home Page Results - customers.csv")

In [19]:
# take a look at our dataframe
customers

Unnamed: 0,customers_id,avg_basket
0,9731,202.59
1,61582,22.92
2,305054,32.05
3,305036,30.46
4,10969,87.93
...,...,...
39995,273264,35.46
39996,273371,87.03
39997,70803,50.49
39998,6743,86.19


Let's try a naive strategy first: sort the customers by customers_id and split them at the median.

In [20]:
customers = customers.sort_values(by='customers_id').reset_index(drop=True)
customers1 = customers.iloc[:20000]
customers2 = customers.iloc[20000:]

Did we do a good job? Let's look at the mean avg_basket for both groups. If the split is fair, the two means should be close.

In [21]:
print(customers1["avg_basket"].mean(), customers2["avg_basket"].mean())

76.670484 52.311415999999994


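That's quite a difference! Splitting on customers_id puts systematically different customers in each group, so let's try another strategy: shuffle the rows at random before splitting.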

In [22]:
customers = customers.sample(frac=1).reset_index(drop=True)
customers1 = customers.iloc[:20000]
customers2 = customers.iloc[20000:]

Let's check the avg_basket again. We should have done a better job!

In [23]:
print(customers1["avg_basket"].mean(), customers2["avg_basket"].mean())

64.30687250000001 64.6750275
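The comparison between the two splitting strategies can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: the `demo` table, its values, and the helpers `split_in_two` and `mean_gap` are hypothetical and are not the real customers data. It also passes `random_state` to `sample`, which the cell above does not do, so that the shuffle gives the same groups on every run.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the customers table (NOT the real data).
# Lower ids get a larger basket on purpose, to mimic a trend hidden in the id.
rng = np.random.default_rng(0)
n = 10_000
ids = np.arange(n)
baskets = rng.gamma(shape=2.0, scale=30.0, size=n) + np.where(ids < n // 2, 20.0, 0.0)
demo = pd.DataFrame({"customers_id": ids, "avg_basket": baskets})


def split_in_two(df):
    """Cut a dataframe into a first and a second half, by row position."""
    half = len(df) // 2
    return df.iloc[:half], df.iloc[half:]


def mean_gap(group_a, group_b):
    """Absolute difference between the two groups' mean avg_basket."""
    return abs(group_a["avg_basket"].mean() - group_b["avg_basket"].mean())


# Naive split: sort by id, then cut in the middle.
sorted_a, sorted_b = split_in_two(
    demo.sort_values(by="customers_id").reset_index(drop=True)
)
sorted_gap = mean_gap(sorted_a, sorted_b)

# Random split: shuffle first; random_state pins the shuffle for reproducibility.
shuffled = demo.sample(frac=1, random_state=42).reset_index(drop=True)
random_a, random_b = split_in_two(shuffled)
random_gap = mean_gap(random_a, random_b)

print(f"gap after sorted split: {sorted_gap:.2f}")
print(f"gap after random split: {random_gap:.2f}")
```

On this synthetic table the sorted split keeps the built-in gap between low and high ids, while the shuffled split spreads it across both groups.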


### The results are in

After 4 weeks, the web developers have gotten back to you with the results of the [test](https://docs.google.com/spreadsheets/d/1lpyAhs6Yh2WZ-zqKrpfxKN08fZ3PTISvS2ajl3L6Avk/edit?usp=sharing). Let's analyse them to see which variant is the best.

In [24]:
# Load the '4 weeks' sheet of the results workbook
results = pd.read_excel("Greenweez Home Page Results.xlsx", sheet_name='4 weeks')

In [25]:
# Have a look at your newly created dataframe
results

In [26]:
# Set the "AB test group" column as the index
results.set_index("AB test group", inplace=True)

In [27]:
# Make sure you know how to access the individual values - try displaying the number of sessions for the blank slider
# Try using the column/index names and not numbers to make the code more readable
results.loc["Slider blank", "Nb sessions"]

243210
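To make the label-based access pattern concrete, here is a minimal sketch on a toy table. The column and row names are the ones used in this notebook, but the numbers in `toy` are hypothetical round values chosen for illustration, not the real results.

```python
import pandas as pd

# Toy table with the notebook's labels; the session counts are made up.
toy = pd.DataFrame(
    {
        "AB test group": ["Slider blank", "Static green", "Total"],
        "Nb sessions": [1000, 1200, 2200],
    }
)

# Use the group name as the row label instead of the default 0, 1, 2 index.
toy = toy.set_index("AB test group")

# .loc takes (row label, column label), so the lookup reads like a sentence.
blank_sessions = toy.loc["Slider blank", "Nb sessions"]
print(blank_sessions)
```

Looking values up by name keeps the code valid even if the row order of the sheet changes.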

### The bounce variable

The first metric we want to analyze is bounce. A Chi-Square test suits it because bounce is a binary variable: a customer either bounces or doesn't!

Now that we've chosen the appropriate test, we can move forward, but we're still missing something! Since neither of these variants has been implemented before, we don't have a baseline and have to create our own. Our null hypothesis is that the bounce rate is the same for both variants, equal to the average bounce rate of 37.40%.

Below we compute the theoretical (expected) number of bounces for each variant from the average bounce rate.

In [28]:
# Compute the theoretical number of bounces for both variants using the average bounce rate!
blank_theoretical_bounce = results.loc['Total', '% bounces'] * results.loc['Slider blank', 'Nb sessions']
green_theoretical_bounce = results.loc['Total', '% bounces'] * results.loc['Static green', 'Nb sessions']
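As a worked example of the same arithmetic, the sketch below multiplies the 37.40% average bounce rate quoted above by the 243210 sessions displayed earlier for the blank slider. The helper `expected_count` is hypothetical. It assumes the rate is expressed as a fraction (0.374), and whether the `% bounces` column of the sheet stores 0.374 or 37.40 is not visible here, so check it before multiplying.

```python
def expected_count(rate, sessions):
    """Expected number of events if `sessions` sessions share the pooled `rate`.

    `rate` must be a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a fraction; divide a percentage by 100 first")
    return rate * sessions


pooled_bounce_rate = 37.40 / 100   # the 37.40% average quoted in the text
slider_blank_sessions = 243210     # value displayed earlier in the notebook

blank_expected = expected_count(pooled_bounce_rate, slider_blank_sessions)
print(round(blank_expected, 2))
```

The guard on `rate` is a design choice: passing 37.40 instead of 0.374 would inflate every expected count by a factor of 100 without raising any error otherwise.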

Now that we have all the elements we need, compute the Chi-Square test below.

In [32]:
## With Scipy

# Import the right modules (also import numpy)
from scipy.stats import chisquare
import numpy as np

# Create arrays for the observed and expected bounce values
f_obs_bounce = np.array([results.loc['Slider blank', 'Nb bounces'], results.loc['Static green', 'Nb bounces']])
f_exp_bounce = np.array([blank_theoretical_bounce, green_theoretical_bounce])

# Calculate chisquare
chi_square_bounce = chisquare(f_obs=f_obs_bounce, f_exp=f_exp_bounce)
chi_square_bounce

Power_divergenceResult(statistic=11.614027402426252, pvalue=0.0006545625835136192)

The p-value (about 0.00065) is well below our 5% threshold, so we can reject the null hypothesis: the bounce rates of the two variants differ significantly.
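To see what `chisquare` computes, the statistic and p-value can be rebuilt by hand. With two groups the test has one degree of freedom, and for that case the upper-tail probability has the closed form erfc(sqrt(x / 2)). The sketch below uses only the standard library. The helper names and the small `observed`/`expected` counts are hypothetical, and the last lines cross-check the formula against the statistic reported above.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_dof(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts for illustration: 1000 events split over two groups.
observed = [480, 520]
expected = [500.0, 500.0]
toy_statistic = chi_square_statistic(observed, expected)
toy_p_value = p_value_one_dof(toy_statistic)
print(toy_statistic, toy_p_value)

# Cross-check with the bounce statistic reported by scipy above.
bounce_p_value = p_value_one_dof(11.614027402426252)
print(bounce_p_value)
```

The closed form only holds for one degree of freedom; with more than two groups, use scipy as in the cell above.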

### What about the other metrics?

Let's repeat what we just did for the other valid metric: number of transactions made. Again, we need to compute the theoretical values first.

Could we also run this test on the number of pages visited? Why or why not?

#### Number of transactions made

In [30]:
# Compute the theoretical transactions for both variants using the conversion rate!
blank_theoretical_transactions = results.loc['Slider blank', 'Nb sessions'] * results.loc['Total', '% conversions']
green_theoretical_transactions = results.loc['Static green', 'Nb sessions'] * results.loc['Total', '% conversions']

In [31]:
# Chi-Square with the Scipy function
f_obs_transactions = np.array([results.loc['Slider blank', 'Nb transactions'], results.loc['Static green', 'Nb transactions']])
f_exp_transactions = np.array([blank_theoretical_transactions, green_theoretical_transactions])

chi_square_transactions = chisquare(f_obs=f_obs_transactions, f_exp=f_exp_transactions)
chi_square_transactions

Power_divergenceResult(statistic=1.9196028930112474, pvalue=0.1659004437802039)

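Our p-value (about 0.166) is higher than the 0.05 threshold, so we fail to reject the null hypothesis: the difference in transactions between the two variants is not statistically significant. This does not prove that the null hypothesis is true, only that the data give no strong evidence against it.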
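The decision rule applied to both metrics can be written as a small helper. This is a sketch, assuming the 5% significance threshold stated for the bounce test. The function name `decide` is hypothetical, and the two p-values are copied from the scipy outputs above.

```python
ALPHA = 0.05  # significance threshold (5%)


def decide(p_value, alpha=ALPHA):
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    # A large p-value is a lack of evidence against H0, not proof that H0 is true.
    return "fail to reject H0"


# p-values copied from the two chisquare results above
p_values = {
    "bounces": 0.0006545625835136192,
    "transactions": 0.1659004437802039,
}

for metric, p_value in p_values.items():
    print(f"{metric}: p = {p_value:.4f} -> {decide(p_value)}")
```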