# Business Decision in Retail using A/B Testing

## Problem Statement
Client X wants to know if by putting up new displays for Brand Z’s candy, the consumer will purchase more of Brand Z’s candy leading to higher revenue

Business goal: Increase total revenue for candy

In [10]:
# All the libralies used in this project

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Hypothesis Testing
from scipy.stats import shapiro
import scipy.stats as stats

In [2]:
# Load data

transactions = pd.read_csv("transactions.csv")
ids_control = pd.read_csv("control_stores.csv")
ids_treatment = pd.read_csv("treatment_stores.csv")
products = pd.read_csv("products_of_interest.csv")

In [3]:
# Explore Data

transactions.head()

Unnamed: 0.1,Unnamed: 0,date_week,store_id,product_id,currency_code,revenue
0,0,2016-07-10 00:00:00.000000,526,31107,USD,3.32
1,1,2016-07-10 00:00:00.000000,526,30772,USD,1.99
2,2,2016-07-10 00:00:00.000000,526,30887,USD,5.32
3,3,2016-07-10 00:00:00.000000,526,31133,USD,6.98
4,4,2016-07-10 00:00:00.000000,526,31118,USD,1.99


In [4]:
#Functions definition

def get_stores_revenue(week,transactions,sample): #it gets all revenue per store per week
    
    idx_week = transactions['date_week'] == week
    i=0
    store_rev_all = [0]*len(control)
    for x in sample:
        idx_sto = transactions['store_id']==x[0]
        idx = idx_week & idx_sto
        store_rev = transactions['revenue'].loc[idx].sum()
        store_rev_all[i] = store_rev
        i=i+1
    return store_rev_all

def tbl2array(head,ids): #converts table to numerical array
    headers = np.array([head])  # get headers
    values = ids.values  # numpy array of values
    matrix = np.concatenate([headers, values])  # append to the final matrix
    return matrix

In [5]:
#Filter products of interest
products_list = tbl2array([30898],products)
idx_pro_int = np.isin(transactions.product_id, products_list)
transactions_fil = transactions[idx_pro_int]

#Fix negative values
transactions_fil.revenue = abs(transactions_fil.revenue)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  transactions_fil.revenue = abs(transactions_fil.revenue)


## Split & Define Control Group & Test Group

In [6]:
#Get control and treat store ids
control = tbl2array([384],ids_control)
treat = tbl2array([457],ids_control)

In [7]:
#define selected weeks
week1='2016-11-27 00:00:00.000000'
week2='2016-12-04 00:00:00.000000'

In [8]:
#Get stores mean revenue per week
rev_control_week1 = get_stores_revenue(week1,transactions_fil,control)
rev_control_week2 = get_stores_revenue(week2,transactions_fil,control)
rev_control = [(x + y)/2 for x, y in zip(rev_control_week1, rev_control_week2)]

rev_treat_week1 = get_stores_revenue(week1,transactions_fil,treat)
rev_treat_week2 = get_stores_revenue(week2,transactions_fil,treat)
rev_treat = [(x + y)/2 for x, y in zip(rev_treat_week1, rev_treat_week2)]

## Apply Shapiro Test for normality

If parametric apply Levene Test for homogeneity of variances<br>
If Parametric + homogeneity of variances apply T-Test<br>
If Parametric - homogeneity of variances apply Welch Test<br>
If Non-parametric apply Mann Whitney U Test directly<br>

In [11]:
#Perform t-test
stats.ttest_ind(a=rev_control, b=rev_treat)

Ttest_indResult(statistic=-0.08665119060506651, pvalue=0.9313092405895493)