# Hawkins Bank | Hypothesis Testing

In [13]:
import pandas as pd
import numpy as np
import scipy.stats
from scipy.stats import ttest_1samp


In [2]:
data = pd.read_csv('df_cleaned.csv', index_col=[0])


In [3]:
data

Unnamed: 0,offer_accepted,reward,mailer_type,income_level,#_bank_accounts_open,overdraft_protection,credit_rating,#_credit_cards_held,#_homes_owned,household_size,own_your_home,average_balance
0,No,Air Miles,Letter,High,1,No,High,2,0,4,No,1160.75
1,No,Air Miles,Letter,Medium,1,No,Medium,2,2,5,Yes,147.25
2,No,Air Miles,Postcard,High,2,No,Medium,2,1,2,Yes,276.50
3,No,Air Miles,Letter,Medium,2,No,High,1,0,4,No,1219.00
4,No,Air Miles,Letter,Medium,1,No,Medium,2,1,6,Yes,1211.00
...,...,...,...,...,...,...,...,...,...,...,...,...
17971,No,Cash Back,Letter,High,1,No,Low,1,1,5,Yes,167.50
17972,No,Cash Back,Letter,High,1,No,Low,3,1,3,Yes,850.50
17973,No,Cash Back,Letter,High,1,No,Low,2,0,4,No,1087.25
17974,No,Cash Back,Letter,Medium,1,No,Medium,4,2,2,Yes,1022.25


### Two-sided (two-tailed) test:

H0: the mean average balance of all customers is equal to the mean average balance of the majority class (offer_accepted=No)

H1: the mean average balance of all customers is NOT equal to the mean average balance of the majority class (offer_accepted=No)

Remember: a hypothesis test makes a claim about the ENTIRE population, not just the sample.

In [8]:
offer_rejected = data[data["offer_accepted"] == 'No']
# confirm the filter kept only the majority class
offer_rejected["offer_accepted"].value_counts()


No    16955
Name: offer_accepted, dtype: int64

In [11]:
offer_rejected.describe()

Unnamed: 0,#_bank_accounts_open,#_credit_cards_held,#_homes_owned,household_size,average_balance
count,16955.0,16955.0,16955.0,16955.0,16955.0
mean,1.255913,1.903804,0.780419,3.502625,940.848894
std,0.473093,0.79651,0.671269,1.113703,350.408076
min,1.0,1.0,0.0,1.0,48.25
25%,1.0,1.0,0.0,3.0,787.0
50%,1.0,2.0,1.0,3.0,1007.0
75%,1.0,2.0,1.0,4.0,1153.375
max,3.0,4.0,3.0,8.0,3366.25


In [12]:
offer_rejected["average_balance"].mean()
# this is the mean of average_balance for the majority class (customers who did not accept the credit card offer)

940.8488941315246

H0: the mean average balance (in $) of the population (all Hawkins Bank customers) = 941 (the mean average balance of the majority class, rounded)

H1: the mean average balance (in $) of the population (all Hawkins Bank customers) != 941 (the mean average balance of the majority class, rounded)

In [14]:
# significance level: alpha = 0.05 (we reject H0 only if the p-value falls below it)

stat, pval = ttest_1samp(data['average_balance'], 941)

print('stat is  ', stat)
print('pvalue for the two-tailed test is ', pval)

stat is   -0.18541587006702911
pvalue for the two-tailed test is  0.8529049537596921
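
For reference, the statistic returned by `ttest_1samp` is t = (x̄ - μ0) / (s / √n), and the two-sided p-value is the probability, under a t distribution with n - 1 degrees of freedom, of a value at least as extreme as |t|. The sketch below recomputes both by hand on the nine balances visible in the data preview above (not the full data set, so the numbers will differ from the output above) and checks them against SciPy. The variable names are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.stats import ttest_1samp

# The nine average_balance values visible in the preview of `data`
# (illustration only; the real test uses all rows)
balances = np.array([1160.75, 147.25, 276.50, 1219.00, 1211.00,
                     167.50, 850.50, 1087.25, 1022.25])
mu0 = 941  # hypothesised population mean from H0

n = balances.size
sample_mean = balances.mean()
sample_std = balances.std(ddof=1)          # sample standard deviation (n - 1 in the denominator)
standard_error = sample_std / np.sqrt(n)

# t statistic and two-sided p-value computed by hand
t_manual = (sample_mean - mu0) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# the same quantities from SciPy
stat, pval = ttest_1samp(balances, mu0)

print('manual:', t_manual, p_manual)
print('scipy: ', stat, pval)
```

The sign of the statistic only tells us on which side of 941 the sample mean falls; the p-value tells us whether that gap is large relative to the sampling noise.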


### Conclusion: We fail to reject the null hypothesis. 

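#### The p-value (0.85) is far above alpha = 0.05, so the sample gives no evidence that the mean average balance of all Hawkins Bank customers differs from 941. This does not prove that the mean is exactly 941. The small negative stat only reflects that the sample mean is slightly below 941.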
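
The decision rule behind this conclusion can be written out explicitly. This is a minimal sketch that reuses the p-value printed above; `decide` is a hypothetical helper introduced here for illustration, not part of SciPy.

```python
alpha = 0.05                 # significance level chosen before running the test
pval = 0.8529049537596921    # two-tailed p-value reported by ttest_1samp above

def decide(pval, alpha=0.05):
    """Return the test decision for a given p-value (hypothetical helper)."""
    return 'reject H0' if pval < alpha else 'fail to reject H0'

decision = decide(pval, alpha)
print(decision)
```

Note that "fail to reject H0" is not the same as "accept H0": the test only says the data are compatible with a population mean of 941.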

In [15]:
data["average_balance"].mean()
# consistent with the negative stat, the sample mean (not the population mean) is $940.52, slightly below 941

940.5155624165554