# Hypothesis Testing
---

> Simulation

In [1]:
import numpy as np

# set a random seed so the results are reproducible
np.random.seed(42)

In [4]:
# sales history

history = 365

# generate one year of daily sales for store A

mean_A = 20
std_A = 5
shop_A_sales = np.random.normal(mean_A, std_A, history)

# generate one-year sales for store B
mean_B = 19.5
std_B = 5
shop_B_sales = np.random.normal(mean_B, std_B, history)

> We will test the following hypotheses:
>
> * H0: the mean of sales of shop A equals the mean of sales of shop B (i.e. the difference between the means is zero)
> * HA: the means are not equal

In [2]:
# set the significance level
alpha = 0.05

In [7]:
# print the store A mean
print('shop_A_sales mean:', shop_A_sales.mean())

# print the store B mean
print('shop_B_sales mean:', shop_B_sales.mean())

# the difference in the means
observed_means_diff = shop_A_sales.mean() - shop_B_sales.mean()
print('observed_means_diff:', observed_means_diff)

shop_A_sales mean: 20.04973201106029
shop_B_sales mean: 19.309929401404304
observed_means_diff: 0.7398026096559853


Because the mean of sales in store A is close to the mean of sales in store B, and their standard deviations are equal, it is hard to tell by inspection whether the observed difference reflects a real difference in mean sales or just random variation.
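
To put a number on "close", one option is to compare the observed difference with the standard error of a difference between two independent sample means. The sketch below assumes the simulation parameters used above (standard deviation 5 and 365 days per shop) and uses those known values rather than estimates from the data; `se_diff` is an illustrative name that does not appear in the original notebook.

```python
import math

# simulation parameters taken from the cells above
std_A = 5
std_B = 5
history = 365

# standard error of the difference between two independent sample means
se_diff = math.sqrt(std_A**2 / history + std_B**2 / history)
print('standard error of the difference:', se_diff)

# observed difference printed above, expressed in standard errors
observed_means_diff = 0.7398026096559853
print('difference in standard errors:', observed_means_diff / se_diff)
```

With these parameters the standard error is about 0.37, so the observed difference of about 0.74 sits roughly two standard errors from zero, which is a borderline case.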

In [9]:
both_sales = np.concatenate((shop_A_sales, shop_B_sales))

# permutation
sales_perm = np.random.permutation(both_sales)

# permutation replicates 
perm_shop_A_sales = sales_perm[:len(shop_A_sales)]
perm_shop_B_sales = sales_perm[len(shop_A_sales):]

In [10]:
# compute the difference between the means of the permutation replicates
print(perm_shop_A_sales.mean() - perm_shop_B_sales.mean())

0.21098789154327235


In [11]:
# create an empty list to store the differences between the permutation replicate means
perm_repl_means = []

for _ in range(1000):
    # permutation 
    sales_perm = np.random.permutation(both_sales)

    # permutation replicates 
    perm_shop_A_sales = sales_perm[:len(shop_A_sales)]
    perm_shop_B_sales = sales_perm[len(shop_A_sales):]

    # difference between the permutation replicate means
    perm_repl_mean = perm_shop_A_sales.mean() - perm_shop_B_sales.mean()

    # append perm_repl_mean to list
    perm_repl_means.append(perm_repl_mean)
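
The loop above can be wrapped in a reusable function. This is a minimal sketch, not the notebook's own code: the function name `permutation_diff_means`, its parameters, and the `demo_*` arrays are hypothetical, and it uses NumPy's `default_rng` generator instead of `np.random.seed`, so its random numbers will not match the outputs shown in this notebook.

```python
import numpy as np

def permutation_diff_means(sample_a, sample_b, n_perm=1000, rng=None):
    """Return n_perm permutation replicates of mean(a) - mean(b)."""
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate((sample_a, sample_b))
    n_a = len(sample_a)
    replicates = np.empty(n_perm)
    for i in range(n_perm):
        # shuffle the pooled data, then split it back into two groups
        shuffled = rng.permutation(pooled)
        replicates[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    return replicates

# usage on freshly simulated data (illustrative values only)
rng = np.random.default_rng(0)
demo_a = rng.normal(20, 5, 365)
demo_b = rng.normal(19.5, 5, 365)
demo_replicates = permutation_diff_means(demo_a, demo_b, n_perm=1000, rng=rng)
print(demo_replicates.shape)
```

Returning a NumPy array rather than a list keeps later steps, such as taking absolute values and comparing against the observed difference, vectorized.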

> Note
> 
> The p-value is the probability of observing a test statistic as extreme or more extreme than the one you've observed, given that the null hypothesis is true.
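
For a two-sided permutation test, this definition can be estimated as the fraction of permutation replicates that are at least as extreme as the observed statistic. The symbols below are introduced here for illustration and are not part of the original notebook: $d_i$ is the $i$-th permutation replicate of the difference in means, $d_{\text{obs}}$ is the observed difference, and $N$ is the number of replicates.

$$
\hat{p} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left( |d_i| \ge |d_{\text{obs}}| \right)
$$

Here $\mathbf{1}(\cdot)$ equals 1 when the condition holds and 0 otherwise.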

In [12]:
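# compute the two-sided p-value: the fraction of replicates at least as
# extreme (in absolute value) as the observed difference
p = np.sum(np.abs(perm_repl_means) >= np.abs(observed_means_diff)) / len(perm_repl_means)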

# print the result
print('p-value =', p)

p-value = 0.043


> The p-value tells us that there is about a 4.3% chance of getting a difference of means at least as extreme as the one observed in the experiment if mean sales were exactly the same.
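
Because the p-value is estimated from a finite number of random permutations, it carries some simulation error of its own. A rough way to gauge it, assuming the replicates behave like independent draws, is the binomial standard error of a proportion. The name `mc_se` is illustrative, and the values of `p` and `n_perm` are copied from the output and the loop above.

```python
import math

p = 0.043       # p-value printed above
n_perm = 1000   # number of permutation replicates used above

# binomial standard error of a proportion estimated from n_perm draws
mc_se = math.sqrt(p * (1 - p) / n_perm)
print('Monte Carlo standard error of the p-value:', mc_se)
```

This comes to roughly 0.006, so an estimate of 0.043 lies only about one standard error below a significance level of 0.05. Increasing the number of permutations would tighten the estimate.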

In [13]:
# final decision
if p < alpha:
    print('H0 is rejected.')
else:
    print('H0 is not rejected.')


H0 is rejected.


> Because the p-value is smaller than our significance level alpha, we reject the null hypothesis that mean cell phone sales are equal in both stores.