# Hypothesis testing introduction

This short notebook introduces some key concepts of inductive (inferential) statistics.

## 1. H0 vs H1 & T-test

This notebook covers two basic concepts: testing a null hypothesis against an alternative hypothesis, and the t-score. For the t-score calculations below, assume a significance level (alpha) of 0.05 and normally distributed variables, and make no assumption about equal variances (hence `equal_var=False`, i.e. Welch's t-test).

### 1.1 Thoughts based on assumptions

- Observations approximately follow a normal distribution -> Shapiro-Wilk test
- Homogeneity of variances (equal between treatment groups) -> Levene or Bartlett test
- Observations are sampled independently of each other from the same population
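
The first two assumptions can be checked with `scipy.stats`. The following is a minimal sketch, assuming the sunny-days samples from section 1.3 as input; with only five observations per group these tests have little power, so the p-values should be read as a rough indication only.

```python
from scipy.stats import shapiro, levene

# number of sunny days per year (same samples as in section 1.3)
sf_sun = [220, 183, 301, 199, 323]
bst_sun = [125, 186, 156, 133, 202]

# Shapiro-Wilk test, H0: the sample comes from a normal distribution
for name, sample in [("sf_sun", sf_sun), ("bst_sun", bst_sun)]:
    w_stat, p_norm = shapiro(sample)
    print(f"{name}: W = {w_stat:.3f}, p-value = {p_norm:.3f}")

# Levene test, H0: the groups have equal variances
lev_stat, p_var = levene(sf_sun, bst_sun)
print(f"Levene: statistic = {lev_stat:.3f}, p-value = {p_var:.3f}")
```

A p-value below alpha in either test would speak against the corresponding assumption. Independence of the observations cannot be tested this way; it follows from how the data were collected.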

### 1.2 Import packages

In [1]:
import numpy as np
from scipy.stats import ttest_ind

### 1.3 Difference in sunny days between cities

In [3]:
# H0: there is no difference between the avg. number of sunny days per year in SF and B
# H0: sf_mean = bst_mean (group means are equal)

# Ha (also called H1): there is a difference between the avg. number of sunny days per year in SF and B
# Ha: sf_mean != bst_mean (group means are different)

In [5]:
# set alpha
alpha = 0.05

# number of sunny days per year
sf_sun = [220, 183, 301, 199, 323]
bst_sun = [125, 186, 156, 133, 202]

# calculating variances to assess if equal_var True or False
var_sf_sun = np.var(sf_sun, ddof=1)
var_bst_sun = np.var(bst_sun, ddof=1)
print(f"Variances for the two variables are: {var_sf_sun} and {var_bst_sun}")

# t-test (equal_var=False runs Welch's t-test, which does not assume equal variances)
ttest_ind(sf_sun, bst_sun, equal_var=False)
# t_value, p_value = ttest_ind(sf_sun, bst_sun, equal_var=False)

Variances for the two variables are: 3951.2000000000003 and 1102.3


Ttest_indResult(statistic=2.6673789490993483, pvalue=0.036739143285587854)

In [7]:
# p-value (0.037) < alpha (0.05)
# reject H0
# therefore the data suggest a difference between the avg. number of sunny days in SF and B
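
To make the t-score less of a black box, the statistic above can be recomputed by hand. This is a sketch of Welch's t-test, assuming the standard formulas (difference of means divided by the combined standard error, with Welch-Satterthwaite degrees of freedom); `welch_t_test` is a hypothetical helper name, not part of SciPy.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

# number of sunny days per year
sf_sun = [220, 183, 301, 199, 323]
bst_sun = [125, 186, 156, 133, 202]

def welch_t_test(a, b):
    """Two-sided Welch t-test computed step by step (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # squared standard error of each group mean (sample variance / n)
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    # t-score: difference of the means in units of the combined standard error
    t_value = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))
    # two-sided p-value from the t distribution
    p_value = 2 * t_dist.sf(abs(t_value), df)
    return t_value, df, p_value

t_value, df, p_value = welch_t_test(sf_sun, bst_sun)
print(f"t = {t_value:.4f}, df = {df:.2f}, p-value = {p_value:.4f}")

# reference result from SciPy for comparison
reference = ttest_ind(sf_sun, bst_sun, equal_var=False)
print(f"scipy: t = {reference.statistic:.4f}, p-value = {reference.pvalue:.4f}")
```

Both lines should agree with the `Ttest_indResult` shown above up to rounding.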

### 1.4 Difference in sandwich prices between cities

In [8]:
# H0: there is no difference between the avg. sandwich price in Zurich and London
# H0: mean price_zh = mean price_l (group means are equal)

# Ha: there is a difference between the avg. sandwich price in Zurich and London
# Ha: mean price_zh != mean price_l (group means are different)

In [9]:
# sandwich prices
x = [2.34, 15.24, 4.24, 9.05, 7.04, 9.09, 2.74, 4.24]
y = [2.49, 3.13, 2.92, 3.88, 3.11, 3.31, 2.41, 3.71, 2.17, 2.64]

# calculating variances to assess if equal_var True or False
var_x = np.var(x, ddof=1)
var_y = np.var(y, ddof=1)
print(f'Variances are: {var_x}, {var_y}')

# t-test
ttest_ind(x, y, equal_var=False)

Variances are: 18.722592857142853, 0.314601111111111


Ttest_indResult(statistic=2.4482815379085707, pvalue=0.04333598269449579)

In [10]:
# p-value (0.043) < alpha (0.05)
# reject H0
# therefore the data suggest a difference between the avg. price for sandwiches in Zurich and London

### 1.5 Difference in drivers' speed between cities

In [11]:
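# H0: mean speed of x = mean speed of y (group means are equal)
# Ha: mean speed of x != mean speed of y (group means are different)

# speed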
x = [127, 83, 94, 98, 92]
y = [151, 102, 85, 112, 104]

# calculating variances to assess if equal_var True or False
var_x = np.var(x, ddof=1)
var_y = np.var(y, ddof=1)
print(f'Variances are: {var_x}, {var_y}')

# t-test
ttest_ind(x, y, equal_var=False)

Variances are: 278.70000000000005, 601.6999999999999


Ttest_indResult(statistic=-0.9043285278726962, pvalue=0.3956676705134941)

In [12]:
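# p-value (0.396) > alpha (0.05)
# fail to reject H0
# therefore the data do not provide enough evidence of a difference in avg. speed between the two groups
# (failing to reject H0 does not prove that the means are equal)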
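
The decision rule used in all three examples (compare the p-value with alpha) can be written down once and reused. A minimal sketch, where `decide` is a hypothetical helper name introduced only for illustration:

```python
from scipy.stats import ttest_ind

def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value (illustrative helper)."""
    if p_value < alpha:
        return "reject H0"
    # a large p-value is not proof that H0 is true
    return "fail to reject H0"

# speed samples from section 1.5
x = [127, 83, 94, 98, 92]
y = [151, 102, 85, 112, 104]

result = ttest_ind(x, y, equal_var=False)
print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.3f} -> {decide(result.pvalue)}")
```

Reading the values from `result.statistic` and `result.pvalue` avoids copying numbers from the printed output by hand.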