# Hypothesis Testing - Example

This example is one of several from the following tutorial: https://towardsdatascience.com/hypothesis-testing-with-python-step-by-step-hands-on-tutorial-with-practical-examples-e805975ea96e

In [1]:
import numpy as np
from scipy import stats
import pandas as pd

In [2]:
alpha = 0.05
pd.options.display.float_format = '{:,.4f}'.format

In [3]:
def check_normality(data):
    # Shapiro-Wilk test for normality; H_0: 'data' was drawn from a normal distribution.
    # Failing to reject H_0 means there is no evidence against normality, not proof of it.
    test_stat_normality, p_value_normality = stats.shapiro(data)
    
    if p_value_normality < alpha:
        print("p value: %.4f <" % p_value_normality, alpha)
        print("Reject null hypothesis >> Data is not normally distributed.")
    else:
        print("p value: %.4f >=" % p_value_normality, alpha)
        print("Fail to reject null hypothesis >> Data is normally distributed.")

In [4]:
def check_variance_homogeneity(group1, group2):
    # Levene test for equal variances; H_0: all input samples come from populations with equal variances.
    test_stat_var, p_value_var = stats.levene(group1,group2)
    
    if p_value_var < alpha:
        print("p value: %.4f <" % p_value_var, alpha)
        print("Reject null hypothesis >> Sample variances are different.")
    else:
        print("p value: %.4f >=" % p_value_var, alpha)
        print("Fail to reject null hypothesis >> Sample variances are the same.")

A professor believes that students who attend live (online) classes are more successful than those who watch the recorded classes later. He recorded the students' average grades at the end of the semester.

In [5]:
sync = np.array([94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2,
       87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6])
asyncr = np.array([77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65.7, 72.6, 71.5, 78.2])

Assumptions:\
Observations in each sample are independent and identically distributed (iid);\
Observations in each sample are normally distributed;\
Observations in each sample have the same variance.\
\
$H_{0a}$: data is normally distributed.\
$H_{1a}$: data is not normally distributed.

In [6]:
check_normality(sync)
check_normality(asyncr)

p value: 0.6556 >= 0.05
Fail to reject null hypothesis >> Data is normally distributed.
p value: 0.0803 >= 0.05
Fail to reject null hypothesis >> Data is normally distributed.


$H_{0b}$: the two populations have equal variances.\
$H_{1b}$: the two populations do not have equal variances.

In [7]:
check_variance_homogeneity(sync, asyncr)

p value: 0.8149 >= 0.05
Fail to reject null hypothesis >> Sample variances are the same.


$H_{0c}$: $\mu_{sync} \leq \mu_{async}$\
$H_{1c}$: $\mu_{sync} > \mu_{async}$
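
For reference, the statistic behind the equal-variance two-sample t-test used in the next cell can be written out as follows. This is a sketch of the standard pooled-variance formula; the symbols $n$, $\bar{x}$, $s^2$ and $s_p$ are notation introduced here and do not appear in the original notebook.

$$
s_p^2 = \frac{(n_{sync}-1)\,s_{sync}^2 + (n_{async}-1)\,s_{async}^2}{n_{sync}+n_{async}-2},
\qquad
t = \frac{\bar{x}_{sync} - \bar{x}_{async}}{s_p \sqrt{\dfrac{1}{n_{sync}} + \dfrac{1}{n_{async}}}}
$$

With $n_{sync} = 22$ and $n_{async} = 14$, the statistic is compared against a Student's t distribution with $n_{sync}+n_{async}-2 = 34$ degrees of freedom. Large positive values of $t$ count as evidence against $H_{0c}$.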

In [8]:
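# Independent two-sample t-test (equal variances assumed, SciPy's default); returns a two-sided p-value.
ttest, p_value = stats.ttest_ind(sync,asyncr)
# Halving the two-sided p-value is valid only when the t-statistic is positive, i.e. in the direction of H_1c.
# The 1.645 printed below is the normal (z) approximation; the exact one-sided t critical value for 34 degrees of freedom is about 1.69.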

print("Since the hypothesis is one sided >> use p_value/2 >> p_value_one_sided: %.4f" %(p_value/2))

if p_value/2 < alpha:
    print("\np value: %.4f <" % (p_value/2), alpha, ">> t-statistic: %.4f > 1.645 (tabulated z-score)" % ttest)
    print("Reject null hypothesis")
else:
    print("p value: %.4f >=" % (p_value/2), alpha, ">> t-statistic: %.4f <= 1.645 (tabulated z-score)" % ttest)
    print("Fail to reject null hypothesis")

Since the hypothesis is one sided >> use p_value/2 >> p_value_one_sided: 0.0038

p value: 0.0038 < 0.05 >> t-statistic: 2.8415 > 1.645 (tabulated z-score)
Reject null hypothesis
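
As a cross-check on the manual `p_value/2` step, the same one-sided test can be requested directly from SciPy. The sketch below assumes SciPy 1.6 or later, where `stats.ttest_ind` accepts the `alternative` argument. It also computes the exact critical value from the t distribution rather than the tabulated z-score. The names `p_two_sided`, `p_one_sided`, `df` and `t_crit` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05
sync = np.array([94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2,
                 87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6])
asyncr = np.array([77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7,
                   65.7, 72.6, 71.5, 78.2])

# Two-sided test, as in the cell above.
t_stat, p_two_sided = stats.ttest_ind(sync, asyncr)

# One-sided test for H_1c: mean(sync) > mean(asyncr).
# Assumption: SciPy >= 1.6, which introduced the `alternative` keyword.
_, p_one_sided = stats.ttest_ind(sync, asyncr, alternative="greater")

# Exact one-sided critical value from Student's t with n1 + n2 - 2 degrees of freedom.
df = len(sync) + len(asyncr) - 2
t_crit = stats.t.ppf(1 - alpha, df)

print("two-sided p / 2: %.4f" % (p_two_sided / 2))
print("one-sided p:     %.4f" % p_one_sided)
print("t-statistic: %.4f, critical t (df=%d): %.4f" % (t_stat, df, t_crit))
```

Because the t-statistic is positive here, the one-sided p-value matches `p_value/2`. Had the statistic been negative, halving the two-sided p-value would have given the wrong answer, whereas `alternative="greater"` handles both cases.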


**Conclusion**: at the 5% significance level, synchronous students have higher grades on average than asynchronous ones.