# Hypothesis Testing

In [3]:
import numpy as np
from scipy import stats
import pandas as pd
pd.options.display.float_format = '{:,.4f}'.format

To check normality, the code below uses the Shapiro-Wilk W test, which is generally preferred for smaller samples. Other options include the Kolmogorov-Smirnov test and D’Agostino and Pearson’s test. See https://docs.scipy.org/doc/scipy/reference/stats.html for more information.

In [5]:
def check_normality(data):
    test_stat_normality, p_value_normality=stats.shapiro(data)
    print("p value:%.4f" % p_value_normality)
    if p_value_normality <0.05:
        print("Reject null hypothesis >> The data is not normally distributed")
    else:
        print("Fail to reject null hypothesis >> The data is normally distributed")       
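As a rough sketch of the alternatives mentioned above, the snippet below runs the Shapiro-Wilk, D’Agostino and Pearson, and Kolmogorov-Smirnov tests on the same sample. The sample is synthetic and only for illustration; it is not part of the examples in this notebook. Standardizing with the sample mean and standard deviation before the Kolmogorov-Smirnov test is a simplification, so treat that p value as approximate.

```python
import numpy as np
from scipy import stats

# Synthetic sample, assumed for illustration only.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=200)

# Shapiro-Wilk W test.
_, p_shapiro = stats.shapiro(sample)

# D'Agostino and Pearson's test (based on skewness and kurtosis).
_, p_dagostino = stats.normaltest(sample)

# Kolmogorov-Smirnov test against the standard normal after standardizing.
# Estimating mean/std from the same sample makes this p value approximate.
z = (sample - sample.mean()) / sample.std(ddof=1)
_, p_ks = stats.kstest(z, "norm")

print("Shapiro-Wilk p value:%.4f" % p_shapiro)
print("D'Agostino-Pearson p value:%.4f" % p_dagostino)
print("Kolmogorov-Smirnov p value:%.4f" % p_ks)
```

The three tests can disagree on borderline samples, so it helps to decide which one to use before looking at the data.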

To check homogeneity of variance, the code below uses Levene’s test. Bartlett’s test is an alternative: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html#scipy.stats.bartlett

In [6]:
def check_variance_homogeneity(group1, group2): # Levene's test for equal variances
    test_stat_var, p_value_var= stats.levene(group1,group2)
    print("p value:%.4f" % p_value_var)
    if p_value_var <0.05:
        print("Reject null hypothesis >> The variances of the samples are different.")
    else:
        print("Fail to reject null hypothesis >> The variances of the samples are same.")
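For comparison, a Bartlett-based version of the same check might look like the sketch below. The helper name `check_variance_homogeneity_bartlett` and the synthetic groups are assumptions made for illustration. Bartlett's test is more sensitive to departures from normality than Levene's test, so it is best used when normality is plausible.

```python
import numpy as np
from scipy import stats

def check_variance_homogeneity_bartlett(group1, group2, alpha=0.05):
    """Hypothetical helper: Bartlett's test for equal variances."""
    test_stat_var, p_value_var = stats.bartlett(group1, group2)
    print("p value:%.4f" % p_value_var)
    if p_value_var < alpha:
        print("Reject null hypothesis >> The variances of the samples are different.")
    else:
        print("Fail to reject null hypothesis >> No evidence that the variances differ.")
    return p_value_var

# Synthetic groups, assumed for illustration only.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0, scale=1, size=40)
group_b = rng.normal(loc=0, scale=1, size=40)
p_bartlett = check_variance_homogeneity_bartlett(group_a, group_b)
```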

## Example-1   
A university professor gave online lectures instead of face-to-face classes due to Covid-19. Later, he uploaded the recorded lectures to the cloud for students who followed the course asynchronously (those who did not attend the live lesson but watched the recordings later). However, he believes that students who attend at class time and participate are more successful. Therefore, he recorded the average grades of the students at the end of the semester. The data is below. 

synchronous = [94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2, 87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6]      
asynchronous = [77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65.7, 72.6, 71.5, 78.2]

**Conduct the hypothesis testing to check whether the professor's belief is statistically significant by using a 0.05 significance level to evaluate the null and alternative hypotheses. Before doing hypothesis testing, check the related assumptions. Comment on the results.**


### Assumptions
Observations in each sample are independent and identically distributed (iid).  
Observations in each sample are normally distributed.  
Observations in each sample have the same variance. 

In [11]:
sync = np.array([94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2,
       87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6])
asyncr =np.array([77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65.7, 72.6, 71.5, 78.2])

$H_{0}$: The data is normally distributed.  
$H_{1}$: The data is not normally distributed.   
Assume that alpha = 0.05. If the p-value is > 0.05, we fail to reject the null hypothesis and treat the data as normally distributed.

In [12]:
check_normality(sync)
check_normality(asyncr)

p value:0.6556
Fail to reject null hypothesis >> The data is normally distributed
p value:0.0803
Fail to reject null hypothesis >> The data is normally distributed


$H_{0}$: The variances of the samples are same.  
$H_{1}$: The variances of the samples are different.
    
Levene's test evaluates the null hypothesis that the population variances are equal (called homogeneity of variance or homoscedasticity). If the resulting p-value is less than the chosen significance level (typically 0.05), the observed differences in sample variances are unlikely to have arisen from random sampling from populations with equal variances.

In [13]:
check_variance_homogeneity(sync, asyncr)

p value:0.8149
Fail to reject null hypothesis >> The variances of the samples are same.
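To make the mechanics of Levene's test more concrete, the sketch below reproduces it by hand: with SciPy's default `center="median"`, the statistic is the one-way ANOVA F statistic computed on the absolute deviations of each observation from its group median. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

sync = np.array([94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2,
       87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6])
asyncr = np.array([77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65.7, 72.6, 71.5, 78.2])

# Levene's test as provided by SciPy (center="median" is the default).
stat_levene, p_levene = stats.levene(sync, asyncr)

# Manual version: absolute deviations from each group's median,
# followed by a one-way ANOVA on those deviations.
dev_sync = np.abs(sync - np.median(sync))
dev_async = np.abs(asyncr - np.median(asyncr))
stat_anova, p_anova = stats.f_oneway(dev_sync, dev_async)

print("levene: stat=%.6f p=%.4f" % (stat_levene, p_levene))
print("anova on |x - median|: stat=%.6f p=%.4f" % (stat_anova, p_anova))
```

Both lines should print the same statistic and p value, which shows that the test compares the typical spread around the center of each group rather than the variances directly.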


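Since the grades are obtained from different individuals, the data is unpaired. As the assumptions are satisfied, we can perform the parametric test for two groups with unpaired data (the independent two-sample t-test). 

We can define the hypotheses. Because the professor expects synchronous students to do better, the alternative is one-sided:

$H_{0}$: $\mu_{s}\leq \mu_{a}$    **or** The average grade of the synchronous students is not higher than that of the asynchronous students

$H_{1}$: $\mu_{s}> \mu_{a}$ **or** The average grade of the synchronous students is higher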

In [14]:
ttest,p_value = stats.ttest_ind(sync,asyncr)
print("p value:%.8f" % p_value)
print("since the hypothesis is one sided >> use p_value/2 >> p_value_one_sided:%.4f" %(p_value/2))
if p_value/2 <0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis") 

p value:0.00753598
since the hypothesis is one sided >> use p_value/2 >> p_value_one_sided:0.0038
Reject null hypothesis


**At this significance level, there is enough evidence to conclude that the average grade of the students who follow the course synchronously is higher than that of the students who follow it asynchronously.** 
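The one-sided p value above was obtained by halving the two-sided one, which is only valid when the test statistic points in the hypothesized direction. As a sketch, and assuming SciPy 1.6 or newer, the `alternative` parameter of `stats.ttest_ind` gives the one-sided p value directly. The Welch variant (`equal_var=False`) is included as an option for cases where the equal-variance assumption is doubtful.

```python
import numpy as np
from scipy import stats

sync = np.array([94. , 84.9, 82.6, 69.5, 80.1, 79.6, 81.4, 77.8, 81.7, 78.8, 73.2,
       87.9, 87.9, 93.5, 82.3, 79.3, 78.3, 71.6, 88.6, 74.6, 74.1, 80.6])
asyncr = np.array([77.1, 71.7, 91. , 72.2, 74.8, 85.1, 67.6, 69.9, 75.3, 71.7, 65.7, 72.6, 71.5, 78.2])

# Two-sided test (the default).
t_stat, p_two_sided = stats.ttest_ind(sync, asyncr)

# One-sided test: H1 is mean(sync) > mean(asyncr). Requires SciPy >= 1.6.
_, p_one_sided = stats.ttest_ind(sync, asyncr, alternative="greater")

# Welch's t-test drops the equal-variance assumption.
_, p_welch = stats.ttest_ind(sync, asyncr, equal_var=False, alternative="greater")

print("t statistic:%.4f" % t_stat)
print("two sided p value:%.6f" % p_two_sided)
print("one sided p value:%.6f" % p_one_sided)
print("one sided p value (Welch):%.6f" % p_welch)
```

Because the t statistic is positive here, the one-sided p value equals half of the two-sided one.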

--------

## Example-2

The University Health Center diagnosed eighteen students with high cholesterol in the previous semester. Healthcare personnel told these patients about the dangers of high cholesterol and prescribed a diet program. One month later, the patients returned for a follow-up visit, and their cholesterol levels were re-examined. Test whether there is a difference in the cholesterol levels of the patients.   

**According to this information, conduct the hypothesis testing to check whether there is a decrease in the cholesterol levels of the patients after the diet by using a 0.05 significance level. Before doing hypothesis testing, check the related assumptions. Comment on the results.**

test_results_before_diet=[224, 235, 223, 253, 253, 224, 244, 225, 259, 220, 242, 240, 239, 229, 276, 254, 237, 227]  
test_results_after_diet=[198, 195, 213, 190, 246, 206, 225, 199, 214, 210, 188, 205, 200, 220, 190, 199, 191, 218]

### Assumptions
• The dependent variable must be continuous (interval/ratio)  
• The observations are independent of one another  
• The dependent variable (strictly, the paired differences) should be approximately normally distributed

In [32]:
test_results_before_diet=np.array([224, 235, 223, 253, 253, 224, 244, 225, 259, 220, 242, 240, 239, 229, 276, 254, 237, 227])
test_results_after_diet=np.array([198, 195, 213, 190, 246, 206, 225, 199, 214, 210, 188, 205, 200, 220, 190, 199, 191, 218])

$H_{0}$: The data is normally distributed.  
$H_{1}$: The data is not normally distributed. 

In [33]:
check_normality(test_results_before_diet)
check_normality(test_results_after_diet)
print(np.mean(test_results_before_diet))
print(np.mean(test_results_after_diet))

p value:0.1635
Fail to reject null hypothesis >> The data is normally distributed
p value:0.1003
Fail to reject null hypothesis >> The data is normally distributed
239.11111111111111
205.94444444444446


The data is paired because both measurements come from the same individuals. Since the assumptions are satisfied, we can use the dependent (paired) t-test.

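Let's define our hypotheses, where $\mu_{d}$ is the true mean of the differences (before minus after). Because we are testing for a decrease, the alternative is one-sided:

$H_{0}$: $\mu_{d}\leq 0 $ **or** The true mean difference is equal to or less than zero.   
$H_{1}$: $\mu_{d}> 0 $ **or**  The true mean difference is greater than zero.  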

In [28]:
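test_stat, p_value_paired = stats.ttest_rel(test_results_before_diet,test_results_after_diet)
print("p value:%.6f" % p_value_paired , "one tailed p value:%.6f" %(p_value_paired/2))
# The hypothesis is one sided, so compare the one tailed p value with alpha.
# Halving is valid here because the mean decreased, i.e. test_stat > 0.
if p_value_paired/2 <0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")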

p value:0.000008 one tailed p value:0.000004
Reject null hypothesis


At this significance level, there is enough evidence to conclude that the mean cholesterol level of the patients has decreased after the diet.
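The dependent t-test is equivalent to a one-sample t-test on the per-patient differences, which is also why the normality assumption is really about those differences. The sketch below illustrates this, assuming SciPy 1.6 or newer for the `alternative` parameter; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

before = np.array([224, 235, 223, 253, 253, 224, 244, 225, 259, 220, 242, 240, 239, 229, 276, 254, 237, 227])
after = np.array([198, 195, 213, 190, 246, 206, 225, 199, 214, 210, 188, 205, 200, 220, 190, 199, 191, 218])

# Per-patient change; positive values mean the cholesterol level went down.
diff = before - after

# Normality check on the differences themselves.
_, p_norm_diff = stats.shapiro(diff)
print("normality of differences, p value:%.4f" % p_norm_diff)

# Paired t-test with a one-sided alternative: mean(before - after) > 0.
t_rel, p_rel = stats.ttest_rel(before, after, alternative="greater")

# The same test written as a one-sample t-test on the differences.
t_one, p_one = stats.ttest_1samp(diff, 0, alternative="greater")

print("paired t-test:      t=%.4f one sided p=%.6f" % (t_rel, p_rel))
print("one-sample on diff: t=%.4f one sided p=%.6f" % (t_one, p_one))
```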

-----

## Example-3
A venture capitalist wanted to invest in a startup that provides data compression without any loss in quality, but there are two competitors: PiedPiper and EndFrame. Initially, she believed that EndFrame's performance could be better, but she still wanted to test this before investing. She gave the same files to each company to compress and recorded their performance scores. The data is below.    
    
piedpiper=[4.57, 4.55, 5.47, 4.67, 5.41, 5.55, 5.53, 5.63, 3.86, 3.97, 5.44, 3.93, 5.31, 5.17, 4.39, 4.28, 5.25]     
endframe = [4.27, 3.93, 4.01, 4.07, 3.87, 4.  , 4.  , 3.72, 4.16, 4.1 , 3.9 , 3.97, 4.08, 3.96, 3.96, 3.77, 4.09]


**According to this information, conduct the related hypothesis testing by using a 0.05 significance level. Before doing hypothesis testing, check the related assumptions. Comment on the results.**

### Assumptions
• The dependent variable must be continuous (interval/ratio)  
• The observations are independent of one another  
• The dependent variable should be approximately normally distributed

$H_{0}$: The data is normally distributed.  
$H_{1}$: The data is not normally distributed.   
Assume that alpha = 0.05. If the p-value is > 0.05, we fail to reject the null hypothesis and treat the data as normally distributed.

In [13]:
piedpiper=np.array([4.57, 4.55, 5.47, 4.67, 5.41, 5.55, 5.53, 5.63, 3.86, 3.97, 5.44, 3.93, 5.31, 5.17, 4.39, 4.28, 5.25])
endframe = np.array([4.27, 3.93, 4.01, 4.07, 3.87, 4.  , 4.  , 3.72, 4.16, 4.1 , 3.9 , 3.97, 4.08, 3.96, 3.96, 3.77, 4.09])
check_normality(piedpiper)
check_normality(endframe)
print(np.mean(piedpiper))
print(np.mean(endframe))

p value:0.0304
Reject null hypothesis >> The data is not normally distributed
p value:0.9587
Fail to reject null hypothesis >> The data is normally distributed
4.881176470588235
3.991764705882353


Both companies compressed the same files, so the data is paired. The normality assumption is not satisfied for PiedPiper; therefore, we need to use the nonparametric version of the paired test, namely the Wilcoxon signed-rank test.

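Let's define the hypotheses. The Wilcoxon signed-rank test concerns the location of the paired differences, so here $\mu_{d}$ denotes the true median difference:

$H_{0}$: $\mu_{d} = 0 $ **or** The true median difference is equal to zero.   
$H_{1}$: $\mu_{d} \neq 0 $ **or**  The true median difference is NOT equal to zero. 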

In [12]:
test,pvalue = stats.wilcoxon(endframe,piedpiper) ##alternative default two sided
print("p-value:%.6f" %pvalue, ">> one_tailed_pval:%.6f" %(pvalue/2))

test,one_sided_pvalue = stats.wilcoxon(endframe,piedpiper, alternative="less")
print("one sided pvalue:%.6f" %(one_sided_pvalue))
if pvalue <0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")

ValueError: The samples x and y must have the same length.

Reject $H_{0}$ >> At this significance level, there is enough evidence to conclude that the performance of PiedPiper is better than that of EndFrame.
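As a sanity check on the direction of the effect, the sketch below looks at the per-file differences directly. Passing two samples to `stats.wilcoxon` is the same as passing their difference, and counting the signs shows which company scored higher on most files. The variable names are illustrative, and SciPy may emit a warning about ties depending on the version.

```python
import numpy as np
from scipy import stats

piedpiper = np.array([4.57, 4.55, 5.47, 4.67, 5.41, 5.55, 5.53, 5.63, 3.86, 3.97, 5.44, 3.93, 5.31, 5.17, 4.39, 4.28, 5.25])
endframe = np.array([4.27, 3.93, 4.01, 4.07, 3.87, 4.  , 4.  , 3.72, 4.16, 4.1 , 3.9 , 3.97, 4.08, 3.96, 3.96, 3.77, 4.09])

# Per-file difference; negative values mean PiedPiper scored higher.
diff = endframe - piedpiper
n_negative = int((diff < 0).sum())
n_positive = int((diff > 0).sum())
print("files where PiedPiper scored higher:", n_negative)
print("files where EndFrame scored higher:", n_positive)
print("median difference:%.4f" % np.median(diff))

# The two-sample call and the one-sample call on the differences are equivalent.
_, p_two_samples = stats.wilcoxon(endframe, piedpiper)
_, p_on_diff = stats.wilcoxon(diff)
print("p value (two samples):%.6f" % p_two_samples)
print("p value (differences):%.6f" % p_on_diff)
```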