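### Hypothesis Testing
Hypothesis testing tells us whether an observed difference is statistically significant, i.e. unlikely to be explained by random chance alone. The difference can be between two data sets, or between a sample mean and a reference value.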

### Overview
This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

#### Use Case 1: 
A student is trying to decide between two GPUs. He wants to use the GPU to run deep learning algorithms for his research, so the only thing he cares about is speed.

He picks a deep learning algorithm and a large data set, runs it on both GPUs 15 times, and times each run in hours. The results are given in the arrays GPU1 and GPU2 below.

```python
from scipy import stats
import numpy as np
```

```python
# Run times in hours for 15 runs of the same deep learning job on each GPU
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])
```

Assumption: both samples (GPU1 and GPU2) are random, independent, and drawn from approximately normally distributed populations.

### One-sample t-test

Check whether the mean of GPU1 is equal to zero.

Null hypothesis: the population mean of GPU1 is equal to zero.

Alternative hypothesis: the population mean of GPU1 is not equal to zero.

```python
# One-sample t-test of H0: mean(GPU1) == 0 (two-sided); index [1] is the p-value
p = stats.ttest_1samp(GPU1, 0)[1]
p
```

```
7.228892044970457e-15
```

The p-value is far below the 5% significance level, so the statistical decision is to reject the null hypothesis at the 5% level.

```python
if p < 0.05:
    print('Ha: Reject the null hypothesis; the mean is not equal to zero')
else:
    # The null hypothesis of this one-sample t-test is that the population mean equals 0
    print('H0: Fail to reject the null hypothesis; no evidence that the mean differs from zero')
```

```
Ha: Reject the null hypothesis; the mean is not equal to zero
```


As expected, the sample mean of GPU1 is far from zero:

```python
GPU1.mean()
```

```
10.333333333333334
```
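As a cross-check, the one-sample t statistic is `t = (mean - mu0) / (s / sqrt(n))` with `n - 1` degrees of freedom, where `s` is the sample standard deviation (`ddof=1`). The sketch below computes it by hand and compares it with `stats.ttest_1samp`; the helper name `one_sample_t` is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import stats

def one_sample_t(x, mu0):
    """Hypothetical helper: (t statistic, two-sided p-value) for H0: population mean == mu0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)        # standard error of the mean
    t = (x.mean() - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)   # two-sided tail probability of the t distribution
    return t, p

GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
t_manual, p_manual = one_sample_t(GPU1, 0)
t_scipy, p_scipy = stats.ttest_1samp(GPU1, 0)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both lines should agree up to floating-point rounding.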

#### Use Case 2:
Do GPU1 and GPU2 differ in mean run time? We use an independent two-sample t-test.

Null hypothesis: the mean run times of GPU1 and GPU2 are equal.

Alternative hypothesis: the mean run times of GPU1 and GPU2 differ.

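```python
# Independent two-sample t-test (equal variances assumed by default); [1] is the p-value
p = stats.ttest_ind(GPU1, GPU2)[1]
p
```

```
0.013794282041452725
```

The p-value (about 0.014) is below the 5% significance level, so the statistical decision is to reject the null hypothesis at the 5% level: the mean run times of the two GPUs differ significantly. Since GPU1 has the lower mean run time (10.33 h versus 11.47 h for GPU2), GPU1 is the faster GPU.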

```python
if p < 0.05:
    print('Ha: Reject the null hypothesis; significant difference between the groups')
else:
    # The null hypothesis of a two-sample t-test is that the group means are equal
    print('H0: Fail to reject the null hypothesis; no significant difference between the groups')
```

```
Ha: Reject the null hypothesis; significant difference between the groups
```
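The student's real question has a direction (is GPU1 faster?), so a one-sided alternative fits it more closely than the two-sided test. A sketch assuming SciPy 1.6 or newer, where `ttest_ind` accepts `alternative`; `alternative='less'` tests H1: mean(GPU1) < mean(GPU2).

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

# Two-sided: H1 is mean(GPU1) != mean(GPU2)
p_two_sided = stats.ttest_ind(GPU1, GPU2).pvalue
# One-sided: H1 is mean(GPU1) < mean(GPU2), i.e. GPU1 has shorter run times
p_less = stats.ttest_ind(GPU1, GPU2, alternative='less').pvalue

print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_less:.4f}")
```

Because the observed t statistic is negative (in the hypothesized direction), the one-sided p-value is exactly half of the two-sided one.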


### Two-sample testing
He then tries a third GPU, GPU3.
Run a two-sample t-test to check whether there is a significant difference between the mean run times of GPU1 and GPU3.

```python
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])
```

Assumption: both samples (GPU1 and GPU3) are random, independent, and drawn from approximately normally distributed populations.

```python
p = stats.ttest_ind(GPU1, GPU3)[1]
p
```

```
0.14509210993138993
```

```python
if p < 0.05:
    print('Ha: Reject the null hypothesis; significant difference between the groups')
else:
    # The null hypothesis of a two-sample t-test is that the group means are equal
    print('H0: Fail to reject the null hypothesis; no significant difference between the groups')
```

```
H0: Fail to reject the null hypothesis; no significant difference between the groups
```
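By default `stats.ttest_ind` pools the variances, i.e. it assumes both populations have equal variance. If that assumption is doubtful, Welch's t-test (`equal_var=False`) is a common alternative. A sketch on the same GPU1/GPU3 data:

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

pooled = stats.ttest_ind(GPU1, GPU3)                  # Student's t-test, pooled variance
welch = stats.ttest_ind(GPU1, GPU3, equal_var=False)  # Welch's t-test, separate variances

print("Student:", pooled.statistic, pooled.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

With equal sample sizes the two t statistics coincide; only the degrees of freedom differ. Welch's degrees of freedom are never larger than `n1 + n2 - 2`, so its p-value is at least as large here.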


### ANOVA
To compare the means of more than two groups at once, use an analysis of variance (ANOVA) rather than a series of pairwise t-tests, which would inflate the overall Type I error rate.

The results from three experiments with overlapping 95% confidence intervals are given below. We want to test whether the mean results of the three experiments differ significantly.

```python
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])
```

Assumption: all three samples (e1, e2 and e3) are random, independent, and drawn from approximately normally distributed populations. One-way ANOVA also assumes equal population variances, which we check first.

Levene's test (`stats.levene`) tests the null hypothesis that all input samples come from populations with equal variances. It is an alternative to Bartlett's test (`stats.bartlett`) for cases where there are significant deviations from normality.

```python
p = stats.levene(e1, e2, e3)[1]
p
```

```
0.12259792666001798
```

The p-value is above 0.05, so we have no evidence against equal variances and can proceed with ANOVA.

```python
if p < 0.05:
    # The alternative hypothesis is that the variances differ for at least one pair of samples
    print('Ha: Reject the null hypothesis; variances are not equal for at least one pair')
else:
    # The null hypothesis of Levene's test is that the variances are equal across all samples
    print('H0: Fail to reject the null hypothesis; no evidence that the variances differ')
```

```
H0: Fail to reject the null hypothesis; no evidence that the variances differ
```
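Since Bartlett's test is mentioned above, here is a sketch running both tests on the same data. `stats.levene` centers on the median by default (`center='median'`, the Brown-Forsythe variant); `center='mean'` gives Levene's original version. Bartlett's test assumes normality and is sensitive to departures from it.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

results = {
    "Levene (median, default)": stats.levene(e1, e2, e3),
    "Levene (mean)": stats.levene(e1, e2, e3, center='mean'),
    "Bartlett": stats.bartlett(e1, e2, e3),
}
for name, res in results.items():
    print(f"{name:26s} statistic = {res.statistic:.4f}  p = {res.pvalue:.4f}")
```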


### One-Way ANOVA tests
The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

```python
p = stats.f_oneway(e1, e2, e3)[1]
p
```

```
0.13574644501798466
```

The p-value is greater than the 5% significance level, so the statistical decision is to fail to reject the null hypothesis at the 5% level.

```python
if p < 0.05:
    # The alternative hypothesis is that at least two group means differ
    print('Ha: Reject the null hypothesis; at least two group means differ')
else:
    # The null hypothesis of one-way ANOVA is that all group means are equal
    print('H0: Fail to reject the null hypothesis; no evidence that the group means differ')
```

```
H0: Fail to reject the null hypothesis; no evidence that the group means differ
```
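Under the hood, the one-way ANOVA compares variation between group means with variation within groups: `F = (SS_between / (k - 1)) / (SS_within / (N - k))` for `k` groups and `N` observations in total, and the p-value is the upper tail of the F distribution with `(k - 1, N - k)` degrees of freedom. A sketch that recomputes it by hand; the helper name `one_way_anova` is hypothetical.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Hypothetical helper: (F, p) for H0: all group means are equal."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p = stats.f.sf(f, k - 1, n_total - k)
    return f, p

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

f_manual, p_manual = one_way_anova(e1, e2, e3)
f_scipy, p_scipy = stats.f_oneway(e1, e2, e3)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```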


### Type I and Type II errors

Type I error, also known as a "false positive": rejecting the null hypothesis when it is actually true. In other words, we accept the alternative hypothesis (the hypothesis of interest) when the observed result can be attributed to chance. The probability of a Type I error is the significance level α (e.g. 0.05).

Type II error, also known as a "false negative": failing to reject the null hypothesis when the alternative hypothesis is actually true. This typically happens when the test lacks adequate statistical power. The probability of a Type II error is denoted β, and the power of the test is 1 − β.
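Both error rates can be estimated by simulation. The sketch below repeatedly runs a two-sample t-test on simulated normal data: when both groups share the same mean, every rejection is a Type I error, so the rejection rate should be close to α; when the true means differ (here by one standard deviation, an arbitrary choice), the rejection rate estimates the power 1 − β. The sample size, number of trials and seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the simulation is reproducible
alpha = 0.05
n_trials, n = 2000, 15          # 2000 simulated experiments, 15 observations per group

def rejection_rate(true_shift):
    """Fraction of simulated two-sample t-tests that reject H0 at level alpha."""
    a = rng.normal(0.0, 1.0, size=(n_trials, n))
    b = rng.normal(true_shift, 1.0, size=(n_trials, n))
    p = stats.ttest_ind(a, b, axis=1).pvalue   # one t-test per row (experiment)
    return np.mean(p < alpha)

type_i_rate = rejection_rate(0.0)  # H0 true: each rejection is a Type I error
power = rejection_rate(1.0)        # H0 false: each non-rejection is a Type II error
print(f"estimated Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
print(f"estimated power: {power:.3f}, so beta is about {1 - power:.3f}")
```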

#### Use Case 4:
You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean of 4.5 minutes. State the null and alternative hypotheses.

##### Answer:
Null hypothesis: the mean waiting time to place an order has not changed from 4.5 minutes in the past month (μ = 4.5).

Alternative hypothesis: the mean waiting time to place an order has changed from 4.5 minutes in the past month (μ ≠ 4.5). Because a change in either direction counts, this is a two-sided test.
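To actually run this test, the manager would record waiting times from the past month and apply a two-sided one-sample t-test against 4.5 minutes. The waiting times below are simulated purely for illustration; the sample size (40), mean (4.8) and standard deviation (1.2) are hypothetical and not data from the use case.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical waiting times in minutes for 40 orders in the past month (simulated, not real data)
wait_times = rng.normal(loc=4.8, scale=1.2, size=40)

# H0: population mean == 4.5 minutes; H1: population mean != 4.5 (two-sided)
result = stats.ttest_1samp(wait_times, popmean=4.5)
alpha = 0.05
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.4f}: reject H0, the mean waiting time has changed")
else:
    print(f"p = {result.pvalue:.4f}: fail to reject H0, no evidence the mean waiting time changed")
```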

### Chi-square test

```python
import numpy as np
# Rows d1 to d6 correspond to six dice; the four values in each row are the values rolled for players 1 to 4

d1 = [1, 6, 3, 4]
d2 = [2, 5, 1, 3]
d3 = [4, 2, 3, 1]
d4 = [3, 4, 1, 2]
d5 = [1, 6, 3, 5]
d6 = [3, 2, 2, 1]

dice = np.array([d1, d2, d3, d4, d5, d6])
```

Depending on the application, the significance threshold is usually 0.05 or 0.01, and it is chosen before running the test. The test is significant (i.e. we reject the null hypothesis) if the p-value is below the threshold.

For our purposes, we'll use 0.01 as the threshold.

`stats.chi2_contingency` computes the chi-square statistic and p-value for the test of independence of the observed frequencies in a contingency table. It treats every entry of the table as an observed count, and its null hypothesis is that the row and column variables are independent.

Print the following:

- chi2 statistic
- p-value
- degrees of freedom
- expected frequencies (the table of counts expected under independence)

```python
chi_test = stats.chi2_contingency(dice)
p = chi_test[1]
```

```python
# chi2_contingency returns (statistic, p-value, degrees of freedom, expected frequencies)
print("chi2 statistic    : " + str(chi_test[0]))
print("p-value           : " + str(chi_test[1]))
print("degrees of freedom: " + str(chi_test[2]))
print("expected frequencies:\n" + str(chi_test[3]))
```

```
chi2 statistic    : 11.445004959326388
p-value           : 0.720458335452983
degrees of freedom: 15
expected frequencies:
[[2.88235294 5.14705882 2.67647059 3.29411765]
 [2.26470588 4.04411765 2.10294118 2.58823529]
 [2.05882353 3.67647059 1.91176471 2.35294118]
 [2.05882353 3.67647059 1.91176471 2.35294118]
 [3.08823529 5.51470588 2.86764706 3.52941176]
 [1.64705882 2.94117647 1.52941176 1.88235294]]
```


```python
if p < 0.01:
    # The alternative hypothesis is that the row and column variables are not independent
    print('Ha: Reject the null hypothesis; rows and columns are not independent')
else:
    # The null hypothesis of the chi-square test of independence is that rows and columns are independent
    print('H0: Fail to reject the null hypothesis; no evidence against independence')
```

```
H0: Fail to reject the null hypothesis; no evidence against independence
```
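The expected frequencies and the statistic can be reproduced by hand: under independence, the expected count in cell (i, j) is `row_total_i * col_total_j / grand_total`, Pearson's statistic sums `(observed - expected)**2 / expected` over all cells, and the degrees of freedom are `(rows - 1) * (cols - 1)`. `chi2_contingency` applies Yates' continuity correction only when there is a single degree of freedom, so with 15 degrees of freedom the plain formula below should match it.

```python
import numpy as np
from scipy import stats

dice = np.array([[1, 6, 3, 4],
                 [2, 5, 1, 3],
                 [4, 2, 3, 1],
                 [3, 4, 1, 2],
                 [1, 6, 3, 5],
                 [3, 2, 2, 1]])

row_totals = dice.sum(axis=1)
col_totals = dice.sum(axis=0)
grand_total = dice.sum()

# Expected count for each cell under independence of rows and columns
expected = np.outer(row_totals, col_totals) / grand_total
# Pearson's chi-square statistic
chi2 = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p = stats.chi2.sf(chi2, dof)   # upper-tail probability of the chi-square distribution

print(chi2, p, dof)
```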


### Z-test
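Compute z-scores for the dice data above with `stats.zscore`, convert each z-score to a p-value, and take the mean of the resulting array. Strictly speaking, this is an exercise in standardization rather than a formal z-test: with the default `axis=0`, `stats.zscore` standardizes each column (player) by its own mean and standard deviation (`ddof=1` gives the sample standard deviation), and `1 - ndtr(z)` is the upper-tail probability P(Z > z) of the standard normal distribution.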

```python
# Standardize each column (axis=0 by default) using the sample standard deviation (ddof=1)
zscore = stats.zscore(dice, ddof=1)
zscore
```

```
array([[-1.10096377,  0.99917458,  0.84757938,  0.81649658],
       [-0.27524094,  0.45417026, -1.18661113,  0.20412415],
       [ 1.37620471, -1.18084268,  0.84757938, -1.02062073],
       [ 0.55048188, -0.09083405, -1.18661113, -0.40824829],
       [-1.10096377,  0.99917458,  0.84757938,  1.42886902],
       [ 0.55048188, -1.18084268, -0.16951588, -1.02062073]])
```

```python
from scipy.special import ndtr

# ndtr is the standard normal CDF, so 1 - ndtr(z) is the upper-tail p-value P(Z > z)
p_values = 1 - ndtr(zscore)
print("mean of the p-values: " + str(p_values.mean()))
```

```
mean of the p-values: 0.496903548959095
```
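To make the two steps above explicit, the sketch below recomputes the column-wise z-scores by hand and shows that `1 - ndtr(z)` equals the normal survival function `stats.norm.sf(z)`. For large z, `sf` avoids the loss of precision from subtracting a number close to 1.

```python
import numpy as np
from scipy import stats
from scipy.special import ndtr

dice = np.array([[1, 6, 3, 4],
                 [2, 5, 1, 3],
                 [4, 2, 3, 1],
                 [3, 4, 1, 2],
                 [1, 6, 3, 5],
                 [3, 2, 2, 1]])

# Column-wise z-scores: (x - column mean) / column sample standard deviation (ddof=1)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0, ddof=1)
z_scipy = stats.zscore(dice, ddof=1)

# Two equivalent ways to compute the upper-tail probability P(Z > z)
p_ndtr = 1 - ndtr(z_scipy)
p_sf = stats.norm.sf(z_scipy)

print(np.allclose(z_manual, z_scipy), np.allclose(p_ndtr, p_sf), p_sf.mean())
```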


#### Use Case 5:
A paired-sample t-test compares means from the same group at different times.

The basic two-sample t-test is designed for testing differences between independent groups. Sometimes, however, we want to test differences between measurements of the same group taken at different points in time. We can conduct a paired t-test with the SciPy function `stats.ttest_rel()`.
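A paired t-test is equivalent to a one-sample t-test of the per-subject differences against zero, which is why pairing removes the between-subject variation. The sketch below checks this equivalence on a small set of before/after measurements; the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for 8 subjects (illustrative only)
before = np.array([82.0, 95.5, 77.2, 101.3, 88.8, 69.4, 92.1, 84.6])
after = np.array([80.9, 94.0, 77.5, 99.1, 87.2, 68.0, 91.5, 83.9])

paired = stats.ttest_rel(after, before)
# The same test written as a one-sample t-test on the differences (after - before)
diffs = after - before
one_sample = stats.ttest_1samp(diffs, 0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```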

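```python
# No random seed is set, so the data (and the p-value below) vary between runs
# Draw 500 "before" weights from a normal distribution with mean 100 and standard deviation 30
before = stats.norm.rvs(scale=30, loc=100, size=500)
# "after" = before plus a random change with mean -1.25 (an average loss of 1.25) and std 5
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)
```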

Using the data above, test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment.

```python
# Paired t-test of H0: mean(after - before) == 0 (two-sided)
p = stats.ttest_rel(after, before)[1]
p
```

```
1.003325574539875e-11
```

```python
if p < 0.01:
    print('Ha: Reject the null hypothesis; the drug works')
else:
    # The null hypothesis of a paired t-test is that the mean difference between paired measurements is zero
    print('H0: Fail to reject the null hypothesis; no evidence that the drug works')
```

```
Ha: Reject the null hypothesis; the drug works
```
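The two-sided test only shows that the weights changed. Since the claim is that the drug *reduces* weight, a one-sided alternative matches the question more directly. A sketch assuming SciPy 1.6 or newer (where `ttest_rel` accepts `alternative`), regenerating similar data with NumPy's `default_rng` and an arbitrary fixed seed so the result is reproducible:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary fixed seed for reproducibility
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# H1: mean(after - before) < 0, i.e. weights decreased after treatment
result = stats.ttest_rel(after, before, alternative='less')
mean_change = np.mean(after - before)
print(f"mean change = {mean_change:.3f}, one-sided p = {result.pvalue:.3g}")
```

The argument order matters: with `ttest_rel(after, before)`, the differences are `after - before`, so `'less'` corresponds to weight loss.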
