<h1>Normality Tests</h1>
Tests whether a data sample has a Gaussian distribution.

Assumptions

<ul><li>Observations in each sample are independent and identically distributed (iid).</li></ul>

Interpretation

<ul><li>H0: the sample has a Gaussian distribution.</li>
    <li>H1: the sample does not have a Gaussian distribution.</li></ul>

<h3>Shapiro-Wilk Test</h3>

In [1]:
#Example of normally distributed data
data1 = [65.0, 61.0, 63.0, 86.0, 70.0, 55.0, 74.0, 35.0, 72.0, 68.0, 45.0, 58.0]
#Example of not-normally distributed data
#data1 = [1.0, 10.0, 1.0, 10.0, 1.0, 10.0, 1.0, -10.0, 1.0, -10.0, 1.0, 1.0]

from scipy.stats import shapiro
stat, p = shapiro(data1)
if (p > 0.05):
    print("H0 is accepted with p=%.3f" %(p))
    print("The data is normally distributed")
else:
    print("H0 is rejected with p=%.3f" %(p))
    print("The data is not normally distributed")

H0 is accepted with p=0.922
The data is normally distributed
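A p-value above the threshold does not prove normality; strictly, it only means that H0 could not be rejected. The sketch below shows one way to word the result more cautiously. The helper name `describe_normality_test` and its `alpha` parameter are hypothetical, introduced only for this illustration.

```python
from scipy.stats import shapiro


def describe_normality_test(p, alpha=0.05):
    """Return a cautious one-line reading of a normality-test p-value.

    Hypothetical helper: the name and wording are illustrative only.
    """
    if p > alpha:
        return "Fail to reject H0 (p=%.3f): no evidence against normality" % p
    return "Reject H0 (p=%.3f): the data deviate from a Gaussian distribution" % p


# Same sample as in the Shapiro-Wilk cell above
data1 = [65.0, 61.0, 63.0, 86.0, 70.0, 55.0, 74.0, 35.0, 72.0, 68.0, 45.0, 58.0]

stat, p = shapiro(data1)
message = describe_normality_test(p)
print(message)
```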


<h3>D’Agostino’s K^2 Test</h3>
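Requires a sample size of n >= 20. The example data below has only 12 observations, so SciPy emits a warning that the kurtosis test is only valid for n >= 20.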

In [2]:
#Example of normally distributed data
data1 = [65.0, 61.0, 63.0, 86.0, 70.0, 55.0, 74.0, 35.0, 72.0, 68.0, 45.0, 58.0]
#Example of not-normally distributed data
#data1 = [1.0, 10.0, 1.0, 10.0, 1.0, 10.0, 1.0, -10.0, 1.0, -10.0, 1.0, 1.0]

from scipy.stats import normaltest
stat, p = normaltest(data1)
if (p > 0.05):
    print("H0 is accepted with p=%.3f" %(p))
    print("The data is normally distributed")
else:
    print("H0 is rejected with p=%.3f" %(p))
    print("The data is not normally distributed")

H0 is accepted with p=0.514
The data is normally distributed
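Because the test is intended for n >= 20, a sample that meets the requirement is a more suitable illustration. The following is a minimal sketch using synthetic data; the seed, mean, standard deviation and sample size are arbitrary assumptions, not values from this notebook.

```python
import numpy as np
from scipy.stats import normaltest

# Synthetic sample drawn from a normal distribution (all parameters are arbitrary)
rng = np.random.default_rng(0)
sample = rng.normal(loc=60.0, scale=12.0, size=50)

# With n >= 20 the sample-size requirement of the test is met
stat, p = normaltest(sample)
print("n=%i, statistic=%.3f, p=%.3f" % (len(sample), stat, p))
```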




<h1>Correlation Tests</h1>
This section lists statistical tests that you can use to check if two samples are related.

<h3>Pearson’s Correlation Coefficient</h3>

Tests whether two samples have a linear relationship.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample are normally distributed.</li>
    <li>Observations in each sample have the same variance.</li>
    </ul>
    
Interpretation

<ul>
    <li>H0: the two samples are independent.</li>
    <li>H1: there is a dependency between the samples.</li>
    </ul>

In [3]:
from scipy.stats import pearsonr

#Example of correlated data
data1 = [3.0, 5.0, 4.0, 4.0, 2.0, 3.0]
data2 = [86.0, 95.0, 92.0, 83.0, 78.0, 82.0]

#Example of not-correlated data
#data1 = [3.0, 5.0, 4.0, 4.0, 2.0, 3.0]
#data2 = [6.0, 2.0, 7.0, 2.0, 2.0, 6.0]

corr, p = pearsonr(data1, data2)

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a correlation between the datasets with R=%.3f" % (corr))
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no correlation between the datasets (R=%.3f)" % (corr))

H1 is accepted with p=0.027
There is a correlation between the datasets with R=0.862


<h3>Spearman’s Rank Correlation</h3>

Tests whether two samples have a monotonic relationship.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample can be ranked.</li>
</ul>
   
Interpretation

<ul>
    <li>H0: the two samples are independent.</li>
    <li>H1: there is a dependency between the samples.</li>
</ul>

In [4]:
from scipy.stats import spearmanr

#Example of correlated data
data1 = [3.0, 5.0, 4.0, 4.0, 2.0, 3.0]
data2 = [86.0, 95.0, 92.0, 83.0, 78.0, 82.0]

#Example of not-correlated data
#data1 = [3.0, 5.0, 4.0, 4.0, 2.0, 3.0]
#data2 = [6.0, 2.0, 7.0, 2.0, 2.0, 6.0]

corr, p = spearmanr(data1, data2)

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a correlation between the datasets with R=%.3f" % (corr))
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no correlation between the datasets (R=%.3f)" % (corr))

H1 is accepted with p=0.031
There is a correlation between the datasets with R=0.853
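Pearson's coefficient measures linear association, whereas Spearman's coefficient only looks at ranks and therefore captures any monotonic relationship. The sketch below illustrates the difference on a made-up, strictly increasing but non-linear relationship (`y = exp(x)`); the data are an assumption chosen for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Strictly increasing, but clearly non-linear relationship
x = np.arange(1.0, 11.0)
y = np.exp(x)

r, p_r = pearsonr(x, y)        # linear association
rho, p_rho = spearmanr(x, y)   # rank-based (monotonic) association

print("Pearson r=%.3f, Spearman rho=%.3f" % (r, rho))
```

Spearman's coefficient reaches its maximum because the ranks of `x` and `y` agree perfectly, while Pearson's coefficient stays below that because the points do not lie on a straight line.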


<h1>Equal Variances Tests</h1>

<h3>Levene’s Test</h3>
Levene's test checks the null hypothesis that all input samples are from populations with equal variances.

Interpretation

<ul>
    <li>H0: all input samples are from populations with equal variances.</li>
    <li>H1: at least one of the input samples is from a population with a different variance.</li>
</ul>

In [5]:
from scipy.stats import levene

#Example of datasets with equal variances
data1 = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
data2 = [12, 8, 6, 16, 12, 14, 10, 18, 4, 11]

#Example of datasets with unequal variances
#data1 = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
#data2 = [12, 13, 12, 12, 12, 14, 12, 11, 12, 11]

stat, p = levene(data1, data2)

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("The datasets have unequal variances")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("The datasets have equal variances")

H1 is rejected with p=0.773
The datasets have equal variances
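It can help to look at the sample variances next to the test result, since the p-value alone does not show how far apart the variances are. A minimal sketch using the same data as the cell above:

```python
import numpy as np
from scipy.stats import levene

data1 = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
data2 = [12, 8, 6, 16, 12, 14, 10, 18, 4, 11]

# Unbiased sample variances (ddof=1 divides by n - 1)
var1 = np.var(data1, ddof=1)
var2 = np.var(data2, ddof=1)

stat, p = levene(data1, data2)

print("Variance set 1: %.3f" % var1)
print("Variance set 2: %.3f" % var2)
print("Levene p=%.3f" % p)
```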


<h1>Parametric Statistical Hypothesis Tests</h1>
This section lists statistical tests that you can use to compare data samples.

<h3>T-test</h3>
(Student’s t-test)

Tests whether the means of two independent samples are significantly different.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample are normally distributed.</li>
    <li>Observations in each sample have the same variance.</li>
</ul>

Interpretation

<ul>
    <li>H0: the means of the samples are equal.</li>
    <li>H1: the means of the samples are unequal.</li>
</ul>

In [6]:
from scipy.stats import ttest_ind

#Example of datasets with difference between the means
data1 = [13.0, 17.0, 19.0, 11.0, 20.0, 15.0, 18.0, 9.0, 12.0, 16.0]
data2 = [12.0, 8.0, 6.0, 16.0, 12.0, 14.0, 10.0, 18.0, 4.0, 11.0]

#Example of datasets with no difference between the means
#data1 = [13.0, 17.0, 19.0, 11.0, 20.0, 15.0, 18.0, 9.0, 12.0, 16.0]
#data2 = [14.0, 16.0, 18.0, 12.0, 20.0, 15.0, 19.0, 7.0, 14.0, 16.0]

stat, p = ttest_ind(data1, data2)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 15.000
Mean set 2: 11.100

H1 is accepted with p=0.043
There is a significant difference between the means of the datasets
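When the equal-variance assumption is doubtful, Welch's t-test is a common alternative. In SciPy it is available through the `equal_var=False` argument of `ttest_ind`. The sketch below runs both variants on the data from the cell above.

```python
from scipy.stats import ttest_ind

data1 = [13.0, 17.0, 19.0, 11.0, 20.0, 15.0, 18.0, 9.0, 12.0, 16.0]
data2 = [12.0, 8.0, 6.0, 16.0, 12.0, 14.0, 10.0, 18.0, 4.0, 11.0]

# Student's t-test: assumes equal population variances
stat_student, p_student = ttest_ind(data1, data2)

# Welch's t-test: does not assume equal population variances
stat_welch, p_welch = ttest_ind(data1, data2, equal_var=False)

print("Student: t=%.3f, p=%.3f" % (stat_student, p_student))
print("Welch:   t=%.3f, p=%.3f" % (stat_welch, p_welch))
```

With equal sample sizes, as here, both variants give the same t statistic; only the degrees of freedom, and therefore the p-value, differ.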


<h3>Paired t-test</h3>
(Paired Student’s t-test)

Tests whether the means of two paired samples are significantly different.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>The differences between the paired observations are normally distributed.</li>
    <li>Observations across each sample are paired.</li>
</ul>

Interpretation

<ul>
    <li>H0: the means of the samples are equal.</li>
    <li>H1: the means of the samples are unequal.</li>
</ul>

In [64]:
from scipy.stats import ttest_rel

#Example of datasets with difference between the means
data1 = [210.0, 205.0, 193.0, 182.0, 259.0, 239.0, 164.0, 197.0, 222.0, 211.0, 187.0, 175.0, 186.0, 243.0, 246.0]
data2 = [197.0, 195.0, 191.0, 174.0, 236.0, 226.0, 157.0, 196.0, 201.0, 196.0, 181.0, 164.0, 181.0, 229.0, 231.0]

#Example of datasets with no difference between the means
#data1 = [210.0, 205.0, 193.0, 182.0, 259.0, 239.0, 164.0, 197.0, 222.0, 211.0, 187.0, 175.0, 186.0, 243.0, 246.0]
#data2 = [207.0, 211.0, 196.0, 179.0, 256.0, 243.0, 160.0, 204.0, 226.0, 209.0, 184.0, 174.0, 184.0, 240.0, 241.0]

stat, p = ttest_rel(data1, data2)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 207.933
Mean set 2: 197.000

H1 is accepted with p=0.000
There is a significant difference between the means of the datasets
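The paired t-test is equivalent to a one-sample t-test on the pairwise differences, tested against a mean of zero. The sketch below checks this with the data from the cell above.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

data1 = [210.0, 205.0, 193.0, 182.0, 259.0, 239.0, 164.0, 197.0, 222.0, 211.0, 187.0, 175.0, 186.0, 243.0, 246.0]
data2 = [197.0, 195.0, 191.0, 174.0, 236.0, 226.0, 157.0, 196.0, 201.0, 196.0, 181.0, 164.0, 181.0, 229.0, 231.0]

# Paired t-test on the two samples
stat_rel, p_rel = ttest_rel(data1, data2)

# One-sample t-test on the pairwise differences against a mean of 0
diff = np.array(data1) - np.array(data2)
stat_one, p_one = ttest_1samp(diff, 0.0)

print("Paired:     t=%.3f, p=%.5f" % (stat_rel, p_rel))
print("One-sample: t=%.3f, p=%.5f" % (stat_one, p_one))
```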


<h3>Analysis of Variance Test (ANOVA)</h3>

Tests whether the means of two or more independent samples are significantly different.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample are normally distributed.</li>
    <li>Observations in each sample have the same variance.</li>
</ul>

Interpretation

<ul>
    <li>H0: the means of the samples are equal.</li>
    <li>H1: one or more of the means of the samples are unequal.</li>
</ul>

In [33]:
from scipy.stats import f_oneway

#Example of datasets with difference between the means
data1 = [42.0, 53.0, 49.0, 53.0, 43.0, 44.0, 45.0, 52.0, 54.0]
data2 = [69.0, 54.0, 58.0, 64.0, 64.0, 55.0, 56.0]
data3 = [35.0, 40.0, 53.0, 42.0, 50.0, 49.0, 55.0, 39.0, 40.0]

#Example of datasets with no difference between the means
#data1 = [42.0, 53.0, 49.0, 53.0, 43.0, 44.0, 45.0, 52.0, 54.0]
#data2 = [41.0, 54.0, 48.0, 54.0, 41.0, 45.0, 46.0, 51.0, 54.0]
#data3 = [40.0, 55.0, 49.0, 52.0, 42.0, 43.0, 44.0, 53.0, 54.0]

stat, p = f_oneway(data1, data2, data3)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("Mean set 3: %.3f" %(np.array(data3).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 48.333
Mean set 2: 60.000
Mean set 3: 44.778

H1 is accepted with p=0.000
There is a significant difference between the means of the datasets


<h3>Fisher’s Least Significant Difference (LSD) Test</h3>
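Conduct pairwise t-tests between the groups without correcting for multiple comparisons. This example uses ordinary two-sample t-tests on the ANOVA data above; the classical LSD procedure pools the variance estimate across all groups.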

In [34]:
from scipy.stats import ttest_ind
stats, p12 = ttest_ind(data1, data2)
stats, p13 = ttest_ind(data1, data3)
stats, p23 = ttest_ind(data2, data3)

if (p12 < 0.05):
    print("Col1\tCol2\t Difference(p=%.3f)" %(p12))
else:
    print("Col1\tCol2\t No Difference(p=%.3f)" %(p12))
if (p13 < 0.05):
    print("Col1\tCol3\t Difference(p=%.3f)" %(p13))
else:
    print("Col1\tCol3\t No Difference(p=%.3f)" %(p13))
if (p23 < 0.05):
    print("Col2\tCol3\t Difference(p=%.3f)" %(p23))
else:
    print("Col2\tCol3\t No Difference(p=%.3f)" %(p23))    

Col1	Col2	 Difference(p=0.001)
Col1	Col3	 No Difference(p=0.231)
Col2	Col3	 Difference(p=0.000)


<h3>Bonferroni correction</h3>
Conduct multiple t-tests, but divide the significance level (here 0.05) by the number of planned comparisons and compare each p-value against this corrected threshold. This controls the familywise error rate.

In [35]:
from scipy.stats import ttest_ind
stats, p12 = ttest_ind(data1, data2)
stats, p13 = ttest_ind(data1, data3)
stats, p23 = ttest_ind(data2, data3)

p = 0.05 / 3
print("Using corrected p-value=%.3f" % (p))
if (p12 < p):
    print("Col1\tCol2\t Difference(p=%.3f)" %(p12))
else:
    print("Col1\tCol2\t No Difference(p=%.3f)" %(p12))
if (p13 < p):
    print("Col1\tCol3\t Difference(p=%.3f)" %(p13))
else:
    print("Col1\tCol3\t No Difference(p=%.3f)" %(p13))
if (p23 < p):
    print("Col2\tCol3\t Difference(p=%.3f)" %(p23))
else:
    print("Col2\tCol3\t No Difference(p=%.3f)" %(p23))  

Using corrected p-value=0.017
Col1	Col2	 Difference(p=0.001)
Col1	Col3	 No Difference(p=0.231)
Col2	Col3	 Difference(p=0.000)
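An equivalent way to apply the correction is to multiply each p-value by the number of comparisons (capped at 1) and compare the adjusted value against the original significance level. The sketch below shows both forms side by side on the ANOVA data; the variable names are illustrative.

```python
from scipy.stats import ttest_ind

data1 = [42.0, 53.0, 49.0, 53.0, 43.0, 44.0, 45.0, 52.0, 54.0]
data2 = [69.0, 54.0, 58.0, 64.0, 64.0, 55.0, 56.0]
data3 = [35.0, 40.0, 53.0, 42.0, 50.0, 49.0, 55.0, 39.0, 40.0]

alpha = 0.05
pairs = {
    "Col1 vs Col2": (data1, data2),
    "Col1 vs Col3": (data1, data3),
    "Col2 vs Col3": (data2, data3),
}
m = len(pairs)  # number of planned comparisons

decisions_threshold = {}
decisions_adjusted = {}
for name, (a, b) in pairs.items():
    _, p_raw = ttest_ind(a, b)
    # Form 1: compare the raw p-value against the corrected threshold
    decisions_threshold[name] = p_raw < alpha / m
    # Form 2: compare the adjusted p-value against the original alpha
    p_adj = min(1.0, p_raw * m)
    decisions_adjusted[name] = p_adj < alpha
    print("%s\traw p=%.3f\tadjusted p=%.3f" % (name, p_raw, p_adj))
```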


<h3>Tukey’s HSD</h3>
Older SciPy releases do not include this test, so this example uses the Statsmodels library.

In [32]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

d = {"columns": ["col1", "col1", "col1", "col1", "col1", "col1", "col1", "col1", "col1", "col2", "col2", "col2", "col2", "col2", "col2", "col2", "col2", "col2", "col3", "col3", "col3", "col3", "col3", "col3", "col3", "col3", "col3"],
     "values": [42.0, 53.0, 49.0, 53.0, 43.0, 44.0, 45.0, 52.0, 54.0, 69.0, 54.0, 58.0, 64.0, 64.0, 55.0, 56.0, 51.0, 53.0, 35.0, 40.0, 53.0, 42.0, 50.0, 49.0, 55.0, 39.0, 40.0]}

mc = pairwise_tukeyhsd(d['values'], d['columns'])
print(mc)

Multiple Comparison of Means - Tukey HSD,FWER=0.05
group1 group2 meandiff  lower    upper  reject
----------------------------------------------
 col1   col2   9.8889   2.7491  17.0287  True 
 col1   col3  -3.5556  -10.6953  3.5842 False 
 col2   col3  -13.4444 -20.5842 -6.3047  True 
----------------------------------------------
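Newer SciPy releases (1.8 and later) provide `scipy.stats.tukey_hsd`, which takes the groups as separate arguments. The following is a minimal sketch assuming such a version is installed; it uses the same values as the Statsmodels cell above.

```python
from scipy.stats import tukey_hsd  # assumes SciPy >= 1.8

col1 = [42.0, 53.0, 49.0, 53.0, 43.0, 44.0, 45.0, 52.0, 54.0]
col2 = [69.0, 54.0, 58.0, 64.0, 64.0, 55.0, 56.0, 51.0, 53.0]
col3 = [35.0, 40.0, 53.0, 42.0, 50.0, 49.0, 55.0, 39.0, 40.0]

result = tukey_hsd(col1, col2, col3)

# result.pvalue is a square matrix: entry [i, j] compares group i with group j
print(result)
print("col1 vs col2: p=%.3f" % result.pvalue[0, 1])
print("col1 vs col3: p=%.3f" % result.pvalue[0, 2])
print("col2 vs col3: p=%.3f" % result.pvalue[1, 2])
```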


<h1>Nonparametric Statistical Hypothesis Tests</h1>

<h3>Mann-Whitney U Test</h3>
(also called Wilcoxon Rank-sum)

Tests whether the distributions of two independent samples are equal or not.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample can be ranked.</li>
</ul>

Interpretation

<ul>
    <li>H0: the distributions of both samples are equal.</li>
    <li>H1: the distributions of both samples are not equal.</li>
</ul>

In [72]:
from scipy.stats import mannwhitneyu

#Example of datasets with difference between the means
data1 = [105.0, 119.0, 100.0, 97.0, 96.0, 101.0, 94.0, 95.0, 98.0]
data2 = [96.0, 99.0, 94.0, 89.0, 96.0, 93.0, 88.0, 105.0, 88.0]

#Example of datasets with no difference between the means
#data1 = [105.0, 119.0, 100.0, 97.0, 96.0, 101.0, 94.0, 95.0, 98.0]
#data2 = [99.0, 107.0, 94.0, 89.0, 96.0, 98.0, 92.0, 102.0, 92.0]

stat, p = mannwhitneyu(data1, data2)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 100.556
Mean set 2: 94.222

H1 is accepted with p=0.026
There is a significant difference between the means of the datasets
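The default of the `alternative` argument of `mannwhitneyu` has changed between SciPy versions, so the p-value obtained from the call above can depend on the installed version. Passing `alternative` explicitly, as sketched below, avoids that ambiguity.

```python
from scipy.stats import mannwhitneyu

data1 = [105.0, 119.0, 100.0, 97.0, 96.0, 101.0, 94.0, 95.0, 98.0]
data2 = [96.0, 99.0, 94.0, 89.0, 96.0, 93.0, 88.0, 105.0, 88.0]

# Two-sided: the distributions differ in either direction
stat_two, p_two = mannwhitneyu(data1, data2, alternative="two-sided")

# One-sided: values in data1 tend to be greater than values in data2
stat_greater, p_greater = mannwhitneyu(data1, data2, alternative="greater")

print("two-sided: p=%.3f" % p_two)
print("greater:   p=%.3f" % p_greater)
```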


<h3>Wilcoxon Signed-Rank Test</h3>

Tests whether the distributions of two paired samples are equal or not.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample can be ranked.</li>
    <li>Observations across each sample are paired.</li>
</ul>

Interpretation

<ul>
    <li>H0: the distributions of both samples are equal.</li>
    <li>H1: the distributions of both samples are not equal.</li>
</ul>

In [75]:
from scipy.stats import wilcoxon

#Example of datasets with difference between the means
data1 = [190.0, 175.0, 189.0, 160.0, 184.0, 178.0, 184.0, 179.0, 181.0]
data2 = [171.0, 170.0, 182.0, 158.0, 173.0, 163.0, 179.0, 173.0, 175.0]

#Example of datasets with no difference between the means
#data1 = [190.0, 175.0, 189.0, 160.0, 184.0, 178.0, 184.0, 179.0, 181.0]
#data2 = [176.0, 176.0, 189.0, 171.0, 173.0, 163.0, 170.0, 173.0, 175.0]

stat, p = wilcoxon(data1, data2)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 180.000
Mean set 2: 171.556

H1 is accepted with p=0.008
There is a significant difference between the means of the datasets
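The signed-rank test works on the pairwise differences, and `wilcoxon` also accepts those differences directly as a single argument. The sketch below checks that both call styles give the same result for the data in the cell above.

```python
import numpy as np
from scipy.stats import wilcoxon

data1 = [190.0, 175.0, 189.0, 160.0, 184.0, 178.0, 184.0, 179.0, 181.0]
data2 = [171.0, 170.0, 182.0, 158.0, 173.0, 163.0, 179.0, 173.0, 175.0]

# Call with both samples
stat_xy, p_xy = wilcoxon(data1, data2)

# Call with the pairwise differences only
diff = np.array(data1) - np.array(data2)
stat_d, p_d = wilcoxon(diff)

print("Two samples: statistic=%.3f, p=%.3f" % (stat_xy, p_xy))
print("Differences: statistic=%.3f, p=%.3f" % (stat_d, p_d))
```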




<h3>Kruskal-Wallis H Test</h3>

Tests whether the distributions of two or more independent samples are equal or not.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample can be ranked.</li>
</ul>

Interpretation

<ul>
    <li>H0: the distributions of all samples are equal.</li>
    <li>H1: the distributions of one or more samples are not equal.</li>
</ul>

In [5]:
from scipy.stats import kruskal

#Example of datasets with difference between the means
data1 = [498.0, 582.0, 527.0, 480.0, 549.0]
data2 = [435.0, 360.0, 372.0, 413.0, 549.0]
data3 = [608.0, 515.0, 661.0, 637.0, 554.0]

#Example of datasets with no difference between the means
#data1 = [498.0, 582.0, 527.0, 480.0, 549.0]
#data2 = [535.0, 560.0, 572.0, 513.0, 549.0]
#data3 = [608.0, 515.0, 661.0, 637.0, 554.0]

stat, p = kruskal(data1, data2, data3)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("Mean set 3: %.3f" %(np.array(data3).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 527.200
Mean set 2: 425.800
Mean set 3: 595.000

H1 is accepted with p=0.016
There is a significant difference between the means of the datasets


<h3>Friedman Test</h3>

Tests whether the distributions of two or more paired samples are equal or not.

Assumptions

<ul>
    <li>Observations in each sample are independent and identically distributed (iid).</li>
    <li>Observations in each sample can be ranked.</li>
    <li>Observations across each sample are paired.</li>
</ul>

Interpretation

<ul>
    <li>H0: the distributions of all samples are equal.</li>
    <li>H1: the distributions of one or more samples are not equal.</li>
</ul>

In [81]:
from scipy.stats import friedmanchisquare

#Example of datasets with difference between the means
data1 = [498.0, 582.0, 527.0, 480.0, 549.0, 486.0, 490.0]
data2 = [435.0, 360.0, 372.0, 413.0, 512.0, 390.0, 375.0]
data3 = [608.0, 515.0, 661.0, 637.0, 554.0, 425.0, 490.0]

#Example of datasets with no difference between the means
#data1 = [498.0, 582.0, 527.0, 480.0, 549.0, 486.0, 490.0]
#data2 = [535.0, 560.0, 572.0, 493.0, 532.0, 490.0, 475.0]
#data3 = [608.0, 515.0, 661.0, 637.0, 554.0, 425.0, 490.0]

stat, p = friedmanchisquare(data1, data2, data3)

import numpy as np
print("Mean set 1: %.3f" %(np.array(data1).mean()))
print("Mean set 2: %.3f" %(np.array(data2).mean()))
print("Mean set 3: %.3f" %(np.array(data3).mean()))
print("")

if (p < 0.05):
    print("H1 is accepted with p=%.3f" %(p))
    print("There is a significant difference between the means of the datasets")
else:
    print("H1 is rejected with p=%.3f" %(p))
    print("There is no significant difference between the datasets")

Mean set 1: 516.000
Mean set 2: 408.143
Mean set 3: 555.714

H1 is accepted with p=0.004
There is a significant difference between the means of the datasets
