
# Hypothesis Testing

Reference: [cheat sheet for statistical tests](https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/)

# Normality Tests

This section lists statistical tests that you can use to check if your data has a Gaussian distribution.

## 1. Shapiro-Wilk Test

* Tests whether a data sample has a Gaussian distribution.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).

Interpretation:

* H0: the sample has a Gaussian distribution.
* H1: the sample does not have a Gaussian distribution.

In [1]:
# Example of the Shapiro-Wilk Normality Test
from scipy.stats import shapiro
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = shapiro(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably Gaussian')
else:
 print('Probably not Gaussian')

stat=0.895, p=0.193
Probably Gaussian


### Using pingouin

See the [pingouin documentation](https://pingouin-stats.org/build/html/index.html) for details.



In [2]:
## Install pingouin

!pip install --upgrade pingouin

Collecting pingouin
  Downloading pingouin-0.5.4-py2.py3-none-any.whl (198 kB)
Collecting pandas-flavor (from pingouin)
  Downloading pandas_flavor-0.6.0-py3-none-any.whl (7.2 kB)
Installing collected packages: pandas-flavor, pingouin
Successfully installed pandas-flavor-0.6.0 pingouin-0.5.4


In [3]:
import pingouin as pg

In [4]:
data

[0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.36, -1.478, -1.637, -1.869]

In [5]:
print(pg.normality(data))

          W     pval  normal
0  0.895101  0.19341    True


## 2. D’Agostino’s K^2 Test

Tests whether a data sample has a Gaussian distribution.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).

Interpretation:

* H0: the sample has a Gaussian distribution.
* H1: the sample does not have a Gaussian distribution.

In [6]:
# Example of the D'Agostino's K^2 Normality Test
from scipy.stats import normaltest
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = normaltest(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably Gaussian')
else:
 print('Probably not Gaussian')

stat=3.392, p=0.183
Probably Gaussian




In [7]:
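# Note: pg.normality defaults to the Shapiro-Wilk test, so this repeats the result from Section 1.
# Pass method='normaltest' to run D'Agostino's K^2 test instead.
print(pg.normality(data))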

          W     pval  normal
0  0.895101  0.19341    True
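
To make the K^2 statistic less of a black box, here is a minimal sketch of how it can be rebuilt from SciPy's separate skewness and kurtosis tests. The sample is a hypothetical seeded draw (not data from this notebook), chosen large enough that the kurtosis test is well behaved.

```python
import numpy as np
from scipy.stats import chi2, kurtosistest, normaltest, skewtest

# Hypothetical sample: 200 draws from a standard normal with a fixed seed.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)

# z-scores for skewness and kurtosis; each is approximately N(0, 1) under H0.
z_skew = skewtest(sample).statistic
z_kurt = kurtosistest(sample).statistic

# K^2 is the sum of the squared z-scores; under H0 it is approximately
# chi-squared distributed with 2 degrees of freedom.
k2 = z_skew ** 2 + z_kurt ** 2
p_value = chi2.sf(k2, df=2)

reference = normaltest(sample)
print('manual: stat=%.3f, p=%.3f' % (k2, p_value))
print('scipy:  stat=%.3f, p=%.3f' % (reference.statistic, reference.pvalue))
```

Both lines should print the same numbers, which shows that the test reacts to departures in skewness or kurtosis rather than to the overall shape of the distribution.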


## 3. Anderson-Darling Test
Tests whether a data sample has a Gaussian distribution.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).

Interpretation:

* H0: the sample has a Gaussian distribution.
* H1: the sample does not have a Gaussian distribution.

In [8]:
# Example of the Anderson-Darling Normality Test
from scipy.stats import anderson
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
result = anderson(data)
print('stat=%.3f' % (result.statistic))
for i in range(len(result.critical_values)):
 sl, cv = result.significance_level[i], result.critical_values[i]
 if result.statistic < cv:
  print('Probably Gaussian at the %.1f%% level' % (sl))
 else:
  print('Probably not Gaussian at the %.1f%% level' % (sl))

stat=0.424
Probably Gaussian at the 15.0% level
Probably Gaussian at the 10.0% level
Probably Gaussian at the 5.0% level
Probably Gaussian at the 2.5% level
Probably Gaussian at the 1.0% level


In [9]:
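# Note: pg.normality defaults to the Shapiro-Wilk test, so this output is a Shapiro-Wilk result, not Anderson-Darling.
print(pg.normality(data))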

          W     pval  normal
0  0.895101  0.19341    True


In [16]:
from scipy.stats import anderson
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869, 3.2458, 5.2451, 10.4523]  # three larger values appended to skew the sample
result = anderson(data)
print('stat=%.3f' % (result.statistic))
for i in range(len(result.critical_values)):
 sl, cv = result.significance_level[i], result.critical_values[i]
 if result.statistic < cv:
  print('Probably Gaussian at the %.1f%% level' % (sl))
 else:
  print('Probably not Gaussian at the %.1f%% level' % (sl))

stat=0.880
Probably not Gaussian at the 15.0% level
Probably not Gaussian at the 10.0% level
Probably not Gaussian at the 5.0% level
Probably not Gaussian at the 2.5% level
Probably Gaussian at the 1.0% level


In [17]:
# Shapiro-Wilk (the pingouin default) on the modified sample
print(pg.normality(data))

          W      pval  normal
0  0.811561  0.009383   False


# Correlation Tests

This section lists statistical tests that you can use to check if two samples are related.

## 1. Pearson’s Correlation Coefficient
Tests whether two samples have a linear relationship.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample are normally distributed.
* Observations in each sample have the same variance.

Interpretation:

* H0: the two samples are independent.
* H1: there is a dependency between the samples.

In [22]:
# Example of the Pearson's Correlation test
from scipy.stats import pearsonr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = pearsonr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably independent, Fail to reject the null hypothesis')
else:
 print('Probably dependent, Reject the null hypothesis')

stat=0.688, p=0.028
Probably dependent, Reject the null hypothesis


In [23]:
## use pingouin for correlation

print(pg.corr(data1, data2))

          n        r        CI95%     p-val   BF10     power
pearson  10  0.68797  [0.1, 0.92]  0.027873  3.247  0.642242
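
As a cross-check on what `pearsonr` reports, the following sketch recomputes the coefficient and its two-sided p-value by hand. It assumes the usual t-distribution form of the test with n - 2 degrees of freedom; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import pearsonr, t

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]

x = np.asarray(data1)
y = np.asarray(data2)
n = len(x)

# Pearson's r: covariance of the centred samples divided by the product of their norms.
xc = x - x.mean()
yc = y - y.mean()
r = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Under H0 (no linear relationship), r * sqrt((n - 2) / (1 - r^2)) follows
# a t-distribution with n - 2 degrees of freedom.
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_value = 2 * t.sf(abs(t_stat), df=n - 2)

ref_r, ref_p = pearsonr(x, y)
print('manual: r=%.3f, p=%.3f' % (r, p_value))
print('scipy:  r=%.3f, p=%.3f' % (ref_r, ref_p))
```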


## 2. Spearman’s Rank Correlation
Tests whether two samples have a monotonic relationship.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample can be ranked.

Interpretation:

* H0: the two samples are independent.
* H1: there is a dependency between the samples.

In [24]:
# Example of the Spearman's Rank Correlation Test
from scipy.stats import spearmanr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = spearmanr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably independent, fail to reject the null hypothesis')
else:
 print('Probably dependent, reject the null hypothesis')

stat=0.855, p=0.002
Probably dependent, reject the null hypothesis
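
One way to see why Spearman's test only needs rankable data: the coefficient is Pearson's correlation computed on the ranks. A minimal sketch of that equivalence, using the same two samples:

```python
from scipy.stats import pearsonr, rankdata, spearmanr

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]

# Replace each observation by its rank within its own sample
# (ties, if any, would receive the average rank).
ranks1 = rankdata(data1)
ranks2 = rankdata(data2)

# Pearson's r on the ranks is Spearman's rho.
rho_manual, _ = pearsonr(ranks1, ranks2)
rho_ref, _ = spearmanr(data1, data2)
print('manual rho=%.3f, scipy rho=%.3f' % (rho_manual, rho_ref))
```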


In [25]:
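# Note: pg.corr defaults to Pearson, so the output below is the Pearson result again.
# Pass method='spearman' to get Spearman's rank correlation.
print(pg.corr(data1, data2))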

          n        r        CI95%     p-val   BF10     power
pearson  10  0.68797  [0.1, 0.92]  0.027873  3.247  0.642242


## 3. Kendall’s Rank Correlation
Tests whether two samples have a monotonic relationship.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample can be ranked.

Interpretation:

* H0: the two samples are independent.
* H1: there is a dependency between the samples.

In [31]:
# Example of the Kendall's Rank Correlation Test
from scipy.stats import kendalltau
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = kendalltau(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably independent, fail to reject the null hypothesis')
else:
 print('Probably dependent, reject the null hypothesis')

stat=0.733, p=0.002
Probably dependent, reject the null hypothesis
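
Kendall's tau can be read as a count over pairs of observations. The sketch below counts concordant and discordant pairs directly; it assumes there are no ties (true for these samples), in which case the simple formula agrees with the tau-b variant that SciPy computes by default.

```python
from itertools import combinations
from scipy.stats import kendalltau

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]

concordant = 0
discordant = 0
# A pair of observations is concordant when both samples order it the same way.
for (x1, y1), (x2, y2) in combinations(zip(data1, data2), 2):
    product = (x1 - x2) * (y1 - y2)
    if product > 0:
        concordant += 1
    elif product < 0:
        discordant += 1

n = len(data1)
n_pairs = n * (n - 1) // 2
tau = (concordant - discordant) / n_pairs

tau_ref, _ = kendalltau(data1, data2)
print('concordant=%d, discordant=%d' % (concordant, discordant))
print('manual tau=%.3f, scipy tau=%.3f' % (tau, tau_ref))
```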


In [32]:
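# Note: pg.corr defaults to Pearson, so the output below is the Pearson result again.
# Pass method='kendall' to get Kendall's rank correlation.
print(pg.corr(data1, data2))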

          n        r        CI95%     p-val   BF10     power
pearson  10  0.68797  [0.1, 0.92]  0.027873  3.247  0.642242


## 4. Chi-Squared Test
Tests whether two categorical variables are related or independent.

Assumptions:

* Observations used in the calculation of the contingency table are independent.
* Expected count of at least 5 in each cell of the contingency table (a common rule of thumb).

Interpretation:

* H0: the two samples are independent.
* H1: there is a dependency between the samples.

In [33]:
# Example of the Chi-Squared Test
from scipy.stats import chi2_contingency
table = [[10, 20, 30],[6,  9,  17]]
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably independent')
else:
 print('Probably dependent')

stat=0.272, p=0.873
Probably independent
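
The cell-count assumption is easiest to check on the expected frequencies, which come from the row and column totals. The sketch below rebuilds them and the statistic by hand for the same table; the threshold of 5 is the common rule of thumb, not something SciPy enforces.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

table = np.array([[10, 20, 30], [6, 9, 17]])

# Expected count under independence: row total * column total / grand total.
row_totals = table.sum(axis=1)
col_totals = table.sum(axis=0)
expected = np.outer(row_totals, col_totals) / table.sum()

# Pearson's chi-squared statistic and its degrees of freedom.
stat = ((table - expected) ** 2 / expected).sum()
dof = (table.shape[0] - 1) * (table.shape[1] - 1)
p_value = chi2.sf(stat, dof)

ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(table)
print('manual: stat=%.3f, p=%.3f, dof=%d' % (stat, p_value, dof))
print('scipy:  stat=%.3f, p=%.3f, dof=%d' % (ref_stat, ref_p, ref_dof))
print('smallest expected count: %.2f' % expected.min())
```

Note that SciPy applies a continuity correction only when the table has one degree of freedom, so the manual statistic matches here without it.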


In [34]:
## using pingouin

data = pg.read_dataset('chi2_independence')
expected, observed, stats = pg.chi2_independence(data, x='sex', y='target')
stats

                 test    lambda       chi2  dof          pval    cramer     power
0             pearson  1.000000  22.717227  1.0  1.876778e-06  0.273814  0.997494
1        cressie-read  0.666667  22.931427  1.0  1.678845e-06  0.275102  0.997663
2      log-likelihood  0.000000  23.557374  1.0  1.212439e-06  0.278832  0.998096
3       freeman-tukey -0.500000  24.219622  1.0  8.595211e-07  0.282724  0.998469
4  mod-log-likelihood -1.000000  25.071078  1.0  5.525544e-07  0.287651  0.998845
5              neyman -2.000000  27.457956  1.0  1.605471e-07  0.301032  0.999481


# Stationarity Tests
This section lists statistical tests that you can use to check if a time series is stationary or not.

## 1. Augmented Dickey-Fuller Unit Root Test
Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive.

Assumptions:

* Observations are temporally ordered.

Interpretation:

* H0: a unit root is present (series is non-stationary).
* H1: a unit root is not present (series is stationary).

In [35]:
# Example of the Augmented Dickey-Fuller unit root test
from statsmodels.tsa.stattools import adfuller
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, obs, crit, t = adfuller(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably not Stationary')
else:
 print('Probably Stationary')

stat=0.517, p=0.985
Probably not Stationary


## 2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
Tests whether a time series is trend stationary or not.

Assumptions:

* Observations are temporally ordered.

Interpretation:

* H0: the time series is trend-stationary.
* H1: the time series is not trend-stationary.

In [36]:
# Example of the Kwiatkowski-Phillips-Schmidt-Shin test
from statsmodels.tsa.stattools import kpss
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, crit = kpss(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably Stationary')
else:
 print('Probably not Stationary')

stat=0.594, p=0.023
Probably not Stationary


# Parametric Statistical Hypothesis Tests
This section lists statistical tests that you can use to compare data samples.

## 1. T-test
Tests whether the means of two independent samples are significantly different.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample are normally distributed.
* Observations in each sample have the same variance.

Interpretation:

* H0: the means of the samples are equal.
* H1: the means of the samples are unequal.

In [37]:
# Example of the t-test
from scipy.stats import ttest_ind
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_ind(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=-0.326, p=0.748
Probably the same distribution


In [38]:
## using pingouin

print(pg.ttest(data1, data2))

               T  dof alternative    p-val          CI95%  cohen-d   BF10  \
T-test -0.325616   18   two-sided  0.74847  [-1.29, 0.94]  0.14562  0.413   

           power  
T-test  0.060981  
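
The equal-variance assumption shows up directly in the formula: the two sample variances are pooled into one estimate. A minimal sketch of the calculation behind `ttest_ind` with its default settings, using the same samples:

```python
import numpy as np
from scipy.stats import t, ttest_ind

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

a = np.asarray(data1)
b = np.asarray(data2)
n1, n2 = len(a), len(b)

# Pooled variance: weighted average of the two unbiased sample variances.
dof = n1 + n2 - 2
pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof

# Difference in means divided by its standard error.
t_stat = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_value = 2 * t.sf(abs(t_stat), df=dof)

ref_stat, ref_p = ttest_ind(a, b)
print('manual: stat=%.3f, p=%.3f, dof=%d' % (t_stat, p_value, dof))
print('scipy:  stat=%.3f, p=%.3f' % (ref_stat, ref_p))
```

If the variances look clearly unequal, `ttest_ind(a, b, equal_var=False)` runs Welch's variant instead.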


## 2. Paired T-test
Tests whether the means of two paired samples are significantly different.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample are normally distributed.
* Observations in each sample have the same variance.
* Observations across each sample are paired.

Interpretation:

* H0: the means of the samples are equal.
* H1: the means of the samples are unequal.

In [39]:
# Example of the Paired Student's t-test
from scipy.stats import ttest_rel
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_rel(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=-0.334, p=0.746
Probably the same distribution
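
The pairing assumption matters because the paired test works on the per-pair differences. A short sketch showing that `ttest_rel` is equivalent to a one-sample t-test of those differences against zero:

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

# One difference per pair; the order of observations must match across samples.
differences = np.asarray(data1) - np.asarray(data2)

# H0: the mean difference is zero.
diff_stat, diff_p = ttest_1samp(differences, 0.0)
rel_stat, rel_p = ttest_rel(data1, data2)
print('one-sample on differences: stat=%.3f, p=%.3f' % (diff_stat, diff_p))
print('ttest_rel:                 stat=%.3f, p=%.3f' % (rel_stat, rel_p))
```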


In [40]:
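# Note: without paired=True, pg.ttest runs the independent-samples test (dof = 18 below).
# Pass paired=True to get the paired t-test.
print(pg.ttest(data1, data2))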

               T  dof alternative    p-val          CI95%  cohen-d   BF10  \
T-test -0.325616   18   two-sided  0.74847  [-1.29, 0.94]  0.14562  0.413   

           power  
T-test  0.060981  


## 3. Analysis of Variance Test (ANOVA)
Tests whether the means of two or more independent samples are significantly different.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample are normally distributed.
* Observations in each sample have the same variance.

Interpretation:

* H0: the means of the samples are equal.
* H1: one or more of the means of the samples are unequal.

In [41]:
# Example of the Analysis of Variance Test
from scipy.stats import f_oneway
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = f_oneway(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=0.096, p=0.908
Probably the same distribution


In [43]:
import pandas as pd

In [47]:
# Read an example dataset
df = pg.read_dataset('mixed_anova')

# Run the ANOVA
aov = pg.anova(data=df, dv='Scores', between='Group', detailed=True)
print(aov)

   Source          SS   DF        MS         F   p-unc       np2
0   Group    5.459963    1  5.459963  5.243656  0.0232  0.028616
1  Within  185.342729  178  1.041251       NaN     NaN       NaN


In [48]:
df.head()

     Scores    Time    Group  Subject
0  5.971435  August  Control        0
1  4.309024  August  Control        1
2  6.932707  August  Control        2
3  5.187348  August  Control        3
4  4.779411  August  Control        4


In [49]:
df1 = pd.DataFrame({'Data1': data1, 'Data2': data2, 'Data3': data3})

df1.head()

   Data1  Data2  Data3
0  0.873  1.142 -0.208
1  2.817 -0.432  0.696
2  0.121 -0.938  0.928
3 -0.945 -0.729 -1.148
4 -0.055 -0.846 -0.213


In [52]:
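## using pingouin
## Caution: pg.anova expects long-format data (one value column and one group column).
## Passing Data2 as `between` treats each of its 10 distinct values as its own group,
## which leaves no within-group degrees of freedom and explains the degenerate table below.

aov = pg.anova(data=df1, dv="Data1", between="Data2", detailed=True)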
print(aov)

   Source         SS  DF        MS  np2
0   Data2  19.101819   9  2.122424  1.0
1  Within   0.000000   0       NaN  NaN


  mserror = sserror / ddof2
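
A one-way ANOVA on three samples needs the data in long format: one column holding the values and one holding the group label. The sketch below reshapes the wide frame with `melt` and checks the result with SciPy; the column names `group` and `value` are illustrative choices, not names from this notebook.

```python
import pandas as pd
from scipy.stats import f_oneway

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]

wide = pd.DataFrame({'Data1': data1, 'Data2': data2, 'Data3': data3})

# Wide -> long: each row becomes (group label, value).
long_df = wide.melt(var_name='group', value_name='value')

# One array of values per group, then a one-way ANOVA across the groups.
samples = [g['value'].to_numpy() for _, g in long_df.groupby('group')]
stat, p = f_oneway(*samples)
print(long_df.head())
print('stat=%.3f, p=%.3f' % (stat, p))

# With pingouin installed, the same long-format frame could be analysed with
# pg.anova(data=long_df, dv='value', between='group', detailed=True).
```

The statistic and p-value should match the `f_oneway` result shown earlier in this section.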


## 4. Repeated Measures ANOVA Test
Tests whether the means of two or more paired samples are significantly different.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample are normally distributed.
* Observations in each sample have the same variance.
* Observations across each sample are paired.

Interpretation:

* H0: the means of the samples are equal.
* H1: one or more of the means of the samples are unequal.

In [53]:
pg.rm_anova(data=df, dv='Scores', within='Time', subject='Subject', detailed=True)

  Source          SS   DF        MS         F     p-unc       ng2       eps
0   Time    7.628428    2  3.814214  3.912796  0.022629  0.039981  0.998751
1  Error  115.027023  118  0.974805       NaN       NaN       NaN       NaN


# Nonparametric Statistical Hypothesis Tests

## 1. Mann-Whitney U Test
Tests whether the distributions of two independent samples are equal or not.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample can be ranked.

Interpretation:

* H0: the distributions of both samples are equal.
* H1: the distributions of both samples are not equal.

In [54]:
# Example of the Mann-Whitney U Test
from scipy.stats import mannwhitneyu
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = mannwhitneyu(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=40.000, p=0.473
Probably the same distribution
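
The "can be ranked" assumption reflects how the U statistic is built: both samples are ranked together and the rank sum of one sample is compared with its smallest possible value. A minimal sketch for the same data (which has no ties):

```python
from scipy.stats import mannwhitneyu, rankdata

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

n1, n2 = len(data1), len(data2)

# Rank all observations jointly, then sum the ranks belonging to the first sample.
ranks = rankdata(data1 + data2)
rank_sum1 = ranks[:n1].sum()

# U1 counts the pairs in which the data1 value exceeds the data2 value.
u1 = rank_sum1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1

ref_stat, ref_p = mannwhitneyu(data1, data2)
print('U1=%.1f, U2=%.1f' % (u1, u2))
print('scipy: stat=%.3f, p=%.3f' % (ref_stat, ref_p))
```

Depending on the SciPy version, the reported statistic is either U1 or the smaller of U1 and U2; for these samples both conventions give the same number.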


## 2. Wilcoxon Signed-Rank Test
Tests whether the distributions of two paired samples are equal or not.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample can be ranked.
* Observations across each sample are paired.

Interpretation:

* H0: the distributions of both samples are equal.
* H1: the distributions of both samples are not equal.

In [55]:
# Example of the Wilcoxon Signed-Rank Test
from scipy.stats import wilcoxon
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = wilcoxon(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=21.000, p=0.557
Probably the same distribution
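
For paired data, the signed-rank statistic is built from the per-pair differences: rank their absolute values, then sum the ranks separately for positive and negative differences. A sketch of that calculation, assuming no zero differences and no ties (both hold for these samples):

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

# Per-pair differences; zero differences would be dropped before ranking.
d = np.asarray(data1) - np.asarray(data2)
d = d[d != 0]

# Rank the absolute differences, then split the ranks by the sign of the difference.
ranks = rankdata(np.abs(d))
w_plus = ranks[d > 0].sum()
w_minus = ranks[d < 0].sum()

# For the two-sided test, the statistic is the smaller of the two rank sums.
w = min(w_plus, w_minus)

ref_stat, ref_p = wilcoxon(data1, data2)
print('W+=%.1f, W-=%.1f, W=%.1f' % (w_plus, w_minus, w))
print('scipy: stat=%.3f, p=%.3f' % (ref_stat, ref_p))
```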


## 3. Kruskal-Wallis H Test
Tests whether the distributions of two or more independent samples are equal or not.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample can be ranked.

Interpretation:

* H0: the distributions of all samples are equal.
* H1: the distributions of one or more samples are not equal.

In [56]:
# Example of the Kruskal-Wallis H Test
from scipy.stats import kruskal
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = kruskal(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=0.571, p=0.450
Probably the same distribution


## 4. Friedman Test
Tests whether the distributions of two or more paired samples are equal or not.

Assumptions:

* Observations in each sample are independent and identically distributed (iid).
* Observations in each sample can be ranked.
* Observations across each sample are paired.

Interpretation:

* H0: the distributions of all samples are equal.
* H1: the distributions of one or more samples are not equal.

In [57]:
# Example of the Friedman Test
from scipy.stats import friedmanchisquare
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = friedmanchisquare(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
 print('Probably the same distribution')
else:
 print('Probably different distributions')

stat=0.800, p=0.670
Probably the same distribution
