<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>

### Kolmogorov-Smirnov test

Tests the null hypothesis that two independent samples are drawn from the same distribution. It is a nonparametric test that works by comparing the **empirical distribution functions (EDF)** of the two samples. [Reference](http://www.stats.ox.ac.uk/~massa/Lecture%2013.pdf).


$F(x) = \frac{\#\ \text{observations} \le x}{\#\ \text{observations}}$ - Empirical distribution function (the fraction of observations at or below $x$)
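As a minimal sketch of the definition above, the EDF can be evaluated with NumPy by sorting the sample and counting how many values lie at or below a point. The helper name `edf` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def edf(sample, x):
    """Fraction of observations in `sample` that are <= x."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    # With side="right", searchsorted returns the number of values <= x
    count = np.searchsorted(sorted_sample, x, side="right")
    return count / sorted_sample.size

data = [3.0, 1.0, 2.0, 2.0]
print(edf(data, 2.0))  # 3 of the 4 observations are <= 2.0, so this prints 0.75
```

Because `np.searchsorted` accepts arrays, `x` may also be a whole grid of points, which is convenient when comparing two EDFs.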

In its original (one-sample) form, we compare the EDF of the data, $F_{obs}$, with the CDF expected under the null distribution, $F_{exp}$: <br>
$D_{n} = \max_{x}|F_{exp}(x) - F_{obs}(x)|$
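The one-sample statistic can be sketched by hand and compared with `scipy.stats.kstest`. Since the EDF is a step function, the largest gap occurs at a sample point, either just before or just after the jump, so both sides are checked. The standard normal null distribution, the sample size, and the seed are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=100))  # illustrative data, sorted
n = sample.size

# Expected CDF under the null (standard normal) at each sample point
f_exp = stats.norm.cdf(sample)

# EDF just after each jump is i/n, just before each jump is (i-1)/n
d_plus = np.max(np.arange(1, n + 1) / n - f_exp)
d_minus = np.max(f_exp - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

result = stats.kstest(sample, "norm")
print(d_manual, result.statistic)  # the two values should agree
```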

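In the two-sample KS test, the statistic becomes: <br>
$D_{m, n} = \max_{x}|F_{1, n}(x) - F_{2, m}(x)|$, where $F_{1, n}$ and $F_{2, m}$ are the EDFs of the first and second samples, of sizes $n$ and $m$.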
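The two-sample statistic can be computed directly from the two EDFs. Both are step functions that only change at observed values, so it is enough to evaluate them on the pooled sample. The sketch below uses a hypothetical helper `ks_2samp_statistic` and illustrative data, and checks the result against `scipy.stats.ks_2samp`.

```python
import numpy as np
from scipy import stats

def ks_2samp_statistic(a, b):
    """Maximum absolute difference between the EDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # every point where either EDF can jump
    f_a = np.searchsorted(a, grid, side="right") / a.size
    f_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(f_a - f_b))

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=200)
y = rng.normal(loc=0.5, scale=1.5, size=300)

d_manual = ks_2samp_statistic(x, y)
d_scipy = stats.ks_2samp(x, y).statistic
print(d_manual, d_scipy)  # the two values should agree
```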

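Critical value (large-sample approximation): <br>
$D_{crit} = c(\alpha) \sqrt{\frac{n + m}{n m}}$, where $c(\alpha) = \sqrt{-\frac{1}{2}\ln\frac{\alpha}{2}}$. If $D_{m, n} > D_{crit}$, reject the null hypothesis at level $\alpha$.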
[Reference](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)
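As a rough sketch, the critical value can be evaluated numerically using the large-sample expression $c(\alpha) = \sqrt{-\frac{1}{2}\ln\frac{\alpha}{2}}$ from the Wikipedia reference. The helper name `ks_critical_value`, the sample sizes, and the significance level are assumptions for illustration.

```python
import numpy as np

def ks_critical_value(alpha, n, m):
    """Large-sample critical value for the two-sample KS statistic."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    return c_alpha * np.sqrt((n + m) / (n * m))

# Illustrative sample sizes and significance level
d_crit = ks_critical_value(alpha=0.01, n=200, m=300)
print(d_crit)  # roughly 0.149
```

An observed statistic above this value would lead to rejecting the null hypothesis at the 1% level. For small samples the approximation is loose, and the p-value reported by `stats.ks_2samp` is the safer guide.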

In [9]:
from scipy import stats
import numpy as np

In [10]:
# s1 and s2 come from different distributions; the p-value is below 1%, so we reject the null hypothesis.
np.random.seed(46)

n1 = 200
n2 = 300  

s1 = stats.norm.rvs(size=n1, loc=0., scale=1)
s2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)

stats.ks_2samp(s1, s2)

Ks_2sampResult(statistic=0.19, pvalue=0.0003047192071684579)

In [11]:
# s3 comes from a slightly different distribution (mean 0.01 instead of 0). The p-value (about 0.39) is above 10%, so we cannot reject the null hypothesis at an alpha of 10% or lower.
s3 = stats.norm.rvs(size=n2, loc=0.01, scale=1.0)
stats.ks_2samp(s1, s3)

Ks_2sampResult(statistic=0.08166666666666667, pvalue=0.38607884035069273)

In [12]:
# s4 comes from the same distribution as s1; the p-value (about 0.25) is high, so we cannot reject the null hypothesis.
s4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
stats.ks_2samp(s1, s4)

Ks_2sampResult(statistic=0.09166666666666666, pvalue=0.25476533491263986)

### Mann-Whitney U test

This test is used to investigate whether two independent samples were drawn from populations with the same distribution. It is a nonparametric test. Null hypothesis: a randomly selected value from one population is equally likely to be less than or greater than a randomly selected value from the second population.
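The null hypothesis above can be made concrete by counting, over all pairs, how often a value from the first sample exceeds a value from the second. That count is the U statistic. The sketch below compares it with `scipy.stats.mannwhitneyu`. The data and seed are illustrative assumptions. In recent SciPy releases the reported statistic corresponds to the first sample, while older releases may report the complementary value $n_1 n_2 - U$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)  # illustrative sample 1
y = rng.normal(loc=0.5, scale=1.0, size=300)  # illustrative sample 2

# Count pairs (x_i, y_j) with x_i > y_j; each tie counts one half
diff = x[:, None] - y[None, :]
u_x = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)

result = stats.mannwhitneyu(x, y, alternative="two-sided")
print(u_x, result.statistic, result.pvalue)
```

Under the null hypothesis, `u_x` is expected to be close to half the number of pairs, `x.size * y.size / 2`.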