### Reference:
* https://towardsdatascience.com/inferential-statistics-series-t-test-using-numpy-2718f8f9bf2f

In [1]:
## Import the packages
import numpy as np
from scipy import stats


## Define 2 random distributions
#Sample Size
N = 10
#Gaussian distributed data with mean = 2 and var = 1
a = np.random.randn(N) + 2
#Gaussian distributed data with mean = 0 and var = 1
b = np.random.randn(N)


## Calculate the Standard Deviation
#Calculate the variance to get the standard deviation

#For the unbiased estimate of the variance we divide by N-1 rather than N (the maximum-likelihood estimate divides by N), hence ddof = 1
var_a = a.var(ddof=1)
var_b = b.var(ddof=1)

#Pooled standard deviation (this simple average of the variances assumes both samples have the same size N)
s = np.sqrt((var_a + var_b)/2)
s

1.2976678998605544
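
For reference, the quantity `s` computed in the cell above can be written out as follows. This is a restatement of the code under its own assumption that both samples have the same size $N$; the symbols $s_a^2$, $s_b^2$ and $s_p$ are notation introduced here for `var_a`, `var_b` and `s`.

$$
s_a^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(a_i - \bar{a}\right)^2,
\qquad
s_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(b_i - \bar{b}\right)^2,
\qquad
s_p = \sqrt{\frac{s_a^2 + s_b^2}{2}}
$$

The t statistic then divides the difference in sample means by the standard error $s_p\sqrt{2/N}$ and is compared against a t distribution with $2N - 2$ degrees of freedom:

$$
t = \frac{\bar{a} - \bar{b}}{s_p\sqrt{2/N}}
$$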

In [2]:
## Calculate the t-statistic
t = (a.mean() - b.mean())/(s*np.sqrt(2/N))



## Compare with the critical t-value
#Degrees of freedom
df = 2*N - 2

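#One-sided p-value: area in the tail of the t distribution beyond |t|
#(abs keeps the tail correct when t is negative; it is doubled below for the two-sided test)
p = 1 - stats.t.cdf(np.abs(t),df=df)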


print("t = " + str(t))
print("p = " + str(2*p))
### With this sample the two-sided p-value is about 0.0038, which is below alpha = 0.05, so we reject the null hypothesis of equal means: the difference between the two sample means is statistically significant. The data are not seeded, so the exact numbers change on every run.


## Cross-check with SciPy's built-in function
t2, p2 = stats.ttest_ind(a,b)
print("t = " + str(t2))
print("p = " + str(p2))

t = 3.320943603694394
p = 0.0038013734909294605
t = 3.320943603694394
p = 0.0038013734909295377
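
The manual calculation above relies on both samples having the same size `N`. As a rough sketch of how the same pooled-variance test could be written for samples of different sizes, the helper below weights each variance by its degrees of freedom. The function name `pooled_t_test`, the seed, and the sample sizes are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Two-sided Student t-test with pooled variance (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    df = n_a + n_b - 2
    # Pooled variance: each sample variance is weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / df
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # sf is the survival function (1 - cdf); abs(t) makes the test two-sided
    p = 2 * stats.t.sf(np.abs(t), df=df)
    return t, p


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = rng.normal(2, 1, size=10)
b = rng.normal(0, 1, size=15)   # deliberately a different sample size

t_manual, p_manual = pooled_t_test(a, b)
t_scipy, p_scipy = stats.ttest_ind(a, b)  # equal_var=True is the default

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

When the two sizes are equal, the pooled variance reduces to `(var_a + var_b)/2` and the standard error to `s*np.sqrt(2/N)`, which is the form used in the cells above.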


### t-test vs KS test:
* https://towardsdatascience.com/when-to-use-the-kolmogorov-smirnov-test-dd0b2c8a8f61
* https://towardsdatascience.com/kolmogorov-smirnov-test-84c92fb4158d
* https://stackoverflow.com/questions/10884668/two-sample-kolmogorov-smirnov-test-in-python-scipy

- Compare the p-value to a significance level alpha, usually alpha = 0.05 or 0.01 (your choice; the lower alpha is, the stronger the evidence required to reject the null hypothesis).
- **If the p-value is lower than alpha, reject the null hypothesis that both samples come from the same distribution; otherwise the test gives no evidence that they differ.**
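
The decision rule in the two bullets above can be sketched as a tiny helper. The name `decide` and the returned strings are hypothetical, chosen only for this illustration; the first p-value is the rounded t-test result reported earlier, and the others are made-up inputs.

```python
def decide(p_value, alpha=0.05):
    """Apply the significance-level rule to a p-value (hypothetical helper)."""
    # A p-value below alpha means the data are unlikely under the null hypothesis
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.0038))            # rounded p-value from the t-test above
print(decide(0.30))              # made-up p-value, larger than alpha
print(decide(0.03, alpha=0.01))  # same rule with a stricter alpha
```

Note that "fail to reject" is not the same as showing the distributions are identical; it only means the sample does not provide enough evidence of a difference at the chosen alpha.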

In [3]:
from scipy.stats import ks_2samp
import numpy as np
np.random.seed(12345678)

x = np.random.normal(0, 1, 1000)
y = np.random.normal(0, 1, 1000)
z = np.random.normal(1.1, 0.9, 1000)

In [4]:
ks_2samp(x, y)

KstestResult(statistic=0.023, pvalue=0.9542189106778983)

In [5]:
ks_2samp(x, z)

KstestResult(statistic=0.418, pvalue=1.2040448267583641e-78)

### The x and z distributions differ (p ≈ 1.2e-78), while x and y cannot be told apart (p ≈ 0.95)
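
To make the `statistic` value less of a black box, here is a minimal sketch of the two-sample KS statistic computed by hand: it is the largest vertical gap between the two empirical CDFs. The helper name `ks_statistic` and the seed are assumptions for illustration, and the samples are drawn with a different generator than in the cells above, so the numbers will not match 0.418.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(x, y):
    """Largest gap between the empirical CDFs of x and y (hypothetical helper)."""
    x = np.sort(x)
    y = np.sort(y)
    # The gap can only change at observed values, so evaluate both ECDFs there
    grid = np.concatenate([x, y])
    ecdf_x = np.searchsorted(x, grid, side="right") / x.size
    ecdf_y = np.searchsorted(y, grid, side="right") / y.size
    return np.max(np.abs(ecdf_x - ecdf_y))


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

d_manual = ks_statistic(x, z)
d_scipy = ks_2samp(x, z).statistic

print("manual:", d_manual)
print("scipy: ", d_scipy)
```

Unlike the t-test, which only compares means, this statistic reacts to any difference in the shape or location of the two distributions.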