# Notebook for Exploring Statistical Tests

In [20]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ks_2samp, chisquare

## Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov (KS) test is a _nonparametric_ test of the equality of continuous (or discontinuous) one-dimensional probability distributions. The two-sample version used here tests whether two samples were drawn from the same underlying distribution.

- Null Hypothesis: the two samples are drawn from the same distribution
- Alternative Hypothesis: the two samples are drawn from different distributions
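To make the test statistic concrete, here is a minimal sketch of how the two-sample KS statistic can be computed by hand: it is the largest absolute gap between the two empirical CDFs. The helper `ks_statistic`, the variable names, and the seed are illustrative assumptions, not part of the original notebook; `ks_2samp` is used only as a cross-check.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs (hypothetical helper)."""
    a = np.sort(sample_a)
    b = np.sort(sample_b)
    pooled = np.concatenate([a, b])
    # Empirical CDF at x = fraction of sample points that are <= x
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


rng = np.random.default_rng(0)  # illustrative seed
x = rng.normal(size=100)                       # standard normal sample
y = rng.lognormal(mean=3, sigma=1, size=100)   # log-normal sample

manual = ks_statistic(x, y)
reference = ks_2samp(x, y).statistic
print(manual, reference)  # the two values should agree
```

A statistic near 0 means the empirical CDFs nearly overlap; a statistic near 1 means the samples barely share any range.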

In [18]:
np.random.seed(0)

# generate three datasets: two standard normal, one log-normal
data1 = np.random.randn(100) # standard normal distribution
data2 = np.random.lognormal(3, 1, 100) # log-normal distribution
data3 = np.random.randn(100) # standard normal distribution

# perform Kolmogorov-Smirnov test
ks_2samp(data1, data2)

KstestResult(statistic=0.99, pvalue=4.417521386399011e-57)

Since the **p_value** here is far below 0.05 (a conventional but subjective significance level), we reject the null hypothesis: there is sufficient evidence that `data1` and `data2` do NOT come from the same distribution, as expected.
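The cell above also draws `data3` from a standard normal distribution but never uses it. As a hedged sketch of the opposite case, the snippet below compares `data1` with `data3`; the threshold name `alpha` and the printed messages are assumptions added for illustration. Because both samples come from the same distribution, a large p-value is the likely outcome, although any single random draw can still fall below the threshold.

```python
import numpy as np
from scipy.stats import ks_2samp

np.random.seed(0)

# Recreate the datasets in the same order so the random stream matches the cell above
data1 = np.random.randn(100)              # standard normal distribution
data2 = np.random.lognormal(3, 1, 100)    # log-normal distribution (unused here)
data3 = np.random.randn(100)              # standard normal distribution

result_same = ks_2samp(data1, data3)
print(result_same.statistic, result_same.pvalue)

alpha = 0.05  # significance level (assumed, matches the 0.05 used in the text)
if result_same.pvalue < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```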

## Chi-Square Test

The Chi-Square goodness-of-fit test can be used to decide whether the observed frequencies of a categorical variable differ from the frequencies expected under a given hypothesis.

- Null Hypothesis: the observed frequencies follow the expected frequencies
- Alternative Hypothesis: the observed frequencies do NOT follow the expected frequencies

In [13]:
f_observed = [16, 18, 16, 14, 12, 12]
f_expected = [16, 16, 16, 16, 16, 8]

chisquare(f_observed, f_exp=f_expected)

Power_divergenceResult(statistic=3.5, pvalue=0.6233876277495822)

Since the **p_value** here is greater than 0.05 (a conventional but subjective significance level), we fail to reject the null hypothesis: there is not sufficient evidence that the observed frequencies differ from the expected frequencies.
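The statistic and p-value reported by `chisquare` can be reproduced by hand, which may help clarify what the test measures. The sketch below sums the squared deviations scaled by the expected counts and looks up the upper tail of a chi-square distribution with `k - 1` degrees of freedom, where `k` is the number of categories. The names `stat`, `dof`, and `p_value` are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chisquare

f_observed = np.array([16, 18, 16, 14, 12, 12])
f_expected = np.array([16, 16, 16, 16, 16, 8])

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected
stat = np.sum((f_observed - f_expected) ** 2 / f_expected)

# Degrees of freedom for a goodness-of-fit test with fully specified expectations
dof = f_observed.size - 1

# p-value: probability of a statistic at least this large under the null hypothesis
p_value = chi2.sf(stat, dof)

reference = chisquare(f_observed, f_exp=f_expected)
print(stat, p_value)
print(reference.statistic, reference.pvalue)
```

Note that `chisquare` expects the observed and expected counts to have the same total, which holds here (both sum to 88).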