### Two-sided t-tests

The independent two-sample t-test compares the means of two independent samples of scores.

This is a two-sided test of the null hypothesis that two independent samples have identical expected values (population means). Student's version of the test assumes that the two populations have identical variances, which is also the default in `scipy.stats.ttest_ind` (`equal_var=True`). More details are [here](https://en.wikipedia.org/wiki/Student%27s_t-test#Independent_two-sample_t-test).
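
As a check on what `stats.ttest_ind` computes under `equal_var=True`, here is a minimal sketch of Student's pooled-variance statistic $t = (\bar{x}_1 - \bar{x}_2) / \big(s_p\sqrt{1/n_1 + 1/n_2}\big)$ with $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $n_1+n_2-2$ degrees of freedom. The helper name `pooled_t_test` and the simulated data (means 100 and 103) are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Student's two-sample t-test assuming equal population variances.

    Hypothetical helper; returns (t statistic, two-sided p-value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # Pooled estimate of the common variance, weighted by each sample's degrees of freedom
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t = (x.mean() - y.mean()) / se
    df = n1 + n2 - 2
    # Two-sided p-value: probability that |T| >= |t| under H0
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=10, size=50)
y = rng.normal(loc=103, scale=10, size=60)

t_manual, p_manual = pooled_t_test(x, y)
res = stats.ttest_ind(x, y)  # equal_var=True is scipy's default
print(t_manual, res.statistic)
print(p_manual, res.pvalue)
```

The manual values should agree with scipy's up to floating-point rounding.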

For samples with unequal variances, use [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test), which `scipy.stats.ttest_ind` runs when `equal_var=False`.
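
A minimal sketch of Welch's statistic, assuming the usual textbook form $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ with Welch-Satterthwaite degrees of freedom, checked against `stats.ttest_ind(..., equal_var=False)`. The helper `welch_t_test` and the simulated samples (different spreads and sizes) are hypothetical illustrations.

```python
import numpy as np
from scipy import stats


def welch_t_test(x, y):
    """Welch's two-sample t-test (no equal-variance assumption).

    Hypothetical helper; returns (t, Welch-Satterthwaite df, two-sided p-value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # Squared standard error of each sample mean, using each sample's own variance
    v1 = x.var(ddof=1) / n1
    v2 = y.var(ddof=1) / n2
    t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; df is generally not an integer
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


rng = np.random.default_rng(1)
x = rng.normal(loc=100, scale=5, size=40)    # small variance, small sample
y = rng.normal(loc=102, scale=20, size=90)   # large variance, larger sample

t_w, df_w, p_w = welch_t_test(x, y)
res = stats.ttest_ind(x, y, equal_var=False)
print(t_w, res.statistic)
print(p_w, res.pvalue, df_w)
```

Unlike the pooled test, each sample's variance enters separately, so a noisy sample cannot inflate or deflate the standard error attributed to the other.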

In [4]:
from scipy import stats
import numpy as np

# Fix NumPy's global seed so that the stats.*.rvs draws below are reproducible
np.random.seed(46)

#### Implementation with the scipy library

In [20]:
# The t-test assumes both samples come from normal distributions
s1 = stats.norm.rvs(loc=100, scale=10, size=800)
s2 = stats.norm.rvs(loc=100, scale=10, size=800)

stats.ttest_ind(s1, s2)

Ttest_indResult(statistic=-0.08135664525905334, pvalue=0.9351685148627957)

In [21]:
# Cases where the variances differ (s3), or where both the variances and the sample sizes differ (s4)
s3 = stats.norm.rvs(loc=100, scale=15, size=800)
s4 = stats.norm.rvs(loc=100, scale=15, size=900)

# equal_var=False runs Welch's t-test
stats.ttest_ind(s1, s3, equal_var=False)

Ttest_indResult(statistic=0.18577519012602514, pvalue=0.8526477285619974)

In [22]:
stats.ttest_ind(s1, s4, equal_var=False)

Ttest_indResult(statistic=0.7729981657989475, pvalue=0.43963779971856365)

### Levene test

Levene's test checks the null hypothesis that $k$ independent samples come from populations with equal variances. It is also called a test of homogeneity of variance.

$H_0:$ $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$ <br>
$H_a:$ $\sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$

Test statistic: <br>
Let $N$ be the total number of observations across the $k$ subgroups, and $N_i$ the number of observations in subgroup $i$.

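$$W = \frac{N-k}{k-1} \cdot \frac{\sum_{i=1}^k N_i(\bar{Z}_{i\cdot}-\bar{Z}_{\cdot\cdot})^2}{\sum_{i=1}^k \sum_{j=1}^{N_i}(Z_{ij}-\bar{Z}_{i\cdot})^2}$$

where $Z_{ij} = |Y_{ij} - \bar{Y}_{i\cdot}|$ is the absolute deviation of observation $j$ in subgroup $i$ from that subgroup's mean $\bar{Y}_{i\cdot}$, $\bar{Z}_{i\cdot}$ is the mean of the $Z_{ij}$ in subgroup $i$, and $\bar{Z}_{\cdot\cdot}$ is the mean of all $Z_{ij}$. $H_0$ is rejected when $W$ exceeds the upper critical value of the $F(k-1, N-k)$ distribution.<br>
Note that `scipy.stats.levene` uses `center='median'` by default (the Brown-Forsythe variant, with $\bar{Y}_{i\cdot}$ replaced by the subgroup median); pass `center='mean'` to get the statistic defined above.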
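
To make the formula concrete, here is a minimal sketch that computes $W$ term by term and compares it with `stats.levene(..., center="mean")`, which (unlike scipy's default `center="median"`) centers on the subgroup means as in the formula. The helper `levene_w` and the three simulated groups are hypothetical illustrations.

```python
import numpy as np
from scipy import stats


def levene_w(*groups):
    """Levene's W with deviations from each group's mean (hypothetical helper)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([len(g) for g in groups])
    N = n_i.sum()
    # Z_ij = |Y_ij - mean of subgroup i|
    z = [np.abs(g - g.mean()) for g in groups]
    z_i = np.array([zi.mean() for zi in z])   # subgroup means of Z
    z_all = np.concatenate(z).mean()          # grand mean of all Z_ij
    between = np.sum(n_i * (z_i - z_all) ** 2)
    within = sum(np.sum((zi - zi.mean()) ** 2) for zi in z)
    return (N - k) / (k - 1) * between / within


rng = np.random.default_rng(1)
g1 = rng.normal(0, 10, size=100)
g2 = rng.normal(10, 10, size=100)
g3 = rng.normal(0, 30, size=100)

w = levene_w(g1, g2, g3)
# Under H0, W is approximately F(k - 1, N - k) distributed; here k = 3, N = 300
p_w = stats.f.sf(w, 3 - 1, 300 - 3)
res = stats.levene(g1, g2, g3, center="mean")
print(w, res.statistic)
print(p_w, res.pvalue)
```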

$W$ is essentially the ratio of between-group to within-group variability of the $Z_{ij}$; in fact, it equals the one-way ANOVA $F$ statistic computed on the $Z_{ij}$.<br>
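
The ANOVA interpretation can be checked directly: running `stats.f_oneway` on the absolute deviations should reproduce `stats.levene`. This sketch assumes deviations from the mean match `center="mean"` and deviations from the median match scipy's default `center="median"`; the simulated groups are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Three groups with different sizes; the third has a larger spread
groups = [rng.normal(0, s, size=n) for s, n in [(10, 80), (10, 120), (25, 100)]]

# Absolute deviations from each group's mean and from each group's median
z_mean = [np.abs(g - g.mean()) for g in groups]
z_median = [np.abs(g - np.median(g)) for g in groups]

anova_mean = stats.f_oneway(*z_mean)
anova_median = stats.f_oneway(*z_median)
lev_mean = stats.levene(*groups, center="mean")
lev_median = stats.levene(*groups)  # scipy's default is center="median"

print(anova_mean.statistic, lev_mean.statistic)
print(anova_median.statistic, lev_median.statistic)
```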
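If the data can safely be assumed to be normally distributed, the [Bartlett test](https://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm) is recommended instead; Levene's test is preferred when that assumption is doubtful because it is less sensitive to departures from normality.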
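
For comparison, here is a minimal sketch running `stats.bartlett` alongside `stats.levene` on simulated normal data that mirrors the setup used below (two groups with equal variance, one with nine times the variance). The samples are illustrative assumptions; on normal data like this both tests are expected to flag the unequal-variance pair.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0, 10, size=100)
y = rng.normal(10, 10, size=100)   # same variance as x, different mean
z = rng.normal(0, 30, size=100)    # 9x the variance of x and y

for name, (u, v) in {"x vs y": (x, y), "y vs z": (y, z)}.items():
    bart = stats.bartlett(u, v)
    lev = stats.levene(u, v)  # center="median" by default
    print(f"{name}: Bartlett p={bart.pvalue:.3g}, Levene p={lev.pvalue:.3g}")
```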

In [8]:
# Create two samples a and b from populations with different means but the same variance (sd = 10),
# and a third sample c from a population with a larger variance (sd = 30)
a = stats.norm.rvs(loc=0, scale=10, size=100)
b = stats.norm.rvs(loc=10, scale=10, size=100)
c = stats.norm.rvs(loc=0, scale=30, size=100)

stat, p = stats.levene(a, b)
print(p)

0.3893547376663349


In [9]:
stat, p = stats.levene(b, c)
print(p)

7.027152470153796e-14


In [10]:
[np.var(x, ddof=1) for x in [a, b, c]]

[110.17941157100952, 113.87984456169812, 1071.0150891672524]