# T-test: from scratch and with built-in functions

Imagine we are running an A/B test. Our data are drawn from normally distributed populations, and each sample has the same size, $N = 30$. Group A is normally distributed with mean 2 and variance 1, and group B follows a standard normal distribution.

Suppose that we want to test the following hypothesis:

$$ H_0 : \mu_A = \mu_B $$

$$ H_1 : \mu_A \neq \mu_B $$

where $\mu_i$ denotes the population mean of group $i$. If we set a significance level of $\alpha = 0.05$, should we reject the null hypothesis?

To understand what is going on behind the scenes, we will compute the two-sided t-statistic and p-value both from scratch and with SciPy's built-in functions, for two groups whose observations are i.i.d.

In [1]:
import numpy as np
from scipy import stats

We are going to generate random data representing normally distributed random variables with mean 2 in group A and mean 0 in group B. Both samples are drawn with variance 1.

In [2]:
N = 30
a = np.random.randn(N) + 2
b = np.random.randn(N)

In [3]:
# ddof = 1 gives the unbiased sample variance (divides by N - 1)
var_a = a.var(ddof=1)
var_b = b.var(ddof=1)

# pooled standard deviation (a plain average of the variances works
# because both samples have the same size N)
s = np.sqrt((var_a + var_b) / 2)

Next, we can compute the t-statistic. The pooled form applies because the population variances are unknown but assumed equal, and the populations are normally distributed:

In [4]:
t = (a.mean() - b.mean()) / (s * np.sqrt(2 / N))
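
Written out, the quantity computed in the cells above is the pooled two-sample t-statistic for equal group sizes. The symbols $\bar{x}_A$, $\bar{x}_B$ (sample means) and $s_A^2$, $s_B^2$ (sample variances computed with `ddof = 1`) are notation introduced here for illustration only:

$$ s = \sqrt{\frac{s_A^2 + s_B^2}{2}}, \qquad t = \frac{\bar{x}_A - \bar{x}_B}{s \sqrt{2/N}} $$

Under $H_0$ and the assumptions stated earlier, $t$ follows a Student's t-distribution with $2N - 2$ degrees of freedom.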

Now we calculate the degrees of freedom, which are $2N - 2$ for two samples of size $N$:

In [5]:
df = 2 * N - 2

And finally we can compute the p-value:

In [6]:
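# upper-tail probability of the t-distribution with 2*N - 2 degrees of freedom;
# abs(t) keeps the calculation correct when t is negative
p = 1 - stats.t.cdf(abs(t), df=df)

print("t-statistic: {}".format(t))

# double the tail probability for the two-sided test
print("p-value: {}".format(2 * p))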

t-statistic: 6.1482039131818755
p-value: 7.754518538405364e-08
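
The steps above can be collected into a single helper. The following is a minimal sketch, not part of the original notebook: the name `pooled_t_test` and the seed are hypothetical choices made for illustration. It also swaps in two small variations, both stated plainly here: the pooled variance is weighted by each sample's degrees of freedom, so the function also covers unequal sample sizes, and the tail probability uses the survival function `stats.t.sf` rather than `1 - cdf`, which avoids losing precision when the p-value is tiny.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Two-sided, equal-variance t-test written out by hand (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.size, y.size
    df = n_x + n_y - 2
    # pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n_x - 1) * x.var(ddof=1) + (n_y - 1) * y.var(ddof=1)) / df
    t_stat = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    # sf(t) = 1 - cdf(t); abs() makes the result correct for negative t as well
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value


rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
x = rng.normal(loc=2, scale=1, size=30)
y = rng.normal(loc=0, scale=1, size=30)

t_stat, p_value = pooled_t_test(x, y)
print("t-statistic: {}".format(t_stat))
print("p-value: {}".format(p_value))
```

When both samples have the same size, the weighted pooled variance reduces to the plain average of the two sample variances used earlier.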


Now we will compute the same quantities with SciPy's built-in function `stats.ttest_ind`:

In [7]:
t2, p2 = stats.ttest_ind(a, b)

print("t-statistic: {}".format(t2))

print("p-value: {}".format(p2))

t-statistic: 6.148203913181875
p-value: 7.754518546132632e-08
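
By default, `stats.ttest_ind` assumes equal population variances (`equal_var=True`), which matches the pooled calculation above. As a hedged illustration of what changes when that assumption is dropped, the sketch below runs both the pooled test and Welch's test (`equal_var=False`) on freshly generated data. The variable names and the seed are assumptions made for this example and do not come from the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed for a reproducible illustration
a_demo = rng.normal(loc=2, scale=1, size=30)
b_demo = rng.normal(loc=0, scale=1, size=30)

# Student's t-test: pooled variance, the default behaviour of ttest_ind
t_student, p_student = stats.ttest_ind(a_demo, b_demo, equal_var=True)

# Welch's t-test: does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(a_demo, b_demo, equal_var=False)

print("Student: t = {:.4f}, p = {:.3e}".format(t_student, p_student))
print("Welch:   t = {:.4f}, p = {:.3e}".format(t_welch, p_welch))
```

With equal sample sizes the two t-statistics coincide. Only the degrees of freedom, and therefore the p-values, differ slightly.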


Since $p \ll 0.05$, a difference in sample means this large would be very unlikely if $H_0$ were true. On this basis, we reject the null hypothesis $H_0$ at the $\alpha = 0.05$ significance level. The from-scratch and built-in results agree up to floating-point error.
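
An equivalent way to reach the decision is to compare the observed t-statistic with the two-sided critical value of the t-distribution. The snippet below is a small sketch of that check. It reuses the t-statistic printed earlier as a literal, and the variable names `t_observed`, `t_crit` and `reject_h0` are hypothetical.

```python
from scipy import stats

alpha = 0.05
df = 2 * 30 - 2                  # degrees of freedom for N = 30 per group
t_observed = 6.1482039131818755  # t-statistic reported in the output above

# two-sided critical value: the upper alpha/2 quantile of the t-distribution
t_crit = stats.t.ppf(1 - alpha / 2, df)

# reject H0 when the observed statistic lies beyond the critical value
reject_h0 = abs(t_observed) > t_crit
print("critical value: {:.4f}".format(t_crit))
print("reject H0: {}".format(reject_h0))
```

Comparing $|t|$ with the critical value and comparing the p-value with $\alpha$ always lead to the same decision.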