# Student's t-Test

## Was there a statistically significant change?

Student's t-test compares means (of two samples, or of one sample against a known value) to see whether the difference is larger than random sampling variation would explain.

A population has a mean.
Different samples drawn from that population will have slightly different means.
So if we take a sample from a control group and a sample from a group that received some change, how do we know whether the means differ only because of sampling variation, or because the second mean is actually different (that is, the population mean shifted due to the change)?

This is what the t-test measures.
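To make the problem concrete, the sketch below draws several samples from a single population and prints their means. The population is an assumption chosen to match the example used later: normal, with mean 50 and standard deviation 5. None of the samples had any change applied, yet their means still differ.

```python
import numpy as np

# Assumed population: normal, mean 50, standard deviation 5
rng = np.random.default_rng(0)

# Five samples of 100 observations each, all from the SAME population
sample_means = [rng.normal(loc=50, scale=5, size=100).mean() for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.3f}")
```

Each mean lands close to 50 but not exactly on it. The t-test asks whether a new group's mean is further away than this kind of scatter would explain.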

### t-score
A t-score is the ratio of the difference *between* the group means to the variation *within* the groups, where the within-group variation is measured by the standard error of that difference.

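A score of 3 means that the difference between the group means is three times its standard error, i.e. three times the typical amount that difference would vary from sample to sample. This helps us see whether the second group is truly different, or just falling within the control group's normal variance.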
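The ratio can be written as a small function. This is a minimal sketch that assumes two independent samples and uses the unpooled standard error of the difference, which is the same form the Independent section below uses. `t_score` is a hypothetical helper name, not a library function. As a check, the result is compared with SciPy's Welch test (`equal_var=False`), which uses the same numerator and denominator.

```python
import numpy as np
from scipy import stats

def t_score(a, b):
    """Difference between the sample means divided by the standard error of that difference."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Squared standard error of each mean: sample variance (ddof=1) over sample size
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    return (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

# Made-up data for illustration
rng = np.random.default_rng(1)
control = rng.normal(50, 5, size=100)
treated = rng.normal(51, 5, size=100)

t = t_score(control, treated)
# Welch's t-test (equal_var=False) computes the same ratio
welch = stats.ttest_ind(control, treated, equal_var=False)
print(t, welch.statistic)
```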

### p-value

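The p-value is the probability of getting a t-score at least as extreme as the one observed if the null hypothesis were true, i.e. if the population means were actually equal and any difference were just normal variation within a group. A small p-value means the observed difference would be unusual under normal variation alone. It is not the probability that the results were due to chance.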
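This definition can be checked by simulation. The setup is assumed for illustration: two groups of 100 drawn from the same normal population, so the null hypothesis is true by construction, and an example t-score of 2.26 with 198 degrees of freedom. The sketch computes the two-sided p-value analytically. It then estimates the same probability as the fraction of simulated null t-scores that are at least that extreme.

```python
import numpy as np
from scipy import stats

t_observed = 2.26   # example t-score (assumed for illustration)
n = 100             # observations per group
df = 2 * n - 2

# Analytic two-sided p-value: P(|T| >= |t_observed|) when the null is true
p_analytic = 2 * stats.t.sf(abs(t_observed), df)

# Simulation: both groups come from the same population, so the null is true
rng = np.random.default_rng(42)
n_sims = 10_000
a = rng.normal(50, 5, size=(n_sims, n))
b = rng.normal(50, 5, size=(n_sims, n))
null_t = stats.ttest_ind(a, b, axis=1).statistic  # one t-score per simulated experiment

p_simulated = np.mean(np.abs(null_t) >= abs(t_observed))
print(p_analytic, p_simulated)
```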

### Types

* Independent t-test: compares the means of two separate groups
* Paired t-test: compares the means of the same group measured at two different times (or of matched pairs)
* One-sample t-test: compares the mean of a group to a known population mean
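In SciPy each type has its own function: `scipy.stats.ttest_ind`, `ttest_rel` and `ttest_1samp`. The data below are made up for illustration, and the known mean of 50 in the one-sample test is an assumption. The last comparison checks a useful identity: a paired t-test is a one-sample t-test on the per-pair differences against 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.normal(50, 5, size=30)          # hypothetical measurements
after = before + rng.normal(1, 2, size=30)   # same subjects after a change
other_group = rng.normal(51, 5, size=30)     # a separate, unrelated group

# Independent: two separate groups
ind = stats.ttest_ind(before, other_group)
# Paired: the same subjects measured twice
rel = stats.ttest_rel(before, after)
# One sample: a group's mean against a known value (50 assumed here)
one = stats.ttest_1samp(before, popmean=50)

# Paired t-test is the same as a one-sample t-test on the differences vs 0
diff = stats.ttest_1samp(before - after, popmean=0)
print(ind.pvalue, rel.pvalue, one.pvalue)
print(rel.statistic, diff.statistic)
```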

## Independent

```python
import numpy as np
import scipy.stats as ss
from scipy.stats import ttest_ind
```

```python
# Significance level
alpha = 0.05
```

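```python
np.random.seed(1)

# Two samples of 100 from normal distributions with standard deviation 5;
# the second has a slightly higher mean (51 vs 50)
sample1 = 5 * np.random.randn(100) + 50
sample2 = 5 * np.random.randn(100) + 51
```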

```python
# Sample means
mean1, mean2 = np.mean(sample1), np.mean(sample2)
print(mean1, mean2)

# Sample standard deviations
std1, std2 = np.std(sample1, ddof=1), np.std(sample2, ddof=1)
print(std1, std2)

# Sample standard errors
se1, se2 = ss.sem(sample1), ss.sem(sample2)
print(se1, se2)
```

```
50.30291426037849 51.763973888101
4.4480773365620605 4.6834501758393845
0.4448077336562061 0.4683450175839384
```


```python
# Standard error of the difference between the means
sed = (se1**2 + se2**2)**0.5
print(sed)
```

```
0.6459109655487124
```
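Why does this combine the two standard errors this way and still use `n1 + n2 - 2` degrees of freedom, which belongs to Student's classic pooled-variance test? With equal sample sizes, the pooled-variance standard error and `sqrt(se1**2 + se2**2)` are algebraically identical, so both give the same t. The sketch below checks this numerically. It uses made-up data and hypothetical helper names `pooled_se` and `unpooled_se`. When the sample sizes differ, the two values generally do not match.

```python
import numpy as np

def pooled_se(a, b):
    """Standard error of the difference in means under Student's equal-variance assumption."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    return np.sqrt(sp2 * (1 / n1 + 1 / n2))

def unpooled_se(a, b):
    """sqrt(se1**2 + se2**2), the form used in the cell above."""
    return np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))

rng = np.random.default_rng(3)
x, y = rng.normal(50, 5, 100), rng.normal(51, 7, 100)   # equal sizes
u, v = rng.normal(50, 5, 20), rng.normal(51, 7, 200)    # unequal sizes

print(pooled_se(x, y), unpooled_se(x, y))  # identical up to rounding
print(pooled_se(u, v), unpooled_se(u, v))  # generally different
```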


```python
# t statistic
t_stat = (mean1 - mean2) / sed

# Degrees of freedom
df = len(sample1) + len(sample2) - 2

print(t_stat, df)
```

```
-2.2620139704259556 198
```


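```python
# Critical value (one-tailed: all of alpha in the upper tail)
cv = ss.t.ppf(1 - alpha, df)
print(cv)

# Two-tailed p-value: probability of a |t| at least this large under the null
p = (1 - ss.t.cdf(abs(t_stat), df)) * 2
print(p)
```

```
1.6525857836172075
0.024782819014639745
```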
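The critical value `ss.t.ppf(1 - alpha, df)` puts all of alpha in one tail, while doubling the p-value makes it two-tailed. For a two-tailed test, the matching cutoff splits alpha across both tails. The sketch below computes both cutoffs for df = 198, reusing the t-score printed above. It also checks that the critical-value rule and the p-value rule agree once both use the same number of tails.

```python
from scipy import stats

alpha = 0.05
df = 198
t_stat = -2.2620139704259556   # t-score from the Independent example

cv_one_tailed = stats.t.ppf(1 - alpha, df)       # all of alpha in the upper tail
cv_two_tailed = stats.t.ppf(1 - alpha / 2, df)   # alpha/2 in each tail

# Two-sided p-value; t.sf(x) equals 1 - t.cdf(x) but is more accurate in the tail
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

reject_by_cv = abs(t_stat) > cv_two_tailed
reject_by_p = p_two_sided < alpha
print(cv_one_tailed, cv_two_tailed, p_two_sided, reject_by_cv, reject_by_p)
```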


### Rules / Results

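```python
# Compare |t| with the two-tailed critical value so this rule matches the
# two-tailed p-value (the cv printed above is the one-tailed cutoff)
cv_two_tailed = ss.t.ppf(1 - alpha / 2, df)
if abs(t_stat) <= cv_two_tailed:
    print('Fail to reject null, no significant difference in means')
else:
    print('Reject null, means are not equal')
```

```
Reject null, means are not equal
```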


```python
if p > alpha:
    print('Fail to reject null, no significant difference in means')
else:
    print('Reject null, means are not equal')
```

```
Reject null, means are not equal
```
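As a cross-check, `scipy.stats.ttest_ind` runs the same test in one call. The sketch regenerates the two samples with the same seed. By default `ttest_ind` uses `equal_var=True`, which is Student's pooled test. For these equal-sized samples it gives the same t as the manual calculation, and its p-value is two-sided.

```python
import numpy as np
import scipy.stats as ss

np.random.seed(1)
sample1 = 5 * np.random.randn(100) + 50
sample2 = 5 * np.random.randn(100) + 51

# Manual t-score, as in the cells above
se1, se2 = ss.sem(sample1), ss.sem(sample2)
t_manual = (np.mean(sample1) - np.mean(sample2)) / np.sqrt(se1**2 + se2**2)

# One-call version: Student's (pooled-variance) independent t-test, two-sided
result = ss.ttest_ind(sample1, sample2)
print(t_manual, result.statistic, result.pvalue)
```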


## Dependent

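A dependent (paired) t-test is used when each value in one sample is paired with a value in the other, such as the same subjects measured before and after a "change". For illustration, the cells below reuse `sample1` and `sample2` as if they were such pairs, even though they were generated independently.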
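Pairing matters because it removes the variation between subjects. The sketch below uses an assumed design: 50 subjects whose baseline levels vary a lot (standard deviation 10), each measured before and after a change that adds 2 units, with small measurement noise. The paired test works on the differences, where the baseline cancels out. The independent test ignores the pairing, so the large between-subject spread inflates its standard error and its p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_subjects = 50

baseline = rng.normal(50, 10, size=n_subjects)            # large between-subject spread
before = baseline + rng.normal(0, 1, size=n_subjects)     # small measurement noise
after = baseline + 2 + rng.normal(0, 1, size=n_subjects)  # same subjects, shifted by 2

p_paired = stats.ttest_rel(before, after).pvalue
p_independent = stats.ttest_ind(before, after).pvalue  # wrongly treats the groups as unrelated
print(p_paired, p_independent)
```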

```python
n = len(sample1)

# Standard deviation of the per-pair differences (computational shortcut formula)
sum_squared_differences = sum([(sample1[i] - sample2[i])**2 for i in range(n)])
sum_differences = sum([(sample1[i] - sample2[i]) for i in range(n)])
sd = ((sum_squared_differences - (sum_differences**2 / n)) / (n - 1))**0.5

# Standard error of the mean difference
sed = sd / n**0.5

print(sed)
```

```
0.6159867778587226
```
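The shortcut formula is the sample standard deviation (ddof=1) of the per-pair differences. The sketch below is a vectorized NumPy version. It regenerates the same samples and gives the same standard error.

```python
import numpy as np

np.random.seed(1)
sample1 = 5 * np.random.randn(100) + 50
sample2 = 5 * np.random.randn(100) + 51

d = sample1 - sample2          # one difference per pair
n = d.size

# Shortcut formula from the cell above
sd_shortcut = np.sqrt((np.sum(d**2) - np.sum(d)**2 / n) / (n - 1))
# Direct sample standard deviation of the differences
sd_direct = np.std(d, ddof=1)

sed = sd_direct / np.sqrt(n)   # standard error of the mean difference
print(sd_shortcut, sd_direct, sed)
```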


```python
# t statistic: the mean of the differences equals mean1 - mean2
t_stat = (mean1 - mean2) / sed

# Degrees of freedom: number of pairs minus 1
df = len(sample1) - 1

print(t_stat, df)
```

```
-2.3719009567078753 99
```


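```python
# Critical value (one-tailed: all of alpha in the upper tail)
cv = ss.t.ppf(1 - alpha, df)
print(cv)

# Two-tailed p-value: probability of a |t| at least this large under the null
p = (1 - ss.t.cdf(abs(t_stat), df)) * 2
print(p)
```

```
1.6603911559963895
0.019630798337125777
```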


### Rules / Results


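```python
# Compare |t| with the two-tailed critical value so this rule matches the
# two-tailed p-value (the cv printed above is the one-tailed cutoff)
cv_two_tailed = ss.t.ppf(1 - alpha / 2, df)
if abs(t_stat) <= cv_two_tailed:
    print('Fail to reject null, no significant difference in means')
else:
    print('Reject null, means are not equal')
```

```
Reject null, means are not equal
```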


```python
if p > alpha:
    print('Fail to reject null, no significant difference in means')
else:
    print('Reject null, means are not equal')
```

```
Reject null, means are not equal
```
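`scipy.stats.ttest_rel` runs the paired test in one call. With the same samples it should reproduce the manual t-score of about -2.372 on 99 degrees of freedom, along with the same two-sided p-value. These two samples were drawn independently, so treating them as pairs here only demonstrates the arithmetic.

```python
import numpy as np
import scipy.stats as ss

np.random.seed(1)
sample1 = 5 * np.random.randn(100) + 50
sample2 = 5 * np.random.randn(100) + 51

result = ss.ttest_rel(sample1, sample2)   # paired (dependent) t-test, two-sided
print(result.statistic, result.pvalue)
```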
