In [33]:
from scipy import stats
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

A t-test is used to decide whether the means of two samples differ significantly. The test can be one-tailed or two-tailed, depending on the question being asked. If you have two sets of student grades and ask "Is class one significantly higher than class two?" (or, alternatively, "significantly lower"), you are testing in one direction only, so the test is one-tailed. If you ask "Are the results of class one and class two significantly different?", the test is two-tailed because the question is open-ended: either class could be higher or lower.

It also matters whether the data is paired or unpaired. An example of paired data is the same person's height measured at different ages: the values are linked because they belong to the same person at different times in their life. An example of unpaired data is a comparison between two classes of students: as no student is in both classes, there is no natural way to match a value in one class with a value in the other, so the order of the values within each group does not matter.

The two tests I will use from the stats module imported above are stats.ttest_ind and stats.ttest_rel. ttest_ind is for independent (unpaired) samples and ttest_rel is for related (paired) samples. Both are two-tailed by default, so they suit open-ended questions; in SciPy 1.6 and later, both also accept an alternative argument for one-tailed tests. A t-test produces a p-value (probability value): the probability of seeing a difference at least as large as the one observed if there were really no difference between the groups. In science, the most common significance level is 0.05, so a p-value below 0.05 is treated as statistically significant; some fields ask for a p-value below 0.01 as a stricter standard. There is no strict minimum sample size, and the two samples do not have to be the same size. The t-test is designed for small samples, but it assumes the data are roughly normally distributed, and its ability to detect a real difference improves with more samples.
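As a rough sketch of the one-tailed versus two-tailed distinction, the cell below runs both versions on made-up grades. The class_one and class_two values are hypothetical and not part of this notebook's data, and the alternative keyword assumes SciPy 1.6 or later.

```python
from scipy import stats

# Hypothetical grades for two classes (illustrative values only).
class_one = [72, 85, 78, 90, 66, 81, 77, 88, 93, 70]
class_two = [65, 79, 71, 84, 60, 75, 73, 80, 86, 68]

# Two-tailed (the default): "Are the two class means different?"
two_tailed = stats.ttest_ind(class_one, class_two)

# One-tailed: "Is the mean of class one greater than the mean of class two?"
# The `alternative` keyword needs SciPy 1.6 or later.
one_tailed = stats.ttest_ind(class_one, class_two, alternative='greater')

print(two_tailed.pvalue)
print(one_tailed.pvalue)
```

Because the observed difference here lies in the hypothesised direction (class one has the higher mean), the one-tailed p-value is half the two-tailed one.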

I will run some t-tests on some data to get a sense of how they work. 

In [10]:
grasshoppers = pd.DataFrame({
    'Length': [15,17,24,22,19,22,16,20,24,17,15,23,21,17,19,24,20,19,16,15],
    'Sex': ['Male', 'Male','Male','Male','Male','Male','Male','Male','Male','Male', 
            'Female','Female','Female','Female','Female','Female','Female','Female','Female','Female',]
})

In [11]:
grasshoppers

Unnamed: 0,Length,Sex
0,15,Male
1,17,Male
2,24,Male
3,22,Male
4,19,Male
5,22,Male
6,16,Male
7,20,Male
8,24,Male
9,17,Male


Below I split the dataframe into the lengths of the male and female grasshoppers to see if there is a significant difference between them. As the output shows, the two means are very similar (19.6 and 18.9), but are they significantly different?

In [29]:
males = grasshoppers[(grasshoppers['Sex'] == 'Male')]
males = males['Length']
print(males)
print(males.mean())

0    15
1    17
2    24
3    22
4    19
5    22
6    16
7    20
8    24
9    17
Name: Length, dtype: int64
19.6


In [30]:
females = grasshoppers[(grasshoppers['Sex'] == 'Female')]
females = females['Length']
print(females)
print(females.mean())

10    15
11    23
12    21
13    17
14    19
15    24
16    20
17    19
18    16
19    15
Name: Length, dtype: int64
18.9


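Here I run the independent t-test, as the samples are unpaired. It gives a p-value of 0.635: if there were no real difference in mean length between the sexes, a difference at least as large as the one observed would turn up about 63.5% of the time. That is far above the 0.05 threshold, so the difference is not statistically significant. Note that the p-value is not the probability that the result is or is not significant.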

In [34]:
stats.ttest_ind(males, females)

Ttest_indResult(statistic=0.48266297757416876, pvalue=0.6351527881055652)
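To see where these numbers come from, here is a minimal sketch that recomputes the statistic and p-value by hand, assuming the equal-variance (pooled) form of the t-test that stats.ttest_ind uses by default. The names male_lengths and female_lengths are introduced only for this illustration; the values are copied from the dataframe above.

```python
import numpy as np
from scipy import stats

# Same lengths as in the grasshoppers dataframe.
male_lengths = np.array([15, 17, 24, 22, 19, 22, 16, 20, 24, 17])
female_lengths = np.array([15, 23, 21, 17, 19, 24, 20, 19, 16, 15])

n1, n2 = len(male_lengths), len(female_lengths)
dof = n1 + n2 - 2  # degrees of freedom

# Pooled sample variance (ddof=1 gives the unbiased sample variance).
pooled_var = ((n1 - 1) * male_lengths.var(ddof=1)
              + (n2 - 1) * female_lengths.var(ddof=1)) / dof

# Standard error of the difference between the two means.
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic: difference in means measured in standard errors.
t_manual = (male_lengths.mean() - female_lengths.mean()) / std_err

# Two-tailed p-value from the t distribution's survival function.
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

print(t_manual, p_manual)
```

The difference in means (0.7) is less than half a standard error, which is why the p-value is so large.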

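Here I have an example of paired data: it compares the heights of named individuals at two different ages, so the values are related and we use stats.ttest_rel. Note that, despite their names, the columns 'Age at 5' and 'Age at 15' hold each person's height at that age. As the means computed below show, the two sets of heights are very different, but is the difference significant?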

In [36]:
year_height_diff = pd.DataFrame({
    'Name': ['Elliot', 'Steve', 'Joss', 'Abbie', 'Wendy', 'Halle', 'Milo', 'Emma', 'Nigel', 'Peter'],
    'Age at 5': [65,58,54,69,75,53,49,73,56,67],
    'Age at 15': [150,135,178,147,137,155,173,129,144,168]
})
year_height_diff

Unnamed: 0,Name,Age at 5,Age at 15
0,Elliot,65,150
1,Steve,58,135
2,Joss,54,178
3,Abbie,69,147
4,Wendy,75,137
5,Halle,53,155
6,Milo,49,173
7,Emma,73,129
8,Nigel,56,144
9,Peter,67,168


In [37]:
print(year_height_diff['Age at 5'].mean())
print(year_height_diff['Age at 15'].mean())

61.9
151.6


Running the t-test gives a very small p-value (about 6.6e-07), far below the threshold of 0.05. Therefore the difference in height is statistically significant!

In [38]:
print(stats.ttest_rel(year_height_diff['Age at 5'], year_height_diff['Age at 15']))

Ttest_relResult(statistic=-12.209010653171209, pvalue=6.645772210450116e-07)
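A paired t-test is equivalent to a one-sample t-test on the per-person differences, which explains why the pairing matters. The sketch below checks this, with height_at_5 and height_at_15 as illustrative names holding the same values as the dataframe above.

```python
import numpy as np
from scipy import stats

# Same heights as in the year_height_diff dataframe.
height_at_5 = np.array([65, 58, 54, 69, 75, 53, 49, 73, 56, 67])
height_at_15 = np.array([150, 135, 178, 147, 137, 155, 173, 129, 144, 168])

# One difference per person; the pairing is what makes this possible.
differences = height_at_5 - height_at_15

# Paired t-test on the two columns.
paired = stats.ttest_rel(height_at_5, height_at_15)

# One-sample t-test asking whether the mean difference is zero.
one_sample = stats.ttest_1samp(differences, 0)

# The same statistic by hand: mean difference over its standard error.
t_by_hand = differences.mean() / (differences.std(ddof=1) / np.sqrt(len(differences)))

print(paired.statistic, one_sample.statistic, t_by_hand)
```

All three statistics agree, so the test is really asking whether the average change per person differs from zero.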


In conclusion, we have looked at the situations in which a t-test can be used, the choices involved (unpaired or paired, one- or two-tailed) and the significance threshold a result must meet to be called statistically significant. Rather than relying on a comparison of location estimates such as means alone, the t-test gives us a p-value that tells us how likely a difference this large would be if it were due to random chance alone.