# T-Tests and P-Values

Let's say we're running an A/B test. We'll fabricate some data that randomly assigns order amounts from customers in sets A and B, with A being a little bit higher:

In [1]:
import numpy as np
from scipy import stats

A = np.random.normal(26.0,5.0,10000)
B = np.random.normal(25.0,5.0,10000)

stats.ttest_ind(A,B)

Ttest_indResult(statistic=12.942377594452902, pvalue=3.693057978254679e-38)

The t-statistic is a measure of the difference between the two sets expressed in units of standard error. Put differently, it's the size of the difference relative to the variability in the data. A high absolute t value means there's probably a real difference between the two sets; you have "significance". The p-value is the probability of seeing a t-statistic at least this extreme if there were actually no difference between the two sets, so a low p-value also implies "significance". For a statistically significant result, you're looking for a low p-value together with a t-statistic that is high in absolute value. In the real world, statisticians seem to put more weight on the p-value result.
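To make "difference in units of standard error" concrete, here is a minimal sketch that computes the t-statistic and two-sided p-value by hand and compares them with `stats.ttest_ind`. It assumes equal variances in both groups (the default for `ttest_ind`); the helper name `pooled_t_test`, the lowercase sample names, and the fixed seed are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy import stats

# Fixed seed so the sketch is reproducible (an assumption; the notebook uses unseeded draws)
rng = np.random.default_rng(0)
a = rng.normal(26.0, 5.0, 10000)
b = rng.normal(25.0, 5.0, 10000)

def pooled_t_test(x, y):
    """Hypothetical helper: two-sample t-test assuming equal variances."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance from the two unbiased (ddof=1) sample variances
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof
    # Standard error of the difference between the two sample means
    std_err = np.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    t_stat = (x.mean() - y.mean()) / std_err
    # Two-sided p-value: probability of a |t| at least this large under "no difference"
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=dof)
    return t_stat, p_value

t_manual, p_manual = pooled_t_test(a, b)
t_scipy, p_scipy = stats.ttest_ind(a, b)

print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The two pairs of numbers should agree up to floating-point error, which shows that the p-value is just the t-statistic re-expressed as a tail probability.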

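Let's change things up so both A and B are just random, generated under the same parameters, so there's no "real" difference between the two. Redrawing only B isn't enough, because A still has the mean of 26.0 from the first cell:

In [3]:
B = np.random.normal(25.0,5.0,10000)
stats.ttest_ind(A,B)

Ttest_indResult(statistic=13.537842691344727, pvalue=1.4256659202072425e-41)

The result is still highly significant. Both samples have to be regenerated under the same parameters, here with ten times as many samples: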

In [3]:
A = np.random.normal(25.0, 5.0, 100000)
B = np.random.normal(25.0, 5.0, 100000)

stats.ttest_ind(A, B)

Ttest_indResult(statistic=0.7294273914972799, pvalue=0.4657411216867331)

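With both sets drawn from the same distribution, the t-statistic is small (about 0.73) and the p-value is large (about 0.47), so there's nothing here to suggest a real difference. Going up to a million samples per set doesn't change that conclusion: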

In [11]:
A = np.random.normal(25.0,5.0,1000000)
B = np.random.normal(25.0,5.0,1000000)

stats.ttest_ind(A,B)

Ttest_indResult(statistic=0.1640702822743969, pvalue=0.8696758326450706)

If we compare the same set to itself, by definition we get a t-statistic of 0 and p-value of 1:

In [5]:
stats.ttest_ind(A, A)

Ttest_indResult(statistic=0.0, pvalue=1.0)

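The threshold of significance on the p-value is really just a judgment call; 0.05 is a common convention, not a law. Because everything here is a matter of probabilities, you can never say with certainty that an experiment's results reflect a real difference. What the t-statistic and p-value give you is a measure of how surprising the data would be if there were no difference at all.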
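One way to see why the threshold is a judgment call is to run many "A/A" experiments, where both sets come from the same distribution, and count how often the p-value still falls below the threshold. The sketch below assumes a threshold of 0.05 and arbitrary experiment sizes; the names `alpha`, `n_experiments`, and `n_customers` are illustrative and not from the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, chosen only for reproducibility

alpha = 0.05          # assumed significance threshold
n_experiments = 2000  # number of simulated A/A tests
n_customers = 1000    # customers per set in each test

false_positives = 0
for _ in range(n_experiments):
    # Both sets share the same mean and standard deviation: no real difference
    a = rng.normal(25.0, 5.0, n_customers)
    b = rng.normal(25.0, 5.0, n_customers)
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_experiments
print(false_positive_rate)
```

The printed rate should land close to `alpha`: with a threshold of 0.05, roughly one in twenty experiments with no real difference will still look "significant". Lowering the threshold reduces these false alarms at the cost of making real differences harder to detect.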