# 5. Experimentation Basics: $t$-tests

This notebook follows [the Wikipedia article on Student's $t$-test](https://en.wikipedia.org/wiki/Student%27s_t-test) closely, as it has one of the best explanations of the $t$-test. The goal is to simplify it further and tie it back to A/B tests.

## $t$-test Introduction

A $t$-test is any hypothesis test where the test statistic follows a Student's $t$-distribution under the null.

It is often used when the test statistic *would* follow a normal distribution if the population standard deviation were known. When it is not known, it must be estimated from the observed data, and under certain conditions the resulting test statistic follows a Student's $t$-distribution instead. The $Z$-test and the $t$-test therefore often yield similar results, and they converge as the sample size increases.

Questions such as "how would we ever know whether our data meets those conditions?" are answered below!

## Student's $t$-distribution

The $t$-distribution is a continuous probability distribution that resembles the normal distribution, except that it is typically lower at the peak and has heavier tails. The $t$-distribution, also written $t_\nu$, has a parameter $\nu$ that controls how heavy the tails are. For example, when $\nu=1$, $t_\nu$ is the standard *Cauchy distribution*, whose tails are so heavy that its mean is undefined despite it being symmetric and "bell"-shaped. As $\nu\rightarrow\infty$, it becomes the standard normal distribution ($\mathcal{N}(0,1)$). Therefore, the higher the $\nu$, the "thinner" the tails get. Because of this parameter, the $t$-distribution is referred to as a generalization of the standard normal distribution.

The $t$-distribution has the following PDF:

$$f(t)=\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi\nu}\Gamma(\frac{\nu}{2})}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

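where the $\nu$ parameter is the number of *degrees of freedom* and $\Gamma$ is the gamma function. For a positive integer $n$ it reduces to a factorial:

$$\Gamma(n)=(n-1)!$$

More generally, for any $z>0$ it is defined as $\Gamma(z)=\int_0^\infty x^{z-1}e^{-x}\,dx$. The general form is needed here because, for integer $\nu$, one of $\nu/2$ and $(\nu+1)/2$ is always a half-integer.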
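The density can be evaluated directly with the Python standard library. The sketch below is only illustrative: the function names `t_pdf`, `cauchy_pdf`, and `normal_pdf` are made up for this notebook, and `math.lgamma` is used instead of `math.gamma` so that large values of $\nu$ do not overflow. It compares $t_1$ with the standard Cauchy density and $t_{1000}$ with the standard normal density.

```python
import math

def t_pdf(t: float, nu: float) -> float:
    """Density of Student's t-distribution with nu degrees of freedom."""
    # Log of the normalising constant Gamma((nu+1)/2) / (sqrt(pi*nu) * Gamma(nu/2)).
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(math.pi * nu)
    )
    # Log of the kernel (1 + t^2/nu)^(-(nu+1)/2).
    log_kernel = -(nu + 1) / 2 * math.log1p(t * t / nu)
    return math.exp(log_norm + log_kernel)

def cauchy_pdf(t: float) -> float:
    """Density of the standard Cauchy distribution."""
    return 1 / (math.pi * (1 + t * t))

def normal_pdf(t: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

for t in (0.0, 1.0, 3.0):
    print(
        f"t={t}: t_1={t_pdf(t, 1):.5f} cauchy={cauchy_pdf(t):.5f} | "
        f"t_1000={t_pdf(t, 1000):.5f} normal={normal_pdf(t):.5f}"
    )
```

For small $\nu$ the density at $t=3$ is noticeably larger than the normal density at the same point, which is the "heavier tails" behaviour described above.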


## The uses of the $t$-test

$t$-tests are used for one-sample and two-sample tests, similar to the $Z$-test's use cases.

- A **one-sample** $t$-test: tests whether the mean of a population has a value specified in a null hypothesis
- A **two-sample** $t$-test: tests whether the means of two populations are equal. When the variances of the two populations are assumed to be equal, it is called a "Student's $t$-test"; when this assumption is dropped, it is called "Welch's $t$-test". These tests are often referred to as *unpaired* or *independent samples* $t$-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping, such as in a stereotypical A/B test.
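To make the Student versus Welch distinction concrete, here is a minimal sketch that computes both two-sample statistics by hand. The function names `student_t` and `welch_t` and the sample data are invented for illustration. Student's version uses the pooled variance with $n_a+n_b-2$ degrees of freedom, while Welch's version uses separate variances with the Welch-Satterthwaite approximation for the degrees of freedom.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance (Student) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance (assumes equal population variances).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2

def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Per-group squared standard errors; no equal-variance assumption.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical metric values for a control and a treatment group.
control = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3]
treatment = [10.9, 9.2, 11.8, 10.6, 8.9, 12.1, 10.2, 11.5]

print("Student:", student_t(treatment, control))
print("Welch:  ", welch_t(treatment, control))
```

In practice one would more likely call `scipy.stats.ttest_ind`, whose `equal_var` argument switches between the two variants; the hand-written version is only meant to show what differs.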


## Assumptions of the $t$-test

While the previous section covered when we use this test, some questions remain, such as "these sound similar to a $Z$-test, when should we use one over the other?" or "when would these assumptions ever be fulfilled?". This section should demystify some of these questions.

Speaking generally, most $t$-test statistics have the form $t=Z/s$, where $Z$ and $s$ are both functions of the data. $s$ acts as a scaling parameter that allows the distribution of $t$ to be determined.

As an example, in a one-sample $t$-test,

$$t=\frac{Z}{s}=\frac{\bar X-\mu}{\hat\sigma/\sqrt{n}}$$

This should look similar to the $Z$ statistic, except that instead of a known population standard deviation $\sigma$ we use $\hat\sigma$, an *estimate* of the population standard deviation. This is the differentiating assumption of the $t$-test: we do not assume that the population standard deviation is known.

The assumptions for the one-sample $t$-test are:
- $\bar X$ follows a normal distribution $\mathcal{N}(\mu, \sigma^2/n)$
- $(n-1)\hat\sigma^2/\sigma^2$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom. This assumption is met when the observations used for estimating $\hat\sigma^2$ are i.i.d. draws from a normal distribution.
- $Z$ and $s$ are independent.
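As a small illustration of the one-sample statistic $t=(\bar X-\mu)/(\hat\sigma/\sqrt{n})$, the sketch below computes it with the standard library. The function name `one_sample_t`, the sample values, and the null value `mu0` are all made up for the example; `statistics.stdev` uses the $n-1$ denominator, which matches the $n-1$ degrees of freedom above.

```python
import math
from statistics import mean, stdev

def one_sample_t(xs, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(xs)
    sigma_hat = stdev(xs)               # sample standard deviation (n - 1 denominator)
    s = sigma_hat / math.sqrt(n)        # estimated standard error of the mean
    z = mean(xs) - mu0                  # numerator: distance from the null value
    return z / s, n - 1

# Hypothetical observations and null-hypothesis mean.
sample = [5.1, 4.9, 5.6, 5.3, 4.8, 5.4, 5.2, 5.0]
t_stat, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic would then be compared against a $t_{n-1}$ distribution to obtain a p-value.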

For a two-sample $t$-test,
- The sample means of the two populations should both follow normal distributions. As discussed previously, this often holds in large samples due to the central limit theorem, even if the population being sampled is not normal.
- The two populations being compared are assumed to have the same variance if using the original Student's $t$-test, although there are variations of this test that relax that assumption (Welch's $t$-test).
- The data should either be sampled independently from the two populations being compared or be fully paired.

From the above, we can see that in a stereotypical A/B test the assumptions should generally hold. This is especially true of the normality assumption, since typical A/B tests have much larger sample sizes than the small-sample settings for which these methods were originally derived. As for the assumption that the sample variance follows a scaled $\chi^2$ distribution, *Slutsky's theorem* implies that in large samples the distribution of the sample variance has little effect on the distribution of the test statistic.
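One way to sanity-check this claim is a small simulation. The sketch below is an assumption-laden illustration, not a proof: it draws both groups from the same (clearly non-normal) exponential distribution, so the null hypothesis is true, and counts how often a Welch-style statistic exceeds 1.96, the two-sided 5% critical value of the normal distribution. The function names, the group size of 200, and the number of simulations are arbitrary choices.

```python
import math
import random

def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def welch_t_stat(a, b):
    """Two-sample t statistic with separate variance estimates."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def false_positive_rate(n_per_group=200, n_sims=1000, critical=1.96, seed=0):
    """Share of null simulations in which |t| exceeds the critical value."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(n_sims):
        # Both groups come from the same skewed distribution: no true effect.
        a = [rng.expovariate(1.0) for _ in range(n_per_group)]
        b = [rng.expovariate(1.0) for _ in range(n_per_group)]
        if abs(welch_t_stat(a, b)) > critical:
            rejections += 1
    return rejections / n_sims

rate = false_positive_rate()
print(f"Observed false positive rate: {rate:.3f} (nominal level 0.05)")
```

If the assumptions hold approximately, the observed rate should land near the nominal 5% even though the raw data is far from normal.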

## Paired vs Unpaired two-sample $t$-tests

A stereotypical A/B test is typically analyzed with an "unpaired $t$-test". *Paired* $t$-tests are special designs that typically have higher power than unpaired tests, because comparing each unit with itself removes between-unit noise. A paired test applies when the two datasets (control and treatment) have an obvious and meaningful one-to-one correspondence. For example, if the two datasets we are comparing are a "before" and an "after", then each person is compared to themselves. Care is needed, though, in choosing which test to use for a given paired design.
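The power gain from pairing can be seen on a toy "before" and "after" dataset. The numbers below are invented, and the helper names `paired_t` and `unpaired_t` are hypothetical. The paired statistic is a one-sample $t$ statistic on the per-person differences, while the unpaired statistic (Welch form here) treats the two columns as independent groups.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t statistic: one-sample t on the per-unit differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def unpaired_t(before, after):
    """Welch-style t statistic that ignores the pairing."""
    se = math.sqrt(variance(after) / len(after) + variance(before) / len(before))
    return (mean(after) - mean(before)) / se

# Hypothetical measurements for the same six people, before and after a change.
before = [72.0, 85.0, 64.0, 90.0, 78.0, 69.0]
after = [74.0, 86.5, 66.0, 91.0, 80.5, 70.0]

t_paired, df = paired_t(before, after)
print(f"paired:   t = {t_paired:.2f} (df = {df})")
print(f"unpaired: t = {unpaired_t(before, after):.2f}")
```

People differ a lot from each other in this toy data, but each person's change is small and consistent, so the paired statistic is much larger in magnitude than the unpaired one.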

## Summary of the above

To summarize the above, a typical A/B test will be an unpaired two-sample $t$-test or a two-sample $Z$-test. The choice between the two matters less and less as the sample size grows, and is often not a consideration in A/B tests where samples are in the thousands.
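The convergence of the two tests can be checked numerically. The following sketch, with hypothetical helper names, computes the two-sided p-value of a fixed statistic of 2.0 under $t_\nu$ for several values of $\nu$ by integrating the density with Simpson's rule, and compares it with the normal p-value from `statistics.NormalDist`. Numerical integration is used only to stay within the standard library.

```python
import math
from statistics import NormalDist

def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(math.pi * nu)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

def t_two_sided_p(t_stat, nu, steps=2000):
    """Two-sided p-value under t_nu via Simpson's rule (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    # total * h / 3 approximates the mass on [0, |t|]; double it by symmetry.
    central_mass = 2 * total * h / 3
    return 1 - central_mass

def z_two_sided_p(z_stat):
    """Two-sided p-value under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

stat = 2.0
for nu in (5, 30, 1000):
    print(f"nu={nu:>4}: p_t = {t_two_sided_p(stat, nu):.4f}")
print(f"normal:  p_z = {z_two_sided_p(stat):.4f}")
```

With only a handful of degrees of freedom the $t$-based p-value is visibly larger, while at sample sizes typical of A/B tests the two are nearly indistinguishable.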