[Back to Table of Contents](https://www.shannonmburns.com/Psyc158/intro.html)

[Previous: Chapter 19 - Model Bias](https://colab.research.google.com/github/smburns47/Psyc158/blob/main/chapter-19.ipynb)

In [45]:
# Run this first so it's ready by the time you need it
install.packages("dplyr")
install.packages("ggformula")
library(dplyr)
library(ggformula)
trackscores <- read.csv("https://raw.githubusercontent.com/smburns47/Psyc158/main/trackscores.csv")


# Chapter 20 - Traditional Statistical Tools

## 20.1 Different teaching philosophies

In this course you have spent a long time learning about how to make predictions and do inference with the general linear model. However, most statistics students don't learn this framework for doing analysis. Instead, they learn an assortment of different statistical procedures like the t-test, ANOVA, chi-square, etc. Indeed many practicing researchers also use these tools instead of the GLM.

So why learn the GLM at all if fewer people use it? There are two pedagogical reasons why this course emphasized this method instead of the traditional content:

1) **Traditional tools require learning how to do hypothesis testing first** - Traditional methods fundamentally depend on null hypothesis significance testing. They start with defining a null hypothesis and cannot be interpreted without computing a t or F score and finding a p-value. This means that to learn these tools, students first have to learn about sampling distributions and significance testing. That is a rather abstract place to start, and it impedes understanding for many people, who then fall back on intellectual shortcuts like "always look for p < 0.05" despite the dangers of overrelying on that number.

2) **Traditional tools are more variable, harder to remember** - When first starting out, the list of all the common statistical tools often seems daunting. They are calculated in different ways, go by different names, and it's hard to remember what goes where. In contrast, the GLM is ultimately one tool - $Y_i = b_0 + b_1X_i + e_i$. How many predictors you add to the equation and how you interpret the coefficients varies depending on your use situation, but just knowing about this one equation gets you most of the way there, making it a better starting point for building conceptual understanding.

Despite the advantages of the GLM however, many people still use the traditional tools. Thus, it is good for you to be able to identify them and understand how they map onto what you have learned with the GLM. Ultimately, the GLM and these tools will give you the same answer and you can choose which you prefer in your own research. Knowing both will enable you to read and understand research reports no matter what method the authors used. You may even find that most of these tools build on concepts you've already learned. They're just a different way of showing that information. 

## 20.2 T-tests 

### One-sample t-test

Previously in chapter 11, we were introduced to the idea of a t-test. To explain it in more depth, let's start with the **one-sample t-test**. 

Let's say you are a track coach and you're assessing the progress of your runners' training. They all recorded their best times for running the 400m when they initially joined the team. Now after a month of training with track drills, you want to know if they, as a group, have improved. 

Here is some data from your team. We will filter it down to just those scores at tryouts and 1 month later at the first meet, then calculate their change in running times. Negative scores mean they got faster (less time to complete the 400m), positive scores mean they ran slower.

In [50]:
#take a look at how this dataset was organized
head(trackscores)

tryout_scores <- filter(trackscores, session=="tryouts", condition=="track-training")
meet1_scores <- filter(trackscores, session=="meet1", condition=="track-training")

change_scores <- meet1_scores$time - tryout_scores$time

| | athlete | session | condition | weight_training | track_training | time |
|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 1 | tryouts | no-training | 0 | 0 | 48.50 |
| 2 | 2 | tryouts | no-training | 0 | 0 | 48.87 |
| 3 | 3 | tryouts | no-training | 0 | 0 | 49.28 |
| 4 | 4 | tryouts | no-training | 0 | 0 | 46.97 |
| 5 | 5 | tryouts | no-training | 0 | 0 | 48.18 |
| 6 | 6 | tryouts | no-training | 0 | 0 | 50.36 |


If our question is restricted to just our team of runners and no one else, we can figure out their improvement really easily. We simply find the mean of their change scores:

In [51]:
mean(change_scores)

Great! This number means that, on average, our team ran faster than before after training for a month. If we only care about evaluating our current team, we can stop here and don't need to use any more statistics. 

But let's say our question goes beyond just our team. If we were to recruit new runners the following year, would they also improve their scores with this training regimen? Is it good enough to use for everyone?

Now we are hypothesizing about data we don't have. At this point we need inferential statistics, and a one-sample t-test can help.

A one-sample t-test answers a particular kind of question: is the mean of some data $\bar{X}$ likely or unlikely to be from a population with a specific $\mu$? In other words, if we think the population $\mu$ is a particular value and our data sample is drawn from that population, is it reasonable or surprising to find that our sample's mean is the value we measured? 

For how this translates to our specific example: is our data sample likely to happen even in a population of runners who on average didn't improve? 

In order to do a one-sample t-test to answer this question, we first need to set down our idea about what $\mu$ should be. This is specifying the null hypothesis. For our particular example, the data we are evaluating are improvements in running time. We want to know if these data came from a population of scores where there's no improvement. In such a population some people might run a bit faster than their initial time, some a bit slower, but on average there's no change. Thus, to set our null hypothesis: 

$$H_0: \mu = 0$$

When we calculated our sample mean earlier, it was about -0.32, not 0. Does that mean we can conclude these runners did not come from a population of no improvement? 

Not necessarily. Remember, due to random sampling, it is frequently possible to get a sample mean that is not the same value as the population mean:

In [52]:
set.seed(10)
sim_sample <- rnorm(10, mean=0, sd=1) #random sample of 10 scores from population with mean 0
mean(sim_sample)

Thus we need a way of refining our question: given a population $\mu$ *and* an expected variation among samples from that population, is our sample really surprising or not?

Enter the t-value. It is a way of scaling the difference between a sample mean and the hypothesized population mean by the standard error of the mean (SEM). For a one-sample t-test, this is calculated as: 

$$t = \frac{\bar{X} - \mu}{SEM}$$

If you recall from chapter 15, SEM in turn is calculated as:

$$SEM = \frac{\hat{\sigma}}{\sqrt{N}}$$

All this together means that a t-value will be larger when there's a bigger difference between our sample mean and the hypothesized population mean, or when the standard error is smaller. 
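
As a minimal sketch of these two formulas, the snippet below computes a one-sample t-value by hand and checks it against R's `t.test()`. The `demo_scores` vector is simulated purely for illustration (it is not the `trackscores` data), and all variable names are placeholders.

```r
# Simulated change scores, for illustration only (not the trackscores data)
set.seed(1)
demo_scores <- rnorm(10, mean = -0.3, sd = 0.6)

mu0 <- 0                                            # hypothesized population mean
sem <- sd(demo_scores) / sqrt(length(demo_scores))  # SEM = sigma-hat / sqrt(N)
t_by_hand <- (mean(demo_scores) - mu0) / sem        # t = (sample mean - mu) / SEM

t_by_hand
unname(t.test(demo_scores, mu = mu0)$statistic)     # should agree with t_by_hand
```

Shrinking `sem` (for example by simulating more scores) or moving the sample mean further from `mu0` should push `t_by_hand` further from 0.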

The type of t-score we can get falls somewhere in the t distribution that we first encountered in chapter 16:

<img src="images/ch16-tdist.png" width="600">

You'll notice that the exact shape of the t distribution changes based on the degrees of freedom for our sample. That's because the SEM in the t formula is built from the sample's estimated standard deviation $\hat{\sigma}$, and with a small N that estimate is itself noisy. A noisy denominator makes large t-values easier to get by chance even when a sample really is from a population with mean $\mu$, so the t distribution has heavier tails at low degrees of freedom and approaches the normal distribution as N grows. The cumulative probability beyond our t-value (in both tails, for a two-sided test) is the corresponding p-value, so the same t-score has a larger p-value in a small sample than in a large one: more of the distribution lies further out than that value.
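
To make the link between degrees of freedom and p-values concrete, here is a small sketch that holds one t-score fixed and looks up its two-sided p-value under several t distributions. The t-score of -2 and the list of degrees of freedom are arbitrary choices for illustration.

```r
t_value <- -2                # an arbitrary example t-score
dfs <- c(4, 9, 29, 99)       # degrees of freedom (N - 1) to compare

# Two-sided p-value: probability of a t at least this extreme in either tail
p_values <- 2 * pt(-abs(t_value), df = dfs)

data.frame(df = dfs, p_value = round(p_values, 4))
```

The p-values shrink as the degrees of freedom grow, which is the pattern described above.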

Altogether, this means that to answer our question about whether our data likely come from a population of no improvement, we need to calculate a t-score and its associated p-value. The larger this t-score is in absolute value, the more surprising it would be to draw a sample with our mean given a true population $\mu = 0$. If the associated p-value is <0.05, we would decide it's so surprising that our sample probably came from a different sort of population instead - one with some alternative $\mu$ that suggests improvement due to our training regimen. 

R provides us with a built-in function to quickly compute a one sample t-test:

In [54]:
t.test(change_scores)


	One Sample t-test

data:  change_scores
t = -1.7664, df = 9, p-value = 0.1111
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.72751897  0.08951897
sample estimates:
mean of x 
   -0.319 


The output of this function gives us information about the calculated t-score (-1.7664), the degrees of freedom of the test (9, which is N-1), and the p-value (0.1111). It also gives us a 95% CI for our estimate of the mean, [-0.73, 0.09]. Note that a t-value can be positive or negative, signifying if our sample mean is higher or lower than the hypothesized population mean.
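
As a rough check on how these numbers fit together, the sketch below rebuilds the p-value and the confidence interval from the t-score, degrees of freedom, and mean printed in the output above. The values are typed in by hand, so small rounding differences are expected.

```r
# Values copied from the t.test() output above
t_value  <- -1.7664
deg_free <- 9
mean_chg <- -0.319

p_value <- 2 * pt(-abs(t_value), df = deg_free)   # two-sided p-value
sem     <- mean_chg / t_value                     # because t = (mean - 0) / SEM
ci      <- mean_chg + c(-1, 1) * qt(0.975, df = deg_free) * sem  # 95% CI

round(p_value, 4)
round(ci, 2)
```

The interval is just the sample mean plus or minus a critical t-value times the SEM, which is why it contains 0 whenever the two-sided p-value is above 0.05.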

Based on these results, there is not enough evidence to reject the null hypothesis that our training regimen results in no improvement. In APA format, we would write these results as "The mean change in running time was -0.32 seconds, 95% CI [-0.73, 0.09], but there was no significant improvement in running time for this sample of runners (*t*(9) = -1.77, *p* = 0.11)."

The different parts of doing a t-test (calculating a mean, null hypothesis, standard error, and p-value) are all concepts you've learned before. For this reason, a t-test is actually just another way of getting to the same answer as we previously did with the general linear model. Check out the t and p values in the model output below, and compare them to the results of the above t-test:

In [55]:
summary(lm(change_scores ~ NULL))


Call:
lm(formula = change_scores ~ NULL)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7510 -0.3685 -0.1510  0.2415  1.1690 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.3190     0.1806  -1.766    0.111

Residual standard error: 0.5711 on 9 degrees of freedom


In the null model of the GLM framework, we are estimating the sample mean as $b_0$ and evaluating if it is significantly different from 0. In a one-sample t-test, we are evaluating whether a sample mean $\bar{X}$ is significantly different from the null hypothesis $\mu = 0$. These are the exact same question, just posed different ways. Thus, you can get the same result using a null model form of the GLM or the one-sample t-test.
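
The equivalence can also be checked programmatically. This sketch uses simulated scores (not the `trackscores` data) and writes the null model as `~ 1`, the usual R formula for an intercept-only model; the object names are placeholders.

```r
# Simulated change scores, for illustration only
set.seed(2)
demo_scores <- rnorm(10, mean = -0.3, sd = 0.6)

ttest_fit <- t.test(demo_scores)             # one-sample t-test, H0: mu = 0
null_fit  <- summary(lm(demo_scores ~ 1))    # intercept-only (null) model

# t-values from the two approaches
unname(ttest_fit$statistic)
null_fit$coefficients["(Intercept)", "t value"]

# p-values from the two approaches
ttest_fit$p.value
null_fit$coefficients["(Intercept)", "Pr(>|t|)"]
```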

However, we should also point out that in the null model, the significance of $b_0$ is specifically evaluated against a null hypothesis where $\beta_0 = 0$. In the one-sample t-test, we can be more general than this. We don't have to restrict ourselves to a null hypothesis of 0. For instance, let's imagine that instead of running improvement, we are interested in whether our team's new running times are significantly better than the [Division III track and field recruiting cutoff](https://www.ncsasports.org/mens-track-and-field/scholarship-standards) of 51.76. First we calculate the mean of everyone's new running times:

In [57]:
mean(meet1_scores$time)

Then we specify a null hypothesis:

$$H_0: \mu = 51.76$$

This is a different null hypothesis than before. Now, we are wondering if our sample is likely to come from DIII track and field athletes, or if we should conclude they come from an even faster population.

To specify a null hypothesis value other than 0 in a t-test, pass it to ```t.test()``` through the ```mu=``` argument.

In [58]:
t.test(meet1_scores$time, mu=51.76)


	One Sample t-test

data:  meet1_scores$time
t = -9.5046, df = 9, p-value = 5.453e-06
alternative hypothesis: true mean is not equal to 51.76
95 percent confidence interval:
 47.20909 48.95891
sample estimates:
mean of x 
   48.084 


Based on these results, our sample mean of 48.08 would be very unusual if it were drawn from a population with $\mu = 51.76$ (p < 0.001). Thus we reject this null hypothesis in favor of the alternative hypothesis that our team is from a faster population.  
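
A nonzero null hypothesis can still be expressed in the GLM framework by subtracting the hypothesized value from every score first, so that "no difference from 51.76" becomes "an intercept of 0". The sketch below illustrates this with simulated times (not the `trackscores` data); `demo_times` and `shifted` are hypothetical names.

```r
# Simulated 400m times, for illustration only
set.seed(3)
demo_times <- rnorm(10, mean = 48, sd = 1.2)
cutoff <- 51.76                           # hypothesized population mean

direct  <- t.test(demo_times, mu = cutoff)    # t-test against mu = 51.76
shifted <- demo_times - cutoff                # re-center so the null value is 0
via_lm  <- summary(lm(shifted ~ 1))           # null model on the shifted scores

unname(direct$statistic)
via_lm$coefficients["(Intercept)", "t value"]
```

Both lines should print the same t-value, since shifting every score by a constant changes the mean but not the standard error.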

### Independent-samples t-test

A one-sample t-test is used when you have one group of data and want to understand how likely it is that the group came from a particular population. If you have two separate groups of data and want to be able to distinguish them from each other, this calls for a different kind of t-test called an **independent samples t-test** or **two-sample t-test**. 

Let's say you're still the track coach, but this time you want to compare how two different kinds of training impact your athletes. This time, you have half the team run track drills for a month and half the team do weights in the gym. Afterwards, you assess their running time on the 400m at their first meet. In this situation you no longer have one group of data like last time. You have two groups who went through different experiences that might make them different from each other. 

In [59]:
track_training <- filter(trackscores, session=="meet1", condition=="track-training")
weight_training <- filter(trackscores, session=="meet1", condition=="weight-training")

An independent samples t-test answers the question "are these two groups likely drawn from the same population?" The null hypothesis of this question would be: 

$$H_0: \mu_1 = \mu_2$$

Where $\mu_1$ is the mean of the population from which group 1 came, and $\mu_2$ is the mean of the population from which group 2 came. The alternative hypothesis, then, is that they come from populations with different means.

An independent samples t-score is calculated a bit differently than a one-sample t-score. This one comes out to: 

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

where $s_p$ stands for the *pooled* standard deviation, a way of combining the standard deviations of two samples:

$$s_p = \sqrt{\frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 - 2}} $$

While this equation is more complicated, the t-score grows with the same factors as in the one-sample t-test. You will get a larger t-score, and thus be more likely to reject the null hypothesis, if there is a large difference between the group means, if the scores within each group vary little, *or* if the sample sizes are large. 
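
Here is a minimal sketch of the pooled calculation, using two simulated groups rather than the `trackscores` data. The pooled variance divides by the degrees of freedom, $n_1 + n_2 - 2$, and the result is compared against `t.test()` with `var.equal = TRUE`, which asks R for the pooled-variance version of the test.

```r
# Two simulated groups of 400m times, for illustration only
set.seed(4)
group1 <- rnorm(10, mean = 48.0, sd = 1.1)
group2 <- rnorm(10, mean = 48.5, sd = 1.1)
n1 <- length(group1)
n2 <- length(group2)

# Pooled standard deviation: weighted average of the two sample variances
s_p <- sqrt(((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2))

# Independent-samples t-score
t_by_hand <- (mean(group1) - mean(group2)) / (s_p * sqrt(1 / n1 + 1 / n2))

t_by_hand
unname(t.test(group1, group2, var.equal = TRUE)$statistic)  # should agree
```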



In [60]:
t.test(track_training$time, weight_training$time)


	Welch Two Sample t-test

data:  track_training$time and weight_training$time
t = -0.10901, df = 16.815, p-value = 0.9145
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.0796673  0.9736673
sample estimates:
mean of x mean of y 
   48.084    48.137 


In [62]:
trainingdata <- filter(trackscores, session=='meet1',
                       (condition=='track-training' | condition=='weight-training'))
summary(lm(time ~ condition, data = trainingdata))


Call:
lm(formula = time ~ condition, data = trainingdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7370 -0.7115  0.1945  0.6955  1.7660 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               48.0840     0.3438 139.861   <2e-16 ***
conditionweight-training   0.0530     0.4862   0.109    0.914    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.087 on 18 degrees of freedom
Multiple R-squared:  0.0006597,	Adjusted R-squared:  -0.05486 
F-statistic: 0.01188 on 1 and 18 DF,  p-value: 0.9144
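
Comparing the two outputs above, the t-values agree in size (0.109) but the degrees of freedom do not (16.815 versus 18). By default `t.test()` runs Welch's version of the test, which does not assume the two groups have equal variances and adjusts the degrees of freedom accordingly, while `lm()` uses a single pooled residual variance. The sketch below, on simulated data with hypothetical column names, suggests how setting `var.equal = TRUE` should line the t-test up with the linear model.

```r
# Simulated two-group data, for illustration only
set.seed(5)
demo <- data.frame(
  group = rep(c("a", "b"), each = 10),
  time  = c(rnorm(10, mean = 48.0, sd = 1.0), rnorm(10, mean = 48.4, sd = 1.3))
)

welch   <- t.test(time ~ group, data = demo)                    # default: Welch
student <- t.test(time ~ group, data = demo, var.equal = TRUE)  # pooled variance
lm_fit  <- summary(lm(time ~ group, data = demo))

# Degrees of freedom: Welch is usually fractional, pooled is n1 + n2 - 2
c(welch = unname(welch$parameter), student = unname(student$parameter))

# Same size of t, opposite sign: t.test() reports a - b, lm() reports b - a
unname(student$statistic)
lm_fit$coefficients["groupb", "t value"]
```

The sign flip only reflects which group is subtracted from which; the p-value is the same either way.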


## 20.3 ANOVA tests


## 20.4 Chi-square


## 20.5 Repeated measures tests


[Next: Chapter 21 - Alternate Approaches - Bayesian Statistics](https://colab.research.google.com/github/smburns47/Psyc158/blob/main/chapter-21.ipynb)