In this tutorial, we will see how to use R to run several statistical tests that compare the values of a single variable against a null hypothesis.

# Continuous variable (one-sample t-test)

Let's say that we have a sample dataset with the IQ scores of 35 subjects who live in a remote village. We want to see how the IQ scores in that village compare with the known average IQ of the general population (110).

In [1]:
set.seed(123)

N<-35
X.1<-rnorm(N, mean = 115, sd=15)
mu.0<-110

Let's first build our t-statistic. If you remember from the slides, this is computed as follows:

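$$t=\frac{\langle X\rangle - \mu_0}{s/\sqrt{N}}$$

where $\langle X\rangle$ is the sample mean, $s$ the sample standard deviation and $N$ the sample size.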

In [2]:
mean.x <- mean(X.1)
std.x <- sd(X.1)
t<- (mean.x - mu.0)/(std.x/sqrt(N))
t
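
As a quick sanity check, here is a minimal sketch that recomputes the same statistic and compares it with the one reported by R's built-in `t.test` (introduced later in this tutorial). The names `t.manual` and `t.builtin` are illustrative, and the data are re-simulated with the same seed so that the snippet runs on its own.

```r
# Re-simulate the same sample so this snippet is self-contained
set.seed(123)
N <- 35
X.1 <- rnorm(N, mean = 115, sd = 15)
mu.0 <- 110

# t-statistic by hand: (sample mean - mu.0) / (sample sd / sqrt(N))
t.manual <- (mean(X.1) - mu.0) / (sd(X.1) / sqrt(N))

# Statistic computed by R's built-in one-sample t-test
t.builtin <- unname(t.test(X.1, mu = mu.0)$statistic)

all.equal(t.manual, t.builtin)  # TRUE
```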

Now that we have our statistic, we need to check whether it lies within the critical region: if it does, we reject the null hypothesis.

Recall that, at a significance level $\alpha$, the critical region is delimited by a cut-off $t_\alpha$ such that any observed statistic lying beyond it (above or below, depending on the research question) makes us reject the null hypothesis.

Where we place the critical region depends on our research question. For example:

1. Is the IQ of this village smaller than the theoretical IQ of 110?

This is a one-sided research question, so we check whether our observed statistic lies in the LEFT tail of the distribution under the null hypothesis. That is, we reject the null if our observed $t$ satisfies $t < t_\alpha$.

2. Is the IQ of this village greater than the theoretical IQ of 110?

This is another one-sided research question, but here we check whether our observed statistic lies in the RIGHT tail of the distribution under the null hypothesis. That is, we reject the null if our observed $t$ satisfies $t > t_\alpha$.

3. Is the IQ of this village different than the theoretical IQ of 110?

This, instead, is a two-sided research question, so we reject the null if our observed statistic lies in either the LEFT or the RIGHT tail of the distribution under the null hypothesis. Since the rejection region is now split between the two tails, each tail only gets half of the significance level: instead of $t_\alpha$ we use $t_{\alpha/2}$, and we reject the null if $|t| > |t_{\alpha/2}|$.
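
The three rejection rules above can be sketched as a small function. `reject_null_t` is a hypothetical helper written only for illustration (it is not part of base R); it encodes the conditions $t < t_\alpha$, $t > t_\alpha$ and $|t| > |t_{\alpha/2}|$ using the quantile function `qt`.

```r
# Hypothetical helper (not part of base R): returns TRUE when the observed
# t-statistic falls in the critical region for the chosen alternative
reject_null_t <- function(t, df, alpha = 0.05,
                          alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    # Question 1: left tail, reject if t < t_alpha
    less = t < qt(alpha, df = df),
    # Question 2: right tail, reject if t > t_alpha
    greater = t > qt(alpha, df = df, lower.tail = FALSE),
    # Question 3: alpha is split between both tails
    two.sided = abs(t) > abs(qt(alpha / 2, df = df))
  )
}

# Example with an arbitrary statistic t = 2.3 and 34 degrees of freedom
reject_null_t(2.3, df = 34, alternative = "less")       # FALSE
reject_null_t(2.3, df = 34, alternative = "greater")    # TRUE
reject_null_t(2.3, df = 34, alternative = "two.sided")  # TRUE
```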

To compute these thresholds, we use the quantile function of the t-distribution, `qt`, with $N-1$ degrees of freedom. The cut-offs for each of the above questions are:

In [3]:
# Question 1 (left tail): with lower.tail = TRUE, qt() returns the value
# that leaves a cumulative probability of 0.05 to its LEFT
t05.1<-qt(0.05, df=N-1, lower.tail = TRUE) 
t05.1

In this case, since $t$ is not smaller than $t_\alpha$, we cannot reject the null.

In [4]:
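# Question 2 (right tail): with lower.tail = FALSE, qt() returns the value
# that leaves a probability of 0.05 to its RIGHT
t05.2<-qt(0.05, df=N-1, lower.tail = FALSE)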
t05.2

In this case, since $t$ is greater than $t_\alpha$, we can reject the null.

In [5]:
# Question 3 (two-sided): lower 0.05/2 quantile; by symmetry the upper one is -t05.3
t05.3<-qt(0.05/2, df=N-1)
t05.3

In this case, since $|t|$ is greater than $|t_{\alpha/2}|$, we can reject the null as well.

All of the above conclusions could also have been reached using p-values. In this case, we compute the probability, under the null hypothesis, of getting a value as extreme as or more extreme than the observed one; if it is below $\alpha$, we reject the null. Probabilities under the t-distribution are easily computed with its cumulative distribution function, `pt`.

In [6]:
# Question 1: left-tail probability P(T <= t)
p.1<-pt(t, df=N-1, lower.tail = TRUE)
p.1

In [7]:
# Question 2: right-tail probability P(T >= t)
p.2<-pt(t, df=N-1, lower.tail = FALSE)
p.2

In [8]:
# Question 3: both tails, i.e. twice the tail probability beyond |t|
p.3<-2*pt(abs(t), df=N-1, lower.tail = FALSE)
p.3
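
The two routes (cut-offs and p-values) should always lead to the same decision. Here is a minimal sketch, assuming $\alpha = 0.05$ and re-simulating the same sample, that checks this for the three research questions; `t.obs`, `dof`, `by.p` and `by.cut` are illustrative names.

```r
set.seed(123)
N <- 35
X.1 <- rnorm(N, mean = 115, sd = 15)
mu.0 <- 110
alpha <- 0.05
dof <- N - 1
t.obs <- (mean(X.1) - mu.0) / (sd(X.1) / sqrt(N))

# Decisions based on p-values (questions 1, 2 and 3)
by.p <- c(less      = pt(t.obs, dof) < alpha,
          greater   = pt(t.obs, dof, lower.tail = FALSE) < alpha,
          two.sided = 2 * pt(abs(t.obs), dof, lower.tail = FALSE) < alpha)

# Decisions based on the critical regions
by.cut <- c(less      = t.obs < qt(alpha, dof),
            greater   = t.obs > qt(alpha, dof, lower.tail = FALSE),
            two.sided = abs(t.obs) > qt(alpha / 2, dof, lower.tail = FALSE))

identical(by.p, by.cut)  # TRUE
by.p
```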

All of the above is great, but we don't need to do all these calculations every time we want to run a one-sample t-test: we can simply use R's built-in function `t.test`. Let's have a look at its documentation:

In [9]:
?t.test

As you can see, to run a one-sample t-test on a given sample, we only need to specify the hypothesized population mean, `mu` (our null hypothesis), and whether the alternative hypothesis is one-sided (`alternative = "less"` or `"greater"`) or two-sided (`"two.sided"`, the default).

Finally, any time you run this (or any other) statistical test in R, you get back a list containing the different results.

In [10]:
res.1<-t.test(X.1, mu=110, alternative = "less")
res.1

res.2<-t.test(X.1, mu=110, alternative = "greater")
res.2

res.3<-t.test(X.1, mu=110, alternative = "two.sided")
res.3


	One Sample t-test

data:  X.1
t = 2.3186, df = 34, p-value = 0.9867
alternative hypothesis: true mean is less than 110
95 percent confidence interval:
     -Inf 119.6196
sample estimates:
mean of x 
 115.5628 



	One Sample t-test

data:  X.1
t = 2.3186, df = 34, p-value = 0.01328
alternative hypothesis: true mean is greater than 110
95 percent confidence interval:
 111.5059      Inf
sample estimates:
mean of x 
 115.5628 



	One Sample t-test

data:  X.1
t = 2.3186, df = 34, p-value = 0.02656
alternative hypothesis: true mean is not equal to 110
95 percent confidence interval:
 110.6870 120.4385
sample estimates:
mean of x 
 115.5628 


In [11]:
names(res.1)

In [12]:
res.1$statistic
res.1$p.value

In [13]:
res.2$statistic
res.2$p.value

In [14]:
res.3$statistic
res.3$p.value
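
The list returned by `t.test` also holds the confidence interval (`conf.int`). As an illustrative sketch, we can rebuild the two-sided 95% interval by hand as $\langle X\rangle \pm t_{0.975,\,N-1}\, s/\sqrt{N}$ and check the duality between intervals and tests: the null value 110 lies outside the 95% interval exactly when the two-sided p-value is below 0.05. The names `se`, `ci.manual` and `outside` are illustrative.

```r
set.seed(123)
N <- 35
X.1 <- rnorm(N, mean = 115, sd = 15)
res.3 <- t.test(X.1, mu = 110, alternative = "two.sided")

# 95% CI by hand: sample mean +/- t_{0.975, N-1} * s / sqrt(N)
se <- sd(X.1) / sqrt(N)
ci.manual <- mean(X.1) + c(-1, 1) * qt(0.975, df = N - 1) * se

# Same interval as the one stored in the returned list
all.equal(ci.manual, as.numeric(res.3$conf.int))  # TRUE

# Duality: 110 falls outside the 95% CI exactly when p < 0.05
outside <- 110 < ci.manual[1] || 110 > ci.manual[2]
outside == (res.3$p.value < 0.05)                 # TRUE
```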

<div class="alert alert-info"> <b>Practice</b>: In one of the first tutorials of the course, we used the data frame "sleep", whose column "extra" gives the increase in hours of sleep after administering two drugs to 10 patients (20 observations in total). Let's test whether our observed data statistically support that the drugs indeed caused an average increase in hours of sleep. For this, assume that the data follow a Gaussian distribution:
   <ol>
       <li> First, before doing any calculation, write down (in a markdown cell) your null and alternative hypotheses.</li> 
       <li> Using the R function `pt` (the cumulative distribution function of Student's t-distribution), compute the p-value of a one-sample t-test on the sleep variable, i.e. the probability of getting a t-statistic equal to or greater than the observed one. </li>
       <li> Repeat the same as the previous point, but using the function `t.test` instead. </li>       
       </ol>
    
At a significance level (type I error rate) $\alpha = 0.05$, would you say that the observed data (our sleep variable) are compatible with an average increase in hours of sleep caused by the drugs?
</div>

In [15]:
head(sleep)

| | extra (dbl) | group (fct) | ID (fct) |
|---|---|---|---|
| 1 | 0.7 | 1 | 1 |
| 2 | -1.6 | 1 | 2 |
| 3 | -0.2 | 1 | 3 |
| 4 | -1.2 | 1 | 4 |
| 5 | -0.1 | 1 | 5 |
| 6 | 3.4 | 1 | 6 |


Your response 1 here

In [38]:
# Your response 2

In [39]:
# Your response 3

# Tests for a categorical variable

For a single categorical variable, we test how likely the observed number of occurrences per category is under a null hypothesis that specifies the expected ones.

To test this, we will use either a $\chi^2$-test or a binomial test. In both cases, we need to pass the number of occurrences per category.

Since the binomial test only works for variables with 2 categories, it is enough to pass the number of occurrences in one category (together with the total number of observations).

For the $\chi^2$-test, however, you need to pass the number of occurrences in every single category as a vector. You can easily get this with the function `table`. Let's see an example:

In [18]:
cat.var<-c("a", "a", "a", "b", "b", "c")
table(cat.var)

cat.var
a b c 
3 2 1 
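
One caveat worth keeping in mind: `table` only counts the categories that actually appear in the data. If a category could have occurred but did not, it silently disappears, which changes the number of categories (and therefore the degrees of freedom) seen by `chisq.test`. A minimal sketch, in which the extra level `"d"` is purely hypothetical, declares every possible category through `factor` so that zero counts are kept:

```r
cat.var <- c("a", "a", "a", "b", "b", "c")

# Declare every possible category (here a hypothetical 4th level "d")
# so that table() also reports categories with zero occurrences
counts <- table(factor(cat.var, levels = c("a", "b", "c", "d")))
counts  # a: 3, b: 2, c: 1, d: 0
```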

## $\chi^2$-test 

For a single variable, this test is also known as Pearson's $\chi^2$-test or the goodness-of-fit test. It compares the observed occurrences with those expected under a null hypothesis that assigns a probability to each category. In R, we can run this test with the built-in function `chisq.test`. Let's have a look at its documentation: 

In [19]:
?chisq.test

The first argument, x, is compulsory and will be a vector with the number of occurrences per category. As we said earlier, given a variable, we can easily calculate this using the function `table`. 

The other input to specify is the occurrence probability per category, `p`. This argument is optional: if it is not specified, all categories are given the same probability. For example, for a variable with 3 different categories, each is assumed to have an occurrence probability of 1/3.

For this tutorial, let's generate a vector of 50 observations from three categories (happy, neutral, and sad), with roughly the same number of occurrences in each of them.

In [20]:
set.seed(123)

X<-sapply(c(1:50), function(x) sample(c("happy","neutral","sad"), size=1))
table(X)

X
  happy neutral     sad 
     16      15      19 

In [21]:
res<-chisq.test(table(X))
res


	Chi-squared test for given probabilities

data:  table(X)
X-squared = 0.52, df = 2, p-value = 0.7711


Assuming a significance level $\alpha$ = 0.05, we can't reject the null.
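
To see where these numbers come from, here is a sketch that recomputes the statistic by hand as $\chi^2 = \sum_i (O_i - E_i)^2/E_i$, with expected counts $E_i = n p_i$, and takes the p-value from the upper tail of a $\chi^2$ distribution with $k-1$ degrees of freedom ($k$ being the number of categories). The counts are copied from the table above; the variable names are illustrative.

```r
# Observed counts from the table above
observed <- c(happy = 16, neutral = 15, sad = 19)
k <- length(observed)
p0 <- rep(1 / k, k)                  # null: equal probabilities (the default)
expected <- sum(observed) * p0       # E_i = n * p_i

chi2 <- sum((observed - expected)^2 / expected)
p.value <- pchisq(chi2, df = k - 1, lower.tail = FALSE)
c(X.squared = chi2, p.value = p.value)  # about 0.52 and 0.7711
```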

The conclusion would be different if, for example, one of the categories (e.g. happy) had more occurrences than the other two.

In [22]:
set.seed(123)

X<-sapply(c(1:35), function(x) sample(c("happy","neutral","sad"), size=1))
X<-c(rep("happy", 15), X)
table(X)

X
  happy neutral     sad 
     25      11      14 

In [23]:
res<-chisq.test(table(X))
res


	Chi-squared test for given probabilities

data:  table(X)
X-squared = 6.52, df = 2, p-value = 0.03839


Here, p $\leq$ 0.05, so we would reject the null hypothesis that all categories are equally probable, given what we observe.

As with `t.test`, running this test yields a list with several results, but most notably, the observed statistic and the p-value.

In [24]:
names(res)

In [25]:
res$statistic
res$p.value

Of course, had we not assumed that the proportions across the three categories should be equal, i.e. p = (1/3, 1/3, 1/3) in this case, the results would be different. For example, let's say we knew that being 'happy' was more likely, so we assumed that the number of cases in this category should also be greater, e.g. p = c(1/2, 1/4, 1/4).

In [26]:
chisq.test(table(X), p = c(1/2, 1/4, 1/4))


	Chi-squared test for given probabilities

data:  table(X)
X-squared = 0.36, df = 2, p-value = 0.8353


In this case, as we changed the null hypothesis, our p-value also changed.
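
Besides the statistic and the p-value, the list returned by `chisq.test` contains the expected counts (`expected`) and the Pearson residuals $(O_i - E_i)/\sqrt{E_i}$ (`residuals`), which show which categories drive the statistic. A short sketch, typing in the counts from the table above by hand (`res.p` is an illustrative name):

```r
# Counts from the table above (happy = 25, neutral = 11, sad = 14)
observed <- c(happy = 25, neutral = 11, sad = 14)
res.p <- chisq.test(observed, p = c(1/2, 1/4, 1/4))

res.p$expected          # 25.0, 12.5, 12.5
res.p$residuals         # (O - E) / sqrt(E) for each category
sum(res.p$residuals^2)  # equals the X-squared statistic, 0.36
```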

<div class="alert alert-info"> <b>Practice</b>: In the lecture we ran the following practice example: 

"A random sample of 150 recent donations at a certain blood bank reveals that 82
were type A blood. At a significance level α = 0.05, does this suggest that the
actual percentage of type A donations differs from 40%, the percentage of the
population having type A blood?"
    
And we came to the conclusion that we could reject the null. We did this by computing the $\chi^2$ statistic and checking whether it was inside or outside the critical region. Try to arrive at the same conclusion, this time using the `chisq.test` function.
</div>

In [27]:
# You may use this data for this practice question
set.seed(1234)
bloodtype.dat<-data.frame(bloodtype=sample(c(rep("A", 82), rep("notA", 68)), 150))
head(bloodtype.dat)

| | bloodtype (chr) |
|---|---|
| 1 | A |
| 2 | A |
| 3 | notA |
| 4 | notA |
| 5 | notA |
| 6 | notA |


## Binomial test

R has a built-in function for running the binomial test, the function `binom.test`. Let's have a look at its documentation: 

In [28]:
?binom.test

Since this test works only for two categories, it is enough to pass the number of observed occurrences in one category (`x` in this function), the total number of data points, `n`, and the probability of that category under the null hypothesis, `p`. If `p` is not supplied, an equal probability for both categories, i.e. 0.5, is assumed.

Let's see how to use this function on a vector of observations from tossing a fair coin 20 times.
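
As a small illustration of the calling conventions (a sketch based on the `binom.test` documentation, which also accepts `x` as a length-2 vector of successes and failures), both calls below describe 9 successes out of 20 trials and should return the same p-value; `r1` and `r2` are illustrative names.

```r
# (1) number of successes and number of trials
r1 <- binom.test(x = 9, n = 20)
# (2) a length-2 vector: c(successes, failures)
r2 <- binom.test(c(9, 11))

c(r1$p.value, r2$p.value)  # both about 0.8238
```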

In [29]:
set.seed(123)

X<-sapply(c(1:20), function(x) sample(c("head","tail"), size=1))
table(X)

X
head tail 
  11    9 

As you can see, we have a similar number of occurrences of heads and tails in these 20 tosses, so if we assume that the coin was fair (p=0.5, equal probability for both outcomes), then we should see a result that makes us stick with the null.

In [30]:
# Here using the number of occurrences in tails
binom.test(x = 9, n = 20)


	Exact binomial test

data:  9 and 20
number of successes = 9, number of trials = 20, p-value = 0.8238
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2305779 0.6847219
sample estimates:
probability of success 
                  0.45 


In [31]:
# ...but the result is the same using the number of occurrences in the other category (heads)
binom.test(x = 11, n = 20)


	Exact binomial test

data:  11 and 20
number of successes = 11, number of trials = 20, p-value = 0.8238
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.3152781 0.7694221
sample estimates:
probability of success 
                  0.55 


As before, changing the null hypothesis also changes the p-value. For example, suppose we assumed that the coin was unfair and more likely to land heads, so that instead of p = 0.5 for tails, we assumed a probability of p = 0.2. That means that in 20 coin tosses, we would expect to see only 4 tails. However, we observed 9, which is more than expected under the assumed probability (our null hypothesis).

In [32]:
# Here using the number of occurrences in tails
binom.test(x=9, n=20, p=0.2)


	Exact binomial test

data:  9 and 20
number of successes = 9, number of trials = 20, p-value = 0.009982
alternative hypothesis: true probability of success is not equal to 0.2
95 percent confidence interval:
 0.2305779 0.6847219
sample estimates:
probability of success 
                  0.45 


As a result, under this null hypothesis, p-value $\leq$ 0.05, so we reject it.

In [33]:
# Same conclusion using the number of heads, whose probability under the null is 1 - 0.2 = 0.8
binom.test(x=11, n=20, p=0.8)


	Exact binomial test

data:  11 and 20
number of successes = 11, number of trials = 20, p-value = 0.009982
alternative hypothesis: true probability of success is not equal to 0.8
95 percent confidence interval:
 0.3152781 0.7694221
sample estimates:
probability of success 
                  0.55 
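
Where does this p-value come from? For the two-sided case, `binom.test` adds up the probabilities, under the null, of all outcomes that are no more likely than the observed one. A minimal sketch of that rule with `dbinom` (the small relative tolerance `1 + 1e-7` guards against floating-point ties; R's own implementation uses a similar one); the variable names are illustrative.

```r
n  <- 20    # tosses
x  <- 9     # observed number of tails
p0 <- 0.2   # probability of tails under the null
n * p0      # expected number of tails under the null: 4

# Probability of each possible outcome 0, 1, ..., n under the null
probs <- dbinom(0:n, size = n, prob = p0)
# Two-sided p-value: total probability of outcomes no more likely than x
p.two <- sum(probs[probs <= dbinom(x, size = n, prob = p0) * (1 + 1e-7)])
p.two       # about 0.009982, as reported by binom.test
```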


Note also that, unlike the $\chi^2$-test, the binomial test can be one-sided. This means that we can test whether the true probability of a category is greater or less than an assumed value.

With `alternative = "greater"`, our null hypothesis is $p \leq 0.2$. Given what we observe, we reject it.

In [34]:
binom.test(9, 20, p = 0.2, alternative = "greater")


	Exact binomial test

data:  9 and 20
number of successes = 9, number of trials = 20, p-value = 0.009982
alternative hypothesis: true probability of success is greater than 0.2
95 percent confidence interval:
 0.2586506 1.0000000
sample estimates:
probability of success 
                  0.45 


With `alternative = "less"`, our null hypothesis is $p \geq 0.2$. Given what we observe, we cannot reject it.

In [35]:
binom.test(9, 20, p = 0.2, alternative = "less")


	Exact binomial test

data:  9 and 20
number of successes = 9, number of trials = 20, p-value = 0.9974
alternative hypothesis: true probability of success is less than 0.2
95 percent confidence interval:
 0.0000000 0.6530686
sample estimates:
probability of success 
                  0.45 
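
The one-sided p-values are simply binomial tail probabilities, which `pbinom` computes directly. A sketch for the 9 tails out of 20 tosses with $p_0 = 0.2$ (variable names are illustrative):

```r
n <- 20; x <- 9; p0 <- 0.2

# H1: p > p0, so the p-value is P(X >= x) = P(X > x - 1) under the null
p.greater <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)
# H1: p < p0, so the p-value is P(X <= x) under the null
p.less <- pbinom(x, size = n, prob = p0)

c(greater = p.greater, less = p.less)  # about 0.009982 and 0.9974
```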


**N.B.** Finally, note that with two categories you may also use the $\chi^2$-test (as long as the expected number of occurrences in every category is large enough). However, it will give a different result from the binomial test.

In [36]:
binom.test(x = 9, n = 20)


	Exact binomial test

data:  9 and 20
number of successes = 9, number of trials = 20, p-value = 0.8238
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2305779 0.6847219
sample estimates:
probability of success 
                  0.45 


In [37]:
chisq.test(table(X))


	Chi-squared test for given probabilities

data:  table(X)
X-squared = 0.2, df = 1, p-value = 0.6547


The difference between the two tests matters most for small sample sizes, where the $\chi^2$-test relies on a large-sample approximation that may be poor, whereas the binomial test, being an exact test, computes the p-value directly from the binomial distribution. For this reason, the binomial test is recommended in these cases. For large sample sizes, both should give similar answers.
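
To make this concrete, here is a sketch showing that, with two categories, the $\chi^2$ statistic equals the square of the z-score from the normal approximation to the binomial, and comparing exact and approximate p-values for a small and a large sample with the same observed proportion (0.45). `compare_tests` is a hypothetical helper written for this illustration.

```r
x <- 9; n <- 20; p0 <- 0.5

# z-score of the normal approximation to the binomial
z <- (x - n * p0) / sqrt(n * p0 * (1 - p0))
chi <- chisq.test(c(x, n - x), p = c(p0, 1 - p0))

c(z.squared = z^2, X.squared = unname(chi$statistic))  # both 0.2
c(normal = 2 * pnorm(-abs(z)), chisq = chi$p.value)    # both about 0.6547

# Hypothetical helper: exact (binomial) vs approximate (chi-squared) p-values
compare_tests <- function(x, n) {
  c(binom = binom.test(x, n)$p.value,
    chisq = chisq.test(c(x, n - x))$p.value)
}
compare_tests(9, 20)      # small sample: about 0.8238 vs 0.6547
compare_tests(450, 1000)  # large sample: much closer to each other
```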