---
title: "3. Introduction to hypothesis testing via binomial tests"
bibliography: ../references.bib
---
<!-- COMMENT NOT SHOWN IN ANY OUTPUT: Code chunk below sets overall defaults for .qmd file; these include showing output by default and looking for files relative to .Rproj file, not .qmd file, which makes putting files in different folders easier -->
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
source("../globals.R")
```
Remember you should
* add code chunks by clicking the *Insert Chunk* button on the toolbar or by
pressing *Ctrl+Alt+I* to answer the questions!
* use visual mode or **render** your file to produce a version that you can see!
* **render** your file to make sure it runs (and that you haven't been working
out of order)
* save your work often
* **commit** it via git!
* **push** updates to github
## Overview
This practice reviews the [Hypothesis testing starting with binomial tests
lecture](../chapters/Binomial.qmd){target="_blank"}.
## Hypothesis Testing and the Binomial Distribution
### Example
Using the bat paper from class (Geipel et al. 2021), let's consider how to analyze
data showing all 10 bats chose the walking over the motionless model.
```{r}
binom.test(10,10)
```
We use the binom.test function. We only need arguments for # of successes and #
of trials. By default it runs a 2-sided test against a null hypothesis value of
p = .5. You can see how to update these options by
looking at the help file.
```{r, eval=F}
?binom.test
```
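As a sketch of what those options look like in practice, the chunk below writes out the defaults explicitly and then changes two of them. The argument names (`p`, `alternative`, `conf.level`) are the ones listed in the help file; the object name `explicit_defaults` is only for illustration.

```{r}
# same test as binom.test(10, 10), with the default options written out
explicit_defaults = binom.test(x = 10, n = 10, p = 0.5, alternative = "two.sided", conf.level = 0.95)
explicit_defaults$p.value
# changing options: a one-sided test with a 99% confidence interval
binom.test(x = 10, n = 10, p = 0.5, alternative = "greater", conf.level = 0.99)
```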
Note the confidence interval is asymmetric since the estimate is 1 (the interval cannot extend above 1)! We can see
other options using the binom.confint function from the *binom* package.
```{r}
library(binom)
binom.confint(10,10)
```
Most of these methods adjust for the fact that the simplest interval (listed as *asymptotic*) uses a normal approximation,
which as you remember from our earlier discussions is not good when sample sizes
are small and/or the p parameter is extreme (close to 0 or 1).
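To see why the normal approximation struggles with these data, the sketch below builds the normal-approximation (Wald) interval by hand for the bat example and compares it to the exact interval that `binom.test` reports. The object names are made up for this illustration.

```{r}
p_hat = 10/10
n = 10
# normal-approximation (Wald) interval: estimate +/- z * standard error
wald_interval = p_hat + c(-1, 1) * qnorm(.975) * sqrt(p_hat * (1 - p_hat) / n)
wald_interval
# exact (Clopper-Pearson) interval reported by binom.test
exact_interval = binom.test(10, 10)$conf.int
exact_interval
```

Because the estimated standard error is 0 when the sample proportion is 1, the normal-approximation interval collapses to a single point, while the exact interval still has some width.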
## Practice!
Make sure you are comfortable with null and alternative hypotheses for all examples.
### 1
Are people eared (do they prefer one ear or another)? Of 25 people observed
while in conversation in a nightclub, 19 turned their right ear to the speaker
and 6 turned their left ear to the speaker. How strong is the evidence for
eared-ness given these data (adapted from Analysis of Biological Data)?
* state a null and alternative hypothesis
+ *H~o~: proportion of right-eared people is equal to .5*
+ *H~a~: proportion of right-eared people is not equal to .5*
* calculate a test statistic (signal) for this data
```{r}
19/25 #sample proportion
```
*The signal from the data is the proportion of right-eared people `r 19/25`*
* Make sure you understand how to construct a null distribution
+ using sampling/simulation (code or written explanation)
```{r}
sampling_experiment = rbinom(10000, 25, .5)
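# breaks centered on integers give each count its own bar; freq = FALSE plots proportions to match the y-axis label
hist(sampling_experiment, breaks = seq(-0.5, 25.5, 1), freq = FALSE, xlab = "# of Right-eared people out of 25", ylab = "Probability of being drawn \n from population of p = 0.5", cex.main = 2, cex.axis = 1.5, cex.lab = 2)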
```
+ by using an appropriate distribution (code or written explanation)
```{r}
using_distribution = dbinom(0:25,25,.5)
using_distribution
sum(using_distribution)
Number_righteared = c(0:25)
pdf = data.frame(Number_righteared, using_distribution)
plot(0:25, using_distribution)
```
*Each of these shows the expected distribution of the signal under the null hypothesis.
Note this implies multiple samples are taken. This is the theory that underlies NHST
(null hypothesis significance testing) and the definition of the p-value (coming up!).*
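As a rough check that the two approaches describe the same null distribution, this sketch turns a simulation into proportions and compares them to the probabilities from `dbinom`. The seed and object names are arbitrary choices for the illustration.

```{r}
set.seed(42) # arbitrary seed so the sketch is reproducible
simulated = rbinom(10000, 25, .5)
# proportion of simulated samples giving each possible count from 0 to 25
simulated_probability = as.numeric(table(factor(simulated, levels = 0:25))) / length(simulated)
exact_probability = dbinom(0:25, 25, .5)
# largest disagreement between the two versions of the null distribution
max(abs(simulated_probability - exact_probability))
```

The difference should shrink as the number of simulated samples increases.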
* Calculate and compare p-values obtained using
+ simulation (calculation won’t be required on test, but make sure you understand!) (code or written explanation)
```{r}
length(sampling_experiment[sampling_experiment >= 19 | sampling_experiment <= 6])/length(sampling_experiment)
```
+ equations for binomial distribution (code or written explanation)
```{r}
(1-pbinom(18,25,.5)) * 2
```
+ R functions (required)(code)
```{r}
binom.test(19,25, p=.5)
```
*Note we can calculate a p-value using the simulated distribution, the actual
distribution (which is exact in this case), and the test (which is using the
actual distribution!).*
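Another way to view the two-sided p-value is to add up the probability of every outcome that is no more likely than the one we observed. The sketch below does this by hand; my understanding is that this mirrors how `binom.test` gets its two-sided p-value, and because the null distribution is symmetric when p = .5 it also matches doubling one tail. The object names and the small tolerance are assumptions for the illustration.

```{r}
null_probability = dbinom(0:25, 25, .5)
observed_probability = dbinom(19, 25, .5)
# add up every outcome that is no more likely than the one observed
# (the small tolerance guards against rounding error when comparing probabilities)
p_by_hand = sum(null_probability[null_probability <= observed_probability * (1 + 1e-7)])
p_by_hand
binom.test(19, 25, p = .5)$p.value
```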
* Calculate a 95% confidence interval for the proportion of people who are right-eared
```{r}
library(binom)
binom.confint(x=19, n=25, alpha=.05, method="all") #use Agresti-coull
#or
binom.confint(x=19, n=25, alpha=.05, method="agresti-coull")
```
*Our 95% CI is .562 - .888. Note it does not include .5!*
* How do your 95% confidence interval and hypothesis test compare?
*The p-values from all methods are <.05, so I reject the null hypothesis that the proportion of right-eared people is equal to .5. The 95% CI is .562 - .888 and does not include .5, so the confidence interval and the hypothesis test agree.*
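The same comparison can be made in code. This sketch pulls the p-value and confidence interval out of a single `binom.test` result; note that this interval is the exact (Clopper-Pearson) one, so its limits differ slightly from the Agresti-Coull values. The object names are only for illustration.

```{r}
eared_test = binom.test(19, 25, p = .5)
eared_test$p.value < .05 # TRUE means we reject the null value of .5
eared_ci = eared_test$conf.int # exact (Clopper-Pearson) interval, not Agresti-Coull
eared_ci
.5 < eared_ci[1] | .5 > eared_ci[2] # TRUE means .5 falls outside the interval
```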
### 2
A professor lets his dog take every multiple-choice test to see how it
compares to his students (I know someone who did this). Unfortunately, the
professor believes undergraduates in the class tricked him by helping the dog
do better on a test. It’s a 100-question test, and every question has 4 answer
choices. For the last test, the dog picked 33 questions correctly. How likely
is this to happen, and is there evidence the students helped the dog?
MAKE SURE TO THINK ABOUT YOUR TEST OPTIONS
```{r}
#use a one-sided test as you only care if students helped the dog
binom.test(33,100, alternative="greater", p=.25)
```
*I chose to use a one-sided test since the professor wants to know if the students helped the dog.
I found a p-value of .04, so I reject the null hypothesis that the proportion
of correct answers is .25 (what I would expect by chance).*
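To answer "how likely is this to happen" directly, the one-sided p-value can also be computed from the binomial distribution as the probability of 33 or more correct answers when guessing. This is a sketch with a made-up object name; it should match the p-value from the test above.

```{r}
# probability of 33 or more correct answers out of 100 when guessing (p = .25)
p_guessing = pbinom(32, 100, .25, lower.tail = FALSE)
p_guessing
# probability of exactly 33 correct, for comparison
dbinom(33, 100, .25)
```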