content/solutions/3_Introduction_to_hypothesis_testing_via_binomial_test_solutions.Rmd

---
title: "3. Introduction to hypothesis testing via binomial tests"
bibliography: ../references.bib
---

<!-- COMMENT NOT SHOW IN ANY OUTPUT: Code chunk below sets overall defaults for .qmd file; these inlcude showing output by default and looking for files relative to .Rpoj file, not .qmd file, which makes putting filesin different folders easier  -->

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
source("../globals.R")

```


Remember you should

* add code chunks by clicking the *Insert Chunk* button on the toolbar or by
pressing *Ctrl+Alt+I* to answer the questions!
* use visual mode or **render** your file to produce a version that you can see!
* **render** your file to make sure it runs (and that you haven't been working
out of order)
* save your work often 
  * **commit** it via git!
  * **push** updates to github
  
## Overview

This practice reviews the [Hypothesis testing starting with binomial tests
lecture](../chapters/Binomial.qmd){target="_blank"}.

## Hypothesis Testing and the Binomial Distribution

### Example

Using the bat paper from class (Geipel et al. 2021), let's consider how to analyze
data showing all 10 bats chose the walking over the motionless model.  

```{r}
binom.test(10,10)
```
We use the binom.test function. We only need arguments for # of succeses and #
of trials. By default it runs a 2-sided test against a null hypothesis value of 
p = .5. You can see how to update thee options by 
looking at the help file.

```{r, eval=F}
?binom.test
```


Note the confidence interval is assymetric since its estimated to be 1! We can see
other options using the binom.confint function from the *binom* package.

```{r}
library(binom)
binom.confint(10,10)
```

All of these correct for the fact that most intervals use a normal approximation,
which as you remember from our earlier discussions is not good when sample sizes 
are small and/or the p parameter is extreme (close to 0 or 1).

## Practice!

Make sure you are comfortable with null and alternative hypotheses for all examples.

### 1

Are people eared (do they prefer one ear or another)?  Of 25 people observed 
while in conversation in a nightclub, 19 turned their right ear to the speaker 
and 6 turn their left ear to the speaker.  How strong is the evidence for 
eared-ness given this data (adapted from Analysis of Biological Data)?

* state a null and alternative hypothesis
  + *H~o~:  proportion of right-eared people is equal to .5*
  + *H~a~: proportion of right-eared people is note equal to .5*

* calculate a test statistic (signal) for this data
```{r}
19/25 #sample proportion
```
*The signal from the data is the proportion of right-eared people  `r 19/25`*

* Make you understand how to construct a null distribution
  + using sampling/simulation (code or written explanation)
```{r}
sampling_experiment = rbinom(10000, 25, .5)
hist(sampling_experiment, breaks = 0:25, xlab = "# of Right-eared people out of 25", ylab = "Probability of being drawn \n from population of p = 0.5", cex.main = 2, cex.axis = 1.5, cex.lab = 2)

```
  
  + by using an appropriate distribution (code or written explanation)
```{r}
using_distribution = dbinom(0:25,25,.5)
using_distribution
sum(using_distribution)
Number_righteared = c(0:25)
pdf = data.frame(Number_righteared, using_distribution)
plot(0:25, using_distribution)
```

*Each of these show the expected distribution of signal under the null hypothesis.
Note this implies multiple samples are taken.  This is theory that underlies NHST
(null hypothesis significance testing) and definition of p-value (coming up!).*
  
* Calculate and compare p-values obtained using 
  + simulation (calculation won’t be required on test, but make sure you understand!) (code or written explanation)
```{r}
length(sampling_experiment[sampling_experiment >= 19 | sampling_experiment <= 6])/length(sampling_experiment)
```
  
  + equations for binomial distribution (code or written explanation)
```{r}
(1-pbinom(18,25,.5)) * 2
```
  
  + R functions (required)(code)
```{r}
binom.test(19,25, p=.5)

```
*Note we can calculate a p-value using the simulated distribution, the actual
distribution (which is exact in this case), and the test (which is usign the 
actual distribution!).*

* Calculate a 95% confidence interval for the proportion of people who are right-eared

```{r}
library(binom)
binom.confint(x=19, n=25, alpha=.05, method="all") #use Agresti-coull 
#or
binom.confint(x=19, n=25, alpha=.05, method="agresti-coull")
```
*Our 95% CI is .562 - .888.  Note it does not include .5!*

* How do your 95% confidence interval and hypothesis test compare?

*The p-value from all methods are <.05, so I reject the null hypothesis that the proportion of right-eared people is equal to .5. The 95% 5% CI is .562 - .888.  Note it does not include .5!*

### 2

A professor lets his dog take every multiple-choice test to see how it 
compares to his students (I know someone who did this).  Unfortunately, the
professor believes undergraduates in the class tricked him by helping the dog 
do better on a test. It’s a 100 question test, and every questions has 4 answer 
choices.  For the last test, the dog picked 33 questions correctly.  How likely
is this to happen, and is there evidence the students helped the dog?	

MAKE SURE TO THINK ABOUT YOUR TEST OPTIONS 

```{r}
#use sided test as you only care if students helped the dog
binom.test(33,100, alternative="greater", p=.25)
```
*I chose to use a sided test since the professor wants to know if the students helped the dog.  
I found a p-value of .04, so I reject the null hypothesis that the proportion 
of correct answers is .25 (what I would expect by chance).*