# Problem 9

## Recalling and formatting the two types of errors

Recall Type I and Type II errors:

- Type I error: the null hypothesis is true but our test rejects it. This would be when we observe $T_n(\omega)\geq c$, where $\omega$ is the actual world, and where the application of our rule leads us to mistakenly conclude that the null hypothesis is false.
- Type II error: the null hypothesis is false but our test fails to reject it. This would be when we observe $T_n(\omega)<c$, where $\omega$ is the actual world, and so our test does not rule out the null hypothesis even though it is false.

Here is a diagram of the Type I and Type II errors:

|          | $H_0$ false | $H_0$ true |
| -------- | -------- | -------- |
| Reject $H_0$    |    | Type I error |
| Fail to reject $H_0$    | Type II error   |    |

It is conventional to let

- $\alpha$ be the probability of a Type I error 
- $\beta$ be the probability of a Type II error

Technically, $\beta$ only makes sense once one has fixed a specific alternative to the null hypothesis, but is often used despite this. 

It is further conventional to call $1-\beta$ the *power* of the test.

With this notation in place, we can fill in the probabilities in the table:

|          | $H_0$ false | $H_0$ true |
| -------- | -------- | -------- |
| Reject $H_0$    |  $1-\beta$ (power)  | Type I error $\alpha$ |
| Fail to reject $H_0$    | Type II error $\beta$  | $1-\alpha$ |

Note that in evaluating these probabilities, we are conditioning on the column headers ($H_0$ false, $H_0$ true) rather than on the row headers. Hence, the values in each column add up to one. If we add one more row that records the probability of the non-error in each column, we get:

|          | $H_0$ false | $H_0$ true |
| -------- | -------- | -------- |
| Reject $H_0$    |  $1-\beta$ (power)  | Type I error $\alpha$ |
| Fail to reject $H_0$    | Type II error $\beta$  | $1-\alpha$ |
| The probability of non-error                        | $1-\beta$ | $1-\alpha$ |
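
To make the entries of this table concrete, here is a minimal sketch that computes $\alpha$, $\beta$, and the power exactly for a test of the form $T_n\geq c$. Everything specific in it is an assumption chosen for illustration and does not come from the problem: $T_n$ counts successes in $n=20$ independent trials, the null hypothesis is success probability $0.5$, the fixed alternative is $0.7$, and the cutoff is $c=15$.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_probability(n: int, c: int, p: float) -> float:
    """P(T_n >= c) when T_n ~ Binomial(n, p), i.e. the chance the test rejects."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


# Illustrative (assumed) setup: none of these numbers come from the problem.
n, c = 20, 15
p_null, p_alt = 0.5, 0.7

alpha = rejection_probability(n, c, p_null)  # Type I error: reject although H_0 is true
power = rejection_probability(n, c, p_alt)   # reject when the alternative is true
beta = 1 - power                             # Type II error: fail to reject although H_0 is false

print(f"alpha = {alpha:.4f}, 1 - alpha = {1 - alpha:.4f}")
print(f"beta  = {beta:.4f}, power     = {power:.4f}")
```

Note that `beta` is only defined here because a specific alternative (`p_alt`) was fixed; changing `p_alt` changes `beta` and the power but leaves `alpha` untouched.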





## Sensitivity and specificity

In the analysis of medical diagnosis tests, there is a similar rubric with its own conventional terminology:

|          | Condition positive | Condition negative |
| -------- | -------- | -------- |
| Test positive    |  True positive $TP$ | False positive $FP$ |
| Test negative    | False negative $FN$ | True negative $TN$ |
| The frequency of non-error                        | sensitivity $\frac{TP}{TP+FN}$ | specificity $\frac{TN}{FP+TN}$ |

A few notes:

- Unlike the above tables, the values $TP, FN, FP, TN$ are counts of outcomes and hence natural numbers rather than probabilities. 

- A test with high sensitivity but low specificity produces few false negatives (relative to the condition-positive cases) but many false positives (relative to the condition-negative cases).

- A test with low sensitivity but high specificity produces many false negatives but few false positives.
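
As a small illustration of the two formulas in the table, the following sketch computes sensitivity and specificity from the four counts. The counts are hypothetical and chosen only so that the arithmetic is easy to follow; the function names are likewise just for this example.

```python
from fractions import Fraction


def sensitivity(tp: int, fn: int) -> Fraction:
    """Fraction of condition-positive cases that test positive: TP / (TP + FN)."""
    return Fraction(tp, tp + fn)


def specificity(tn: int, fp: int) -> Fraction:
    """Fraction of condition-negative cases that test negative: TN / (FP + TN)."""
    return Fraction(tn, fp + tn)


# Hypothetical counts (not taken from any real study).
tp, fn, fp, tn = 90, 10, 30, 170

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"sensitivity = {sens} = {float(sens):.2f}")
print(f"specificity = {spec} = {float(spec):.2f}")
```

With these counts the test has higher sensitivity than specificity: it misses few of the condition-positive cases but flags a larger share of the condition-negative ones.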

Here are some simple examples:

1. Digital mammograms have high sensitivity but lower specificity (see [this paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5991925/)).

2. Over-the-counter COVID tests have lower sensitivity but high specificity (see the bottom of [this nytimes page](https://www.nytimes.com/wirecutter/reviews/at-home-covid-test-kits/)).



There are two options for this problem. 

Option 1: [Mayo's idea](https://logic-teaching.github.io/philstatsbook/Chap08.html#the-chief-virtue-of-mayo-s-account) is that one should defend the classical view by arguing that, as a general rule, we should accept only those hypotheses which have undergone a severe test, in the sense that the test has a low probability of a Type I error. Under the above correspondence (borne out in part by Option 2), this would suggest that, as a general rule, we should accept only those hypotheses which have undergone a test with high specificity, that is, a test with very few false positives. Write a one-paragraph essay which evaluates whether this is plausible, bearing in mind that one will need a more expansive notion of "condition" that works outside of health contexts.

Option 2: Given a medical diagnosis test as above, with the set of patients being the disjoint union of the four groups $TP$, $FN$, $FP$, $TN$, construct a hypothesis test

- whose power $1-\beta$ is the sensitivity of the diagnosis test
- whose $1-\alpha$ is the specificity of the diagnosis test. 

This shows that hypothesis testing is a more general concept than the medical diagnosis case.

*Hint (kind of an outline of a solution)*: make the sample space the union of the four sets $TP$, $FN$, $FP$, $TN$ (draw a picture of them as four quadrants of a square). In the fractions below, the same symbols stand for the sizes of these sets. Let $X$ be the random variable which is $1$ on $TP\cup FP$ and $0$ on $FN\cup TN$. Consider a two-element parameter space $\{\theta^-,\theta^+\}$, where $\theta^-$ is condition negative and $\theta^+$ is condition positive. Define

- $p_{\theta^-}(1)=\frac{FP}{FP+TN}$ and $p_{\theta^-}(0)=1-p_{\theta^-}(1)$
- $p_{\theta^+}(0)=\frac{FN}{TP+FN}$ and $p_{\theta^+}(1)=1-p_{\theta^+}(0)$.

Consider the test $T\geq c$, where $T=X$ and $c=1$, so that the test is $X\geq 1$. Then show 

- if $X\sim p_{\theta^-}$ and $\alpha=P(T\geq c)$, then $1-\alpha =  \frac{TN}{FP+TN}$
- if $X\sim p_{\theta^+}$ and $\beta = P(T<c)$, then $1-\beta = \frac{TP}{TP+FN}$.
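
Before writing the proof, it can help to check the two identities numerically. The sketch below does this with exact fractions for one set of hypothetical counts (the numbers and the helper name `prob_reject` are assumptions made for this example). It is a sanity check for a single instance, not a substitute for the general argument.

```python
from fractions import Fraction

# Hypothetical counts for the four quadrants (not from any real study).
tp, fn, fp, tn = 90, 10, 30, 170

# Distribution of X under theta^- (condition negative), as defined in the hint.
p_neg = {1: Fraction(fp, fp + tn)}
p_neg[0] = 1 - p_neg[1]

# Distribution of X under theta^+ (condition positive);
# p_{theta^+}(1) is the complement of p_{theta^+}(0).
p_pos = {0: Fraction(fn, tp + fn)}
p_pos[1] = 1 - p_pos[0]


def prob_reject(pmf: dict, c: int) -> Fraction:
    """P(T >= c) for T = X with the given probability mass function."""
    return sum((prob for x, prob in pmf.items() if x >= c), Fraction(0))


c = 1
alpha = prob_reject(p_neg, c)     # reject although the condition is negative
beta = 1 - prob_reject(p_pos, c)  # P(T < c) although the condition is positive

print("1 - alpha =", 1 - alpha, " specificity =", Fraction(tn, fp + tn))
print("1 - beta  =", 1 - beta, " sensitivity =", Fraction(tp, tp + fn))
```

Using `Fraction` rather than floats keeps the comparison exact, so the two printed values on each line can be compared directly.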

*Optionally*, further show that there are probability measures $P^{-}$ and $P^+$ on the sample space such that $P^{-}(X=i)=p_{\theta^-}(i)$ and $P^{+}(X=i)=p_{\theta^+}(i)$ for $i=0,1$. Hint: make $P^{-}$ and $P^+$ each assign probability zero to one of the columns (as depicted in the four quadrants).