<a href="https://colab.research.google.com/github/gibsonea/Biostats/blob/main/Labs/17_Error_Types_and_Power_of_Tests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <a name="25intro">6.5 Error Types and Power of Tests</a>

---

A hypothesis test has one of two possible outcomes:

-   If <font color="mediumseagreen">**p-value $\leq \alpha$, we reject $H_0$**</font>, and we have enough evidence to support the claim in $H_a$.
-   If <font color="tomato">**p-value $> \alpha$, we fail to reject $H_0$**</font>. However, in this case we do not accept $H_0$. The test is inconclusive.

As with confidence intervals, we can carry out every step of the analysis correctly and still reach an incorrect conclusion because of sampling variability. Even with a random sample, there is a small chance we are unlucky and draw a sample that is not representative of the population.


In this section, we explore the following questions:

-   What types of errors are possible with hypothesis testing?
-   What are the practical implications of making errors?
-   How can we calculate the probability of *correctly* rejecting $H_0$?

# <a name="25error-types">Type I and Type II Errors</a>

---

There are two possible errors in a hypothesis test:

-   A <font color="dodgerblue">**type I error**</font> occurs if we incorrectly reject $H_0$ when it is true.
    -   This is known as a <font color="dodgerblue">**false positive**</font>.
-   A <font color="tomato">**type II error**</font> occurs if we incorrectly fail to reject $H_0$ when it is false.
    -   This is known as a <font color="tomato">**false negative**</font>.


For example, when a jury is deciding a case in court, the hypotheses would be:

- $H_0$: The accused person is innocent (we assume the person is innocent).
- $H_a$: The accused person is guilty (requires evidence beyond a reasonable doubt).

A jury can make two possible errors:

- If they falsely convict an innocent person, they have made a <font color="dodgerblue">type I error</font>.
- If they do not convict a guilty person, they have made a <font color="tomato">type II error</font>.
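Both kinds of error can be seen in a small simulation. The sketch below is only an illustration under assumed values that do not come from this lab (an upper-tailed z-test with $\mu_0 = 50$, known $\sigma = 10$, samples of size $n = 25$, and $\alpha = 0.05$). The helper `rejects_h0` is a hypothetical name introduced here.

```r
set.seed(1)        # for reproducibility
n_sims <- 10000    # number of simulated studies
n      <- 25       # sample size per study (hypothetical)
mu0    <- 50       # mean claimed in H0 (hypothetical)
sigma  <- 10       # known population standard deviation (hypothetical)
alpha  <- 0.05     # significance level

# Returns TRUE if an upper-tailed z-test rejects H0: mu = mu0
rejects_h0 <- function(true_mean) {
  x <- rnorm(n, mean = true_mean, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  p_value <- 1 - pnorm(z)
  p_value <= alpha
}

# H0 is true (true mean is 50): every rejection is a type I error
type1_rate <- mean(replicate(n_sims, rejects_h0(true_mean = 50)))

# H0 is false (true mean is 54): every non-rejection is a type II error
type2_rate <- mean(!replicate(n_sims, rejects_h0(true_mean = 54)))

c(type1_rate = type1_rate, type2_rate = type2_rate)
```

The simulated type I error rate should land close to $\alpha$, while the type II error rate depends on how far the true mean is from $\mu_0$.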



## <a name="25q1"> Question 1</a>

---

A hospital is testing to see whether a donated organ is a match for a
recipient in need of an organ transplant.

-   $H_0$: The organ is not a match (boring).
-   $H_a$: The organ is a match (interesting).

Describe the type I and type II errors in this context. What are the
practical consequences of making these errors?

### <a name="25sol1">Solution to  Question 1</a>

---

<br>  
<br>  
<br>

## <a name="25q2"> Question 2</a>

---

A lab runs viral tests to see whether a person is currently infected
with COVID-19.

-   $H_0$: The person is not currently infected with COVID-19 (boring).
-   $H_a$: The person is currently infected with COVID-19 (interesting).

Describe the type I and type II errors in this context. What are the
practical consequences of making these errors?

### <a name="25sol2">Solution to  Question 2</a>

---

<br>  
<br>  
<br>

## <a name="25q3"> Question 3</a>

---

The cholesterol level of healthy men is normally distributed with a mean
of 180 mg/dL and a standard deviation of 20 mg/dL, whereas men
predisposed to heart disease have a mean cholesterol level of 300 mg/dL
with a standard deviation of 30 mg/dL. The cholesterol level 225 mg/dL
is used to demarcate healthy from predisposed men.

### <a name="25q3a"> Question 3a</a>

---

Given that a man is healthy, what is the probability they are diagnosed
as predisposed?

#### <a name="25sol3a">Solution to  Question 3a</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

### <a name="25q3b"> Question 3b</a>

---

Given that a man is predisposed to heart disease, what is the probability
they are not diagnosed as predisposed?

#### <a name="25sol3b">Solution to  Question 3b</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

### <a name="25q3c"> Question 3c</a>

---

Which of the previous answers gives the probability of a type I error and
which is for a type II error? Explain.

#### <a name="25sol3c">Solution to  Question 3c</a>

---

<br>  
<br>  
<br>

## <a name="25q4"> Question 4</a>

---

Suppose we want to test whether a ten-sided die is fair (with sides
numbered 0 to 9). Let $p$ be the proportion of all rolls that land on an
even number.

### <a name="25q4a"> Question 4a</a>

---

Set up the hypotheses to test our claim.

#### <a name="25sol4a">Solution to  Question 4a</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>

### <a name="25q4b"> Question 4b</a>

---

Roll the die 20 times, and record how many times it lands on an even
number (0, 2, 4, 6, or 8).  *If you do not have a ten-sided die, use the code cell below to simulate rolling a fair, ten-sided die $n=20$ times.*

#### <a name="25sol4b">Solution to  Question 4b</a>

---



In [None]:
# run code cell if you do not have a 10-sided die
sample(0:9, 20, replace = TRUE)



<br>  
<br>

### <a name="25q4c"> Question 4c</a>

---

Calculate the p-value of your sample.

#### <a name="25sol4c">Solution to  Question 4c</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

### <a name="25q4d"> Question 4d</a>

---

What (if anything) can you conclude about the hypotheses at a 10%
significance level?

#### <a name="25sol4d">Solution to  Question 4d</a>

---

<br>  
<br>  
<br>

# <a name="25sig">The Significance Level Revisited</a>

---

The <font color="dodgerblue">**significance level**</font> of a
hypothesis test, denoted $\alpha$, is the **largest probability of a
type I error that we find acceptable**.
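When a test is based on a count, the probability of a type I error for a given decision rule can be computed from the binomial distribution under $H_0$. The sketch below uses assumed values chosen only for illustration ($n = 40$ trials, $H_0: p = 0.10$, and the rule "reject $H_0$ if $X > 7$"). None of these numbers come from the questions in this lab.

```r
n      <- 40     # number of trials (hypothetical)
p0     <- 0.10   # proportion claimed in H0 (hypothetical)
cutoff <- 7      # decision rule: reject H0 if X > cutoff (hypothetical)
alpha  <- 0.05   # significance level we are willing to accept

# P(type I error) = P(X > cutoff | H0 is true) = 1 - P(X <= cutoff)
type1_prob <- 1 - pbinom(cutoff, size = n, prob = p0)
type1_prob

# The rule is acceptable at level alpha only if this probability
# does not exceed alpha
type1_prob <= alpha
```

Because a count is discrete, the type I error probability of such a rule is usually somewhat below $\alpha$ rather than exactly equal to it.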

## <a name="25q5"> Question 5</a>

---

<figure>
<img
src="https://www.seobility.net/en/wiki/images/5/54/Social-Sharing.png"
width="45.0%" alt="Social Sharing" />
<figcaption aria-hidden="true">
Credit: Seobility <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">CC BY-SA 4.0</a>
</figcaption>
</figure>

A company claims that only 3% of people who use their facial lotion
develop an allergic reaction (a rash). You are suspicious of their claim
based on hearing some of your friends had an allergic reaction, and you
believe it is more than 3%. You pick a random sample of 50 people and
have them try the lotion. **If more than 3 out of the 50 people develop
the rash, you will blow up social media with posts about the dishonesty
of the company's claim.**

### <a name="25q5a"> Question 5a</a>

---

Set up hypotheses for this test.

#### <a name="25sol5a">Solution to  Question 5a</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>

### <a name="25q5b"> Question 5b</a>

---

Explain what type I and type II errors are in this case. Make sure you
explain in the context of this example.

#### <a name="25sol5b">Solution to  Question 5b</a>

---

<br>  
<br>  
<br>

### <a name="25q5c"> Question 5c</a>

---

What is the probability of making a type I error?

#### <a name="25sol5c">Solution to  Question 5c</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>  
<br>

### <a name="25q5d"> Question 5d</a>

---

If you were to perform the hypothesis test at a 5% significance level,
and you observe $X=4$, what would be the result of the test?

#### <a name="25sol5d">Solution to  Question 5d</a>

---

<br>  
<br>  
<br>

### <a name="25q6e"> Question 5e</a>

---

For what values of $X$ would you reject $H_0$ at a 5% significance
level?

#### <a name="25sol6e">Solution to  Question 5e</a>

---

<br>  
<br>  
<br>

# <a name="25reject">Rejection Regions</a>

---

When performing a hypothesis test at a significance level of $\alpha$,
the <font color="dodgerblue">**rejection or critical region**</font>, denoted <font color="dodgerblue">$\mathscr{R}$</font>, is the set of all values of the test statistic for which we reject $H_0$. The endpoint(s) of the region are
called <font color="dodgerblue">**critical values**</font>.
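As a rough sketch of how a rejection region might be found in R, consider an upper-tailed test based on a binomial count. The values below are assumptions chosen only for illustration ($n = 40$, $H_0: p = 0.10$ vs. $H_a: p > 0.10$, $\alpha = 0.05$) and are not taken from the questions in this lab.

```r
n     <- 40     # number of trials (hypothetical)
p0    <- 0.10   # proportion claimed in H0 (hypothetical)
alpha <- 0.05   # significance level

# qbinom() returns the smallest x with P(X <= x) >= 1 - alpha, so
# x + 1 is the smallest count c with P(X >= c) <= alpha
critical_value   <- qbinom(1 - alpha, size = n, prob = p0) + 1
rejection_region <- critical_value:n

# Type I error probability actually attained by this region
actual_alpha <- 1 - pbinom(critical_value - 1, size = n, prob = p0)

critical_value
actual_alpha
```

For a two-tailed test, the same idea can be applied to each tail with $\alpha/2$ in place of $\alpha$.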

## <a name="25q6"> Question 6</a>

---

In [Question 4](#25q4) we tested whether or not a ten-sided die is fair by rolling it 20 times and counting the number of rolls that land on an even number. If $p$ is the proportion of all rolls that land on an even number, then
we have

$$H_0: p = 0.5 \qquad \mbox{vs.} \qquad H_a: p \ne 0.5.$$

### <a name="25q6a"> Question 6a</a>

---

If you found only $X=7$ rolls landed on an even number, what is the
p-value?

#### <a name="25sol6a">Solution to  Question 6a</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

### <a name="25q6b"> Question 6b</a>

---

Find the critical values and rejection region if we use a significance
level of 10%.

#### <a name="25sol6b">Solution to  Question 6b</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

# <a name="25power">The Power of a Test</a>

---

## <a name="25q7"> Question 7</a>

---

Suppose you are interested in the lengths of a certain species of snake
in an ecosystem. Assume the lengths (in cm) are normally distributed
with unknown mean $\mu$, but the standard deviation of the population is
known to be $\sigma = 4$ cm. It has been claimed that the mean length of
this species is 25 cm. You believe the actual mean length is greater
than 25 cm. You collect a random sample of 30 snakes. You will test
using a significance level of $\alpha = 0.05$.

### <a name="25q7a"> Question 7a</a>

---

Set up hypotheses for the test.

#### <a name="25sol7a">Solution to  Question 7a</a>

---

-   $H_0$:

-   $H_a$:

<br>  
<br>

### <a name="25q7b"> Question 7b</a>

---

Find the critical value, and give the rejection region.

#### <a name="25sol7b">Solution to  Question 7b</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

### <a name="25q7c"> Question 7c</a>

---

If in fact $\mu = 27$ cm, what is the probability of making a type II
error?

#### <a name="25sol7c">Solution to  Question 7c</a>

---

In [None]:
# code cell to help with calculations


<br>  
<br>

### <a name="25q7d"> Question 7d</a>

---

What is the probability of correctly rejecting $H_0$ when $H_a$ is true?

#### <a name="25sol7d">Solution to  Question 7d</a>

---

<br>  
<br>  
<br>

# <a name="25defpower">Definition of the Power of a Test</a>

---

The <font color="dodgerblue">**power**</font> of a test is the
**probability of correctly rejecting $H_0$ when $H_a$ is true**.

$${\color{dodgerblue}{\mbox{power} = P(\mbox{Reject } H_0 \  | \  H_a \mbox{ is true}) = 1 - {\color{tomato}{\beta}}}},$$

where <font color="tomato">$\beta$</font> denotes the <font color="tomato">probability of a type II error</font>.
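One way to compute $\beta$ and the power for an upper-tailed z-test with known $\sigma$ is sketched below. The function name `power_upper_z` and all numeric values are hypothetical and chosen only for illustration. The power is evaluated at one specific alternative mean, `mu1`.

```r
# Power of an upper-tailed z-test of H0: mu = mu0 vs. Ha: mu > mu0,
# evaluated at a specific alternative mean mu1 (known sigma assumed)
power_upper_z <- function(mu0, mu1, sigma, n, alpha = 0.05) {
  se   <- sigma / sqrt(n)                         # standard error of x-bar
  crit <- qnorm(1 - alpha, mean = mu0, sd = se)   # reject H0 if x-bar > crit
  beta <- pnorm(crit, mean = mu1, sd = se)        # P(fail to reject | mu = mu1)
  c(critical_value = crit, beta = beta, power = 1 - beta)
}

# Hypothetical example: mu0 = 100, true mean 103, sigma = 10, n = 50
result <- power_upper_z(mu0 = 100, mu1 = 103, sigma = 10, n = 50)
round(result, 3)
```

If `mu1` is set equal to `mu0`, the "power" returned is simply $\alpha$, the probability of rejecting a true $H_0$.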

In biostatistics, the calculation of power is used to plan a study, usually before any data have been obtained, except possibly from a small preliminary study called a pilot study.

Why should power concern us? The power of a test tells us how likely it is that a statistically significant difference will be detected with a finite sample size $n$ when the alternative hypothesis is true, that is, when the true mean $\mu = \mu_{1}$ differs from the mean under the null hypothesis, $\mu_{0}$. If the power is too low, there is little chance of finding a significant difference, and nonsignificant results are likely even if a real difference exists between the true mean $\mu$ of the group being studied and the null mean $\mu_{0}$. An inadequate sample size is usually the cause of low power to detect a scientifically meaningful difference.
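The link between sample size and power can be illustrated with a short calculation. The sketch below assumes an upper-tailed z-test with known $\sigma$ and hypothetical values ($\mu_0 = 100$, $\mu_1 = 103$, $\sigma = 10$, $\alpha = 0.05$) that do not come from this lab.

```r
mu0   <- 100    # mean under H0 (hypothetical)
mu1   <- 103    # true mean under Ha (hypothetical)
sigma <- 10     # known population standard deviation (hypothetical)
alpha <- 0.05   # significance level

n    <- c(10, 25, 50, 100, 200)        # candidate sample sizes
se   <- sigma / sqrt(n)                # standard error shrinks as n grows
crit <- mu0 + qnorm(1 - alpha) * se    # critical value for x-bar at each n

# Power = P(x-bar > crit | mu = mu1), computed for every sample size
power <- 1 - pnorm(crit, mean = mu1, sd = se)

data.frame(n = n, critical_value = round(crit, 2), power = round(power, 3))
```

With the difference $\mu_1 - \mu_0$ held fixed, the power in the resulting table rises steadily as $n$ increases.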


# <a name="25prac">Sample Size and Power</a>

---

## <a name="25q8"> Question 8</a>

---

Suppose we want to test the hypothesis that mothers with low socioeconomic status (SES) deliver babies whose birthweights are lower than “normal.” To test this hypothesis, a list is obtained of birthweights from 10 consecutive, full-term, live-born deliveries from the maternity ward of a hospital in a low-SES area. The mean birthweight ($\bar{x}$) is found to be 115 oz with a sample standard deviation ($s$) of 24 oz. Suppose we know from nationwide surveys based on millions of deliveries that the mean birthweight in the United States is 120 oz. Can we actually say the underlying mean birthweight from this hospital is lower than the national average?

We are testing the following hypothesis:

$$H_0: \mu=120 \qquad \mbox{versus} \qquad H_a: \mu < 120$$



### <a name="25q8a"> Question 8a</a>

---

Find the critical value of the test at the 5% significance level and the rejection region. Do you reject or fail to reject the null hypothesis?

#### <a name="25sol8a">Solution to  Question 8a</a>

---

<br>
<br>

In [None]:
#code cell for calculations

### <a name="25q8b"> Question 8b</a>

---

Assume that the alternative hypothesis is true with $\mu = 115$, and treat the sample standard deviation as the population standard deviation, $\sigma = 24$. Find the probability of a type II error and the power of the test.

#### <a name="25sol8b">Solution to  Question 8b</a>

---

<br>
<br>

In [None]:
#code cell for calculations

### <a name="25q8c"> Question 8c</a>

---

Increasing the sample size of a study increases the power of the test. Suppose you now have 100 birthweight measurements. What would the power of your test be? Note that changing the sample size also changes the critical value, so you will need to recalculate it first.

#### <a name="25sol8c">Solution to  Question 8c</a>

---
<br>
<br>


In [None]:
#code cell for calculations

# <a name="25append">Appendix: Summary of Hypothesis Testing</a>

---

1.  State the <font color="dodgerblue">**hypotheses**</font> and identify (from the alternative claim in $H_a$) whether it is a one-tailed or two-tailed test.

    -   $H_0$ is the “boring” claim. Express using an equal sign $=$.
    -   $H_a$ is the claim we want to show is likely true. Use an inequality sign ($>$, $<$, or $\ne$).
    -   State both $H_0$ and $H_a$ in terms of population parameters such as $\mu_1-\mu_2$ and $p_1-p_2$.

2.  Compute the <font color="dodgerblue">**test statistic**</font>.

    -   If the observed sample is far from what the null claim predicts, the result may be significant.
    -   A standardized test statistic measures how many standard errors (SEs) the observed statistic is from the value in the null claim.
    -   A standardized test statistic with a large absolute value is supporting evidence to reject $H_0$.

3.  Using the null distribution, compute the <font color="dodgerblue">**p-value**</font>. The p-value is the probability of getting a sample with a test statistic as or more extreme than the observed sample assuming $H_0$ is true.

    -   The p-value is the area in one or both tails beyond the test statistic.
    -   The p-value is a probability, so we have $0 \leq \mbox{p-value} \leq 1$.
    -   The smaller the p-value, the stronger the evidence to reject $H_0$.

4.  Based on the <font color="dodgerblue">**significance level**</font>, $\alpha$, make a decision to reject or not reject the null hypothesis.

    -   If p-value $\leq \alpha$, we reject $H_0$.
    -   If p-value $> \alpha$, we do not reject $H_0$.

5.  <font color="dodgerblue">**Summarize the results**</font> in practical terms, **in the context of the example**.

    -   If we reject $H_0$, this means there is enough evidence to support the claim in $H_a$.
    -   If we do not reject $H_0$, this means there is not enough evidence to reject $H_0$ or to support $H_a$. The test is inconclusive.
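
The five steps above can be sketched in R for a two-tailed one-sample z-test. Every number below is a hypothetical value chosen only for illustration ($\mu_0 = 50$, known $\sigma = 6$, $n = 36$, observed $\bar{x} = 52.3$, $\alpha = 0.05$), and the summary messages are placeholders.

```r
# Step 1: hypotheses (hypothetical example)
#   H0: mu = 50   vs.   Ha: mu != 50   (two-tailed test)
mu0   <- 50
sigma <- 6       # known population standard deviation (assumed)
n     <- 36      # sample size
x_bar <- 52.3    # observed sample mean
alpha <- 0.05    # significance level

# Step 2: standardized test statistic (number of SEs from the null value)
se <- sigma / sqrt(n)
z  <- (x_bar - mu0) / se

# Step 3: p-value = area in both tails beyond the test statistic
p_value <- 2 * pnorm(-abs(z))

# Step 4: decision based on the significance level
reject_h0 <- p_value <= alpha

# Step 5: summarize the result in words
if (reject_h0) {
  message("Reject H0: there is enough evidence that the mean differs from 50.")
} else {
  message("Do not reject H0: the test is inconclusive.")
}

c(z = z, p_value = p_value)
```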

# <a name="25CC License">Creative Commons License Information</a>
---

![Creative Commons
License](https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)

*Statistical Methods: Exploring the Uncertain* by [Adam
Spiegler (University of Colorado Denver)](https://github.com/CU-Denver-MathStats-OER/Statistical-Theory)
is licensed under a [Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International
License](http://creativecommons.org/licenses/by-nc-sa/4.0/). This work is funded by an [Institutional OER Grant from the Colorado Department of Higher Education (CDHE)](https://cdhe.colorado.gov/educators/administration/institutional-groups/open-educational-resources-in-colorado).

For similar interactive OER materials in other courses funded by this project in the Department of Mathematical and Statistical Sciences at the University of Colorado Denver, visit <https://github.com/CU-Denver-MathStats-OER>.