# The Multiple Comparisons Problem

## Introduction

In this lesson, you'll learn about the problems that can arise from doing multiple comparisons in a single experiment.

## Objectives

You will be able to:

* Understand and explain the concept of spurious correlation
* Understand and explain why multiple comparisons increase the likelihood of misleading results
* Understand and use corrections such as the Bonferroni Correction to deal with multiple comparisons


## What is the Multiple Comparisons Problem?

Obtaining an incredibly low p-value does not guarantee that the null hypothesis is incorrect. For example, a p-value of 0.001 means that if the null hypothesis were true, you would still see results at least this extreme about 1 time in 1000. As you've seen, p-values alone can therefore be misleading: if you perform repeated experiments, at some point you're apt to stumble upon a small p-value, whether or not the null hypothesis is valid.

To restate this, imagine we take 100 scientific studies, each reporting a p-value of .03. Are all of these conclusions valid? Sadly, probably not. Remember, a p-value of .03 means that even when the null hypothesis is actually true, results this extreme still turn up 3% of the time. So collectively, the probability that **all** 100 rejections are correct is smaller than you might expect. You can be fairly confident in each study, but there is also apt to be a false conclusion drawn somewhere. (In fact, if every null hypothesis were true and each were tested at a .03 threshold, about 3 of these 100 tests would come out significant by chance on average.)

```python
from itertools import combinations

# Label 23 items '0' through '22'
lst = [str(i) for i in range(23)]

# Materialize the pairs once: combinations() returns a one-shot iterator,
# so calling list() on it a second time would yield an empty list
pairs = list(combinations(lst, 2))

print(pairs[:3])   # [('0', '1'), ('0', '2'), ('0', '3')]
print(len(pairs))  # 253
```

```python
# Probability that 100 independent tests at a 0.03 threshold produce
# no false positive, assuming every null hypothesis is true
print(0.97 ** 100)  # 0.04755250792540563
```

Similarly, if you are testing multiple metrics simultaneously in an experiment, the chance that at least one of them will satisfy your alpha threshold increases. A fun, related phenomenon is spurious correlation. If we start comparing a multitude of quantities, we are bound to find some that are highly correlated, whether or not an actual relationship exists. Tyler Vigen set out to find such relationships; here are several entertaining ones (of many):

<img src="images/nicolas_cage_vs_drowning.svg">
<img src="images/chicken_vs_oil.svg">
<img src="images/math_phds_vs_uranium.svg">
<img src="images/spelling_vs_spiders.svg">



As we can see, although these graphs show that each pair of quantities is strongly correlated, it seems unreasonable to expect that any of them has a causal relationship. Regardless of what the statistics tell us, there is no mechanism through which the length of a spelling bee word affects the number of people killed by venomous spiders.
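To see how easily strong correlations appear by chance, here is a minimal sketch using only the Python standard library. It generates series of pure random noise and reports the strongest correlation among all pairs. The series count, series length, and seed are arbitrary assumptions picked for illustration, and `pearson` is a hypothetical helper written for this example.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)        # fixed seed (arbitrary) for reproducibility
n_series, n_points = 50, 10   # assumed sizes, chosen only for illustration

# Every series is independent noise, so no real relationship exists
series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]

# Correlate every pair of series and keep the strongest result
correlations = [
    (abs(pearson(series[i], series[j])), i, j)
    for i, j in combinations(range(n_series), 2)
]
strongest, i, j = max(correlations)

print(f"{len(correlations)} pairs compared")
print(f"strongest |r| = {strongest:.2f} between series {i} and {j}")
```

Because so many pairs are compared, the strongest correlation tends to look impressive even though every series is unrelated to every other by construction.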


## How do Multiple Comparisons Increase the Chances of Finding Spurious Correlations?

Spurious correlation is a **_Type 1 Error_**, meaning that it's a type of **_False Positive_**. We think we've found something important when really we haven't. With each comparison we make in an experiment, we try to set a really low p-value threshold to limit our exposure to Type 1 errors. When we only reject the null hypothesis when p < 0.05, for example, we are effectively saying "I'm only going to accept these results as real if data that looks like this would turn up less than 5% of the time through randomness alone, with nothing important actually going on". However, when we make **_Multiple Comparisons_** by checking for many things at once, each of these small risks of a Type 1 error accumulates!

Here's another easy way to phrase this: a p-value threshold of .05 means that when the null hypothesis is true, we will make a Type 1 error about 1 time in 20. So if I run 20 independent comparisons in which nothing real is going on, I should expect about 1 false positive on average, and the chance of at least one is $1 - 0.95^{20} \approx 64\%$. Worse, if one does show up, I have no idea which finding it is!
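To put a number on how the risk accumulates, the sketch below computes the chance of at least one false positive across several comparisons. It assumes the comparisons are independent and that every null hypothesis is true; `familywise_error_rate` is a hypothetical helper name used only for this illustration.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive across independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_comparisons

# How the risk grows with the number of comparisons at alpha = 0.05
for m in (1, 5, 20, 100):
    print(f"{m:>3} comparisons: {familywise_error_rate(0.05, m):.1%}")
```

Real comparisons are often correlated, in which case the exact figures differ, but the risk still grows as comparisons are added.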

## The Bonferroni Correction

Back to the problem of multiple comparisons. Due to the cumulative risk of drawing false conclusions when statistically testing multiple quantities simultaneously, statisticians have devised methods to reduce the chance of Type 1 errors. One of these is the **_Bonferroni Correction_**. With the Bonferroni correction, you divide $\alpha$ by the number of comparisons you are making to set a new, adjusted threshold for rejecting the null hypothesis.

For example, if you desire $\alpha = 0.05$ but are making 10 comparisons simultaneously, the Bonferroni Correction would advise you to set your adjusted p-value threshold to $\frac{0.05}{10} = 0.005$! The stricter p-value threshold helps control for Type 1 errors. This doesn't mean that you are immune to them; it just helps reduce the cumulative chance that one occurs. The trade-off is that the power of each test is reduced, which in turn makes Type 2 errors more likely.
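The correction can be sketched in a few lines of Python. The p-values below are made up for illustration, and `bonferroni_reject` is a hypothetical helper rather than a library function.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted threshold and, for each p-value,
    whether the null hypothesis is rejected at that threshold."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

# Made-up p-values from 10 simultaneous comparisons
p_values = [0.001, 0.004, 0.012, 0.03, 0.049, 0.2, 0.35, 0.5, 0.74, 0.9]

threshold, decisions = bonferroni_reject(p_values)
print(f"adjusted threshold: {threshold:.3f}")
for p, reject in zip(p_values, decisions):
    verdict = "reject" if reject else "fail to reject"
    print(f"p = {p:<5} -> {verdict} the null hypothesis")
```

In this made-up data, five p-values fall below the uncorrected 0.05 threshold, but only two survive the adjusted threshold of 0.005. That is the reduced power mentioned above: some real effects with modest p-values may be missed.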


## Additional Resources

* [Tyler Vigen - Spurious Correlations](http://tylervigen.com/spurious-correlations)
* [Nick Cage Movies Vs. Drownings, and More Strange (but Spurious) Correlations](https://www.nationalgeographic.com/science/phenomena/2015/09/11/nick-cage-movies-vs-drownings-and-more-strange-but-spurious-correlations/)

## Summary

In this lesson, you learned about the problems that can arise from doing multiple comparisons in a single experiment, saw some entertaining spurious correlations that exist in real-world data, and learned how the Bonferroni correction helps limit Type 1 errors.