# Structuring Experiments

### Introduction

In the last few lessons, we saw that even though an explanatory variable may be *correlated* with an outcome, we cannot assume that it is the *cause* of that outcome, as there may be alternative reasons why the correlation exists.  In this lesson, we'll see how experiments allow us to overcome the issues with correlation and infer causation.

### What's an Experiment?

The idea of an experiment is that we take two groups that are similar, and then administer a *treatment* to one of them.  If the group that gets the treatment has a different outcome than the group that does not, we say that the treatment caused an effect.

That's the gist.  But as we'll see, things can go wrong with experiments along the way.  In fact, in the 2010s, many studies were thrown into doubt or discredited, in part because of poorly designed experiments.  Let's take the time to go through the fundamental components of executing an experiment.

> **Warning**: Experiments **are not** a perfect way to infer cause and effect, and we'll discuss some of the things that can go wrong in turn.  For now, let's focus on the benefits of experiments.

### 1. Define the hypothesis

> A hypothesis is a testable explanation for an outcome.

Let's say our hypothesis is the following: 
   > Aspirin reduces the chance of a heart attack for those who have been diagnosed with heart disease.

As we'll see in the future, it's important to have a level of specificity in our hypothesis.  Doing so will help us define our problem, as well as protect against p-hacking (which we'll learn more about later).

We can use the PICOT acronym to help us define the following components of a hypothesis.

* P (population of interest)
* I (intervention to be studied)
* C (comparator intervention)
* O (outcomes to be evaluated)
* T (time frame for the intervention and for measuring the outcome)

Let's go through these in turn:

* The **population** is the set of similar items or events that the question or experiment is about.
> Here, it's individuals diagnosed with heart disease.

* The **intervention** is the change we are testing.
> Here, it is taking 75 to 100 milligrams of aspirin daily.

* The **comparator** is the approach we compare the intervention to.  It can be a placebo, doing nothing, or an alternative treatment already in use.
> Here, the comparator is doing nothing.

* The **outcome** is what we measure to evaluate the intervention.
> Here, the outcome is the number of heart attacks a person has during the study.
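
To make the components concrete, here is a minimal sketch that records the aspirin hypothesis as a small Python data structure.  The names `PicotHypothesis` and `missing_components` are hypothetical, introduced only for illustration.  The time component is left empty because the example above does not specify one.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class PicotHypothesis:
    """Hypothetical container for the five PICOT components."""
    population: str
    intervention: str
    comparator: str
    outcome: str
    time: Optional[str] = None  # the lesson's example does not state a time frame


def missing_components(hypothesis: PicotHypothesis) -> List[str]:
    """Return the names of any components that have not been filled in."""
    return [f.name for f in fields(hypothesis) if not getattr(hypothesis, f.name)]


aspirin_hypothesis = PicotHypothesis(
    population="individuals diagnosed with heart disease",
    intervention="75 to 100 milligrams of aspirin daily",
    comparator="doing nothing",
    outcome="number of heart attacks during the study",
)

# Lists whichever components still need to be defined before the study starts.
print(missing_components(aspirin_hypothesis))
```

Writing the hypothesis down this way makes it easy to spot a component that is still undefined before any data is collected.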

### 2. Select/enroll a **sample** of participants

It's rare that we can directly gather data on every member of our population, so instead we select or enroll a subset of the population in the study, called a sample.

> A sample is a set of items or events selected from our population, which we gather data on in the study.

We work with our sample as a way to discover insights about the broader population.  
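
As a rough sketch of what selecting a sample can look like in code, the example below draws a simple random sample with Python's standard `random` module.  The population of 10,000 patient IDs and the sample size of 200 are made-up values for illustration, and simple random sampling is only one way a study might enroll participants.

```python
import random

# Hypothetical population: ID numbers for 10,000 people diagnosed with heart disease.
population_ids = list(range(1, 10_001))

# A seeded generator keeps the example reproducible.
rng = random.Random(0)

# Draw 200 distinct IDs; random.sample never picks the same member twice.
sample_ids = rng.sample(population_ids, k=200)

print(len(sample_ids), "participants selected from", len(population_ids))
```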

### 3. Distinguish between Treatment and Control

Once we have a sample of our population, we:

* Randomly separate the participants into a treatment group and a control group
* Deliver a treatment to one group and nothing (or a placebo) to the other
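
The random separation step can be sketched in a few lines of Python.  The function name `assign_groups` and the participant labels are hypothetical.  The sketch shuffles the participants and splits the list in half, which is one simple way to randomize, not the only one.

```python
import random
from typing import List, Optional, Sequence, Tuple


def assign_groups(
    participants: Sequence[str], seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Shuffle the participants and split them into (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical sample of 20 enrolled participants.
participants = [f"patient_{i}" for i in range(1, 21)]
treatment, control = assign_groups(participants, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in each group, neither we nor the participants can steer particular kinds of people toward the treatment.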

### 4. See the results

* **If** after the treatment, **we see a difference** in outcomes between the groups:
* **Then** we believe we can **attribute it to the treatment** itself
* **Because** the groups were assigned randomly, so otherwise the two groups would have looked the same.
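
As a minimal sketch of this comparison, the code below computes the difference in average outcomes between the two groups.  The outcome lists are made-up numbers for illustration only, not results from any study, and `difference_in_means` is a hypothetical helper name.

```python
from statistics import mean
from typing import Sequence


def difference_in_means(
    treatment_outcomes: Sequence[float], control_outcomes: Sequence[float]
) -> float:
    """Average outcome in the treatment group minus the average in the control group."""
    return mean(treatment_outcomes) - mean(control_outcomes)


# Made-up data: number of heart attacks per participant during the study.
treatment_outcomes = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
control_outcomes = [1, 0, 1, 0, 2, 0, 1, 0, 0, 1]

effect = difference_in_means(treatment_outcomes, control_outcomes)

# A negative value means fewer heart attacks, on average, in the treatment group.
print(f"difference in average heart attacks: {effect:+.1f}")
```

Whether a gap of this size could have come about by chance is a separate question that this sketch does not address.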

### Summary

In this lesson, we learned some of the fundamentals behind how experiments work, and importantly, how experiments protect against the alternative explanations for a correlation.

With experiments, we start with the participants of an experiment, and then randomly separate them into treatment and control groups.  The idea is that the two groups are similar, except that one group is administered the treatment and the other isn't.  If, after administering the treatment, we see a difference in outcomes between the groups, then because nothing else should differ between them, we attribute the difference to the treatment.

### Resources
* [Guide to RCTs](https://obgyn.onlinelibrary.wiley.com/doi/pdf/10.1111/aogs.13309)

* [Hypothesis Writing](http://www.pocog.org.au/doc/hypothesiswriting.pdf)

* [HDL Experiments](https://www.nih.gov/news-events/nih-research-matters/when-hdl-cholesterol-doesnt-protect-against-heart-disease)

* [HDL and Heart Attacks](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2903818/)

* [College for Masses](https://www.nytimes.com/2015/04/26/upshot/college-for-the-masses.html)

* [Steve Levitt Interviewed by Famous Journalist](https://dailynorthwestern.com/2004/03/01/archive-manual/names-with-racial-connotations-not-a-disadvantage-speaker-says/#)