---
title: "P-hackathon"
output: html_document
---
[P-hacking](https://www.ncbi.nlm.nih.gov/pubmed/22006061) and [the garden of forking paths](http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf) are terms that were coined to describe ways in which data analysts can obtain a statistically significant result by changing how the data are processed and analyzed. Some examples of processing steps that often go unquestioned are:

* Normalizing data or not
* Removing outliers or not
* Creating factor variables or not
* Adjusting for different sets of covariates
* Restricting the range of values being considered
* Data transformations
* Model choices
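
To see why the choices listed above matter even when there is no real effect, here is a minimal simulation sketch (not part of the original exercise). It assumes a skewed, income-like outcome with no true difference between two random groups, and tries four analysis paths that loosely mirror the list: the raw outcome, a log transformation, dropping points more than 2 SD from the mean, and restricting to the lower 90% of values. The sample size, cutoffs, seed, and number of replicates are arbitrary illustrative choices.

```{r}
# Minimal forking-paths simulation (assumed setup, for illustration only):
# a skewed, income-like outcome and two random groups with NO true difference.
set.seed(2024)

# p-value for the group coefficient in a simple linear model
group_p <- function(y, group) {
  summary(lm(y ~ group))$coefficients[2, 4]
}

one_study <- function(n = 200) {
  group <- factor(sample(c("a", "b"), n, replace = TRUE))
  y <- exp(rnorm(n))                           # lognormal outcome, independent of group
  no_outlier <- abs(y - mean(y)) < 2 * sd(y)   # drop points > 2 SD from the mean
  lower_90 <- y < quantile(y, 0.9)             # keep only the lower 90% of values
  c(
    raw        = group_p(y, group),
    log        = group_p(log(y), group),
    no_outlier = group_p(y[no_outlier], group[no_outlier]),
    restricted = group_p(y[lower_90], group[lower_90])
  )
}

# One row per simulated study, one column per analysis path
pvals <- t(replicate(1000, one_study()))

# Rejection rate of each single, pre-specified analysis
single_rates <- colMeans(pvals < 0.05)
single_rates

# Rejection rate when reporting whichever path gives the smallest p-value
hacked_rate <- mean(apply(pvals, 1, min) < 0.05)
hacked_rate
```

Each single, pre-specified path should reject at roughly the nominal 5% rate, while picking the smallest p-value after the fact rejects more often; the exact numbers depend on the seed and the assumed settings.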

To illustrate these concepts, we are going to do an exercise that I'm calling a "p-hackathon", using the `NHANES` data set from the NHANES R package:

```{r}
library(NHANES)
data(NHANES)
```
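
Before hacking, it can help to look at the two variables of interest. A minimal base R sketch; according to the NHANES package documentation, `HHIncomeMid` is a numeric version of the bracketed household income variable `HHIncome`, taken from the middle of each bracket, so it takes only a limited set of values.

```{r}
library(NHANES)
data(NHANES)

# Outcome: household income, numeric (midpoints of the income brackets)
summary(NHANES$HHIncomeMid)
# Because it is derived from brackets, it takes only a limited set of values
sort(unique(NHANES$HHIncomeMid))

# Exposure: gender, a two-level factor
table(NHANES$Gender, useNA = "ifany")

# Rows with both variables observed (the rows lm() uses by default)
sum(complete.cases(NHANES[, c("HHIncomeMid", "Gender")]))
```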

Before you start analyzing, install the matahari R package (you only need to do this once):

```{r, eval = FALSE}
devtools::install_github("jhudsl/matahari")
```

Then load matahari and start recording everything you do in R:

```{r}
library(matahari)
dance_start()
```

Our study is concerned with the association between household income (variable `HHIncomeMid`) and gender (variable `Gender`). A simple linear regression of income on gender gives:

```{r}
library(broom)  # provides tidy()
tidy(lm(HHIncomeMid ~ Gender, data = NHANES))
```
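
The `tidy()` output has one row per coefficient, so the p-value for the gender association sits in the row for the gender term. A minimal sketch for pulling out just that number; it assumes only that the term name starts with `Gender`, which follows from how `lm()` names factor coefficients (variable name followed by the level).

```{r}
library(NHANES)
library(broom)
data(NHANES)

fit <- lm(HHIncomeMid ~ Gender, data = NHANES)
coefs <- tidy(fit)

# lm() names factor coefficients as variable name + level (e.g. "Gendermale"),
# so select the row whose term starts with "Gender"
gender_row <- coefs[startsWith(coefs$term, "Gender"), ]
gender_row$p.value
```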

The rules of the p-hackathon are:

(1) The goal is to get the tiniest possible p-value for the association between the two variables.
(2) You may only use transformations or data changes that you can justify with a plausible statistical reason (write each reason down as you go).
(3) You must keep track of _everything_ you do.
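
Rules (2) and (3) ask you to justify and track every change. matahari records the code you run but not why you ran it, so a small table of attempts is one way to keep the reasons next to the results. This is a minimal sketch: the helper `attempt_row()` is hypothetical (it is not part of matahari or broom), and the age-adjusted model is only an illustration of one of the choices listed at the top, not a recommended analysis.

```{r}
library(NHANES)
library(broom)
data(NHANES)

# Hypothetical helper: fit a model and return a one-row record holding the
# formula, the written justification, the sample size, and the gender p-value.
attempt_row <- function(formula, data, justification) {
  fit <- lm(formula, data = data)
  coefs <- tidy(fit)
  data.frame(
    formula       = paste(deparse(formula), collapse = " "),
    justification = justification,
    n             = nobs(fit),  # rows actually used after dropping NAs
    p_value       = coefs$p.value[startsWith(coefs$term, "Gender")][1]
  )
}

attempts <- rbind(
  attempt_row(HHIncomeMid ~ Gender, NHANES,
              "Unadjusted baseline model"),
  attempt_row(HHIncomeMid ~ Gender + Age, NHANES,
              "Illustration only: adjust for age as a possible confounder")
)
attempts
```

Because each call returns a data frame row, new attempts can be added with another `rbind()` as you go.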
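
When you have finished hacking and have your final p-value, save your model as `lm_final`, then stop recording and save your history (replace `lastname` in the file name with your own last name):

```{r}
# Replace the right-hand side with your final model; the unadjusted model
# is only a placeholder so that this chunk runs
lm_final <- lm(HHIncomeMid ~ Gender, data = NHANES)
broom::tidy(lm_final)

# Stop recording and collect the history of everything you ran
matahari::dance_stop()
matahari_history <- matahari::dance_tbl()
save(matahari_history, file = "lastname_phacking.rda")
```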

Then post your completed `.rda` file to the class Slack channel.