-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.Rmd
235 lines (140 loc) · 7.14 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
---
title : Reproducible research
subtitle : Few concepts and a couple of tools, v0.1
author : Joona Lehtomäki
job :
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
widgets : [] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
--- #license
## CC-BY
<img src="assets/img/ccby_badge.png" class="centered-img" title="Dan McGlin, CC-BY 3.0"></img>
--- #attribution
## Attribution
* [Git tutorial](https://github.com/johnmcdonnell/Git-Tutorial) by John McDonnel
* [Introduction to Version Control with a focus on Git](https://github.com/ethanwhite/USU-SWC-Bootcamp/blob/master/version_control_slides.pdf) by Dan McGlinn
* [Reproducible research in computational science](https://github.com/karthikram/smb_git) by Karthik Ram
* [knitr and RStudio](https://github.com/yihui/yihui.github.com/blob/master/slides/2012-knitr-RStudio.html) by Yihui Xie
### Sources for this presentationa available at
* [Reproducible research: Concepts and tools](https://github.com/jlehtoma/reproScience)
--- #content &vcenter
## Content
1. Background
2. Concepts
3. Tools: RStudio and Git
--- #motivation1 bg:url(assets/img/motivation_composite.png) bg-repeat:no-repeat bg-position:center
## My own PhD
--- #open-science &twocol w1:50% w2:50%
## Open science
"Opening up access to the data and software, not just the final publication, is one of goals of the open science movement" <br />(Ram, 2013)
*** left
- Open access
- Open data
- Open notebook science
- Open source software
*** right
<img src="http://upload.wikimedia.org/wikipedia/commons/b/bb/Open_Science_Logo.jpg" class="centered-img" style="max-height:50%;" title="By G.emmerich (Own work) CC-BY-SA-3.0, via Wikimedia Commons"></img>
--- ¬itle bg:black
<img src="http://bit.ly/10FmaGf" class="centered-img"></img>
--- #concepts
## Reproducibility
> * With replication, independent investigators address a scientific hypothesis and build up evidence for or against it (Peng, 2011)
> * Allows others to build upon existing work and use it to test new ideas and develop methods (Ram, 2013)
> * While currently there is unilateral emphasis on "first" discoveries, there should be as much emphasis on replication of discoveries ([Ioannidis, 2005](http://www.telegraph.co.uk/technology/3342867/Is-the-spirit-of-Piltdown-man-alive-and-well.html))
--- #problems1
<img src="assets/img/wolkowich_et_al_data_sharing.png" class="centered-img" title="Wolkowich et al. 2012"></img>
Wolkovich et al. 2012
--- #problems2
<img src="assets/img/peng_comp_repro.png" class="centered-img" title="Peng 2011"></img>
Peng 2011
--- #problems3 bg:url(assets/img/reproducibility_initiative.png) bg-repeat:no-repeat bg-position:center
--- #problems4
<img src="assets/img/sirkia_et_al_workflow.png" class="centered-img" style="margin-left: 0; float: left;" title="Sirkiä et al. 2012"></img>
Sirkiä et al. 2012
--- #problems5
<iframe src="http://bit.ly/10FnSHI"></iframe>
--- #rstudio1
## Tools for reproducible research
<br />
<img src="http://bit.ly/Z2DkiR" class="centered-img" style="max-height:50%;" title="RStudio, GPLv3"></img>
--- #rstudio2 bg:url(assets/img/rstudio_analysis.png) bg-repeat:no-repeat bg-position:center
--- #rstudio3
## Reproducible?
> * Automatically regenerate documents when code, data, or assumptions change
> * Eliminate transposition errors that occur when copying results into documents
> * Preserve contextual narrative about why analysis was performed in a certain fashion
> * Documentation for the analytic and computational processes from which conclusions are drawn
[Xie, 2012](https://github.com/yihui/yihui.github.com)
--- #rstudio4 ¬itle
<br />
<br />
<img src="assets/img/tech_logos.png" class="centered-img" style="max-height:100%;" title="RStudio, GPLv3"></img>
--- #rstudio5 bg:url(assets/img/rstudio_knitr_input.png) bg-repeat:no-repeat bg-position:center
--- #rstudio6 bg:url(assets/img/rstudio_knitr_output.png) bg-repeat:no-repeat bg-position:center
--- #rstudio7 bg:url(assets/img/knitr_workflow.png) bg-repeat:no-repeat bg-position:center
<span style="position: fixed; bottom: 75px;">
Adapted from [Markus Kainu](http://markuskainu.fi/r-tutorial/repro/suomeksi.html)
</span>
--- #rstudio8
## More R-code
```{r}
# Quick summary
library(ggplot2)
summary(cars)
```
--- #rstudio8 ¬itle
```{r, echo=TRUE,message=FALSE}
# Quick plot of the data
qplot(speed, dist, data=cars) + geom_smooth()
```
--- #rstudio9 &vcenter
[**source**](http://bit.ly/Y8y8uO)
--- #git1
## Tools for reproducible research
<br />
<img src="http://bit.ly/Z2DNBq" class="centered-img" style="max-height:50%;" title="Jason Long, CC-BY 3.0, via Wikimedia Commons"></img>
--- #git2 bg:url(assets/img/github_linux.png) bg-repeat:no-repeat bg-position:center
--- #git3 bg:url(assets/img/github_landing_page.png) bg-repeat:no-repeat bg-position:center
--- #git4
## What is git? #
<br />
![threecommits](assets/img/threecommits.png)
* It stores snapshots of your projects
* ...It also stores the relationships between those snapshots
[McDonnel, 2012](https://github.com/johnmcdonnell/Git-Tutorial)
--- #git5 &image
<img src="http://www.phdcomics.com/comics/archive/phd101212s.gif" class="centered-img"></img>
--- #git6
## Why use git in science?
> 1. Lab notebook
> 2. Facilitating collaboration
> 3. Backup and failsafe against data loss
> 4. Freedom to explore new ideas and methods
> 5. Mechanism to solicit feedback and reviews
> 6. Increase transparency and verifiability
> 7. Managing large data
> 8. Lowering barriers to reuse
--- #git7 &image
<br />
<br />
<img src="assets/img/smb_karihikram_git_workflow.png" class="centered-img"></img>
<br />
Ram, 2013
--- #git8
## Git: steep learning curve?
<img src="http://d22zlbw5ff7yk5.cloudfront.net/images/cm-27811-1508860c9c2366.gif" class="centered-img" style="margin-top: 150px;"></img>
--- #git9
<img src="assets/img/git_cli.png" class="centered-img" title="Nathan de Vries, CC-BY-SA 3.0"></img>
--- #git10
<img src="assets/img/rstudio_git_panel.png" class="centered-img"></img>
--- #git11 bg:url(assets/img/rstudio_git_history.png) bg-repeat:no-repeat bg-position:center
--- #thanks &vcenter
**Thank you!**
--- #reference .reference
### References:
1. Peng RD (2011): Reproducible research in computational science. Science. 334(6060):1226–7. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3383002&tool=pmcentrez&rendertype=abstract
2. Ram K. (2013): Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine. 8(1):7. Available from: http://www.scfbm.org/content/8/1/7
3. Sirkiä S, Lehtomäki J, Lindén H, Tomppo E, Moilanen A (2012): Defining spatial priorities for capercaillie Tetrao urogallus lekking landscape conservation in south-central Finland. Wildlife Biology. 2012;18(4):337–53.
4. Wolkovich EM, Regetz J, O’Connor MI (2012): Advances in global change research require open science by individual researchers. Global Change Biology. 18(7):2102–10. Available from: http://doi.wiley.com/10.1111/j.1365-2486.2012.02693.x