---
title: "A manifesto for doing open science with R"
subtitle: "Opinionated and practical for doing reproducible and open R projects"
author: "Luke Johnston"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
> Note: The exact practices and tools for doing open science are in constant
development. As such, this is a living document that is updated often so that
it becomes more refined over time. Because of this, some parts of this
document may be incomplete.

## Underlying philosophy

The main philosophy of this toolkit is to encourage reproducible and open
scientific practices by automating difficult tasks and providing a (strongly)
opinionated view on which tools and processes to use when doing open science.
The goal here is to reduce the burden on researchers of actually doing open
and reproducible science, in particular when creating abstracts, slides, posters,
and manuscripts. Currently, the target users are biomedical, medical, and health
researchers, though this toolkit could easily be used by researchers in other
disciplines.

## Tools and services to use

The range of tools and services available for doing open science is broad and
diverse. This has many advantages, especially as this is the time for open
science to grow and evolve. However, it is also a huge barrier for
researchers and scientists who are just starting out: there are too many choices
and very little guidance on what to use. Given the number of choices
available, this toolkit takes a strong stance on which tools and services to use.
Below is a brief overview of the tools used in the larger steps
involved in creating scientific output, followed by more detailed
explanations.

- *File management* will be done using [Git], with projects stored on either
[GitHub] or [GitLab]. This keeps track of changes to the files
and code, keeps the project within a single folder structure, and emphasizes a final
"product" from the coding and writing. It also makes it easier to disseminate
and publish later on.
- *Bibliography management* ... This is still in development. <!--{{need to
expand on this... Likely bibtex, but I don't care as long as it is open source
or available in plain text}}.-->
- *Coding and analyses* will all be done in [R]. This is obvious for now,
since this is an R package! However, I would like this toolkit to eventually be
more language agnostic... but baby steps.
- *Writing* will be done in [R Markdown] files. This provides the ability
to insert R code chunks so that figures, tables, and results can be regenerated
easily and reproducibly. This applies not just to manuscripts, but also to slides
and posters (more on this below).
- *Dissemination* of the whole project will be done using [Zenodo] to create
a DOI once the manuscript has been submitted, or the poster or slides have been
presented. Manuscripts will be submitted to [bioRxiv] while posters and slides
will be submitted to [figshare].
- *All activities* will be done in [RStudio].
- *Reproducibility* (for more advanced users and currently optional) ... This is
still in development. <!--{{use travis, set up a docker/rocker}}-->

### Managing files and tracking changes

All projects, files, coding, writing, and other activities will be done in
[RStudio]. RStudio is an exceptional environment for working with [R] (and other
languages) and has many features that make it simpler to follow open scientific
workflows.

[Version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
is vital to an open scientific workflow. Since [Git]
is well established in the open source, software development, and R package
development worlds, it is the natural choice for version control.
RStudio also has an excellent and fairly easy-to-use
[interface](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN)
to Git.

### The coding and analysis phase

Analyses should follow a structure similar to that of R packages (read this
excellent [book on R package creation](http://r-pkgs.had.co.nz/) for more details).

- As much as possible, R code should be written as
[functions](https://swcarpentry.github.io/r-novice-inflammation/02-func-R/) and saved in
the `R/` folder.
- Data, results output, and other output should be saved in `data/`.
- Use the devtools and usethis workflow and commands to load and use the data and R
code (e.g. `load_all()`).
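
As a minimal sketch of the layout above, the following shows a small function of the kind that would live in `R/`, with its output saved under `data/`. The function name `mean_by_group()`, the output file name, and the use of the built-in `mtcars` data are assumptions made purely for illustration, and a temporary directory stands in for the project root. In an actual project, `devtools::load_all()` would make the function available instead of defining it inline.

```r
# Hypothetical contents of R/mean_by_group.R: one small, single-purpose function.
# The name and arguments are illustrative, not part of any package.
mean_by_group <- function(data, value, group) {
  stopifnot(
    is.data.frame(data),
    value %in% names(data),
    group %in% names(data)
  )
  means <- tapply(data[[value]], data[[group]], mean, na.rm = TRUE)
  data.frame(
    group = names(means),
    mean = as.numeric(means),
    stringsAsFactors = FALSE
  )
}

# The function is defined directly here so the sketch runs on its own;
# in a package-style project it would be loaded with devtools::load_all().
results <- mean_by_group(mtcars, value = "mpg", group = "cyl")

# Save the results output in `data/` (a temporary directory is assumed to be
# the project root for this sketch).
project_root <- tempfile("project")
dir.create(file.path(project_root, "data"), recursive = TRUE)
results_path <- file.path(project_root, "data", "mean_mpg_by_cyl.rds")
saveRDS(results, results_path)
```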

### The writing phase

All writing should be done using [R Markdown]. Big-picture output such as figures
and tables should be created by single-purpose functions (e.g. `table_one()` or
`figure_two()`), each called in its own code chunk (especially important for
figures). Inserting citations is [fairly simple](https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html),
with the bibliography file preferably saved in the same folder as the
manuscript or other scientific output (in the `doc/` folder). Exploratory
analyses should be done in an R Markdown file in the `doc/` folder as well.
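
To make the idea of single-purpose output functions concrete, here is one possible sketch. The bodies of `table_one()` and `figure_two()` are hypothetical: the built-in `mtcars` data stands in for real study data, and the summary rows and plot are chosen only for illustration. In a project these functions would be saved in `R/`, and each call would sit alone in its own chunk of the R Markdown file.

```r
# Hypothetical single-purpose function that builds one table.
# `mtcars` is a placeholder for the project's own data.
table_one <- function(data = mtcars) {
  data.frame(
    characteristic = c("Observations (n)", "Mean mpg", "Mean weight (1000 lbs)"),
    value = c(
      nrow(data),
      round(mean(data$mpg), 1),
      round(mean(data$wt), 1)
    ),
    stringsAsFactors = FALSE
  )
}

# Hypothetical single-purpose function that draws one figure.
figure_two <- function(data = mtcars) {
  plot(
    data$wt, data$mpg,
    xlab = "Weight (1000 lbs)",
    ylab = "Miles per gallon"
  )
  invisible(NULL)
}

# A code chunk in the manuscript would contain only the call itself:
tbl <- table_one()
```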

General collaboration should be done through GitHub [Pull Requests](https://help.github.com/articles/about-pull-requests/)
or GitLab [Merge Requests](https://docs.gitlab.com/ee/user/project/merge_requests/).
If the team's technical skill with Git is low, designate a team member with a bit more skill
in Git (or who is willing and able to put time into learning) as the "Git liaison".

Slides should preferably be created using the [options available](https://rmarkdown.rstudio.com/lesson-11.html)
in R Markdown. The best way to create posters is still in development.

### Dissemination

Once the manuscript, slides, or poster has been finalized, several actions should
be taken:

1. Increase the version number of the project (more will be added to this).
2. Make the GitHub/GitLab project public (excluding any sensitive data in
the project) and upload it to [Zenodo] to obtain a DOI.
3. Submit the manuscript to [bioRxiv] and upload the slides or poster
to [figshare] (an embargo period can be added if presenting at a
conference).
4. Give your presentation or submit your manuscript to a journal!

For number 1 above, increasing the version number of the project can be used to
track bigger-picture milestones in the project. For small abstracts this isn't
really necessary, since they are likely part of a bigger scientific output.
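
One way to picture the version bump in step 1 is the sketch below, which assumes the project follows the R package layout and therefore has a `DESCRIPTION` file with a `Version` field. The helper `bump_minor_version()` and the project name `myproject` are hypothetical and use only base R. In a real project, `usethis::use_version()` offers a comparable bump and is probably the more convenient route.

```r
# Hypothetical helper: increase the minor version in a DESCRIPTION file
# and reset the patch number, e.g. 0.1.0 becomes 0.2.0.
bump_minor_version <- function(description_path) {
  desc <- read.dcf(description_path)
  current <- unclass(package_version(desc[1, "Version"]))[[1]]
  # Pad to major.minor.patch so that "0.1" is handled like "0.1.0".
  parts <- c(current, 0)[1:3]
  new_version <- paste(c(parts[1], parts[2] + 1, 0), collapse = ".")
  desc[1, "Version"] <- new_version
  write.dcf(desc, description_path)
  invisible(new_version)
}

# Demonstration on a throwaway DESCRIPTION file in a temporary directory.
description_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c("Package: myproject", "Version: 0.1.0"), description_path)
new_version <- bump_minor_version(description_path)
```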
[R]: https://cran.r-project.org/
[Git]: https://git-scm.com/
[RStudio]: https://www.rstudio.com/
[Zenodo]: https://zenodo.org/
[figshare]: https://figshare.com/
[GitHub]: https://github.com/
[GitLab]: https://gitlab.com/
[bioRxiv]: https://www.biorxiv.org/
[R Markdown]: https://rmarkdown.rstudio.com/
<!--
Notes:
- Strong opinion on how projects should be structured or organized and how to be open/where to send code
- Use R package framework
- Use travis or gitlab-ci
- Some type of CI should be able to generate your manuscript without errors
- Bibliography files... as long as it interacts with R Markdown{{link to biblio}}
- CSL{{link}} for citation styling.
-->