Rmarkdown documents are great for keeping reproducible scientific workflows, tightly integrating code, results and text. I keep a collection of Rmarkdown templates (including some for writing scientific articles or manuscript reviews) here.
Once we are dealing with more complex data analyses and writing custom code and functions for a research project, structuring the project as an R package can bring many advantages (e.g. see here and here).
Hence, this package serves as a template for new research projects, the idea being to keep everything (data, R scripts, functions and manuscripts reporting results) self-contained in the same package (a "research compendium") to facilitate collaboration and promote reproducibility.
A short presentation introducing this approach on 'Structuring data analysis projects as R packages' is available here: https://github.com/Pakillo/template/blob/master/slides/Projects_as_Packages.pdf
To install the package from GitHub (this requires the devtools package):
library("devtools")
install_github("sirusb/template")
- First, load the package:
library("template")
- Then run the function new_project to create a directory with all the scaffolding (slightly modified from the standard R package structure). For example, to start a new project about tree growth, just use:
new_project("treegrowth")
If you want to create a GitHub repository for the project at the same time, use instead:
new_project("treegrowth", github = TRUE, private.repo = FALSE, travis = TRUE)
This will create a new folder with this structure:
Note that to create a GitHub repository you will need to have configured your system as explained in http://www.rdocumentation.org/packages/devtools/functions/use_github. For Travis to run, you will also need to activate it at https://travis-ci.org/profile.
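The steps below refer to several folders and files inside the new project. As a rough illustration only, the following base-R sketch builds a similar skeleton by hand. The function create_compendium_skeleton() is hypothetical: it is not part of the template package, it uses only the folder names mentioned in this README, and the DESCRIPTION fields are placeholders, so the layout actually produced by new_project may differ.

```r
# Hypothetical sketch, NOT the implementation of template::new_project().
# It only creates the folders and a placeholder DESCRIPTION file that the
# steps in this README mention.
create_compendium_skeleton <- function(path) {
  dirs <- c("R", "data-raw", "data", "analyses", "manuscript",
            "tests", "figures", "reports")
  for (d in dirs) {
    dir.create(file.path(path, d), recursive = TRUE, showWarnings = FALSE)
  }
  # Placeholder DESCRIPTION fields, to be edited for the real project.
  desc <- c(
    paste0("Package: ", basename(path)),
    "Title: What the Project Does (One Line)",
    "Version: 0.0.1",
    "Description: A research compendium for this project.",
    "License: What license is it under?"
  )
  writeLines(desc, file.path(path, "DESCRIPTION"))
  invisible(path)
}

# Usage: build the skeleton in a temporary directory and list its contents.
proj <- file.path(tempdir(), "treegrowth")
create_compendium_skeleton(proj)
list.files(proj)
```

Writing into tempdir() keeps the example from touching your working directory; for a real project you would simply use new_project as described above.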
- Now edit the README.Rmd and DESCRIPTION files with some basic information about your project: title, brief description, licence, package dependencies, etc. You may also want to check that the project options in RStudio suit you.
- Place the original (raw) data in the data-raw folder. Save all R scripts used for data preparation in the same folder.
- Save the final (clean, tidy) datasets in the data folder. You may save them as plain text (txt, csv) or as R binary files (rda, using save or devtools::use_data; rds, using saveRDS). You may also write documentation for these data (see http://r-pkgs.had.co.nz/data.html#documenting-data).
- R scripts or Rmarkdown documents used for data analyses may be placed in the analyses folder. The final manuscript or report may be placed in the manuscript folder. You may want to use an Rmarkdown template from e.g. rmdTemplates or rticles.
- If your analyses use functions from other CRAN packages, include them as dependencies (Imports) in the DESCRIPTION file. Also use the Roxygen tags @import or @importFrom in function definitions to import these dependencies into the namespace.
- If you write custom functions, place them in the R folder. Document all your functions with Roxygen (see http://r-pkgs.had.co.nz/man.html). Write tests for your functions (see http://r-pkgs.had.co.nz/tests.html) and place them in the tests folder.
- Write a makefile or master script to organise and execute all parts of the analysis. Render Rmarkdown reports using rmarkdown::render, and use the RStudio Build menu to create or update documentation, run tests, build the package, etc.
- Save all figures to the figures folder. You can create sub-directories inside it to keep it organised.
- Save PowerPoint (ppt) reports to the reports folder.
- Share.
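For the data steps above, a data-preparation script kept in data-raw might look like the following sketch. The file names, columns and cleaning rules are assumptions made for illustration, and the raw file is simulated so the code runs on its own. Files written with saveRDS are read back with readRDS; note also that in current versions of the tooling devtools::use_data has moved to usethis::use_data.

```r
# Hypothetical data-preparation script, e.g. data-raw/clean_trees.R.
# File names, columns and cleaning rules are illustrative assumptions.
proj <- file.path(tempdir(), "treegrowth")
dir.create(file.path(proj, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# Simulated raw file so the sketch runs on its own; in a real project the raw
# data would already be in data-raw/ and would never be edited by hand.
raw_file <- file.path(proj, "data-raw", "trees_raw.csv")
write.csv(data.frame(tree_id = c(1, 2, 2, 3),
                     dbh_cm  = c(12.5, 20.1, 20.1, NA)),
          raw_file, row.names = FALSE)

# Read, then clean: drop missing measurements and duplicated trees.
trees <- read.csv(raw_file)
trees <- trees[!is.na(trees$dbh_cm), ]
trees <- trees[!duplicated(trees$tree_id), ]

# Save the clean data in data/ as plain text, .rds and .rda.
write.csv(trees, file.path(proj, "data", "trees.csv"), row.names = FALSE)
saveRDS(trees, file.path(proj, "data", "trees.rds"))
save(trees, file = file.path(proj, "data", "trees.rda"))
```

Keeping the raw file untouched and regenerating the clean files from this script means anyone can rebuild data from data-raw.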
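The Imports, Roxygen and testing steps above could look like this for a single function. growth_rate() is a hypothetical example function. Its roxygen block (the lines starting with #') only becomes a help page and NAMESPACE entries when the package is documented, e.g. with devtools::document() or the RStudio Build menu; run as a plain script, it is just comments. Because the function uses median from stats, stats would also be listed under Imports in DESCRIPTION. The accompanying test is guarded so it only runs when testthat is installed.

```r
# Hypothetical file R/growth_rate.R

#' Median annual diameter growth rate
#'
#' @param dbh_start Diameter at breast height at the first census (cm).
#' @param dbh_end Diameter at breast height at the second census (cm).
#' @param years Time between the two censuses (years).
#' @return Median annual growth rate (cm per year).
#' @importFrom stats median
#' @export
growth_rate <- function(dbh_start, dbh_end, years) {
  stopifnot(length(dbh_start) == length(dbh_end), all(years > 0))
  median((dbh_end - dbh_start) / years, na.rm = TRUE)
}

# Hypothetical file tests/testthat/test-growth_rate.R
# (guarded here so the sketch also runs where testthat is not installed).
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("growth_rate returns the median annual increment", {
    # Increments are (12 - 10) / 2 = 1 and (26 - 20) / 2 = 3; median is 2.
    testthat::expect_equal(growth_rate(c(10, 20), c(12, 26), 2), 2)
    # A non-positive census interval is rejected.
    testthat::expect_error(growth_rate(10, 12, 0))
  })
}
```

Inside a real package the test file would call test_that and expect_equal directly, since testthat is attached when the tests run.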
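For the figures step, a small helper can make sure every plot lands in figures (or one of its sub-directories). save_figure() is a hypothetical helper, not part of the template package; this sketch assumes base graphics and PDF output, and the plotting code is passed in as a function so the device is always closed, even if plotting fails.

```r
# Hypothetical helper that saves a base-graphics plot as a PDF in figures/,
# optionally inside a sub-directory such as figures/exploratory/.
save_figure <- function(plot_fun, name, subdir = NULL, root = ".",
                        width = 6, height = 4) {
  fig_dir <- file.path(root, "figures")
  if (!is.null(subdir)) fig_dir <- file.path(fig_dir, subdir)
  dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)

  path <- file.path(fig_dir, paste0(name, ".pdf"))
  grDevices::pdf(path, width = width, height = height)
  # Close the PDF device when the function exits, even on error.
  on.exit(grDevices::dev.off(), add = TRUE)
  plot_fun()
  invisible(path)
}

# Usage: a histogram saved to figures/exploratory/dbh_histogram.pdf.
proj <- file.path(tempdir(), "treegrowth")
fig <- save_figure(
  function() hist(c(12.5, 20.1, 8.3, 15.2), main = "Tree diameters",
                  xlab = "DBH (cm)"),
  name = "dbh_histogram", subdir = "exploratory", root = proj
)
```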
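A master script, the alternative to a makefile mentioned above, can be as simple as a function that runs the preparation and analysis scripts in order and then renders the reports. run_all() below is a hypothetical sketch: it runs everything from the project root, executes each script in a fresh environment (so scripts must exchange results through files, such as those in data, rather than through leftover objects), and calls rmarkdown::render only for the reports you list.

```r
# Hypothetical master script (e.g. make.R at the project root).
# run_all() executes scripts in the order given, then renders any reports.
run_all <- function(root = ".", scripts = character(), reports = character()) {
  # Run everything from the project root so scripts can use relative paths.
  old_wd <- setwd(root)
  on.exit(setwd(old_wd), add = TRUE)

  for (s in scripts) {
    if (!file.exists(s)) stop("Missing script: ", s)
    message("Running ", s)
    # A fresh environment per script: results must be passed through files.
    sys.source(s, envir = new.env())
  }

  for (r in reports) {
    if (!requireNamespace("rmarkdown", quietly = TRUE)) {
      stop("The rmarkdown package is needed to render ", r)
    }
    rmarkdown::render(r)
  }
  invisible(TRUE)
}

# In a real project, something like (paths are illustrative):
# run_all(scripts = c("data-raw/clean_trees.R", "analyses/fit_growth.R"),
#         reports = "manuscript/manuscript.Rmd")

# Usage: a throwaway project with one preparation script that writes to data/.
demo <- file.path(tempdir(), "make_demo")
dir.create(file.path(demo, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "data"), showWarnings = FALSE)
writeLines('saveRDS(data.frame(x = 1:3), "data/clean.rds")',
           file.path(demo, "data-raw", "prepare.R"))
run_all(demo, scripts = "data-raw/prepare.R")
```

Running each script in isolation is a deliberate choice: a hidden dependency on an object created interactively makes the script fail here rather than silently in a collaborator's session.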
- Carl Boettiger and his template package
- Jeff Hollister and his manuscriptPackage
- Robert Flight: http://rmflight.github.io/posts/2014/07/analyses_as_packages.html
- Hadley Wickham: http://r-pkgs.had.co.nz/
- Yihui Xie: http://yihui.name/knitr/
- RStudio