Generic template for research projects structured as R packages (adapted from Pakillo/template)
Rmarkdown documents are great for keeping reproducible scientific workflows, as they tightly integrate code, results and text. I keep a collection of Rmarkdown templates (including some for writing scientific articles or manuscript reviews) here.
Once we are dealing with more complicated data analyses and writing custom code and functions for a research project, structuring the project as an R package can bring many advantages, such as documented functions, explicitly declared dependencies and built-in tests (e.g. see here and here).
Hence this package works as a template for new research projects, keeping everything (data, R scripts, functions and manuscripts reporting results) self-contained in a single package (a "research compendium") to facilitate collaboration and promote reproducibility.
A short presentation introducing this approach on 'Structuring data analysis projects as R packages' is available here: https://github.com/Pakillo/template/blob/master/slides/Projects_as_Packages.pdf
- First, load the package
- Now run the function new_project to create a directory with all the scaffolding (slightly modified from the standard R package structure). For example, to start a new project about tree growth, just use:
new_project("treegrowth")
If you want to create a GitHub repository for the project at the same time, use instead:
new_project("treegrowth", github = TRUE, private.repo = FALSE, travis = TRUE)
This will create a new folder with this structure:
Note that to create a GitHub repo you will need to have configured your system as explained in http://www.rdocumentation.org/packages/devtools/functions/use_github. Likewise, for Travis to run you will need to activate it at https://travis-ci.org/profile.
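As a rough illustration of what that scaffolding amounts to, the base-R sketch below creates the folders this README mentions inside a temporary directory. The helper make_scaffold() is hypothetical and is not how new_project() is implemented; the real function also sets up files such as DESCRIPTION and the Rstudio project, and its exact layout may differ.

```r
# Hypothetical sketch: NOT the implementation of new_project().
# It only creates the empty folders referred to in this README.
make_scaffold <- function(name, root = tempdir()) {
  path <- file.path(root, name)
  folders <- c("data-raw", "data", "analyses", "manuscript",
               "R", "tests", "figures")
  for (f in folders) {
    # recursive = TRUE also creates `path` itself on the first pass
    dir.create(file.path(path, f), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(path)
}

proj <- make_scaffold("treegrowth")
list.files(proj)
```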
Developing the project
Edit the DESCRIPTION file to add some basic information about your project: title, brief description, licence, package dependencies, etc. You may also want to check that the project options in Rstudio suit you.
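A DESCRIPTION file is plain text made of "Field: value" lines. The sketch below writes an illustrative one for the tree-growth example to a temporary file and parses it back with base R's read.dcf(); every value (title, author, licence, the dplyr and ggplot2 imports) is a placeholder, not something the template prescribes.

```r
# Placeholder DESCRIPTION for a hypothetical "treegrowth" project.
desc_lines <- c(
  "Package: treegrowth",
  "Title: Analysis of Tree Growth Data",
  "Version: 0.0.1",
  "Authors@R: person(\"Jane\", \"Doe\", email = \"jane@example.org\", role = c(\"aut\", \"cre\"))",
  "Description: Data, functions and reports for a tree growth study.",
  "License: MIT + file LICENSE",
  "Imports:",
  "    dplyr,",
  "    ggplot2"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() parses the 'Field: value' format used by DESCRIPTION files;
# indented lines are continuations of the previous field (here, Imports).
fields <- read.dcf(desc_file)
fields[, c("Package", "License")]
```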
Place original (raw) data in the data-raw folder. Save all R scripts used for data preparation in the same folder.
Save final (clean, tidy) datasets in the data folder. You may save them as plain text (txt, csv) or in .rda format (using devtools::use_data, now usethis::use_data). You may write documentation for these data (see http://r-pkgs.had.co.nz/data.html#documenting-data).
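A data-preparation script kept in data-raw might follow the pattern below. Everything specific here is assumed for illustration: the file names, the columns, and the toy raw file the script writes first so that it runs on its own. In a real package, usethis::use_data() would write the .rda file; base save() is used here only to avoid the extra dependency.

```r
# data-raw/clean_treegrowth.R (hypothetical file name)
proj <- file.path(tempdir(), "treegrowth")
dir.create(file.path(proj, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)

# Toy raw data standing in for an original field file (raw data stay untouched)
raw_file <- file.path(proj, "data-raw", "trees_raw.csv")
writeLines(c("tree_id,dbh_cm,height_m",
             "1,12.5,8.1",
             "2,NA,9.4",
             "3,20.0,13.2"), raw_file)

# Tidy: read, drop incomplete records, convert diameter from cm to m
raw <- read.csv(raw_file)
treegrowth <- raw[complete.cases(raw), ]
treegrowth$dbh_m <- treegrowth$dbh_cm / 100
rownames(treegrowth) <- NULL

# Save the clean dataset in data/ both as plain text and as .rda
write.csv(treegrowth, file.path(proj, "data", "treegrowth.csv"), row.names = FALSE)
save(treegrowth, file = file.path(proj, "data", "treegrowth.rda"))
```

The dataset could then be documented with a Roxygen block placed above the string "treegrowth" in a file in the R folder, as described in the data-documentation link above.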
R scripts or Rmarkdown documents used for data analyses may be placed in the analyses folder. The final manuscript/report may be placed in the manuscript folder. You may want to use an Rmarkdown template from e.g. rmdTemplates or rticles.
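An analysis script in the analyses folder typically loads the clean data, fits models and stores the results. In the hypothetical sketch below a small toy data frame stands in for the saved dataset so the script runs on its own, and the fitted model is written to an .rds file (an assumed convention, not one the template imposes) that a manuscript Rmarkdown document could read back with readRDS() instead of copying numbers by hand.

```r
# analyses/growth_model.R (hypothetical file name)
# In the project this would start with load("data/treegrowth.rda");
# a toy data frame stands in for it here so the script is self-contained.
treegrowth <- data.frame(
  dbh_m    = c(0.10, 0.15, 0.20, 0.25, 0.30),
  height_m = c(7.9, 10.1, 12.0, 14.2, 15.9)
)

# Fit a simple height-diameter model and extract its coefficient table
fit <- lm(height_m ~ dbh_m, data = treegrowth)
coefs <- summary(fit)$coefficients

# Store the fitted model so reports can reuse it without refitting
out_dir <- file.path(tempdir(), "treegrowth", "analyses")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(fit, file.path(out_dir, "growth_model.rds"))
coefs
```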
If your analyses use functions from other CRAN packages, include them as dependencies (Imports) in the DESCRIPTION file. Also use the Roxygen tag @importFrom in function definitions to import these dependencies into the namespace.
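As an example, the hypothetical file below defines two functions for the R folder, documented with Roxygen. The second uses median() from the stats package, so it carries @importFrom stats median, and stats would be listed under Imports in DESCRIPTION. A matching test written with plain stopifnot() is included (testthat is a common alternative); function names are illustrative, not part of the template.

```r
# R/growth.R (hypothetical file). DESCRIPTION would list `stats` under Imports.

#' Relative growth rate
#'
#' Computes (log(size2) - log(size1)) / (time2 - time1) element-wise.
#'
#' @param size1,size2 Numeric sizes at the first and second census.
#' @param time1,time2 Numeric census times, e.g. in years.
#' @return A numeric vector of relative growth rates.
#' @export
relative_growth <- function(size1, size2, time1, time2) {
  (log(size2) - log(size1)) / (time2 - time1)
}

#' Median relative growth rate, ignoring missing values
#'
#' @inheritParams relative_growth
#' @return A single number.
#' @importFrom stats median
#' @export
median_growth <- function(size1, size2, time1, time2) {
  median(relative_growth(size1, size2, time1, time2), na.rm = TRUE)
}

# tests/test-growth.R (hypothetical). R CMD check runs the R scripts in tests/
# and fails if any of them throws an error; in a real package this file
# would first attach the project package with library().
stopifnot(
  isTRUE(all.equal(relative_growth(1, exp(1), 0, 1), 1)),
  isTRUE(all.equal(median_growth(c(1, 1, 1), c(exp(1), exp(2), NA), 0, 1), 1.5))
)
```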
If you write custom functions, place them in the R folder. Document all your functions with Roxygen (see http://r-pkgs.had.co.nz/man.html). Write tests for your functions (see http://r-pkgs.had.co.nz/tests.html) and place them in the tests folder.
Use a makefile or master script to organise and execute all parts of the analysis. Render Rmarkdown reports using rmarkdown::render, and use the Rstudio Build menu to create/update documentation, run tests, build the package, etc.
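A master script can be as simple as sourcing each step in a fixed order. In the hypothetical makefile.R below, the script paths and the run_steps() helper are assumptions for illustration; the rmarkdown::render() call is left commented out because it needs the rmarkdown package and an actual .Rmd file (the manuscript file name is also hypothetical).

```r
# makefile.R (hypothetical master script, run from the project root).
# Listing the steps in order documents the workflow and lets anyone
# regenerate every result from the raw data.
steps <- c(
  "data-raw/clean_treegrowth.R",  # raw data -> clean data in data/
  "analyses/growth_model.R"       # clean data -> models and figures
)

run_steps <- function(steps, root = ".") {
  for (s in steps) {
    message("Running ", s)
    # Evaluate each script in a fresh environment so objects created by
    # one step are not silently reused by the next.
    source(file.path(root, s), local = new.env())
  }
  invisible(steps)
}

# run_steps(steps)
# rmarkdown::render("manuscript/manuscript.Rmd")
```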
Save all figures to the figures folder. You can create sub-directories inside it to keep it organised.
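For example, a plotting step could write its output as a PDF into a sub-folder of figures. The sub-folder name, file name and toy data below are hypothetical; pdf() is used because it works without a display, and any other file-based graphics device would do.

```r
# Hypothetical figure step: plots go to figures/, here in a sub-folder
fig_dir <- file.path(tempdir(), "treegrowth", "figures", "exploratory")
dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)

treegrowth <- data.frame(   # toy stand-in for the clean dataset
  dbh_m    = c(0.10, 0.15, 0.20, 0.25, 0.30),
  height_m = c(7.9, 10.1, 12.0, 14.2, 15.9)
)

fig_file <- file.path(fig_dir, "height_vs_dbh.pdf")
pdf(fig_file, width = 5, height = 4)   # open a file-based graphics device
plot(height_m ~ dbh_m, data = treegrowth,
     xlab = "Diameter at breast height (m)", ylab = "Height (m)")
abline(lm(height_m ~ dbh_m, data = treegrowth))
invisible(dev.off())                   # close the device so the file is written
```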
Save the ppt reports to the
- Carl Boettiger and his template package
- Jeff Hollister and his manuscriptPackage
- Robert Flight: http://rmflight.github.io/posts/2014/07/analyses_as_packages.html
- Hadley Wickham: http://r-pkgs.had.co.nz/
- Yihui Xie: http://yihui.name/knitr/