
From Projects to Packages: Organizing R

R Projects, Packages, Git, and GitLab

This brownbag is an overview of R Packages and R Projects with a focus on organizing analytic work. The general idea is that a little organizational work up front can save a lot of time in the future, especially when you need to revisit work from the past.

R Projects are a way to separate R analyses so that it is easier to resume work, maintain a structure, and reduce overlap. An R project preserves its own R history, persistent R environment, and working directory. With RStudio it is easy to switch between projects using the menu in the upper right-hand corner, which also shows the current project. It is a good idea to follow the same directory layout across all of your projects so it is easy to find what you are looking for:

project_name/
|-----------/analysis.R
|-----------/figures/
|-----------/data/
|-----------/reports/
|-----------/finished_analysis/
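
The layout above can also be created from R instead of by hand. The following is a minimal sketch, assuming the folder names shown in the tree; `create_project_layout` is a hypothetical helper name, not part of any package, and the demo runs in a temporary directory.

```r
# Hypothetical helper: create the standard layout under a project root.
create_project_layout <- function(root) {
  subdirs <- c("figures", "data", "reports", "finished_analysis")
  for (d in file.path(root, subdirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Add an empty analysis script if one is not already there.
  script <- file.path(root, "analysis.R")
  if (!file.exists(script)) file.create(script)
  invisible(root)
}

# Demonstrated in a temporary directory so nothing in your workspace is touched.
demo_root <- file.path(tempdir(), "project_name")
create_project_layout(demo_root)
list.files(demo_root)
```

Calling the helper on an existing project is harmless: existing folders and an existing analysis.R are left alone.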

R Packages are a way to store, organize, document, and distribute R functions. If you find yourself using the same custom function across multiple scripts, be it by copy-pasting or sourcing a specific R file, it is a good candidate for an R package. Then you can just add a call to library(myAwesomeFunctions) at the start of the script -- no need to worry about keeping the copied code updated or sourcing the right file! Even better, you can add documentation to the function to automatically create help pages for you, future-you, or others to use. Further, if you keep the package on GitLab, updating the package on your system is an easy one-line call.

An example of an R Package hosted on GitLab is my utilities package.

Initial Setup

  • Install SourceTree and/or Git Bash
  • Log in to GitLab
  • Add SSH key to GitLab
  • Configure SSH (may already be configured)
  • Install RStudio, plus RTools (Windows only) and the devtools package

I am willing to help set up any step in this process, especially if there are issues getting the SSH keys working correctly with SourceTree.

A Basic Workflow Demonstration

  1. Create git repository
  • SourceTree: Clone/New -> Create New Repository -> ~/Documents/brownbag/

  • Git Bash:

    mkdir brownbag
    cd brownbag
    git init
    
  2. Initialize R project
  • RStudio: File -> New Project -> Existing Directory -> ~/Documents/brownbag/
  3. Create README.md
  • In text editor of choice, write a short description
  4. Commit the README and .Rproj file

    git add README.md
    git add brownbag.Rproj
    git commit -m "Initial commit"

  5. Push to GitLab
  • Create a project in GitLab called Brownbag

  • Add it as the remote of the local brownbag repository

    git remote add origin git@gitlab.prod.rm:michael.hutchins/brownbag.git

  • SourceTree: Select brownbag, Settings -> Remotes -> Add
    - Name: origin
    - URL: git@gitlab.prod.rm:michael.hutchins/brownbag.git

  • Push to remote and set as the upstream remote

    git push -u origin master

  6. Add an analysis script
  • New R Script
  7. Git Commit

    git add analysis.R
    git commit

  • SourceTree: Add file, Commit
  8. Generate a figure
  • Have script generate and save a figure
  9. Push to GitLab
  • git push
  • SourceTree: Push
  10. View on GitLab in your projects!
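
For the "Generate a figure" step, the contents of analysis.R might look something like the sketch below. It assumes the script is run from the project root (so that `figures` resolves to the project's figures folder) and uses R's built-in `cars` dataset as stand-in data; the file name `speed_vs_distance.pdf` is purely illustrative.

```r
# Assumption: the script runs from the project root, so "figures" is relative.
fig_dir <- "figures"
dir.create(fig_dir, showWarnings = FALSE)

fig_path <- file.path(fig_dir, "speed_vs_distance.pdf")

# Open a PDF device, draw the plot, then close the device to write the file.
pdf(fig_path, width = 6, height = 4)
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance by speed")
invisible(dev.off())
```

Because the figure is written by the script rather than exported by hand from the plot pane, rerunning the script regenerates it.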

Workflow Goals

  • Every project should have a similar file/folder structure so it is easy to see what is happening
  • All final figures/statistics should be recreated by running one script
  • Analyses should follow a logical structure for importing, processing, and displaying data
  • Tracking projects with GitLab allows for easy overview and sharing of code
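
One way to meet the "one script" and "logical structure" goals is to give each stage its own function and call them in order from a single entry point. This is only a sketch: the function names (`import_data`, `process_data`, `display_results`, `run_analysis`) and the output file name are hypothetical, and the built-in `mtcars` dataset stands in for files that would normally be read from data/.

```r
# Stage 1: import. A built-in dataset stands in for reading files from data/.
import_data <- function() {
  datasets::mtcars
}

# Stage 2: process. Average fuel economy by cylinder count.
process_data <- function(raw) {
  aggregate(mpg ~ cyl, data = raw, FUN = mean)
}

# Stage 3: display. Write the summary table where reports can pick it up.
display_results <- function(summary_table, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, "mpg_by_cyl.csv")
  write.csv(summary_table, out_file, row.names = FALSE)
  out_file
}

# Running this one function recreates the output from scratch.
run_analysis <- function(out_dir = "reports") {
  raw <- import_data()
  summary_table <- process_data(raw)
  display_results(summary_table, out_dir)
}

# Demo run into a temporary directory; in a project, call run_analysis().
result_file <- run_analysis(file.path(tempdir(), "reports"))
```

Keeping the stages separate makes it easier to see where a problem was introduced when a result looks wrong months later.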

R Packages

If there is time we can create a simple R package for storing commonly used custom functions and share it via GitLab. The advantages are:

  • No need to source multiple files
  • No need to copy around or e-mail out different file versions
  • Functions can have help pages and descriptions
  • Others may find your functions useful and time saving

Authoring R Packages requires installation of:

  • RTools (Windows only)
  • devtools
  • roxygen2

  1. Initialize project

    1. With RStudio: File -> New Project -> New Directory -> R Package (check git repository)
    2. Manually:
      1. Create git repository with README
      2. Initialize R package project with package.skeleton()
  2. Push empty package to GitLab

  3. Set up RStudio to build with Roxygen: Build -> Configure Build Tools -> Generate documentation with Roxygen (check all)

  4. Enter a name and e-mail in the Maintainer field of the DESCRIPTION file (it will not build until this is edited)

  5. Add a basic function with documentation in R/math.R

    #' My First Function 
    #'  
    #' This awesome function takes two numbers and adds them together  
    #' @export  
    #' @param a The first number to add  
    #' @param b The second number to add  
    #' @return The numeric sum of the two inputs  
    #' @examples  
    #' x <- awesome(1, 2)  
    #' # x = 3  
    awesome <- function(a, b) {  
        x <- a + b  
        return(x)  
    }  
    
  6. Commit new function to repository

    git add R/math.R
    git commit -m "Added my first awesome function."
    
  7. Build in RStudio: Build -> Build and Reload

  8. Load package and test function

    library(myAwesomeFunctions)
    awesome(10, 13)
    ## View documentation
    ?awesome
    
  9. Commit documentation

    git add man/*.Rd
    git add NAMESPACE
    git commit -m "Documentation Update"
    

    Note: while we don't need to commit the documentation (as it can be generated from the source), committing it allows others to install the package directly from GitLab.

  10. Push to GitLab

  11. Share with a friend!

    • They can either clone the repository and build it themselves
    • Or they can install directly with the devtools package:
    devtools::install_git("git@gitlab.prod.rm:michael.hutchins/myAwesomeFunctions.git")
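
After building or installing the package, a few quick checks can confirm that the function behaves as documented. The sketch below redefines `awesome` from step 5 so that it runs on its own; with the package installed, library(myAwesomeFunctions) would provide the function instead. The specific checks are illustrative, not part of the package.

```r
# Stand-in copy of the function from step 5; with the package installed,
# library(myAwesomeFunctions) would provide it instead.
awesome <- function(a, b) {
  x <- a + b
  return(x)
}

# Scalar inputs, matching the documented example.
stopifnot(awesome(1, 2) == 3)

# `+` is vectorized in R, so vectors are added element-wise.
stopifnot(identical(awesome(c(1, 2), c(10, 20)), c(11, 22)))

# Non-numeric input fails; tryCatch captures the error instead of stopping.
result <- tryCatch(awesome("1", 2), error = function(e) "error")
stopifnot(identical(result, "error"))
```

If every stopifnot() call passes, the script finishes silently; a failed check stops with a message naming the expression that was not TRUE.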
    