biocadv_6x/bioc2_rpacks.Rmd

---
layout: page
title: "Understanding and building R packages"
---

```{r options, echo=FALSE}
library(knitr)
opts_chunk$set(fig.path=paste0("figure/", sub("(.*).Rmd","\\1",basename(knitr:::knit_concord$get('infile'))), "-"))
```
```{r getpacksa,echo=FALSE,results="hide"}
suppressPackageStartupMessages({
suppressMessages({
library(BiocStyle)
library(devtools)
library(AnnotationDbi)
library(ggbio)
library(gwascat)
library(GenomicRanges)
library(ERBS)
library(OrganismDbi)
library(harbChIP)
library(yeastCC)
library(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene)
})
})
```

<a name="defpack"></a>

## What is an R package?

Conceptually, an R package is a collection of functions, data
objects, and documentation that coherently support a family
of related data analysis operations.

Concretely, an R package is a structured collection of folders,
organized and populated according to the rules of
[Writing R Extensions](http://cran.r-project.org/doc/manuals/r-release/R-exts.html).

<a name="skel"></a>

### A new software package with `package.skeleton`

We can create our own packages using `package.skeleton`.  We'll illustrate that now
with an enhancement to the ERBS package that was created for the course.
We'll create a new package that utilizes the peak data, defining
a function `juxta` that allows us to compare binding peak patterns for the two cell
types on a selected chromosome.  (I have commented out code that
uses an alternative graphics engine, for optional exploration.)

Here's a definition of `juxta`.  Add it to your R session.
```{r makej}
juxta = function (chrname="chr22", ...) 
{
    require(ERBS)
    data(HepG2)
    data(GM12878)
    require(ggbio)
    require(GenomicRanges)  # "subset" is overused, need import detail
    ap1 = autoplot(HepG2[which(seqnames(HepG2)==chrname)])
    ap2 = autoplot(GM12878[which(seqnames(GM12878)==chrname)])
    tracks(HepG2 = ap1, Bcell = ap2, ...)
# alternative code for Gviz below
#    require(Gviz)
#    names(ap1) = "HepG2"
#    names(ap2) = "B-cell"
#    ax = GenomeAxisTrack()
#    plotTracks(list(ax, ap1, ap2))
}
```

Now demonstrate it as follows.

```{r doj,fig=TRUE}
library(ERBS)
juxta("chr22", main="ESRRA binding peaks on chr22")
```

In the video we will show how to use `package.skeleton` and the Rstudio
editor to generate, document, and install this new package!  We will not
streamline the code in `juxta` to make use of inter-package
symbol transfer by properly writing the DESCRIPTION and NAMESPACE
files for the package, but leave this for an advanced course in
software development.

<a name="org"></a>

### A new annotation package with OrganismDbi

We have found the `Homo.sapiens` package to be quite convenient.
We can get gene models, symbol to GO mappings, and so on, without
remembering any more than `keytypes`, `columns`, `keys`, and `select`.
At present there is no similar resource for *S. cerevisiae*.
We can make one, following the OrganismDbi vignette.  This is
a very lightweight integrative package.

```{r doodb}
library(OrganismDbi)
gd = list( join1 = c(GO.db="GOID", org.Sc.sgd.db="GO"),
           join2 = c(org.Sc.sgd.db="ENTREZID",
              TxDb.Scerevisiae.UCSC.sacCer3.sgdGene="GENEID"))
if (!file.exists("Sac.cer3")) # don't do twice...
makeOrganismPackage(pkgname="Sac.cer3",  # simplify typing!
  graphData=gd, organism="Saccharomyces cerevisiae",
  version="1.0.0", maintainer="Student <ph525x@harvardx.edu>",
  author="Student <ph525x@harvardx.edu>",
  destDir=".",
  license="Artistic-2.0")
```

At this point we have a folder structure in our
working folder that can support an installation.
```{r doinst}
install.packages("Sac.cer3", repos=NULL, type="source")
library(Sac.cer3)
Sac.cer3
columns(Sac.cer3)
genes(Sac.cer3)
```

<a name="devtools"></a>

## Using devtools

Let's use `r CRANpkg("usethis")` to create a package centered
around the `juxta` function.  We will learn about
roxygen too.

In the following code chunk, we use `usethis::create_package`
to generate a package folder structure in a temporary
directory.

<a name="create"></a>

```{r doit,message=FALSE}
curd = getwd()
kk = dir.create(tmpd <- tempfile()) # always new
setwd(tmpd)
library(usethis)
create_package("erbsViz", list(Imports="ggbio",
    Description="Juxtapose ESRRA binding and gene models",
    Title="demonstration package"))
setwd("erbsViz")
usethis::use_package("ERBS")    # add ERBS to the Imports field
```

<a name="source"></a>

We are now in the top folder of the package hierarchy.
We will create a text file with documentation (prefixed
by hashtag single quote) in the `r CRANpkg("roxygen2")` format.
The text vector `jlines` defines the content.  (Usually
you will create your documentation and function code
in a text editor.)
```{r continu}
# when using R markdown, we need to setwd in every chunk
setwd(tmpd)
setwd("erbsViz")
jlines = 
"#' render a chromosome and locations of ESRRA binding sites
#' @param chrname character(1) giving UCSC chromosome name
#' @examples
#' juxta()
#' @export
juxta = function (chrname='chr22', ...) 
{
    require(ERBS)
    data(HepG2)
    data(GM12878)
    require(ggbio)
    require(GenomicRanges)
    ac = as.character
    ap1 = autoplot(HepG2[which(ac(seqnames(HepG2))==chrname)])
    ap2 = autoplot(GM12878[which(ac(seqnames(GM12878))==chrname)])
    tracks(HepG2 = ap1, Bcell = ap2, ...)
}
"
```
We descend to the `R` folder and write our text file.
```{r desc}
setwd(tmpd)
setwd("erbsViz")
setwd("R")
writeLines(jlines, "jux.R")
```

<a name="docinst"></a>

We ascend to the package root and run `document` and `install`
to get access to the new package.
```{r dorest}
setwd(tmpd)
setwd("erbsViz")
devtools::document()
devtools::install()
setwd(curd) # go back to where you started
library(erbsViz)
juxta
```

In summary:

- We have used `writeLines` to generate a combination of
roxygen documentation and R code in the file `jux.R`, in
folder `R`, which was
created as a subfolder of folder `erbsViz`.  
- We then changed to the `erbsViz` folder and ran `devtools::document()` that
will translate the roxygen lines to statements in `NAMESPACE`
and to a .Rd file in `man`.
- We then ran `install()` to install the package in R, returned
to our initial folder, and used `library(erbsViz)` to attach
our new package.

Full development would include production of a vignette
and a suite of unit tests, giving
a meaningful basis for using `check()` in devtools.  
In a more extensive course,
these would be addressed, but you can learn about them yourself
by looking at any Bioconductor package.  A good example is 
`r Biocpkg("IRanges")`, which has extensive unit testing.
There are many other good examples.

<a name="wrapup"></a>

## Wrapping up

You are now in a good position to revisit the [motivations
and core values](http://genomicsclass.github.io/book/pages/biomotiv.html) section of the course pages.

In that section we described the concepts of
functional object-oriented programming and continuous integration
as they pertain to the development of highly reliable and
relevant software tools, used in a variety of subdomains of genome
science.  The software engineering and reproducible research
underpinnings of the Bioconductor project are at the heart
of its scientific impact.  For a nice set of remarks on
usability in upcoming cloud contexts, see this recent
[Nature Toolbox commentary](https://www.nature.com/articles/d41586-017-07833-1?utm_source=TWT_NatureNews&sf175334725=1).