# R libraries and Bioconductor

## Packages and Libraries

R is at heart a collection of 'packages'. There is a 'base' system that contains the truly basic commands, such as the assignment operator `<-` or the function `c()` that creates a vector. In addition, there are 'standard R' packages that are included whenever you install R, whether you run it through the R kernel in a Jupyter notebook, at the command line, or in RStudio. (I've shown some examples of these different ways to run R in class.)

### Libraries

Many packages, even those included in [standard R](https://www.r-project.org/), need to be 'loaded' before you can use them. In other words, they exist on your computer (or in your container), but the R session doesn't know about them yet. This is deliberate: if R loaded every installed package at startup, it would use computer memory (RAM) to remember all their functions and variables. If all the available packages were loaded, you might not have any RAM left!

A consequence of this is that you often have to tell R explicitly that you want to use a particular package. You do that using `library`. Let's read in the titanic data set to have something to play with.



In [None]:
titanic <- read.csv("titanic.csv")

In [None]:
head(titanic)

There is a cool R function that will allow us to look at some random rows from a data frame. It's called `sample_n`. Let's try it:

In [None]:
sample_n(titanic, 10)

Oops. It turns out `sample_n` lives in the `dplyr` package. `dplyr` is installed in your container, but it hasn't been loaded, so R can't find the function. Let's tell R we want to use it:

In [None]:
library(dplyr)

In [None]:
sample_n(titanic, 10)
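To see what 'loading' actually changes, here is a minimal sketch using `tools`, a package that ships with every R installation but is not attached by default (this assumes a fresh session in which nothing else has attached it). `search()` lists the packages R is currently looking in, and the `pkg::fun` form calls a single function from an installed package without attaching the whole package:

```r
# 'tools' is installed with R but, in a fresh session, not attached
"package:tools" %in% search()    # FALSE before library()
exists("file_ext")                # FALSE: the function isn't visible yet

# Option 1: call one function with pkg::fun, no attaching needed
tools::file_ext("titanic.csv")    # "csv"

# Option 2: attach the package so all of its functions are visible
library(tools)
"package:tools" %in% search()    # TRUE after library()
ext <- file_ext("titanic.csv")
ext                               # "csv"
```

The same `::` form works for `dplyr`: `dplyr::sample_n(titanic, 10)` runs even if you never called `library(dplyr)`.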

### Installed and installing packages

Now, `dplyr` is actually not part of standard R; it's *installed* separately. There are a multitude of R packages out there, and anyone can write one (yes, even you!!!). They are shared with the public through [CRAN](https://cran.r-project.org/), the Comprehensive R Archive Network. To be listed on CRAN, a package has to meet specific criteria for documentation, testing, etc.

You can check which packages are installed with `installed.packages()`:

In [None]:
installed.packages()
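`installed.packages()` returns a large character matrix with one row per installed package, which is a lot to scroll through. A sketch of narrowing it down with base R only:

```r
ip <- installed.packages()
dim(ip)                                   # rows = packages, columns = fields
head(ip[, c("Package", "LibPath", "Version")])

# Is a particular package installed? (the row names are package names)
"stats" %in% rownames(ip)                 # TRUE: stats ships with R
"dplyr" %in% rownames(ip)

# Version of one installed package
packageVersion("stats")
```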

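You can install new packages from CRAN with `install.packages()`, and remove them again with `remove.packages()`. Here we install the `auk` package and then remove it: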


In [None]:
install.packages("auk")

In [None]:
remove.packages("auk")
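Re-running `install.packages()` every time a notebook runs is slow, and it reinstalls packages you already have. A common pattern is to install only when `requireNamespace()` reports that a package can't be loaded. The helper below, `install_if_missing()`, is a hypothetical name for illustration, not a function from any package:

```r
# Hypothetical helper: install a CRAN package only if it isn't already available
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE (instead of throwing an error)
  # when the package isn't installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# 'stats' ships with R, so this returns TRUE without downloading anything
ok <- install_if_missing("stats")
ok
```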

## Bioconductor

CRAN is home to many, many R packages. But when it comes to bioinformatics in R, there is a whole other world out there: [Bioconductor](https://bioconductor.org/). Bioconductor is a comprehensive collection of packages for processing and analyzing high-throughput genomic data, including sequencing data. In this course, we will use the Bioconductor package `DESeq2` to perform differential expression analysis. That step comes at the end of the pipeline, after QC, clipping and trimming, aligning, and counting.

### Installing Bioconductor packages

Bioconductor has its own installation procedure (and its own criteria for documentation, testing, etc.), separate from CRAN. Let's have a look at the page for [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html).

In [None]:
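# biocLite() is deprecated; current Bioconductor releases install packages with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("DESeq2", "airway"))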

### DESeq2 and S4 Objects

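We'll walk through an example using a sample data set called `airway`, from the Bioconductor package of the same name: RNA-seq counts from airway smooth muscle cell lines, untreated or treated with dexamethasone. `airway` is a `RangedSummarizedExperiment`, a subclass of `SummarizedExperiment`. `SummarizedExperiment` is the basis for many of the objects used in Bioconductor packages, including the `DESeqDataSet` that DESeq2 works with. Like most Bioconductor classes, it is an S4 class: its parts are stored in named slots and are meant to be read with accessor functions such as `assays()` and `colData()`.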

In [None]:
library("airway")
data("airway")
se <- airway

In [None]:
str(se)
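`str(se)` shows that `se` is built from slots (the entries prefixed with `@`). To get a feel for how S4 classes are put together, here is a toy class defined with base R's `methods` package. `ToyExperiment` and `toyCounts()` are hypothetical names; the class only mimics the idea of a count matrix paired with a sample table and is not how `SummarizedExperiment` is actually defined:

```r
library(methods)

# Hypothetical S4 class: a count matrix plus one row of sample info per column
setClass("ToyExperiment",
         slots = c(counts = "matrix", sampleInfo = "data.frame"),
         validity = function(object) {
           # the sample table must describe exactly the columns of the matrix
           if (ncol(object@counts) != nrow(object@sampleInfo)) {
             return("need one row of 'sampleInfo' per column of 'counts'")
           }
           TRUE
         })

# Accessor generic and method, in the spirit of assays() / colData()
setGeneric("toyCounts", function(x) standardGeneric("toyCounts"))
setMethod("toyCounts", "ToyExperiment", function(x) x@counts)

toy <- new("ToyExperiment",
           counts = matrix(1:6, nrow = 2,
                           dimnames = list(c("geneA", "geneB"),
                                           c("s1", "s2", "s3"))),
           sampleInfo = data.frame(dex = c("untrt", "trt", "untrt"),
                                   row.names = c("s1", "s2", "s3")))

slotNames(toy)   # "counts" "sampleInfo"
toyCounts(toy)   # use the accessor rather than reaching into toy@counts

# The validity check rejects an object whose parts don't line up
bad <- try(new("ToyExperiment",
               counts = matrix(1:4, nrow = 2),
               sampleInfo = data.frame(dex = "untrt")),
           silent = TRUE)
inherits(bad, "try-error")   # TRUE
```

The validity function is the S4 mechanism that keeps the pieces of an object consistent, which is why subsetting `se[1:5, 1:3]` later on trims the count matrix and the sample table together.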

[This tutorial](https://bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html) gives a great introduction to the SummarizedExperiment object. We'll take a peek, and then move on to DESeq2.

In [None]:
assays(se)   # all assays stored in the object (airway has a single one, "counts")

In [None]:
assays(se)$counts   # genes x samples count matrix; same as assay(se, "counts")

In [None]:
rowRanges(se)   # genomic ranges for each row (gene)

In [None]:
colData(se)   # sample information: one row per column of the count matrix

In [None]:
metadata(se)   # free-form list of experiment-level information

In [None]:
# metadata() is just a list, so we can add our own elements to it

metadata(se)$formula <- counts ~ dex + albut

metadata(se)

In [None]:
# subset the first five genes (rows) and first three samples (columns)
se[1:5, 1:3]

In [None]:
assays(se[1:5,1:3])$counts


In [None]:
library("DESeq2")

# wrap the SummarizedExperiment in a DESeqDataSet and attach the design formula
dds <- DESeqDataSet(se, design = ~ cell + dex)
dds
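The design formula `~ cell + dex` asks DESeq2 to estimate a treatment (`dex`) effect while accounting for baseline differences between cell lines. A rough way to see what such a formula means is base R's `model.matrix()` on a toy sample table. The cell line names `c1` and `c2` are made up, and DESeq2 builds its own design matrix internally, so treat this as an illustration of the formula rather than of DESeq2's internals:

```r
# Toy sample table: two made-up cell lines, each untreated and treated
coldata <- data.frame(
  cell = factor(c("c1", "c1", "c2", "c2")),
  dex  = factor(c("untrt", "trt", "untrt", "trt"), levels = c("untrt", "trt"))
)

# One row per sample, one column per coefficient
mm <- model.matrix(~ cell + dex, data = coldata)
colnames(mm)   # "(Intercept)" "cellc2" "dextrt"
mm
```

The `dextrt` column is 1 for treated samples and 0 otherwise, so its coefficient is the treated-vs-untreated effect after the `cellc2` column has absorbed the difference between cell lines.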



In [None]:
# remove genes with fewer than 10 reads in total (summed across all samples)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
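The filter keeps a gene only if its counts, summed across all samples, reach 10. Here is the same logic on a small made-up count matrix, using base R only. According to the DESeq2 vignette, this kind of pre-filtering mainly saves memory and speeds things up, since `results()` applies its own independent filtering later:

```r
# Made-up counts: 4 genes x 3 samples
counts <- matrix(c(  0,  0,   1,
                     5,  3,   4,
                   100, 80, 120,
                     2,  2,   2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

totals <- rowSums(counts)   # gene1 = 1, gene2 = 12, gene3 = 300, gene4 = 6
keep <- totals >= 10        # FALSE TRUE TRUE FALSE
filtered <- counts[keep, , drop = FALSE]
filtered                    # only gene2 and gene3 remain
```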

In [None]:
# Specify the reference level: make "untrt" (untreated) the baseline for dex

dds$dex <- factor(dds$dex, levels = c("untrt", "trt"))

# alternative
dds$dex <- relevel(dds$dex, ref = "untrt")
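Why set the reference level at all? In `airway`, the treatment column is `dex`, with the values `trt` and `untrt`. When R creates a factor it orders the levels alphabetically by default, so `trt` would come first and become the baseline, and fold changes would be reported as untreated relative to treated. A base R sketch of both approaches on a plain factor:

```r
dex <- factor(c("untrt", "trt", "untrt", "trt"))
levels(dex)                            # "trt" "untrt": alphabetical, trt is the reference

# Approach 1: give the full level order explicitly
dex1 <- factor(dex, levels = c("untrt", "trt"))

# Approach 2: move one level to the front, keeping the others in order
dex2 <- relevel(dex, ref = "untrt")

levels(dex1)                           # "untrt" "trt"
identical(levels(dex1), levels(dex2))  # TRUE
```

`relevel()` is handy when a factor has many levels and you only care which one is the reference; `factor(..., levels = ...)` gives full control over the order.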


In [None]:
dds <- DESeq(dds)
# by default, results() tests the last term in the design (dex): last level vs. reference level
res <- results(dds)
res
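`res` has one row per gene, with columns including `baseMean`, `log2FoldChange`, `pvalue` and `padj` (the Benjamini-Hochberg adjusted p-value). `padj` can be `NA`, for example for genes removed by independent filtering, so it is worth excluding those explicitly when selecting significant genes. A sketch on a made-up data frame with the same column names; with real output you would start from `as.data.frame(res)`:

```r
# Made-up stand-in for as.data.frame(res); values are illustrative only
res_df <- data.frame(
  baseMean       = c(500, 20, 1500, 80),
  log2FoldChange = c(2.1, -0.3, -1.6, 0.9),
  padj           = c(0.001, 0.8, 0.02, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Keep genes with a non-missing adjusted p-value below 0.05
sig <- res_df[!is.na(res_df$padj) & res_df$padj < 0.05, ]

# Most significant first
sig <- sig[order(sig$padj), ]
sig   # geneA, then geneC
```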