# What is Bioconductor

* An open-source, open-development software repository of R packages for bioinformatics, with some rules and guiding principles;
* It has emphasized reproducible research since its start, and has been an early adopter and driver of tools for it;
* Why? Productivity and flexibility;
* [2004 Bioconductor: open software development for computational biology and bioinformatics](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2004-5-10-r80);
* [2015 Orchestrating high-throughput genomic analysis with Bioconductor](https://www.nature.com/articles/nmeth.3252)

---
# Installing Bioconductor


In [4]:
source("http://www.bioconductor.org/biocLite.R")

Bioconductor version 3.5 (BiocInstaller 1.26.1), ?biocLite for help
A newer version of Bioconductor is available for this version of R,
  ?BiocUpgrade for help


In [5]:
biocLite()

BioC_mirror: https://bioconductor.org
Using Bioconductor 3.5 (BiocInstaller 1.26.1), R 3.4.4 (2018-03-15).
installation path not writeable, unable to update packages: codetools, lattice,
  MASS, spatial
Old packages: 'blob', 'devtools', 'knitr', 'lambda.r', 'matrixStats', 'mgcv',
  'openssl', 'packrat', 'pbdZMQ', 'plogr', 'Rcpp', 'RCurl', 'repr', 'reshape2',
  'rlang', 'R.oo', 'RSQLite', 'stringi', 'stringr', 'testthat', 'tibble',
  'VGAM', 'viridisLite', 'withr', 'XML', 'yaml', 'zoo'


In [6]:
biocValid()
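
The cells above use `biocLite()`, which matches the Bioconductor 3.5 / R 3.4 setup shown in the output. On current R versions, Bioconductor is installed through the `BiocManager` package from CRAN instead. A minimal sketch of the equivalent steps, assuming a recent R and network access (the choice of `IRanges` and `GenomicRanges` as packages to install is only an illustration):

```r
# Install the BiocManager helper from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install Bioconductor packages (illustrative choice of packages)
BiocManager::install(c("IRanges", "GenomicRanges"))

# Check that installed packages are consistent with the Bioconductor release
BiocManager::valid()
```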

---
# R base types

* A reminder of things I didn't know:


In [35]:
.Machine$integer.max
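
The integer limit matters here because the interval coordinates shown later are stored as R integers. A minimal base-R sketch of how that limit behaves (nothing beyond a standard R installation is assumed):

```r
# Largest value an R integer can hold (32-bit signed)
max_int <- .Machine$integer.max
max_int                      # 2147483647

# A bare number literal is a double; the L suffix makes an integer
class(1)                     # "numeric"
class(1L)                    # "integer"

# Integer arithmetic past the limit yields NA (R also emits a warning)
overflow <- suppressWarnings(max_int + 1L)
is.na(overflow)              # TRUE

# Mixing with a double promotes the result to double, so there is no overflow
bigger <- max_int + 1
is.double(bigger)            # TRUE
```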

---
# GRanges - Overview

* Data structure for storing genomic intervals in R;
* It is fast and efficient;
* Many entities in genomics are intervals or sets of intervals (of integers): promoters, genes, SNPs, CpG islands, ...; sequencing reads, mapped and processed;
* Many tasks involve relating sets of intervals to each other:
    * Which promoters contain SNPs?
    * Which TF binding sites overlap a promoter?
    * Which genes are covered by sequencing reads?

> This functionality lives in the GenomicRanges and IRanges packages.

* [2013 Software for Computing and Annotating Genomic Ranges](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003118);
* Much of the functionality overlaps with bedtools.
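
As a rough illustration of the "which promoters contain SNPs?" question, the sketch below builds two small `GRanges` objects and relates them with `overlapsAny()`. All coordinates and gene names are made up for the example, and the variable names `prom` and `snps` are hypothetical.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical promoters on chr1 (coordinates and gene names are made up)
prom <- GRanges(seqnames = "chr1",
                ranges = IRanges(start = c(100, 500, 900), width = 200),
                gene = c("geneA", "geneB", "geneC"))

# Hypothetical SNPs, each a width-1 interval
snps <- GRanges(seqnames = "chr1",
                ranges = IRanges(start = c(150, 950, 2000), width = 1))

# TRUE for every promoter that overlaps at least one SNP
has_snp <- overlapsAny(prom, snps)
prom$gene[has_snp]   # "geneA" "geneC"
```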

---
# IRanges - basic usage

In [1]:
library(IRanges)

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching packag

* An IRanges object is a vector of integer intervals

In [2]:
ir1 <- IRanges(start=c(1,3,5), end=c(3,5,7))
print(ir1)

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         3         5         3
  [3]         5         7         3


* Only two of the three arguments (`start`, `end`, `width`) are needed; the third is inferred

In [5]:
ir2 <- IRanges(start=c(1,3,5), width=3)
print(ir2)

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         3         5         3
  [3]         5         7         3
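
The same inference works for any pair of arguments. A small sketch, re-creating the ranges so it runs independently (the names `ir_a` and `ir_b` are only for this example):

```r
suppressPackageStartupMessages(library(IRanges))

# Give end and width; start is inferred as end - width + 1
ir_a <- IRanges(end = c(3, 5, 7), width = 3)
start(ir_a)         # 1 3 5

# Give start and end; width is inferred as end - start + 1
ir_b <- IRanges(start = c(1, 3, 5), end = c(3, 5, 7))
width(ir_b)         # 3 3 3

# Both constructions describe the same three ranges
all(ir_a == ir_b)   # TRUE
```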


In [6]:
start(ir1)

In [13]:
width(ir2) <- 1 # resizes every range, keeping its start fixed
print(ir2)

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         1         1
  [2]         3         3         1
  [3]         5         5         1


In [14]:
names(ir1) <- paste('A', 1:3, sep='')
print(ir1)

IRanges object with 3 ranges and 0 metadata columns:
         start       end     width
     <integer> <integer> <integer>
  A1         1         3         3
  A2         3         5         3
  A3         5         7         3


In [15]:
dim(ir1) # vectors have no dimensions

NULL

In [17]:
length(ir1)

* Select by index or name

In [19]:
ir1[1]

IRanges object with 1 range and 0 metadata columns:
         start       end     width
     <integer> <integer> <integer>
  A1         1         3         3

In [20]:
ir1['A1']

IRanges object with 1 range and 0 metadata columns:
         start       end     width
     <integer> <integer> <integer>
  A1         1         3         3
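
Because an IRanges object behaves like an ordinary R vector, the other usual subsetting forms apply as well. A short sketch with the named ranges re-created under the hypothetical name `ir_named`:

```r
suppressPackageStartupMessages(library(IRanges))

ir_named <- IRanges(start = c(1, 3, 5), end = c(3, 5, 7))
names(ir_named) <- paste0("A", 1:3)

# Several names at once
by_name <- ir_named[c("A1", "A3")]

# Negative index drops an element
dropped <- ir_named[-2]

# Logical vector built from the ranges themselves
late <- ir_named[start(ir_named) >= 3]
names(late)   # "A2" "A3"
```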

* Combine IRanges vectors with `c()`

In [21]:
c(ir1, ir2)

IRanges object with 6 ranges and 0 metadata columns:
         start       end     width
     <integer> <integer> <integer>
  A1         1         3         3
  A2         3         5         3
  A3         5         7         3
             1         1         1
             3         3         1
             5         5         1

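* Normal IRanges: a set of ranges that is sorted and has no overlapping or adjacent ranges. The `ir` below is not normal: its first two ranges overlap and its last two are adjacent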

In [25]:
ir <- IRanges(start=c(1,3,7,9), end=c(4,4,8,10))
ir

IRanges object with 4 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         4         4
  [2]         3         4         2
  [3]         7         8         2
  [4]         9        10         2
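
One way to get a normal set of ranges out of `ir` is `reduce()`, which merges overlapping and adjacent ranges. The sketch below re-creates `ir` so it runs independently and uses `isNormal()` from IRanges as the check:

```r
suppressPackageStartupMessages(library(IRanges))

ir <- IRanges(start = c(1, 3, 7, 9), end = c(4, 4, 8, 10))
isNormal(ir)        # FALSE: ranges 1 and 2 overlap, ranges 3 and 4 touch

# Merge overlapping and adjacent ranges into a minimal sorted set
ir_norm <- reduce(ir)
start(ir_norm)      # 1 7
end(ir_norm)        # 4 10
isNormal(ir_norm)   # TRUE
```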

* Resize an IRanges with `resize()`; `fix` selects which point of each range stays in place

In [26]:
resize(ir, width=1, fix='start')

IRanges object with 4 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         1         1
  [2]         3         3         1
  [3]         7         7         1
  [4]         9         9         1

In [27]:
resize(ir, width=1, fix='center')

IRanges object with 4 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         2         2         1
  [2]         3         3         1
  [3]         7         7         1
  [4]         9         9         1
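
For completeness, a sketch of the remaining anchor, `fix='end'`, plus a resize that grows the ranges instead of shrinking them (`ir` is re-created so the block runs independently):

```r
suppressPackageStartupMessages(library(IRanges))

ir <- IRanges(start = c(1, 3, 7, 9), end = c(4, 4, 8, 10))

# Keep each end in place and shrink to width 1
at_end <- resize(ir, width = 1, fix = "end")
start(at_end)   # 4 4 8 10

# Keep each start in place and set every width to 3
grown <- resize(ir, width = 3, fix = "start")
end(grown)      # 3 5 9 11
```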

In [28]:
ir1 <- IRanges(start=c(1,3,5), width=1)
ir2 <- IRanges(start=c(4,5,6), width=1)
ir1
ir2

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         1         1
  [2]         3         3         1
  [3]         5         5         1

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         4         4         1
  [2]         5         5         1
  [3]         6         6         1

* `union()` of two IRanges is equivalent to concatenating them with `c()` and then calling `reduce()`

In [29]:
union(ir1, ir2)

IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         1         1
  [2]         3         6         4

In [33]:
reduce(c(ir1, ir2))

IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         1         1
  [2]         3         6         4

In [34]:
intersect(ir1, ir2)

IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         5         5         1
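
These set operations treat each IRanges as a set of integer positions. The sketch below checks the `union()` / `reduce(c())` equivalence directly and adds `setdiff()`, which is not shown above but follows the same set logic (the ranges are re-created here under the hypothetical names `a` and `b`):

```r
suppressPackageStartupMessages(library(IRanges))

a <- IRanges(start = c(1, 3, 5), width = 1)   # positions 1, 3, 5
b <- IRanges(start = c(4, 5, 6), width = 1)   # positions 4, 5, 6

# union() gives the same ranges as concatenating and then reducing
u <- union(a, b)
r <- reduce(c(a, b))
all(u == r)        # TRUE

# Positions covered by a but not by b: 1 and 3
only_a <- setdiff(a, b)
start(only_a)      # 1 3

# Positions covered by b but not by a: 4 and 6
only_b <- setdiff(b, a)
start(only_b)      # 4 6
```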

* `findOverlaps()` allows us to relate two sets of IRanges to each other

In [35]:
ir1 <- IRanges(start=c(1,4,8), end=c(3,7,10))
ir2 <- IRanges(start=c(3,4), width=3)
ir1
ir2

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         4         7         4
  [3]         8        10         3

IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         3         5         3
  [2]         4         6         3

In [36]:
ov <- findOverlaps(ir1, ir2) # query = ir1, subject = ir2
queryHits(ov) # index into ir1 for each overlapping pair


In [37]:
unique(queryHits(ov)) # ranges in ir1 that overlap at least one range in ir2


In [38]:
args(findOverlaps)
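
To make the query/subject pairing concrete, here is a sketch that re-creates `ir1` and `ir2`, inspects both sides of the hits, and tries one of the arguments listed by `args(findOverlaps)`, namely `type`. The name `ov_within` is only for this example.

```r
suppressPackageStartupMessages(library(IRanges))

ir1 <- IRanges(start = c(1, 4, 8), end = c(3, 7, 10))
ir2 <- IRanges(start = c(3, 4), width = 3)

# Every overlapping (query, subject) pair; query = ir1, subject = ir2
ov <- findOverlaps(ir1, ir2)
length(ov)   # 3 pairs: ir1[1]-ir2[1], ir1[2]-ir2[1], ir1[2]-ir2[2]

# Indices into each side, one entry per pair
q <- queryHits(ov)
s <- subjectHits(ov)

# The actual ranges taking part in each pair, side by side
ir1[q]
ir2[s]

# type = "within": keep only hits where the query lies entirely inside the subject.
# With ir2 as the query, only ir2[2] (4-6) fits inside ir1[2] (4-7).
ov_within <- findOverlaps(ir2, ir1, type = "within")
length(ov_within)   # 1
```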

In [40]:
countOverlaps(ir1, ir2) # for each range in ir1 (query), how many ranges in ir2 (subject) does it overlap?

In [41]:
ir1
ir2

IRanges object with 3 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         4         7         4
  [3]         8        10         3

IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         3         5         3
  [2]         4         6         3

In [42]:
nearest(ir1, ir2) # for each range in ir1, the index of the nearest range in ir2
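
A sketch of what these two calls return for the ranges above, re-created here so the block runs independently. Only results that follow unambiguously from the coordinates are annotated; where two subject ranges are equally close, which one `nearest()` reports is left unstated.

```r
suppressPackageStartupMessages(library(IRanges))

ir1 <- IRanges(start = c(1, 4, 8), end = c(3, 7, 10))
ir2 <- IRanges(start = c(3, 4), width = 3)

# One count per range in ir1
counts <- countOverlaps(ir1, ir2)
counts   # 1 2 0

# The same counts can be recovered by tabulating the query side of findOverlaps()
ov <- findOverlaps(ir1, ir2)
tabulate(queryHits(ov), nbins = length(ir1))   # 1 2 0

# One index into ir2 per range in ir1
nn <- nearest(ir1, ir2)
nn[3]    # 2: ir1[3] (8-10) overlaps nothing, and ir2[2] (4-6) is closer than ir2[1] (3-5)

# Gap sizes from ir1[3] to each range in ir2
gaps_to_ir2 <- distance(rep(ir1[3], length(ir2)), ir2)
gaps_to_ir2   # 2 1
```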

---
# GenomicRanges - GRanges