# Test Protocol for the Benchmark Analysis

The revised design of the benchmark analysis pipeline asks the user for the path to the data files that make up the benchmark dataset. This input is fed into the pipeline, and the results (tool outputs and visualizations) are generated and also saved to disk.

Let us begin by setting up the working environment. The tools in question are Chipenrich, Broadenrich, Seq2pathway, Enrichr, and GREAT. We first install the tools and load their respective libraries.

The following function installs the R packages for Chipenrich, Broadenrich, and Seq2pathway. *Enrichr* and *GREAT* are available only as web interfaces, so their input and output have to be handled externally.
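The contents of `installPackagesBenchmark.R` are not reproduced in this notebook. As a minimal sketch of what such an installer might do, the code below installs only the packages that are missing, assuming the targets are the Bioconductor packages `chipenrich` (which, to our knowledge, also provides the `broadenrich()` function) and `seq2pathway`. The names `missingPackages()` and `installPackagesBenchmarkSketch()` are hypothetical.

```r
# Hypothetical sketch; not the actual contents of installPackagesBenchmark.R.
# Assumption: Broadenrich ships inside the chipenrich package.
benchmarkPackages <- c("chipenrich", "seq2pathway")

# Return the subset of `pkgs` whose namespaces cannot be loaded in this session.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

installPackagesBenchmarkSketch <- function(pkgs = benchmarkPackages) {
  todo <- missingPackages(pkgs)
  if (length(todo) == 0) {
    message("All benchmark packages are already installed.")
    return(invisible(character(0)))
  }
  # BiocManager can install both Bioconductor and CRAN packages.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo, update = FALSE, ask = FALSE)
  invisible(todo)
}
```

Checking with `requireNamespace()` first keeps reruns of the notebook fast, since packages that already load are skipped.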

## Installing Tool Packages

In [1]:
source("./protocolFunctions/installPackagesBenchmark.R")
installPackagesBenchmark()

Bioconductor version 3.4 (BiocManager 1.30.10), R 3.3.3 (2017-03-06)

Installing package(s) 'BiocVersion', 'seq2pathway'

“package ‘BiocVersion’ is not available (for R version 3.3.3)”
“dependencies ‘latticeExtra’, ‘mvtnorm’ are not available”
also installing the dependencies ‘bit’, ‘checkmate’, ‘bit64’, ‘blob’, ‘DEoptimR’, ‘pcaPP’, ‘Formula’, ‘acepack’, ‘htmlTable’, ‘iterators’, ‘DBI’, ‘RSQLite’, ‘fit.models’, ‘robustbase’, ‘rrcov’, ‘dynamicTreeCut’, ‘fastcluster’, ‘matrixStats’, ‘Hmisc’, ‘impute’, ‘foreach’, ‘doParallel’, ‘preprocessCore’, ‘GO.db’, ‘AnnotationDbi’, ‘robust’, ‘WGCNA’, ‘GSA’, ‘biomaRt’, ‘seq2pathway.data’


“unable to access index for repository https://bioconductor.org/packages/3.4/workflows/bin/macosx/mavericks/contrib/3.3:
  cannot download all files”


With the R-based tools installed (note, though, that the log above reports unavailable dependencies and an unreachable repository index), we now move on to assembling the test data. The BED files for the respective samples are available in a local folder. In the full version of the analysis pipeline, however, the user will supply the path to the folder that holds the benchmark data files.
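For the version that asks the user for the data folder, a hypothetical helper along these lines could resolve the path and collect the BED files. It is a sketch under assumptions: the name `getBenchmarkBedFiles()` is illustrative, files are assumed to carry a `.bed` extension, and each sample ID is assumed to be the file name without that extension (as with `GSM1847178.bed` in the log further down).

```r
# Hypothetical helper, not part of the protocolFunctions scripts.
# Resolve the benchmark data directory and return its BED files, named by sample ID.
getBenchmarkBedFiles <- function(dataDir = NULL) {
  # Prompt only when no path is supplied and a user is present to answer.
  if (is.null(dataDir)) {
    if (!interactive()) stop("No data directory supplied in a non-interactive session.")
    dataDir <- readline("Path to the folder holding the benchmark BED files: ")
  }
  dataDir <- trimws(dataDir)
  if (!dir.exists(dataDir)) stop("Directory not found: ", dataDir)
  # Match the .bed extension case-insensitively and return full paths.
  bedFiles <- list.files(dataDir, pattern = "\\.bed$", full.names = TRUE,
                         ignore.case = TRUE)
  if (length(bedFiles) == 0) stop("No .bed files found in ", dataDir)
  # Name each path by its sample ID, e.g. "GSM1847178" for GSM1847178.bed.
  names(bedFiles) <- sub("\\.bed$", "", basename(bedFiles), ignore.case = TRUE)
  bedFiles
}
```

Calling `getBenchmarkBedFiles()` with no argument prompts for the path in an interactive session; passing `"./testData/"` skips the prompt, and the helper accepts the path with or without a trailing slash.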

## Installing Support Packages

### Devtools

In [2]:
## 'devtools' provides a range of utilities for package development and installation. Let us install and load it as well.

BiocManager::install('devtools')
library(devtools)



Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.0 (2019-04-26)

Installing package(s) 'devtools'




The downloaded binary packages are in
	/var/folders/hm/c3_fjypn62v5xh5b5ygv267m0000gn/T//RtmpXXHc6s/downloaded_packages


Old packages: 'Rdpack', 'bibtex', 'broom', 'BH', 'DBI', 'DT', 'MASS',
  'RSQLite', 'SparseM', 'boot', 'caTools', 'digest', 'e1071', 'foreign',
  'latticeExtra', 'mime', 'pillar', 'quantreg', 'recipes', 'repr',
  'reticulate', 'rsconnect', 'vctrs'

Loading required package: usethis



### GenomicRanges

In [3]:
BiocManager::install('GenomicRanges')
library(GenomicRanges)

Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.0 (2019-04-26)

Installing package(s) 'GenomicRanges'




The downloaded binary packages are in
	/var/folders/hm/c3_fjypn62v5xh5b5ygv267m0000gn/T//RtmpXXHc6s/downloaded_packages


Old packages: 'Rdpack', 'bibtex', 'broom', 'BH', 'DBI', 'DT', 'MASS',
  'RSQLite', 'SparseM', 'boot', 'caTools', 'digest', 'e1071', 'foreign',
  'latticeExtra', 'mime', 'pillar', 'quantreg', 'recipes', 'repr',
  'reticulate', 'rsconnect', 'vctrs'

“package ‘GenomicRanges’ was built under R version 3.6.1”
Loading required package: stats4

Loading required package: BiocGenerics

Loading required package: parallel


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, get, grep, g

## Importing Data

The function **dataImportClean** takes from the user the path to the directory holding the BED files that make up the benchmark dataset. As shown below, the path must be given as a character string ending in a forward slash (`/`). Each file must contain the three fundamental fields of a genomic region, namely *chrom*, *start*, and *end*. The data are read in as GRanges objects for subsequent manipulation.
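Since `dataImportClean.R` is not shown, here is a minimal sketch, under stated assumptions, of reading one BED file and checking the three required fields. It assumes tab-separated files without `track` or `browser` header lines; `readBedRegions()` and `bedToGRanges()` are hypothetical names. The GRanges conversion uses `GenomicRanges::makeGRangesFromDataFrame()` with `starts.in.df.are.0based = TRUE`, because BED starts are 0-based while GRanges coordinates are 1-based.

```r
# Hypothetical sketch; dataImportClean.R itself is not shown in this notebook.
# Read one BED file, keep the three required fields and validate them.
readBedRegions <- function(path) {
  bed <- read.table(path, header = FALSE, sep = "\t", quote = "",
                    comment.char = "#", stringsAsFactors = FALSE)
  if (ncol(bed) < 3) stop(path, ": expected at least chrom, start and end columns")
  regions <- data.frame(chrom = as.character(bed[[1]]),
                        start = as.integer(bed[[2]]),
                        end   = as.integer(bed[[3]]),
                        stringsAsFactors = FALSE)
  if (anyNA(regions$start) || anyNA(regions$end) ||
      any(regions$end < regions$start)) {
    stop(path, ": start/end must be integers with end >= start")
  }
  regions
}

# Optional conversion to GRanges when GenomicRanges is installed.
# BED starts are 0-based, so they are shifted to the 1-based GRanges convention.
bedToGRanges <- function(regions) {
  if (!requireNamespace("GenomicRanges", quietly = TRUE)) {
    stop("GenomicRanges is not installed")
  }
  GenomicRanges::makeGRangesFromDataFrame(regions, seqnames.field = "chrom",
                                          starts.in.df.are.0based = TRUE)
}
```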

In [12]:
source("./protocolFunctions/dataImportClean.R")
dataImportClean("./testData/")

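The imported samples are held in the list variable `samplesInBED`. Below, it is reloaded from `samplesInBED.rds` (together with `ChIPSeqSamples` from `ChIPSeqSamples.rds`) so that we can look at the data.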

In [13]:
samplesInBED <- readRDS("samplesInBED.rds")
ChIPSeqSamples <- readRDS("ChIPSeqSamples.rds")

In [14]:
samplesInBED

GRangesList object of length 10:
$GSM1847178 
GRanges object with 17638 ranges and 0 metadata columns:
          seqnames            ranges strand
             <Rle>         <IRanges>  <Rle>
      [1]     chr1     875900-876000      *
      [2]     chr1     878500-878600      *
      [3]     chr1     901700-901800      *
      [4]     chr1     941900-942000      *
      [5]     chr1     999500-999800      *
      ...      ...               ...    ...
  [17634]     chrY 58991300-58991600      *
  [17635]     chrY 58993100-58993300      *
  [17636]     chrY 59000500-59000600      *
  [17637]     chrY 59013700-59013800      *
  [17638]     chrY 59020600-59020800      *

...
<9 more elements>
-------
seqinfo: 64 sequences from hg38 genome; no seqlengths

So we have 10 samples, held as a GRangesList with one GRanges object per sample. These will be the input to the tools and the basis for comparison.
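To get a quick feel for how the samples compare, a hypothetical helper such as the one below tabulates, per sample, the number of regions, the median region width, and the number of chromosomes covered. It is written in base R against a named list of `chrom`/`start`/`end` data frames with 0-based BED starts, not against the GRangesList itself; on a GRangesList, `lengths(samplesInBED)` gives the per-sample region counts directly.

```r
# Hypothetical helper: one row per sample with simple region statistics.
# Input: a named list of data frames with chrom/start/end columns, where starts
# are 0-based (BED convention), so width = end - start.
summarizeSamples <- function(sampleList) {
  data.frame(
    sample       = names(sampleList),
    nRegions     = vapply(sampleList, nrow, integer(1), USE.NAMES = FALSE),
    medianWidth  = vapply(sampleList,
                          function(d) as.numeric(median(d$end - d$start)),
                          numeric(1), USE.NAMES = FALSE),
    nChromosomes = vapply(sampleList, function(d) length(unique(d$chrom)),
                          integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}
```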

## Executing Chipenrich, Broadenrich, Seq2pathway

The following function attempts to save the results as R objects and, at the same time, removes them from active memory to conserve RAM.
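The save-then-free pattern described above can be sketched as follows. This is not the code of `executeChipenrichBroadenrichSeq2pathway.R`; `runAndStore()` is a hypothetical name, and `toolFun` stands for any function that takes one sample and returns a result object.

```r
# Hypothetical sketch of the save-then-free pattern.
# Runs `toolFun` on each named sample, writes each result to its own .rds file,
# and drops the in-memory copy before moving on to the next sample.
runAndStore <- function(samples, toolFun, toolName, outDir) {
  dir.create(outDir, showWarnings = FALSE, recursive = TRUE)
  outFiles <- character(0)
  for (sampleId in names(samples)) {
    result <- toolFun(samples[[sampleId]])
    outFile <- file.path(outDir, paste0(toolName, "_", sampleId, ".rds"))
    saveRDS(result, outFile)
    outFiles[sampleId] <- outFile
    # Remove the result from memory; gc() lets R reclaim the space.
    rm(result)
    invisible(gc(verbose = FALSE))
  }
  outFiles
}
```

Writing one `.rds` file per tool and sample means that a failure partway through, such as the one below, still leaves the completed results on disk.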

In [15]:
source("./protocolFunctions/executeChipenrichBroadenrichSeq2pathway.R")
executeChipenrichBroadenrichSeq2pathway("./testData/")

Reading peaks from ./testData/GSM1847178.bed



ERROR: Error in seq2pathway_run(regenerated_samples[[i]]): could not find function "seq2pathway_run"
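The error means that no function named `seq2pathway_run` was visible when the script called it. To our knowledge, the main entry point of the seq2pathway package is `runseq2pathway()`, so the call inside `executeChipenrichBroadenrichSeq2pathway.R` is worth checking against the installed package (for example with `ls("package:seq2pathway")` after `library(seq2pathway)`); the install log above also reports unavailable dependencies, which could stop the package from loading at all. A hypothetical helper like `getToolFunction()` below looks a function up in a package namespace up front, so that a misspelled name or a failed installation surfaces as a clear message before any sample is processed.

```r
# Hypothetical helper: fetch an exported function from a package, failing early
# with a readable message if the package or the function is unavailable.
getToolFunction <- function(pkg, fname) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed or failed to load.", call. = FALSE)
  }
  if (!fname %in% getNamespaceExports(pkg)) {
    stop("'", fname, "' is not exported by '", pkg, "'.", call. = FALSE)
  }
  getExportedValue(pkg, fname)
}
```

For instance, resolving `getToolFunction("seq2pathway", "runseq2pathway")` once at the top of the execution script would either return the function or stop with a message naming the missing piece.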
