# Speed comparison between PyPairs and the R version - R version

Here we run the sandbag step of the original Pairs method (the `sandbag` function from scran) on the Oscope dataset for increasingly large random subsets of genes and record the execution time of each run. All timings are elapsed times on a single core. For the Python counterpart and its results, see: [2.3 Differences in code - Python](./2.3%20Differences%20in%20code%20-%20Python.ipynb)

## Necessary Imports

In [1]:
# Loading required packages
library(scran)
library(microbenchmark)

Loading required package: BiocParallel
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowM

## Loading the Oscope dataset

In [2]:
# Loading Oscope dataset
oscope.gencounts <- read.csv("./../../data/GSE64016_H1andFUCCI_normalized_EC_human.csv")

# Getting gene names
oscope.gennames <- oscope.gencounts[, grepl( "X" , names( oscope.gencounts ) )]

# Defining sorted dataset where cell cycle phases are annotated for sandbag
oscope.gencounts.sorted <- oscope.gencounts[, grepl( "S_|G1_|G2_" , names( oscope.gencounts ) )]
oscope.gencounts.sorted.matrix <- as.matrix(oscope.gencounts.sorted)
rownames(oscope.gencounts.sorted.matrix) <- oscope.gennames

# Getting annotation of cell cycle for samples
is.G1 <- grepl("G1_", names(oscope.gencounts.sorted))
is.S <- grepl("S_", names(oscope.gencounts.sorted))
is.G2M <- grepl("G2_", names(oscope.gencounts.sorted))
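
The phase masks above rely purely on column-name prefixes. As a minimal sanity check, sketched here on hypothetical column names (the real ones come from the CSV header), one can confirm that every column matches exactly one phase. The `toy.*` variables are illustrative only and are named so that they do not overwrite the masks defined above.

```r
# Hypothetical column names standing in for names(oscope.gencounts.sorted)
toy.names <- c("G1_Cell1", "G1_Cell2", "S_Cell1", "S_Cell2", "G2_Cell1")

# Same prefix matching as in the cell above
toy.is.G1  <- grepl("G1_", toy.names)
toy.is.S   <- grepl("S_", toy.names)
toy.is.G2M <- grepl("G2_", toy.names)

# Each cell should be assigned to exactly one phase
phase.hits <- toy.is.G1 + toy.is.S + toy.is.G2M
stopifnot(all(phase.hits == 1))

# Number of cells per phase
phase.counts <- c(G1 = sum(toy.is.G1), S = sum(toy.is.S), G2M = sum(toy.is.G2M))
print(phase.counts)
```

Replacing `toy.names` with `names(oscope.gencounts.sorted)` would run the same check on the actual data.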

In [3]:
dim(oscope.gencounts.sorted.matrix)[1]

## Running sandbag with increasing number of genes

In [9]:
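# Time sandbag on random gene subsets of increasing size
n.genes <- nrow(oscope.gencounts.sorted.matrix)
for (g in c(10, 100, 500, 1000, 5000, 10000, 19000)) {
    # Draw g distinct genes at random (no seed is set, so subsets differ between runs)
    sub <- sample(seq_len(n.genes), g, replace = FALSE)
    m <- oscope.gencounts.sorted.matrix[sub, ]
    # Measure the elapsed (wall-clock) time of the sandbag call only
    ptm <- proc.time()
    oscope.predictor <- sandbag(m, list(G1 = is.G1, S = is.S, G2M = is.G2M), fraction = 0.65)
    time_sandbag <- proc.time() - ptm
    print(time_sandbag[["elapsed"]])
}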

[1] 0.01
[1] 0.08
[1] 1.49
[1] 6.37
[1] 180.56
[1] 803.64
[1] 2761
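
The values printed above are elapsed seconds for 10, 100, 500, 1000, 5000, 10000 and 19000 genes, in that order. Because the Pairs method works on pairs of genes, one would expect the runtime to grow roughly with the square of the number of genes. The following sketch checks this against the measured timings by fitting a line on a log-log scale. The data frame and variable names are illustrative, and each timing is a single run, so the estimate is only indicative. The smallest subsets are close to timer resolution and will pull the overall slope down.

```r
# Timings copied from the benchmark output above (elapsed seconds, one run each)
timings <- data.frame(
    genes   = c(10, 100, 500, 1000, 5000, 10000, 19000),
    elapsed = c(0.01, 0.08, 1.49, 6.37, 180.56, 803.64, 2761)
)

# Number of possible gene pairs for each subset size: g * (g - 1) / 2
timings$pairs <- choose(timings$genes, 2)

# Fit elapsed ~ genes^k on a log-log scale; the slope estimates k
fit <- lm(log(elapsed) ~ log(genes), data = timings)
scaling.exponent <- unname(coef(fit)[2])

# Local exponent between consecutive subset sizes
local.exponent <- diff(log(timings$elapsed)) / diff(log(timings$genes))

print(scaling.exponent)
print(round(local.exponent, 2))
```

A slope near 2 is consistent with a cost proportional to the number of gene pairs. The local exponents show whether that holds across the whole range or only for the larger subsets.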
