# Sleepwalk on Colab test drive

This notebook simply tests if Sleepwalk can be deployed on Colab, hopefully even viewing the dynamic output in Colab or as a fallback simply generate the web view statics pages and copy them to some storage.

## Source
Started from https://github.com/IRkernel/IRkernel/blob/master/example-notebooks/Demo.ipynb

In [1]:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()


Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Bioconductor version 3.10 (BiocManager 1.30.10), R 3.6.1 (2019-07-05)
Installing package(s) 'BiocVersion'
Old packages: 'digest', 'rlang', 'roxygen2', 'rprojroot', 'scales', 'selectr',
  'tidyverse', 'xtable'


In [2]:
BiocManager::valid()

“8 packages out-of-date; 0 packages too new”


* sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] BiocManager_1.30.10 compiler_3.6.1      IRdisplay_0.7.0    
 [4] pbdZMQ_0.3-3        tools_3.6.1         htmltools_0.4.0    
 [7] pillar_1.4.2        base64enc_0.1-3     crayon_1.3.4       
[10] Rcpp_1.0.3          uuid_0.1-2    

# Sleepwalk install

https://anders-biostat.github.io/sleepwalk/

In [3]:
install.packages( "sleepwalk" )

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘jrc’, ‘cowplot’



## Test drive sleepwalk

- https://raw.githubusercontent.com/anders-biostat/sleepwalk/master/test/demo1.R
- https://cran.r-project.org/web/packages/Rtsne/index.html

In [12]:
install.packages(c("Rtsne", "irlba", "umap", "sleepwalk"))

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘RcppEigen’, ‘reticulate’, ‘RSpectra’



In [0]:
library( Rtsne )
library( irlba )
library( umap )
library( sleepwalk ) 

Demo1.R says:
```
# These two file can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100866
countsRNA_filename <- "~/Downloads/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz"
countsADT_filename <- "~/Downloads/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz"
```

In [26]:
# /content/ is the default dir on Colab
countsRNA_URL <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE100866&format=file&file=GSE100866%5FCBMC%5F8K%5F13AB%5F10X%2DADT%5Fclr%2Dtransformed%2Ecsv%2Egz"
countsRNA_filename <- "/content/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz"

countsADT_URL <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE100866&format=file&file=GSE100866%5FCBMC%5F8K%5F13AB%5F10X%2DADT%5Fumi%2Ecsv%2Egz"
countsADT_filename <- "/content/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz"

download.file(countsRNA_URL, countsRNA_filename)
download.file(countsADT_URL, countsADT_filename)
list.files("/content/")

In [0]:
# Load the file. (This takes a while as it is a large file.)
countsRNA <- as.matrix( read.csv( gzfile( countsRNA_filename ), row.names = 1) )
countsADT <- as.matrix( read.csv( gzfile( countsADT_filename ), row.names = 1) )


In [25]:
# Calculate for each cell ratio of molecules mapped to human genes
# versus mapped to mouse genes
human_mouse_ratio <- 
  colSums( countsRNA[ grepl( "HUMAN" , rownames(countsRNA) ), ] ) / 
  colSums( countsRNA[ grepl( "MOUSE" , rownames(countsRNA) ), ] )

# Keep only the cells with at least 10 times more human than mouse genes
# and keep only the counts for the human genes.
countsRNA <- countsRNA[ 
  grepl( "HUMAN" , rownames(countsRNA) ), 
  human_mouse_ratio > 10 ]

# Remove the "HUMAN_" prefix from the gene names
rownames(countsRNA) <- sub( "HUMAN_", "", rownames(countsRNA) )


print(colnames(rownames))





# Subset the ADT matrix to the same cells as in the RNA matrix
countsADT <- countsADT[ , colnames(countsRNA) ]


# Calculate size factors
exprsRNA <- matrix( nrow = nrow(countsRNA), ncol = ncol(countsRNA), dimnames = dimnames(countsRNA) )
for( j in seq.int( ncol(countsRNA) ) )
  exprsRNA[,j] <- log2( 1 + countsRNA[,j] / sum(countsRNA[,j]) )

# Run a PCA on the expression data
pca <- prcomp_irlba( t(exprsRNA), n=50 )

# Set the rownames manually (as IRLBA doesn't do that for us)
rownames(pca$x) <- colnames(exprsRNA)
rownames(pca$rotation) <- rownames(exprsRNA)

# Run t-SNE on the data
tsneRNA <- Rtsne( pca$x, pca=FALSE, verbose=TRUE )
rownames(tsneRNA$Y) <- rownames(pca$x)

# Explore the result with sleepwalk
sleepwalk( tsneRNA$Y, pca$x, 0.07 )


NULL


ERROR: ignored