# MOCA dataset analysis on Colab

<img src="https://oncoscape.v3.sttrcancer.org/atlas.gs.washington.edu.mouse.rna/assets/marquee-mouse.jpg" />
Image from Fred Hutch's <a href="https://oncoscape.v3.sttrcancer.org/atlas.gs.washington.edu.mouse.rna/landing">MOCA home page</a>

## Introduction
This notebook, [moca_on_colab.ipynb](https://github.com/reconstrue/single_cell_on_colab/blob/master/datasets/moca_on_colab.ipynb), explores Fred Hutch's mouse organogenesis cell atlas (MOCA), a snRNA-seq dataset containing about 1.3 million cells.


## MOCA dataset web links
- Website: [Mouse Organogenesis Cell Atlas (MOCA)](http://atlas.gs.washington.edu/mouse-rna/)
- A 4-minute video of [MOCA data viewed in BioTuring Browser](https://www.youtube.com/watch?v=If3i2Gqtxas)
- [2 million-cell experiment traces how a mammal grows, cell by single cell](https://brotmanbaty.org/2-million-cell-experiment-traces-how-a-mammal-grows-cell-by-single-cell/): a technically accurate press release that introduced the paper when it was published in Nature.
- [Paper published in Nature](https://www.nature.com/articles/s41586-019-0969-x#author-information)
- [Preprint](http://cole-trapnell-lab.github.io/pdfs/papers/cao-spielmann-mouse-emb.pdf) 


## Download MOCA dataset 

The first step is to download the dataset to the Colab VM's file system.
- [Data on MOCA site](https://oncoscape.v3.sttrcancer.org/atlas.gs.washington.edu.mouse.rna/landing)


Multiple files are available for download. One of them is `cds_cleaned_sampled_100k.RDS`, which is described as:
>single cell cds data with 100,000 sampled cells (340M)
>For testing purpose, this 100,000 sampled cells from the filtered data set above

In [9]:
# download the 100K file
import urllib.request    

hundred_k_rds_url = "https://urldefense.proofpoint.com/v2/url?u=https-3A__shendure-2Dweb.gs.washington.edu_content_members_cao1025_public_mouse-5Fembryo-5Fatlas_cds-5Fcleaned-5Fsampled-5F100k.RDS&d=DwMFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=4os15r5D3L-b7X7xAf5xTq8w511N9-T9roRgpZ8Jnzc&m=Ku3t8tpaR3GmIGYtwgQKChz2hxlcoRwRxPEF8BjwAlQ&s=UBhFmg42nWvqkH83ii3niy_uqgWM1-DjGe7yUsJePVw&e="
dest_filename = "/content/moca_cds_cleaned_sampled_100k.RDS"
parked_filename, _ = urllib.request.urlretrieve(hundred_k_rds_url, dest_filename)
print("Downloaded to: %s" % parked_filename)

Downloaded to: /content/moca_cds_cleaned_sampled_100k.RDS
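
The download URL in the cell above is wrapped by Proofpoint URL Defense (the `urldefense.proofpoint.com/v2/url?u=...` form), which hides the real host behind a redirect. Below is a minimal sketch of unwrapping such a link, assuming the v2 encoding in which `_` stands for `/` and `-XX` stands for the percent-escape `%XX`. The function name `unwrap_proofpoint_v2` and the `example.org` link are made up for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs, unquote


def unwrap_proofpoint_v2(wrapped_url):
    """Recover the original URL from a Proofpoint URL Defense v2 link.

    Assumes the v2 scheme: the target sits in the "u" query parameter,
    with "/" written as "_" and other escaped characters written as "-XX".
    """
    query = parse_qs(urlparse(wrapped_url).query)
    encoded = query["u"][0]
    # Turn "_" back into "/", then "-XX" into "%XX" so unquote can decode it.
    percent_encoded = re.sub(r"-([0-9A-Fa-f]{2})", r"%\1", encoded.replace("_", "/"))
    return unquote(percent_encoded)


# Hypothetical wrapped link, used only to show the decoding.
demo_wrapped = "https://urldefense.proofpoint.com/v2/url?u=https-3A__example.org_some-5Ffile.RDS&d=x"
demo_unwrapped = unwrap_proofpoint_v2(demo_wrapped)
print(demo_unwrapped)
```

Applied to `hundred_k_rds_url`, this should yield a direct link on `shendure-web.gs.washington.edu`, which could then be passed to `urllib.request.urlretrieve` in place of the wrapped one.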


In [11]:
!ls -lh

total 339M
-rw-r--r-- 1 root root 339M Nov 30 10:56 moca_cds_cleaned_sampled_100k.RDS
drwxr-xr-x 1 root root 4.0K Nov 21 16:30 sample_data


In [0]:
with open(parked_filename, "rb") as f:
  read_data = f.read()
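
Reading the whole 339M file into memory is not needed for a quick sanity check of the download. R's `saveRDS` can write gzip, bzip2 or xz compressed files, and each format starts with well-known magic bytes. Below is a minimal sketch that inspects only the first few bytes. The helper name `sniff_compression` is an assumption for illustration, and the demo uses a small temporary gzip file rather than the MOCA download.

```python
import gzip
import os
import tempfile

# Leading "magic" bytes of the compression formats that saveRDS can produce.
MAGIC_PREFIXES = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff_compression(path):
    """Return the compression format suggested by the file's first bytes."""
    with open(path, "rb") as f:
        header = f.read(6)  # the longest prefix above is 6 bytes
    for prefix, name in MAGIC_PREFIXES.items():
        if header.startswith(prefix):
            return name
    return "unknown or uncompressed"


# Self-contained demo on a small temporary gzip file (not the MOCA data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.gz")
    with gzip.open(demo_path, "wb") as f:
        f.write(b"placeholder payload")
    demo_result = sniff_compression(demo_path)

print(demo_result)
```

In the notebook this would be called as `sniff_compression(parked_filename)`. A result of "unknown or uncompressed" does not prove the file is bad, but it is a hint to look at the leading bytes before trying a reader.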

## Python meets RDS file format

The data is distributed as an `.RDS` file, which is an `R`-world serialization format. There is a Python package, `pyreadr`, which can read such files (via [stackoverflow.com](https://stackoverflow.com/a/53956614)).


### Take 1: pyreadr

In [27]:
#!locate zlib
!apt-get install zlib1g-dev

Reading package lists... Done
Building dependency tree       
Reading state information... Done
zlib1g-dev is already the newest version (1:1.2.11.dfsg-0ubuntu2).
zlib1g-dev set to manually installed.
The following package was automatically installed and is no longer required:
  libnvidia-common-430
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.


In [28]:
!pip install pyreadr

import pyreadr

result = pyreadr.read_r(parked_filename) # also works for RData

# result is a dictionary whose keys are object names and whose values are
# pandas data frames. An RDS file holds a single object, stored under the key None.
df = result[None] # extract the pandas data frame



LibrdataError: ignored

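In [22]:
# Fallback attempt: call R's own readRDS through rpy2
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
readRDS = robjects.r['readRDS']
df = readRDS(parked_filename)
# ri2py is the rpy2 2.x name for this conversion; rpy2 3.x renamed it to rpy2py
df = pandas2ri.ri2py(df)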

  error reading from connection



RRuntimeError: ignored
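
Both attempts above stop at a library-specific exception. One way to organise further attempts is a small helper that tries each reader in turn and records why it failed. The sketch below is illustrative only: `try_readers` and the two stand-in readers are hypothetical names, and the demo deliberately avoids `pyreadr`, `rpy2` and the MOCA file so that it runs anywhere.

```python
def try_readers(path, readers):
    """Call each (name, reader) pair on path until one succeeds.

    Returns a tuple (name, result, errors): the name and result of the first
    reader that works (or None, None if all fail), plus a dict mapping each
    failed reader's name to its error text.
    """
    errors = {}
    for name, reader in readers:
        try:
            return name, reader(path), errors
        except Exception as exc:  # broad on purpose: each library raises its own type
            errors[name] = "%s: %s" % (type(exc).__name__, exc)
    return None, None, errors


# Stand-in readers so the sketch runs without any R tooling installed.
def failing_reader(path):
    raise ValueError("unsupported object in %s" % path)


def working_reader(path):
    return {"source": path}


winner, data, failures = try_readers(
    "demo.RDS", [("first", failing_reader), ("second", working_reader)]
)
print(winner, failures)
```

In this notebook the pairs could be, for example, `("pyreadr", lambda p: pyreadr.read_r(p)[None])` and a wrapper around the `readRDS` call from the rpy2 cell, which keeps every error message in one place for comparison.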

### Take 2: scanpy

## References

- [ ] https://github.com/czbiohub/tabula-muris-vignettes

### Other Fred Hutch data

- [ ] [2 million cells](https://brotmanbaty.org/2-million-cell-experiment-traces-how-a-mammal-grows-cell-by-single-cell/)
- [ ] [2018 work with ~100K cells, Cusanovich DA](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111586)
- [ ] [A Single-Cell Atlas of In Vivo Mammalian Chromatin Accessibility](https://cole-trapnell-lab.github.io/pdfs/papers/cusanovich-mouse-atac.pdf) in Cell
