<a href="https://colab.research.google.com/github/pachterlab/GRNP_2020/blob/master/notebooks/figure_generation/GenFigS6-S19Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Precalculates data for the supplementary figures S6-S19**

This notebook precalculates the data for supplementary figures S6-S19, because generating these figures involves computationally heavy steps. The most demanding task is predicting the number of unseen molecules for each gene with the ZTNB (zero-truncated negative binomial) method. This notebook may take several hours to run.
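
To make the idea of predicting unseen molecules concrete, here is a minimal Python sketch. It assumes that the ZTNB model is applied to the number of reads per observed molecule, so that the fitted probability of zero reads gives the share of molecules that were never seen. The function names and the parameters `size` and `prob` are hypothetical and are taken as given here; the repository code fits the model to the data and may differ in its details.

```python
import math

def nb_pmf(k, size, prob):
    """Negative binomial pmf P(X = k) with dispersion `size` and success probability `prob`."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(prob) + k * math.log(1.0 - prob))
    return math.exp(log_p)

def ztnb_pmf(k, size, prob):
    """Zero-truncated pmf: the negative binomial pmf renormalized over k >= 1."""
    if k < 1:
        return 0.0
    return nb_pmf(k, size, prob) / (1.0 - nb_pmf(0, size, prob))

def predict_unseen(n_observed, size, prob):
    """Estimate the number of molecules with zero reads, given n_observed molecules with >= 1 read."""
    p0 = nb_pmf(0, size, prob)           # probability that a molecule gets no reads
    return n_observed * p0 / (1.0 - p0)  # total = n_observed / (1 - p0), unseen = total * p0

# Hypothetical parameters, for illustration only
print(predict_unseen(1000, size=2.0, prob=0.3))
```

The estimate follows from the truncation: if a fraction `p0` of all molecules receives no reads, the observed molecules make up the remaining `1 - p0`.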

Steps:
1. Download the code and processed data
2. Set up the R environment
3. Process and filter the BUG file
4. Generate statistics for the dataset

**1. Download the code and processed data**

The cells below clone the GRNP_2020 repository and download the processed EVAL data from Zenodo.

In [None]:
![ -d "GRNP_2020" ] && rm -r GRNP_2020

!git clone https://github.com/pachterlab/GRNP_2020.git


In [None]:
#download BUG data from Zenodo
#-p avoids an error if the directory already exists; the URL is quoted because it contains '?'
!mkdir -p data
!cd data && wget -O EVAL.zip "https://zenodo.org/record/3924675/files/EVAL.zip?download=1" && unzip EVAL.zip && rm EVAL.zip

In [None]:
#Check that download worked
!cd data && ls -l && cd EVAL/bus_output && ls -l
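
The same check can be done programmatically, so that a failed download stops the notebook early instead of surfacing later in the R code. This is a small sketch; the helper name `missing_paths` is hypothetical, and the paths are the ones listed by the shell command above.

```python
from pathlib import Path

def missing_paths(root, expected):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]

# Paths taken from the shell check above
missing = missing_paths("data", ["EVAL", "EVAL/bus_output"])
print("missing:", missing if missing else "none")
```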

**2. Prepare the R environment**

In [None]:
#load the rpy2 extension, which provides the %%R cell magic
%reload_ext rpy2.ipython


In [None]:
%%R
#the %%R cell magic must be the first line of the cell
#set the path to the R code and install the R packages
sourcePath = "GRNP_2020/NotebookAdaptedRCode/"
install.packages("qdapTools")
install.packages("dplyr")
install.packages("stringdist")


**3. Process and filter the BUG file**

The cells below create the output directory, set the path variables, and then process and filter the BUG file for the EVAL dataset.

In [None]:
#create output directory (-p avoids an error if it already exists)
!mkdir -p figureData

In [None]:
%%R
#First set some path variables
source("GRNP_2020/RCode/pathsGoogleColab.R")


In [None]:
%%R
#Process and filter the BUG file
source(paste0(sourcePath, "BUGProcessingHelpers.R"))
createStandardBugsData(paste0(dataPath,"EVAL/"), "EVAL", c(0.05, 0.1, 0.2, 0.25, 0.4, 0.6, 0.8, 1))
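
The numeric vector passed to `createStandardBugsData` looks like a set of downsampling fractions, although the notebook does not say so explicitly. Under that assumption, the following Python sketch shows what downsampling reads to a given fraction could look like on toy data. The function `downsample_counts` and the toy molecule ids are hypothetical, and the R implementation may sample differently.

```python
import random
from collections import Counter

# Assumed to be downsampling fractions; copied from the call above
FRACTIONS = [0.05, 0.1, 0.2, 0.25, 0.4, 0.6, 0.8, 1]

def downsample_counts(counts, fraction, seed=0):
    """Sample a fraction of reads without replacement; return reads per molecule.

    counts: dict mapping a molecule id to its number of reads.
    Molecules that lose all of their reads are dropped from the result.
    """
    reads = [mol for mol, n in counts.items() for _ in range(n)]
    keep = round(fraction * len(reads))
    sampled = random.Random(seed).sample(reads, keep)
    return dict(Counter(sampled))

toy = {"mol_a": 10, "mol_b": 3, "mol_c": 1}  # hypothetical toy data
for f in FRACTIONS:
    kept = downsample_counts(toy, f)
    # fraction, reads kept, molecules still observed
    print(f, sum(kept.values()), len(kept))
```

At low fractions, molecules with few reads tend to disappear entirely, which is the effect that a prediction of unseen molecules tries to account for.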



**4. Generate statistics for the dataset**

Here we create a file with various statistics for the dataset, which is used to generate Table S2. The file also contains some additional information about the dataset. Generating it may take several hours.

In [None]:
%%R
source(paste0(sourcePath, "GenBugSummary.R"))
genBugSummary("EVAL", "Vmn1r13", "Ubb", 10)

In [None]:
!cd figureData/EVAL && ls -l && more ds_summary.txt