# ROCCO Quick Start Demo

This notebook consists of three sections:
1. BAM Preprocessing
1. Running ROCCO
1. Analyzing Results

The first section walks through the BAM-to-WIG pipeline, which generates ROCCO-conformable input from a collection of samples' BAM files.

The second section runs ROCCO in three scenarios, and the third section carries out a cursory analysis of the results.

## BAM Preprocessing

**Download Input Alignments:** To acquire the ATAC-seq alignments (human lymphoblast) used for this demo, run
```
xargs -L 1 curl -O -J -L < demo_files/bam_links.txt
```
in the main `ROCCO` directory.
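
For readers who prefer to stay in Python, the `xargs`/`curl` one-liner above can be approximated with the standard library. This is a minimal sketch, not part of ROCCO: the helper names `read_links`, `fallback_name`, and `download` are hypothetical, and it assumes `demo_files/bam_links.txt` holds one URL per line.

```python
import os
import shutil
import urllib.parse
import urllib.request


def read_links(path):
    """Return the non-empty lines of a links file (assumed: one URL per line)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fallback_name(url):
    """File name taken from the URL path, similar to `curl -O`."""
    return os.path.basename(urllib.parse.urlparse(url).path)


def download(url, outdir="."):
    """Fetch one URL; urlopen follows redirects, similar to `curl -L`."""
    with urllib.request.urlopen(url) as response:
        # Prefer the server-suggested name (similar to `curl -J`), else use the URL.
        name = response.headers.get_filename() or fallback_name(response.geturl())
        dest = os.path.join(outdir, os.path.basename(name))
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest
```

Calling `download(url)` for each entry of `read_links("demo_files/bam_links.txt")` from the main `ROCCO` directory should leave the BAM files in the working directory, which is where `prep_bams.py` looks by default.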

These files were obtained from the ENCODE project with the following [query](https://www.encodeproject.org/search/?type=Experiment&control_type%21=%2A&status=released&perturbed=false&assay_title=ATAC-seq&biosample_ontology.cell_slims=lymphoblast&audit.ERROR.category%21=extremely+low+read+depth&audit.NOT_COMPLIANT.category%21=low+FRiP+score&audit.NOT_COMPLIANT.category%21=poor+library+complexity&audit.NOT_COMPLIANT.category%21=severe+bottlenecking&audit.WARNING.category%21=moderate+library+complexity&audit.WARNING.category%21=mild+to+moderate+bottlenecking&audit.WARNING.category%21=moderate+number+of+reproducible+peaks).

The downloaded alignment files have been QC-processed with the [ENCODE ATAC-seq pipeline](https://www.encodeproject.org/atac-seq/). In general, we assume that the BAM files used as input to ROCCO have been prepared according to some QC standard (duplicate removal, adapter trimming, etc.).

#### [`prep_bams.py`](https://nolan-h-hamilton.github.io/ROCCO/prep_bams.html)

```
prep_bams.py [-h] [-i BAMDIR] [-o OUTDIR] [-s SIZES] [-L INTERVAL_LENGTH] [--multi] [-c CORES]

    default parameters:  BAMDIR = '.', OUTDIR = '.', SIZES = 'hg38', INTERVAL_LENGTH = 50, CORES = 1
```
This script generates a smooth signal track for each sample's BAM file and then splits each track by chromosome into directories named `tracks_<chromosome name>`, thereby providing ROCCO-conformable input. The script calls `pysam.index()` on any BAM file that has not yet been indexed; alternatively, index the files beforehand with `samtools index` at the command line.

Full documentation for this script is available [here](https://nolan-h-hamilton.github.io/ROCCO/prep_bams.html), and this [flowchart](https://github.com/nolan-h-hamilton/ROCCO/blob/main/docs/bamsig_flowchart.png) offers a visualization of the workflow.
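
To make the per-chromosome split concrete, here is a minimal sketch of how a genome-wide variable-step wiggle file could be divided into `tracks_<chromosome name>` directories. It is an illustration only, not the code in `prep_bams.py`: the function name `split_wig_by_chrom` is hypothetical, it assumes every block starts with a `variableStep chrom=...` declaration line, and the output naming simply mirrors paths such as `tracks_chr22/chr22_ENCFF009NCL.bam.bw.wig` that appear in the `ROCCO_chrom.py` log.

```python
import os


def split_wig_by_chrom(wig_path, outdir="."):
    """Split a variableStep wig into <outdir>/tracks_<chrom>/<chrom>_<file name>.

    Returns a dict mapping each chromosome to the file written for it.
    """
    written = {}
    out = None
    base = os.path.basename(wig_path)
    try:
        with open(wig_path) as wig:
            for line in wig:
                if line.startswith("variableStep"):
                    # Declaration line, e.g. "variableStep chrom=chr22 span=50".
                    fields = dict(f.split("=", 1) for f in line.split()[1:])
                    chrom = fields["chrom"]
                    if out is not None:
                        out.close()
                    chrom_dir = os.path.join(outdir, f"tracks_{chrom}")
                    os.makedirs(chrom_dir, exist_ok=True)
                    path = os.path.join(chrom_dir, f"{chrom}_{base}")
                    # Append if this chromosome already appeared in an earlier block.
                    out = open(path, "a" if chrom in written else "w")
                    written[chrom] = path
                # Copy declaration and data lines; drop "track" lines and
                # anything that precedes the first declaration.
                if out is not None and not line.startswith("track"):
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Each output file keeps its own declaration line, so it remains a valid single-chromosome wiggle file.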

In [1]:
!python prep_bams.py --multi

/work/users/n/h/nolanh/ROCCO/ENCFF009NCL.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF110EWQ.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF231YYD.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF395ZMS.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF495DQP.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF621AYF.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF767FGV.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF797EAL.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF801THG.bam: running bamSitesToWig.py
/work/users/n/h/nolanh/ROCCO/ENCFF948HNW.bam: running bamSitesToWig.py
cmd: python3 /work/users/n/h/nolanh/ROCCO/pepatac/bamSitesToWig.py -i /work/users/n/h/nolanh/ROCCO/ENCFF009NCL.bam -c hg38.sizes -w /work/users/n/h/nolanh/ROCCO/ENCFF009NCL.bam.bw -r 50 -m atac -p 1 --variable-step
retval: 0

cmd: python3 /work/users/n/h/nolanh/ROCCO/pepatac/bamSitesToWig.

## Running ROCCO
Note: the runs in this section use the default ECOS solver. If the MOSEK solver is available, you can add `--solver MOSEK` to each command for improved efficiency.

#### 1) Run on a Single Chromosome (`chr22`) with Default Parameters
[`ROCCO_chrom.py`](https://nolan-h-hamilton.github.io/ROCCO/ROCCO_chrom.html) assembles $\mathbf{S}_{chr}$ from wig files in `--wig_path` and then solves the optimization problem for the given chromosome.
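
As a rough picture of what "assembling $\mathbf{S}_{chr}$" could involve, the sketch below reads several single-chromosome wig files and stacks them into a matrix with one row per sample and one column per locus. It is not the `ROCCO_chrom.py` implementation: the function names are hypothetical, the row-per-sample layout is an assumption, positions are assumed to fall on a shared grid with a step of `interval_length` (50, the `prep_bams.py` default), and loci missing from a sample are filled with 0.

```python
def read_wig(path):
    """Return {start_position: value} for a single-chromosome variableStep wig."""
    signal = {}
    with open(path) as wig:
        for line in wig:
            fields = line.split()
            # Keep only "position value" lines; skip headers and blank lines.
            if len(fields) != 2 or not fields[0].isdigit():
                continue
            signal[int(fields[0])] = float(fields[1])
    return signal


def assemble_signal_matrix(wig_paths, interval_length=50):
    """Return (loci, matrix) where matrix[k][i] is sample k's signal at loci[i]."""
    tracks = [read_wig(path) for path in wig_paths]
    starts = [pos for track in tracks for pos in track]
    if not starts:
        return [], []
    # Common grid spanning every sample (assumes all positions lie on it).
    loci = list(range(min(starts), max(starts) + 1, interval_length))
    matrix = [[track.get(pos, 0.0) for pos in loci] for track in tracks]
    return loci, matrix
```

With the ten demo samples, `assemble_signal_matrix` would return ten rows, one per wig file in `tracks_chr22`.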

In [2]:
!python ROCCO_chrom.py --chrom chr22 --wig_path tracks_chr22

ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF009NCL.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF110EWQ.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF231YYD.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF395ZMS.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF495DQP.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF621AYF.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF767FGV.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF797EAL.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF801THG.bam.bw.wig
ROCCO_chrom: reading wig file tracks_chr22/chr22_ENCFF948HNW.bam.bw.wig
ROCCO_chrom: writing output: ./ROCCO_out_chr22_0.035_1.0_0.0_1.0_1.0_1.0.bed


#### 2) Run on Multiple Chromosomes with Default Parameters
[`ROCCO.py`](https://nolan-h-hamilton.github.io/ROCCO/ROCCO.html) looks for chromosome-specific parameters in the CSV file specified with the `--param_file` argument, in our case `demo_files/demo_params.csv`. Since every parameter cell in this file is `NULL`, the genome-wide defaults are used.

We use the `--multi` flag to run the `ROCCO_chrom.py` jobs simultaneously.

**`demo_files/demo_params.csv`:**
```
chromosome,input_path,budget,gamma,tau,c1,c2,c3
chr20,tracks_chr20,NULL,NULL,NULL,NULL,NULL,NULL
chr21,tracks_chr21,NULL,NULL,NULL,NULL,NULL,NULL
chr22,tracks_chr22,NULL,NULL,NULL,NULL,NULL,NULL
```
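
The `NULL`-means-default rule can be sketched in a few lines of Python. This is an illustration rather than the parsing code in `ROCCO.py`: the function name `resolve_params` is hypothetical, and the default values are the ones that appear in the logged `ROCCO_chrom.py` commands in this section (`--budget 0.035 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0`).

```python
import csv
import io

# Genome-wide defaults, taken from the logged ROCCO_chrom.py commands.
DEFAULTS = {"budget": 0.035, "gamma": 1.0, "tau": 0.0,
            "c1": 1.0, "c2": 1.0, "c3": 1.0}


def resolve_params(csv_text, defaults=DEFAULTS):
    """Return one dict per chromosome, with NULL cells replaced by defaults."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        resolved = {"chromosome": row["chromosome"],
                    "input_path": row["input_path"]}
        for name, default in defaults.items():
            cell = row[name].strip()
            resolved[name] = default if cell == "NULL" else float(cell)
        rows.append(resolved)
    return rows


with_budget = resolve_params(
    "chromosome,input_path,budget,gamma,tau,c1,c2,c3\n"
    "chr17,tracks_chr17,0.04,NULL,NULL,NULL,NULL,NULL\n"
)
print(with_budget[0]["budget"], with_budget[0]["gamma"])  # 0.04 1.0
```

A row that sets only `budget` keeps the defaults for every other parameter.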

In [3]:
!python ROCCO.py --param_file demo_files/demo_params.csv --combine ROCCO_out_combined.bed --outdir demo_outdir --multi

cmd: python3 /work/users/n/h/nolanh/ROCCO/ROCCO_chrom.py --chrom chr20 --wig_path tracks_chr20 --budget 0.035 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0 --solver ECOS --bed_format 3 --outdir demo_outdir --rr_iter 50
retval: 0

cmd: python3 /work/users/n/h/nolanh/ROCCO/ROCCO_chrom.py --chrom chr21 --wig_path tracks_chr21 --budget 0.035 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0 --solver ECOS --bed_format 3 --outdir demo_outdir --rr_iter 50
retval: 0

cmd: python3 /work/users/n/h/nolanh/ROCCO/ROCCO_chrom.py --chrom chr22 --wig_path tracks_chr22 --budget 0.035 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0 --solver ECOS --bed_format 3 --outdir demo_outdir --rr_iter 50
retval: 0

combining output files --> ROCCO_out_combined.bed


#### 3) Run on Multiple Chromosomes with Variable Budgets

We run ROCCO over chromosomes 17-19 with a specific budget for each. In this example, the budgets are loosely based on the gene density of the respective chromosome.

**`demo_files/spec_params.csv`:**
```
chromosome,input_path,budget,gamma,tau,c1,c2,c3
chr17,tracks_chr17,0.04,NULL,NULL,NULL,NULL,NULL
chr18,tracks_chr18,0.03,NULL,NULL,NULL,NULL,NULL
chr19,tracks_chr19,0.05,NULL,NULL,NULL,NULL,NULL
```



In [4]:
!python ROCCO.py --param_file demo_files/spec_params.csv --combine spec_combined.bed --outdir spec_outdir --multi

cmd: python3 /work/users/n/h/nolanh/ROCCO/ROCCO_chrom.py --chrom chr17 --wig_path tracks_chr17 --budget 0.04 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0 --solver ECOS --bed_format 3 --outdir spec_outdir --rr_iter 50
retval: 0

cmd: python3 /work/users/n/h/nolanh/ROCCO/ROCCO_chrom.py --chrom chr18 --wig_path tracks_chr18 --budget 0.03 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0 --solver ECOS --bed_format 3 --outdir spec_outdir --rr_iter 50
retval: 0

cmd: python3 /work/users/n/h/nolanh/ROCCO/ROCCO_chrom.py --chrom chr19 --wig_path tracks_chr19 --budget 0.05 --gamma 1.0 --tau 0.0 --c1 1.0 --c2 1.0 --c3 1.0 --solver ECOS --bed_format 3 --outdir spec_outdir --rr_iter 50
retval: 0

combining output files --> spec_combined.bed


## Analyzing Results 


#### ROCCO predicted peak regions over `chr22` using default parameters
IDR-thresholded peaks and fold-change signals from ENCODE are included.
![ROCCO predicted peak regions over chr22 with ENCODE IDR-thresholded peaks and fold-change signals](demo_files/demo1.png)

#### Peak Summary for Variable Budgets, Human Chromosomes 17-19

In [5]:
!bedtools summary -i spec_combined.bed -g demo_files/chroms.sizes | column -t

chrom  num_records  total_bp  chrom_frac_genome  frac_all_ivls  frac_all_bp  min    max     mean
chr17  4706         3326300   0.37461            0.386          0.384        50     13100   706.821
chr18  4053         2410850   0.36164            0.332          0.278        50     7050    594.831
chr19  3438         2927300   0.26375            0.282          0.338        50     14650   851.454
all    12197        8664450   1.0                1.0            50           14650  710.38  
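
One quick sanity check on this table is to compare the fraction of each chromosome covered by peaks with the budget set in `demo_files/spec_params.csv`. The sketch below does that arithmetic. It assumes `demo_files/chroms.sizes` holds the standard hg38 lengths for chr17-19, which is consistent with the `chrom_frac_genome` column, and the reading of the budget as an upper bound on the selected fraction is an inference from these numbers rather than something stated in this notebook.

```python
# total_bp values copied from the `bedtools summary` output above.
total_bp = {"chr17": 3326300, "chr18": 2410850, "chr19": 2927300}

# Budgets copied from demo_files/spec_params.csv.
budgets = {"chr17": 0.04, "chr18": 0.03, "chr19": 0.05}

# Assumption: demo_files/chroms.sizes lists the standard hg38 chromosome lengths.
chrom_sizes = {"chr17": 83257441, "chr18": 80373285, "chr19": 58617616}


def selected_fraction(chrom):
    """Fraction of the chromosome's base pairs covered by ROCCO peaks."""
    return total_bp[chrom] / chrom_sizes[chrom]


for chrom in total_bp:
    print(f"{chrom}: selected {selected_fraction(chrom):.5f}, budget {budgets[chrom]}")
```

Under these assumptions, each selected fraction lands just below the corresponding budget.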
