# Quick Start Demo
ROCCO usage examples on a set of 10 publicly available ATAC-seq alignments.


This demo assumes that the ROCCO repository has been cloned$^1$ and that the current working directory is `/path/to/ROCCO/demo/`.

$^1$ `git clone https://github.com/nolan-h-hamilton/ROCCO.git`

## Obtaining Data
**Download and index** the ATAC-seq alignments (human lymphoblast) used in this demo. At the command line, download the BAM files listed in `demo_bams.txt` into the current working directory:
```
xargs -L 1 curl -O -J -L < demo_bams.txt
```
then index them:
```
samtools index -M *.bam
```
Note that these BAM files have already been filtered by the ENCODE ATAC-seq [pipeline](https://www.encodeproject.org/pipelines/ENCPL344QWT/). As is customary at the peak-calling stage, ROCCO assumes that a suitable QC protocol has already been applied.
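Before moving on, it can help to confirm that every BAM file actually received an index. The sketch below is a minimal check, assuming the index names that `samtools index` commonly writes (`<file>.bam.bai`, or `<file>.bam.csi` for CSI indices) plus the alternative `<file>.bai`. `missing_indices` is a hypothetical helper, not part of ROCCO.

```python
import glob
import os


def missing_indices(bam_dir: str = ".") -> list[str]:
    """Return the BAM files in bam_dir that have no .bai/.csi index next to them.

    Hypothetical helper: checks <file>.bam.bai, <file>.bam.csi and <file>.bai.
    """
    missing = []
    for bam in sorted(glob.glob(os.path.join(bam_dir, "*.bam"))):
        candidates = (bam + ".bai", bam + ".csi", bam[:-4] + ".bai")
        if not any(os.path.exists(path) for path in candidates):
            missing.append(os.path.basename(bam))
    return missing


if __name__ == "__main__":
    unindexed = missing_indices(".")
    if unindexed:
        print("Missing indices:", ", ".join(unindexed))
    else:
        print("All BAM files are indexed.")
```

Any file reported here can be re-indexed with `samtools index` before running `rocco prep`.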


## `.BAM --> .WIG`: [`rocco prep`](https://nolan-h-hamilton.github.io/ROCCO/rocco/prep.html)
In this step, we convert the alignments into [wiggle files](https://genome.ucsc.edu/goldenPath/help/wiggle.html) and structure the data for constructing the sample-by-locus input matrix, $\mathbf{S} \in \mathbb{R}^{K \times n}$, where $K$ is the number of samples and $n$ is the number of loci.
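As a rough picture of how wiggle tracks become $\mathbf{S}$, the sketch below parses a single-chromosome `fixedStep` wiggle track into a list of values and stacks the tracks of $K$ samples into a $K \times n$ matrix (one row per sample, one column per locus). It assumes every track covers the same loci with the same step. `parse_fixedstep` and `stack_tracks` are hypothetical helpers, not ROCCO's internal code, and the actual layout of `prep`'s output may differ.

```python
def parse_fixedstep(lines):
    """Parse a single-chromosome fixedStep wiggle track.

    Returns (chrom, start, step, values). Track/browser/comment lines are
    skipped; assumes one fixedStep declaration (hypothetical simplification).
    """
    chrom = start = step = None
    values = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chr21 start=1 step=50 span=50"
            fields = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = fields["chrom"]
            start = int(fields["start"])
            step = int(fields["step"])
            continue
        values.append(float(line))
    return chrom, start, step, values


def stack_tracks(tracks):
    """Stack K equal-length signal vectors into a K x n matrix (list of rows)."""
    lengths = {len(values) for values in tracks}
    if len(lengths) != 1:
        raise ValueError("all tracks must cover the same n loci")
    return [list(values) for values in tracks]


# Two toy samples (K = 2) over three 50 bp loci (n = 3).
wig_a = ["track type=wiggle_0", "fixedStep chrom=chr21 start=1 step=50 span=50", "0", "3", "7"]
wig_b = ["fixedStep chrom=chr21 start=1 step=50 span=50", "1", "0", "4"]
S = stack_tracks([parse_fixedstep(w)[3] for w in (wig_a, wig_b)])
print(len(S), "x", len(S[0]))  # prints "2 x 3"
```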

See the [documentation](https://nolan-h-hamilton.github.io/ROCCO/rocco/prep.html) for additional details regarding this step.


In [None]:
!rocco prep --bamdir . -s hg38

## Example I: Run ROCCO on `chr21` with default parameters

Track data for each chromosome are now available in the `tracks_<chrom>` directories produced by `prep` (e.g., `tracks_chr21`).

See subcommand documentation for [`rocco chrom`](https://nolan-h-hamilton.github.io/ROCCO/rocco/chrom.html).

In [None]:
!rocco chrom --wig_path tracks_chr21 --chrom chr21

#### Peak score filtering
In this call, we require a peak score of at least 50 for inclusion in the final annotation:

In [None]:
!rocco chrom --wig_path tracks_chr21 --chrom chr21 --filter_by_score 50
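Conceptually, score filtering keeps only the peaks whose score meets the threshold. The sketch below applies that idea to BED lines, assuming the score sits in the fifth (score) column, as in the standard BED layout. Whether `--filter_by_score` uses exactly this column and a `>=` comparison is an assumption, and `filter_bed_by_score` is a hypothetical helper.

```python
def filter_bed_by_score(lines, min_score):
    """Yield BED lines whose 5th column (score) is >= min_score.

    Header/track lines and lines without a score column are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and float(fields[4]) >= min_score:
            yield line.rstrip("\n")


# Toy peaks (made up for illustration): name in column 4, score in column 5.
peaks = [
    "chr21\t9000000\t9000500\tpeak1\t72",
    "chr21\t9100000\t9100250\tpeak2\t31",
    "chr21\t9200000\t9200900\tpeak3\t50",
]
for kept in filter_bed_by_score(peaks, 50):
    print(kept)
```

With a threshold of 50, `peak2` is dropped and the other two peaks are kept.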

## Example II: Run ROCCO using a `-p/--param_file`

Copy and paste the following into a file named **test_params.csv**:
```
chromosome,input_path,budget,gamma,tau,c1,c2,c3
chr19,tracks_chr19,0.05,NULL,NULL,NULL,NULL,NULL
chr20,tracks_chr20,0.035,NULL,NULL,NULL,NULL,NULL
chr21,tracks_chr21,0.025,NULL,NULL,NULL,NULL,NULL
chr22,tracks_chr22,0.35,NULL,NULL,NULL,NULL,NULL
```
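A parameter file is plain CSV with one row per chromosome. The sketch below reads it with Python's `csv` module and maps `NULL` entries to `None`, which we assume means "use ROCCO's default for this parameter". `load_params` is a hypothetical helper, and ROCCO's own parsing may differ.

```python
import csv
import io

PARAM_TEXT = """chromosome,input_path,budget,gamma,tau,c1,c2,c3
chr19,tracks_chr19,0.05,NULL,NULL,NULL,NULL,NULL
chr20,tracks_chr20,0.035,NULL,NULL,NULL,NULL,NULL
chr21,tracks_chr21,0.025,NULL,NULL,NULL,NULL,NULL
chr22,tracks_chr22,0.35,NULL,NULL,NULL,NULL,NULL
"""

NUMERIC = ("budget", "gamma", "tau", "c1", "c2", "c3")


def load_params(handle):
    """Return one dict per chromosome; NULL -> None, numeric fields -> float."""
    rows = []
    for row in csv.DictReader(handle):
        for key in NUMERIC:
            value = row[key].strip()
            row[key] = None if value.upper() == "NULL" else float(value)
        rows.append(row)
    return rows


# For the real file: with open("test_params.csv", newline="") as fh: load_params(fh)
params = load_params(io.StringIO(PARAM_TEXT))
for row in params:
    print(row["chromosome"], row["input_path"], row["budget"])
```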

More generally, to run genome-wide with default chromosome-specific parameters for humans, you can run

```
rocco gwide -p hg_params
```

or

```
rocco gwide -p mm_params
```

for mice. `hg_params` points to the CSV file [`hg38_params.csv`](https://github.com/nolan-h-hamilton/ROCCO/blob/main/rocco/hg38_params.csv), which is included when ROCCO is installed with `pip` or `conda`; `mm_params` likewise points to `mm10_params.csv`.

For additional details, see the subcommand documentation for [`rocco gwide`](https://nolan-h-hamilton.github.io/ROCCO/rocco/gwide.html).

In [None]:
!rocco gwide -p test_params.csv --outdir demo_outdir --combine demo_out.bed

## Example III: Integrating sample metadata
For differential accessibility testing with imbalanced classes, it may be beneficial to run ROCCO separately on each class and then merge the results *post hoc* before computing the count matrix, so that regions accessible mainly in the smaller class are less likely to be missed. This protocol can be applied with `rocco gwide`'s `--coldata` argument.

Example coldata file with arbitrarily assigned groups for this demonstration:
```
sample	group	sex
ENCFF495DQP	A	M
ENCFF395ZMS	A	M
ENCFF231YYD	A	M
ENCFF009NCL	A	M
ENCFF621AYF	B	M
ENCFF767FGV	B	M
ENCFF110EWQ	B	M
ENCFF797EAL	B	M
ENCFF948HNW	B	M
```

Copy/paste the above into a tab-separated file, **`coldata.tsv`**.
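To see how `--sample_column` and `--group_column` partition the samples, the sketch below reads a tab-separated coldata file and groups sample IDs by the group column. It is a hypothetical illustration (`samples_by_group` is not part of ROCCO); `rocco gwide` performs the actual grouping itself.

```python
import csv
import io
from collections import defaultdict

COLDATA = (
    "sample\tgroup\tsex\n"
    "ENCFF495DQP\tA\tM\n"
    "ENCFF395ZMS\tA\tM\n"
    "ENCFF231YYD\tA\tM\n"
    "ENCFF009NCL\tA\tM\n"
    "ENCFF621AYF\tB\tM\n"
    "ENCFF767FGV\tB\tM\n"
    "ENCFF110EWQ\tB\tM\n"
    "ENCFF797EAL\tB\tM\n"
    "ENCFF948HNW\tB\tM\n"
)


def samples_by_group(handle, sample_column="sample", group_column="group"):
    """Map each group label to its sample IDs, preserving file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[group_column]].append(row[sample_column])
    return dict(groups)


# For the real file: with open("coldata.tsv", newline="") as fh: samples_by_group(fh)
groups = samples_by_group(io.StringIO(COLDATA))
for name, samples in groups.items():
    print(f"group_{name}.bed <- {len(samples)} samples: {', '.join(samples)}")
```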

The command in the next cell runs ROCCO separately for each group (A and B) and stores the results in `group_{group_name}.bed` (here, `group_A.bed` and `group_B.bed`).

In [None]:
!rocco gwide -p hg_params --coldata coldata.tsv --sample_column sample --group_column group

Concatenate, sort, and merge the group BED files, then exclude any region that overlaps `BL_regions.bed`:

In [None]:
!cat group_A.bed group_B.bed | bedtools sort -i stdin | bedtools merge -i stdin | bedtools intersect -a stdin -b BL_regions.bed -v > groups_combined_sorted.bed
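For readers without `bedtools`, the sketch below mirrors that pipeline in pure Python: sort the intervals, merge overlapping or book-ended intervals (the `bedtools merge` default), and drop merged intervals that overlap any excluded region by at least 1 bp (as `bedtools intersect -v` does). Coordinates are 0-based and half-open, as in BED. `merge_intervals` and `exclude_overlaps` are hypothetical helpers, and the toy intervals are made up.

```python
def merge_intervals(intervals):
    """Sort (chrom, start, end) tuples and merge overlapping/book-ended ones."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def exclude_overlaps(intervals, excluded):
    """Drop intervals that overlap any excluded interval by >= 1 bp."""
    def overlaps(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
    return [iv for iv in intervals if not any(overlaps(iv, ex) for ex in excluded)]


group_a = [("chr21", 100, 200), ("chr21", 500, 600)]
group_b = [("chr21", 150, 300), ("chr21", 300, 350), ("chr22", 10, 20)]
excluded = [("chr21", 550, 560)]

combined = exclude_overlaps(merge_intervals(group_a + group_b), excluded)
for chrom, start, end in combined:
    print(f"{chrom}\t{start}\t{end}")
```

The exclusion step compares every interval against every excluded region, which is fine for a demo; for genome-scale files, `bedtools` remains the practical choice.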

#### Generating a count matrix

**With `count_matrix.py`**

[count_matrix.py](https://github.com/nolan-h-hamilton/ROCCO/blob/main/demo/count_matrix.py) is included in the ROCCO repository and can be used to generate a count matrix compatible with differential analysis software such as DESeq2.

In [None]:
!python count_matrix.py -i groups_combined_sorted.bed -m coldata.tsv --bamdir . -o demo_count_matrix.tsv --sample_column sample
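A count matrix for DESeq2 has one row per region and one column per sample, holding integer read counts. The toy sketch below builds such a table from in-memory read intervals to show the layout. The reads, the overlap rule (a read counts toward a region if it overlaps it by at least 1 bp), and the `count_matrix` helper are all assumptions for illustration and do not reproduce what `count_matrix.py` does internally.

```python
def count_matrix(regions, reads_by_sample, sample_order):
    """Return TSV lines: a region ID, then one integer count per sample.

    regions: list of (chrom, start, end); reads_by_sample: sample -> list of
    (chrom, start, end) read intervals. Columns follow sample_order.
    """
    rows = ["region\t" + "\t".join(sample_order)]
    for chrom, start, end in regions:
        counts = []
        for sample in sample_order:
            # Half-open overlap test, as for BED coordinates.
            n = sum(1 for c, s, e in reads_by_sample[sample]
                    if c == chrom and s < end and start < e)
            counts.append(str(n))
        rows.append(f"{chrom}:{start}-{end}\t" + "\t".join(counts))
    return rows


regions = [("chr21", 100, 350), ("chr22", 10, 20)]
reads = {  # made-up reads for two of the demo samples
    "ENCFF495DQP": [("chr21", 120, 170), ("chr21", 300, 400), ("chr22", 5, 12)],
    "ENCFF621AYF": [("chr21", 90, 101)],
}
for line in count_matrix(regions, reads, ["ENCFF495DQP", "ENCFF621AYF"]):
    print(line)
```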



Several external tools, e.g., [`bedtools multicov`](https://bedtools.readthedocs.io/en/latest/content/tools/multicov.html), can also be used to generate a count matrix:

```
bedtools multicov -bams <space-separated BAM files> -bed groups_combined_sorted.bed
```
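For downstream analysis, make sure the order of the BAM files supplied to `-bams` matches the order of the samples in **`coldata.tsv`**: `bedtools multicov` appends one count column per BAM file, in the order the files are given.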
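Because `bedtools multicov` emits its count columns in the order of the `-bams` arguments, one way to keep the columns aligned with `coldata.tsv` is to build the BAM list from the coldata file itself. A minimal sketch, assuming each BAM is named `<sample>.bam` (e.g., `ENCFF495DQP.bam`) and that `bedtools` is on the `PATH`; `multicov_command` is a hypothetical helper.

```python
import csv
import io
import shlex


def multicov_command(coldata_handle, bed_path, sample_column="sample", bam_dir="."):
    """Build a bedtools multicov argument list with BAMs in coldata order."""
    reader = csv.DictReader(coldata_handle, delimiter="\t")
    bams = [f"{bam_dir}/{row[sample_column]}.bam" for row in reader]
    return ["bedtools", "multicov", "-bams", *bams, "-bed", bed_path]


# For the real file: with open("coldata.tsv", newline="") as fh: multicov_command(fh, ...)
coldata = io.StringIO("sample\tgroup\tsex\nENCFF495DQP\tA\tM\nENCFF621AYF\tB\tM\n")
cmd = multicov_command(coldata, "groups_combined_sorted.bed")
print(shlex.join(cmd))
# To execute it, pass cmd to subprocess.run(..., stdout=<open file>, check=True).
```

Building the argument list programmatically avoids silently reordering samples when BAM files are listed by hand or by shell globbing.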