# HiCExplorer using macaque data

***Before using this notebook, make sure you use the 'hic' kernel.*** 

I might split the environment into a separate Hi-C environment for each tool, such as HiCExplorer and Cooler/cooltools.

As of this update, there is only one environment in the repo, and it contains the Python, R, and CLI tools used for Hi-C data analysis.
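
One quick way to confirm which environment the kernel is running in is to look at the interpreter prefix. The sketch below assumes that the 'hic' kernel is backed by an environment directory that is also named `hic`; that naming, and the helper `kernel_env_name`, are assumptions made for illustration, not something the repo guarantees.

```python
import sys
from pathlib import Path


def kernel_env_name(prefix: str = sys.prefix) -> str:
    """Return the last path component of the interpreter prefix.

    For a conda or venv environment this is usually the environment name.
    """
    return Path(prefix).name


# Assumption: the 'hic' kernel runs from an environment directory named "hic".
EXPECTED_ENV = "hic"

env_name = kernel_env_name()
if env_name == EXPECTED_ENV:
    print(f"Running in the {EXPECTED_ENV!r} environment.")
else:
    print(f"Warning: interpreter prefix is {sys.prefix!r}; "
          f"expected an environment named {EXPECTED_ENV!r}.")
```

If the warning appears, switch the kernel from the notebook's kernel menu before running the remaining cells.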


## Getting started

In this notebook, we will use HiCExplorer to analyze Hi-C reads from macaque monkeys. We will follow an [example](https://hicexplorer.readthedocs.io/en/latest/content/example_usage.html#how-we-use-hicexplorer) from the HiCExplorer documentation.

First, let's look at the files we have. They come from [SRA](https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA430777&o=acc_s%3Aa) and contain Hi-C sequencing data from macaque fibroblasts. The data were generated for the Wang2019 paper.

In [11]:
%%bash 

tree ../data/links/macaque_fastq

tree ../../../../data/macaque_raw/downloaded/

ls ../../../../data/macaque_raw/downloaded/

../data/links/macaque_fastq
├── SRR6502335_1.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502335_1.fastq.gz
├── SRR6502335_2.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502335_2.fastq.gz
├── SRR6502336_1.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502336_1.fastq.gz
├── SRR6502336_2.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502336_2.fastq.gz
├── SRR6502337_1.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502337_1.fastq.gz
├── SRR6502337_2.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502337_2.fastq.gz
├── SRR6502338_1.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502338_1.fastq.gz
├── SRR6502338_2.fastq.gz -> ../../../../../../data/macaque_raw/downloaded/SRR6502338_2.fastq
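
Each run in the listing has two files, `_1` and `_2`, holding the two mates of the paired-end reads. Downstream steps usually need them as pairs, so a small helper that groups file names by run accession can be handy. The following is a minimal sketch; the function `pair_fastq_files` and the file-name pattern are illustrative assumptions based on the names shown in the listing.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches names such as SRR6502335_1.fastq.gz and captures the run and mate number.
FASTQ_PATTERN = re.compile(r"^(?P<run>[A-Z]RR\d+)_(?P<mate>[12])\.fastq\.gz$")


def pair_fastq_files(names):
    """Group FASTQ file names by run accession and return complete (R1, R2) pairs."""
    mates = defaultdict(dict)
    for name in names:
        match = FASTQ_PATTERN.match(Path(name).name)
        if match:
            mates[match["run"]][match["mate"]] = name
    # Keep only runs for which both mates are present.
    return {
        run: (files["1"], files["2"])
        for run, files in sorted(mates.items())
        if files.keys() >= {"1", "2"}
    }


# File names taken from the listing; on disk one could instead pass the names
# found by Path("../data/links/macaque_fastq").glob("*.fastq.gz").
example_names = [
    "SRR6502335_1.fastq.gz",
    "SRR6502335_2.fastq.gz",
    "SRR6502336_1.fastq.gz",
    "SRR6502336_2.fastq.gz",
]

for run, (r1, r2) in pair_fastq_files(example_names).items():
    print(run, r1, r2)
```

Runs with only one mate are dropped, which makes a missing or misnamed download visible early.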

### An overview of the sequences

We can see the run metadata for our sequences, such as the average spot length and the total number of bases per run, in `SraRunTable.txt`.

In [13]:
import pandas as pd

pd.read_csv("../../../../data/macaque_raw/downloaded/SraRunTable.txt")

Unnamed: 0,Run,Assay Type,AvgSpotLen,Bases,BioProject,BioSample,Bytes,Center Name,Consent,DATASTORE filetype,...,ReleaseDate,create_date,version,Sample Name,source_name,SRA Study,cell_type,tissue,agent,genotype
0,SRR6502335,Hi-C,300,73201141800,PRJNA430777,SAMN08375237,31966430779,GEO,public,"fastq,run.zq,sra",...,2019-02-12T00:00:00Z,2018-01-23T23:05:00Z,1,GSM2940099,fibroblast,SRP131117,fibroblast,,,
1,SRR6502336,Hi-C,300,65119970100,PRJNA430777,SAMN08375237,24433383054,GEO,public,"fastq,run.zq,sra",...,2019-02-12T00:00:00Z,2018-01-24T00:16:00Z,1,GSM2940099,fibroblast,SRP131117,fibroblast,,,
2,SRR6502337,Hi-C,300,52769196300,PRJNA430777,SAMN08375236,23015357755,GEO,public,"fastq,run.zq,sra",...,2019-02-12T00:00:00Z,2018-01-24T07:21:00Z,1,GSM2940100,fibroblast,SRP131117,fibroblast,,,
3,SRR6502338,Hi-C,300,52378949100,PRJNA430777,SAMN08375236,22999581685,GEO,public,"fastq,run.zq,sra",...,2019-02-12T00:00:00Z,2018-01-23T22:14:00Z,1,GSM2940100,fibroblast,SRP131117,fibroblast,,,
4,SRR6502339,Hi-C,300,28885941600,PRJNA430777,SAMN08375236,10960123150,GEO,public,"fastq,run.zq,sra",...,2019-02-12T00:00:00Z,2018-01-23T21:03:00Z,1,GSM2940100,fibroblast,SRP131117,fibroblast,,,
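
The table shows that the five runs belong to two samples (`GSM2940099` and `GSM2940100`). To get a feel for how much sequence each sample has, the `Bases` column can be summed per `Sample Name`. The sketch below uses only the standard library, with the values copied from the table; with the DataFrame loaded above, `groupby("Sample Name")["Bases"].sum()` would be the pandas equivalent. The helper name `bases_per_sample` is made up for illustration.

```python
from collections import defaultdict

# (Run, Sample Name, Bases) copied from the SraRunTable output.
runs = [
    ("SRR6502335", "GSM2940099", 73201141800),
    ("SRR6502336", "GSM2940099", 65119970100),
    ("SRR6502337", "GSM2940100", 52769196300),
    ("SRR6502338", "GSM2940100", 52378949100),
    ("SRR6502339", "GSM2940100", 28885941600),
]


def bases_per_sample(rows):
    """Sum the number of sequenced bases over all runs of each sample."""
    totals = defaultdict(int)
    for _run, sample, bases in rows:
        totals[sample] += bases
    return dict(totals)


for sample, total in bases_per_sample(runs).items():
    # Report in gigabases for readability.
    print(f"{sample}: {total / 1e9:.1f} Gbases")
```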


## HiCExplorer


In [15]:
!hicexplorer --help

usage: hicexplorer [-h] [--version]

HiCExplorer addresses the common tasks of Hi-C analysis from processing to visualization.
Each tool should be called by its own name as in the following example:

 $ hicPlotMatrix -m hic_matrix.h5 -o plot.pdf

If you find HiCExplorer useful for your research please cite as:

Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening, Kin Chung Lam,
Bianca Habermann, Asifa Akhtar, Thomas Manke.
"High-resolution TADs reveal DNA sequences underlying genome organization in flies".
Nature Communications, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w

Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach,
Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning.
"Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization",
Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: http

In [20]:
!hicexplorer hicBuildMatrix 

This prints the same general help text as `hicexplorer --help` above: as its usage line shows, `hicexplorer` only accepts `-h` and `--version` and does not run subcommands. As the help text says, each tool should be called by its own name, for example `hicBuildMatrix --help`.
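
Because each HiCExplorer tool is called by its own name, it can be useful to check that the individual executables are on the `PATH` of the current kernel before going further. The sketch below does this with `shutil.which` for the two tools named in this notebook; the helper `locate_tools` is an illustrative name, and the tool list is an assumption to extend as needed.

```python
import shutil

# Tool names mentioned in this notebook; extend the list as needed.
TOOLS = ["hicBuildMatrix", "hicPlotMatrix"]


def locate_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_tools(TOOLS).items():
    status = path if path else "not found on PATH"
    print(f"{name}: {status}")
```

A tool reported as not found usually means the notebook is running under a different kernel than the one with HiCExplorer installed.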