# HiCExplorer using macaque data

***Before using this notebook, make sure you use the 'hic' kernel.*** 

I may split the environment into a separate Hi-C environment for each tool, such as HiCExplorer and Cooler/cooltools.

As of this update, there is only one environment in the repo, and it contains Python, R, and CLI tools for Hi-C data analysis.


## Getting started

In this notebook, we will use HiCExplorer to analyze Hi-C reads from macaque monkeys. We will follow an [example](https://hicexplorer.readthedocs.io/en/latest/content/example_usage.html#how-we-use-hicexplorer) from the HiCExplorer documentation.

First, let's look at the files we have:

In [21]:
!tree ../../../kmt/hicmaps/macaque/GSE109344_data/data

!tree ../data/links

../../../kmt/hicmaps/macaque/GSE109344_data/data
├── GSE109344_monkey_fibro.allValidPairs -> GSE109344_monkey_fibro_allValidPairs.txt
├── GSE109344_monkey_fibro_allValidPairs.txt
├── GSE109344_monkey_fibro_pac_rs_gene.fpkm -> GSE109344_monkey_fibro_pac_rs_gene.fpkm.txt
├── GSE109344_monkey_fibro_pac_rs_gene.fpkm.txt
├── GSE109344_monkey_pac.allValidPairs -> GSE109344_monkey_pac_allValidPairs.txt
├── GSE109344_monkey_pac_allValidPairs.txt
├── GSE109344_monkey_rs.allValidPairs -> GSE109344_monkey_rs_allValidPairs.txt
├── GSE109344_monkey_rs_allValidPairs.txt
├── GSE109344_monkey_spa.allValidPairs -> GSE109344_monkey_spa_allValidPairs.txt
├── GSE109344_monkey_spa_allValidPairs.txt
├── GSE109344_monkey_sperm.allValidPairs -> GSE109344_monkey_sperm_allValidPairs.txt
└── GSE109344_monkey_sperm_allValidPairs.txt

0 directories, 12 files
../data/links
└── macaque_data -> ../../../../kmt/hicmaps/macaque/GSE109344_data/data

1 directory, 0 files


### What is inside the files

In [18]:
%%bash

# List the data directory ($FILE is not set yet at this point, so it is left out of the path)
ls -lh ../../../kmt/hicmaps/macaque/GSE109344_data/data/

echo ""

for FILE in $(ls ../../../kmt/hicmaps/macaque/GSE109344_data/data)
do 
    echo ">> head -n2 $FILE"
    head -n2 ../../../kmt/hicmaps/macaque/GSE109344_data/data/$FILE
    echo ""
done



total 306G
lrwxrwxrwx 1 kmt baboondiversity   40 May  2 16:08 GSE109344_monkey_fibro.allValidPairs -> GSE109344_monkey_fibro_allValidPairs.txt
-rw-rw-r-- 1 kmt baboondiversity  61G May  1 14:28 GSE109344_monkey_fibro_allValidPairs.txt
lrwxrwxrwx 1 kmt baboondiversity   43 May  2 16:09 GSE109344_monkey_fibro_pac_rs_gene.fpkm -> GSE109344_monkey_fibro_pac_rs_gene.fpkm.txt
-rw-rw-r-- 1 kmt baboondiversity 1.1M May  1 14:28 GSE109344_monkey_fibro_pac_rs_gene.fpkm.txt
lrwxrwxrwx 1 kmt baboondiversity   38 May  2 16:09 GSE109344_monkey_pac.allValidPairs -> GSE109344_monkey_pac_allValidPairs.txt
-rw-rw-r-- 1 kmt baboondiversity  82G May  1 14:43 GSE109344_monkey_pac_allValidPairs.txt
lrwxrwxrwx 1 kmt baboondiversity   37 May  2 16:10 GSE109344_monkey_rs.allValidPairs -> GSE109344_monkey_rs_allValidPairs.txt
-rw-rw-r-- 1 kmt baboondiversity  62G May  1 14:55 GSE109344_monkey_rs_allValidPairs.txt
lrwxrwxrwx 1 kmt baboondiversity   38 May  2 16:10 GSE109344_monkey_spa.allValidPairs -> GSE109344_
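The same peek at each file can be done from Python, which is handy when the preview should be reused later in the notebook. This is a minimal sketch, not part of the original workflow: the function name `preview_files` and its parameters are made up for illustration, and it assumes the files are plain text. It skips symlinks by default, since in this directory every symlink points at a file that is listed anyway.

```python
from itertools import islice
from pathlib import Path


def preview_files(directory, n_lines=2, skip_symlinks=True):
    """Return {file name: first n_lines lines} for each file in directory.

    Hypothetical helper for illustration; assumes plain-text files.
    """
    previews = {}
    for path in sorted(Path(directory).iterdir()):
        # Symlinks here point at files in the same directory, so skipping
        # them avoids previewing the same content twice.
        if skip_symlinks and path.is_symlink():
            continue
        if not path.is_file():
            continue
        with path.open("r", errors="replace") as handle:
            # islice stops after n_lines, so the multi-gigabyte files are
            # never read in full.
            previews[path.name] = [
                line.rstrip("\n") for line in islice(handle, n_lines)
            ]
    return previews
```

Calling `preview_files("../data/links/macaque_data")` should return one entry per regular file, mirroring the `head -n2` loop above.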

## Convert the format

In [3]:
!head -n 100 ../data/links/macaque_data/GSE109344_monkey_fibro.allValidPairs | awk '{print $2, $3, $3+1, $5, $6, $6+1}'

chr1 31 32 chr5 214251 214252
chr1 32 33 chr15 57886364 57886365
chr1 34 35 chr10 61122746 61122747
chr1 34 35 chr14 44036807 44036808
chr1 35 36 chr2 82076213 82076214
chr1 43 44 chr10 52711984 52711985
chr1 44 45 chr1 2602668 2602669
chr1 45 46 chr5 11863259 11863260
chr1 45 46 chr9 99351902 99351903
chr1 48 49 chr5 23829 23830
chr1 48 49 chr9 5814466 5814467
chr1 49 50 chr1 17555 17556
chr1 50 51 chr5 138691168 138691169
chr1 51 52 chr1 2784680 2784681
chr1 51 52 chr14 1291765 1291766
chr1 53 54 chr1 2276683 2276684
chr1 53 54 chr14 14068817 14068818
chr1 55 56 chr5 39602073 39602074
chr1 56 57 chr16 55377362 55377363
chr1 57 58 chr1 1388389 1388390
chr1 58 59 chrX 84752078 84752079
chr1 61 62 chr9 114926426 114926427
chr1 63 64 chr1 121156298 121156299
chr1 63 64 chr14 10232915 10232916
chr1 63 64 chr14 27098869 27098870
chr1 65 66 chr6 154971391 154971392
chr1 66 67 chr17 10250744 10250745
chr1 67 68 chr4 83604512 83604513
chr1 67 68 chr16 32192925 32192926
chr1 69 70 chr1 36374 3
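The `awk` one-liner above keeps the chromosome and position of both ends of each pair (columns 2, 3, 5, and 6) and turns each position into a 1 bp interval. Below is a rough Python equivalent, in case the conversion needs to be extended later. The function names are hypothetical, and the example line is invented: only its chromosome and position values are taken from the first output row above, while the read name and strand fields are placeholders.

```python
def valid_pair_to_intervals(line):
    """Mirror awk '{print $2, $3, $3+1, $5, $6, $6+1}' for one input line."""
    fields = line.split()  # split on any whitespace, like awk's default
    chrom1, pos1 = fields[1], int(fields[2])
    chrom2, pos2 = fields[4], int(fields[5])
    return (chrom1, pos1, pos1 + 1, chrom2, pos2, pos2 + 1)


def convert_lines(lines):
    """Yield one space-separated output row per non-empty input line."""
    for line in lines:
        if line.strip():  # unlike awk, blank lines are skipped
            yield " ".join(str(v) for v in valid_pair_to_intervals(line))


# Invented example line: read name and strands are placeholders.
example = "read_1\tchr1\t31\t+\tchr5\t214251\t-"
print(next(convert_lines([example])))  # chr1 31 32 chr5 214251 214252
```

Passing an open file handle to `convert_lines` processes the file lazily, one line at a time, which matters for inputs of this size.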

## HiCExplorer


In [15]:
!hicexplorer --help

usage: hicexplorer [-h] [--version]

HiCExplorer addresses the common tasks of Hi-C analysis from processing to visualization.
Each tool should be called by its own name as in the following example:

 $ hicPlotMatrix -m hic_matrix.h5 -o plot.pdf

If you find HiCExplorer useful for your research please cite as:

Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, Laura Arrigoni, Bjoern A Gruening, Kin Chung Lam,
Bianca Habermann, Asifa Akhtar, Thomas Manke.
"High-resolution TADs reveal DNA sequences underlying genome organization in flies".
Nature Communications, Volume 9, Article number: 189 (2018), doi: https://doi.org/10.1038/s41467-017-02525-w

Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach,
Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning.
"Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization",
Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: http

In [20]:
!hicexplorer hicBuildMatrix 

This prints exactly the same usage message as `hicexplorer --help` above. The `hicexplorer` command accepts only `-h` and `--version`, and does not dispatch to subcommands. As the help text says, each tool, such as `hicBuildMatrix`, has to be called by its own name.
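Since the help text says that each tool should be called by its own name, a quick check that the individual executables are visible from the active kernel can save some confusion. The sketch below uses only the standard library. The helper name `find_tools` is made up for illustration, and the two tool names are the ones that appear in this notebook.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Tool names taken from this notebook; extend the list as needed.
tools = find_tools(["hicBuildMatrix", "hicPlotMatrix"])
for name, path in tools.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If a tool is reported as not found, the notebook is probably not running with the 'hic' kernel.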