A matrix library for spectral and reproducibility analysis of Hi-C contact maps. Several useful functions:
- to calculate the reproducibility metric between 2 HiC contact maps
- Knight Ruiz algorithm for matrix balancing
- to find the average contact frequency as a function of genomic distance
- to find A, B compartment, using method described in Liberman et al. Science 2009
and a few functions for binning a genome, and reading HiC maps
HiC-spector is mostly written in Julia. It has been tested in Julia 0.4 and 0.5. Both the Julia language ( Installationhttp://julialang.org/) and the required packages have to be installed. Please refer to the beginning of the file HiC_spector.jl for the necessary packages. To get some contact maps for testing the code, please follow the instructions shown in the file data/readme_data.
There is a Python script available for quantifying reproducibility. The Python version can read files in genomic coordinates as well as the .hic format (https://github.com/theaidenlab/juicebox/wiki/Data). To do so, please download the Python version of the tool straw (straw.py) developed by the Aiden lab (https://github.com/theaidenlab/straw).
The script run_reproducibility.jl is used to get the reproducibility score from a command-line interface. Usage: > julia run_reproducibility.jl matrix_file1 matrix_file2 Usage
The input file here is a simple text delimited format with no header.
1 1 20
1 2 18
The first and second columns represent the row and column indices of a contact map, whereas the third column is the count. To represent a full matrix, only the upper-triangular component is required. Note that the index should begin with 1.
Please use the files stored in the folder A549 mentioned in ./data/readme_data to test the script run_reproducibility.jl.
Julia users can include the file HiC_spector.jl for their own analysis by simply using
Please refer to the file hic_spector_tutorial.jl for how to use some of the functions and how to read files in other formats.
For non-Julia users, one can use the Python script run_reproducibility.py to obtain the reproducibility score. Usage:
python run_reproducibility.py -F matrix_file1 matrix_file2
If the matrix files are labeled in genomic coordinates of bins, USage:
python run_reproducibility.py t matrix_file1 matrix_file2 40000
where 40000 is the bin size used in the two files
In addition to the text delimited input files, the Python script can calculate reproducibility score for contact maps stored in .hic format. Usage:
python run_reproducibility.py -f hic_file1 hic_file2 chrid resolution
A script is provided in the tool straw (https://github.com/theaidenlab/straw/tree/master/python) for reading the headers (including chr id and the available resolutions) in .hic file.
Regarding memory, given two contact maps of human chr1 binned in a bin-size of 10kb, the code works fine in a laptop (16GB memory) from our experience.