HiNT (Hi-C for copy Number variation and Translocation detection) is a computational method to detect CNVs and translocations from Hi-C data. HiNT has three main components: HiNT-PRE, HiNT-CNV, and HiNT-TL. HiNT-PRE preprocesses Hi-C data and computes the contact matrix, which stores contact frequencies between any two genomic loci. HiNT-CNV and HiNT-TL both start from a Hi-C contact matrix and predict copy number segments and inter-chromosomal translocations, respectively.
R and R packages
Python and Python packages
- python >= 3.5
- pypairix >= 0.3.0, cooler >= 0.7.4, pairtools >= 0.2.2, numpy, scipy, pandas, sklearn, multiprocessing
Java and related tools (optional: required only if you want to process Hi-C data with Juicer tools)
Perl
Other dependencies
- samtools (1.3.1+)
- BIC-seq2 (0.7.3). Optional: required only if you want to run HiNT-CNV. Download BICseq2, unzip it, and note the path of BICseq2-seg_v0.7.3 (/path/to/BICseq2-seg_v0.7.3).
- bwa (0.7.16+). Optional: required only when your input is in fastq format.
- tabix (0.2.6)
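Before installing, it can help to confirm which of the command-line dependencies above are already on your PATH. The sketch below is only an illustration: the `check_tools` helper is hypothetical (not part of HiNT), it does not check versions, and the executable names (for example `python` versus `python3`, or `Rscript` for R) are assumptions that may differ on your system.

```shell
# Hypothetical helper (not part of HiNT): report which tools are on PATH.
# It checks presence only, not the minimum versions listed above.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Succeed only when every requested tool was found.
    [ "$missing" -eq 0 ]
}

# Executable names are assumptions; adjust them to your installation.
check_tools python Rscript java perl samtools bwa tabix \
    || echo "Some tools were not found (Java and bwa are only needed for optional steps)."
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard at the top of a larger setup script.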
- Method 1: Install using conda (highly recommended)
$ conda install -c su hint
or
$ conda install hint
- Method 2: Install from PyPI using pip
$ pip install HiNT-Packages
- Method 3: Install manually
- Install HiNT dependencies
- Download HiNT
git clone https://github.com/parklab/HiNT.git
- Go to HiNT directory, install it by
$ python setup.py install
*** Type
$ hint
to test whether HiNT was installed successfully
- Method 4: Run HiNT in a Docker container (highly recommended)
$ docker pull suwangbio/hint
$ docker run suwangbio/hint hint
See usage details on the HiNT page at Docker Hub
Download reference files used in HiNT HERE
- Download HiNT references HERE. Only hg19, hg38 and mm10 are available currently. Unzip it
$ unzip hg19.zip
- Download HiNT background matrices HERE. Only hg19, hg38 and mm10 are available currently. Unzip it
$ unzip hg19.zip
- Download BWA index files HERE. Only hg19, hg38 and mm10 are available currently. Unzip it
$ unzip hg19.zip
- Download the test datasets from HERE
HiNT pre: Preprocessing Hi-C data. HiNT pre does alignment, contact matrix creation and normalization in one command line.
$ hint pre -d /path/to/hic_1.fastq.gz,/path/to/hic_2.fastq.gz -i /path/to/bwaIndex/hg19/hg19.fa --refdir /path/to/refData/hg19 --informat fastq --outformat cooler -g hg19 -n test -o /path/to/outputdir --pairtoolspath /path/to/pairtools --samtoolspath /path/to/samtools --coolerpath /path/to/cooler
$ hint pre -d /path/to/test.bam --refdir /path/to/refData/hg19 --informat bam --outformat juicer -g hg19 -n test -o /path/to/outputdir --pairtoolspath /path/to/pairtools --samtoolspath /path/to/samtools --juicerpath /path/to/juicer_tools.1.8.9_jcuda.0.8.jar
Use
$ which samtools
$ which pairtools
$ which cooler
to get the absolute paths of these tools. /path/to/juicer_tools.1.8.9_jcuda.0.8.jar should be the path where you store this file.
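The path lookup described above can be scripted. The following is a minimal sketch, not an official HiNT wrapper: `tool_path` is a hypothetical helper that uses the shell's `command -v` (a portable counterpart to `which`) and falls back to a `/path/to/...` placeholder when a tool is not found. It only prints the resulting `hint pre` command, reusing the placeholder inputs from the fastq example, so you can inspect it before running it yourself.

```shell
# Hypothetical helper: print the absolute path of a tool, or a
# visible placeholder when the tool is not on PATH.
tool_path() {
    command -v "$1" 2>/dev/null || echo "/path/to/$1"
}

SAMTOOLS=$(tool_path samtools)
PAIRTOOLS=$(tool_path pairtools)
COOLER=$(tool_path cooler)

# Print (do not run) the command; the input paths are placeholders.
echo hint pre \
    -d /path/to/hic_1.fastq.gz,/path/to/hic_2.fastq.gz \
    -i /path/to/bwaIndex/hg19/hg19.fa \
    --refdir /path/to/refData/hg19 \
    --informat fastq --outformat cooler \
    -g hg19 -n test -o /path/to/outputdir \
    --pairtoolspath "$PAIRTOOLS" \
    --samtoolspath "$SAMTOOLS" \
    --coolerpath "$COOLER"
```

If any `/path/to/...` placeholder remains in the printed tool paths, that tool was not found on your PATH and needs to be installed or located manually.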
see details and more options
$ hint pre -h
HiNT cnv: prediction of copy number information, as well as segmentation from Hi-C.
$ hint cnv -m contactMatrix.cool -f cooler --refdir /path/to/refDir/hg19 -r 50 -g hg19 -n test -o /path/to/outputDir --bicseq /path/to/BICseq2-seg_v0.7.3 -e MboI
$ hint cnv -m /path/to/4DNFIS6HAUPP.mcool::/resolutions/50000 -f cooler --refdir /path/to/refDir/hg38 -r 50 -g hg38 -n HepG2 --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII --maptrack 36mer
$ hint cnv -m /path/to/4DNFICSTCJQZ.hic -f juicer --refdir /path/to/refDir/hg38 -r 50 -g hg38 -n HepG2 --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII
$ hint cnv -m /path/to/4DNFICSTCJQZ.hic -f juicer --refdir /path/to/refDir/hg38 -r 50 -g hg38 -n HepG2 --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII --doiter
/path/to/BICseq2-seg_v0.7.3
should be the path where you store this package
see details and more options
$ hint cnv -h
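In the examples above, `-r 50` is paired with the cooler URI `::/resolutions/50000`, which suggests that `-r` is given in kb while the group inside a multi-resolution `.mcool` file is named in bp. Assuming that reading is correct, a small helper (hypothetical, not part of HiNT) can keep the two values consistent:

```shell
# Hypothetical helper: build a cooler URI for one resolution of an
# .mcool file. Assumes the resolution argument is in kb and that the
# file stores it under /resolutions/<bp>, as in the examples above.
mcool_uri() {
    file=$1
    res_kb=$2
    echo "${file}::/resolutions/$((res_kb * 1000))"
}

RES_KB=50
MATRIX=$(mcool_uri /path/to/4DNFIS6HAUPP.mcool "$RES_KB")
echo "$MATRIX"

# Print (do not run) a matching hint cnv command with placeholder paths.
echo hint cnv -m "$MATRIX" -f cooler --refdir /path/to/refDir/hg38 \
    -r "$RES_KB" -g hg38 -n HepG2 \
    --bicseq /path/to/BICseq2-seg_v0.7.3 -e DpnII
```

Check `hint cnv -h` to confirm the unit of `-r` for your installed version before relying on this conversion.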
HiNT tl: detection of inter-chromosomal translocations and breakpoints from Hi-C inter-chromosomal interaction matrices.
$ hint tl -m /path/to/data_1Mb.cool,/path/to/data_100kb.cool --chimeric /path/to/test_chimeric.sorted.pairsam.gz --refdir /path/to/refDir/hg19 --backdir /path/to/backgroundMatrices/hg19 --ppath /path/to/pairix -f cooler -g hg19 -n test -o /path/to/outputDir
$ hint tl -m /path/to/4DNFIS6HAUPP.mcool::/resolutions/1000000,/path/to/4DNFIS6HAUPP.mcool::/resolutions/100000 -f cooler --refdir /path/to/refDir/hg38 --backdir /path/to/backgroundMatrices/hg38 -g hg38 -n 4DNFICSTCJQZ -c 0.05 --ppath /path/to/pairix -p 12
$ hint tl -m /path/to/4DNFICSTCJQZ.hic -f juicer --refdir /path/to/refData/hg38 --backdir /path/to/backgroundMatrices/hg38 -g hg38 -n 4DNFICSTCJQZ -c 0.05 --ppath /path/to/pairix -p 12 -o HiNTtransl_juicerOUTPUT
use $ which pairix
to get the absolute path of pairix
see details and more options
$ hint tl -h
In the HiNT-PRE output directory, you will find
- jobname.bam: aligned lossless file in bam format
- jobname_merged_valid.pairs.gz: read pairs in pairs format
- jobname_chimeric.sorted.pairsam.gz: ambiguous chimeric read pairs used for breakpoint detection in pairsam format
- jobname_valid.sorted.deduped.pairsam.gz: valid read pairs used for Hi-C contact matrix creation in pairsam format
- jobname.mcool: Hi-C contact matrix in cool format
- jobname.hic: Hi-C contact matrix in hic format
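After a run finishes, you may want to confirm that the expected files were written. The sketch below is a hypothetical helper (not part of HiNT) that checks the HiNT-PRE output file names listed above for a given output directory and job name. It assumes the files sit directly in the output directory, and that only one of the `.mcool` / `.hic` matrices is present depending on the `--outformat` you chose, so a single missing matrix is not necessarily an error.

```shell
# Hypothetical helper: list which HiNT-PRE outputs exist.
# Usage: check_pre_outputs <output_dir> <jobname>
check_pre_outputs() {
    dir=$1
    job=$2
    for suffix in \
        .bam \
        _merged_valid.pairs.gz \
        _chimeric.sorted.pairsam.gz \
        _valid.sorted.deduped.pairsam.gz \
        .mcool \
        .hic
    do
        if [ -e "$dir/$job$suffix" ]; then
            echo "ok       $job$suffix"
        else
            echo "missing  $job$suffix"
        fi
    done
}

# Example call with the placeholder directory and job name used above.
check_pre_outputs /path/to/outputdir test
```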
In the HiNT-CNV output directory, you will find
- jobname_GAMPoisson.pdf: the GAM regression result
- segmentation/jobname_bicsq_allchroms.txt: CNV segments with log2 copy ratio and p-values in a txt file
- segmentation/jobname_resolution_CNV_segments.png: figure to visualize CNV segments
- segmentation/jobname_bicseq_allchroms.l2r.pdf: figure to visualize the log2 copy ratio in each bin (bin size = the resolution you set)
- segmentation/other_files: intermediate files used to run BIC-seq
- jobname_dataForRegression/*: data used for regression as well as residuals after removing Hi-C biases
In the HiNT-TL output directory, you will find
- jobname_Translocation_IntegratedBP.txt: the final integrated translocation breakpoint
- jobname_chrompairs_rankProduct.txt: rank product predicted potential translocated chromosome pairs
- otherFolders: intermediate files used to identify the translocation breakpoints