Tools are added by publication date, newest on top. Unpublished tools are listed at the end of each section. See Hi-C data notes and single-cell Hi-C notes for more. Please contribute and get in touch! See MDnotes for other data science and genomics-related notes.
- Pipelines
- Resolution improvement
- Normalization
- Reproducibility
- AB compartments
- Peak/Loop callers
- Differential analysis
- TAD callers
- Prediction of 3D features
- SNP-oriented
- CNV and Structural variant detection
- Visualization
- De novo genome scaffolding
- 3D modeling
- Deconvolution
- Haplotype phasing
- Papers
- Courses
- Labs
- Misc
- Microcket - 3D genomics data (Hi-C, Micro-C) preprocessing pipeline. A read-stitch strategy (stitching complementary reads with FLASH) improves mapping efficiency. Removes adapters (Ktrim), low-quality cycles, and PCR duplicates; uses STAR. Outperforms Juicer, HiC-Pro, HiCUP, FAN-C, and Distiller in the number of detected pairs (especially at shorter distances), running time, and memory. Tested on simulated (sim3C) and Rao2014 data. Output: pairs, hic, and cool formats. Linux, command line. Zhao, Yu, Mengqi Yang, Fanglei Gong, Yuqi Pan, Minghui Hu, Qin Peng, Leina Lu, Xiaowen Lyu, and Kun Sun. “Accelerating 3D Genomics Data Analysis with Microcket.” Communications Biology 7, no. 1 (June 1, 2024): 675.
- FAN-C - Python pipeline for Hi-C processing. Input - raw FASTQ (aligned using BWA or Bowtie2, artifact filtering) or pre-aligned BAMs. KR or ICE normalization. Analysis and Visualization (contact distance decay, A/B compartment detection, TAD/loop detection, Average TAD/loop profiles, saddle plots, triangular heatmaps, comparison of two heatmaps). Automatic or modular. Compatible with .cool and .hic formats. Tweet1, Tweet2. Table 1 - detailed comparison of 13 Hi-C processing tools
Kruse, Kai, Clemens B. Hug, and Juan M. Vaquerizas. "FAN-C: A Feature-Rich Framework for the Analysis and Visualisation of Chromosome Conformation Capture Data" Genome Biology 21, no. 1 (December 2020)
- Galaxy HiCExplorer v3 - full analysis of Hi-C, Capture Hi-C, and scHi-C data. Workflow-like description of tools/tasks for each data type.
Wolff, Joachim, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, and Björn A Grüning. "Galaxy HiCExplorer 3: A Web Server for Reproducible Hi-C, Capture Hi-C and Single-Cell Hi-C Data Analysis, Quality Control and Visualization" Nucleic Acids Research, (July 2, 2020)
- scHiCExplorer - set of command-line tools specifically designed for scHi-C data. scHiCExplorer's documentation.
Wolff, Joachim, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, and Björn A Grüning. "Galaxy HiCExplorer 3: A Web Server for Reproducible Hi-C, Capture Hi-C and Single-Cell Hi-C Data Analysis, Quality Control and Visualization" Nucleic Acids Research, (July 2, 2020)
- Cooltools formal paper - a suite of computational tools for modular high-level analysis of processed Hi-C data in cooler format, by the Open2C group. Supersedes hiclib. Normalization (ICE, including trans-chromosomal), interaction frequency vs. distance decay curves (including within chromosomal arms), A/B compartment analysis (GC content or gene density for orientation, saddle plots), TAD/loop identification (insulation score, HiCCUPS). Works with data at up to 100 bp resolution (Micro-C). Aggregate analyses (on- and off-diagonal pileups). Built using Python, with a command-line interface. Demo/documentation Jupyter notebooks demonstrating custom visualization and analysis, GitHub version. Code for manuscript figures. Quaich - Snakemake pipeline for Hi-C post-processing using cooltools, chromosight, mustache, coolpuppy.
Open2C, Nezar Abdennur, Sameer Abraham, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra A. Galitsyna, Anton Goloborodko, Maxim Imakaev, Betul A. Oksuz, and Sergey V. Venev. “Cooltools: Enabling High-Resolution Hi-C Analysis in Python.” Preprint. Bioinformatics, November 1, 2022.
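The distance-decay curves mentioned above (contact frequency vs. genomic distance, often written P(s)) can be illustrated with a short sketch. It assumes you already have the distances |pos2 - pos1| of intra-chromosomal contacts, for example from a .pairs file, and uses logarithmic distance bins with counts divided by bin width. The function name, the 1 kb cutoff, and the four bins per decade are illustrative choices, not the cooltools implementation, which operates on binned cooler matrices.

```python
import math
from collections import Counter


def contact_decay(distances, min_dist=1000, bins_per_decade=4):
    """Log-binned contact frequency vs. genomic distance, P(s).

    distances: |pos2 - pos1| (bp) of intra-chromosomal contacts.
    Returns (bin_start, bin_end, density) tuples sorted by distance, where
    density = number of contacts in the bin / bin width (bp).
    """
    counts = Counter()
    for d in distances:
        if d < max(min_dist, 1):
            continue  # drop very short-range pairs (and d = 0, where log10 fails)
        counts[math.floor(math.log10(d) * bins_per_decade)] += 1
    curve = []
    for k in sorted(counts):
        start = 10 ** (k / bins_per_decade)
        end = 10 ** ((k + 1) / bins_per_decade)
        curve.append((start, end, counts[k] / (end - start)))
    return curve


# Toy example: equal counts at 2 kb, 20 kb and 200 kb give a decaying density,
# because wider (more distant) log bins spread the same count over more bp.
for start, end, density in contact_decay([2_000] * 10 + [20_000] * 10 + [200_000] * 10):
    print(f"{start:>10.0f}-{end:<10.0f} {density:.2e}")
```

Plotting density against distance on log-log axes gives the familiar decay curve; real analyses usually also normalize by the number of valid bin pairs at each distance.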
- Cooler: Scalable Storage for Hi-C Data and Other Genomically-Labeled Arrays
- cooler - file format for storing Hi-C matrices, sparse, hierarchical, multi-resolution.
Python package for data loading, aggregation, merging, normalization (balancing), viewing, and exporting data. Together with the text-based "pairs" format and .hic, cooler is accepted by the 4D Nucleome Consortium DAC. Documentation. cooltools - tools to work with .cool files. Documentation.
- hiclib - Python tools to QC, map, normalize, filter and analyze Hi-C data
- hic2cool - Lightweight converter between hic and cool contact matrices.
- pairtools - tools for low-level processing of mapped Hi-C paired reads. Documentation.
Abdennur, Nezar, and Leonid Mirny. "Cooler: Scalable Storage for Hi-C Data and Other Genomically-Labeled Arrays" Bioinformatics, January 1, 2020
- Pairtools - modular CLI tools (Python implementation and ecosystem) for chromatin conformation capture (Hi-C and more) data, starting from aligned SAM/BAM files (bwa, minimap, etc.). Tools include parse, sort, and dedup. parse considers various ligation types and properly reports contact pairs. It outputs a tab-separated .pairs file (with header) with contact coordinates and additional information, which can be converted into a binned contact matrix using cooler. parse2 handles multiple ligation events (multi-way contacts, Pore-C and MC-3C technologies) and better handles newer sequencing data. dedup handles imperfect matches and can consider additional columns. Additional preprocessing tools: flip, header, select, sample, merge. Quality control tools: scaling (chromosome-specific decay of interaction frequency with distance) and stats (additional statistics). Protocol-specific tools: restrict for annotating pairs by restriction fragments, phase for annotating haplotype-resolved Hi-C, filterbycov for cleaning up single-cell Hi-C. Integrates with the cooler and cooltools software from Open2C and forms the backbone of the distiller pipeline. Table 1 - comparison with Chromap, Juicer, HiC-Pro, HiCExplorer, FAN-C, TADbit. Second fastest after Chromap. pip, conda installable. Documentation.
Open2C, Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra A. Galitsyna, Anton Goloborodko, Maxim Imakaev, and Sergey V. Venev. “Pairtools: From Sequencing Data to Chromosome Contacts.” Preprint. Bioinformatics, February 15, 2023.
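To show what converting .pairs into a binned contact matrix involves, here is a minimal pure-Python sketch that counts contacts per pair of fixed-size bins and keeps only the upper triangle as a sparse pixel table, similar in spirit to how cooler stores symmetric matrices. It assumes the 4DN column order (readID, chr1, pos1, chr2, pos2, ...) and 1-based positions; `bin_pairs` is a hypothetical helper, not a pairtools or cooler function. At scale, `cooler cload pairs` performs this aggregation.

```python
from collections import Counter


def bin_pairs(lines, chrom_sizes, binsize):
    """Aggregate .pairs records into a sparse, upper-triangular pixel table.

    lines: text lines of a .pairs file (header lines start with '#').
    chrom_sizes: dict {chrom: length_bp}; insertion order defines bin order.
    Returns Counter {(bin1_id, bin2_id): count} with bin1_id <= bin2_id.
    """
    # Global bin offset per chromosome, like a genome-wide bin table.
    offsets, n_bins = {}, 0
    for chrom, length in chrom_sizes.items():
        offsets[chrom] = n_bins
        n_bins += -(-length // binsize)  # ceiling division
    pixels = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom1, pos1, chrom2, pos2 = fields[1], int(fields[2]), fields[3], int(fields[4])
        if chrom1 not in offsets or chrom2 not in offsets:
            continue  # unmapped sides ('!') or contigs absent from chrom_sizes
        b1 = offsets[chrom1] + (pos1 - 1) // binsize  # positions assumed 1-based
        b2 = offsets[chrom2] + (pos2 - 1) // binsize
        pixels[(min(b1, b2), max(b1, b2))] += 1  # symmetric: store upper triangle
    return pixels


example = [
    "## pairs format v1.0",
    "r1\tchr1\t10\tchr1\t150\t+\t-",
    "r2\tchr1\t50\tchr2\t20\t+\t+",
]
print(bin_pairs(example, {"chr1": 250, "chr2": 100}, 100))
```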
- HiCExplorer - set of programs to process, normalize, analyze and visualize Hi-C data, Python, .cool format, conversion utilities. Documentation.
Ramírez, Fidel, Vivek Bhardwaj, Laura Arrigoni, Kin Chung Lam, Björn A. Grüning, José Villaveces, Bianca Habermann, Asifa Akhtar, and Thomas Manke. "High-Resolution TADs Reveal DNA Sequences Underlying Genome Organization in Flies" Nature Communications 9, no. 1 (December 2018)
- Galaxy HiCExplorer - a web server for Hi-C data preprocessing, QC, visualization. Docker container.
Wolff, Joachim, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, and Björn A. Grüning. "Galaxy HiCExplorer: A Web Server for Reproducible Hi-C Data Analysis, Quality Control and Visualization" Nucleic Acids Research 46, no. W1 (July 2, 2018).
- GITAR - full Hi-C pre-processing, normalization, TAD detection, and visualization. Python scripts wrapping other tools. Table 1 summarizes the functionality of existing tools. Documentation.
Calandrelli, Riccardo, Qiuyang Wu, Jihong Guan, and Sheng Zhong. “GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data.” Genomics, Proteomics & Bioinformatics 16, no. 5 (2018): 365–72.
- HiC-bench - complete pipeline for Hi-C data analysis.
Lazaris, Charalampos, Stephen Kelly, Panagiotis Ntziachristos, Iannis Aifantis, and Aristotelis Tsirigos. "HiC-Bench: Comprehensive and Reproducible Hi-C Data Analysis Designed for Parameter Exploration and Benchmarking" BMC Genomics 18, no. 1 (December 2017)
- TADbit - a complete Python library that handles all steps of analyzing, modeling, and exploring 3C-based data. With TADbit, the user can map FASTQ files to obtain raw interaction binned matrices (Hi-C like matrices), normalize and correct interaction matrices, identify and compare Topologically Associating Domains (TADs), build 3D models from the interaction matrices, and finally extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models.
Serra, François, Davide Baù, Mike Goodstadt, David Castillo, Guillaume J. Filion, and Marc A. Marti-Renom. "Automatic Analysis and 3D-Modelling of Hi-C Data Using TADbit Reveals Structural Features of the Fly Chromatin Colors" PLoS Computational Biology 13, no. 7 (July 2017)
- Juicer - full Java pipeline to convert raw reads into Hi-C maps, visualized in Juicebox. Calls domains, loops, and CTCF binding sites.
file format for storing multi-resolution Hi-C data. Paper
Durand, Neva C., Muhammad S. Shamim, Ido Machol, Suhas S. P. Rao, Miriam H. Huntley, Eric S. Lander, and Erez Lieberman Aiden. "Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments" Cell Systems 3, no. 1 (July 2016)
Rao, Suhas S. P., Miriam H. Huntley, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, et al. "A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping" Cell 159, no. 7 (December 18, 2014) - Juicer analysis example. TADs defined by frequent interactions, enriched in CTCF and cohesin members. Five subcompartments (A1, A2, B1, B2, B3); A1 and A2 are enriched in genes. Chr19 contains a sixth pattern, B4. Subcompartments are enriched in different histone modification marks. TADs are preserved across cell types, yet differences between GM12878 and IMR90 were detected. Domain boundaries detected by scanning the contact map image. Refs to the original paper.
- HiC-Pro - Python and command line-based optimized and flexible pipeline for Hi-C data processing. hicpro2juicebox tool to generate Juicebox-compatible files (requires juicebox_clt.jar). Documentation
Servant, Nicolas, Nelle Varoquaux, Bryan R. Lajoie, Eric Viara, Chong-Jian Chen, Jean-Philippe Vert, Edith Heard, Job Dekker, and Emmanuel Barillot. "HiC-Pro: An Optimized and Flexible Pipeline for Hi-C Data Processing." Genome Biology 16 (December 1, 2015) - HiC pipeline, references to other pipelines, comparison. From raw reads to normalized matrices. Normalization methods, fast and memory-efficient implementation of iterative correction normalization (ICE). Data format. Using genotyping information to phase contact maps.
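The iterative correction (ICE) normalization mentioned above can be sketched in a few lines: repeatedly compute each bin's coverage in the corrected matrix and divide that bin's bias by its coverage relative to the mean, until all non-empty rows have equal sums. This is a minimal dense, pure-Python sketch for illustration, not the HiC-Pro or cooler code; implementations such as cooler also ignore the first few diagonals and filter low-coverage bins before balancing, which this sketch omits (a matrix with contacts only on the main diagonal will not converge here).

```python
def ice_balance(matrix, max_iter=200, tol=1e-6):
    """Iterative correction of a symmetric contact matrix.

    Returns biases b such that b[i] * matrix[i][j] * b[j] has (nearly) equal
    row sums for all non-empty rows. Empty rows get a bias of 0.
    """
    n = len(matrix)
    bias = [1.0] * n
    for _ in range(max_iter):
        # Row sums (coverage) of the currently corrected matrix.
        sums = [sum(bias[i] * matrix[i][j] * bias[j] for j in range(n)) for i in range(n)]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            return [0.0] * n
        mean = sum(nonzero) / len(nonzero)
        worst = 0.0
        for i in range(n):
            if sums[i] > 0:
                ratio = sums[i] / mean  # > 1 means the bin is over-represented
                bias[i] /= ratio
                worst = max(worst, abs(ratio - 1.0))
            else:
                bias[i] = 0.0  # empty bins cannot be balanced
        if worst < tol:
            break
    return bias


m = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]
b = ice_balance(m)
print([round(sum(b[i] * m[i][j] * b[j] for j in range(3)), 4) for i in range(3)])
```

The printed corrected row sums should be (nearly) identical, which is the defining property of a balanced matrix.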
- HiCdat - Hi-C processing pipeline and downstream analysis/visualization. Analyses: normalization, correlation, visualization, comparison, distance decay, PCA, interaction enrichment test, epigenomic enrichment/depletion. Consists of GUI tool for data preprocessing and R package for data analysis.
Schmid, Marc W., Stefan Grob, and Ueli Grossniklaus. "HiCdat: A Fast and Easy-to-Use Hi-C Data Analysis Tool" BMC Bioinformatics 16 (September 3, 2015)
- HiCUP - modular pipeline (Perl) for Hi-C and Capture Hi-C mapping, artifact filtering (e.g., religation of adjacent restriction fragments), and PCR duplicate removal. Creates BAM files and QC reports. Details about Hi-C sequencing artifacts. Output compatible with CHiCAGO, GOTHiC, Homer, Hicpipe. Documentation.
Wingett, Steven, Philip Ewels, Mayra Furlan-Magaril, Takashi Nagano, Stefan Schoenfelder, Peter Fraser, and Simon Andrews. "HiCUP: Pipeline for Mapping and Processing Hi-C Data" F1000Research 4 (2015) - HiCUP pipeline, alignment only, removes artifacts (religations, duplicate reads) creating BAM files. Details about Hi-C sequencing artifacts. Used in conjunction with other pipelines.
- HiC_Pipeline - Python-based pipeline performing mapping, filtering, binning, and ICE correction of Hi-C data, from raw reads (.sra, .fastq) to contact matrices. Also converts to sparse format and performs QC. Documentation.
- Juicer Tools - for creating/extracting data from .hic files; Arrowhead for finding contact domains, HiCCUPS for loop detection and HiCCUPS Diff for finding differential loops, MotifFinder for characterizing CTCF peaks at loop anchors, Pearsons for calculating the Pearson's correlation matrix of the Observed/Expected matrix, APA for aggregate peak analysis, Eigenvector for determining A/B chromatin states, Compare Lists for separating loop lists into common and condition-specific loops. All tools are described in the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014.
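The Observed/Expected and Pearson steps listed above can be illustrated on a small dense matrix. In this sketch the expected value for each cell is simply the mean of its diagonal (all cells at the same genomic distance), and the Pearson matrix correlates every pair of O/E rows; bins in the same compartment then correlate positively. The function names are hypothetical and the expected-value calculation is a simplifying assumption, not Juicer's exact method.

```python
import math


def observed_over_expected(matrix):
    """Divide each cell by the mean of its diagonal (same genomic distance).

    Assumes a symmetric matrix; expected values come from the upper triangle.
    """
    n = len(matrix)
    expected = []
    for d in range(n):
        diag = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diag) / len(diag))
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def pearson_matrix(matrix):
    """Pearson correlation between every pair of rows."""
    def corr(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    return [[corr(r1, r2) for r2 in matrix] for r1 in matrix]


# Toy map: bins 0-1 and 2-3 form two "compartments" (doubled within-compartment
# contacts) on top of a 1/(distance + 1) decay.
labels = [0, 0, 1, 1]
toy = [[(2 if labels[i] == labels[j] else 1) / (abs(i - j) + 1) for j in range(4)]
       for i in range(4)]
for row in pearson_matrix(observed_over_expected(toy)):
    print([round(v, 2) for v in row])
```

The sign pattern of the correlation matrix (and of its leading eigenvector, as in the Eigenvector tool) separates the two toy compartments.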
- ENCODE project Data Production and Processing Standard of the Hi-C Mapping Center, PDF - computational standards of the Hi-C ENCODE mapping center, including quality control measures and computational methods.
- Arima mapping pipeline - Mapping pipeline for data generated using Arima-HiC. MboI and HinfI enzymes (source). The combination of restriction enzymes represents ten possible cut-site sequences: ^GATC (MboI), G^ANTC (HinfI), C^TNAG (DdeI), and T^TAA (MseI); '^' is the cut site on the plus DNA strand, and 'N' can be any of the four genomic bases (source).
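Where the "ten possible cut sites" come from can be checked by expanding each 'N' into the four bases: GATC and TTAA give one sequence each, while GANTC and CTNAG give four each. The sketch below does that expansion and also scans a sequence for cut positions. The helper names are hypothetical; because all four patterns are palindromic, scanning the plus strand is enough to find every site.

```python
from itertools import product

# Cut-site patterns from the text; '^' marks the plus-strand cut, 'N' is any base.
ARIMA_SITES = ["^GATC", "G^ANTC", "C^TNAG", "T^TAA"]


def expand(site):
    """All concrete sequences matched by a site pattern containing 'N'."""
    motif = site.replace("^", "")
    choices = ["ACGT" if base == "N" else base for base in motif]
    return ["".join(seq) for seq in product(*choices)]


def find_cut_positions(seq, sites=ARIMA_SITES):
    """0-based plus-strand positions in seq where any site would be cut."""
    cuts = set()
    for site in sites:
        offset = site.index("^")  # distance from motif start to the cut
        for motif in expand(site):
            start = seq.find(motif)
            while start != -1:
                cuts.add(start + offset)
                start = seq.find(motif, start + 1)
    return sorted(cuts)


all_sites = sorted({s for site in ARIMA_SITES for s in expand(site)})
print(len(all_sites), all_sites)
print(find_cut_positions("AAGATCTTGAATCCC"))
```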
- cworld - Perl cworld module and collection of utility/analysis scripts for C data (3C, 4C, 5C, Hi-C).
- HiCpipe - an efficient Hi-C data processing pipeline based on Juicer and HiC-Pro, combining the advantages of both. HiCpipe is much faster than Juicer and HiC-Pro and can output multiple features of Hi-C maps.
- HiCExperiment - R package for handling the three main Hi-C data formats ((m)cool, hic, HiC-Pro). Imports interaction pairs into GInteractions objects with metadata such as bin IDs and raw and normalized (balanced) interaction frequencies. The objects have slots for features derived from Hi-C data, like TADs and loops. Allows querying subsets (chunks) of Hi-C data.
- my5C - web-based tools for well-documented analysis and visualization of 5C data.
- nf-core-hic - Analysis of Chromosome Conformation Capture data (Hi-C and more), Nextflow pipeline. Also, nf-core/hic. Documentation.
- dietJuicer - a lighter-weight, HPC-flexible version of Juicer written with Snakemake. Designed for SLURM, modifiable for other HPC job schedulers.
- distiller-nf - modular Hi-C mapping pipeline for reproducible data analysis, Nextflow pipeline. Alignment, filtering, aggregating Hi-C matrices.
- runHiC - Hi-C data processing pipeline, from raw FASTQ files. Supports bwa-mem, chromap (default), and minimap2 aligners. Performs quality control (plots), filtering, binning (to mcool output), pileup. Parallelized. By XiaoTao Wang.
- 4D Nucleome Hi-C Processing Pipeline - set of scripts wrapped in a Docker image. Works with files. Overview.
- Pairix - a tool to index and query files in the Pairs format, a block-compressed text file format for storing paired genomic coordinates (header plus 7 columns: readID, chr1, pos1, chr2, pos2, strand1, strand2). Bgzipped files sorted by chr1, chr2, pos1, then pos2 are indexed by pairix in under a second per million lines (similar in functionality to, but incompatible with, tabix). Command-line, R (Rpairix), and Python implementations. Supplementary scripts are available. pairsqc - QC report generator for pairs files. Standard of the 4D Nucleome consortium, supported by Juicer, cooler, pairtools. Paper
Lee, Soohyun, Carl Vitzthum, Burak H. Alver, and Peter J. Park. "Pairs and Pairix: A File Format and a Tool for Efficient Storage and Retrieval for Hi-C Read Pairs" Bioinformatics preprint, August 26, 2021.
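Why the chr1, chr2, pos1, pos2 sort order matters can be shown with an in-memory sketch: once records are sorted that way, all pairs in a chr1 range against a given chr2 are contiguous and can be located by binary search. This only illustrates the idea; pairix itself works on bgzipped files with an on-disk index, and the `PairsIndex` class here is hypothetical.

```python
import bisect


class PairsIndex:
    """In-memory range queries over (chr1, pos1, chr2, pos2) records."""

    def __init__(self, records):
        # Sort in the pairs order: chr1, chr2, pos1, then pos2.
        self.records = sorted(records, key=lambda r: (r[0], r[2], r[1], r[3]))
        self.keys = [(r[0], r[2], r[1]) for r in self.records]

    def query(self, chrom1, start1, end1, chrom2, start2=None, end2=None):
        """Records with chr1 == chrom1, start1 <= pos1 <= end1, chr2 == chrom2,
        and optionally start2 <= pos2 <= end2 (inclusive bounds)."""
        lo = bisect.bisect_left(self.keys, (chrom1, chrom2, start1))
        hi = bisect.bisect_right(self.keys, (chrom1, chrom2, end1))
        hits = self.records[lo:hi]  # contiguous block thanks to the sort order
        if start2 is not None:
            hits = [r for r in hits if start2 <= r[3] <= end2]
        return hits


idx = PairsIndex([("chr1", 500, "chr2", 10), ("chr1", 100, "chr1", 900),
                  ("chr1", 300, "chr1", 400), ("chr1", 300, "chr1", 5000)])
print(idx.query("chr1", 200, 400, "chr1"))
```

The filter on pos2 is a linear scan within the chr1 block, which is also why 2D queries are cheapest when the first-coordinate range is narrow.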
- qc3C - Hi-C quality assessment method based on k-mers containing ligation artifacts, which do not occur naturally in the genome (the proportion of "signal"). Describes the various read-pair types and their valid and invalid configurations. Tested on simulated and experimental data. Works on FASTQ or BAM files. Output - the breakdown of valid and invalid pairs (numbers, stacked barplots). Compatible with MultiQC. Conda, Docker, Singularity installations. Scripts for the paper
DeMaere, Matthew Z., and Aaron E. Darling. "Qc3C: Reference-Free Quality Control for Hi-C Sequencing Data" Preprint. Bioinformatics, February 25, 2021.
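The ligation-junction signal behind this kind of QC can be illustrated with a much-simplified sketch: derive the junction that fill-in and blunt-end ligation creates for a restriction site (for example, GATCGATC for ^GATC), then count the fraction of reads containing it. This is not qc3C's algorithm, which uses a k-mer database and reports more detailed estimates; the helper names are hypothetical and the formula assumes a palindromic site leaving a 5' overhang.

```python
import re


def ligation_junction(site):
    """Junction created by fill-in and blunt-end ligation of two cut sites.

    site: recognition sequence with '^' at the plus-strand cut, e.g. 'A^AGCTT'.
    Assumes a palindromic site with a 5' overhang, so the minus strand is cut
    at len(motif) - cut.
    """
    cut = site.index("^")
    motif = site.replace("^", "")
    # After fill-in, the left fragment ends with motif[:len - cut] and the
    # right fragment starts with motif[cut:].
    return motif[: len(motif) - cut] + motif[cut:]


def junction_fraction(reads, site):
    """Fraction of reads that contain the ligation junction ('N' matches any base)."""
    if not reads:
        return 0.0
    pattern = re.compile(ligation_junction(site).replace("N", "[ACGT]"))
    return sum(1 for read in reads if pattern.search(read)) / len(reads)


print(ligation_junction("^GATC"), ligation_junction("A^AGCTT"))  # GATCGATC AAGCTAGCTT
```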
HiCNoiseMeasurer - a Python script to measure noise in .hic files using the auto-correlation function.
HiCSampler - a Python script for subsetting .hic files.
- Capture-C (low input, NuTi Capture-C, Tiled-C, Tri-C) wet and dry lab protocols. Introduction, comparison with other technologies, 4-base cutter enzymes. General purpose tools: Peaky, peakC; Capture-C compatible: HiC-Pro, capC-MAP, CCseqBasicS; Tri-C compatible: TriC; Tiled-C compatible: TiledC. Recommended analysis tool for NG-Capture-C, Tri-C and Tiled-C data: CapCruncher. Required inputs: a BED file with viewpoint coordinates, coordinates of enriched regions (for NuTi Capture-C and Tri-C), and a configuration file specifying the genome, mapping parameters, experimental method, and output directories. Documentation.
Downes, Damien J. “Capture-C: A Modular and Flexible Approach for High-Resolution Chromosome Conformation Capture.” Nature Protocols 17 (2022).
- Bacon - tool for benchmarking computational pipelines for targeted chromatin conformation capture technologies (e.g., HiChIP, ChIA-PET). Metrics: overlap with ChIP-seq data, Uniquely Valid Rate, Peak Co-occupancy, Accuracy. HiChIP technology outperforms ChIA-PET. ChIAPoP/MAPS pipelines perform better for ChIA-PET/HiChIP technologies, respectively. Figure 3 - practical guidelines.
Tang, Li, Matthew C. Hill, Patrick T. Ellinor, and Min Li. “Bacon: A Comprehensive Computational Benchmarking Framework for Evaluating Targeted Chromatin Conformation Capture-Specific Methodologies.” Genome Biology 23, no. 1 (December 2022): 30.
- CHiCANE - an R-based data processing and interaction calling toolkit for the analysis and interpretation of Capture Hi-C data (Arima, Dovetail). Data preprocessing with HiCUP (recommended), but BAMs from other pipelines are supported. Flexible regression modeling of the number of reads linking bait and target fragments, including distance. Supports (truncated) negative binomial and Poisson distributions, optional inclusion of zeros, and other covariates. Functionality to assess model fit. Similar tools - GOTHiC, CHiCAGO, ChiCMaxima. Protocol explaining each step.
Holgersen, Erle M. "Identifying High-Confidence Capture Hi-C Interactions Using CHiCANE" Nature Protocols (April 2021)
- CaptureCompendium - all-in-one toolkit for the design, analysis and presentation of 3C experiments, combining oligonucleotide design (Capsequm2), sequence mapping and extraction (CCseqBasic), statistical data presentation and distribution (CaptureCompare with Peaky integration), and CaptureSee. Allows for multi-way interactions (Tri-C). Overview of earlier tools that each cover part of this workflow.
Telenius, Jelena M., Damien J. Downes, Martin Sergeant, A. Marieke Oudelaar, Simon McGowan, Jon Kerry, Lars L.P. Hanssen, et al. "CaptureCompendium: A Comprehensive Toolkit for 3C Analysis" Preprint. Bioinformatics, February 18, 2020.
- GOPHER - Java app for probe design for Capture Hi-C, targeting all or selected promoters, or regions around GWAS hits. Documentation.
Hansen, Peter, Salaheddine Ali, Hannah Blau, Daniel Danis, Jochen Hecht, Uwe Kornak, Darío G. Lupiáñez, Stefan Mundlos, Robin Steinhaus, and Peter N. Robinson. "GOPHER: Generator Of Probes for Capture Hi-C Experiments at High Resolution" BMC Genomics 20, no. 1 (December 2019).
- capC-MAP - Capture-C analysis pipeline. Python and C++, run through a configuration file. Outputs bedGraph. Compared with HiC-Pro, better detects PCR duplicates, identifies more interactions. Normalization tuned for Capture-C data. Documentation
Buckle, Adam, Nick Gilbert, Davide Marenduzzo, and Chris A Brackley. "capC-MAP: software for analysis of Capture-C data" Bioinformatics, 15 November 2019
- Benchmarking of Capture Hi-C analysis pipelines. HiCUP and mHiC preprocessing; the multimapping read rescue in mHiC doesn't improve data quality. Comparison of GOTHiC, CHiCAGO, CHiCANE, and CHiCMaxima: reproducibility, the proportion of bait-bait interactions, overlap with open chromatin and histone marks. GOTHiC may be too permissive, CHiCANE is the strictest, CHiCAGO and CHiCMaxima overall provide good quality results.
Aljogol, Dina, I. Richard Thompson, Cameron S. Osborne, and Borbala Mifsud. “Comparison of Capture Hi-C Analytical Pipelines.” Frontiers in Genetics 13 (January 28, 2022): 786501.
- Peaky - Bayesian sparse variable selection approach. The model proposes that for any given bait, the expected CHi-C signal at each prey fragment is expressed as a sum of contributions from a set of fragments directly contacting that bait. Documentation.
Eijsbouts, Christiaan Q, Oliver S Burren, Paul J Newcombe, and Chris Wallace. "Fine Mapping Chromatin Contacts in Capture Hi-C Data" BMC Genomics 20, no. 1 (December 2019).
- ChiCMaxima - a pipeline for detection and visualization of chromatin loops in Capture Hi-C data. Loess smoothing combined with a background model to detect significant interactions. Compared with GOTHiC and CHiCAGO.
Ben Zouari, Yousra, Anne M Molitor, Natalia Sikorska, Vera Pancaldi, and Tom Sexton. "ChiCMaxima: A Robust and Simple Pipeline for Detection and Visualization of Chromatin Looping in Capture Hi-C" Genome Biology, 22 May 2019
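The smoothing-plus-local-maxima idea can be sketched on a one-dimensional bait profile (contact counts per fragment around a bait). This sketch substitutes a centered moving average for ChiCMaxima's loess smoothing and uses a fixed threshold instead of its background model, so it only illustrates the general approach; the function names, window sizes, and threshold are assumptions.

```python
def smooth(values, window=2):
    """Centered moving average over +/- window positions (truncated at edges)."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_maxima(values, window=2, min_value=0.0):
    """Indices that are the maximum within +/- window and exceed min_value."""
    peaks = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        if v > min_value and v == max(values[lo:hi]):
            peaks.append(i)
    return peaks


# Toy bait profile with two interaction "bumps" around fragments 4 and 11.
profile = [1, 1, 2, 8, 9, 8, 2, 1, 1, 1, 5, 6, 5, 1, 1]
print(local_maxima(smooth(profile, window=1), window=2, min_value=3))  # [4, 11]
```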
- HiCapTools - A command-line software package that can design sequence capture probes for targeted chromosome capture applications and analyze sequencing output to detect proximities involving targeted fragments. Two probes are designed for each feature while avoiding repeat elements and non-unique regions. The data analysis suite processes alignment files to report genomic proximities for each feature at restriction fragment level and is isoform-aware for gene features. Statistical significance of contact frequencies is evaluated using an empirically derived background distribution.
Anil, Anandashankar, Rapolas Spalinskas, Örjan Åkerborg, and Pelin Sahlén. "HiCapTools: A Software Suite for Probe Design and Proximity Detection for Targeted Chromosome Conformation Capture Applications" Bioinformatics 34, no. 4 (February 15, 2018)
- CHiCAGO (Capture Hi-C Analysis of Genomic Organisation) - R package for Capture Hi-C data processing. Two-component background model (Delaporte distribution) - Brownian motion (Neg. Binom.) and technical noise (Poisson), accounts for distance. Compared with a null model from all possible fragment pairs. Considers asymmetry of bait-fragment interactions, considering bait-bait and bait-other end. The majority of true interactions are on the same chromosome. Weighted (distance-dependent) multiple testing correction. Tested on GM12878 and mESC capture Hi-C data. Enrichment of highly interacting regions in regulatory features and SNP sets (permutations, GoShifter). Autoimmune SNPs are enriched in "other end" fragments.
Cairns, Jonathan, Paula Freire-Pritchett, Steven W. Wingett, Csilla Várnai, Andrew Dimond, Vincent Plagnol, Daniel Zerbino, et al. "CHiCAGO: Robust Detection of DNA Looping Interactions in Capture Hi-C Data." Genome Biology 17, no. 1 (2016): 127.
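The Delaporte background described above is the distribution of a sum of a negative binomial count (the "Brownian" component) and an independent Poisson count (technical noise), so its probabilities can be computed by convolution. The sketch below does that with fixed, hand-picked parameters and derives an upper-tail p-value; CHiCAGO instead estimates these parameters from the data (distance dependence, bait and other-end factors), so treat the numbers and function names as illustrative assumptions.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial P(X = k) parameterized by mean and size (dispersion r)."""
    p = size / (size + mean)
    return math.exp(math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                    + size * math.log(p) + k * math.log(1 - p))


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def delaporte_pmf(k, nb_mean, nb_size, pois_lambda):
    """P(NB + Poisson = k): sum over every way of splitting k between the two."""
    return sum(nb_pmf(i, nb_mean, nb_size) * poisson_pmf(k - i, pois_lambda)
               for i in range(k + 1))


def delaporte_pvalue(observed, nb_mean, nb_size, pois_lambda):
    """Upper-tail P(X >= observed) under the background model."""
    below = sum(delaporte_pmf(k, nb_mean, nb_size, pois_lambda) for k in range(observed))
    return max(0.0, 1.0 - below)


# Hypothetical background: NB mean 2, size 1.5, plus Poisson noise with mean 0.5.
print(delaporte_pvalue(5, 2.0, 1.5, 0.5), delaporte_pvalue(20, 2.0, 1.5, 0.5))
```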
- CHiCAGO protocol for Capture Hi-C analysis. Introduction to 3C-based technologies compared with Hi-C, statistical model for background noise estimation, normalization, weighted p-value correction. Comparison with other tools (HiCapTools, CHiCMaxima, CHiCANE), downstream analysis with Peaky, Chicdiff. Preprocessing with HiCUP, input files (Table 1), how to create auxiliary files and set parameters for different restriction enzymes (R, Python scripts), QC, visualization. CHiCAGO R package, chicagoTools, PCHiCdata R package.
Freire-Pritchett, Paula, Helen Ray-Jones, Monica Della Rosa, Chris Q. Eijsbouts, William R. Orchard, Steven W. Wingett, Chris Wallace, Jonathan Cairns, Mikhail Spivakov, and Valeriya Malysheva. "Detecting Chromosomal Interactions in Capture Hi-C Data with CHiCAGO and Companion Tools" Nature Protocols, August 9, 2021.
- Improved HiChIP protocol for cohesin. Two cross-linking agents (formaldehyde and EGS). nf-hichip pipeline combining ChIP-specific and HiChIP-specific steps (MAPS pipeline), processing multiple datasets, Nextflow, Docker. Mapping, coverage track, peak calling. Outperforms Juicer and ChIA-PIPE, detecting more loops. gStripe - graph-based stripe calling from HiChIP data, operating on sets of loops from other tools; outperforms Stripenn. Python implementation.
Jodkowska, Karolina, Zofia Parteka-Tojek, Abhishek Agarwal, Michał Denkiewicz, Sevastianos Korsak, Mateusz Chiliński, Krzysztof Banecki, and Dariusz Plewczynski. “Improved Cohesin HiChIP Protocol and Bioinformatic Analysis for Robust Detection of Chromatin Loops and Stripes,” n.d.
- HiChIP-Peaks - command-line HiChIP peak caller focusing on peaks at re-ligation sites. Peak filtering, then a negative binomial model. Differential peak analysis similar to DiffBind.
Shi, Chenfu, Magnus Rattray, and Gisela Orozco. "HiChIP-Peaks: A HiChIP Peak Calling Algorithm" Bioinformatics, Volume 36, Issue 12, 15 June 2020
- MAPS - Model-based Analysis of PLAC-seq and HiChIP data. A zero-truncated Poisson regression framework to explicitly remove systematic biases, normalize, call long-distance interactions. Compared with hichipper, identifies more long-range, biologically relevant interactions. Works with Arima's HiChIP data
Juric, Ivan, Miao Yu, Armen Abnousi, Ramya Raviram, Rongxin Fang, Yuan Zhao, Yanxiao Zhang, et al. "MAPS: Model-Based Analysis of Long-Range Chromatin Interactions from PLAC-Seq and HiChIP Experiments" April 15, 2019, 24.
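A zero-truncated Poisson describes counts conditioned on being at least 1. The sketch below gives its probability mass function and fits the rate λ by maximum likelihood, which for this distribution means solving λ / (1 - e^(-λ)) = sample mean (here by bisection). MAPS regresses counts on bias covariates rather than fitting a single rate, so this intercept-only version is a simplification, and the function names are hypothetical.

```python
import math


def ztp_pmf(k, lam):
    """Zero-truncated Poisson: P(X = k | X > 0), for k >= 1."""
    if k < 1:
        return 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) / -math.expm1(-lam)


def ztp_mean(lam):
    return lam / -math.expm1(-lam)  # lam / (1 - exp(-lam)), numerically stable


def fit_lambda(counts, lo=1e-9, hi=1e3, iters=100):
    """MLE of lambda: solve ztp_mean(lambda) == mean(counts) by bisection.

    Requires mean(counts) > 1 (ztp_mean is increasing and tends to 1 as lambda -> 0).
    """
    target = sum(counts) / len(counts)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ztp_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lam = fit_lambda([1, 2, 3, 2])  # non-zero read counts of hypothetical bin pairs
print(round(lam, 4), round(ztp_mean(lam), 6))
```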
- CID - Chromatin Interaction Discovery, calls chromatin interactions from ChIA-PET. Outperforms the ChIA-PET2 and MANGO pipelines and calls more peaks than HiCCUPS and hichipper. Java implementation.
Guo, Yuchun, Konstantin Krismer, Michael Closser, Hynek Wichterle, and David K Gifford. "High Resolution Discovery of Chromatin Interactions" Nucleic Acids Research, February 14, 2019.
- HiChIP pipeline - by Dovetail Genomics, technology and analysis steps.
- pipe4C - 4C-seq processing pipeline, R implementation.
Krijger, Peter H.L., Geert Geeven, Valerio Bianchi, Catharina R.E. Hilvering, and Wouter de Laat. "4C-Seq from Beginning to End: A Detailed Protocol for Sample Preparation and Data Analysis" Methods 170 (January 2020)
- peakC - an R package for non-parametric peak calling in 4C/Capture-C/PCHiC data.
Geeven, Geert, Hans Teunissen, Wouter de Laat, and Elzo de Wit. "PeakC: A Flexible, Non-Parametric Peak Calling Package for 4C and Capture-C Data" Nucleic Acids Research 46, no. 15 (September 6, 2018)
- 4Cseqpipe processing pipeline and a genome-wide 4C primer database.
Werken, Harmen J. G. van de, Gilad Landan, Sjoerd J. B. Holwerda, Michael Hoichman, Petra Klous, Ran Chachik, Erik Splinter, et al. "Robust 4C-Seq Data Analysis to Screen for Regulatory DNA Interactions" Nature Methods 9, no. 10 (October 2012) - 4C technology paper. Two different 4bp cutters to increase resolution. Investigation of beta-globin locus, interchromosomal interactions.
- VEHiCLE - a variational autoencoder (feature extraction, dimensionality reduction) and Generative Adversarial Network (maps low-dimensional vectors to Hi-C maps) for Hi-C resolution enhancement. Uses a combination of four loss functions: adversarial loss, variational loss, mean square error, and insulation score loss (interesting!). Intro into VAEs, GANs, loss functions. Uses GM12878, IMR90, K562, HMEC data. Compared using five metrics (similarity, reproducibility) against HiCPlus, DeepHiC, HiCSR, outperforms all. Improves TAD identification, 3D structure modeling. Python implementation.
Highsmith, Max, and Jianlin Cheng. "VEHiCLE: A Variationally Encoded Hi-C Loss Enhancement Algorithm for Improving and Generating Hi-C Data" Scientific Reports 11, no. 1 (December 2021)
- HiCRes - resolution estimation based on the linear dependence between the 20th percentile of coverage and the window size used to assess coverage. Includes preseq for estimating and predicting library complexity, and bowtie2 and HiCUP for estimating Hi-C-specific QC metrics. Relatively insensitive to the choice of enzyme. Implemented as Docker/Singularity images. Requires significant computational resources, e.g., 5 hours on a 40-CPU cluster.
Marchal, Claire, Nivedita Singh, Ximena Corso-Díaz, and Anand Swaroop. "HiCRes: A Computational Method to Estimate and Predict the Resolution of HiC Libraries" Preprint. Bioinformatics, September 22, 2020
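The coverage criterion behind "20th percentile of coverage" matches the map-resolution definition of Rao et al. (2014): the smallest bin size at which at least 80% of bins have at least 1,000 contacts (equivalently, the 20th percentile of bin coverage is at least 1,000). The brute-force scan below applies that criterion to candidate bin sizes for one chromosome; HiCRes instead models and extrapolates the coverage-vs-window relationship, so this is only an illustration of the criterion, with hypothetical function and parameter names.

```python
def map_resolution(contact_positions, chrom_length, bin_sizes,
                   min_contacts=1000, fraction=0.8):
    """Smallest bin size at which `fraction` of bins have >= min_contacts contacts.

    contact_positions: 0-based positions (bp) of every contact end on one chromosome.
    Returns None if no candidate bin size satisfies the criterion.
    """
    for binsize in sorted(bin_sizes):
        n_bins = -(-chrom_length // binsize)  # ceiling division
        coverage = [0] * n_bins
        for pos in contact_positions:
            coverage[pos // binsize] += 1
        covered = sum(1 for c in coverage if c >= min_contacts)
        if covered >= fraction * n_bins:
            return binsize
    return None


# Toy chromosome of 10 kb with exactly 2 contact ends at every base pair.
positions = list(range(10_000)) * 2
print(map_resolution(positions, 10_000, [100, 500, 1_000]))  # 500
```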
- HiCSR - enhancement of Hi-C contact maps using a Generative Adversarial Network trained to optimize a custom loss function (weighted adversarial loss, pixel-wise L1 loss, and a feature reconstruction loss). An increase in resolution refers to recovering additional Hi-C contacts, "saturating" downsampled and noisy Hi-C matrices, not increasing the number of pixels. Representation learning with an autoencoder with several convolutional layers and skip connections, then using it for the generator, which creates new matrices that a discriminator classifies as real or fake. Compared with HiCPlus, HiCNN, hicGAN, DeepHiC; better reproducibility on four metrics. Python3 PyTorch implementation.
Dimmick, Michael C., Leo J. Lee, and Brendan J. Frey. "HiCSR: A Hi-C Super-Resolution Framework for Producing Highly Realistic Contact Maps" Preprint. Genomics, February 25, 2020.
- DeepHiC - a web-based generative adversarial network (GAN) for enhancing Hi-C data. Does not change the bin size, enhances the content of Hi-C data. Reconstructs the content from ~1% of the original data. Outperforms Boost-HiC, HiCPlus, HiCNN. Documentation.
Hong, Hao, Shuai Jiang, Hao Li, Cheng Quan, Chenghui Zhao, Ruijiang Li, Wanying Li, et al. "DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution" PLOS Computational Biology, February 21, 2020
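Enhancement methods like those in this section are typically evaluated by reconstructing a full-depth map from a downsampled one (for example, ~1% of the reads for DeepHiC). A common way to simulate lower sequencing depth at the same bin size is binomial thinning: keep each contact independently with probability `fraction`. The sketch below is an illustrative assumption about how such inputs can be made, not the exact procedure of any tool listed here.

```python
import random


def downsample(matrix, fraction, seed=0):
    """Binomially thin a symmetric integer contact matrix.

    Each contact is kept with probability `fraction`; the bin size is unchanged,
    so the result mimics a lower-depth library at the same resolution.
    """
    rng = random.Random(seed)
    n = len(matrix)
    thinned = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # sample the upper triangle once, then mirror it
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < fraction)
            thinned[i][j] = thinned[j][i] = kept
    return thinned


print(downsample([[100, 50], [50, 200]], 0.1))
```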
- HIFI - Command-line tool for Hi-C Interaction Frequency Inference for restriction fragment-resolution analysis of Hi-C data. Sparsity is resolved by using dependencies between neighboring restriction fragments, with Markov Random Fields performing the best. Better resolves TADs and sub-TADs, significant interactions. CTCF, RAD21, SMC3, ZNF143 are enriched around TAD boundaries. Matrices normalized for fragment-specific biases.
Cameron, Christopher JF, Josée Dostie, and Mathieu Blanchette. "Estimating DNA-DNA Interaction Frequency from Hi-C Data at Restriction-Fragment Resolution" Genome Biology, 14 January 2020
- hicGAN - improving resolution (saturation) of Hi-C data using Generative Adversarial Networks. Generator - five inner residual blocks to fight vanishing gradient (each block has two convolutional layers and batch normalization) and an outer skip connection. The discriminator has three convolutional blocks. Evaluation metrics: MSE, signal-to-noise ratio, structure similarity index, chromatin loop score. Compared against HiCPlus. Python, Tensorflow implementation.
Liu, Qiao, Hairong Lv, and Rui Jiang. "HicGAN Infers Super Resolution Hi-C Data with Generative Adversarial Networks" Bioinformatics 35, no. 14 (July 15, 2019)
- HiCNN - a computational method for resolution enhancement. A modification of the HiCPlus approach, using a very deep (54 layers, five types of layers) convolutional neural network. A regular-resolution Hi-C matrix is transformed into a high-resolution but very sparse matrix, and HiCNN predicts the missing values. Evaluated with Pearson correlation, MSE, and overlap of Fit-Hi-C-detected significant interactions, performing similarly to or slightly better than HiCPlus. PyTorch implementation.
Liu, Tong, and Zheng Wang. "HiCNN: A Very Deep Convolutional Neural Network to Better Enhance the Resolution of Hi-C Data" Bioinformatics, April 9, 2019
- Boost-HiC - infer fine-resolution contact frequencies in Hi-C data, performs well even on 0.1% of the raw data. TAD boundaries remain. Better than HiCPlus. It can be used for differential analysis (comparison) of two Hi-C maps.
Carron, Leopold, Jean-baptiste Morlot, Vincent Matthys, Annick Lesne, and Julien Mozziconacci. "Boost-HiC: Computational Enhancement of Long-Range Contacts in Chromosomal Contact Maps" November 18, 2018.
- mHi-C - recovers alignments of multi-mapped reads in Hi-C data. A generative model estimates, for each multi-mapping read, the probability that it originates from each candidate bin pair. Improves reproducibility of contact matrices (stratum-adjusted correlation), as well as reproducibility and number of significant interactions. Detects novel interactions. TAD boundaries are enriched in LINE and SINE repetitive elements. Multi-mapping is not sensitive to trimming. Read filtering strategy (Figure 1; supplementary figures are very visual).
Zheng, Ye, Ferhat Ay, and Sunduz Keles. "Generative Modeling of Multi-Mapping Reads with MHi-C Advances Analysis of High Throughput Genome-Wide Conformation Capture Studies" October 3, 2018.
- HiCPlus - increasing resolution of Hi-C data using a convolutional neural network with mean squared error as the loss function. In essence, smooths patches of the Hi-C image, then bins them into smaller parts. Performs better than bilinear/bicubic interpolation.
Zhang, Yan, Lin An, Ming Hu, Jijun Tang, and Feng Yue. "HiCPlus: Resolution Enhancement of Hi-C Interaction Heatmap" March 1, 2017.
- FreeHi-C v.2.0 - simulation of realistic Hi-C matrices with user- or data-driven spike-ins. Spike-ins are introduced on read-level and converted to interaction frequency level. Benchmark of HiCcompare, multiHiCcompare, diffHiC, and Selfish. Assessment of FDR, power, significance order, PRC and AUROC, genomic properties. GM12878 and A549 replicates of experimental Hi-C data. Three simulation settings with varying background distribution of interaction frequencies, spike-in proportions, sequencing depth. Figure 5 - summary of performances for all methods and comparison types. Subjective top performers: multiHiCcompare, HiCcompare, diffHiC, Selfish.
Zheng, Ye, Peigen Zhou, and Sündüz Keleş. "FreeHi-C Spike-in Simulations for Benchmarking Differential Chromatin Interaction Detection" Methods, July 2020
- FreeHi-C - Hi-C data simulation based on properties of experimental Hi-C data. Preserves A/B compartments, TADs, the correlation between replicates (HiCRep), and significant interactions; improves power to detect differential interactions. Robust to sequencing depth changes. Tested on replicates of GM12878, A549 human cancer cells, and the malaria parasite P. falciparum. Outperforms Sim3C. Simulated data. Python3 implementation.
Zheng, Ye, and Sündüz Keleş. "FreeHi-C Simulates High-Fidelity Hi-C Data for Benchmarking and Data Augmentation" Nature Methods 17, no. 1 (January 2020)
- VSS-Hi-C - Variance-stabilized signals for chromatin contacts, normalizes individual matrices. Improves subcompartment detection.
Kenari, Neda Shokraneh, Faezeh Bayat, and Maxwell Libbrecht. “VSS-Hi-C: Variance-Stabilized Signals for Chromatin Contacts,” n.d.
- HiConfidence - Python tool for eliminating biases from Hi-C data by downweighting chromatin contacts from low-quality (low-coverage) Hi-C replicates. For each replicate, calculates a differential matrix and then the confidence for each pixel as the inverse of the pixel-wise difference divided by the pixels' mean, raised to the power of a tunable parameter k (Figure 2A). Replicate confidence is taken into account when calculating TAD boundaries and intra-TAD densities; improves replicate reproducibility (stratum-adjusted correlation coefficient). Compared with multiHiCcompare; aids in differential analysis, compartment and TAD detection. Applied to D. melanogaster S2 cells, GSE200078, processed with distiller, pairtools, cooltools.
Kobets, Victoria A, Sergey V Ulianov, Aleksandra A Galitsyna, Semen A Doronin, Elena A Mikhaleva, Mikhail S Gelfand, Yuri Y Shevelyov, Sergey V Razin, and Ekaterina E Khrameeva. “HiConfidence: A Novel Approach Uncovering the Biological Signal in Hi-C Data Affected by Technical Biases.” Briefings in Bioinformatics, February 9, 2023, bbad044.
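A minimal sketch of the per-pixel confidence described above, assuming one reading of the formula: confidence = (|r1 - r2| / mean(r1, r2))^(-k). The function name `pixel_confidence` and the small `eps` guard against division by zero are our own illustrative choices, not HiConfidence's API.

```python
import numpy as np


def pixel_confidence(rep1, rep2, k=1.0, eps=1e-9):
    """Hypothetical sketch: confidence of each pixel given two replicates.

    Computes (|rep1 - rep2| / mean(rep1, rep2)) ** (-k); `eps` is an
    assumed guard against division by zero, not part of the published method.
    """
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    mean = (rep1 + rep2) / 2.0
    # Relative pixel-wise difference between the replicates
    rel_diff = np.abs(rep1 - rep2) / (mean + eps)
    # Inverse relative difference raised to k: disagreement -> low confidence
    return (1.0 / (rel_diff + eps)) ** k


rep1 = np.array([[10.0, 10.0], [10.0, 10.0]])
rep2 = np.array([[12.0, 20.0], [10.0, 30.0]])
conf = pixel_confidence(rep1, rep2, k=2.0)
```

Larger values of k penalize discordant pixels more sharply; the resulting matrix can then serve as weights in downstream calculations.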
- HiCorr - a method for correcting known (mappability, GC content) and unknown (visibility) biases in Hi-C maps (multiplicative effects, Methods). Easy Hi-C protocol allowing for low-input (~100K cells) Hi-C (in vivo HindIII digestion, in situ proximity ligation, DpnII digestion after lysis and reverse crosslink, Methods). HiCorr outputs ratio matrices representing enrichment of Hi-C signal, hence loops can be easily extracted. Recovers 65% of HICCUPS loops and detects additional ones. Chromatin loops are better marks of cell identity than compartments and outperform eQTLs in defining neurological GWAS target genes. Human iPSCs, neural progenitors (NPCs), neurons, fetal cerebellum, adult temporal cortex, data from other studies.
Lu, Leina, Xiaoxiao Liu, Wei-Kai Huang, Paola Giusti-Rodríguez, Jian Cui, Shanshan Zhang, Wanying Xu, et al. "Robust Hi-C Maps of Enhancer-Promoter Interactions Reveal the Function of Non-Coding Genome in Neural Development and Diseases" Molecular Cell, June 2020
- normGAM - an R package to normalize Genome Architecture Mapping (GAM) data. Identifies a new type of systematic bias, the fragment length bias. normGAM eliminates the fragment length bias resulting from random slicing, as well as biases related to window detection frequency, mappability, and GC content. Five normalization methods, including the newly designed KR2 that handles negative values (the others are normalized linkage disequilibrium (NLD, the original GAM normalization algorithm), VC, SCN, and ICE). KR2 normalization produces better correlation with Hi-C data; all normalization methods improve concordance with FISH-detected distances.
Liu, Tong, and Zheng Wang. "NormGAM: An R Package to Remove Systematic Biases in Genome Architecture Mapping Data" BMC Genomics, (December 2019)
- multiHiCcompare - R/Bioconductor package for joint normalization of multiple Hi-C datasets using cyclic loess regression through pairs of MD plots (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance edgeR-based testing of significant interactions.
Stansfield, John C, Kellen G Cresswell, and Mikhail G Dozmorov. "MultiHiCcompare: Joint Normalization and Comparative Analysis of Complex Hi-C Experiments" Bioinformatics, January 22, 2019.
- Binless - a resolution-agnostic normalization method that adapts to the quality and quantity of available data, to detect significant interactions and differences. Negative binomial count regression framework, adapted for ICE normalization. Fused lasso to smooth neighboring signals. TADbit for data processing, details of read filtering.
- Spill, Yannick G., David Castillo, Enrique Vidal, and Marc A. Marti-Renom. "Binless Normalization of Hi-C Data Provides Significant Interaction and Difference Detection Independent of Resolution" Nature Communications 10, no. 1 (26 2019)
- HiCcompare - R/Bioconductor package for joint normalization of two Hi-C datasets using loess regression through an MD plot (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance permutation testing of significant interactions.
Stansfield, John C., Kellen G. Cresswell, Vladimir I. Vladimirov, and Mikhail G. Dozmorov. "HiCcompare: An R-Package for Joint Normalization and Comparison of HI-C Datasets" BMC Bioinformatics 19, no. 1 (December 2018).
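A minimal sketch of the MD-plot idea, assuming two raw contact matrices on the same bins. M is log2(IF2/IF1) and D is the distance in bins; a simple local-linear loess (tricube weights, no robustness iterations, which is our simplification) fits M as a function of D, and half of the fitted bias is removed from each matrix. Function names are hypothetical, and HiCcompare's permutation testing is not shown.

```python
import numpy as np


def loess_1d(x, y, span=0.5):
    """Local linear regression with tricube weights, fitted at each unique x.

    Simplified loess: no robustness iterations (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(int(np.ceil(span * len(x))), 2)  # number of neighbors in the window
    fit_at = {}
    for x0 in np.unique(x):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)  # window half-width
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares on [1, x - x0]; the intercept is the fit at x0
        design = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fit_at[x0] = coef[0]
    return np.array([fit_at[v] for v in x])


def md_normalize(if1, if2, span=0.5):
    """Jointly normalize two matrices by removing the loess trend of M versus D."""
    i, j = np.triu_indices(if1.shape[0], k=1)
    a = if1[i, j].astype(float)
    b = if2[i, j].astype(float)
    keep = (a > 0) & (b > 0)  # log2 ratio needs positive counts in both maps
    i, j, a, b = i[keep], j[keep], a[keep], b[keep]
    d = (j - i).astype(float)   # D: genomic distance in bins
    m = np.log2(b / a)          # M: log2 difference between the maps
    f = loess_1d(d, m, span)
    # Split the fitted bias evenly so that the adjusted M becomes m - f
    a_adj = a * 2.0 ** (f / 2.0)
    b_adj = b / 2.0 ** (f / 2.0)
    return (i, j), a_adj, b_adj, m - f
```

After normalization the adjusted M values scatter around zero at every distance, so remaining deviations reflect differences between the maps rather than a distance-dependent technical trend.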
- HiFive - handling and normalization of pre-aligned Hi-C and 5C data. Three normalization approaches: "Binning" - model-based, Yaffe & Tanay's method; "Express" - matrix-balancing approach; "Probability" - multiplicative probability model. Normalization quality is judged by the correlation between matrices.
Sauria, Michael EG, Jennifer E. Phillips-Cremins, Victor G. Corces, and James Taylor. "HiFive: A Tool Suite for Easy and Efficient HiC and 5C Data Analysis" Genome Biology 16, no. 1 (December 2015).
- HiCNorm - removing known biases in Hi-C data (GC content, mappability, fragment length) via Poisson regression; negative binomial regression was also tested.
Hu, Ming, Ke Deng, Siddarth Selvaraj, Zhaohui Qin, Bing Ren, and Jun S. Liu. "HiCNorm: Removing Biases in Hi-C Data via Poisson Regression" Bioinformatics (Oxford, England) 28, no. 23 (December 1, 2012)
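A minimal sketch of Poisson-regression normalization in the spirit of HiCNorm, assuming log-products of fragment length and GC content as covariates and the log-product of mappability as an offset (our reading of the model). The regression is fitted with a hand-written IRLS loop; all function names are hypothetical, and the normalized value is taken as observed/expected.

```python
import numpy as np


def poisson_irls(X, y, offset, n_iter=100, tol=1e-10):
    """Fit log E[y] = X @ beta + offset by iteratively reweighted least squares."""
    mu = y + 0.5                       # standard GLM starting values
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta - offset + (y - mu) / mu   # working response (log link)
        XtW = X.T * mu                     # X^T W with W = diag(mu)
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        eta = X @ beta + offset
        mu = np.exp(eta)
        if converged:
            break
    return beta


def hicnorm_like(counts, length, gc, mappability):
    """Return fitted coefficients and a symmetric observed/expected matrix."""
    i, j = np.triu_indices(len(length), k=1)
    y = counts[i, j].astype(float)
    X = np.column_stack([
        np.ones_like(y),
        np.log(length[i] * length[j]),   # fragment-length covariate
        np.log(gc[i] * gc[j]),           # GC-content covariate
    ])
    offset = np.log(mappability[i] * mappability[j])  # mappability as offset
    beta = poisson_irls(X, y, offset)
    expected = np.exp(X @ beta + offset)
    normalized = np.zeros(counts.shape, dtype=float)
    normalized[i, j] = y / expected
    normalized[j, i] = normalized[i, j]
    return beta, normalized
```

Modeling the biases multiplicatively on the log scale means the fitted expected count can be divided out directly, leaving a ratio near 1 for bin pairs with no excess contact.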
- Cancer-hic-norm - Hi-C data normalization considering CNVs. Extension of the matrix-balancing algorithm to either retain the copy-number variation effect (LOIC) or remove it (CAIC). ICE itself can misrepresent contact probabilities between CNV regions. CNVs are estimated directly from Hi-C data, correcting for GC content, mappability, and fragment length using Poisson regression. LOIC - the sum of contacts for a given genomic bin is proportional to its copy number. CAIC - raw interaction counts are the product of a CNV bias matrix and the expected contact counts at a given genomic distance. Data. LOIC and CAIC methods are implemented in the iced Python package.
Servant, Nicolas, Nelle Varoquaux, Edith Heard, Emmanuel Barillot, and Jean-Philippe Vert. "Effective Normalization for Copy Number Variation in Hi-C Data" BMC Bioinformatics 19, no. 1 (September 6, 2018)
- OneD - CNV bias-correction method, addresses the problem of partial aneuploidy. Bin-centric counts are modeled using the negative binomial distribution, and its parameters are estimated using splines. A hidden Markov model is fit to infer the copy number for each bin. Each Hi-C matrix entry is corrected by dividing its value by the square root of the product of CNVs for the corresponding bins. Reproducibility score (eigenvector decomposition and comparison) to measure improvement in the similarity between replicated Hi-C data.
Vidal, Enrique, François leDily, Javier Quilez, Ralph Stadhouders, Yasmina Cuartero, Thomas Graf, Marc A Marti-Renom, Miguel Beato, and Guillaume J Filion. "OneD: Increasing Reproducibility of Hi-C Samples with Abnormal Karyotypes" Nucleic Acids Research, January 31, 2018.
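The final correction step of OneD, as described above, reduces to one line once per-bin copy numbers are known. A minimal sketch, assuming copy numbers are given relative to the baseline ploidy (so a normal bin has value 1); the HMM that infers them is not shown, and the function name is hypothetical.

```python
import numpy as np


def oned_correct(matrix, copy_number):
    """Divide each contact (i, j) by sqrt(cn_i * cn_j)."""
    cn = np.asarray(copy_number, dtype=float)
    return np.asarray(matrix, dtype=float) / np.sqrt(np.outer(cn, cn))


corrected = oned_correct(np.ones((3, 3)), [1.0, 1.0, 2.0])
```

A bin pair where both bins are duplicated (relative copy number 2) is thus scaled down by a factor of 2, while a pair with one duplicated bin is scaled down by sqrt(2).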
- HiCapp - Iterative correction-based caICB method to adjust for copy number variants in Hi-C data. Loess-like idea: the problem of removing biases across chromosomes is converted into minimizing the differences between the count-distance curves of different chromosomes. Assumes that, in the absence of bias, genomic locus pairs with similar genomic distances located on different chromosomes are equally represented.
Wu, Hua-Jun, and Franziska Michor. "A Computational Strategy to Adjust for Copy Number in Tumor Hi-C Data" Bioinformatics (Oxford, England) 32, no. 24 (December 15, 2016)
- ENT3C - Similarity of Hi-C and Micro-C matrices, computed by comparing the complexity of patterns in smaller n x n submatrices shifted along their diagonals. Based on the von Neumann entropy of Pearson correlation matrices (log-transformed submatrices converted to correlation matrices). Input: .cool files (instructions for processing from .bam files using pairtools and cooler). MATLAB and Julia scripts driven by a configuration file. Output: chromosome-specific similarity measures and per-window von Neumann entropy. Can be used to compare regions transitioning between high- and low-entropy states. Performance similar to HiC-Spector, QuASAR-Rep, GenomeDISCO, HiCRep (3DChromatin_ReplicateQC wrapper). Similarity is chromosome-dependent.
Lainscsek, Xenia, and Leila Taher. “ENT3C: An Entropy-Based Similarity Measure for Hi-C and Micro-C Derived Contact Matrices.” Preprint. Bioinformatics, February 1, 2024.
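A minimal sketch of the entropy signal described above: each n x n window on the diagonal is log-transformed, converted to a Pearson correlation matrix P, scaled to unit trace (rho = P / n), and scored by S = -sum(lambda * ln lambda) over the eigenvalues of rho. Window size, step, and the optional pseudocount are illustrative assumptions; ENT3C's final similarity score between two entropy signals is not reproduced.

```python
import numpy as np


def von_neumann_entropy(submatrix, pseudocount=0.0):
    """Von Neumann entropy of the Pearson correlation matrix of log counts."""
    logm = np.log(np.asarray(submatrix, dtype=float) + pseudocount)
    P = np.corrcoef(logm)            # correlations between rows
    rho = P / P.shape[0]             # unit-trace "density matrix"
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]           # 0 * ln(0) := 0; drops round-off noise
    return float(-np.sum(lam * np.log(lam)))


def entropy_signal(matrix, n=8, step=2, pseudocount=0.0):
    """Entropy of n x n windows shifted by `step` bins along the diagonal."""
    m = np.asarray(matrix, dtype=float)
    starts = range(0, m.shape[0] - n + 1, step)
    return np.array([
        von_neumann_entropy(m[s:s + n, s:s + n], pseudocount) for s in starts
    ])
```

Entropy ranges from 0 (all rows perfectly correlated, a single pattern) up to ln(n) (uncorrelated rows); a positive pseudocount is needed if windows contain zero counts.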
- HPRep - reproducibility measure for HiChIP and PLAC-seq data. HiCRep-inspired. Reorganizes data into anchor-centric interaction bins, normalizes (fragment length, GC content, mappability, log2 obs/exp), smooths, and stratifies by distance (concatenating bins with the same distance from anchors). Considers "AND" (both bins are anchors), "XOR" (one anchor bin), and "NOT" (no anchor bins, ignored) bin pairs. Similarity metric - weighted Pearson correlation (pairs of columns) stratified by distance. Compared with HiCRep, HiC-Spector, and naive Pearson on mouse H3K4me3 PLAC-seq data (brain and mESCs), human H3K27ac HiChIP data from GM12878 and K562, and human H3K4me3 PLAC-seq brain data. HPRep shows higher similarity for replicates and more differentiation between cell lines, and is robust to downsampling. Nearly the same results can be achieved by analysing one chromosome (for speed).
Rosen, Jonathan D, Yuchen Yang, Armen Abnousi, Jiawen Chen, Michael Song, Yin Shen, Ming Hu, and Yun Li. "HPRep: Quantifying Reproducibility in HiChIP and PLAC-Seq Datasets" Current issues in molecular biology, 17 September 2021
- HiCRep.py - a fast Python implementation of the stratum-adjusted correlation coefficient (SCC) metric for measuring similarity between Hi-C datasets (HiCRep method, originally in R). SCC values can be used for MDS. Evaluated on 90 datasets from 4D Nucleome. More than 20 times faster on a single CPU, with the same results as the R implementation.
Lin, Dejun, Justin Sanders, and William Stafford Noble. "HiCRep.Py: Fast Comparison of Hi-C Contact Matrices in Python" Bioinformatics, February 12, 2021
- IDR2D - Irreproducible Discovery Rate that identifies replicable interactions in ChIP-PET, HiChIP, and Hi-C data. Includes the original 1D IDR version. Resolves multiple pairwise interactions.
Krismer, Konstantin, Yuchun Guo, and David K Gifford. "IDR2D Identifies Reproducible Genomic Interactions" Nucleic Acids Research, 06 April 2020
- 3DChromatin_ReplicateQC - Comparison of four Hi-C reproducibility assessment tools: HiCRep, GenomeDISCO, HiC-Spector, QuASAR-Rep. Tested the effects of noise, sparsity, and resolution. Spearman correlation doesn't work well. All tools performed similarly, with performance degrading as expected. QuASAR has a QC tool measuring the level of noise.
Yardimci, Galip, Hakan Ozadam, Michael E.G. Sauria, Oana Ursu, Koon-Kiu Yan, Tao Yang, Abhijit Chakraborty, et al. "Measuring the Reproducibility and Quality of Hi-C Data" Genome Biology, March 19, 2019
- localtadsim - Analysis of TAD similarity using a variation of information (VI) metric as a local distance measure. 23 human Hi-C datasets, processed with HiC-Pro into 100kb matrices, Armatus to call TADs. Defines structurally similar and variable regions. Comparison with previous studies of genomic similarity. Cancer-normal comparison - regions containing pan-cancer genes are structurally conserved in normal-normal pairs, but not in cancer-cancer pairs.
- Sauerwald, Natalie, and Carl Kingsford. "Quantifying the Similarity of Topological Domains across Normal and Cancer Human Cell Types" Bioinformatics (Oxford, England) 34, no. 13 (July 1, 2018)
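The variation of information between two TAD partitions can be sketched as below, assuming each partition is given as one domain label per bin; VI = H(X) + H(Y) - 2I(X;Y), computed here as 2H(X,Y) - H(X) - H(Y). localtadsim applies such a measure locally and adds statistical testing, neither of which is shown; the helper names are hypothetical.

```python
import numpy as np


def labels_from_boundaries(boundaries, n_bins):
    """Domain label per bin; `boundaries` are bins where a new TAD starts."""
    labels = np.zeros(n_bins, dtype=int)
    for b in boundaries:
        labels[b:] += 1
    return labels


def _entropy(counts):
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def variation_of_information(labels_a, labels_b):
    """VI = 2 H(A, B) - H(A) - H(B), in nats."""
    _, ca = np.unique(labels_a, return_counts=True)
    _, cb = np.unique(labels_b, return_counts=True)
    pairs = np.stack([labels_a, labels_b], axis=1)
    _, cab = np.unique(pairs, axis=0, return_counts=True)
    return 2 * _entropy(cab) - _entropy(ca) - _entropy(cb)
```

A local version restricts both label arrays to the same window, e.g. `variation_of_information(a[100:200], b[100:200])`; identical partitions give 0, and the value grows as the domain structures diverge.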
- QuASAR - Hi-C quality and reproducibility measure using spatial consistency between local and regional signals. Finds the maximum useful resolution by comparing quality and replicate scores of replicates. Part of the HiFive pipeline.
Sauria, Michael EG, and James Taylor. "QuASAR: Quality Assessment of Spatial Arrangement Reproducibility in Hi-C Data" BioRxiv, November 14, 2017.
- HiCRep - Similarity assessment using the generalized Cochran-Mantel-Haenszel statistic M2. Plain Spearman/Pearson correlation doesn't work. Two-step procedure: smooth the matrix, then compute the CMH-based statistic. Basically, split the data into distance strata, compute Pearson correlation on each stratum, and summarize them as a weighted average (the stratum-adjusted correlation coefficient). Simple and well-thought-out stats. Methods: Hi-C datasets with replicates, including 11 ENCODE datasets. R package and Python implementation.
Yang, Tao, Feipeng Zhang, Galip Gurkan Yardimci, Ross C Hardison, William Stafford Noble, Feng Yue, and Qunhua Li. "HiCRep: Assessing the Reproducibility of Hi-C Data Using a Stratum-Adjusted Correlation Coefficient" Genome Research, August 30, 2017
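A minimal sketch of the stratum-adjusted correlation coefficient, assuming two square contact matrices on the same bins: both are smoothed with a 2D mean filter of half-width h, each diagonal k (distance stratum) contributes a Pearson correlation rho_k, and strata are averaged with weights N_k * sqrt(var(rank(X_k)/N_k) * var(rank(Y_k)/N_k)), which is our reading of HiCRep's variance-stabilized weights. Ordinal ranks (ties broken by position), skipping the main diagonal, and dropping pixels that are zero in both maps are simplifications of this sketch.

```python
import numpy as np


def mean_filter(m, h):
    """2D mean filter with half-width h; border pixels average available neighbors."""
    m = np.asarray(m, dtype=float)
    if h == 0:
        return m
    out = np.empty_like(m)
    rows, cols = m.shape
    for r in range(rows):
        for c in range(cols):
            out[r, c] = m[max(r - h, 0):r + h + 1, max(c - h, 0):c + h + 1].mean()
    return out


def scc(m1, m2, max_dist, h=1):
    """Stratum-adjusted correlation coefficient over diagonals 1..max_dist."""
    a, b = mean_filter(m1, h), mean_filter(m2, h)
    num = den = 0.0
    for k in range(1, max_dist + 1):
        x, y = np.diagonal(a, k), np.diagonal(b, k)
        keep = (x > 0) | (y > 0)         # ignore pixels empty in both maps
        x, y = x[keep], y[keep]
        n = len(x)
        if n < 3 or x.std() == 0 or y.std() == 0:
            continue                     # correlation undefined for this stratum
        rho = np.corrcoef(x, y)[0, 1]
        rx = np.argsort(np.argsort(x)) / n   # ordinal ranks scaled by N_k
        ry = np.argsort(np.argsort(y)) / n
        w = n * np.sqrt(rx.var() * ry.var())
        num += w * rho
        den += w
    return num / den
```

Stratifying by distance prevents the strong distance decay shared by any two Hi-C maps from inflating the correlation, which is why plain Pearson or Spearman correlation is misleading.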
- HiC-Spector - reproducibility metric to quantify the similarity between contact maps using spectral decomposition. Decomposes the Laplacian matrices of both maps and sums the Euclidean distances between their corresponding eigenvectors.
Yan, Koon-Kiu, Galip Gürkan Yardimci, Chengfei Yan, William S. Noble, and Mark Gerstein. "HiC-Spector: A Matrix Library for Spectral and Reproducibility Analysis of Hi-C Contact Maps" Bioinformatics (Oxford, England) 33, no. 14 (July 15, 2017)
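A minimal sketch of the spectral reproducibility score, assuming symmetric, non-negative contact matrices of equal size. Each map is turned into a normalized Laplacian L = I - D^(-1/2) W D^(-1/2); the eigenvectors with the r smallest eigenvalues are compared pairwise (choosing the sign that minimizes the distance), and the summed distance S_d is rescaled as Q = 1 - S_d / (r * sqrt(2)). The rescaling follows our reading of the HiC-Spector paper; the default r and the function names are assumptions.

```python
import numpy as np


def normalized_laplacian(w):
    """L = I - D^(-1/2) W D^(-1/2); isolated (all-zero) bins get zero scaling."""
    w = np.asarray(w, dtype=float)
    deg = w.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]


def spector_score(a, b, r=10):
    """Reproducibility score in [0, 1] from the r leading Laplacian eigenvectors."""
    _, vec_a = np.linalg.eigh(normalized_laplacian(a))  # eigenvalues ascending
    _, vec_b = np.linalg.eigh(normalized_laplacian(b))
    s_d = 0.0
    for i in range(r):
        x, y = vec_a[:, i], vec_b[:, i]
        # Eigenvectors are defined up to sign: use the closer orientation
        s_d += min(np.linalg.norm(x - y), np.linalg.norm(x + y))
    return 1.0 - s_d / (r * np.sqrt(2.0))
```

Because the two sign choices satisfy ||x - y||^2 + ||x + y||^2 = 4 for unit vectors, the smaller distance is at most sqrt(2), which keeps Q between 0 and 1.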
- Pentad - average A/B compartment analysis within a predefined range of genomic distances. A/B compartment areas from the observed-to-expected matrix are rescaled using bilinear interpolation into squares of a predefined size, median-averaged, and plotted. Small (noisy) areas are filtered out. Separate analysis of cis and trans interactions. Python implementation. Input: cooler Hi-C matrix and a compartment signal in bedGraph format.
Magnitov, Mikhail D., Azat K. Garaev, Alexander V. Tyakht, Sergey V. Ulianov, and Sergey V. Razin. “Pentad: A Tool for Distance-Dependent Analysis of Hi-C Interactions within and between Chromatin Compartments.” BMC Bioinformatics 23, no. 1 (December 2022): 116.
- Calder - multi-scale A/B compartment detection. Infers a complete hierarchy of compartment subdomains. Computes whole-chromosome contact similarities, clusters domains using divisive hierarchical clustering, reorders dendrograms, and estimates the likelihood of nested subdomains. Nested domain boundaries are likely associated with loops or TADs. Outperforms adapted k-means, robust to noise. Applied to over 100 cell lines, characterization of domain repositioning.
Liu, Yuanlong, Luca Nanni, Stephanie Sungalee, Marie Zufferey, Daniele Tavernari, Marco Mina, Stefano Ceri, Elisa Oricchio, and Giovanni Ciriello. “Systematic Inference and Comparison of Multi-Scale Chromatin Sub-Compartments Connects Spatial Organization to Cell Phenotypes.” Nature Communications 12, no. 1 (May 10, 2021): 2439.
- POSSUM - A/B compartment detection method for super-resolution Hi-C matrices. PCA of Sparse SUper Massive Matrices: eigenvectors of sparse matrices are calculated using the power method (Figure 1, Methods). New GM12878 data at 500bp resolution (42 billion read pairs, 33 billion contacts). Genes can span compartments, but gene promoters are almost exclusively (95%) located in A compartments. Distinguishes loops formed by extrusion and non-extrusion mechanisms (SIP, HiCCUPS, Fit-Hi-C for detection); high-resolution Hi-C data is important for this. Applied to other datasets and organisms. Part of Juicer's Eigenvector tool; C++ POSSUM code on Jordan Rowley's lab GitHub. Other tools: HiCSampler, HiCNoiseMeasurer. Tweet1 by Doug Phanstiel, Tweet2 by Jordan Rowley.
- Gu, Huiya, Hannah Harris, Moshe Olshansky, Kiana Mohajeri, Yossi Eliaz, Sungjae Kim, Akshay Krishna, et al. "Fine-Mapping of Nuclear Compartments Using Ultra-Deep Hi-C Shows That Active Promoter and Enhancer Elements Localize in the Active A Compartment Even When Adjacent Sequences Do Not" Preprint. Genomics, October 3, 2021.
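The power method mentioned above can be sketched without ever materializing the dense Pearson correlation matrix C: row centering and scaling are applied implicitly, so only products with X and X^T are needed, which is what makes the approach attractive for sparse, very large matrices. This is an illustrative sketch of that idea, not POSSUM's C++ code; the function name, iteration limits, and the requirement that no row be constant are assumptions.

```python
import numpy as np


def leading_pearson_eigvec(X, n_iter=2000, tol=1e-10, seed=0):
    """Leading eigenvector of corrcoef(X) via power iteration on implicit products."""
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    mu = X.mean(axis=1)          # row means
    sd = X.std(axis=1)           # row standard deviations (must be non-zero)

    def corr_matvec(v):
        u = v / sd
        # (X - mu 1^T)^T u = X^T u - 1 * (mu . u)
        t = X.T @ u - mu @ u
        # (X - mu 1^T) t = X t - mu * sum(t)
        s = X @ t - mu * t.sum()
        return s / sd / n_cols

    v = np.random.default_rng(seed).normal(size=X.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = corr_matvec(v)
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            return w
        v = w
    return v
```

Since a correlation matrix is positive semidefinite, power iteration converges to the eigenvector of the largest eigenvalue without sign oscillation; with a sparse X, the two products stay cheap while the dense n x n matrix never exists.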
- Calder - multi-scale compartment and sub-compartment detection, improvement over dichotomous AB compartment detection. Clustering contact similarities (Fisher's z-transformed correlations) into high intra and low inter-region similarities, followed by a divisive hierarchical clustering within each domain. The likelihood of nested sub-domains can be estimated using a mixture log-normal distribution. Detailed methods, complex. Eight subcompartments, 4 within the A and 4 within the B compartment, balanced set, in contrast to SNIPER. Expected associations with active/inactive genomic annotations. Nested compartments may be associated with TADs/loops. Analysis of domain repositioning across 114 cell lines. 40kb resolution. R package, named after Alexander Calder, an American sculptor. Supplementary Data 1 - IDs and links to Hi-C, ChIP-seq, and RNA-seq datasets; Data 2 - hg19 BED files of Complete domain hierarchies inferred by CALDER from 127 Hi-C contact maps; Data 7 - coordinates of Repositioned compartment domains between normal and cancer cell lines derived from breast, prostate, and pancreatic tissue samples.
- Liu, Yuanlong, et al. "Systematic Inference and Comparison of Multi-Scale Chromatin Sub-Compartments Connects Spatial Organization to Cell Phenotypes" Nature Communications, 10 May 2021
- dcHiC - differential A/B compartment analysis of Hi-C data. Uses Multiple Factor Analysis (MFA), an extension of PCA that combines Hi-C maps before performing generalized PCA. Analogous to weighted PCA in which every dataset is normalized for its biases (Methods). A multivariate distance measure estimates the statistical significance of compartment differences. Applied to mouse neuronal differentiation, mouse hematopoietic system, and human cell Hi-C data. Gene enrichment analysis shows biologically relevant signal. Input - sparse matrix, hic, cool files.
Wang, Jeffrey, Abhijit Chakraborty, and Ferhat Ay. "DcHiC: Differential Compartment Analysis of Hi-C Datasets" BioRxiv, January 1, 2021
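A minimal sketch of the MFA weighting referred to above, assuming each sample is represented by a bins x features matrix sharing the same rows (what dcHiC actually feeds into MFA is not reproduced here). Each column-centered table is divided by its first singular value, the tables are concatenated, and a single SVD yields global scores plus per-sample partial scores that can then be compared. Names are hypothetical.

```python
import numpy as np


def mfa(tables, n_components=1):
    """Multiple Factor Analysis: global and per-table (partial) factor scores."""
    weighted = []
    for t in tables:
        t = np.asarray(t, dtype=float)
        t = t - t.mean(axis=0)                         # column-center each table
        s1 = np.linalg.svd(t, compute_uv=False)[0]
        weighted.append(t / s1)                        # first singular value -> 1
    combined = np.hstack(weighted)
    u, s, vt = np.linalg.svd(combined, full_matrices=False)
    loadings = vt[:n_components].T
    global_scores = u[:, :n_components] * s[:n_components]
    partial, start = [], 0
    for w in weighted:
        stop = start + w.shape[1]
        # Partial scores: table-specific projection, scaled by the number of tables
        partial.append(len(tables) * w @ loadings[start:stop])
        start = stop
    return global_scores, partial
```

Scaling every table to a first singular value of 1 keeps any single dataset (for example, a deeper-sequenced sample) from dominating the joint components; the global scores are the average of the partial scores.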
- SNIPER - 3D subcompartment (A1, A2, B1, B2, B3) identification from low-coverage Hi-C datasets. A neural network based on a denoising autoencoder (9 layers) and a multi-layer perceptron. Sigmoidal activation of inputs, ReLU, softmax on outputs. Dropout, binary cross-entropy. exp(-1/C) transformation of Hi-C matrices. Applied to GM12878 and 8 additional cell types to compare subcompartment changes. Compared with Rao2014 annotations, outperforms Gaussian HMM and MEGABASE.
Xiong, Kyle, and Jian Ma. "Revealing Hi-C Subcompartments by Imputing High-Resolution Inter-Chromosomal Chromatin Interactions" Nature Communications, 07 November 2019
- CScoreTool - AB compartment detection, fast and memory-efficient C++ tool, operates on data with low sequencing depth (benchmarked against HOMER). In contrast to PCA, uses a log-likelihood function, MLE for parameter estimation. C-scores can be directly compared.
Zheng, Xiaobin, and Yixian Zheng. “CscoreTool: Fast Hi-C Compartment Analysis at High Resolution.” Edited by John Hancock. Bioinformatics 34, no. 9 (May 1, 2018): 1568–70.
- Eigenvector - Juicer's native tool. The eigenvector can be used to delineate compartments in Hi-C data at coarse resolution; the sign of the eigenvector typically indicates the compartment. The eigenvector is the first principal component of the Pearson correlation matrix (computed from the observed/expected matrix).
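The classic compartment computation behind this eigenvector can be sketched as follows: divide each diagonal by its mean to get an observed/expected (O/E) matrix, take the Pearson correlation matrix of O/E, and keep the eigenvector with the largest eigenvalue. This is a simplified illustration, not Juicer's implementation; orienting the sign (e.g., with gene density or GC content) is left out, and the function name is hypothetical.

```python
import numpy as np


def compartment_eigenvector(contacts):
    """First eigenvector of the Pearson correlation of the O/E matrix."""
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    expected = np.zeros_like(m)
    for k in range(n):
        idx = np.arange(n - k)
        mean_k = m[idx, idx + k].mean()        # distance-decay expectation
        expected[idx, idx + k] = mean_k
        expected[idx + k, idx] = mean_k
    oe = np.divide(m, expected, out=np.zeros_like(m), where=expected > 0)
    corr = np.corrcoef(oe)
    _, vecs = np.linalg.eigh(corr)             # eigenvalues in ascending order
    return vecs[:, -1]
```

Dividing by the per-diagonal mean removes the distance decay, so the correlation matrix captures the plaid A/B pattern and the sign of the leading eigenvector splits bins into the two compartments.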
- Review of chromatin loop calling tools. Intro about loop formation (loop extrusion model), Hi-C data biases (GC content, mappability, fragment length, distance decay). Table 1 - loop calling tools for Hi-C, by year (model, language, application, etc.), Table 2 - loop calling for ChIA-PET, Table 3 - for HiChIP, Table 4 - for capture Hi-C. Table 5 - multiway interactions. Brief description of each method.
Liu, Li, Kaiyuan Han, Huimin Sun, Lu Han, Dong Gao, Qilemuge Xi, Lirong Zhang, and Hao Lin. “A Comprehensive Review of Bioinformatics Tools for Chromatin Loop Calling.” Briefings in Bioinformatics 24, no. 2 (March 19, 2023): bbad072.
- RefHiC - reference Hi-C data-guided TAD/loop detection (annotation). An attention-based deep learning framework that determines which of the reference samples (4D Nucleome) are most relevant, and then makes a prediction based on the combined study sample and attention-weighted reference samples. Two components - a network combining the study sample and the reference panel and predicting loop points or left/right TAD boundary scores based on the local contact submatrix, and a task-specific component selecting one representative TAD/loop boundary (an encoder for dimensionality reduction, an attention module, a task-specific perceptron). Outperforms other tools (Mustache, Chromosight, HiCCUPS, Peakachu, RobusTAD and 13 TAD callers) across different cell types, species, and sequencing depths, validated using ChIA-PET (CTCF, RAD21) and HiChIP (SMC1, H3K27ac) data. Scripts to reproduce the paper. Python.
Zhang, Yanlin, and Mathieu Blanchette. “Reference Panel Guided Topological Structure Annotation of Hi-C Data.” Nature Communications 13, no. 1 (December 2, 2022): 7426.
- LASCA - loop/significant contact caller that fits a Weibull distribution to each diagonal. DBSCAN clusters adjacent significant pixels. Works with Hi-C data from any species; tested on human, C. elegans, S. cerevisiae. Filters based on Aggregate Peak Analysis patterns may be used to refine calls. Compared with HiCCUPS and Mustache, demonstrates good overlap. Also identifies non-CTCF-driven loops. Input - .cool files. Python code.
Luzhin, Artem V., Arkadiy K. Golov, Alexey A. Gavrilov, Artem K. Velichko, Sergey V. Ulianov, Sergey V. Razin, and Omar L. Kantidze. "LASCA: Loop and Significant Contact Annotation Pipeline" Scientific Reports, (December 2021)
- ZipHi-C - a Bayesian framework based on a Hidden Markov Random Field model to detect significant interactions and experimental biases in Hi-C data. Predecessors - HMRFBayesHi-C, FastHiC. Borrows information from neighboring loci. Tested on simulated and experimental data; fewer false positives than FastHiC, Juicer, HiCExplorer. Detailed stats methods.
Osuntoki, Itunu G., Andrew P. Harrison, Hongsheng Dai, Yanchun Bao, and Nicolae Radu Zabet. "ZipHiC: a novel Bayesian framework to identify enriched interactions and experimental biases in Hi-C data" bioRxiv (October 20, 2021).
- cLoops2 - improved pipeline for analysing Hi-TrAC/TrAC data. Peak/loop calling, differentially enriched loops, annotation, resolution estimation, similarity, aggregation/visualization. Improved blockDBSCAN clustering algorithm. Outperforms MACS2, SICER, HOMER, SEACR.
Cao, Yaqiang, Shuai Liu, Gang Ren, Qingsong Tang, and Keji Zhao. "CLoops2: A Full-Stack Comprehensive Analytical Tool for Chromatin Interactions" BioRxiv, 2021.
- NeoLoopFinder - detecting chromatin interactions induced by all kinds of structural variants (SVs). Input - a Hi-C contact matrix and a list of SV breakpoints. Output - genome-wide CNV profile, CNV segments, local assembly around SVs (graph-based algorithm), corrected Hi-C matrix for newly assembled regions and normalized for CNV effect and allelic effect, chromatin loops in rearranged regions (Peakachu), enhancer-hijacking events (needs H3K27ac data). CNVs are detected by HMM-based segmentation module. Includes visualization module. Neo-loop detection in 50 cancer Hi-C datasets from cell lines and patient samples (17 cancer types). Cancer-specific neoloops, associated genes, epigenomic enrichments. Methods - DI + HMM. Video, 20m.
Wang, Xiaotao, Jie Xu, Baozhen Zhang, Ye Hou, Fan Song, Huijue Lyu, and Feng Yue. "Genome-Wide Detection of Enhancer-Hijacking Events from Chromatin Interaction Data in Rearranged Genomes" Nature Methods, (June 2021)
- HiC-ACT - improved chromatin loop detection accounting for spatial dependency (especially at high, 5-10kb resolution). Aggregated Cauchy Test (ACT)-based approach accounting for possible correlations between adjacent locus pairs in high-resolution Hi-C data. Combines a set of p-values into a test statistic T that approximately follows a standard Cauchy distribution under arbitrary dependence structures. Requires a local smoothing bandwidth size. Post-processes results from loop callers that assume independence among loci. Input - bin-pair identifiers and the corresponding p-values. Tested on GM12878 and mESC data. The improvement in power is most pronounced in low-depth (downsampled) data. Fast, implemented in R. Documentation.
Lagler, Taylor M., Armen Abnousi, Ming Hu, Yuchen Yang, and Yun Li. "HiC-ACT: Improved Detection of Chromatin Interactions from Hi-C Data via Aggregated Cauchy Test" The American Journal of Human Genetics, (February 2021)
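The Aggregated Cauchy Test at the core of HiC-ACT can be sketched in a few lines: each p-value p_i (strictly between 0 and 1) is mapped to tan((0.5 - p_i) * pi), the weighted sum T (weights summing to 1) is approximately standard Cauchy under the null even when the tests are correlated, and the combined p-value is 0.5 - arctan(T) / pi. The neighborhood helper below uses equal weights over a square window of half-width h; HiC-ACT's actual weighting and bandwidth choice may differ, so treat both helpers as hypothetical.

```python
import numpy as np


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test."""
    p = np.asarray(pvalues, dtype=float).ravel()
    if weights is None:
        w = np.full(p.size, 1.0 / p.size)
    else:
        w = np.asarray(weights, dtype=float).ravel()
        w = w / w.sum()                          # weights must sum to 1
    t = np.sum(w * np.tan((0.5 - p) * np.pi))    # Cauchy-distributed statistic
    return 0.5 - np.arctan(t) / np.pi


def act_smooth(pmat, h=1):
    """Hypothetical helper: combine each bin pair's p-value with its neighbors."""
    pmat = np.asarray(pmat, dtype=float)
    out = np.empty_like(pmat)
    rows, cols = pmat.shape
    for i in range(rows):
        for j in range(cols):
            block = pmat[max(i - h, 0):i + h + 1, max(j - h, 0):j + h + 1]
            out[i, j] = cauchy_combination(block)
    return out
```

The heavy Cauchy tail lets a few very small p-values in a neighborhood dominate T, so a real loop supported by adjacent pixels keeps a small combined p-value.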
- Chromosight - Python implementation of computer vision-based detection of loops and other patterns (borders, FIREs, hairpins, and centromeres) in Hi-C maps. Takes a single, whole-genome contact map in text-based bedGraph2d or binary cool format and ICE-normalizes it. Sliding window: patterns are detected by Pearson correlation with a template, followed by a series of filters. Output - text-based. Outperforms HiCExplorer, HICCUPS, HOMER, cooltools, in order of decreasing F1. Tested on synthetic Hi-C data mimicking the S. cerevisiae genome, benchmark data at Zenodo.
Matthey-Doret, Cyril, Lyam Baudry, Axel Breuer, Rémi Montagne, Nadège Guiglielmoni, Vittore Scolari, Etienne Jean, et al. "Computer Vision for Pattern Detection in Chromosome Contact Maps" Nature Communications, (December 2020)
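The template-matching step described above can be sketched as a sliding-window Pearson correlation, assuming a dense (already normalized) matrix and a small 2D template; Chromosight's preprocessing, sparse handling, and filtering steps are not shown, and the Gaussian "loop" template and the function name are illustrative assumptions.

```python
import numpy as np


def template_pearson(matrix, template):
    """Pearson correlation of the template with every fully contained window."""
    m = np.asarray(matrix, dtype=float)
    t = np.asarray(template, dtype=float)
    th, tw = t.shape
    tz = (t - t.mean()) / t.std()                 # standardized template
    out = np.full((m.shape[0] - th + 1, m.shape[1] - tw + 1), np.nan)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = m[i:i + th, j:j + tw]
            sd = win.std()
            if sd > 0:                            # flat windows stay NaN
                out[i, j] = np.mean(tz * (win - win.mean()) / sd)
    return out


# Illustrative loop-like template: a 5 x 5 Gaussian spot
yy, xx = np.mgrid[-2:3, -2:3]
template = np.exp(-(xx ** 2 + yy ** 2) / 2.0)
```

Because Pearson correlation ignores offset and scale, the score reflects the shape of the local pattern rather than its absolute intensity; each score indexes the top-left corner of its window.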
- Peakachu - loop prediction from Hi-C data using a random forest on loop-specific pixel intensities within an 11x11 window. ChIA-PET and HiChIP provide positive training examples. H3K27ac HiChIP better predicts short-range interactions, while CTCF ChIA-PET is better for longer interactions. 10kb resolution data, GM12878 and K562 cell lines. Excels for short-range interactions. Detects more loops than Fit-Hi-C and HiCCUPS, with good overlap. FDR, estimated on auxin-treated vs. untreated HCT-116 cells, is about 0.2%. A model trained on Hi-C data performs well on other technologies (Micro-C, DNA SPRITE). Robust to sequencing depth. MCC is used to select the best model. 3-fold cross-validation. Balanced training with the same number of negative examples (with short and long distances between interacting loci). Predicted loops for 56 cell/tissue types.
Salameh, Tarik J., Xiaotao Wang, Fan Song, Bo Zhang, Sage M. Wright, Chachrit Khunsriraksakul, Yijun Ruan, and Feng Yue. "A Supervised Learning Framework for Chromatin Loop Detection in Genome-Wide Contact Maps" Nature Communications, (December 2020)
- FIREcaller - an R package to call frequently interacting regions from Hi-C data, as well as clustered super-FIREs. Normalization using HiCNormCis to regress out systematic biases. Converts normalized cis-interactions into Z-scores, calculates one-sided p-values and classifies bins as FIRE/nonFIRE. Also outputs continuous FIREscore (-ln(p-value)). FIREs are tissue-specific, can distinguish samples. Associated with H3K27ac and H3K4me3 signal.
Crowley, Cheynna, Yuchen Yang, Yunjiang Qiu, Benxia Hu, Jakub Lipi, Hyejung Won, Bing Ren, Ming Hu, and Yun Li. "FIREcaller: Detecting Frequently Interacting Regions from Hi-C Data" October 26, 2020, 11.
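The scoring step described above can be sketched as follows, assuming the input is already a vector of bias-normalized cis-interaction counts for one chromosome (the HiCNormCis step is not shown): counts are converted to Z-scores, upper-tail normal p-values give the FIRE calls, and -ln(p) gives the continuous FIREscore. The threshold and the function name are illustrative.

```python
import math

import numpy as np


def fire_scores(cis_counts, p_threshold=0.05):
    """Z-scores, one-sided p-values, FIREscores (-ln p), and FIRE calls."""
    x = np.asarray(cis_counts, dtype=float)
    z = (x - x.mean()) / x.std()
    # Upper-tail p-value of a standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2
    p = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])
    return z, p, -np.log(p), p < p_threshold
```

In a toy input of nine bins with 10 normalized cis contacts and one bin with 100, only the outlier (Z = 3, p of about 0.00135) is called a FIRE.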
- LOOPbit - loop detection guided by CTCF-CTCF topology classification. Interaction profiles of all CTCF-CTCF pairs are projected using self-organizing feature maps (SOFM, NeuPy), embedded using UMAP, clustered using HDBSCAN (10 clusters with strong-to-weak interaction patterns), and characterized by their epigenetic signatures (15 states aggregated into four). The 10 clusters correspond to an enrichment gradient from active to inactive chromatin states. These SOFM clusters serve as input to a CNN. The trained CNN detects de novo chromatin loops from 9x9 Hi-C submatrices. Reproducibility similar to other loop callers, but epigenomically interpretable. GM12878 data processed with TADbit, MetaWaffle. Input - normalized Hi-C matrix in 3-column format. Python implementation.
Galan, Silvia, François Serra, and Marc A. Marti-Renom. “Identification of Chromatin Loops from Hi-C Interaction Matrices by CTCF-CTCF Topology Classification.” Preprint. Bioinformatics, July 22, 2020.
- SIP - loop caller using image analysis. Regional maxima-based, peaks called in a sliding window. Distance-normalized Hi-C matrices, image adjusted using Gaussian blur, contrast enhancement, White Top-Hat correction, identified peaks then filtered by peak enrichment, empirical FDR, loop decay. Comparison with HiCCUPS and cLoops callers. Robust to noise, sequencing depth, much faster, good agreement, improved detection rate. SIPMeta - average metaplots of loops on bias-corrected images for better representation. Java implementation, works with .hic and .cool files.
Rowley, M. Jordan, Axel Poulet, Michael H. Nichols, Brianna J. Bixler, Adrian L. Sanborn, Elizabeth A. Brouhard, Karen Hermetz, et al. "Analysis of Hi-C Data Using SIP Effectively Identifies Loops in Organisms from C. Elegans to Mammals" Genome Research 30, no. 3 (March 2020)
- HiCExplorer's hicDetectLoops for loop detection - Review and critique of HiCCUPS, HOMER, GOTHiC, cLoops, FastHiC. Models the distance dependence of chromatin interactions with a continuous negative binomial distribution, detects interactions with p-values smaller than a threshold, then filters them. Documentation
Wolff, Joachim, Rolf Backofen, and Björn Grüning. "Loop Detection Using Hi-C Data with HiCExplorer" Preprint. Bioinformatics, March 6, 2020
- Mustache - loop detection from Hi-C and Micro-C maps. Based on scale-space theory: detects blob-shaped objects in a multi-scale representation of contact maps built with Gaussian kernels of increasing scales. Differences of adjacent Gaussians guide the search for local maxima. A series of filtering steps minimizes false positives. P-values of blobs are corrected for multiple testing. Applied to GM12878 and K562 Hi-C data and HFFc6 Micro-C data at 5kb resolution. Compared with HiCCUPS (and, in an added comparison, SIP), detects similar loops as well as more loops flanked by convergent CTCF, RAD21, and SMC3 peaks, confirmed by ChIA-PET and HiChIP data. Python3 tool, Conda/Docker wrapped, handles .hic/.cool files. Tweet.
Ardakany, Abbas Roayaei. "Mustache: Multi-Scale Detection of Chromatin Loops from Hi-C and Micro-C Maps Using Scale-Space Representation" Genome Biology, February 24, 2020
- FitHiC2 - protocol to install/run FitHiC Python3 tool/scripts. Fit of non-increasing cubic splines to distance-interaction frequency decay to identify significant interactions in individual matrices. Accounts for biases derived from KR (ICE, or other) normalization (HiCKRy). Works with fixed-bin- or restriction cut site resolution data. Overview of the FitHiC algorithm, accounting for biases. Flexible input options, from HiC-Pro, Juicer, and other tools, validPairs file format. Post-processing to prioritize highly significant interactions supported by the nearby loci, and filter noisy detections. HTML report, flexible BED-derived output format, conversion to formats for WashU epigenome and UCSC browsers. Installable using conda, pip, GitHub. Comparable methods - HiCCUPS, HOMER, GOTHiC, HiC-DC, a brief description of each. Tested on three datasets. Executable on Code Ocean, Data.
Kaul, Arya, Sourya Bhattacharyya, and Ferhat Ay. "Identifying Statistically Significant Chromatin Contacts from Hi-C Data with FitHiC2" Nature Protocols, January 24, 2020.
- Coolpup.py - Pile-up (aggregation, averaging) analysis of Hi-C data (.cool format) for visualizing and identifying chromatin loops from several sparse datasets, e.g., single-cell. Visualization using a plotting script. Scripts for the paper.
Flyamer, Ilya M., Robert S. Illingworth, and Wendy A. Bickmore. "Coolpup.Py - a Versatile Tool to Perform Pile-up Analysis of Hi-C Data" BioRxiv, January 1, 2019
- cLoops - DBSCAN-based algorithm for the detection of chromatin loops in ChIA-PET, Hi-C, HiChIP, Trac-looping data. Local permutation-based estimation of statistical significance, several tests for enrichment over the background. Outperforms diffHiC, Fit-Hi-C, GOTHiC, HiCCUPS, HOMER.
Cao, Yaqiang, Xingwei Chen, Daosheng Ai, Zhaoxiong Chen, Guoyu Chen, Joseph McDermott, Yi Huang, and Jing-Dong J. Han. "Accurate Loop Calling for 3D Genomic Data with cLoops" November 8, 2018.
- FitHiChIP - significant peak caller in HiChIP and PLAC-seq data. Accounts for assay-specific biases, as well as for the distance effect. 3D differential loops detection. Methods.
Bhattacharyya, Sourya, Vivek Chandra, Pandurangan Vijayanand, and Ferhat Ay. "FitHiChIP: Identification of Significant Chromatin Contacts from HiChIP Data" September 10, 2018.
- HiC-DC - significant interaction detection using the zero-inflated negative binomial model and accounting for biases like GC content, mappability. Compared with Fit-Hi-C, more conservative. Robust to sequencing depth. Detects significant, biologically relevant interactions at all length scales, including sub-TADs. BWA-MEM alignment (Python script), then processing in R.
Carty, Mark, Lee Zamparo, Merve Sahin, Alvaro González, Raphael Pelossof, Olivier Elemento, and Christina S. Leslie. "An Integrated Model for Detecting Significant Chromatin Interactions from High-Resolution Hi-C Data" Nature Communications 8, no. 1 (August 2017)
- GOTHiC - R package for peak calling in individual HiC datasets, while accounting for noise. Bioconductor package
Mifsud, Borbala, Inigo Martincorena, Elodie Darbo, Robert Sugar, Stefan Schoenfelder, Peter Fraser, and Nicholas M. Luscombe. "GOTHiC, a Probabilistic Model to Resolve Complex Biases and to Identify Real Interactions in Hi-C Data" Edited by Mark Isalan. PLOS ONE 12, no. 4 (April 5, 2017) - The GOTHiC (genome organization through HiC) algorithm uses a simple binomial distribution model to simultaneously remove coverage-associated biases in Hi-C data and detect significant interactions by assuming that the global background interaction frequency of two loci depends on their relative coverage. The Benjamini–Hochberg multiple-testing correction is used to control the false discovery rate.
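The coverage-based binomial background described for GOTHiC can be sketched as follows, under the stated assumption that both ends of a read pair are drawn independently from the relative coverage distribution, so an unordered pair of distinct loci i, j has background probability 2·c_i·c_j. This is an illustrative reimplementation, not the Bioconductor package code; `gothic_like_test` and its dict-of-pairs input are hypothetical.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def gothic_like_test(contacts):
    """contacts: {(i, j): read-pair count} with i < j (hypothetical input layout)."""
    coverage = {}
    for (i, j), count in contacts.items():
        coverage[i] = coverage.get(i, 0) + count  # each pair covers both ends
        coverage[j] = coverage.get(j, 0) + count
    total_ends = sum(coverage.values())  # equals 2 * number of read pairs
    n_pairs = total_ends // 2
    relative = {locus: c / total_ends for locus, c in coverage.items()}
    pairs = list(contacts)
    pvalues = []
    for i, j in pairs:
        expected_p = 2 * relative[i] * relative[j]  # unordered pair, independent ends
        pvalues.append(binom_sf(contacts[(i, j)], n_pairs, expected_p))
    qvalues = benjamini_hochberg(pvalues)
    return {pair: (p, q) for pair, p, q in zip(pairs, pvalues, qvalues)}
```

Only pairs present in the input are tested; a pair with zero reads would have p = 1 anyway and only affects the size of the multiple-testing correction.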
- FastHiC - hidden Markov random field (HMRF)-based peak caller, fast and well-performing.
Xu, Zheng, Guosheng Zhang, Cong Wu, Yun Li, and Ming Hu. "FastHiC: A Fast and Accurate Algorithm to Detect Long-Range Chromosomal Interactions from Hi-C Data" Bioinformatics (Oxford, England) 32, no. 17 (01 2016)
- HMRFBayesHiC - a hidden Markov random field-based Bayesian peak caller to identify long-range chromatin interactions from Hi-C data. Borrows information from neighboring loci. Reviews previous peak calling methods, including Fit-Hi-C. Interactions between enhancers and promoters are used as a benchmark.
Xu, Zheng, Guosheng Zhang, Fulai Jin, Mengjie Chen, Terrence S. Furey, Patrick F. Sullivan, Zhaohui Qin, Ming Hu, and Yun Li. "A Hidden Markov Random Field-Based Bayesian Method for the Detection of Long-Range Chromosomal Interactions in Hi-C Data" Bioinformatics (Oxford, England) 32, no. 5 (01 2016)
- Fit-Hi-C - Python tool for detection of significant chromatin interactions.
Ay, Ferhat, Timothy L. Bailey, and William Stafford Noble. "Statistical Confidence Estimation for Hi-C Data Reveals Regulatory Chromatin Contacts" Genome Research 24, no. 6 (June 2014) - Fit-Hi-C method: splines model the distance dependence of mid-range interaction frequencies and their decay with distance. Biases, normalization methods. Two-step spline fit: use all points for the first fit, identify and remove outliers, refit without outliers. Markers of boundaries - insulators, heterochromatin, pluripotency factors. CNVs are enriched at chromatin boundaries. Replication timing data how-to. Validation Hi-C data.
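The two-pass fit can be illustrated with a deliberately simplified stand-in: a straight power-law fit in log-log space replaces Fit-Hi-C's splines, and a fixed fold-over-expected cutoff (a hypothetical `outlier_fold` parameter) replaces its p-value threshold for calling outliers. The sketch only shows why refitting without strong contacts gives a cleaner background estimate.

```python
import math


def fit_power_law(distances, counts):
    """Least-squares fit of log(count) = a + b * log(distance); returns (a, b)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def two_pass_fit(distances, counts, outlier_fold=4.0):
    """Fit, drop contacts far above the first-pass expectation, then refit.

    Returns (first_fit, second_fit, number_of_removed_contacts).
    """
    a1, b1 = fit_power_law(distances, counts)
    kept = [(d, c) for d, c in zip(distances, counts)
            if c <= outlier_fold * math.exp(a1 + b1 * math.log(d))]
    a2, b2 = fit_power_law([d for d, _ in kept], [c for _, c in kept])
    return (a1, b1), (a2, b2), len(counts) - len(kept)
```

With counts following 1000/d plus one strong contact of 1000 at d = 10, the first fit is pulled towards the outlier, the outlier is dropped, and the second fit recovers a slope of exactly -1.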
HiCCUPS - chromatin loop detection, local maxima detection in Hi-C images. GPU and CPU implementations. Described in Section VI.a of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014
HiCPeaks - Python CPU-based implementation for BH-FDR and HICCUPS, two peak calling algorithms for Hi-C data, proposed by Rao et al. 2014. Text-to-cooler Hi-C data converter, two scripts to call peaks, and one for visualization (creation of a .png file). Pypi repo
HOMER - Perl scripts for normalization, visualization, significant interaction detection, motif discovery. Does not correct for bias.
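The local-background idea behind HiCCUPS can be illustrated with a single square "donut" neighborhood. This hypothetical sketch is much simpler than HiCCUPS: it omits the lower-left, horizontal, and vertical neighborhoods, normalization by the expected matrix, and the Poisson-based significance test, and simply reports observed / mean(donut).

```python
def donut_enrichment(matrix, i, j, peak_width=1, window=3):
    """Ratio of pixel (i, j) to the mean of its square donut neighborhood.

    The donut is the (2*window+1)^2 square around (i, j) minus the inner
    (2*peak_width+1)^2 square; pixels falling outside the matrix are skipped.
    """
    donut = []
    for di in range(-window, window + 1):
        for dj in range(-window, window + 1):
            if max(abs(di), abs(dj)) <= peak_width:
                continue  # inner square: the candidate peak and its immediate neighbors
            r, c = i + di, j + dj
            if 0 <= r < len(matrix) and 0 <= c < len(matrix[0]):
                donut.append(matrix[r][c])
    background = sum(donut) / len(donut)
    return matrix[i][j] / background if background > 0 else float("inf")
```

A flat map gives an enrichment of 1; a single bright pixel at the center of a flat map gives its fold over the flat background.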
- Benchmarking of 11 methods for Hi-C map comparisons (Figure 1 outlines comparison goals and method classification). Basic methods (mean square error (MSE), Pearson/Spearman/Stratum-adjusted correlation, Structural similarity index), Map-informed methods (Eigenvector differences, directionality index, insulation differences, contact probability decay, triangle profiles), Feature-informed methods (cooltools TAD calling, HiCCUPS loop calling, overlap of functional genomics data). MSE and Spearman report different similarity properties (MSE is sensitive to intensity and Spearman is not). Directionality index performs best in identifying focal changes, like the loss of loops, while contact decay, eigenvector and insulation differences prioritize global changes. Basic methods are sensitive to technical variations (noise, resolution). Tested on experimental (ESC and HFF cells) and 22,500 in silico generated (unperturbed 1Mb sequences and with random 100bp structural variations, CTCF motif insertions/deletions, maps predicted with Akita) contact maps. The "Guidelines" section, Table 1, Supplementary Table 1 and Supplementary text - method description, strengths, weaknesses, and suggested applications of comparison methods. GitHub - data, Python code for all methods, Jupyter notebooks for all analyses. See also HiC1Dmetrics
Gunsalus, Laura M, Evonne McArthur, Ketrin Gjoni, Shuzhen Kuang, John A Capra, and Katherine S Pollard. “Comparing Chromatin Contact Maps at Scale: Methods and Insights,” April 04, 2023.
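The point that MSE is sensitive to intensity while Spearman correlation is not can be checked on a toy flattened contact map and a copy scaled by 2 (for example, the same map at twice the sequencing depth). A minimal pure-Python sketch; the helper names are illustrative.

```python
import math


def mse(a, b):
    """Mean squared error between two flattened maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ranks(values):
    """1-based ranks with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            out[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return out


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


original = [5.0, 3.0, 8.0, 1.0, 4.0, 2.0]
scaled = [2 * x for x in original]
print(mse(original, scaled), spearman(original, scaled))
```

Doubling every contact leaves the ranks untouched (Spearman stays 1), while the MSE grows with the squared intensities, which is why the two measures report different notions of similarity.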
- DiffHiChIP - differential chromatin interaction detection from HiChIP and similar 3C data. Uses DESeq2 and edgeR. Settings include test selection (GLM, quasi-likelihood F test, likelihood ratio test), complete (A) or a subset (F) of contact map for background estimation. Additional methods include independent hypothesis weighting (IHW), distance stratification for modeling distance decay. Tested on five datasets, used matching Hi-C and other data, aggregate peak analysis. HiChIP contacts have much higher dispersion than RNA-seq and ChIP-seq. IHW correction of p-values captures significant long-distance loops (BH is conservative, distance correction alone performs worse), DESeq2 is affected by the number of replicates, edgeR with GLM performs best.
Bhattacharyya, Sourya, Daniela Salgado Figueroa, Katia Georgopoulos, and Ferhat Ay. "DiffHiChIP: Identifying differential chromatin contacts from HiChIP data." bioRxiv (2025): 2025-01.
- slHiC - a contrastive self-supervised representation learning framework (similar data pairs are close in the latent space) for modeling the multi-level features of chromosome conformation and producing feature embeddings for genomic loci and their interactions to facilitate comparative analyses of Hi-C contact maps. Tested on reproducibility (GenomeDISCO, HiCRep, HiC-Spector), differential loci/interaction detection (multiHiCcompare, Selfish). Simulated data (generateSimulation from FIND), controlled changes (considering distance to true changes), robust to sequencing depth, low fold changes, high resolution, improved biological relevance of the detected changes.
Li, Han, Xuan He, Lawrence Kurowski, Ruotian Zhang, Dan Zhao, and Jianyang Zeng. “Improving Comparative Analyses of Hi-C Data via Contrastive Self-Supervised Learning.” Briefings in Bioinformatics 24, no. 4 (July 20, 2023): bbad193.
- StripeDiff - differential stripe analysis. Intro about TADs/loops, stripes (observed at TAD boundaries, can be explained by loop extrusion), frequently interacting regions (FIREs, observed near the centers of TADs). Existing methods detect stripes in a single sample; fold change is typically used for comparison. Definition of vertical/horizontal domains of a stripe window (by abrupt change around the window, Figure 1). Chi-square test for differences, AUROC for performance evaluation on simulated data (SimuStripe method) and manually annotated experimental data. Outperforms edgeR at various noise levels, stripes of different lengths, window sizes, sequencing depth, normalization strategies. Detects stripes in single samples better than Zebra and StripeNN. Differential stripes are associated with CTCF binding changes, changes in chromatin state, and gene expression, and define cell lineage. R code and data to reproduce the paper, Python and R implementation.
Gupta, Krishan, Guangyu Wang, Shuo Zhang, Xinlei Gao, Rongbin Zheng, Yanchun Zhang, Qingshu Meng, Lili Zhang, Qi Cao, and Kaifu Chen. “StripeDiff: Model-Based Algorithm for Differential Analysis of Chromatin Stripe.” Science Advances 8, no. 49 (December 9, 2022): eabk2246.
- NE-MVNMF - combines network enhancement (network denoising) and multitask non-negative matrix factorization for comparing multiple Hi-C matrices and identifying dynamic (changing) regions. MVNMF - joint decomposition to find a common underlying structure in multiple matrices, clustering regions, comparing cluster assignments, identifying stretches of 5 or more regions switching clusters (dynamic regions). Applied to rat mammary epithelial cells, two time points (within and outside the window of susceptibility (WOS) to breast cancer, week 6 and 12). Arima Genomics, triplicates (6 samples total), merged into two, 10kb, HiC-Pro processing, ICE normalization, differential point interactions as intersection of Selfish and Fit-Hi-C. Integrated with gene expression (RNA-seq) and published breast cancer SNPs. GEO GSE184285.
Baur, Brittany, Da-Inn Lee, Jill Haag, Deborah Chasman, Michael Gould, and Sushmita Roy. “Deciphering the Role of 3D Genome Organization in Breast Cancer Susceptibility.” Frontiers in Genetics 12 (January 11, 2022): 788318.
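The dynamic-region rule (stretches of 5 or more regions switching clusters) can be sketched as a run-length scan over per-region cluster labels. This assumes, hypothetically, that the labels produced by the joint factorization are directly comparable between the two time points; network enhancement and MVNMF themselves are not implemented here.

```python
def dynamic_regions(labels_a, labels_b, min_run=5):
    """Half-open (start, end) index ranges where >= min_run consecutive regions
    have different cluster labels in the two conditions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must cover the same regions")
    runs, start = [], None
    for k, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a != b and start is None:
            start = k  # a run of switched regions begins
        elif a == b and start is not None:
            if k - start >= min_run:
                runs.append((start, k))
            start = None
    if start is not None and len(labels_a) - start >= min_run:
        runs.append((start, len(labels_a)))  # run reaching the last region
    return runs
```

Isolated switches and short runs are ignored, which is the purpose of the minimum run length.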
- HICDCPlus - an R/Bioconductor package for Hi-C/HiChIP interaction calling (directly from raw data, negative binomial regression accounting for genomic distance, GC content, mappability, restriction enzyme-based bin size) and differential analysis (DESeq2). Includes TAD (TopDom) and A/B compartment callers. Input - HiC-Pro or Juicer. Output compatible with visualization in Juicebox and HiCExplorer. Compared with diffHiC, multiHiCcompare, Selfish - performs better. Normalization by ChIP-seq input may not be helpful. BitBucket.
Sahin, Merve, Wilfred Wong, Yingqian Zhan, Kinsey Van Deynze, Richard Koche, and Christina S. Leslie. "HiC-DC+ enables systematic 3D interaction calls and differential analysis for Hi-C and HiChIP" Nature communications, 07 June 2021
- BART3D - transcriptional regulators associated with differential chromatin interactions from Hi-C data. Input: HiC-Pro matrices, .hic Juicer files, .cool files. Output - ranked lists of transcriptional regulators. Distance-based normalization to average of individual matrices. Difference detection by a paired t-test of normalized interactions within 200kb (Figure 1A). Differential interactions are mapped to the union of DNAseI hypersensitive sites, then standard BART algorithm. Python implementation.
Wang, Zhenjia, Yifan Zhang, and Chongzhi Zang. "BART3D: Inferring Transcriptional Regulators Associated with Differential Chromatin InteracTions from Hi-C Data" Bioinformatics, 15 March 2021
- HiC1Dmetrics - Python3, command-line and web-based tool for analyzing various types of 1D metrics for linearizing Hi-C data, identifying interesting 3D features (mostly TAD boundaries), comparing matrices. Review of 1D metrics, such as PC1, directionality index (DI), insulation score (IS), separation score (SS), contrast index (CI), distal-to-local ratio (DLR) and inter-chromosomal fraction of interactions (Figure 1, Methods). Newly developed intra-TAD score (IAS) and inter-TAD score (IES) (detection of stripes, asymmetric 5'/3' stripe TADs, loops, others), adjusted interaction frequency (IF) (detection of hubs, among others), directional relative frequency (DRF) for two-sample comparison, detects directional TADs (5'/3' dTADs) depicting an asymmetric event of inter-TAD interactions. New metrics can be compared between matrices, used for clustering, examples on experimental data. Novel observations, e.g., the upstream and downstream proximal TAD regions tend to exhibit opposite trends in gene expression. Inclusion of IS, PC1, and IF in ChromHMM identifies an additional "active promoters on boundaries" state. Hi-C data preprocessing - Juicer, KR normalization. Documentation.
Wang, Jiankang, and Ryuichiro Nakato. “HiC1Dmetrics: Framework to Extract Various One-Dimensional Features from Chromosome Structure Data.” Briefings in Bioinformatics, December 1, 2021, bbab509.
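Of the 1D metrics listed above, the insulation score is the simplest to sketch: for each bin, average the contacts in a square that links bins upstream and downstream of it, then log2-normalize to the mean over bins. This is a minimal sketch of one variant, written for illustration; the window placement and normalization used by HiC1Dmetrics may differ.

```python
import math


def insulation_score(matrix, window=3):
    """log2(mean of the square linking upstream and downstream bins / mean over bins).

    Bins too close to the matrix edges (or with an all-zero square) get None.
    Low values indicate strong insulation, i.e., candidate TAD boundaries.
    """
    n = len(matrix)
    raw = [None] * n
    for i in range(window, n - window):
        square = [matrix[r][c]
                  for r in range(i - window, i)
                  for c in range(i + 1, i + window + 1)]
        raw[i] = sum(square) / len(square)
    valid = [v for v in raw if v is not None and v > 0]
    mean = sum(valid) / len(valid)
    return [math.log2(v / mean) if v is not None and v > 0 else None for v in raw]
```

On a toy map with two 6-bin TADs, the minimum falls on the bins flanking the boundary between bins 5 and 6.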
- CHESS (Comparison of Hi-C Experiments using Structural Similarity) - comparative analysis of Hi-C matrices and automatic feature extraction (TADs, loops, stripes). Image analysis-based structural similarity index (SSIM, combines brightness, contrast, and structure differences; S = 1 for identical matrices, <1 for differences) to assign a similarity score and an associated p-value (from an empirical distribution of SSIMs, two types of background model, Methods) to pairs of genomic regions. Obs/exp transformation, differential matrix, denoise, smooth, binarize, feature extraction using a closing morphology filter, k-means clustering, classification. Works with low sequencing depth, high noise data. Outperforms diffHiC, HOMER, and ACCOST. Applied to interspecies comparison of syntenic regions (Synteny portal), WT and Zelda-depleted Drosophila, B-cell lymphoma, Capture-C analysis. Input - Juicer, Cooler, or FAN-C format, plus .bedpe for regions to compare (generated with chess pairs). Python, scikit-image module. Documentation.
Galan, Silvia, Nick Machnik, Kai Kruse, Noelia Díaz, Marc A. Marti-Renom, and Juan M. Vaquerizas. "CHESS Enables Quantitative Comparison of Chromatin Contact Data and Automatic Feature Extraction" Nature Genetics, October 19, 2020
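SSIM multiplies a luminance term by a contrast/structure term. A single-window version over two flattened, obs/exp-transformed submatrices can be sketched as below. CHESS relies on scikit-image, whose SSIM works over local sliding windows, so this whole-region version is an illustrative simplification; `k1 = 0.01` and `k2 = 0.03` are the conventional SSIM stabilizing constants, assumed here rather than taken from CHESS.

```python
def ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized, flattened maps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (vx + vy + c2)
    return luminance * contrast_structure
```

Identical regions score 1; rearranging the same values (same brightness and contrast, different structure) lowers the score through the covariance term.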
- DiffGR - differentially interacting genomic regions. Stratum-adjusted correlation coefficient (SCC) (HiCrep-inspired) to measure similarity of local TAD regions. Focus on within-TAD interactions. Simulated data at various levels of sparsity, noise, HiCseg for TAD calling. 2D mean filter for smoothing, KR normalization. Permutation test to estimate the significance of SCC changes. FDR depends on the proportion of altered TADs. R implementation.
Liu, Huiling, and Wenxiu Ma. "DiffGR: Detecting Differentially Interacting Genomic Regions from Hi-C Contact Maps" bioRxiv, August 31, 2020
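A minimal sketch of the stratum-adjusted correlation coefficient (SCC) as defined for HiCRep: a weighted average of per-diagonal Pearson correlations, with weights N_k·r_2k derived from the rank variances of each stratum. The sketch skips the 2D mean-filter smoothing and starts at the first off-diagonal; it is not DiffGR's R code and does not include its permutation test.

```python
import math


def _ranks(values):
    """1-based ranks with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            out[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return out


def _var(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def stratum_adjusted_correlation(m1, m2, max_distance):
    """SCC over diagonals 1..max_distance of two equally sized square matrices."""
    n = len(m1)
    numerator = denominator = 0.0
    for d in range(1, max_distance + 1):
        x = [m1[i][i + d] for i in range(n - d)]
        y = [m2[i][i + d] for i in range(n - d)]
        n_k = len(x)
        if n_k < 2 or _var(x) == 0 or _var(y) == 0:
            continue  # Pearson correlation is undefined for constant strata
        r2k = math.sqrt(_var([r / n_k for r in _ranks(x)]) *
                        _var([r / n_k for r in _ranks(y)]))
        numerator += n_k * r2k * _pearson(x, y)
        denominator += n_k * r2k
    if denominator == 0:
        raise ValueError("no informative strata")
    return numerator / denominator
```

Because each stratum is correlated separately, the strong distance decay shared by both maps does not inflate the similarity score.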
- Serpentine - differential analysis of two Hi-C maps using the 2D serpentine-binning method. A serpentine is a subset of connected pixels defined by thresholds in control and experimental contact maps. Serpentines are then compared using the Mean-Deviation plot. Helps alleviate the effect of sparsity. Uses HiCcompare functionality. Normalization does not help. Python package, currently processes full 1500x1500 matrices. Documentation
Baudry, Lyam, Gaël A Millot, Agnes Thierry, Romain Koszul, and Vittore F Scolari. "Serpentine: A Flexible 2D Binning Method for Differential Hi-C Analysis" Edited by Alfonso Valencia. Bioinformatics 36, no. 12 (June 1, 2020)
- ACCOST (Altered Chromatin COnformation STatistics) - distance-aware differential Hi-C analysis. Extends the statistical model of DESeq by using the size factors to model the genomic distance effect. Use of the MD plot. Compared with diffHiC, FIND, and HiCcompare. Evaluated on human, mouse, Plasmodium Hi-C data.
Cook, Kate B., Borislav H. Hristov, Karine G. Le Roch, Jean Philippe Vert, and William Stafford Noble. "Measuring significant changes in chromatin conformation with ACCOST" Nucleic acids research, (18 March 2020)
- Selfish - comparative analysis of replicate Hi-C experiments via a self-similarity measure - local similarity borrowed from image comparison. Check reproducibility, detect differential interactions. Boolean representation of contact matrices for reproducibility quantification. Deconvoluting local interactions with a Gaussian filter (putting a Gaussian bell around a pixel), then comparing derivatives between contact maps for each radius. Simulated (Zhou method) and real comparison with FIND - better performance, especially on low fold-changes. Stronger enrichment of relevant epigenomic features. Matlab implementation.
Roayaei Ardakany, Abbas, Ferhat Ay, and Stefano Lonardi. "Selfish: Discovery of Differential Chromatin Interactions via a Self-Similarity Measure" Bioinformatics, July 2019
- multiHiCcompare - R/Bioconductor package for joint normalization of multiple Hi-C datasets using cyclic loess regression through pairs of MD plots (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance edgeR-based testing of significant interactions.
Stansfield, John C, Kellen G Cresswell, and Mikhail G Dozmorov. "MultiHiCcompare: Joint Normalization and Comparative Analysis of Complex Hi-C Experiments" Bioinformatics, January 22, 2019
- Chicdiff - differential interaction detection in Capture Hi-C data. Signal normalization based on the CHiCAGO framework, differential testing using DESeq2. Accounts for the distance effect with the Independent Hypothesis Weighting (IHW) method, which learns distance-based p-value weights to maximize the number of rejected null hypotheses.
Cairns, Jonathan, William R. Orchard, Valeriya Malysheva, and Mikhail Spivakov. "Chicdiff: A Computational Pipeline for Detecting Differential Chromosomal Interactions in Capture Hi-C Data" BioRxiv, January 1, 2019
- HiCcompare - R/Bioconductor package for joint normalization of two Hi-C datasets using loess regression through an MD plot (minus-distance). Data-driven normalization accounting for the between-dataset biases. Per-distance permutation testing of significant interactions.
Stansfield, John C., Kellen G. Cresswell, Vladimir I. Vladimirov, and Mikhail G. Dozmorov. "HiCcompare: An R-Package for Joint Normalization and Comparison of HI-C Datasets" BMC Bioinformatics 19, no. 1 (December 2018)
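The MD-plot normalization can be sketched with a per-distance median of M = log2(IF2/IF1) as a simple stand-in for HiCcompare's loess fit, splitting the correction evenly between the two datasets. The function name and the `(i, j, if1, if2)` pixel tuples are hypothetical, and the median is a deliberate simplification of the loess smoother.

```python
import math
import statistics


def md_normalize(pixels):
    """pixels: list of (i, j, if1, if2) with positive interaction frequencies.

    Returns (i, j, if1_norm, if2_norm, M_norm), where M = log2(if2 / if1) is
    centered at 0 within each distance D = |i - j|.
    """
    m_by_distance = {}
    for i, j, a, b in pixels:
        m_by_distance.setdefault(abs(i - j), []).append(math.log2(b / a))
    bias = {d: statistics.median(ms) for d, ms in m_by_distance.items()}
    normalized = []
    for i, j, a, b in pixels:
        f = bias[abs(i - j)]
        a_norm, b_norm = a * 2 ** (f / 2), b / 2 ** (f / 2)  # split the shift evenly
        normalized.append((i, j, a_norm, b_norm, math.log2(b_norm / a_norm)))
    return normalized
```

A dataset that is uniformly 4 times deeper than the other (M = 2 at every distance) ends up with M = 0 everywhere after normalization, so only pixels deviating from their distance-specific trend remain candidates for differential testing.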
- FIND - differential chromatin interaction detection comparing the local spatial dependency between interacting loci. Previous strategies - simple fold-change comparisons, binomial model (HOMER), count-based edgeR. FIND exploits a spatial Poisson process model to detect differential chromatin interactions that show a significant change in their interaction frequency and the interaction frequency of their adjacent bins. "Variogram" concept. For each point, compare densities between conditions using Fisher's test. Explored various multiple testing correction methods, used the r^th ordered p-values (rOP) method. Benchmarking against edgeR in simulated settings - FIND outperforms at shorter distances, edgeR has more false positives at longer distances. Real Hi-C data normalized using KR and MA normalizations. R package.
Djekidel, Mohamed Nadhir, Yang Chen, and Michael Q. Zhang. “FIND: DifFerential Chromatin INteractions Detection Using a Spatial Poisson Process.” Genome Research, February 12, 2018.
- diffloop - Differential analysis of chromatin loops (ChIA-PET). edgeR framework.
Lareau, Caleb A., and Martin J. Aryee. "Diffloop: A Computational Framework for Identifying and Analyzing Differential DNA Loops from Sequencing Data" Bioinformatics (Oxford, England), September 29, 2017.
- AP - aggregation preference - parameter, to quantify TAD heterogeneity. Call significant interactions within a TAD, cluster with DBSCAN, calculate weighted interaction density within each cluster, average. AP measures are reproducible. Comparison of TADs in Gm12878 and IMR90 - stable TADs change their aggregation preference, these changes correlate with LINEs, Lamin B1 signal. Can detect structural changes (block split) in TADs.
Wang, X.-T., Dong, P.-F., Zhang, H.-Y., and Peng, C. (2015). "Structural heterogeneity and functional diversity of topologically associating domains in mammalian genomes." Nucleic Acids Research
- diffHiC - Differential contacts using the full pipeline for Hi-C data. Explanation of the technology, binning. MA normalization, edgeR-based. Comparison with HOMER. Documentation.
Lun, Aaron T. L., and Gordon K. Smyth. "DiffHic: A Bioconductor Package to Detect Differential Genomic Interactions in Hi-C Data" BMC Bioinformatics 16 (2015)
HiCCUPS Diff - differential loop analysis. Input - two .hic files and loop lists; output - lists of differential loops.
Meltron - a statistical framework to detect differences in chromatin contact density at genomic regions of interest
- GILoop - a dual-branch neural network for CTCF-mediated loop calling. Integrates the image view and graph view of KR-normalized Hi-C data. A U-net branch for image processing and a graph convolutional network (GCN) that extracts edge-wise information. Each branch is trained separately, then the branches are fused via a bilinear pooling operation and fine-tuned. The graph representation incorporates DNA k-mer information (320-dimensional feature space). Patch sampling strategy. Detects loops with more stringent CTCF relevance than Peakachu. Highly robust to sequencing depth. A model trained on one cell line can be transferred to another cell line. ChIA-PET for ground truth.
Wang, Fuzhou, Tingxiao Gao, Jiecong Lin, Zetian Zheng, Lei Huang, Muhammad Toseef, Xiangtao Li, and Ka-Chun Wong. “GILoop: Robust Chromatin Loop Calling across Multiple Sequencing Depths on Hi-C Data.” iScience 25, no. 12 (December 2022): 105535.
- TADfit - hierarchical TAD finder using a multivariate linear regression model. Fits the interaction frequencies in Hi-C contact matrices (with or without replicates) defined by TopDom boundaries. Finds partially overlapping TADs. Three-step algorithm: preparing candidate TADs with TopDom, modeling the relationship between IFs and candidate hierarchical TADs, and solving the model with FTRL (Follow-The-Regularized-Leader, an online learning solver with two hyperparameters). Tested on simulated and experimental ICE-normalized data, outperforms (Jaccard, F1) TADtree, 3DNetMod, OnTAD, SpectralTAD, and TADpole. R implementation.
Liu, Erhu, Hongqiang Lyu, Qinke Peng, Yuan Liu, Tian Wang, and Jiuqiang Han. “TADfit Is a Multivariate Linear Regression Model for Profiling Hierarchical Chromatin Domains on Replicate Hi-C Data.” Communications Biology 5, no. 1 (June 20, 2022): 608.
- SuperTLD - hierarchical TAD-like domain (TLD) detection from RNA-DNA associated interactions (RAIs; technologies like iMARGI, GRID-seq, ChAR-seq, RADICL-seq). Adapts SuperTAD by incorporating a Bayesian correction during dynamic programming and iterative inference of the hierarchy. Imputes missing interaction frequencies using a negative binomial model (inspired by SAVER). Applied to four RAI datasets; TLDs are compared with TADs using the Overlapping Ratio (OR, similarity between two hierarchies) and normalized mutual information (NMI, similarity of two disjoint partitions), both metrics from the SuperTAD paper, four types of TAD relationships (matched, merged, split, shifted; adapted from SpectralTAD), Pearson correlation, and distance decay. RAI-constructed similarity matrices and Hi-C maps are highly correlated; TADs and TLDs are moderately similar, with more novel boundaries in TLDs. Common boundaries are associated with cytoplasm and cytosol terms, TLD-specific boundaries with different terms.
Zhang, Yu Wei, Lingxi Chen, and Shuai Cheng Li. "Detecting TAD-like domains from RNA-associated interactions." Nucleic Acids Research (25 May 2022).
- SuperTAD-Fast - hierarchical TAD caller that improves the speed and memory usage of SuperTAD, a dynamic programming approach to find TADs with minimal structural entropy. Uses an approximation algorithm that applies a discretization method to SuperTAD's multinary encoding tree detection algorithm. Three parameters: h, the maximum height of the encoding tree; k, the maximum step number in the discretization model; and w, the size of the search window. Tested on experimental and simulated data, outperforms deDOC, SpectralTAD, GMAP, IC-Finder, CaTCH, 3DNetMod.
Ling, Zhao, Yu Wei Zhang, and Shuai Cheng Li. “SuperTAD-Fast: Accelerating Topologically Associating Domains Detection Through Discretization.” Journal of Computational Biology 31, no. 9 (September 1, 2024): 784–96.
- SuperTAD - hierarchical TAD caller; efficient algorithms for computing the coding tree of a contact map based on structural information theory and dynamic programming, solvable in polynomial time. Compared with seven hierarchical TAD callers (but not SpectralTAD): better epigenomic enrichment, agreement between calls from raw and KR-normalized matrices, better cross-resolution agreement by size and width. Command line, C++ implementation.
Zhang, Yu Wei, Meng Bo Wang, and Shuai Cheng Li. “SuperTAD: Robust Detection of Hierarchical Topologically Associated Domains with Optimized Structural Information.” Genome Biology 22, no. 1 (December 2021): 45.
- GRiNCH - TAD detection algorithm using Graph Regularized Nonnegative matrix factorization. Graph captures distance dependence. Also smooths/imputes Hi-C matrices. Compared with Directionality Score, Insulation Index, rGMAP, Armatus, HiCSeg, TopDom using several metrics - Davies-Bouldin index, Delta Contact Counts, Rand Index, Mutual Information. GRiNCH detects consistent similarly-sized TADs, robust to different resolutions, boundaries show robust enrichment in known markers of TAD boundaries (TFBSs and histone), detects consistent Fit-Hi-C significant interactions (area under precision-recall curve). Applied to mouse neural development and pluripotency reprogramming to confirm known and discover new boundary regulators. Applied to SPRITE and HiChIP data. Visualization tutorial.
Lee, Da-Inn, and Sushmita Roy. "Graph-Regularized Matrix Factorization for Reliable Detection of Topological Units from High-Throughput Chromosome Conformation Capture Datasets" Genome Biology, 25 May 2021
- HiCKey - hierarchical TAD caller, comparison of TADs across samples. A generalized likelihood-ratio (GLR) test for detecting change-points in an interaction matrix that follows a negative binomial distribution (Methods). Bottom-up approach to detect hierarchy. Tested on Forcato simulated data with nested TADs, TPR/FPR/difference/Fowlkes-Mallows index to estimate performance. Applied to seven cell lines. TAD hierarchy is up to four levels. Compared with TADtree, 3D-NetMod, IC-Finder, HiCSeg. Colocalization within a 2-bin distance. Input - normalized, distance effect removed matrix in sparse text format, output - TAD start coordinate, hierarchy level, p-value of the changepoint. C++ implementation. Did not compare with SpectralTAD hierarchical caller.
Xing, Haipeng. "Deciphering Hierarchical Organization of Topologically Associated Domains through Change-Point Testing" BMC Bioinformatics, April 10, 2021
- TADBD - TAD caller using a multi-scale Haar diagonal template (sum of on-diagonal squares minus the sum of off-diagonal squares). Compared with HiCDB, IC-Finder, EAST (also using Haar features), TopDom, HiCseg using simulated (Forcato) and experimental (K562 and IMR90) data. ICE-normalized data. MCC, Jaccard. R package.
Lyu, Hongqiang, Lin Li, Zhifang Wu, Tian Wang, Jiguang Zheng, and Hongda Wang. "TADBD: A Sensitive and Fast Method for Detection of Typologically Associated Domain Boundaries" BioTechniques, April 7, 2020
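A single-scale Haar diagonal template can be sketched as follows: at a candidate boundary k, the two on-diagonal w×w quadrants are added and the off-diagonal quadrant (which appears twice in a symmetric map) is subtracted. Summing over a couple of window sizes stands in for the multi-scale aggregation; the scale set and the plain summation are assumptions, not TADBD's exact scheme.

```python
def haar_diagonal_score(matrix, k, w):
    """On-diagonal w x w quadrants around boundary k minus the off-diagonal ones."""
    up = sum(matrix[r][c] for r in range(k - w, k) for c in range(k - w, k))
    down = sum(matrix[r][c] for r in range(k, k + w) for c in range(k, k + w))
    off = sum(matrix[r][c] for r in range(k - w, k) for c in range(k, k + w))
    return up + down - 2 * off  # off-diagonal quadrant counted twice (symmetric map)


def boundary_profile(matrix, scales=(2, 3)):
    """Score every position that fits the largest window, summed over scales."""
    n = len(matrix)
    w_max = max(scales)
    return {k: sum(haar_diagonal_score(matrix, k, w) for w in scales)
            for k in range(w_max, n - w_max + 1)}
```

Inside a uniform domain the on- and off-diagonal quadrants cancel (score 0), while at a true boundary the off-diagonal quadrant is depleted and the score peaks.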
- BHi-Cect - identification of the full hierarchy of chromosomal interactions (TADs). Spectral clustering starting from the whole chromosome, detecting nested BHi-Cect Partition Trees (BPTs), partitioned in non-contiguous and interwoven enclaves, inspired by fractal globule idea. Variation of information to test the agreement between two clustering results, overlap-based metrics to test correspondence with TADs. Correspondence analysis of enclaves association with TF content. Gene enrichment. Different enclaves show different epigenomic and gene expression signatures, bottom enclaves are most crisply defined. Resolution affects what enclave size can be detected.
Kumar, Vipin, Simon Leclerc, and Yuichi Taniguchi. "BHi-Cect: A Top-down Algorithm for Identifying the Multi-Scale Hierarchical Structure of Chromosomes" Nucleic Acids Research 48, no. 5 (March 18, 2020)
- TADpole - hierarchical TAD boundary caller. Preprocessing by filtering sparse rows, transforming the matrix into its Pearson correlation coefficient matrix, running PCA on it and retaining 200 PCs, transforming into a Euclidean distance matrix, clustering using the Constrained Incremental Sums of Squares clustering (CONISS, `rioja::chclust()` with `method = "coniss"`), estimating significance, Calinski-Harabasz index to estimate the optimal number of clusters (chromatin subdivisions). Benchmarking using Zufferey 2018 datasets, mouse limb bud development with genomic inversions from Kraft 2019. Resolution, normalization, sequencing depth. Metrics: the Overlap Score, the Measure of Concordance, all from Zufferey 2018. Enrichment in epigenomic marks. DiffT metric for differential analysis (on binarized TAD/non-TAD matrices). Compared with 22 TAD callers, including hierarchical (CaTCH, rGMAP, Matryoshka, PSYCHIC). Paper
Soler-Vila, Paula, Pol Cuscó Pons, Irene Farabella, Marco Di Stefano, and Marc A. Marti-Renom. "Hierarchical Chromatin Organization Detected by TADpole" Preprint. Bioinformatics, July 11, 2019.
- HiCDB - TAD boundary detection using local relative insulation (LRI) metric, improved stability, less parameter tuning, cross-resolution, differential boundary detection, lower computations, visualization. Review of previous methods, directionality index, insulation score. Math of LRI. GSEA-like enrichment in genome annotations (CTCF). Differential boundary detection using the intersection of extended boundaries. Compared with Armatus, DomainCaller, HiCseg, IC-Finder, Insulation, TopDom on 40kb datasets. Accurately detects smaller-scale boundaries. Differential TADs are enriched in cell-type-specific genes.
Chen, Fengling, Guipeng Li, Michael Q. Zhang, and Yang Chen. "HiCDB: A Sensitive and Robust Method for Detecting Contact Domain Boundaries" Nucleic Acids Research 46, no. 21 (November 30, 2018)
- OnTAD - hierarchical TAD caller, Optimal Nested TAD caller. Sliding window, adaptive local minimum search algorithm, similar to TopDom. Other hierarchical callers - TADtree, rGMAP, Arrowhead, 3D-Net, IC-Finder. Performance comparison with DomainCaller, rGMAP, Arrowhead, TADtree. Stronger enrichment of CTCF and two cohesin proteins RAD21 and SMC3. C++ implementation. OnTAD for coolers - a Python wrapper to work with
An, Lin, Tao Yang, Jiahao Yang, Johannes Nuebler, Qunhua Li, and Yu Zhang. "Hierarchical Domain Structure Reveals the Divergence of Activity among TADs and Boundaries" July 3, 2018. - Intro about TADs, Dixon's directionality index, Insulation score. Other hierarchical callers - TADtree, rGMAP, Arrowhead, 3D-Net, IC-Finder. Limitations of current callers - ad hoc thresholds, sensitivity to sequencing depth and mapping resolution, long running time and large memory usage, insufficient performance evaluation. Boundaries are asymmetric - some have more contacts with other boundaries, supporting an asymmetric loop extrusion model. Performance comparison with DomainCaller, rGMAP, Arrowhead, TADtree. Stronger enrichment of CTCF and two cohesin proteins RAD21 and SMC3. TAD-adjR^2 metric quantifying the proportion of variance in the contact frequencies explained by TAD boundaries. Reproducibility of TAD boundaries - Jaccard index, tested at different sequencing depths and resolutions. Boundaries of hierarchical TADs are more active - more CTCF, epigenomic features, TFBSs, expressed genes. Super-boundaries - shared by 5 or more TADs, highly active. Rao-Huntley 2014 GM12878 data. Distance correction - subtracting the mean counts at each distance.
- 3D-NetMod - hierarchical, nested, partially overlapping TAD detection using graph theory. Community detection method based on the maximization of network modularity, Louvain-like locally greedy algorithm, repeated several (20) times to avoid local maxima, then building a consensus. Tuning parameters are estimated over a sequence search. Benchmarked against TADtree, directionality index, Arrowhead. ICE-normalized brain data from the Geschwind (human data) and Jiang (mouse data) studies. Computationally intensive. Python implementation.
Norton, Heidi K., Daniel J. Emerson, Harvey Huang, Jesi Kim, Katelyn R. Titus, Shi Gu, Danielle S. Bassett, and Jennifer E. Phillips-Cremins. “Detecting Hierarchical Genome Folding with Network Modularity.” Nature Methods 15, no. 2 (February 2018): 119–22.
- deDoc - TAD detection minimizing structural entropy of the Hi-C graph (structural information theory). Detects optimal resolution (= minimal entropy). Analysis of pooled Hi-C data from 10 single cells. Intro about TADs, a brief description of TAD callers, including hierarchical. Works best on raw, non-normalized data, highly robust to sparsity (0.1% of the original data sufficient). Compared with five TAD callers (Armatus, TADtree, Arrowhead, MrTADFinder, DomainCaller (DI)) and a classical graph modularity detection algorithm. Enrichment in CTCF, housekeeping genes, H3K4me3, H4K20me1, H3K36me3. Other benchmarks - weighted similarity, number, length of TADs. Detects hierarchy over different passes. Java implementation (won't run on Mac).
Li, Angsheng, Xianchen Yin, Bingxiang Xu, Danyang Wang, Jimin Han, Yi Wei, Yun Deng, Ying Xiong, and Zhihua Zhang. “Decoding Topologically Associating Domains with Ultra-Low Resolution Hi-C Data by Graph Structural Entropy.” Nature Communications 9, no. 1 (15 2018): 3265.
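deDoc's objective can be illustrated by scoring a candidate partition with two-dimensional structural entropy. This sketch assumes the standard definition from structural information theory: a degree-based entropy term within each module, plus a term for the edge weight leaving each module. It only scores a given partition, whereas deDoc searches for the minimizing one. Function names are hypothetical.

```python
import numpy as np


def structural_entropy_2d(adj, partition):
    """Two-dimensional structural entropy of a weighted graph for a given partition.
    adj: symmetric non-negative matrix (e.g. raw Hi-C counts, diagonal zeroed).
    partition: list of index collections (e.g. contiguous bins = candidate TADs)."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    vol_g = degree.sum()
    h = 0.0
    for part in partition:
        part = np.asarray(part)
        vol_x = degree[part].sum()
        if vol_x == 0:
            continue
        internal = adj[np.ix_(part, part)].sum()
        cut = vol_x - internal  # edge weight leaving the module
        d = degree[part]
        d = d[d > 0]
        # uncertainty of locating a node inside its module
        h -= np.sum(d / vol_g * np.log2(d / vol_x))
        # uncertainty of entering the module from outside
        h -= cut / vol_g * np.log2(vol_x / vol_g)
    return float(h)
```

On a block-diagonal toy matrix, the partition that follows the blocks gives lower entropy than a shifted partition or no partition at all. This is the property deDoc exploits to pick domains.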
- CaTCH - identification of hierarchical TAD structure. Reciprocal insulation (RI) index. Benchmarked against Dixon's TADs (diTADs). CTCF enrichment as a benchmark, enrichment of TADs in differentially expressed genes.
Zhan, Yinxiu, Luca Mariani, Iros Barozzi, Edda G. Schulz, Nils Blüthgen, Michael Stadler, Guido Tiana, and Luca Giorgetti. "Reciprocal Insulation Analysis of Hi-C Data Shows That TADs Represent a Functionally but Not Structurally Privileged Scale in the Hierarchical Folding of Chromosomes" Genome Research 27, no. 3 (2017)
- HiTAD - hierarchical TAD identification, different resolutions, correlation with chromosomal compartments, replication timing, gene expression. Adaptive directionality index approach. Data sources, methods for comparing TAD boundaries, reproducibility. H3K4me3 enriched and H3K4me1 depleted at boundaries. TAD boundaries (but not sub-TADs) separate replication timing, A/B compartments, gene expression.
Wang, Xiao-Tao, Wang Cui, and Cheng Peng. "HiTAD: Detecting the Structural and Functional Hierarchies of Topologically Associating Domains from Chromatin Interactions" Nucleic Acids Research 45, no. 19 (November 2, 2017)
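HiTAD adapts the directionality index (DI) of Dixon et al. 2012. The sketch below implements the classic DI, where A and B are a bin's upstream and downstream contacts and E = (A + B)/2. Here `window` (in bins) stands in for the original 2Mb span and is a hypothetical parameter. The adaptive part of HiTAD and the HMM step of the original DI domain caller are not shown.

```python
import numpy as np


def directionality_index(mat, window=10):
    """Classic DI: sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E), E = (A + B) / 2."""
    n = mat.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = mat[i, max(0, i - window):i].sum()          # upstream contacts
        b = mat[i, i + 1:min(n, i + window + 1)].sum()  # downstream contacts
        e = (a + b) / 2
        if e == 0 or a == b:
            continue  # no signal or no directional bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di
```

DI turns from negative (upstream-biased) to positive (downstream-biased) as one crosses a domain boundary.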
- ClusterTAD - A clustering method for identifying topologically associated domains (TADs) from Hi-C data.
Oluwadare, Oluwatosin, and Jianlin Cheng. "ClusterTAD: An Unsupervised Machine Learning Approach to Detecting Topologically Associated Domains of Chromosomes from Hi-C Data" BMC Bioinformatics 18, no. 1 (November 14, 2017)
- IC-Finder - Segmentation of Hi-C maps into hierarchical interaction compartments. Code not available.
Noelle Haddad, Cedric Vaillant, Daniel Jost. "IC-Finder: inferring robustly the hierarchical organization of chromatin folding"
- EAST - Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps. Uses Haar-like features (rectangles on images) and a function that quantifies TAD properties: contact frequency is high within TADs and low outside, and boundaries must be strong. Objective - finding a set of contiguous non-overlapping domains maximizing the function. Restricted by the maximum length of TADs. Boundaries are enriched in CTCF, RNA PolII, H3K4me3, H3K27ac.
Abbas Roayaei Ardakany, Stefano Lonardi, and Marc Herbstritt, "Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps" (Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarbruecken, Germany, 2017)
- lavaburst - extension of the Armatus algorithm introduced in Filippova et al., “Identification of Alternative Topological Domains in Chromatin”. Documentation.
Schwarzer, Wibke, Nezar Abdennur, Anton Goloborodko, Aleksandra Pekowska, Geoffrey Fudenberg, Yann Loe-Mie, Nuno A. Fonseca, et al. “Two Independent Modes of Chromatin Organization Revealed by Cohesin Removal.” Nature 551, no. 7678 (02 2017): 51–56.
- Arboretum-Hi-C - a multitask spectral clustering method to identify differences in genomic architecture. Intro about the 3D genome organization, TAD differences, and conservation. Assessment of different clustering approaches using different distance measures, as well as raw contacts. Clustering quality is judged by enrichment in genomic regulatory signals (histone marks, LADs, early vs. late replication timing, TFs like POLII, TAF, TBP, CTCF, P300, CMYC, cohesin components, SINE, LINE, LTR) and by numerical methods (Davies-Bouldin index, silhouette score, others). Although spectral clustering on contact counts performed best, spectral clustering + Spearman correlation was chosen. Comparing cell types identifies biologically relevant differences as quantified by enrichment. Peak counts or average signal within regions were used for enrichment. Data.
Fotuhi Siahpirani, Alireza, Ferhat Ay, and Sushmita Roy. "A Multi-Task Graph-Clustering Approach for Chromosome Conformation Capture Data Sets Identifies Conserved Modules of Chromosomal Interactions" Genome Biology 17, no. 1 (December 2016).
- TADtree - Hierarchical (nested) TAD identification. Two ways of TAD definition: 1D and 2D. Normalization by distance. Enrichment over the background. Documentation
Weinreb, Caleb, and Benjamin J. Raphael. "Identification of Hierarchical Chromatin Domains" Bioinformatics (Oxford, England) 32, no. 11 (June 1, 2016)
- TopDom - An Efficient and Deterministic Method for Identifying Topological Domains in Genomes. The method is based on the general observation that within-TAD interactions are stronger than between-TAD interactions. A binSignal value is computed as the average of nearby contact frequencies; a curve is fitted, and local minima are found and tested for significance. Fast, takes linear time. Detects domains similar to those of HiCseg and Dixon's directionality index. Found expected enrichment in CTCF, histone marks. Housekeeping genes and overall gene density are close to TAD boundaries; differentially expressed genes are not. R package.
Shin, Hanjun, Yi Shi, Chao Dai, Harianto Tjong, Ke Gong, Frank Alber, and Xianghong Jasmine Zhou. "TopDom: An Efficient and Deterministic Method for Identifying Topological Domains in Genomes" Nucleic Acids Research 44, no. 7 (April 20, 2016)
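TopDom's binSignal and local-minimum search can be sketched as follows. This sketch assumes that binSignal at bin i is the mean of the `window` x `window` block of contacts between bins i-window+1..i and i+1..i+window, truncated at chromosome ends. The curve fitting and significance testing steps are omitted, and the names are hypothetical.

```python
import numpy as np


def topdom_bin_signal(mat, window=5):
    """Mean contact frequency between the `window` bins up to and including bin i
    and the `window` bins after it (a diamond just off the diagonal)."""
    n = mat.shape[0]
    signal = np.full(n, np.nan)
    for i in range(n - 1):
        up = slice(max(0, i - window + 1), i + 1)
        down = slice(i + 1, min(n, i + window + 1))
        signal[i] = mat[up, down].mean()
    return signal


def local_minima(signal):
    """Indices of strict local minima (candidate boundaries after bin i)."""
    s = signal
    return [i for i in range(1, len(s) - 1)
            if not np.isnan(s[i - 1:i + 2]).any() and s[i] < s[i - 1] and s[i] < s[i + 1]]
```

For three 6-bin domains, the minima fall on the last bin of the first two domains (indices 5 and 11).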
- TADtool - command-line wrapper for directionality index and insulation score TAD callers.
Kruse, Kai, Clemens B. Hug, Benjamín Hernández-Rodríguez, and Juan M. Vaquerizas. "TADtool: Visual Parameter Identification for TAD-Calling Algorithms" Bioinformatics (Oxford, England) 32, no. 20 (15 2016)
- Armatus - TAD detection at different resolutions. Dynamic programming method.
Filippova, Darya, Rob Patro, Geet Duggal, and Carl Kingsford. "Identification of Alternative Topological Domains in Chromatin" Algorithms for Molecular Biology 9, no. 1 (2014)
- HiCseg - TAD detection by maximization of a likelihood-based block-wise segmentation model. 2D segmentation rephrased as 1D segmentation - not contours, but borders. Statistical framework, solved with dynamic programming. Dixon data as gold standard. Hausdorff distance to compare segmentation quality. Parameters (from TopDom paper): nb_change_max = 500, distrib = 'G' and model = 'Dplus'.
Lévy-Leduc, Celine, M. Delattre, T. Mary-Huard, and S. Robin. "Two-Dimensional Segmentation for Analyzing Hi-C Data" Bioinformatics (Oxford, England) 30, no. 17 (September 1, 2014)
- domaincaller - A Python implementation of the original DI domain caller.
- Arrowhead - contact domain (TAD) detection using Arrowhead transformation. Described in Section IV.a of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014
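Below is a minimal sketch of the Arrowhead transformation. It assumes the form usually quoted from Rao, Huntley et al. 2014: A[i, i+d] = (M*[i, i-d] - M*[i, i+d]) / (M*[i, i-d] + M*[i, i+d]), where M* is the observed/expected matrix. The corner-score and thresholding steps that turn the transformed matrix into domain calls are not shown. The toy input is used directly as M*.

```python
import numpy as np


def arrowhead_transform(mstar):
    """A[i, i+d] = (M*[i, i-d] - M*[i, i+d]) / (M*[i, i-d] + M*[i, i+d]).
    Entries where i-d or i+d falls outside the matrix are left as NaN."""
    n = mstar.shape[0]
    a = np.full((n, n), np.nan)
    for i in range(n):
        for d in range(1, min(i, n - 1 - i) + 1):
            up, down = mstar[i, i - d], mstar[i, i + d]
            if up + down > 0:
                a[i, i + d] = (up - down) / (up + down)
    return a
```

Inside a domain, the two mirrored contacts are similar and A is near 0. At the start of a domain, the upstream partner lies outside it, so A becomes strongly negative. This produces the arrowhead shape.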
- Benchmarking of 13 hierarchical TAD callers: normalization, resolution, sequencing depth, epigenomic features, tool usability (Rao 2014 data GSE63525, 5kb-100kb resolution). Grouped into five strategies (Table 1): linear score (Arrowhead, Armatus, CaTCH, HiTAD, matryoshka, OnTAD, Multi-CD), clustering (IC-Finder, TADpole, BHi-Cect, SpectralTAD), network features (HBM, spectral, 3DNetMod, GRiNCH), structural entropy (deDoc, SuperTAD), statistical model (TADtree, GMAP, PSYCHIC, HiKey). Similarity among TADs by average linkage, grouped into four clusters. Generally, TADs are smaller than 2Mb. Resolution affects TAD callers; SpectralTAD is the most reproducible. Introduction about TADs subdivided into sub-TADs, enrichment of TAD boundaries in CTCF, cohesin, active epigenomic marks of strong enhancers, mediators, transcription factors, TSSs, TTSs. Evidence that TAD hierarchy is different from stacking Hi-C heatmaps of heterogeneous cells. An "air conditioner" model: highly interacting TAD boundaries affect gene expression, analogous to how the concentration of air conditioners affects temperature. Differential interactions between GM12878 and K562, chr7, 86-88Mb region, DMTF1, ADAM22, other genes. Supplementary figures and tables. Supplementary Data 2 - data references.
Xu, J., Xu, X., Huang, D. et al. A comprehensive benchmarking with interpretation and operational guidance for the hierarchy of topologically associating domains. Nat Commun 15, 4376 (2024).
- RefHiC - attention-based neural network to predict loops and TADs from local contact submatrices. Uses a reference panel of Hi-C datasets (30 for human, 20 for mouse), provided with the tool. Input - a Hi-C contact map, 5kb resolution. Trained on GM12878 data, using ChIA-PET and HiChIP on CTCF, RAD21, SMC1, H3K27ac. The TAD model is trained on left and right TAD boundaries from RobusTAD (Python implementation) and predicts left and right boundaries separately. Notable implementation details include focal loss for loop model training and supervised contrastive learning for encoder pretraining. Outperforms Chromosight, Peakachu, HiCCUPS, Mustache loop callers and CaTCH, Grinch, deDoc, DomainCaller, EAST, GMAP, Arrowhead, IC-Finder, HiTAD, TopDom, Armatus, OnTAD TAD callers across different cell types, species, sequencing depths. PyTorch, conda installable; runs on CPU, but training requires a GPU.
Zhang, Y., Blanchette, M. Reference panel guided topological structure annotation of Hi-C data. Nat Commun 13, 7426 (2022).
- TADMaster - a tool/webserver to evaluate the concordance of results of TAD callers. Multiple comparison metrics (number, size, Measure of Concordance (MoC) with a boundary tolerance (flank)). Two run modes - one accepts BED files and compares them; another (TADMaster Plus) processes Hi-C matrices (hic, h5, cool, sparse, full matrices), normalizes them (VC, ICE, KR, more), detects TADs (DI, Insulation Score, HiCseg, SpectralTAD, 12 total), and compares them. Visualization, clustering, example output. Docker version, GitHub.
Higgins, Sean, Victor Akpokiro, Allen Westcott, and Oluwatosin Oluwadare. “TADMaster: A Comprehensive Web-Based Tool for the Analysis of Topologically Associated Domains.” BMC Bioinformatics 23, no. 1 (November 4, 2022): 463.
- DeepChrome - deep convolutional network for binary classification of gene expression (median cutoff for high/low) using five histone modification profiles as input (Roadmap Epigenomics). Visualization of predictive combinatorial histone patterns. H3K36me3, H3K4me1, H3K4me3 predict high expression; H3K27me3, H3K9me3 predict low expression. Outperforms linear regression, SVM, Random Forest. Bins of 100bp within +/-5000bp around TSSs, 10 bin step sliding window, 50 filters.
Singh, Ritambhara, Jack Lanchantin, Gabriel Robins, and Yanjun Qi. “DeepChrome: Deep-Learning for Predicting Gene Expression from Histone Modifications.” Bioinformatics 32, no. 17 (September 1, 2016): i639–48.
Sefer, Emre. “A Comparison of Topologically Associating Domain Callers over Mammals at High Resolution.” BMC Bioinformatics 23, no. 1 (December 2022) - Benchmarking of 27 TAD callers (Table 1). Brief description of each caller. Classified into 3 categories: feature-based, clustering, graph-partitioning methods. Systematic evaluation of TAD number, size, enrichment/depletion in functional annotations, 8 metrics. Distinguishes TADs with corner-dot loops from TADs without corner dots and evaluates performance on detecting each. Uses mESC Micro-C (HiCNorm normalized) and simulated data, testing resolution and sequencing depth. Feature-based callers generally perform better, but performance depends on the type of test. SpectralTAD performs well. Code for all tests at GitHub.
Brief description of 22 TAD calling methods. Source: Zufferey et al., “Comparison of Computational Methods for the Identification of Topologically Associating Domains.”
Zufferey, Marie, Daniele Tavernari, Elisa Oricchio, and Giovanni Ciriello. "Comparison of Computational Methods for the Identification of Topologically Associating Domains" Genome Biology 19, no. 1 (10 2018) - Comparison of 22 TAD callers across different conditions. Callers are classified as linear score-based, statistical model-based, clustering, graph theory. Table 1 and Additional file 1 summarize each caller. The effect of data resolution, normalization, hierarchy. Test on Rao 2014 data, chromosome 6. ICE or LGF (local genomic feature) normalization. The Measure of Concordance (MoC) to compare TADs. CTCF/cohesin as a measure of biological significance. TopDom, HiCseg, CaTCH, CHDF are the top performers. R scripts, including for calculating the MoC
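The Measure of Concordance can be computed directly from two TAD sets. This sketch assumes MoC(P, Q) = (sum_ij |F_ij|^2 / (|P_i| |Q_j|) - 1) / (sqrt(N_P N_Q) - 1). Here F_ij is the overlap between TAD P_i and TAD Q_j, and N_P, N_Q are the TAD counts. MoC is set to 1 when both sets contain a single TAD. Check it against the authors' R scripts before relying on it.

```python
import math


def measure_of_concordance(p, q):
    """MoC between two TAD sets given as half-open (start, end) intervals in bins."""
    if len(p) == 1 and len(q) == 1:
        return 1.0  # convention for the degenerate one-domain case
    total = 0.0
    for p_start, p_end in p:
        for q_start, q_end in q:
            overlap = max(0, min(p_end, q_end) - max(p_start, q_start))
            total += overlap ** 2 / ((p_end - p_start) * (q_end - q_start))
    return (total - 1.0) / (math.sqrt(len(p) * len(q)) - 1.0)
```

Identical TAD sets give 1. Shifting one boundary by two bins in a 20-bin region, `[(0, 10), (10, 20)]` vs. `[(0, 12), (12, 20)]`, gives 2/3.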
Dali, Rola, and Mathieu Blanchette. "A Critical Assessment of Topologically Associating Domain Prediction Tools" Nucleic Acids Research 45, no. 6 (April 7, 2017) - TAD definition, tools. Meta-TADs, hierarchy, overlapping TADs. HiCPlotter for visualization. Manual annotation as a gold standard. Sequencing depth and resolution affect results. Code, manual annotations
Forcato, Mattia, Chiara Nicoletti, Koustav Pal, Carmen Maria Livi, Francesco Ferrari, and Silvio Bicciato. "Comparison of Computational Methods for Hi-C Data Analysis" Nature Methods, June 12, 2017 - Hi-C processing and TAD calling tools benchmarking, Table 1, simulated (Lun and Smyth method) and real data. Notes about pluses and minuses of each tool. TAD reproducibility is higher than that of chromatin interactions and increases with a larger number of reads. Consistent enrichment of TAD boundaries in CTCF, irrespective of TAD caller. Reproducibility of chromatin interactions between Hi-C replicates is poor, just a bit better than random. Supplementary table 2 - technical details about each program, Supplementary Note 1 - Hi-C preprocessing tools, Supplementary Note 2 - TAD callers. Supplementary note 3 - how to simulate Hi-C data. Supplementary note 6 - how to install tools. Tools for TAD comparison, and simulated matrices.
Olivares-Chauvet, Pedro, Zohar Mukamel, Aviezer Lifshitz, Omer Schwartzman, Noa Oded Elkayam, Yaniv Lubling, Gintaras Deikus, Robert P. Sebra, and Amos Tanay. "Capturing Pairwise and Multi-Way Chromosomal Conformations Using Chromosomal Walks" Nature 540, no. 7632 (November 30, 2016) - TADs organize chromosomal territories. Active and inactive TAD properties. Methods: Good mathematical description of insulation score calculations. Filter TADs smaller than 250kb. Inter-chromosomal contacts are rare, ~7-10%. Concatemers (more than two contacts) are unlikely.
Rocha, Pedro P., Ramya Raviram, Richard Bonneau, and Jane A. Skok. "Breaking TADs: Insights into Hierarchical Genome Organization" Epigenomics 7, no. 4 (2015) - Textbook overview of TADs in 3 pages with key references. 3D organization discovery using FISH, 3C, Hi-C. Discovery of A/B compartments (euchromatin, heterochromatin), TADs as regulatory units conserved even across syntenic regions in different organisms. TADs coordinate gene expression. TAD boundaries are not created equal. Examples of changes of TAD boundaries (Hox gene cluster, ES differentiation). Hierarchy of TADs.
Crane, Emily, Qian Bian, Rachel Patton McCord, Bryan R. Lajoie, Bayly S. Wheeler, Edward J. Ralston, Satoru Uzawa, Job Dekker, and Barbara J. Meyer. "Condensin-Driven Remodelling of X Chromosome Topology during Dosage Compensation" Nature 523, no. 7559 (July 9, 2015). - Insulation Score to define TADs - a square slides along the diagonal, aggregating the signal within it. This aggregated score is normalized and binned into TADs and boundaries. See Methods and implementation. Parameters: -is 480000 -ids 320000 -im iqrMean -nt 0 -ss 160000 -yb 1.5 -bmoe 0.
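The insulation score idea can be sketched in a few lines. This sketch makes several assumptions. `square` plays the role of `-is` expressed in bins. The square covers rows i-square..i-1 and columns i..i+square-1. The score is log2-normalized to the chromosome-wide mean, a simplification given that the parameters above suggest an `iqrMean` variant. The delta vector (`-ids`) and boundary-strength filtering of the original implementation are not reproduced. The name is hypothetical.

```python
import numpy as np


def insulation_score(mat, square=3):
    """Mean contacts in a `square` x `square` window sliding along the diagonal
    (rows i-square..i-1, columns i..i+square-1), log2-normalized to the mean.
    Bins too close to the chromosome ends are NaN; empty windows give -inf."""
    n = mat.shape[0]
    raw = np.full(n, np.nan)
    for i in range(square, n - square + 1):
        raw[i] = mat[i - square:i, i:i + square].mean()
    return np.log2(raw / np.nanmean(raw))
```

Boundaries correspond to minima of the score, where few contacts cross the window. In a two-domain toy matrix, this is the first bin of the second domain.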
"Hierarchical Regulatory Domain Inference from Hi-C Data" - presentation by Bartek Wilczyński about TAD detection, existing algorithms, new SHERPA and OPPA methods. Video, PDF, Web site, GitHub - SHERPA and OPPA code there.
- gStripe - stripe calling from HiChIP data, graph-based, operates on sets of loops from other tools, outperforms Stripenn, Python implementation. Reference to the nf-hichip pipeline.
- fontanka - a cooltools-based Python software for detection of extrusion fountains (aka jets, plumes; not TADs or stripes) in Hi-C/Micro-C data. Fountains are hallmarks of chromosome organization, emerging upon zygotic genome activation (ZGA). They occur at active enhancers, sites of cohesin loading bound by Pou5f3, Sox19b, and Nanog, but not CTCF. Detected in Zebrafish, Medaka, Xenopus, mouse ESCs. Details about potential mechanisms. Four-step algorithm, convolution-based pattern recognition, outperforms Chromosight, protractor. Analysis scripts.
Galitsyna, Aleksandra, Sergey V. Ulianov, Nikolai S. Bykov, Marina Veil, Meijiang Gao, Kristina Perevoschikova, Mikhail Gelfand, Sergey V. Razin, Leonid Mirny, and Daria Onichtchouk. “Extrusion Fountains Are Hallmarks of Chromosome Organization Emerging upon Zygotic Genome Activation.” Preprint. Molecular Biology, July 15, 2023.
- Stripenn - a computer vision-based method for architectural stripe detection using Canny edge detection. Scores stripes by median p-value and stripiness based on the continuity of interaction signal. Input - .cool files, optionally normalized. Output - coordinates and scores of the predicted stripes. Applicable to Hi-C, HiChIP, Micro-C data. Introduction to the biology of architectural stripes, review of previous methods (Zebra from Vian et al. 2018, domainClassifyR, CHESS for comparing 3D domains and stripes). Analysis of stripes from B and T lymphocytes identifies stripe anchors enriched in transcriptionally active compartments and in architectural proteins mediating loop extrusion. Stripes are strongly conserved and correspond to TAD boundaries; subtle changes are associated with transcriptional output. Python, three functions. Video 16m. Paper
Yoon, Sora, and Golnaz Vahedi. "Stripenn Detects Architectural Stripes from Chromatin Conformation Data Using Computer Vision" Preprint. Bioinformatics, April 18, 2021
- domainClassifyR - detection and classification of 'Stripes' structures at TAD boundaries. Stripes are associated with both poised and active chromatin landscapes, and transcription is not a key determinant of their structure. Stripe formation is linked to the functional state of the cell through cohesin loading at lineage-specific enhancers and developmental control of CTCF binding site occupancy. Comparison of pluripotent mouse embryonic stem cells and lineage-committed neural cells, characterizing emergent, lost, and maintained stripes. Raw data at SRA668328.
Barrington, Christopher, Dimitra Georgopoulou, Dubravka Pezic, Wazeer Varsally, Javier Herrero, and Suzana Hadjur. "Enhancer accessibility and CTCF occupancy underlie asymmetric TAD architecture and cell type specific genome topology." Nature communications 10, no. 1 (2019): 1-14.
- StripeCaller - A toolkit for analyzing architectural stripes. Architectural stripes are created by extensive loading of cohesin near CTCF anchors, with the help of Nipbl and Rad21. Little overlap between B cells and ESCs. Architectural stripes are sites of tumor-inducing TOP2beta DNA breaks. ATP is required for loop extrusion and cohesin translocation, but not for maintenance. Neither replication nor transcription is important for loop extrusion. Zebra algorithm for detecting architectural stripes, image analysis, math in Methods. Human lymphoblastoid cells, mouse ESCs, mouse B cells activated with LPS, CH12 B lymphoma cells, wild-type, treated with hydroxyurea (blocks DNA replication), flavopiridol (blocks transcription, PolII elongation), oligomycin (blocks ATP). Hi-C, ChIA-PET, ChIP-seq, ATAC-seq, and more. Data1, Data2.
Vian, Laura, Aleksandra Pękowska, Suhas S.P. Rao, Kyong-Rim Kieffer-Kwon, Seolkyoung Jung, Laura Baranello, Su-Chen Huang, et al. "The Energetics and Physiological Impact of Cohesin Extrusion" Cell 173, no. 5 (May 2018)
- TGIF - Tree-Guided Integrated Factorization, a generalizable multi-task non-negative matrix factorization (NMF) for differential compartment and TAD boundary analysis (k=2 or k=8), including time course. Uses a novel regularization term to jointly factorize the matrices (constraining each matrix-specific latent factor V to be similar to the preceding and following Vs), solved by block coordinate descent. Outperforms TADcompare and TopDom, except on GM12878 (TADsplimer doesn't work). Simulated data following Forcato 2017. Applied to mouse neural differentiation (Bonev 2017, 3 time points) and cardiomyocyte differentiation (Zhang 2019, 6 time points), investigating the association of GWAS SNPs and gene expression with TAD boundaries.
- DiffDomain - differential TAD analysis. Uses random matrix theory (the Tracy–Widom distribution) to identify structurally reorganized TADs (six subtypes: strength-change, loss, split, merge, zoom, and complex). Outperforms other methods (TADCompare, HiCcompare, DiffGR, DiffTAD, HiC-DC+, TADsplimer) in terms of FPR, TPR, proportions and subtypes of reorganized TADs. Applicable to scHi-C (pseudobulk, 100 cells minimum, imputed with scHiCluster). Input - two Hi-C matrices and TADs called in the first (e.g., Arrowhead on a KR-normalized matrix). Computes a log2 difference matrix from two KR-normalized matrices, then distance/depth-normalizes the off-diagonals and scales by SQRT(N). Under the white-noise null, this yields a generalized Wigner matrix whose largest eigenvalue converges to 2, and hypothesis testing asks whether the largest eigenvalue is larger than 2.
Hua, Dunming, Ming Gu, Yanyi Du, Li Qi, Xiangjun Du, Zhidong Bai, Xiaopeng Zhu, and Dechao Tian. "DiffDomain enables identification of structurally reorganized topologically associating domains." In International Conference on Research in Computational Molecular Biology, pp. 302-303. Springer, Cham, 2022.
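The core statistic can be sketched as follows. This sketch assumes two symmetric, already normalized matrices for a single TAD. It takes the log2 difference, standardizes each diagonal to zero mean and unit variance (the distance normalization), scales by sqrt(N), and returns the largest eigenvalue. This is not DiffDomain's code. The Tracy-Widom p-value and the subtype classification are omitted, and `pseudocount` is a hypothetical parameter.

```python
import numpy as np


def diffdomain_statistic(m1, m2, pseudocount=1.0):
    """Largest eigenvalue of the standardized, sqrt(N)-scaled log2 difference
    matrix for one TAD. Under the null of pure noise it should be close to 2."""
    d = np.log2(m2 + pseudocount) - np.log2(m1 + pseudocount)
    n = d.shape[0]
    z = np.zeros_like(d)
    for k in range(-(n - 1), n):
        diag = np.diagonal(d, offset=k)
        sd = diag.std()
        if len(diag) < 2 or sd == 0:
            continue  # too short or constant: leave as zero
        idx = np.arange(len(diag))
        rows, cols = (idx, idx + k) if k >= 0 else (idx - k, idx)
        z[rows, cols] = (diag - diag.mean()) / sd  # zero mean, unit variance per distance
    z = (z + z.T) / 2  # guard against tiny asymmetries
    return float(np.linalg.eigvalsh(z / np.sqrt(n)).max())
```

When two matrices differ only by noise, the statistic stays near 2. A coherent change in a sub-block adds a low-rank component that pushes the largest eigenvalue well above that edge.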
- InterTADs - TAD-centric integration of multi-omics (SNPs, gene expression, methylation) data by genomic coordinates, proper data scaling, and differential analysis. Differential TAD analysis by interaction strength changes, the number of other differential events. Documentation.
Tsagiopoulou, Maria, Nikolaos Pechlivanis, and Fotis Psomopoulos. "InterTADs: Integration of Multi-Omics Data on Topological Associated Domains" Preprint. In Review, August 12, 2020
- TADcompare - R package for differential and time-course TAD boundary analysis. Uses the SpectralTAD score (spectral decomposition of Hi-C matrices) to statistically detect five types of differential TAD boundaries: merge, split, complex, shifted, strength change. In the time-course analysis, detects six types of boundary score changes: highly common, early appearing, late appearing, early disappearing, late disappearing, and dynamic TAD boundaries. Returns genomic coordinates and types of TAD boundary changes in BED format. Documentation, Bioconductor Package
Cresswell, Kellen G., and Mikhail G. Dozmorov. "TADCompare: An R Package for Differential and Temporal Analysis of Topologically Associated Domains" Frontiers in Genetics 11 (March 10, 2020)
- Analysis of the Structural Variability of Topologically Associated Domains as Revealed by Hi-C - TAD variability among 137 Hi-C samples (69 excluding replicates) from 9 studies. HiCRep, Jaccard, TADsim to measure similarity. Variability does not come from genetics. Introduction to TADs. 10-70% of TAD boundaries differ between replicates; 20-80% differ between biological conditions. Much less variation across individuals than across tissue types. Lab-specific sources of variation - in situ vs. dilution ligation protocols; restriction enzymes matter less. HiC-Pro processing into 100kb matrices, ICE normalization, Armatus for TAD calling. Table 1 - all studies and accession numbers.
Sauerwald, Natalie, Akshat Singhal, and Carl Kingsford. "Analysis of the Structural Variability of Topologically Associated Domains as Revealed by Hi-C" NAR Genomics and Bioinformatics, 30 September 2019
- BPscore - metric to compare two TAD segmentations. Formula, methods. More stable than Variation of Information (VI) and Jaccard Index (JI). Python implementation for calculating all three metrics.
Zaborowski, Rafał, and Bartek Wilczyński. "BPscore: An Effective Metric for Meaningful Comparisons of Structural Chromosome Segmentations" Journal of Computational Biology 26, no. 4 (April 2019)
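BPscore's own formula is given in the paper and is not reproduced here. As a point of comparison, the sketch below computes the standard variation of information between two segmentations encoded as per-bin domain labels: VI = H(A) + H(B) - 2I(A;B), in nats. The helper names are hypothetical.

```python
import math
from collections import Counter


def labels_from_boundaries(boundaries, n_bins):
    """Per-bin domain labels; a new domain starts at each boundary bin index."""
    starts = set(boundaries)
    labels, label = [], 0
    for i in range(n_bins):
        if i in starts and i > 0:
            label += 1
        labels.append(label)
    return labels


def entropy(counts, n):
    """Shannon entropy (nats) of a labelling given its label counts."""
    return -sum(c / n * math.log(c / n) for c in counts.values())


def variation_of_information(labels_a, labels_b):
    """VI = H(A) + H(B) - 2 I(A; B); 0 for identical segmentations."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    return entropy(count_a, n) + entropy(count_b, n) - 2 * mutual
```

For example, `labels_from_boundaries([5], 10)` and `labels_from_boundaries([6], 10)` describe the same region with the boundary moved by one bin, and their VI is small but positive.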
- Quantifying the Similarity of Topological Domains across Normal and Cancer Human Cell Types - Analysis of TAD similarity using the variation of information (VI) metric as a local distance measure. Defining structurally similar and variable regions. Comparison with previous studies of genomic similarity. Cancer-normal comparison - regions containing pan-cancer genes are structurally conserved in normal-normal pairs, but not in cancer-cancer pairs. Kingsford-Group/localtadsim. 23 human Hi-C datasets, HiC-Pro processed into 100kb matrices, Armatus to call TADs.
Sauerwald, Natalie, and Carl Kingsford. "Quantifying the Similarity of Topological Domains across Normal and Cancer Human Cell Types" Bioinformatics (Oxford, England), (July 1, 2018)
- DiffTAD - differential contact frequency in TADs between two conditions. Two tests: a permutation-based test comparing observed vs. expected median interactions, and a parametric test considering the sign of the differences within TADs. Both tests account for distance strata.
Zaborowski, Rafal, and Bartek Wilczynski. "DiffTAD: Detecting Differential Contact Frequency in Topologically Associating Domains Hi-C Experiments between Conditions" BioRxiv, January 1, 2016
- C.Origami - a deep neural network (sequence encoder + feature encoder -> transformer -> decoder, see Methods) predicting cell type-specific Hi-C matrices from DNA sequence (one-hot encoding), CTCF binding (ChIP-seq), and chromatin accessibility (ATAC-seq) profiles (all critical for best performance). Chromosome-wide predictions by joining predictions across sliding windows (2Mb). Performance evaluation - correlation of insulation scores, over 0.95 Pearson. Outperforms Akita. Enables prediction of the effect of genetic perturbations.
Tan, Jimin, Javier Rodriguez-Hernaez, Theodore Sakellaropoulos, Francesco Boccalatte, Iannis Aifantis, Jane Skok, David Fenyo, Bo Xia, and Aristotelis Tsirigos. “Cell Type-Specific Prediction of 3D Chromatin Architecture.” Preprint. Genomics, March 7, 2022.
- ChINN - chromatin interaction neural network, predicting chromatin interactions from DNA sequence. Trained on CTCF- and RNA PolII-mediated loops, as well as on Hi-C data. Gradient boosting trained on functional annotation, distance, or both as predictors. ChINN - CNN trained on sequence. Convergent CTCF orientation is an important predictor; other motifs complement predictive power. Applied to 6 new chronic lymphocytic leukemia samples, patient-specific interactions, validated by Hi-C and 4C.
Cao, Fan, Yu Zhang, Yichao Cai, Sambhavi Animesh, Ying Zhang, Semih Can Akincilar, Yan Ping Loh, et al. "Chromatin Interaction Neural Network (ChINN): A Machine Learning-Based Method for Predicting Chromatin Interactions from DNA Sequences" Genome Biology, (December 2021)
- DL2021_HI-C - deep-learning approach for increasing Hi-C data resolution by appending additional information about genome sequence. Two algorithms: the image-to-image model (modified after VEHiCLE), which enhances Hi-C resolution by itself, and the sequence-to-image model (modified after Akita), which uses additional information about the underlying genome sequence for further resolution improvement. Both models are combined with the simple head model (Figure 4). Details of network architecture, training, testing, validation (5 metrics). Various architecture modifications. VEHiCLE by itself performs well.
Kriukov, Dmitrii, Mark Zaretckii, Igor Kozlovskii, Mikhail Zybin, Nikita Koritskiy, Mariia Bazarevich, and Ekaterina Khrameeva. "Hi-C Resolution Enhancement with Genome Sequence Data" Preprint. Systems Biology, October 26, 2021.
- L-HiC-Reg (Local HiC-Reg) - a Random Forest based regression method to predict high-resolution contact counts in new cell lines, and a network-based framework to identify candidate cell line-specific gene networks targeted by a set of variants from a genome-wide association study (GWAS). Trained on chromosome-specific 1Mb segments in one cell line to predict in another. 55 cell lines, 10 annotations (CTCF, RAD21, TBP, histone marks, DNase), imputing missing annotations with Avocado. Outperforms GeneHancer and JEME. Networks for 15 GWASs in all cell lines are available.
Baur, Brittany, Jacob Schreiber, Junha Shin, Shilu Zhang, Yi Zhang, Jun S Song, William Stafford Noble, and Sushmita Roy. "Leveraging Epigenomes and Three-Dimensional Genome Organization for Interpreting Regulatory Variation" bioRxiv, August 30, 2021
- CCIP (CTCF-mediated Chromatin Interaction Prediction) - predicting CTCF-mediated convergent and tandem loops with transitivity. Transitivity definition from the network of multiple CTCF-interacting regions, convergent and tandem. Incorporating the GCP (graph connecting probability) score, together with CTCF, RAD21, directional CTCF motif one-hot encoding into random forest. GCP is the most important predictive feature. Compared with Lollipop and CTCF-MP.
Wang, Weibing, Lin Gao, Yusen Ye, and Yong Gao. "CCIP: Predicting CTCF-Mediated Chromatin Loops with Transitivity" Bioinformatics, 20 July 2021.
- compartmap - A/B compartment reconstruction from bulk and scRNA-seq/scATAC-seq. Steps: preprocessing and summarizing the single-cell assay data; eBayes shrinkage of summarized features towards a global or local target; computing the shrinkage correlation estimate of summarized features; computing domains via SVD (Methods). Also detects genomic rearrangements. Applied to K562 data. Bioconductor, Tweet.
Johnson, Benjamin K, Jean-Philippe Fortin, Kasper D Hansen, Hui Shen, and Triche Jr. "Compartmap Enables Inference of Higher-Order Chromatin Structure in Individual Cells from ScRNA-Seq and ScATAC-Seq" preprint, May 18, 2021
- SCI - A/B sub-compartment prediction from Hi-C data, graph embedding (adaptation of LINE method, better than HOPE and DeepWalk) and k-means clustering. Five subcompartments, informed by epigenetic modifications. Compared with compartments predicted by HMM (Rao 2014) and k-means clustering by Yaffe-Tanay 2011. Improved network centrality, clustering metrics, enrichment in relevant epigenomic marks, higher intra-loop vs. inter-loop ratio, coordinated gene expression, TF enrichment. XGBoost/RF (allowing feature selection), neural network to predict subcompartments from epigenomics, methylation, sequence-based data, replication timing. Predictions confirmed by ChIA-PET data. GM12878 and K562 data. Python implementation. sci-DNN, News.
Ashoor, Haitham, Xiaowen Chen, Wojciech Rosikiewicz, Jiahui Wang, Albert Cheng, Ping Wang, Yijun Ruan, and Sheng Li. “Graph Embedding and Unsupervised Learning Predict Genomic Sub-Compartments from HiC Chromatin Interaction Data.” Nature Communications 11, no. 1 (December 2020): 1173.
- Hi-ChIP-ML - machine learning models for TAD boundary prediction using epigenomic data (ChIP signal of chromatin factors and histone modifications, 5 and 18 feature sets) in Drosophila. Linear regression with 4 regularization types, gradient boosting, and a bidirectional LSTM (to capture both sides around a boundary). Methods describe data construction, loss function, model architectures, machine learning framework. Feature importance is measured by training on one feature at a time. Chriz factor is the top predictor, then H3K4me1, H3K27me1. Python, sklearn, keras. Colab notebook.
Rozenwald, Michal B, Aleksandra A Galitsyna, Grigory V Sapunov, Ekaterina E Khrameeva, and Mikhail S Gelfand. "A Machine Learning Framework for the Prediction of Chromatin Folding in Drosophila Using Epigenetic Features" PeerJ Comput Sci. 2020 Nov 30
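The "one feature at a time" importance scheme can be sketched in a few lines: fit a model on each epigenomic feature alone and rank features by how well it explains the target. This is a minimal illustration, not the Hi-ChIP-ML code; it assumes ordinary least squares with R² as the score, and the feature values below are hypothetical toy data.

```python
from statistics import fmean


def r_squared_single_feature(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


def rank_features(features, target):
    """Score each feature on its own and sort from most to least predictive."""
    scores = {name: r_squared_single_feature(values, target)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: per-bin signals and a boundary-strength target
target = [0.1, 0.9, 0.2, 0.8, 0.15, 0.95]
features = {
    "Chriz": [0.2, 1.0, 0.3, 0.9, 0.25, 1.1],
    "H3K4me1": [0.5, 0.7, 0.4, 0.6, 0.5, 0.8],
    "random": [0.3, 0.3, 0.9, 0.1, 0.7, 0.2],
}
ranking = rank_features(features, target)
print(ranking[0][0])  # prints Chriz
```

Any regressor (regularized linear models, gradient boosting) can be swapped in for the single-feature fit; the ranking logic stays the same.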
- 3DPredictor - Web tool for prediction of enhancer-promoter interactions from gene expression, CTCF binding and orientation, and distance between interacting loci. Also predicts changes in 3D genome organization caused by genomic rearrangements. Benchmarking shows that TargetFinder performance is overestimated. Class balance slightly improves performance. Two random chromosomes for validation, the rest for training (10-90% split). Gradient boosting for prediction, significantly better than Random Forest. Multiple performance metrics. A model trained in one cell type can predict in another. Predicts quantitative levels of interactions. Other tools - EP2Vec, CTCF-MP, HiC-Reg. Web tool.
Belokopytova, Polina S., Miroslav A. Nuriddinov, Evgeniy A. Mozheiko, Daniil Fishman, and Veniamin Fishman. "Quantitative prediction of enhancer–promoter interactions." Genome research 30, no. 1 (2020): 72-84.
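The chromosome-level validation scheme above (two random chromosomes held out, the rest used for training) can be sketched as follows. The record layout (a dict with a "chrom" key) and the seeded random choice are assumptions for illustration, not 3DPredictor's implementation.

```python
import random


def split_by_chromosome(records, n_holdout=2, seed=0):
    """Hold out whole chromosomes so validation pairs never share a chromosome with training pairs."""
    chroms = sorted({r["chrom"] for r in records})
    rng = random.Random(seed)  # seeded for a reproducible split
    holdout = set(rng.sample(chroms, n_holdout))
    train = [r for r in records if r["chrom"] not in holdout]
    valid = [r for r in records if r["chrom"] in holdout]
    return train, valid, holdout
```

Splitting by chromosome rather than by random pairs keeps neighbouring, correlated interactions from leaking between the training and validation sets.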
- Akita - Chromatin interaction prediction from sequence only using a CNN. Takes in a 1Mb sequence and predicts interactions at 2Kb resolution. Accounts for the distance between interacting regions. Allows for assessing the effect of mutations. Tested on Rao Hi-C datasets and Micro-C data. Basenji network architecture. 80/10/10 training, validation, test sets.
Fudenberg, Geoff, David R Kelley, and Katherine S Pollard. "Predicting 3D Genome Folding from DNA Sequence" Nature Methods, 12 October 2020
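Sequence-only CNNs such as Akita take one-hot-encoded DNA as input. The sketch below shows that encoding and the bin arithmetic for a ~1 Mb window at ~2 kb resolution; reading "1Mb" as 2^20 bp and "2Kb" as 2,048 bp is an assumption here, and the network itself is not shown.

```python
def one_hot_encode(seq):
    """Encode DNA as [A, C, G, T] indicator rows; N and other symbols become all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    encoded = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def n_bins(seq_length, bin_size):
    """Number of whole bins tiling a sequence of the given length."""
    return seq_length // bin_size


print(one_hot_encode("ACGN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(n_bins(1_048_576, 2_048))  # 512
```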
- 3D-GNOME 2.0 - Web-based tool for modeling 3D structure changes due to structural variants (SVs). Reference data - GM12878 ChIA-PET data (CTCF, RNAPII). Changes are modeled by removing or adding contacts between chromatin interaction anchors depending on SV (deletion, duplication, insertion, inversion). Bitbucket.
Wlasnowolski, Michal, Michal Sadowski, Tymon Czarnota, Karolina Jodkowska, Przemyslaw Szalaj, Zhonghui Tang, Yijun Ruan, and Dariusz Plewczynski. "3D-GNOME 2.0: A Three-Dimensional Genome Modeling Engine for Predicting Structural Variation-Driven Alterations of Chromatin Spatial Structure in the Human Genome" Nucleic Acids Research, (July 2, 2020)
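A simplified sketch of how a deletion could be applied to a list of anchor-to-anchor contacts: contacts with an anchor inside the deleted region are removed, and downstream coordinates are shifted left by the deletion length. This covers only the "removing contacts" half of the description; the rules for adding contacts and for other SV types in 3D-GNOME 2.0 are not reproduced, and the tuple layout is an assumption.

```python
def apply_deletion(interactions, del_start, del_end):
    """Drop contacts whose anchors overlap [del_start, del_end) and shift downstream anchors left.

    interactions: list of (a_start, a_end, b_start, b_end) on one chromosome (hypothetical layout).
    """
    length = del_end - del_start

    def overlaps(start, end):
        return start < del_end and end > del_start

    def shift(pos):
        return pos - length if pos >= del_end else pos

    kept = []
    for a_start, a_end, b_start, b_end in interactions:
        if overlaps(a_start, a_end) or overlaps(b_start, b_end):
            continue  # an anchor is lost, so the contact is removed
        kept.append((shift(a_start), shift(a_end), shift(b_start), shift(b_end)))
    return kept
```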
- HiC-Reg - Predicting Hi-C contact counts from one-dimensional regulatory signals (Histone marks, CTCF, RAD21, Tbp, DNAse). Random Forest regression. Feature encoding - distance between two regions, pair-concat, window, multi-cell. Works across chromosomes (some chromosomes are worse than others) and cell lines (Gm12878, K562, Huvec, Hmec, Nhek, can be used to predict interactions on new cell lines). Selection of the most important features using multi-task group LASSO (distance, CTCF, Tbp, H4K20me1, DNAse, others). Predicted contacts correspond well to the original contacts (distance-stratified Pearson correlation), define TADs similar to the originals (Jaccard), define significant contacts (Fit-Hi-C) more enriched in CTCF binding. Validated on HBA1 and PAPPA gene promoters. Hi-C normalization doesn't have much effect.
Zhang, Shilu, Deborah Chasman, Sara Knaack, and Sushmita Roy. "In Silico Prediction of High-Resolution Hi-C Interaction Matrices" Nature Communications 10, no. 1 (December 2019)
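The pair-concat and window feature encodings can be sketched as plain feature vectors for a pair of bins (i, j). The exact window definition used here (mean signal of each mark over the bins strictly between i and j) is an assumption, not HiC-Reg's code; the resulting vectors could be fed to any Random Forest regressor, such as scikit-learn's RandomForestRegressor.

```python
from statistics import fmean


def pair_concat_features(signals, i, j):
    """Pair-concat: signals of every mark at bin i, then at bin j, then the distance in bins."""
    marks = sorted(signals)  # fixed mark order keeps feature columns consistent
    feats = [signals[m][i] for m in marks]
    feats += [signals[m][j] for m in marks]
    feats.append(abs(j - i))
    return feats


def window_features(signals, i, j):
    """Window: pair-concat plus the mean signal of each mark in the bins strictly between i and j."""
    lo, hi = sorted((i, j))
    feats = pair_concat_features(signals, i, j)
    for m in sorted(signals):
        between = signals[m][lo + 1:hi]
        feats.append(fmean(between) if between else 0.0)
    return feats
```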
- TADBoundaryDectector - TAD boundary prediction from sequence only using deep learning models. 12 architectures tested; the one with three convolutional layers and an LSTM layer performed best. Methods, implementation in Keras-TensorFlow. Model evaluation using different criteria, 96% accuracy reported. Deep learning outperforms feature-based models, among which Boosted Trees, Random Forest, and elastic net logistic regression are the best performers. Data augmentation by randomly shifting TAD boundary regions by 0-100 bp. Tested on Drosophila data. GitHub deprecated, code unavailable.
Henderson, John, Vi Ly, Shawn Olichwier, Pranik Chainani, Yu Liu, and Benjamin Soibam. "Accurate Prediction of Boundaries of High Resolution Topologically Associated Domains (TADs) in Fruit Flies Using Deep Learning" Nucleic Acids Research, May 3, 2019.
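The shift-based augmentation can be sketched as cutting several copies of a boundary-centred window, each offset by a random 0-100 bp. Whether shifts go in both directions, and the window size, are assumptions here rather than details from the paper.

```python
import random


def shifted_windows(chrom_seq, boundary_center, window, max_shift=100, n_copies=5, seed=0):
    """Return copies of a boundary-centred window, each shifted by 0..max_shift bp left or right."""
    rng = random.Random(seed)
    half = window // 2
    copies = []
    for _ in range(n_copies):
        offset = rng.randint(0, max_shift) * rng.choice((-1, 1))
        start = boundary_center - half + offset
        if start < 0 or start + window > len(chrom_seq):
            continue  # skip shifts that would run off the chromosome
        copies.append(chrom_seq[start:start + window])
    return copies
```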
- 3DEpiLoop - prediction of 3D interactions from 1D epigenomic profiles using Random Forest trained on CTCF peaks (histone modifications and TFBSs are the most important predictors).
Al Bkhetan, Ziad, and Dariusz Plewczynski. "Three-Dimensional Epigenome Statistical Model: Genome-Wide Chromatin Looping Prediction" Scientific Reports 8, no. 1 (December 2018).
- PRISMR - a polymer-based method (strings and binders switch (SBS) polymer model) to model 3D chromatin folding, predict enhancer-promoter contacts, and model the effect of structural variations (deletions, duplications, inversions) on 3D genome organization. Input - Hi-C data. Infers the minimal factors that shape chromatin folding and its equilibrium under the laws of physics, without prior assumptions or tunable parameters. A simulated annealing Monte Carlo optimization procedure minimizes the distance between the predicted polymer model and the input contact matrix, with a Bayesian weighting factor to avoid overfitting. Tested on the EPHA4 locus associated with limb malformations, and more. Newly generated capture Hi-C of mouse limb buds at embryonic day 11.5 and of human skin fibroblasts, GEO. Public murine CH12-LX cells and human IMR90 cells. Implementation - the LAMMPS package, code not available.
Bianco, Simona, Darío G. Lupiáñez, Andrea M. Chiariello, Carlo Annunziatella, Katerina Kraft, Robert Schöpflin, Lars Wittler, et al. "Polymer Physics Predicts the Effects of Structural Variants on Chromatin Architecture" Nature Genetics, (May 2018)
- Supervised Learning Method for Predicting Chromatin Boundary Associated Insulator Elements - supervised prediction of TAD boundaries (insulator elements) from labeled training data. Bayesian network (BNFinder method) and random forest, compared with basic k-means clustering, ChromHMM, cdBEST. Sequence k-mers and ChIP-seq data from modENCODE used for prediction - CTCF ChIP-seq performs best. Boruta package for feature selection. The Bayesian network performs best.
Bednarz, Paweł, and Bartek Wilczyński. "Supervised Learning Method for Predicting Chromatin Boundary Associated Insulator Elements" Journal of Bioinformatics and Computational Biology 12, no. 06 (December 2014)
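Sequence k-mers become model features by counting every possible k-mer in a region, which yields a fixed-length vector that can sit next to ChIP-seq signals. A minimal sketch; k=3 and skipping k-mers containing non-ACGT symbols are illustrative assumptions.

```python
from itertools import product


def kmer_counts(seq, k=3):
    """Count all 4**k ACGT k-mers in overlapping windows; k-mers with other symbols are skipped."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[km] for km in vocab]  # fixed order, so vectors are comparable across regions
```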
- iRegNet3D - Integrated Regulatory Network 3D (iRegNet3D) is a high-resolution regulatory network comprising all known transcription factor (TF)-TF and TF-DNA interaction interfaces, as well as chromatin-chromatin interactions and topologically associating domain (TAD) information from different cell lines. Goal: SNP interpretation. Input: one or several SNPs, as rsIDs or genomic coordinates. Output: for one or two SNPs, on-screen disease-related information, their connections over TF-TF and chromatin interaction networks, and whether they interact in 3D and are located within TADs. For multiple SNPs, the same information is downloadable as text files.
Liang, Siqi, Nathaniel D. Tippens, Yaoda Zhou, Matthew Mort, Peter D. Stenson, David N. Cooper, and Haiyuan Yu. "IRegNet3D: Three-Dimensional Integrated Regulatory Network for the Genomic Analysis of Coding and Non-Coding Disease Mutations" Genome Biology 18, no. 1 (December 2017)
- HUGIn - tissue-specific linear display of Hi-C interactions around an anchor position. Overlays gene expression and epigenomic data. Associates SNPs with genes based on Hi-C interactions.
Martin, Joshua S, Zheng Xu, Alex P Reiner, Karen L Mohlke, Patrick Sullivan, Bing Ren, Ming Hu, and Yun Li. "HUGIn: Hi-C Unifying Genomic Interrogator" Edited by Inanc Birol. Bioinformatics 33, no. 23 (December 1, 2017)
- 3DSNP - 3DSNP database integrating SNP epigenomic annotations with chromatin loops. Linear closest gene, 3D interacting gene, eQTL, 3D interacting SNP, chromatin states, TFBSs, conservation. For individual SNPs.
Lu, Yiming, Cheng Quan, Hebing Chen, Xiaochen Bo, and Chenggang Zhang. "3DSNP: A Database for Linking Human Noncoding SNPs to Their Three-Dimensional Interacting Genes" Nucleic Acids Research 45, no. D1 (January 4, 2017)
- EagleC - structural variant (SV) detection at high resolution from Hi-C, HiChIP, ChIA-PET, single-cell Hi-C. Unique visual patterns of deletions, duplications, inversions in various read orientations, detected by CNN coupled with ensemble learning (50 models). Detects both intra- and inter-chromosomal SVs. Outperforms Hi-C breakfinder, HiCtrans, HiNT-TL in precision and recall, tested on three breast cancer cells/data (Hi-C and WGS), 26 additional cancer cell lines. Performs well at low sequencing depth, detects fusion genes. Captures SVs and fusion genes missed by WGS and Nanopore sequencing. SV breakpoints occur near TAD boundaries, preferentially form between A-A compartments. ICE-normalized matrices with distance effect removed, Gaussian filter, min-max scaling. Construction of ground truth calls, addressing class imbalance. Delly, smoove for WGS analysis, sniffles, Picky, svim for nanopore analysis, Arriba for gene fusion detection, cooltools for A/B compartment analysis, phased by gene density, HiTAD for TAD detection. Supplementary material: Table S1 - A list of high-confidence SVs for training EagleC models. Table S2 - cancer Hi-C datasets. Table S3 - HiChIP/ChIA-PET datasets. Table S4 - Breakpoint coordinates. Table S5 - Hi-C datasets in normal cell lines or tissues. The list of cancer-related genes was obtained from the Bushman Lab. Code and documentation at Zenodo.
Wang, Xiaotao, Yu Luan, and Feng Yue. “EagleC: A Deep-Learning Framework for Detecting a Full Range of Structural Variations from Bulk and Single-Cell Contact Maps.” Science Advances 8, no. 24 (June 17, 2022): eabn9215.
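The matrix preprocessing steps listed for EagleC (distance effect removed, then min-max scaling) can be sketched on a small dense, symmetric, already-normalized matrix. This is an illustration of the general steps, not EagleC's code; the Gaussian filter is omitted here and could be added with scipy.ndimage.gaussian_filter.

```python
from statistics import fmean


def observed_over_expected(matrix):
    """Divide each contact by the mean of its diagonal (same genomic distance)."""
    n = len(matrix)
    expected = [fmean([matrix[i][i + d] for i in range(n - d)]) for d in range(n)]
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
             for j in range(n)] for i in range(n)]


def min_max_scale(matrix):
    """Rescale all values of a matrix to [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in matrix]
```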
- hicpipe - normalization that preserves CNVs in Hi-C data. HiCnorm is an alternative.
Yang, L., Chen, F., Zhu, H., Chen, Y., Dong, B., Shi, M., ... & Zhang, M. Q. (2020). 3D Genome Analysis Identifies Enhancer Hijacking Mechanism for High-Risk Factors in Human T-Lineage Acute Lymphoblastic Leukemia. bioRxiv.
- HiNT - CNV and translocation detection from ~10-20% ambiguous chimeric reads in Hi-C data. Three tools: HiNT-Pre - preprocessing of Hi-C data; HiNT-CNV and HiNT-TL - CNV and translocation detection, respectively (accept HiC-Pro output). Tested on K562 (cancer) and Gm12878 (normal) data. Removal of known biases using a Poisson GAM. Outperforms Delly, Meerkat, hic_breakfinder, HiCtrans. Relatively little overlap with CNVs from WGS (BIC-seq2). Gold-standard - FISH data from Dixon et al., “Integrative Detection and Analysis of Structural Variation in Cancer Genomes.” Reference and background data
Wang, Su, Soohyun Lee, Chong Chu, Dhawal Jain, Geoff Nelson, Jennifer M. Walsh, Burak H. Alver, and Peter J. Park. "HiNT: A Computational Method for Detecting Copy Number Variations and Translocations from Hi-C Data" Preprint. Bioinformatics, June 3, 2019
- hic_breakfinder - Detection of structural variants (SV) by integrating optical mapping, Hi-C, and WGS. Custom pipeline using LUMPY, Delly, Control-FREEC software. New Hi-C data on 14 cancer cell lines and 21 previously published datasets. Integration of the detected SVs with genomic annotations, including replication timing. Supplementary data with SVs resolved by individual methods and integrative approaches.
Dixon, Jesse R., Jie Xu, Vishnu Dileep, Ye Zhan, Fan Song, Victoria T. Le, Galip Gürkan Yardımcı, et al. "Integrative Detection and Analysis of Structural Variation in Cancer Genomes" Nature Genetics, September 10, 2018.
- TAD fusion score - quantifying the effect of deletions on Hi-C interactions. Intro about TAD fusion effect on genome structure. TAD fusion score - the expected total number of changes in pairwise genomic interactions as a result of the deletion. TAD fusion events are negatively selected.
Huynh, Linh, and Fereydoun Hormozdiari. "Contribution of Structural Variation to Genome Structure: TAD Fusion Discovery and Ranking" BioRxiv, March 9, 2018.
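A toy, deterministic version of the TAD fusion idea: count pairs of surviving bins that were in different TADs before a deletion and fall into the same fused TAD afterwards. The published score is an expectation over pairwise interaction changes; the simplification that every TAD overlapping the deletion fuses into one domain is an assumption of this sketch.

```python
def tad_of(bin_index, tads):
    """Index of the TAD (half-open [start, end) in bins) containing a bin, or None."""
    for k, (start, end) in enumerate(tads):
        if start <= bin_index < end:
            return k
    return None


def fused_pair_count(tads, del_start, del_end):
    """Count surviving bin pairs that move from different TADs into one fused TAD after a deletion."""
    n_bins = tads[-1][1]
    hit = {k for k, (s, e) in enumerate(tads) if s < del_end and e > del_start}
    survivors = [b for b in range(n_bins) if not del_start <= b < del_end]
    changed = 0
    for x in range(len(survivors)):
        for y in range(x + 1, len(survivors)):
            ta, tb = tad_of(survivors[x], tads), tad_of(survivors[y], tads)
            if ta != tb and ta in hit and tb in hit:
                changed += 1
    return changed
```

A deletion that removes a boundary scores high, while one that stays inside a single TAD scores zero, matching the intuition that boundary-spanning deletions are the disruptive ones.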
- HiCnv, HiCTrans - CNV and translocation calling from Hi-C data. CNV calling uses an HMM on data quantified per restriction site and 1D-normalized, accounting for low GC content (<0.2) and low mappability (<0.5). Translocation calling on binned inter-chromosomal matrices.
Chakraborty, Abhijit, and Ferhat Ay. "Identification of Copy Number Variations and Translocations in Cancer Cells from Hi-C Data" Edited by Christina Curtis. Bioinformatics 34, no. 2 (January 15, 2018)
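HMM-based CNV segmentation can be sketched as a bin filter followed by Viterbi decoding over loss/neutral/gain states. Everything beyond "HMM" and the two thresholds is an assumption for illustration: treating the thresholds as exclusion cut-offs, Gaussian emissions on a log-ratio signal, the state means, and the sticky transition probability. HiCnv's actual model may differ.

```python
import math


def filter_bins(bins, min_gc=0.2, min_map=0.5):
    """Drop bins with low GC content or low mappability before CNV calling."""
    return [b for b in bins if b["gc"] >= min_gc and b["mappability"] >= min_map]


def viterbi_cnv(signal, means=(-1.0, 0.0, 1.0), sd=0.5, stay=0.99):
    """Most likely state path (0=loss, 1=neutral, 2=gain) for a 1D log-ratio signal."""
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if a == b else switch) for b in range(n_states)]
                 for a in range(n_states)]

    def log_emit(x, s):
        # Gaussian log-density of observation x under state s
        return -0.5 * ((x - means[s]) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    scores = [math.log(1.0 / n_states) + log_emit(signal[0], s) for s in range(n_states)]
    backpointers = []
    for x in signal[1:]:
        new_scores, ptrs = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: scores[p] + log_trans[p][s])
            ptrs.append(prev)
            new_scores.append(scores[prev] + log_trans[prev][s] + log_emit(x, s))
        scores = new_scores
        backpointers.append(ptrs)
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backpointers):  # trace the best path backwards
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```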
- Arima-SV-Pipeline - Docker/Singularity image for running Structural Variant Detection with Arima Hi-C Technology. Includes several tools (HiCUP, hic_breakfinder, Juicer, HiCCUPS).
- HiCrayon - An R package to integrate/visualize 1D (bigWig, bedGraph) and 3D (Hi-C) maps. Interactions are colored according to the associated 1D signal; absent interactions are not colored. A/B compartment analysis reveals a potential additional chromatin state. Docker image, web version for visualizing ENCODE-hosted data.
Nolan, Ben, Hannah L. Harris, Achyuth Kalluchi, Timothy E. Reznicek, Christopher T. Cummings, and M. Jordan Rowley. “HiCrayon Reveals Distinct Layers of Multi-State 3D Chromatin Organization.” bioRxiv, January 1, 2024, 2024.02.11.579821.
- HiBrowser - interactive Hi-C data visualization, heatmaps (2D and horizontal triangles), supports multiple samples with different reference genomes, superimposing and comparing heatmaps, multi-omics tracks, 3D model display. Local deployment using nginx. Input - .hic files, as well as FASTA, txt annotation files, loops, domains, bedGraphs, bigWigs.
Li, Pingjing, Hong Liu, Jialiang Sun, Jianguo Lu, and Jian Liu. “HiBrowser: An Interactive and Dynamic Browser for Synchronous Hi-C Data Visualization.” Briefings in Bioinformatics 24, no. 5 (September 20, 2023): bbad283.
- HiCube - web application for Hi-C map and 3D structure visualization, along with 1D annotations. Uses HiGlass. Input - mcool files, db annotation files for HiGlass (created from bigWig, BED, BEDPE, BedGraph, etc.), 3D genome structure data in the g3d format. Runs via Docker. Supplementary Table 1 - Visualization tool comparison, HiCube vs. Nucleome Browser, WashU Epigenome Browser, HiGlass, 3D Genome Browser, TADkit, HiC3d Viewer, HiCPlotter, pyGenomeTracks.
Ye, Tiantian, Yangyang Hu, Sydney Pun, and Wenxiu Ma. “HiCube: Interactive Visualization of Multiscale and Multimodal Hi-C and 3D Genome Data.” Edited by Can Alkan. Bioinformatics, March 24, 2023, btad154.
- Sashimi.py - a Python package for genomics data visualization. Supports scRNA-seq, protein–DNA/RNA interactions, long-read sequencing data, and Hi-C data (BAM, BED, bigWig, bigBed, GTF, BedGraph, h5, and more; strand-aware). Programming and interactive web interfaces. Output: pdf, png, jpg, svg. Bioconda, PyPI. Documentation.
Zhang, Yiming, Ran Zhou, and Yuan Wang. “Sashimi.Py: A Flexible Toolkit for Combinatorial Analysis of Genomic Data.” Preprint. Bioinformatics, November 3, 2022.
- 4D Nucleome Browser - an integrative and multimodal data navigation platform for 4D Nucleome. Synchronized multi-panel visualization of 1D, 2D, 3D models, and microscopy data. Hosts over 2K genomics and 700 image datasets for human (hg38) and mouse (mm10). Supports bigWig, bigBed, Tabix, hic formats. Functionality to compare two conditions, scatterplot for bigWig comparisons. Accepts external data via Nucleome Bridge. Integrated with HiGlass. Documentation, Youtube video tutorials. GitHub. Local installation available. Supplementary Notes - detailed description and illustration of functionality. Table S1 - functionality comparison with UCSC, WashU, others. Table S2 - URLs. Table S3 - data integrated in the browser. Six tutorials, including Single cell analysis tutorial. Gallery of examples.
Zhu, X., Zhang, Y., Wang, Y. et al. Nucleome Browser: an integrative and multimodal data navigation platform for 4D Nucleome. Nat Methods 19, 911–913 (21 July 2022).
- HiCognition - visual exploration of Hi-C data, testing for association with 1D data. Region set concept (ChIP-seq peaks, genes, TADs, etc.). Widget architecture: 1D average, 2D average, embedding/heterogeneity views, association/enrichment analysis. Input: BED, bigWig, cooler files. Data and precomputations are stored in a MySQL database. Prebuilt datasets of common features. Docker container. Documentation.
Langer, Christoph C. H., Michael Mitter, Roman R. Stocsits, and Daniel W. Gerlich. “HiCognition: A Visual Exploration and Hypothesis Testing Tool for 3D Genomics.” Preprint. Bioinformatics, May 1, 2022.
- plotgardener - R/Bioconductor package for multi-panel genomic and non-genomic data visualization. 45 functions for plotting and annotating multiple genomic data formats: genome sequences, gene/transcript annotations, chromosome ideograms, signal tracks, GWAS Manhattan plots, genomic ranges (e.g. peaks, reads, contact domains, etc), paired ranges (e.g. paired-end reads, chromatin loops, structural rearrangements, QTLs, etc), and 3D chromatin contact frequencies. Auto-recognizes .bam, .bigWig, .hic formats, converts genomic data into standard R objects (e.g., data.frame, tibble, GInteractions). Includes 29 genomes, more through Bioconductor integration. Documentation, Tweet 1, Tweet 2.
Kramer, Nicole E, Eric S Davis, Craig D Wenger, Erika M Deoudes, Sarah M Parker, Michael I Love, and Douglas H Phanstiel. "Plotgardener: Cultivating Precise Multi-Panel Figures in R" Preprint. Bioinformatics, September 10, 2021.
- CoolBox - a pyGenomeTracks-based Hi-C data visualization in Jupyter and command line, matplotlib plotting system. Works with .hic and .m/cool input, plus .bed, .pairs etc. Documentation and examples.
Xu, Weize, Quan Zhong, Da Lin, Guoliang Li, and Gang Cao. "CoolBox: A Flexible Toolkit for Visual Analysis of Genomics Data" Preprint. Bioinformatics, April 16, 2021
- GENOVA - an R package for the most common Hi-C analyses and visualization. Quality control - cis/trans ratio, checking for translocations; A/B compartments - single and differential compartment signal (H3K4me1 for compartment assignment); TADs - insulation score and directionality index; Genomic annotations - ChIP-seq signal, gene information; Distance decay, and differential analysis; Saddle plot; Aggregate region/peak/TAD/loop analysis, and differential analysis; Aggregated signal (tornado) plots; Intra-inter-TAD interaction differences; PE-SCAn (paired-end Spatial Chromatin Analysis) and C-SCAn enhancer-promoter aggregation. Data for statistical tests can be extracted from the discovery objects. Applied to Hi-C data from Hap1 cells, WT and deltaWAPL (published) and knockdown cohesin-SA1/SA2 conditions (HiC-pro, hg19, HiCCUPS). Input - HiC-pro, Juicer, .cool files. Storage - compressed sparse triangle format and the user-added metadata. All tools return the "discovery object". Vignette.
Weide, Robin H. van der, Teun van den Brand, Judith H.I. Haarhuis, Hans Teunissen, Benjamin D. Rowland, and Elzo de Wit. "Hi-C Analyses with GENOVA: A Case Study with Cohesin Variants" Preprint. Genomics, January 24, 2021.
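GENOVA is an R package; the Python below only illustrates the general insulation-score idea it implements: average the contacts in a small square that straddles each bin just off the diagonal, then log2-normalize to the chromosome mean, so that TAD boundaries appear as local minima. The square placement and window size are assumptions of this sketch, and the matrix must have more than 2 × window bins.

```python
import math
from statistics import fmean


def insulation_score(matrix, window=2):
    """log2(mean contact in a window x window square crossing each bin / mean over all bins).

    Rows come from the window bins upstream of bin i, columns from the window bins downstream.
    Bins too close to the matrix edges get None.
    """
    n = len(matrix)
    raw = [None] * n
    for i in range(window, n - window):
        square = [matrix[a][b] for a in range(i - window, i) for b in range(i + 1, i + window + 1)]
        raw[i] = fmean(square)
    mean_all = fmean([v for v in raw if v is not None])
    return [math.log2(v / mean_all) if v is not None and v > 0 else None for v in raw]
```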
- circHiC - circular representation of Hi-C data, projection into polar coordinates. Linear relationship between the radius of a circle and the corresponding genomic distance. Predominantly for bacterial chromosomes. Python implementation. Documentation.
Junier, Ivan, and Nelle Varoquaux. "CircHiC: Circular Visualization of Hi-C Data and Integration of Genomic Data" Preprint. Bioinformatics, August 14, 2020.
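One way to read "projection into polar coordinates" with a linear radius-distance relationship for a circular chromosome: place each contact at the angle of the midpoint of the shorter arc between its two loci, with a radius that grows linearly with their genomic distance. This mapping, including the inner radius and scale parameters, is an assumption for illustration, not circHiC's implementation.

```python
import math


def contact_to_polar(pos1, pos2, genome_length, inner_radius=1.0, scale=1.0):
    """Map a contact on a circular chromosome to (angle in radians, radius)."""
    d = abs(pos2 - pos1) % genome_length
    distance = min(d, genome_length - d)  # shortest way around the circle
    lo, hi = sorted((pos1, pos2))
    if hi - lo <= genome_length - (hi - lo):
        midpoint = (lo + hi) / 2
    else:
        midpoint = ((hi + lo + genome_length) / 2) % genome_length  # arc crosses the origin
    angle = 2 * math.pi * midpoint / genome_length
    radius = inner_radius + scale * distance / genome_length  # linear in genomic distance
    return angle, radius
```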
- HiCBricks - data format and visualization package. hdf5-based data storage format to handle large Hi-C matrices. Visualization of one or two Hi-C matrices, adding annotations.
Pal, Koustav, Ilario Tagliaferri, Carmen M Livi, and Francesco Ferrari. "HiCBricks: Building Blocks for Efficient Handling of Large Hi-C Datasets" Edited by Inanc Birol. Bioinformatics, November 7, 2019
- HiCeekR - Shiny app and GUI for Hi-C data analysis and interpretation. Input - aligned BAM file, with marked duplicates, restriction enzyme cutting sites (HRF5), genome in FASTA, optionally ChIP-seq BAM, or RNA-seq gene expression (TSV). The workflow includes filtering (PCR artifacts, self-circle, dangling end fragments, using diffHiC) with diagnostic plots, binning interaction matrices in BED (coordinates) and TSV (counts) formats, normalization (ICE, WavSiS, using chromoR), calling A/B compartments (PCA, using HiTC), TADs (directionality index, TopDom, HiCseg), gene expression/epigenomic integration, network analysis and enrichment in GO, KEGG, other databases (using gProfileR). Visualization of zoomable heatmaps, networks (ggplot2, plotly, heatmaply, networkD3). Starts with creating a configuration file. Compared with GITAR, HiC-Pro, HiC-bench, HiCdat, HiCexplorer, not Juicer or HiGlass. Illustrated using Rao2014 Gm12878 data. 32Gb RAM (minimal 16Gb) is sufficient; preprocessing of BAM files (Hi-C or ChIP-seq) takes the longest.
Di Filippo, Lucio, Dario Righelli, Miriam Gagliardi, Maria Rosaria Matarazzo, and Claudia Angelini. "HiCeekR: A Novel Shiny App for Hi-C Data Analysis" Frontiers in Genetics 10 (November 4, 2019): 1079.
- Hi-C data visualization review - Good introduction to the 3D genome organization, 115 key references. Table 2. Hi-C visualization tools.
Ing-Simmons, Elizabeth, and Juan M. Vaquerizas. "Visualising Three-Dimensional Genome Organisation in Two Dimensions" Development 146, no. 19 (October 1, 2019)
- DNARchitect - a Shiny App for visualizing genomic data (HiC, mRNA, ChIP, ATAC, etc.) in bed, bedgraph, and bedpe formats. Wraps Sushi R package. Web-version.
Ramirez, R N, K Bedirian, S M Gray, and A Diallo. "DNA Rchitect: An R Based Visualizer for Network Analysis of Chromatin Interaction Data" Edited by John Hancock. Bioinformatics, August 2, 2019
- HiGlass - visualization server for Google maps-style navigation of Hi-C maps. Overlay genes, epigenomic tracks. "Composable linked views" synchronizing exploration of multiple Hi-C datasets linked by location/zoom. Documentation, links and resources.
Kerpedjiev, Peter, Nezar Abdennur, Fritz Lekschas, Chuck McCallum, Kasper Dinkla, Hendrik Strobelt, Jacob M. Luber, et al. "HiGlass: Web-Based Visual Exploration and Analysis of Genome Interaction Maps" Genome Biology, (December 2018).
- 3D Genome Browser - visualizing existing Hi-C and other chromatin conformation capture data alongside genomic and epigenomic data. Own data can be submitted in BUTLR format.
Wang, Yanli, Fan Song, Bo Zhang, Lijun Zhang, Jie Xu, Da Kuang, Daofeng Li, et al. "The 3D Genome Browser: A Web-Based Browser for Visualizing 3D Genome Organization and Long-Range Chromatin Interactions" Genome Biology 19, no. 1 (December 2018)
- HiPiler - exploration and comparison of loops and domains as heatmap snippets. Documentation.
Lekschas, Fritz, Benjamin Bach, Peter Kerpedjiev, Nils Gehlenborg, and Hanspeter Pfister. "HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples" IEEE Transactions on Visualization and Computer Graphics 24, no. 1 (January 2018). TechBlog: HiPiler simplifies chromatin structure analysis
- NAT - the 4D Nucleome Analysis Toolbox, for Hi-C data (text, cool format) normalization (ICE, Toeplitz, CNV-Toeplitz), TAD calling (Directionality index, Armatus, custom), karyotype abnormalities visualization on inter-chromosomal matrices, time-course visualization. Matlab.
Seaman, Laura, and Indika Rajapakse. "4D Nucleome Analysis Toolbox: Analysis of Hi-C Data with Abnormal Karyotype and Time Series Capabilities" Bioinformatics (Oxford, England) 34, no. 1 (01 2018)
- gcMapExplorer - Genome contact map explorer. Analyze and compare Hi-C contact maps, plot other data types alongside. Normalization: KR, IC, and their distance-centric normalization MCFS (median contact frequency). gcmap and HDF5 file formats; ccmap is in numpy memmap format for fast data access; utilities for conversion of coo (sparse 3-columns), pairCoo, homer, hic, bed, bigWig to gcmap/h5 formats. Command line, GUI, API. Python3. Documentation.
Kumar, Rajendra, Haitham Sobhy, Per Stenberg, and Ludvig Lizana. “Genome Contact Map Explorer: A Platform for the Comparison, Interactive Visualization and Analysis of Genome Contact Maps.” Nucleic Acids Research 45, no. 17 (September 29, 2017): e152–e152.
- HiC-3DViewer - an interactive web-based tool designed to provide an intuitive environment for investigators to facilitate the 3D exploratory analysis of Hi-C data. It is based on Flask and can be run directly or as a Docker container.
Djekidel, Mohamed Nadhir, Mengjie Wang, Michael Q. Zhang, and Juntao Gao. "HiC-3DViewer: a new tool to visualize Hi-C data in 3D space" Quantitative Biology (2017)
- HiCPlotter - Hi-C visualization tool, allows for integrating various data tracks. Python implementation.
Akdemir, Kadir Caner, and Lynda Chin. "HiCPlotter Integrates Genomic Data with Interaction Matrices" Genome Biology 16 (2015)
- NuChart - gene-centric network of genes interacting in 3D. Integration of epigenomic features. Statistical network analysis. Code unavailable.
Merelli, Ivan, Pietro Liò, and Luciano Milanesi. “NuChart: An R Package to Study Gene Spatial Neighbourhoods with Multi-Omics Annotations.” PloS One 8, no. 9 (2013): e75146.
- HiTC - R package for High Throughput Chromosome Conformation Capture analysis, Processed data import from TXT/BED into GRanges. Quality control, visualization. Normalization, 45-degree rotation and visualization of triangle TADs. Adding annotation at the bottom. PCA to detect A/B compartments.
Servant, Nicolas, Bryan R. Lajoie, Elphège P. Nora, Luca Giorgetti, Chong-Jian Chen, Edith Heard, Job Dekker, and Emmanuel Barillot. "HiTC: Exploration of High-Throughput ‘C’ Experiments" Bioinformatics (Oxford, England) 28, no. 21 (November 1, 2012)
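PCA-based A/B compartment detection, as used by HiTC and many other tools, follows a common recipe: correlate the rows of the observed/expected matrix and label bins by the sign of the first principal component. HiTC is an R package and its exact implementation may differ (for example in centering); the pure-Python sketch below swaps in power iteration for a full PCA.

```python
import math
from statistics import fmean


def pearson_matrix(rows):
    """Pearson correlation between every pair of rows."""
    def corr(x, y):
        mx, my = fmean(x), fmean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
        return num / den if den else 0.0
    return [[corr(r1, r2) for r2 in rows] for r1 in rows]


def first_eigenvector(matrix, n_iter=200):
    """Leading eigenvector of a symmetric positive semi-definite matrix by power iteration."""
    n = len(matrix)
    v = [1.0 + 0.01 * i for i in range(n)]  # non-uniform start avoids an orthogonal start vector
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def compartments(oe_matrix):
    """Label bins A/B by the sign of PC1; the sign is arbitrary until oriented (e.g. by gene density)."""
    pc1 = first_eigenvector(pearson_matrix(oe_matrix))
    return ["A" if x > 0 else "B" for x in pc1]
```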
- HiCognition - A visual exploration and hypothesis testing tool for 3D genomics. HiCognition is a data exploration tool that allows streamlined exploration of aggregate genomic data. It is centered around Hi-C data but also enables integration of ChIP-seq and region-based data. Docker installation, uses D3.js and Pixi.js. Similar to HiGlass, HiPiler. Documentation
- MuGVRE - The MuG Virtual Research Environment supports the expanding 3D/4D genomics community by developing tools to integrate and visualize genomics data from sequence to 3D/4D chromatin dynamics data
- pyGenomeTracks - python module to plot beautiful and highly customizable genome browser tracks.
- TADKit - 3D Genome Browser. Main web site.
- Benchmarking of three genome assemblers from Hi-C data (3d-dna, SALSA2, YaHS) on a de novo assembly of Arabidopsis thaliana. Oxford Nanopore read processing for making the draft assembly. Hi-C scaffolding, overview of each scaffolder, processing with Arima pipeline, QUAST and BUSCO metrics for benchmarking. YaHS performs best and is the easiest to install and use.
Obinu, Lia, Urmi Trivedi, and Andrea Porceddu. “Benchmarking of Hi-C Tools for Scaffolding de Novo Genome Assemblies.” Preprint. Genomics, May 18, 2023.
- Benchmark of five Hi-C scaffolders (Lachesis, HiRise, 3D-DNA, SALSA, ALLHiC). Accuracy measured by matching scaffolds with the assembly contigs. On average, HiRise and Lachesis performed best, with HiRise and SALSA working best on less fragmented assemblies, and HiRise, Lachesis, or ALLHiC being better choices for more fragmented assemblies. Details and problems with some software. Docker images for individual tools.
Sur, Aakash, William Stafford Noble, and Peter J. Myler. "A benchmark of Hi-C scaffolders using reference genomes and de novo assemblies." bioRxiv (April 20, 2022).
- HapHiC - a reference-independent allele-aware Hi-C scaffolding tool. HapHiC exhibits superior performance in handling haplotype-resolved assemblies without the need for reference genomes. This scaffolding tool supports various ploidy levels and has been successfully validated across diverse taxa. This study also provides new insights into the challenges in allele-aware scaffolding, comparing it with other scaffolders such as ALLHiC, YaHS, 3D-DNA, LACHESIS, and SALSA2 through comprehensive analyses on various adverse factors. Conda-installable
Zeng, Xiaofei, Zili Yi, Xingtan Zhang, Yuhui Du, Zhiqing Zhou, Sijie Chen, Huijie Zhao, Sai Yang, Yibin Wang, and Guoan Chen. “Chromosome-level scaffolding of haplotype-resolved assemblies using Hi-C data without reference genomes.” Nature Plants, August 5, 2024.
- pin_hic - Hi-C graph scaffolder based on N-best neighbor contigs and an iterative weighted linking approach to extend them. Tested on three draft assemblies (Hi-C from Arima), outperforms SALSA2. Requires a draft assembly; Hi-C reads are aligned to it with bwa mem -SP -B10 -t12. New SAT format for expanded graph information storage, satool to visualize and convert.
Guan, Dengfeng, Shane A. McCarthy, Zemin Ning, Guohua Wang, Yadong Wang, and Richard Durbin. “Efficient Iterative Hi-C Scaffolder Based on N-Best Neighbors.” BMC Bioinformatics 22, no. 1 (December 2021): 569.
- EndHiC - chromosome scaffolding using Hi-C links from contig ends. Requires contigs assembled from PacBio HiFi reads (e.g., by hifiasm or HiCanu) and Hi-C data processed by HiC-Pro. Applied to human, rice, Arabidopsis; achieves higher accuracy than Lachesis, ALLHiC, 3D-DNA. Perl scripts.
Wang, Sen, Hengchao Wang, Fan Jiang, Anqi Wang, Hangwei Liu, Hanbo Zhao, Boyuan Yang, Dong Xu, Yan Zhang, and Wei Fan. "EndHiC: Assemble Large Contigs into Chromosomal-Level Scaffolds Using the Hi-C Links from Contig Ends" arXiv, 30 Nov 2021
- instaGRAAL - reimplementation of GRAAL genome assembler (chromosome level) for large genomes. Similar MCMC approach, implemented on NVIDIA GPU. Tested, among others, on segments of the human genome.
Baudry, Lyam, Nadège Guiglielmoni, Hervé Marie-Nelly, Alexandre Cormier, Martial Marbouty, Komlan Avia, Yann Loe Mie, et al. "InstaGRAAL: Chromosome-Level Quality Scaffolding of Genomes Using a Proximity Ligation-Based Scaffolder" Genome Biology 21, no. 1 (December 2020)
- bin3C - resolving metagenome-assembled genomes from Hi-C data. Metagenomic assembly using SPAdes. Tested using simulated (Sim3C and MetaART) and real-life data. Performance metrics: adjusted mutual information, weighted Bcubed. Contact matrix where bins are contigs. Infomap method for clustering the whole-contig graph. Compared with ProxiMeta (Phase Genomics).
DeMaere, Matthew Z., and Aaron E. Darling. "Bin3C: Exploiting Hi-C Sequencing Data to Accurately Resolve Metagenome-Assembled Genomes" Genome Biology 20, no. 1 (December 2019)
- HiCAssembler - Hi-C scaffolding tool that combines Hi-C data with scaffolds from conventional (short- or long-read) sequencing. Uses strategies from Lachesis and 3D-DNA. Visual adjustment of scaffolding errors. Automatic and manual misassembly correction.
Renschler, Gina, Gautier Richard, Claudia Isabelle Keller Valsecchi, Sarah Toscano, Laura Arrigoni, Fidel Ramirez, and Asifa Akhtar. "Hi-C Guided Assemblies Reveal Conserved Regulatory Topologies on X and Autosomes despite Extensive Genome Shuffling" BioRxiv, March 18, 2019.
- 3D-DNA - Hi-C genome assembler and its application/validation. Methods are in the supplementary material. DNA Zoo - genome assemblies using Hi-C, methods, papers.
Dudchenko, Olga, Sanjit S. Batra, Arina D. Omer, Sarah K. Nyquist, Marie Hoeger, Neva C. Durand, Muhammad S. Shamim, et al. "De Novo Assembly of the Aedes Aegypti Genome Using Hi-C Yields Chromosome-Length Scaffolds" Science (New York, N.Y.) 356, no. 6333 (07 2017)
- Genome Assembly Cookbook - genome assembly from Hi-C data, pipeline and instructions from the Aiden Lab.
- GRAAL - Genome (Re)Assembly Assessing Likelihood - genome assembly from Hi-C data; fills gaps left in draft assemblies by scaffolding. Outperforms Lachesis and dnaTri, which are sensitive to duplications, to the clustering used to initially arrange the scaffolds, and to parameter choices, and whose reliability is unknown. Bayesian approach: the prior assumptions are that cis-contact probabilities follow a power-law decay with genomic distance and that counts in the interaction matrix are Poisson-distributed. Candidate genomic structures are explored with MCMC (Multiple-Try Metropolis algorithm) to maximize the likelihood of the data given a genomic structure.
Marie-Nelly, Hervé, Martial Marbouty, Axel Cournac, Jean-François Flot, Gianni Liti, Dante Poggi Parodi, Sylvie Syan, et al. "High-Quality Genome (Re)Assembly Using Chromosomal Contact Data" Nature Communications 5 (December 17, 2014)
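To make the GRAAL likelihood concrete, here is a minimal Python sketch that scores a candidate fragment order under the two stated assumptions: expected cis contacts decay as a power law of the distance between fragments, and observed counts are Poisson. The parameters `scale` and `alpha` are fixed, hypothetical values (GRAAL infers its model parameters), distances are counted in fragment positions rather than base pairs, and the MCMC search over structures is omitted.

```python
import math


def expected_contacts(distance, scale=100.0, alpha=1.0):
    """Power-law decay of cis-contact frequency with distance (in fragments)."""
    return scale * distance ** (-alpha)


def poisson_log_likelihood(counts, order, scale=100.0, alpha=1.0):
    """Log-likelihood of an observed contact matrix given a candidate ordering.

    counts: symmetric list-of-lists of observed contacts between fragments 0..n-1.
    order:  permutation of range(n), i.e. a candidate genome structure.
    """
    position = {frag: pos for pos, frag in enumerate(order)}
    n = len(order)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            lam = expected_contacts(abs(position[i] - position[j]), scale, alpha)
            k = counts[i][j]
            # log P(k | lam) for a Poisson distribution
            ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll
```

An MCMC sampler would propose rearrangements of the order (moves, flips, swaps) and accept or reject them based on differences in this log-likelihood. Note that an order and its reverse score identically.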
- dnaTri - genome scaffolding via probabilistic modeling using two properties of Hi-C data: distance-dependent decay and the cis-trans ratio. Uses known chromosome scaffolds and de novo assembly. A naive Bayes classifier distinguishes contigs located on the same chromosome from contigs on different chromosomes. Average-linkage clustering assembles contigs into 23 chromosome groups. Placed 65 previously unplaced contigs. Data.
Kaplan, Noam, and Job Dekker. "High-Throughput Genome Scaffolding from in Vivo DNA Interaction Frequency" Nature Biotechnology 31, no. 12 (December 2013)
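Average-linkage clustering of contigs into chromosome groups can be sketched as follows. The sketch assumes a symmetric matrix of normalized inter-contig contact frequencies, where higher values mean two contigs are more likely to share a chromosome. It is a generic agglomerative procedure written for illustration, not dnaTri's code; the naive Bayes step and the distance-decay model are not included.

```python
def average_linkage(contacts, k):
    """Agglomerative clustering of contigs into k groups using average linkage.

    contacts: symmetric list-of-lists of normalized Hi-C contact frequencies
    (higher = more likely on the same chromosome).
    """
    clusters = [[i] for i in range(len(contacts))]

    def linkage(a, b):
        # Average pairwise contact between members of the two clusters
        total = sum(contacts[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best_pair, best_score = None, float("-inf")
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best_score:
                    best_pair, best_score = (x, y), score
        x, y = best_pair
        # Merge the most strongly linked pair; y > x, so index x stays valid
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return sorted(clusters)
```

For a human assembly, k would be 23, matching the number of chromosome groups described above.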
- Lachesis - a three-step genome scaffolding tool: 1) graph clustering of scaffolds into chromosome groups, 2) ordering clustered scaffolds (minimum spanning tree, reassembling branches from longest to shortest), 3) assigning orientation (based on exact link positions and the decay of interactions with distance). Duplications and repeat regions may be incorrectly ordered/oriented. Tested on normal human, mouse, and Drosophila genomes, and on the HeLa cancer genome.
Burton, Joshua N., Andrew Adey, Rupali P. Patwardhan, Ruolan Qiu, Jacob O. Kitzman, and Jay Shendure. "Chromosome-Scale Scaffolding of de Novo Genome Assemblies Based on Chromatin Interactions" Nature Biotechnology 31, no. 12 (December 2013)
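The ordering step (2) can be approximated with a minimum spanning tree and its longest path, the "trunk". In this hedged Python sketch, edge length is taken as 1 / contact frequency (an assumption made for illustration) and the trunk is the longest path in number of edges. Reinserting side branches and the orientation step (3) are left out.

```python
from collections import deque


def minimum_spanning_tree(contacts):
    """Prim's algorithm; edge length = 1 / contact frequency (strong links = short edges)."""
    n = len(contacts)
    in_tree = {0}
    adj = {i: [] for i in range(n)}
    while len(in_tree) < n:
        best = None
        for u in in_tree:
            for v in range(n):
                if v in in_tree or contacts[u][v] <= 0:
                    continue
                length = 1.0 / contacts[u][v]
                if best is None or length < best[0]:
                    best = (length, u, v)
        if best is None:
            raise ValueError("contact graph is disconnected")
        _, u, v = best
        adj[u].append(v)
        adj[v].append(u)
        in_tree.add(v)
    return adj


def farthest(adj, start):
    """BFS from start; return the farthest node (in edges) and the path to it."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()
        for nxt in adj[last]:
            if nxt not in parent:
                parent[nxt] = last
                queue.append(nxt)
    path, node = [], last
    while node is not None:
        path.append(node)
        node = parent[node]
    return last, path[::-1]


def trunk(contacts):
    """Longest path in the MST (two BFS passes): a first-pass scaffold order."""
    adj = minimum_spanning_tree(contacts)
    end, _ = farthest(adj, 0)
    _, path = farthest(adj, end)
    return path
```

The trunk comes back in one of its two reading directions; which end is "first" is arbitrary at this stage.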
- YaHS - yet another Hi-C scaffolding tool. It relies on a new algorithm for contig-joining detection that considers the topological distribution of Hi-C signals to distinguish real interaction signals from mapping noise. Implemented in C, very fast.
- OpenMiChroM - Python library for performing chromatin dynamics simulations. See documentation. Primarily using the OpenMM Python API and the MiChroM (Minimal Chromatin Model) energy function, it generates an ensemble of 3D chromosomal structures consistent with experimental Hi-C maps for human and other organisms. Supports custom energy functions and optimization procedures such as direct inversion, as exemplified in the woolly mammoth genome modeling. Suitable for simulations of single or multiple chromosome chains using high-performance computing on different platforms (GPUs and CPUs).
Oliveira Jr, A. B., Contessoto, V. G., Mello, M. F., & Onuchic, J. N. (2020). A scalable computational approach for simulating complexes of multiple chromosomes. Journal of Molecular Biology, 433(6), 166700.
- The Nucleome Data Bank (NDB) - A web platform to simulate and browse the three-dimensional architecture of genomes. Data from different studies and computationally simulated structures can be downloaded. Interactive visualization of these data. Molecular Dynamics simulation using the MEGABASE (Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles, neural network to predict A/B compartments from 1D epigenomic data) + MiChroM (Minimal Chromatin Model, energy landscape model, documentation) computational pipeline. Input: 1D epigenomic data, DNA proximity (3C, Hi-C, GAM, SPRITE), imaging data. Output: chromosomal folding predictive of Hi-C, imaging data, GROMACS input files. .ndb format to store nucleome structural data, analog of the Protein Data Bank .pdb format.
Contessoto, Vinícius G., Ryan R. Cheng, Arya Hajitaheri, Esteban Dodero-Rojas, Matheus F. Mello, Erez Lieberman-Aiden, Peter G. Wolynes, Michele Di Pierro, and José N. Onuchic. "The Nucleome Data Bank: web-based resources to simulate and analyze the three-dimensional genome." Nucleic Acids Research 49, no. D1 (8 January 2021): D172-D182.
- Pastis-NB - an extension of the Pastis 3D modeling tool with negative binomial distribution-based modeling. Compared with MDS-based methods (ShRec3D, ChromSDE, Pastis-MDS) and Pastis-PM (Poisson model). More accurate and stable results. Supplementary material.
Varoquaux, Nelle, William Stafford Noble, and Jean-Philippe Vert. "Inference of genome 3D architecture by modeling overdispersion of Hi-C data" bioRxiv, February 05, 2021
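A minimal Python sketch of the negative binomial idea: expected counts decay with 3D distance as `beta * d ** alpha` (with `alpha` negative), and observed counts are scored with a negative binomial whose variance exceeds its mean. The parameter values (`beta`, `alpha`, `dispersion`) are illustrative assumptions. Pastis-NB estimates these and optimizes bead coordinates, which this sketch does not do.

```python
import math


def nb_log_pmf(k, mean, dispersion):
    """Negative binomial log-probability, parameterized by mean and dispersion r.

    Variance = mean + mean**2 / r, so small r means strong overdispersion;
    as r grows, the distribution approaches a Poisson with the same mean.
    """
    r = dispersion
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mean)) + k * math.log(mean / (r + mean)))


def structure_log_likelihood(counts, coords, beta=1.0, alpha=-3.0, dispersion=5.0):
    """Score 3D bead coordinates against a symmetric contact-count matrix.

    Expected count for beads i, j: beta * d_ij ** alpha, where d_ij is their
    Euclidean distance.
    """
    ll = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            ll += nb_log_pmf(counts[i][j], beta * d ** alpha, dispersion)
    return ll
```

Compared with a Poisson model, the extra variance term makes the fit less dominated by a few very high-count pairs.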
- TADdyn - studying time-dependent dynamics of chromatin domains during natural and induced cell processes by simulating smooth 3D transitions of chromosome structure. A part of TADBit, developed by the Marti-Renom group. Tested on in situ Hi-C time course experiment, reprogramming of murine B cells to pluripotent cells, changes of 21 genomic loci. Data and video.
Di Stefano, Marco, Ralph Stadhouders, Irene Farabella, David Castillo, François Serra, Thomas Graf, and Marc A. Marti-Renom. "Transcriptional Activation during Cell Reprogramming Correlates with the Formation of 3D Open Chromatin Hubs" Nature Communications 11, no. 1 (December 2020)
- StoHi-C - 3D genome reconstruction using t-SNE. Python scripts for 3D embedding and visualization (Plotly, matplotlib, Chart Studio). Visually tested on the fission yeast genome in comparison with an MDS-reconstructed genome (wild type, G1-arrested, rad21 mutation, clr4 deletion).
MacKay, Kimberly, and Anthony Kusalik. "StoHi-C: Using t-Distributed Stochastic Neighbor Embedding (t-SNE) to Predict 3D Genome Structure from Hi-C Data" Preprint. Bioinformatics, January 29, 2020.
- Hierarchical3DGenome - high-resolution (5kb) reconstruction of the 3D structure of the genome. Uses LorDG to first assemble a 3D model at the level of TADs, then inside individual TADs. GM12878 cell line, Arrowhead for TAD calling, KR and ICE normalization, benchmarking against miniMDS, five tests including comparison with FISH.
Trieu, Tuan, Oluwatosin Oluwadare, and Jianlin Cheng. "Hierarchical Reconstruction of High-Resolution 3D Models of Large Chromosomes" Scientific Reports 9, no. 1 (March 21, 2019): 4971.
- CSynth - 3D genome interactive modeling on GPU, and visualization.
Todd, Stephen, Peter Todd, Simon J McGowan, James R Hughes, Yasutaka Kakui, Frederic Fol Leymarie, William Latham, and Stephen Taylor. "CSynth: A Dynamic Modelling and Visualisation Tool for 3D Chromatin Structure" BioRxiv, January 1, 2019
- 3DMax - reconstruction of 3D genome structure. Maximum likelihood objective function with many simplifying assumptions, optimized by a gradient ascent algorithm. Distance Pearson and Spearman correlation coefficients (correlations between pairwise bead distances) for comparing 3D structures.
Oluwadare, Oluwatosin, Yuxiang Zhang, and Jianlin Cheng. “A Maximum Likelihood Algorithm for Reconstructing 3D Structures of Human Chromosomes from Chromosomal Contact Data.” BMC Genomics 19, no. 1 (December 2018).
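The distance-based comparison of two models can be sketched in Python: compute all pairwise bead distances in each structure, then take the Pearson correlation of the two distance lists and the Pearson correlation of their ranks (Spearman's rho). This is a generic implementation written for illustration, not code from 3DMax.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Euclidean distances for all bead pairs i < j, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (tied values share their mean rank), as used by Spearman's rho."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def compare_structures(coords_a, coords_b):
    """Distance Pearson and distance Spearman correlation between two 3D models."""
    da, db = pairwise_distances(coords_a), pairwise_distances(coords_b)
    return pearson(da, db), pearson(ranks(da), ranks(db))
```

Because only internal distances are compared, both scores are unaffected by translating, rotating, or uniformly rescaling either model, which matters since Hi-C-based reconstructions have no absolute frame or scale.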
- GenomeFlow - a complete set of tools for Hi-C data alignment, normalization, 2D visualization, 3D genome modeling and visualization. ClusterTAD for TAD identification. LorDG and 3DMax for 3D genome reconstruction.
Trieu, Tuan, Oluwatosin Oluwadare, Julia Wopata, and Jianlin Cheng. "GenomeFlow: A Comprehensive Graphical Tool for Modeling and Analyzing 3D Genome Structure" Bioinformatics (Oxford, England), September 12, 2018.
- ShRec3D - shortest-path reconstruction in 3D. Reconstructs the genome by translating a Hi-C matrix into a distance matrix via shortest paths on the contact graph, then applying multidimensional scaling. Uses binary contact maps.
Lesne, Annick, Julien Riposo, Paul Roger, Axel Cournac, and Julien Mozziconacci. "3D Genome Reconstruction from Chromosomal Contacts" Nature Methods 11, no. 11 (November 2014): 1141–43.
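The shortest-path step can be sketched with Floyd-Warshall on a binarized contact graph, assuming every contact is an edge of length 1. The resulting matrix obeys the triangle inequality and is what multidimensional scaling would then embed in 3D. The MDS step and any contact-to-distance weighting ShRec3D may apply are not shown.

```python
import math


def shortest_path_distances(binary_contacts):
    """All-pairs shortest paths (Floyd-Warshall) on a binarized contact graph.

    binary_contacts[i][j] truthy means loci i and j are in contact
    (an edge of length 1). Unconnected loci keep an infinite distance.
    """
    n = len(binary_contacts)
    dist = [[0.0 if i == j else (1.0 if binary_contacts[i][j] else math.inf)
             for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Relax the i -> j distance through intermediate locus k
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

A single long-range contact (a loop) shortens the distances of every locus pair whose path can use it, which is how sparse contacts propagate into a full distance matrix.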
- 4DMax - 3D modeling over time; predicts dynamic chromosome conformation from time-course Hi-C data. Unlike TADdyn, which focuses on ~2Mb regions, 4DMax models whole chromosomes, using gradient descent optimization of a spatial restraint-based maximum-likelihood function. Tested on simulated Hi-C progression over 6 time points and on a 10-day time course of induced stem cell pluripotency in mice. Preserves and predicts A/B compartments, TADs. Output - video of chromosome dynamics.
- THUNDER - cell-type deconvolution of bulk Hi-C data. Non-negative matrix factorization (NMF) on informative intra- and inter-chromosomal interactions (top 1000 features by Fano factor), reformatted into matrix form by concatenation. Requires the number of cell types k. Tested on in silico mixtures of cell types. Outperforms TOAST, CIBERSORT. R implementation.
Rowland, Bryce and Huh, Ruth and Hou, Zoey and Hu, Ming and Shen, Yin and Li, Yun "THUNDER: A Reference-Free Deconvolution Method to Infer Cell Type Proportions from Bulk Hi-C Data" bioRxiv, November 12, 2020
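Feature selection by Fano factor (variance divided by mean) can be sketched as follows. The input layout, one row per interaction feature and one column per bulk sample, is an assumption for illustration; THUNDER is implemented in R and its NMF step is not shown here.

```python
def fano_factor(values):
    """Variance-to-mean ratio of one feature across samples (0 if the mean is 0)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return var / mean


def top_features_by_fano(matrix, top=1000):
    """Indices of the `top` most overdispersed features.

    matrix[f][s] is the (normalized) contact count of feature f, e.g. a bin
    pair, in bulk sample s.
    """
    scores = [(fano_factor(row), idx) for idx, row in enumerate(matrix)]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scores[:top]]
```

Dividing by the mean keeps uniformly high-count features (such as short-range contacts) from dominating the selection purely because of their magnitude.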
- CIRCLET - cell cycle phase reconstruction from scHi-C data. Circular trajectory reconstruction: diffusion maps embed the data, then a k-nearest-neighbor (kNN) graph is built with cells as nodes and edges connecting each cell to its closest cells. A starting cell is selected at random, the trajectory is extended clockwise and counterclockwise, and the two semicircles are linked. Input: four feature sets, namely multiple composite metrics (MCM), contact probability distribution versus genomic distance (CDD), pairs' contact coverage (PCC), and insulation score of each bin (Ins), and their combinations. 12 different stages can be distinguished based on the dynamics of chromatin structure.
Ye, Yusen, Lin Gao, and Shihua Zhang. “Circular Trajectory Reconstruction Uncovers Cell‐Cycle Progression and Regulatory Dynamics from Single‐Cell Hi‐C Maps.” Advanced Science 6, no. 23 (December 2019): 1900986.
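A hedged Python sketch of the kNN graph construction, using Euclidean distances on whatever per-cell feature vectors are supplied. CIRCLET builds its graph on a diffusion-map embedding; that embedding, the choice of k, and the trajectory tracing are not reproduced here.

```python
import math


def knn_graph(features, k=2):
    """Undirected k-nearest-neighbor graph with cells as nodes.

    features: list of equal-length numeric vectors, one per cell.
    Returns a dict mapping each cell index to the set of its neighbors.
    """
    n = len(features)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(features[i], features[j]), j)
                       for j in range(n) if j != i)
        for _, j in dists[:k]:
            graph[i].add(j)
            graph[j].add(i)  # symmetrize: keep an edge if either cell picks the other
    return graph
```

For cells spread along a cycle, a small k links each cell mostly to its neighbors on the cycle, which is the structure a circular trajectory can then follow.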
- GAMIBHEAR - an R package for haplotype reconstruction in GAM (Genome Architecture Mapping, nuclear cryosectioned profiles) data. Uses the local proximity of SNVs extended to larger genomic windows using a proximity-scaled graph-based approach. Tested on the F123 mESCs, outperforms WhatsHap and HapCHAT, fast.
Markowski, Julia, Rieke Kempfer, Alexander Kukalev, Ibai Irastorza-Azcarate, Gesa Loof, Birte Kehr, Ana Pombo, Sven Rahmann, and Roland F Schwarz. “GAMIBHEAR: Whole-Genome Haplotype Reconstruction from Genome Architecture Mapping Data.” Edited by Yann Ponty. Bioinformatics 37, no. 19 (October 11, 2021): 3128–35.
- HaploHiC - Hi-C phasing using SNPs/InDels; unphased reads are assigned to haplotypes based on nearby phased reads.
Lindsly, Stephen, Wenlong Jia, Haiming Chen, Sijia Liu, Scott Ronquist, Can Chen, Xingzhao Wen, et al. "Functional Organization of the Maternal and Paternal Human 4D Nucleome" bioRxiv, June 17, 2021. Supplementary note 2 - algorithm details.
- HiCHap - a Python 2.7 package for processing diploid (and haploid) Hi-C data using phased SNPs. Proposes a novel strategy to correct systematic biases in diploid Hi-C contact maps, including VC or ICE normalization. Performs read mapping with read rescue at ligation junction sites, contact map construction based on phased SNPs, whole-genome identification of A/B compartments (PCA), topological domains (DI) and chromatin loops (HiCCUPS), and allele-specific testing for diploid Hi-C data (permutation, paired t-test, binomial). GitHub.
Luo, H., Li, X., Fu, H. et al. HiCHap: a package to correct and analyze the diploid Hi-C data. BMC Genomics, October 27, 2020
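ICE normalization, one of the options named above, balances a contact matrix so that every bin has the same total coverage. The Python sketch below is a generic ICE-style iterative correction for a small dense symmetric matrix, not HiCHap's implementation. It does not filter low-coverage bins or handle the diploid-specific bias correction.

```python
def ice_balance(matrix, iterations=200, tol=1e-8):
    """ICE-style iterative correction of a symmetric contact matrix.

    Each pass divides entry (i, j) by the relative coverages of bins i and j,
    until all non-empty rows have the same sum.
    Returns (balanced matrix, cumulative bias vector).
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    bias = [1.0] * n
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean_sum = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched
        factors = [s / mean_sum if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias
```

The balanced value of each entry equals the raw count divided by `bias[i] * bias[j]`, so the bias vector alone is enough to normalize the matrix later.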
- 3D Genome, from technology to visualization - a GitBook by Xingzhao Wen and Sheng Zhong covering biological and computational aspects of 3D genomics and RNA-genome interactions.
- Review of computational tools for 3D genome analysis. Hi-C technology and data introduction, features and their visual representation (A/B compartments, subcompartments, TADs, meta-TADs, sub-TADs, Structural variations, Chromatin loops, SMC stalled on one side, Rabl configurations, Chromatin jets, SMC interactions, Stripes), methods and tools for detecting them. Systematic description of all tools (Tables 2-5), categorized by methodological advances. Comprehensive, up-to-date (2023).
Raffo, Andrea, and Jonas Paulsen. “The Shape of Chromatin: Insights from Computational Recognition of Geometric Patterns in Hi-C Data.” Briefings in Bioinformatics 24, no. 5 (September 20, 2023): bbad302.
- pipelines_list.csv - A list of available pipelines, URLs, from Miura et al., “Practical Analysis of Hi-C Data”
- pipeline_comparison.csv - Available analysis options in each pipeline, from Miura et al., “Practical Analysis of Hi-C Data”
- Table summarizing functionality of Hi-C data analysis tools, from Calandrelli et al., “GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data”
- Review of computational methods for 3D genome analysis. Technologies (ligation-based and ligation-free, methods for multiway interactions, single-cell, imaging). Genomics-based analysis (from Hi-C data only, compartments, TADs, loops, stripes), predictive methods (machine learning, neural networks, predicting pairwise chromatin interactions, including enhancer-promoter, contact frequency maps, multiway interactions, using sequence information and/or epigenomic data), 3D polymer models (loop extrusion, phase separation). Single-cell 3D genome analysis (sparsity problem, imputation, dimensionality reduction, cell cycle analysis). Glossary of terms.
Zhang, Yang, Lorenzo Boninsegna, Muyu Yang, Tom Misteli, Frank Alber, and Jian Ma. “Computational Methods for Analysing Multiscale 3D Genome Organization.” Nature Reviews Genetics, September 6, 2023.
- “Analysis Methods for Studying the 3D Architecture of the Genome" - Hi-C technology and methods review. Table 1 - list of tools. Biases, normalization, matrix balancing. Extracting significant contacts: obs/exp ratio, parametric (power-law, negative binomial, double exponential) and non-parametric (splines) models. 3D enrichment. References. TAD identification, directionality index. Outlook: the importance of comparative analysis.
Ay, Ferhat, and William S. Noble. “Analysis Methods for Studying the 3D Architecture of the Genome.” Genome Biology 16 (September 2, 2015)
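The directionality index mentioned above (introduced by Dixon et al. 2012 for TAD calling) compares a bin's contacts with the bins upstream (A) and downstream (B) within a window: DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E), with E = (A + B) / 2. A minimal Python sketch on a dense contact matrix, with the window size in bins left as a free parameter:

```python
def directionality_index(matrix, i, window):
    """Directionality index of bin i (Dixon et al. 2012 formulation).

    A = contacts of bin i with the `window` bins upstream,
    B = contacts with the `window` bins downstream.
    Positive DI: downstream bias; negative DI: upstream bias.
    """
    n = len(matrix)
    a = sum(matrix[i][j] for j in range(max(0, i - window), i))
    b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
    if a == b:
        return 0.0  # no directional bias (also covers a == b == 0)
    e = (a + b) / 2
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)
```

Along a chromosome, DI typically switches from strongly negative to strongly positive at TAD boundaries, which is the signal that DI-based callers segment.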
- “Computational Methods for Assessing Chromatin Hierarchy" - Review of higher-order (chromatin conformation capture) and primary order (DNase, ATAC) technologies and analysis tools. Table 1 - technology summaries. Table 2 - tool summaries. Inter-chromosomal calls using binarized contact maps. Visualization. Primary order technologies - details and peak calling.
Chang, Pearl, Moloya Gohain, Ming-Ren Yen, and Pao-Yang Chen. “Computational Methods for Assessing Chromatin Hierarchy.” Computational and Structural Biotechnology Journal 16 (2018)
- “Computational Methods for Analyzing Genome-Wide Chromosome Conformation Capture Data" - 3C-Hi-C tools review. Table 1 lists and categorizes the main tools; Figure 1 displays all steps in technology and analysis (alignment, resolution, normalization, including accounting for CNVs, A/B compartments, TAD detection, visualization). A concise description of all tools.
Nicoletti, Chiara, Mattia Forcato, and Silvio Bicciato. “Computational Methods for Analyzing Genome-Wide Chromosome Conformation Capture Data.” Current Opinion in Biotechnology 54 (December 2018)
- “Hi-C Analysis: From Data Generation to Integration" - Hi-C technology, data, 3D structures, analysis, and tools. Technology improvement and increasing resolution. FASTQ processing steps ("Hi-C data analysis: from FASTQ to interaction maps" section), pipelines, finding minimum resolution, normalization. Downstream analysis: A/B compartment detection, TAD callers, Hierarchical TADs, interaction callers. Data formats (pairix, sparse matrix format, cool, hic, butlr, hdf5, pgl). Hi-C visualization tools. Table 2 - summary and comparison of all tools
Pal, Koustav, Mattia Forcato, and Francesco Ferrari. “Hi-C Analysis: From Data Generation to Integration.” Biophysical Reviews, December 20, 2018.
- “Software Tools for Visualizing Hi-C Data" - Hi-C technology, data, and visualization review. Suggestions about graph representation.
Yardımcı, Galip Gürkan, and William Stafford Noble. “Software Tools for Visualizing Hi-C Data.” Genome Biology 18, no. 1 (December 2017).
- “Storage, Visualization, and Navigation of 3D Genomics Data" - Review of tools for visualization of 3C-Hi-C data, challenges, analysis (Table 1). Data formats (hic, cool, BUTLR, ccmap). Database to quickly access 3D data. Details of each visualization tool in Section 4.
Waldispühl, Jérôme, Eric Zhang, Alexander Butyaev, Elena Nazarova, and Yan Cyr. “Storage, Visualization, and Navigation of 3D Genomics Data.” Methods, May 2018
- “An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data" - 3D genome reconstruction review. Intro into equilibrium/fractal globule models. Classification of reconstruction methods: distance-, contact-, and probability-based. Table 1 summarizes many tools, methods, and references.
Oluwadare, Oluwatosin, Max Highsmith, and Jianlin Cheng. “An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data.” Biological Procedures Online 21, no. 1 (December 2019)
- 4D Nucleome project overview, current state and future goals. Studying chromatin dynamics at different timescales, during cell cycle, differentiation, stress response, senescence, and pathogenic processes, including in live mammalian cells. Spatial and temporal organization of the nucleus by integrating imaging and genomic datasets. Overview of 4DN technologies for nucleome mapping (Hi-C 3.0, Micro-C, multi-way contacts with GAM, PLAC-seq, ChIA-Drop, ChIA-PET, SPRITE, RNA-chromatin interactions with iMARGI, protein-chromatin with C-BERST, cytological distances with TSA-seq, GPSeq), time-resolved and single-cell technologies (DamID, liquid chromatin Hi-C (LC-Hi-C), multi-contact 3C, Repli-seq), multiplexFISH for single-cell chromatin measures (ball-and-stick, or volumetric chromatin tracing). Standards (Open Chromosome Collective), including for imaging (Open Microscopy Environment, BioImaging North America). Association of sequence signatures with chromatin organization, the effect of repetitive DNA, the effect of phase separation, RNA, cohesin complexes, machine learning tools for integrative models. Data portal.
Dekker, Job, Frank Alber, Sarah Aufmkolk, Brian J. Beliveau, Benoit G. Bruneau, Andrew S. Belmont, Lacramioara Bintu, et al. “Spatial and Temporal Organization of the Genome: Current State and Future Aims of the 4D Nucleome Project.” Molecular Cell, July 2023, S1097276523004653.
- “Getting the Genome in Shape: The Formation of Loops, Domains and Compartments" - TAD/loop formation review. Convergent CTCF, cohesin, mediator, different scenarios of loop formation. Stability and dynamics of TADs. Rich source of references.
Bouwman, Britta A. M., and Wouter de Laat. “Getting the Genome in Shape: The Formation of Loops, Domains and Compartments.” Genome Biology 16 (August 10, 2015)
- “The Role of 3D Genome Organization in Disease: From Compartments to Single Nucleotides" - 3D genome structure and disease. Evolution of technologies from FISH to variants of chromatin conformation capture. Hierarchical 3D organization, Table 1 summarizes each layer and its involvement in disease. Rearrangement of TADs/loops in cancer and other diseases. Specific examples of the biological importance of TADs, loops as means of distal communication.
Chakraborty, Abhijit, and Ferhat Ay. “The Role of 3D Genome Organization in Disease: From Compartments to Single Nucleotides.” Seminars in Cell & Developmental Biology 90 (June 2019): 104–13.
- "The role of 3D genome organization in development and cell differentiation" - 3D structure of the genome and its changes during gametogenesis, embryonic development, lineage commitment, differentiation. Changes in developmental disorders and diseases. Chromatin compartments and TADs. Chromatin changes during X chromosome inactivation. Promoter-enhancer interactions established during development are accompanied by gene expression changes. Polycomb-mediated interactions may repress developmental genes. References to many studies.
Zheng, H., and Xie, W. (2019). "The role of 3D genome organization in development and cell differentiation." Nat. Rev. Mol. Cell Biol.
- “The Three-Dimensional Organization of Mammalian Genomes" - 3D genome structure review. The role of gene promoters, enhancers, and insulators in regulating gene expression. Imaging-based tools, all flavors of chromatin conformation capture technologies. 3D features - chromosome territories, topologically associated domains (TADs), the association of TAD boundaries with replication domains, CTCF binding, transcriptional activity, housekeeping genes, genome reorganization during mitosis. Use of 3D data to annotate noncoding GWAS SNPs. 3D genome structure change in disease.
Yu, Miao, and Bing Ren. “The Three-Dimensional Organization of Mammalian Genomes.” Annual Review of Cell and Developmental Biology 33 (06 2017)
- “Hierarchical Folding and Reorganization of Chromosomes Are Linked to Transcriptional Changes in Cellular Differentiation" - 3D genome organization parts. Well-written and detailed. References. Technologies: FISH, 3C, 4C, 5C, Hi-C, GCC, TCC, ChIA-PET. Typical resolution - 40kb to 1Mb. LADs - conserved, but some are cell type-specific. Chromosome territories. Cell type-specific. Inter-chromosomal interactions may be important to define cell-specific interactions. A/B compartments identified by PCA. Chromatin loops, marked by CTCF and Cohesin binding, sometimes with Mediator. Transcription factories.
Fraser, J., C. Ferrai, A. M. Chiariello, M. Schueler, T. Rito, G. Laudanno, M. Barbieri, et al. “Hierarchical Folding and Reorganization of Chromosomes Are Linked to Transcriptional Changes in Cellular Differentiation.” Molecular Systems Biology 11, no. 12 (December 23, 2015)
- “Exploring the Three-Dimensional Organization of Genomes: Interpreting Chromatin Interaction Data" - 3D genome review. Chromosomal territories, transcription factories. Details of each 3C technology. Exponential decay of interaction frequencies. Box 2: A/B compartments (several Mb), TAD definition, size (hundreds of kb). TADs are largely stable, A/B compartments are tissue-specific. Adjacent TADs are not necessarily of opposing signs, may jointly form A/B compartments. Genes co-expression, enhancer-promoters interactions are confined to TADs. 3D modeling.
Dekker, Job, Marc A. Marti-Renom, and Leonid A. Mirny. “Exploring the Three-Dimensional Organization of Genomes: Interpreting Chromatin Interaction Data.” Nature Reviews. Genetics 14, no. 6 (June 2013)
- “On the Assessment of Statistical Significance of Three-Dimensional Colocalization of Sets of Genomic Elements"
Witten, Daniela M., and William Stafford Noble. “On the Assessment of Statistical Significance of Three-Dimensional Colocalization of Sets of Genomic Elements.” Nucleic Acids Research 40, no. 9 (May 2012)
- 4D Nucleome analysis. An in-depth analysis and integration of the 3D genomics mapping assays (chromatin interaction, Hi-C, Micro-C, ChIA-PET, PLAC-Seq, and physical distance-based, SPRITE, GAM, bulk and single-cell). Human embryonic stem cells (H1-hESCs) and immortalized fibroblasts (HFFc6). Comparing distance-dependent decays, A/B compartments and subcompartments, LADs, TAD and loop boundaries, SPIN states, their association with gene expression and with each other. Different chromatin interaction assays tend to detect different types of 3D structures with different efficiency.
4D Nucleome Consortium, Job Dekker, Betul Akgol Oksuz, Yang Zhang, Ye Wang, Miriam K. Minsk, Shuzhen Kuang, et al. “An Integrated View of the Structure and Function of the Human 4D Nucleome,” September 19, 2024.
- 4D Nucleome Protocols - Collection of genomic technologies currently in use or being developed in the 4DN network - links to wet-lab protocols and papers. 4DN portal blog
- LiCHi-C - low-input capture Hi-C technology (from 50K cells, in contrast to 30-50M cells for promoter capture). Developed for promoter capture but can be adapted to any capture regions. HiCUP pipeline, CHiCAGO to detect significant interactions, HiCCUPS, Mustache and HiCExplorer to call loops on Knight-Ruiz normalized matrices at 5kb resolution. Applied to normal and malignant human hematopoietic hierarchy in clinical samples, reconstructs lineages. Benchmarked against Low-C, Hi-C with different restriction enzymes, TagHi-C. Can be used to detect structural variants and breakpoints, and to link disease variants to genes. Data, scripts and processed data on Zenodo. Additional software: CHIGP R package for integrating GWAS summary statistics with CHiCAGO output, poor man's imputation algorithm for GWAS, Blockshifter, COGS. RegioneReloaded - Bioconductor package for comparative permutation analysis of multiple genomic region sets. HiCaptuRe - R package for Capture Hi-C data management, handles CHiCAGO output. See also the promoter-capture PCHi-C paper. GitHub.
Tomás-Daza, Laureano, Llorenç Rovirosa, Paula López-Martí, Andrea Nieto-Aliseda, François Serra, Ainoa Planas-Riverola, Oscar Molina, et al. “Low Input Capture Hi-C (LiCHi-C) Identifies Promoter-Enhancer Interactions at High-Resolution.” Nature Communications 14, no. 1 (January 17, 2023): 268.
- Hi-C 3.0 protocol - Evaluation of 12 experimental Hi-C protocols - they capture different 3D genome features with different efficiencies. Additional crosslinking with DSG improves signal-to-noise and loop detection, with reduced compartment detection. Evaluates 4 nucleases: MNase, DdeI, DpnII, HindIII. 4 cell lines - H1-hESCs, differentiated endoderms, HFF, HeLa (two cell cycle stages). 63 libraries total. All protocols detect cell type-specific differences, A/B compartments, insulation strength. MNase digestion improves loop detection. Anchors for multi-loop interactions can be detected. Double enzyme use improves loop detection. Evaluation of enrichment of CTCF, SMC3, H3K4me3, H3K27ac at loop boundaries. Ultra-deeply sequenced maps using Hi-C, Micro-C, and Hi-C 3.0 protocols (not yet available). cLIMS Hi-C data management system. Scripts to reproduce results. Tweet.
Oksuz, Betul Akgol, Liyan Yang, Sameer Abraham, Sergey V. Venev, Nils Krietenstein, Krishna Mohan Parsi, Hakan Ozadam, et al. "Systematic Evaluation of Chromosome Conformation Capture Assays" Preprint. Genomics, December 27, 2020.
- Chromatin conformation capture technologies, from 3C to imaging - 3D structures (nucleolus, nuclear speckles, polycomb bodies, chromosome territories, A/B compartments, TADs, loops), their roles in gene expression (sometimes conflicting), replication timing, DNA repair. Agreement (and disagreement) examples between 3C methods and FISH. Description of ligation-based (3C - Hi-C) and ligation-free (GAM, SPRITE, DamC) technologies. Multiway interactions primarily occur within TADs. TAD formation, loop extrusion mechanisms. Association between replication timing and A/B compartments. Effect of mechanical forces on chromosome folding.
McCord, Rachel Patton, Noam Kaplan, and Luca Giorgetti. "Chromosome Conformation Capture and Beyond: Toward an Integrative View of Chromosome Structure and Function" Molecular Cell, January 2020
- Review of Hi-C, Capture Hi-C, and Capture-C technologies, their computational preprocessing - Experimental protocols, similarities and differences, types of reads (figures), details of alignment, read orientation, elimination of artifacts, quality metrics. A brief overview of preprocessing tools. Example preprocessing of three types of data. Java tool for preprocessing all types of data, Diachromatic (Differential Analysis of Chromatin Interactions by Capture), GOPHER (Generator Of Probes for capture Hi-C Experiments at high Resolution) for genome cutting, probe design.
Hansen, Peter, Michael Gargano, Jochen Hecht, Jonas Ibn-Salem, Guy Karlebach, Johannes T. Roehr, and Peter N. Robinson. "Computational Processing and Quality Control of Hi-C, Capture Hi-C and Capture-C Data" Genes 10, no. 7 (July 18, 2019): 548.
- Promoter Capture Hi-C wet-lab protocol and video instructions.
Schoenfelder, Stefan, Biola-Maria Javierre, Mayra Furlan-Magaril, Steven W. Wingett, and Peter Fraser. “Promoter Capture Hi-C: High-Resolution, Genome-Wide Profiling of Promoter Interactions.” Journal of Visualized Experiments, no. 136 (June 28, 2018): 57320.
- Chromosome conformation capture technologies, 4C, 5C, Hi-C, ChIP-loop, ChIA-PET - From microscopy observations (constrained m