# Balanced Non-negative Matrix Factorization on Hi-C Contact Maps
#### Xihao Hu, Christina Huan Shi, and Kevin Yip*
#### The Chinese University of Hong Kong

## Introduction


Hi-C is a powerful experimental method for probing long-range DNA-DNA interactions across the whole genome [Lieberman-Aiden2009].

We developed a novel computational method based on balanced non-negative matrix factorization (BNMF) that flexibly identifies small clusters of spatially proximal genomic regions from Hi-C contact maps.
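The BNMF objective and its update rules are given in our paper and implemented in `contact_map.py`. The code below is not that implementation. It is a rough illustration of how a factorization can expose clusters in a contact map, using plain symmetric NMF: a symmetric non-negative matrix X is approximated by HHᵀ with the common damped multiplicative update H ← H ∘ (1 − β + β·XH / HHᵀH). The function names, the toy matrix and the argmax cluster assignment are assumptions made for this sketch only.

```python
import numpy as np


def symmetric_nmf(X, k, n_iter=1000, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric non-negative matrix X (n x n) as H @ H.T.

    Generic symmetric NMF with a damped multiplicative update; this is an
    illustration only, NOT the BNMF objective used in contact_map.py.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.RandomState(seed)
    H = rng.rand(X.shape[0], k)          # random non-negative start
    for _ in range(n_iter):
        numer = X.dot(H)                 # X H
        denom = H.dot(H.T.dot(H)) + eps  # H H^T H (eps avoids division by 0)
        # The factor stays positive because X >= 0 and beta < 1,
        # so H remains non-negative throughout.
        H *= (1.0 - beta) + beta * numer / denom
    return H


def assign_clusters(H):
    """Hypothetical rule: put each bin in the factor with its largest loading."""
    return np.argmax(H, axis=1)


# Toy symmetric "contact map": bins 0-2 contact each other often,
# as do bins 3-5, with only weak contacts between the two groups.
X = np.array([
    [10, 8, 8, 0, 1, 0],
    [8, 10, 8, 1, 0, 0],
    [8, 8, 10, 0, 0, 1],
    [0, 1, 0, 10, 8, 8],
    [1, 0, 0, 8, 10, 8],
    [0, 0, 1, 8, 8, 10],
], dtype=float)

H = symmetric_nmf(X, k=2)
labels = assign_clusters(H)
print(labels)
```

Each column of `H` can be read as one candidate cluster: bins with large loadings in the same column tend to contact one another. Cluster indices are arbitrary, so only the grouping (not the label values) is meaningful.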

This tutorial gives examples of how to use our tool.

## Materials and Methods
The software requires the following:
- python 2.7 (https://www.python.org)
- numpy (http://www.numpy.org/)
- matplotlib (http://matplotlib.org/)
- scipy [optional] (http://www.scipy.org/)
- numexpr [optional] (https://github.com/pydata/numexpr)
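Before running the examples, it can help to confirm that the packages above are importable. The helper below is our own illustration, not part of the bnmf package; it only uses the standard library and should run under both Python 2.7 and Python 3.

```python
import importlib
import sys

# Import names of the packages listed above.
REQUIRED = ["numpy", "matplotlib"]
OPTIONAL = ["scipy", "numexpr"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, ok in sorted(check_modules(names).items()):
            print("%-10s %-8s %s" % (name, label, "OK" if ok else "MISSING"))
```

A `MISSING` entry for a required package means it must be installed before the notebooks will run; missing optional packages can be skipped.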

Alternatively, you can use the free [Anaconda](https://store.continuum.io/cshop/anaconda/) distribution, which bundles these packages, to avoid configuration issues.

Download the [bnmf package](bnmf_v2.tar.gz), which contains the source code and data.

Uncompress it (e.g., `tar xzf bnmf_v2.tar.gz`) to obtain the following files:
  - `contact_map.py` -- source code for BNMF
  - `yeast_chr_len.txt` -- chromosome lengths for the yeast genome
  - `hg18_chr_len.txt` -- chromosome lengths for the human genome (hg18 reference)
  - `HindIII_intersect_EcoRI_fdr0.01_inter.txt` -- yeast inter-chromosomal interactions [Duan2010]
  - `HindIII_intersect_EcoRI_fdr0.01_intra.txt` -- yeast intra-chromosomal interactions [Duan2010]
  - `origins_nonCDR_early.txt` -- yeast early replication origins [Duan2010]
  - `IMR90.uij.chr22` -- human chromosome 22 intra-chromosomal interaction matrix at 40 kb resolution [Dixon2012]
  - `IMR90.domain.txt` -- topological domains identified in the IMR90 cell line [Dixon2012]
  - `*.ipynb` -- the IPython notebooks used to generate this tutorial
  - `*.html` -- the same notebooks exported to HTML
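At 40 kb resolution, each row and column of the `IMR90.uij.chr22` matrix presumably covers 40,000 bp. The helpers below sketch the coordinate-to-bin arithmetic. The function names are hypothetical, and two assumptions are loud here: coordinates are 0-based, and the `*_chr_len.txt` files hold two whitespace-separated columns (chromosome name, length in bp). We have not confirmed that file format, so check the files before relying on the parser.

```python
RESOLUTION = 40000  # 40 kb bins, matching the stated resolution of IMR90.uij.chr22


def position_to_bin(pos, resolution=RESOLUTION):
    """Map a 0-based genomic coordinate (bp) to its bin index (ASSUMES 0-based)."""
    return pos // resolution


def n_bins(chrom_length, resolution=RESOLUTION):
    """Number of bins needed to cover a chromosome of the given length (ceiling)."""
    return (chrom_length + resolution - 1) // resolution


def parse_chr_lengths(lines):
    """Parse chromosome lengths from an iterable of text lines.

    ASSUMED format: "<name> <length_bp>" per line, whitespace-separated.
    Lines whose second field is not an integer (e.g. headers) are skipped.
    """
    lengths = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            lengths[fields[0]] = int(fields[1])
        except ValueError:
            continue
    return lengths


if __name__ == "__main__":
    # Hypothetical usage on the shipped file (format not verified):
    with open("hg18_chr_len.txt") as handle:
        lengths = parse_chr_lengths(handle)
    for name in sorted(lengths):
        print("%s: %d bins" % (name, n_bins(lengths[name])))
```

Using a ceiling in `n_bins` keeps the final, partially covered stretch of a chromosome as its own bin rather than dropping it.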
  
Then, work through the following examples:
  
1. [A toy example to show the idea of BNMF](example.html)
2. [Analyzing a yeast Hi-C contact map](yeast-hic.html)
3. [Analyzing a human Hi-C contact map](human-hic.html)

If you have any problems with this tutorial or our paper, please contact the authors.

## References

1. Lieberman-Aiden et al. 2009 Science. [Comprehensive mapping of long-range interactions reveals folding principles of the human genome.](http://www.sciencemag.org/content/326/5950/289)
2. Duan et al. 2010 Nature. [A three-dimensional model of the yeast genome.](http://www.nature.com/nature/journal/v465/n7296/full/nature08973.html)
3. Dixon et al. 2012 Nature. [Topological domains in mammalian genomes identified by analysis of chromatin interactions.](http://www.nature.com/nature/journal/v485/n7398/full/nature11082.html)