essHIC.hic

stefanofranzini edited this page Sep 27, 2020 · 1 revision
essHIC.hic(datafile,from_pairs=True)

The hic class wraps a HiC matrix together with its metadata. It can also perform operations on the matrix, such as normalization and cleaning, and compute its spectrum.

It relies on the name format adopted by make_hic and the metadata.txt and chromosomes.txt files it generates to automatically extract metadata about the matrix.


Parameters:

datafile: string
Name of the binary file which contains the HiC matrix.
from_pairs: bool, default=True
If *True*, reads the matrix from a binary file containing the indices of the bins and their values.

Attributes:

matrix: numpy.ndarray
the HiC matrix.
indir: string
the root directory of the data, which should contain all metadata files.
refname: string
reference name of the experiment of the HiC matrix.
norm: string
kind of normalization applied to the matrix.
chromo: integer
number of the chromosome of the matrix.
res: string
resolution of the matrix.
length: integer
size of the matrix at the chosen resolution.
eig: numpy.ndarray
array of floats containing the computed eigenvalues of the matrix.
eigv: numpy.ndarray
array of floats containing the normalized eigenvectors of the matrix.


Methods

| method | function |
| --- | --- |
| resize | computes a lower resolution version of the matrix. |
| decay_norm | computes the decay normalization of the matrix. |
| clean | removes empty rows and columns from the matrix. |
| vc_norm | applies the vanilla coverage normalization to the matrix. |
| vcsqrt_norm | applies the square root vanilla coverage normalization to the matrix. |
| pearson | computes a matrix of the correlation coefficients between rows of the matrix. |
| laplacian | computes the Laplacian of the matrix. |
| reduce | computes the essential matrix. |
| get_spectrum | computes the spectrum of the matrix. |
| norm_spectrum | applies normalizations to the spectrum. |
| print_matrix | prints the matrix to a binary file. |
| plot | plots the matrix. |
| plot_chromosome | plots a cartoon of the chromosome. |

__init__(datafile,from_pairs=True)

initialize self.


resize

resize(self,res_factor)

computes a new matrix at a fraction of the original matrix resolution. Each bin of the new matrix contains the average of the res_factor x res_factor bins of the original matrix.

Parameters:

res_factor: integer
factor by which the resolution is decreased.

Returns:

none
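
The block averaging described above can be sketched with plain NumPy. This is only an illustration of the technique, not the code used by essHIC: the function name `block_average` is hypothetical, and dropping trailing bins that do not fill a whole block is an assumption.

```python
import numpy as np

def block_average(matrix, res_factor):
    """Average res_factor x res_factor blocks of a square matrix (sketch)."""
    n = matrix.shape[0] // res_factor
    # Assumption: trailing bins that do not fill a complete block are dropped.
    trimmed = matrix[:n * res_factor, :n * res_factor]
    # Axes 1 and 3 run over the bins inside each block.
    blocks = trimmed.reshape(n, res_factor, n, res_factor)
    return blocks.mean(axis=(1, 3))

m = np.arange(16, dtype=float).reshape(4, 4)
coarse = block_average(m, 2)  # 4x4 matrix reduced to 2x2
```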

decay_norm

decay_norm(self)

computes the decay normalization (or observed vs expected normalization ) of the matrix.

Returns:

none
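
A common form of the observed vs expected normalization divides every entry by the mean of its diagonal, that is, by the average contact value at that genomic distance. The sketch below assumes this definition and a symmetric matrix; `observed_over_expected` is a hypothetical name and the details may differ from what essHIC does.

```python
import numpy as np

def observed_over_expected(matrix):
    """Divide each diagonal of a symmetric matrix by its mean (sketch)."""
    n = matrix.shape[0]
    out = np.zeros_like(matrix, dtype=float)
    for d in range(n):
        # Expected value at genomic distance d: mean of the d-th diagonal.
        expected = np.diagonal(matrix, offset=d).mean()
        if expected > 0:
            idx = np.arange(n - d)
            out[idx, idx + d] = matrix[idx, idx + d] / expected
            out[idx + d, idx] = matrix[idx + d, idx] / expected
    return out

# Toy matrix whose values depend only on the distance from the diagonal.
pos = np.arange(5)
toy = 1.0 / (1.0 + np.abs(pos[:, None] - pos[None, :]))
normalized = observed_over_expected(toy)
```

In this toy case every entry equals the mean of its diagonal, so the normalized matrix contains only ones.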

clean

clean(self)

removes empty rows and columns from the matrix.

Returns:

none
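
As an illustration, empty bins of a symmetric matrix can be dropped with a boolean mask. The helper `drop_empty_bins` is hypothetical and returns a new array, whereas the method above modifies the object and returns nothing.

```python
import numpy as np

def drop_empty_bins(matrix):
    """Remove rows and columns that contain only zeros (sketch)."""
    # For a symmetric matrix, an empty column is also an empty row.
    keep = np.any(matrix != 0, axis=0)
    return matrix[np.ix_(keep, keep)], keep

m = np.array([[1.0, 0.0, 2.0],
              [0.0, 0.0, 0.0],
              [2.0, 0.0, 3.0]])
cleaned, kept = drop_empty_bins(m)
```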

vc_norm

vc_norm(self,res_factor)

computes the vanilla coverage normalization of the matrix.

Returns:

none
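
In its usual definition, vanilla coverage divides each entry by the product of the coverages (row sums) of its two bins. The sketch below assumes that definition without any further rescaling; `vanilla_coverage` is a hypothetical name, not the essHIC implementation.

```python
import numpy as np

def vanilla_coverage(matrix):
    """Divide entry (i, j) by coverage[i] * coverage[j] (sketch)."""
    coverage = matrix.sum(axis=1).astype(float)
    # Assumption: empty bins are left untouched by giving them unit coverage.
    coverage[coverage == 0] = 1.0
    return matrix / np.outer(coverage, coverage)

m = np.array([[2.0, 2.0],
              [2.0, 6.0]])
vc = vanilla_coverage(m)  # coverages are 4 and 8
```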

vcsqrt_norm

vcsqrt_norm(self,res_factor)

computes the square root vanilla coverage normalization of the matrix.

Returns:

none
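
The square root variant is usually defined by dividing each entry by the square root of the product of the two coverages. A minimal sketch under that assumption, with the hypothetical helper `sqrt_vanilla_coverage`:

```python
import numpy as np

def sqrt_vanilla_coverage(matrix):
    """Divide entry (i, j) by sqrt(coverage[i] * coverage[j]) (sketch)."""
    coverage = matrix.sum(axis=1).astype(float)
    # Assumption: empty bins are left untouched by giving them unit coverage.
    coverage[coverage == 0] = 1.0
    return matrix / np.sqrt(np.outer(coverage, coverage))

m = np.array([[2.0, 2.0],
              [2.0, 6.0]])
vcsqrt = sqrt_vanilla_coverage(m)  # coverages are 4 and 8
```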

pearson

pearson(self)

computes the matrix of the correlations between the columns of the original HiC matrix, using the Pearson correlation coefficient. Since the HiC matrix is symmetric, this is the same as the correlation between its rows.

Returns:

none
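
For reference, a Pearson correlation matrix of this kind can be obtained with `numpy.corrcoef`, which treats each row as a variable. The random symmetric matrix below is made up for the example; note that bins with constant values produce NaN entries, so empty bins are typically removed first.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 5))
contacts = a + a.T              # toy symmetric "contact" matrix

# Entry (i, j) is the Pearson coefficient between rows i and j.
corr = np.corrcoef(contacts)
```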

laplacian

laplacian(self)

computes the laplacian matrix from the original HiC matrix.

Returns:

none
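
The page does not say which Laplacian is used. The sketch below assumes the unnormalized graph Laplacian, L = D - A, where D is the diagonal matrix of the row sums; `graph_laplacian` is a hypothetical helper.

```python
import numpy as np

def graph_laplacian(matrix):
    """Unnormalized Laplacian L = D - A of a symmetric matrix (sketch)."""
    degree = matrix.sum(axis=1)
    return np.diag(degree) - matrix

m = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
lap = graph_laplacian(m)
```

With this definition every row of the Laplacian sums to zero.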

reduce

reduce(self,nvec=10,order='abs',norm=1.)

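computes the essential matrix from the first nvec eigenspaces of the original HiC matrix. The new matrix is given by

$$A^{ess}_{ij} = \sum_{n=1}^{n_{vec}} \tilde{\lambda}_n \, v^{(n)}_i v^{(n)}_j$$

where $\tilde{\lambda}_n$ are the normalized eigenvalues of the original matrix and $v^{(n)}$ are the corresponding normalized eigenvectors.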

Parameters:

nvec: integer
number of eigenspaces to use; negative to compute whole spectrum.
order:{'abs','sgn'}, default='abs'
order of the eigenspaces. 'abs' orders them in decreasing order according to their eigenvalue absolute value, 'sgn' orders them in decreasing order according to their eigenvalue.
norm: {'flat','norm','none', float}, default=1.0
normalization to apply to the spectrum (see **norm_spectrum**).

Returns:

none
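
A minimal NumPy sketch of this reconstruction, for a symmetric matrix: keep the nvec leading eigenspaces, rescale their eigenvalues, and sum the weighted outer products of the eigenvectors. The name `essential_matrix` is hypothetical, only the 'abs' ordering and a numeric norm are covered, and normalizing over the selected eigenvalues only is an assumption.

```python
import numpy as np

def essential_matrix(matrix, nvec=10, p=1.0):
    """Rebuild a symmetric matrix from its leading eigenspaces (sketch)."""
    eig, eigv = np.linalg.eigh(matrix)      # eigenvectors are the columns
    order = np.argsort(-np.abs(eig))        # 'abs' ordering
    if nvec > 0:
        order = order[:nvec]
    eig, eigv = eig[order], eigv[:, order]
    # Assumption: the p-norm is taken over the selected eigenvalues only.
    eig = eig / np.sum(np.abs(eig) ** p) ** (1.0 / p)
    # Sum over n of eig[n] * outer(v_n, v_n).
    return (eigv * eig) @ eigv.T

m = np.array([[2.0, 1.0],
              [1.0, 2.0]])                  # eigenvalues 3 and 1
rank_one = essential_matrix(m, nvec=1)
full = essential_matrix(m, nvec=-1)
```

With a single eigenspace the result is the projector on the leading eigenvector; with the whole spectrum it is the original matrix divided by the norm of its eigenvalues.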

get_spectrum

get_spectrum(self,nvec=-1,order='abs')

computes the first nvec eigenvalues and eigenvectors of the HiC matrix, according to the chosen ordering.

Parameters:

nvec: integer
number of eigenspaces to compute; negative to compute whole spectrum.
order:{'abs','sgn'}, default='abs'
order of the eigenspaces. 'abs' orders them in decreasing order according to their eigenvalue absolute value, 'sgn' orders them in decreasing order according to their eigenvalue.

Returns:

eig: numpy.ndarray
array of the eigenvalues
eigv: numpy.ndarray
array of the eigenvectors
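
The two orderings can be illustrated with `numpy.linalg.eigh` on a symmetric matrix. This is a sketch, not the essHIC code: `ordered_spectrum` is a hypothetical helper, and storing the eigenvectors as columns is an assumption about the layout.

```python
import numpy as np

def ordered_spectrum(matrix, nvec=-1, order='abs'):
    """Eigenvalues and eigenvectors in the requested order (sketch)."""
    eig, eigv = np.linalg.eigh(matrix)
    if order == 'abs':
        idx = np.argsort(-np.abs(eig))      # decreasing absolute value
    elif order == 'sgn':
        idx = np.argsort(-eig)              # decreasing signed value
    else:
        raise ValueError("order must be 'abs' or 'sgn'")
    if nvec > 0:                            # negative: keep the whole spectrum
        idx = idx[:nvec]
    return eig[idx], eigv[:, idx]           # eigenvectors are the columns

m = np.diag([1.0, -3.0, 2.0])
eig_abs, _ = ordered_spectrum(m, order='abs')
eig_sgn, _ = ordered_spectrum(m, order='sgn')
eig_top, eigv_top = ordered_spectrum(m, nvec=2)
```

For this diagonal matrix the 'abs' ordering gives -3, 2, 1, while the 'sgn' ordering gives 2, 1, -3.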

norm_spectrum

norm_spectrum(self,norm=1.0)

applies normalization to the computed eigenvalues. There are various normalization modes.

  1. flat: flattens the spectrum.
  2. norm: normalizes the computed eigenvalues so that the sum of their squares is 1.
  3. none: does not normalize the spectrum and preserves the original eigenvalues.
  4. float: if the normalization mode is a number p, it normalizes the eigenvalues so that their p-norm, $\left(\sum_n |\lambda_n|^p\right)^{1/p}$, is 1.

Parameters:

norm: {'flat','norm','none',float}, default=1.0
normalization mode.

Returns:

none
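
The 'norm', 'none' and numeric modes can be sketched as follows, assuming that a number p stands for the p-norm of the eigenvalues. The 'flat' mode is left out because its formula is not given on this page; `normalize_eigenvalues` is a hypothetical helper that returns a new array instead of updating the object.

```python
import numpy as np

def normalize_eigenvalues(eig, norm=1.0):
    """Rescale eigenvalues according to the chosen mode (sketch)."""
    eig = np.asarray(eig, dtype=float)
    if isinstance(norm, str):
        if norm == 'none':
            return eig.copy()
        if norm == 'norm':
            # Sum of the squares equal to 1.
            return eig / np.sqrt(np.sum(eig ** 2))
        raise ValueError("mode not covered by this sketch: " + norm)
    p = float(norm)
    # Assumption: a number p means that the p-norm is set to 1.
    return eig / np.sum(np.abs(eig) ** p) ** (1.0 / p)

eig = np.array([3.0, -4.0])
unit_squares = normalize_eigenvalues(eig, 'norm')
unit_l1 = normalize_eigenvalues(eig, 1.0)
```

Under this assumption the mode p=2 coincides with 'norm'.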

print_matrix

print_matrix(self,save)

prints the matrix as a binary file containing the indices of non-zero bins and their values.

Parameters:

save: string
output file.

Returns:

none
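
The exact binary layout is not described here. Purely as an illustration of storing a matrix as index pairs and values, the sketch below writes one record per non-zero bin; the record format (two 32-bit integers and one 64-bit float) and the helpers `write_pairs` and `read_pairs` are assumptions and are not guaranteed to match the files produced by essHIC.

```python
import os
import tempfile
import numpy as np

# Hypothetical record layout: bin i, bin j, value.
RECORD = np.dtype([('i', '<i4'), ('j', '<i4'), ('value', '<f8')])

def write_pairs(matrix, path):
    """Write the non-zero bins of a matrix as (i, j, value) records."""
    i, j = np.nonzero(matrix)
    records = np.empty(i.size, dtype=RECORD)
    records['i'] = i
    records['j'] = j
    records['value'] = matrix[i, j]
    records.tofile(path)

def read_pairs(path, size):
    """Rebuild a dense size x size matrix from the records."""
    records = np.fromfile(path, dtype=RECORD)
    matrix = np.zeros((size, size))
    matrix[records['i'], records['j']] = records['value']
    return matrix

m = np.array([[0.0, 1.5],
              [1.5, 4.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    write_pairs(m, path)
    restored = read_pairs(path, m.shape[0])
```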

plot

plot(self, vmax=2.5, vmin=0.0, cmap='Reds', plotkind='flat', cbar=False, triangle=False)

plots a heatmap of the matrix according to the specifications. There are several plotting modes:

  1. flat: plots the heatmap of the matrix.
  2. log: plots the heatmap of the LogNorm of the matrix (see matplotlib documentation)
  3. bilog: plots the SymLogNorm of the matrix (see matplotlib documentation)

Parameters:

vmax: float, default=2.5
maximum of the heatmap color range.
vmin: float, default=0.0
minimum of the heatmap color range.
cmap: string, default='Reds'
color map to use.
plotkind: string, default='flat'
plotting mode to use.
cbar: bool, default=False
if True, plots a color bar next to the heatmap.
triangle: bool, default=False
if True, only plots the lower triangular matrix.

Returns:

none
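
The three plotting modes map naturally onto matplotlib color normalizations. The sketch below shows one possible way to combine them with `imshow`; it is not the essHIC code, `show_heatmap` is a hypothetical helper, and the positive floor used in 'log' mode and the `linthresh` value used in 'bilog' mode are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm, SymLogNorm

def show_heatmap(matrix, vmax=2.5, vmin=0.0, cmap='Reds',
                 plotkind='flat', cbar=False, triangle=False):
    """Draw a heatmap in 'flat', 'log' or 'bilog' mode (sketch)."""
    data = np.array(matrix, dtype=float)
    if triangle:
        # Hide everything above the diagonal.
        upper = np.triu(np.ones_like(data, dtype=bool), k=1)
        data = np.ma.masked_where(upper, data)
    if plotkind == 'flat':
        kwargs = dict(vmin=vmin, vmax=vmax)
    elif plotkind == 'log':
        # Assumption: LogNorm needs a positive lower bound.
        kwargs = dict(norm=LogNorm(vmin=max(vmin, 1e-3), vmax=vmax))
    elif plotkind == 'bilog':
        # Assumption: symmetric color range, linear within +/- 0.01.
        kwargs = dict(norm=SymLogNorm(linthresh=1e-2, vmin=-vmax,
                                      vmax=vmax, base=10))
    else:
        raise ValueError("unknown plotkind: " + plotkind)
    fig, ax = plt.subplots()
    image = ax.imshow(data, cmap=cmap, **kwargs)
    if cbar:
        fig.colorbar(image, ax=ax)
    return fig, image

fig, image = show_heatmap(np.eye(4) + 0.1, plotkind='log', triangle=True)
plt.close(fig)
```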

plot_chromosome

plot_chromosome(self,centromere='none',regions='none',bins='none',orientation='horizontal',ticks="none")

plots a cartoon of the chromosome next to or below the heatmap. It can be used to display one dimensional information about the genome side by side with the heatmap.

Parameters:

centromere: {'none','auto',float}, default='none'
if it is not 'none', it draws the centromere position on the cartoon. If 'auto', it uses the human centromere positions.
regions:{'none',list of dict}, default='none'
if not 'none', colors regions according to the color indicated by the dictionary. Each region dictionary in the list must contain the key 'bounds' corresponding to a list of two integers (the boundaries of the region), and the key 'color' which contains a color in the hex format.
bins:{'none',list of dict}, default='none'
if not 'none', colors each indicated bin according to the chosen color. Each bin dictionary in the list must contain the key 'bins' corresponding to a list of integers (the bins to color), and the key 'color' which contains a color in the hex format.
orientation:{'horizontal','vertical'}, default='horizontal'
whether the cartoon should be drawn in the horizontal orientation or the vertical orientation.
ticks:{'none',integer}, default='none'
if not 'none', sets a tick on the cartoon every *ticks* bins.

Returns:

none
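
The regions and bins arguments are lists of dictionaries with the keys described above. The example below builds two such lists and checks their structure; the bin numbers and colors are made up, and `check_annotations` is a hypothetical helper that is not part of essHIC.

```python
import re

HEX_COLOR = re.compile(r'#[0-9a-fA-F]{6}')

# Two colored regions, each given by its boundary bins.
regions = [
    {'bounds': [0, 40], 'color': '#1f77b4'},
    {'bounds': [40, 100], 'color': '#ff7f0e'},
]
# One group of individually colored bins.
bins = [
    {'bins': [5, 6, 7, 50], 'color': '#2ca02c'},
]

def check_annotations(regions=(), bins=()):
    """Check that the dictionaries carry the documented keys (sketch)."""
    for region in regions:
        bounds = region['bounds']
        if len(bounds) != 2 or not all(isinstance(b, int) for b in bounds):
            raise ValueError("'bounds' must be a list of two integers")
        if not HEX_COLOR.fullmatch(region['color']):
            raise ValueError("'color' must be a hex color such as '#1f77b4'")
    for group in bins:
        if not all(isinstance(b, int) for b in group['bins']):
            raise ValueError("'bins' must be a list of integers")
        if not HEX_COLOR.fullmatch(group['color']):
            raise ValueError("'color' must be a hex color such as '#1f77b4'")
    return True

valid = check_annotations(regions, bins)
```

Given a hic object h, the lists would then be passed as h.plot_chromosome(regions=regions, bins=bins, orientation='vertical', ticks=20).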