
essHIC.dist

stefanofranzini edited this page Sep 27, 2020 · 1 revision
essHIC.dist(filename,metafile=None)

The dist class provides methods to analyze distance matrices. You can apply clustering and dimensionality reduction techniques to the distance matrix, and assess the quality of the classifier by comparing the distances between HiC experiments with their known labels.


Parameters:

filename: string
path of the file containing the distance matrix.
metafile: string, default=None
path of the metadata file containing cell type information and a list of the outliers to remove

Attributes:

metafile: string
the metadata file containing the cell types of the HiC matrices and a list of outliers.
pseudo: numpy ndarray
array marking which experiments are pseudo-replicates according to the metadata file.
col2lab: dictionary
dictionary which turns integer "color" numbers into the corresponding cell type label.
lab2col: dictionary
dictionary which turns cell type labels into the corresponding integer "color" number.
colors: numpy ndarray
list of the integer "color" numbers of the experiments, which encode their cell type.
mask: numpy ndarray
array marking which experiments are removed from the distance matrix according to the metadata file.
dist: numpy masked ndarray
masked array which contains the distances between all pairs of experiments in the dataset. It masks the outliers according to the metadata file.
mdist: numpy ndarray
array which contains the distances between all pairs of experiments except the outliers given by the metadata file.
mcol: numpy ndarray
list of the integer "color" numbers of the experiments, except the outliers.
mpsd: numpy ndarray
list of the pseudo-replicate experiments after removing outliers.
dlist: numpy ndarray
list of the distances at which the ROC curve has been computed.
roc: numpy ndarray
ROC curve values.
sim_map: numpy ndarray
affinity map computed from the distance matrix.
MDSrep: numpy ndarray
n-dimensional positions of the experiments according to multidimensional scaling embedding.
clusters: numpy ndarray
list of the cluster labels.
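
To make the relationship between the label and mask attributes concrete, here is a minimal NumPy sketch. It does not use essHIC and is not the library's internal code: the labels, the toy distance matrix, and the choice of outlier are all invented for illustration, and only the variable names mirror the attributes listed above.

```python
import numpy as np

# Hypothetical cell-type labels for five experiments (not taken from essHIC).
labels = ["GM12878", "GM12878", "IMR90", "IMR90", "K562"]

# Map each label to an integer "color" and back again.
lab2col = {lab: i for i, lab in enumerate(sorted(set(labels)))}
col2lab = {i: lab for lab, i in lab2col.items()}
colors = np.array([lab2col[lab] for lab in labels])

# Toy symmetric distance matrix with a zero diagonal.
rng = np.random.default_rng(0)
a = rng.random((5, 5))
full = (a + a.T) / 2.0
np.fill_diagonal(full, 0.0)

# Assume the last experiment is flagged as an outlier.
mask = np.array([False, False, False, False, True])

# Masked array: hide every row and column that belongs to an outlier.
mask2d = mask[:, None] | mask[None, :]
dist = np.ma.masked_array(full, mask=mask2d)

# Compact arrays with the outliers dropped entirely.
keep = ~mask
mdist = full[np.ix_(keep, keep)]
mcol = colors[keep]
```

In this sketch `dist` keeps the original shape and hides the outlier entries, while `mdist` and `mcol` are smaller arrays that can be passed directly to clustering or embedding routines.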


Methods

method function
print_dist prints the distance matrix to a file.
order orders experiments according to their cell type.
get_cmap builds a matrix which is one when two experiments have the same cell-type and zero otherwise.
get_roc_area returns the area under the ROC curve.
get_roc computes the ROC curve.
get_gauss_sim computes an affinity matrix from the distances using a Gaussian kernel.
MDS computes the multidimensional scaling embedding of the distance matrix.
spec_clustering computes clusters using spectral clustering.
hier_clustering computes hierarchical clustering.
get_dunn_score computes the Dunn score.
get_quality_score computes the quality score.
get_purity_score computes the purity score.
plot plots the distance matrix.
plot_masked plots the masked distance matrix.
plot_squares plots colored squares over the distance matrix to point out experiments with the same cell-type.
plot_similarity plots the affinity matrix.
plot_roc plots ROC curve.
show_hist plots and displays a histogram of the distribution of the distances. Same cell type, different cell type, and pseudoreplicate experiments are colored differently.
show_MDS plots and displays the MDS embedding of the matrix.
show_clusters plots and displays a cartoon of the clusters.
show_dendrogram plots and displays a dendrogram of the clusters.
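
The ideas behind get_cmap, get_gauss_sim, get_roc and get_roc_area can be sketched in plain NumPy. This is an illustration under stated assumptions, not essHIC's implementation: the function names, the kernel form exp(-d^2 / (2 sigma^2)), the evenly spaced thresholds, and the toy data are all hypothetical, and the library may choose its kernel width and thresholds differently.

```python
import numpy as np

def same_type_map(colors):
    """1 where two experiments share a cell type, 0 otherwise."""
    colors = np.asarray(colors)
    return (colors[:, None] == colors[None, :]).astype(int)

def gauss_similarity(dist, sigma):
    """Affinity from distances with a Gaussian kernel (assumed form)."""
    dist = np.asarray(dist, dtype=float)
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))

def roc_from_distances(dist, colors, n_thresholds=100):
    """Classify a pair as 'same type' when its distance is <= a threshold.

    Assumes at least one same-type pair and one different-type pair.
    """
    dist = np.asarray(dist, dtype=float)
    iu = np.triu_indices_from(dist, k=1)      # each pair once, no diagonal
    d = dist[iu]
    same = same_type_map(colors)[iu].astype(bool)
    dlist = np.linspace(0.0, d.max(), n_thresholds)
    tpr = np.array([(d[same] <= t).mean() for t in dlist])
    fpr = np.array([(d[~same] <= t).mean() for t in dlist])
    return dlist, fpr, tpr

def roc_area(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy data: two cell types, same-type pairs are closer than mixed pairs.
toy_colors = [0, 0, 1, 1]
toy_dist = np.array([[0.0, 1.0, 3.0, 3.0],
                     [1.0, 0.0, 3.0, 3.0],
                     [3.0, 3.0, 0.0, 1.0],
                     [3.0, 3.0, 1.0, 0.0]])

dlist, fpr, tpr = roc_from_distances(toy_dist, toy_colors)
auc = roc_area(fpr, tpr)
sim_map = gauss_similarity(toy_dist, sigma=1.0)
```

An area close to 1 means that distances separate same-type pairs from different-type pairs almost perfectly, while an area near 0.5 means the distances carry little information about cell type.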