MPSSC: Spectral clustering based on learning similarity matrix

Overview

MPSSC is a novel spectral clustering framework that imposes sparse structures on a target matrix. Specifically, it uses multiple doubly stochastic affinity matrices to construct a normalized graph Laplacian (Step 1). It then imposes a sparse structure on the target matrix (Step 2) and shrinks pairwise differences between the rows of the target matrix (Step 3). The multiple similarity matrices are combined within a non-convex optimization framework, which MPSSC solves iteratively with an embedded ADMM algorithm.
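
As a rough illustration of Step 1, the sketch below builds a single Gaussian-kernel similarity matrix on toy data, rescales it to be doubly stochastic with Sinkhorn-style alternating normalization, and forms the graph Laplacian. The toy data, the bandwidth heuristic, and the normalization scheme are assumptions made for illustration only; the repository's own construction may differ, and it combines several kernels rather than one.

```matlab
% Sketch only: toy data and all variable names are illustrative assumptions.
rng(0);                                   % reproducible toy data
X = [randn(5, 3); randn(5, 3) + 4];       % 10 "cells" x 3 "genes", two groups
n = size(X, 1);

% Gaussian kernel from squared Euclidean distances.
sq = sum(X.^2, 2);
D2 = max(bsxfun(@plus, sq, sq') - 2 * (X * X'), 0);
sigma = sqrt(median(D2(:)));              % assumed bandwidth heuristic
K = exp(-D2 / (2 * sigma^2));

% Sinkhorn-style scaling: alternately normalize rows and columns.
S = K;
for it = 1:500
    S = bsxfun(@rdivide, S, sum(S, 2));   % rows sum to one
    S = bsxfun(@rdivide, S, sum(S, 1));   % columns sum to one
end
S = (S + S') / 2;                         % remove residual asymmetry

% With unit row sums the degree matrix is the identity, so the
% normalized graph Laplacian reduces to I - S.
L = eye(n) - S;
```

Because every row of a doubly stochastic matrix sums to one, no separate degree normalization is needed before forming the Laplacian.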

Main functions

clus_fin_update.m : Main MPSSC algorithm (with similarity learning) consisting of the three steps.

clus_fin_update_no_learning.m : Main PSSC algorithm (without similarity learning).

func_doubly.m : Construct multiple doubly stochastic similarity matrices using Gaussian kernels (Step 1).

clus_sim_update2_2.m : Obtain the intermediate target matrix using an ADMM step (Step 2).

clus_sim_update0_3.m : Obtain the final target matrix using an ADMM step (Step 3).
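
Steps 2 and 3 rely on ADMM updates. A building block that commonly appears in ADMM for sparsity-inducing (L1) penalties is elementwise soft-thresholding. The sketch below shows that operator on a toy matrix. It is a generic illustration, and whether the update files above apply it in exactly this form is an assumption; the names soft, Z, and thr are hypothetical.

```matlab
% Sketch only: generic soft-thresholding, not code from this repository.
% soft(Z, t) is the elementwise proximal operator of t * ||Z||_1.
soft = @(Z, t) sign(Z) .* max(abs(Z) - t, 0);

Z   = [0.30 -0.05; 0.02 -0.40];   % toy dense matrix
thr = 0.1;                        % hypothetical threshold
P_sparse = soft(Z, thr);          % entries with |z| <= thr become zero
disp(P_sparse)
```

Entries whose magnitude is at most the threshold are set to zero, and the remaining entries are shrunk toward zero by the threshold, which is how an L1 penalty produces a sparse matrix.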

Example files

Please follow the links to reproduce the clustering results on real scRNA-seq data sets:

small-scale scRNA-seq data sets : Generate all the results related to the six small-scale scRNA-seq data sets.
large-scale scRNA-seq data sets : Generate all the results related to the three large-scale scRNA-seq data sets. These were run on a computing cluster (6 CPUs, 800 GB of memory).
generating all figures : Generate all figures presented in the manuscript.

Note: Most of the simulations and scRNA-seq applications were run on an Apple MacBook Pro (2.7 GHz, 8 GB of memory) using MATLAB 2016b. However, certain computationally or memory-intensive steps (e.g. the large-scale data sets) were run on a computing cluster (6 CPUs, 800 GB of memory).

% Example using the Deng data set:

clear all
addpath(genpath(pwd))


%% load data sets ('in_X' and 'true_labs')
load('Data_Deng.mat')

% Note: any data set can be used as long as it provides in_X and true_labs,
% where in_X is an n-by-p gene expression matrix and true_labs holds the
% ground-truth labels. Here n and p are the numbers of cells and genes,
% respectively.


%% Penalty parameters. We use the following specification:
rho=0.2; lam=0.0001; lam2=lam; eta=1; c=0.1;  


%% Run MPSSC and obtain the target matrix P
[P] = clus_fin_update(rho, lam, lam2, eta, c, in_X, true_labs); 

%% Obtain clustering labels "clus_labs" and compute NMI measure:
[NMI, ~,clus_labs,~]=calc2_nmis(max(true_labs), double(P),true_labs);   

%% Compute performance measures
% Compute Purity
Purity=purity(max(true_labs), clus_labs, true_labs)

% Compute ARI
ARI=RandIndex(clus_labs,true_labs)

%%% Final output: Performances (three measures) of MPSSC
[NMI, Purity, ARI]
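
For readers unfamiliar with the purity measure reported above, here is a minimal sketch of how it is commonly computed: each predicted cluster is assigned its majority true class, and purity is the fraction of cells that match that assignment. The toy labels and variable names are hypothetical, and the repository's purity function may be implemented differently.

```matlab
% Sketch only: toy labels, not data or code from this repository.
toy_true = [1 1 1 2 2 2 3 3]';    % hypothetical ground-truth labels
toy_clus = [2 2 1 1 1 1 3 3]';    % hypothetical clustering labels

% Contingency table: rows are predicted clusters, columns are true classes.
[~, ~, ci] = unique(toy_clus);
[~, ~, ti] = unique(toy_true);
tab = accumarray([ci, ti], 1);

% Each cluster contributes the size of its majority class.
purity_val = sum(max(tab, [], 2)) / numel(toy_true);
```

Purity ranges from 0 to 1 and does not depend on how cluster labels are numbered, which is why the toy clustering can use different label values from the ground truth.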

Example data sets

The 9 data sets are provided in the directory Data.

Specifically, Data_Deng.mat refers to http://science.sciencemag.org/content/343/6167/193.

Data_Ting.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/25242334.

Data_Treutlin.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/24739965.

Data_Ginhoux.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/26054720.

Data_Buettner.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/25599176.

Data_Pollen.mat refers to https://www.nature.com/articles/nbt.2967.

For the large-scale data sets, Data_Zeisel.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/25700174.

Data_Tasic.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/26727548.

Data_Macosko.mat refers to https://www.ncbi.nlm.nih.gov/pubmed/26000488.

DOWNLOAD

We provide MATLAB implementations of MPSSC in the MPSSC branch.

Authors

Seyoung Park and Hongyu Zhao

Department of Biostatistics, School of Public Health, Yale University

Contact

seyoung.park@yale.edu

License

This project is licensed under the MIT License.
