Implementation of the kmeans SDP and experiments from the paper: Clustering subgaussian mixtures by semidefinite programming. Dustin Mixon, Soledad Villar, Rachel Ward. http://arxiv.org/abs/1602.06612
SDP clustering on MNIST data

*****

Authors: Dustin Mixon, Soledad Villar, Rachel Ward.
Contact: firstname.lastname@example.org

*****

This code runs Peng and Wei's kmeans SDP [5] (using SDPNAL+ [6]) and Matlab's built-in kmeans++ version of Lloyd's algorithm on MNIST data [3], previously preprocessed using TensorFlow [1] (mapped into feature space and saved in './data/data_features.mat'). It prints a numerical comparison and produces the graphs in [4].

The python script generate_data.py uses TensorFlow to create the feature data and saves it in './data/data_features.mat'. Running the python code is not necessary for this experiment since the data is already included, but we include it for completeness.

*****

Kmeans SDP implementation

The kmeans semidefinite relaxation has several advantages over Lloyd's algorithm and kmeans++:

- It is deterministic.
- It always converges to the global minimizer of the kmeans SDP objective (kmeans++ often gets stuck in local optimizers).
- The kmeans SDP recovers the kmeans-optimal clustering, with high probability, for data drawn from the stochastic ball model [2].
- It produces an approximately optimal solution for points drawn from separated subgaussian mixtures [4].
- If P is a matrix whose columns are the coordinates of the data points to cluster, and X is the solution of the kmeans SDP, then P*X is a 'denoised' version of the data points. In fact, we observe empirically that, even when the relaxation is not tight, X has many repeated columns.
- Dual certificates of this SDP can be leveraged to implement a fast certifier of kmeans optimality of clusterings found by faster algorithms (see [2]).

*****

Requires CVX [7] and SDPNAL+ [6]. Note: CVX is only required for running misclassification.m, which is not critical for this experiment.

*****

How to run

Run main.m (in Matlab).

Use kmeans_sdp.m to solve the kmeans SDP optimization problem for a given set of points and number of clusters.

*****

References:

[1] Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems.
[2] Iguchi, Mixon, Peterson, Villar. Probably certifiably correct k-means clustering.
[3] LeCun, Cortes. MNIST handwritten digit database.
[4] Mixon, Villar, Ward. Clustering subgaussian mixtures by semidefinite programming.
[5] Peng, Wei. Approximating k-means-type clustering via semidefinite programming.
[6] Yang, Sun, Toh. SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints.
[7] CVX Research, Inc. CVX: Matlab software for disciplined convex programming.
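*****

Appendix: a small illustration of the relaxation and the 'denoising' remark above. This is a NumPy sketch on hypothetical synthetic 2-D data (not the repo's code or its MNIST features): for a fixed clustering, the standard "indicator" matrix X is feasible for the Peng-Wei SDP constraints, its objective (up to the scaling used here) equals the kmeans objective, and P @ X replaces each point by its cluster centroid.

```python
import numpy as np

# Hypothetical toy data (an assumption, not the repo's data):
# two well-separated planar clusters of 5 points each.
rng = np.random.default_rng(0)
P = np.hstack([rng.normal(0, 0.1, (2, 5)),          # cluster 0 near (0, 0)
               rng.normal(0, 0.1, (2, 5)) + 5.0])   # cluster 1 near (5, 5)
n, k = P.shape[1], 2
labels = np.repeat([0, 1], 5)

# Squared-distance matrix D_ij = ||p_i - p_j||^2.
sq = (P * P).sum(axis=0)
D = sq[:, None] + sq[None, :] - 2 * P.T @ P

# Indicator feasible point of the SDP for this clustering:
# X = sum_t (1/|A_t|) 1_{A_t} 1_{A_t}^T.
X = np.zeros((n, n))
for t in range(k):
    idx = (labels == t)
    X[np.ix_(idx, idx)] = 1.0 / idx.sum()

# SDP constraints hold: X >= 0 entrywise, X 1 = 1, tr(X) = k (X is also psd).
assert np.all(X >= 0)
assert np.allclose(X @ np.ones(n), 1)
assert np.isclose(np.trace(X), k)

# (1/2) tr(D X) equals the kmeans objective of this clustering ...
sdp_obj = 0.5 * np.trace(D @ X)
centroids = np.stack([P[:, labels == t].mean(axis=1) for t in range(k)], axis=1)
kmeans_obj = sum(((P[:, i] - centroids[:, labels[i]]) ** 2).sum()
                 for i in range(n))
assert np.isclose(sdp_obj, kmeans_obj)

# ... and P @ X maps every point to its cluster centroid: the 'denoised' data.
assert np.allclose(P @ X, centroids[:, labels])
print("indicator X is SDP-feasible; objective matches kmeans; P@X denoises")
```

The repo's kmeans_sdp.m instead solves the relaxation over all feasible X with SDPNAL+; the point of the sketch is only that clustering solutions sit inside the SDP's feasible set, which is why a tight relaxation recovers them.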