Being Robust (in High Dimensions) Can Be Practical
A MATLAB implementation of Being Robust (in High Dimensions) Can Be Practical from ICML 2017.
This project requires installation of the following packages:
- Fast SVD and PCA: Used for faster computation of SVD for some large matrices.
- Novembre_etal_2008_misc: A repository containing data from Genes Mirror Geography in Europe. Used for our semi-synthetic experiments. Extract the following three files to
The following packages are optional, and are only used for comparison of our algorithm with alternative estimators:
- AgnosticMeanAndCovarianceCode: An implementation of the algorithms from the paper Agnostic Estimation of Mean and Covariance from FOCS 2016, authored by Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Used to compare the statistical accuracy of our algorithms against theirs.
- CVX: A MATLAB library for convex optimization. Used for implementing methods from Robust Principal Component Analysis? and Robust PCA via Outlier Pursuit.
Explanation of Files
This repository contains several algorithms -- we identify files relevant to our algorithm, and those which are only used for comparison with alternatives.
Our Filter algorithms' files
The following are files which are used in our algorithms' implementations, and can all be found in the
filterGaussianMean.m: Our algorithm for estimating the mean of a Gaussian
filterGaussianCov.m: Our algorithm for estimating the covariance of a Gaussian
findMaxPoly.m: Finds the structured degree-two polynomial which is maximized by the data
sharpen.m: Convert between matrix and vector representations
filterGaussianCovTuned.m: A version of our covariance estimation algorithm which is tuned to select hyperparameters
mahalanobis.m: Compute Mahalanobis rescaling of a matrix
Test files, for demonstrating performance of our estimators for mean and covariance. These can be found in the
testGaussianMean.m: Tests our mean estimation algorithm
testGaussianCov.m: Tests our covariance estimation algorithm
testGenomicData.m: Tests our covariance estimation algorithm on semi-synthetic genome dataset
Other algorithms' files
The following are files which are used for comparison with other competing algorithms, and can be found in the
Algorithms for estimating the mean of a Gaussian:
ransacGaussianMean.m: A RANSAC-based method
geoMedianGaussianMean.m: Geometric median
pruneGaussianMean.m: Coordinate-wise median, followed by naive pruning of distant points
Algorithms for estimating the covariance of a Gaussian:
pruneGaussianCov.m: Naive pruning of distant points
ransacMVE.m: A RANSAC-based method
MVE.m: Approximating the MVE for a small dataset
ADPCP.m: Principal Component Pursuit by Alternating Directions, from Robust Principal Component Analysis?
shrinkage.m: Singular value thresholding and shrinkage operators
norm_nuc.m: Compute the nuclear norm of a matrix
Comparison files, for evaluating and comparing the performance of estimators for mean and covariance:
testGeoMedian.m: Demonstrates that the geometric median incurs an O(sqrt(d)) loss in accuracy
compareMeanEstimators.m: Compares mean estimation algorithms
compareCovEstimators.m: Compares covariance estimation algorithms
compareGenomicData.m: Compares covariance estimation algorithms to semi-synthetic genome dataset
Figures in the paper can be approximately reproduced by running the following scripts, which are in the
- Figure 1:
- Figure 2:
- Figures 3 and 4:
This repository is an implementation of our paper Being Robust (in High Dimensions) Can Be Practical from ICML 2017, authored by Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart.
See also our original theory paper, Robust Estimators in High Dimensions without the Computational Intractability, which appeared in FOCS 2016.