Robust estimation of mean and covariance in high dimensions
Switch branches/tags
Nothing to show
Clone or download
Latest commit 13346a6 Mar 15, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
comparisonCode added folders Mar 15, 2018
filterCode added folders Mar 15, 2018
genomicData added colors back Mar 15, 2018
testCode added folders Mar 15, 2018
README.md Update README.md Mar 16, 2018

README.md

Being Robust (in High Dimensions) Can Be Practical

A MATLAB implementation of Being Robust (in High Dimensions) Can Be Practical from ICML 2017.

Prerequisites

This project requires installation of the following packages:

  • Fast SVD and PCA: Used for faster computation of SVD for some large matrices.
  • Novembre_etal_2008_misc: A repository containing data from Genes Mirror Geography in Europe. Used for our semi-synthetic experiments. Extract the following three files to robust-filter/genomicData:
    • POPRES_08_24_01.EuroThinFinal.LD_0.8.exLD.out0-PCA.eigs
    • POPRES_08_24_01.EuroThinFinal.LD_0.8.exLD.out0-PCA.eval
    • POPRESID_Color.txt

Optional

The following packages are optional, and are only used for comparison of our algorithm with alternative estimators:

Explanation of Files

This repository contains several algorithms -- we identify files relevant to our algorithm, and those which are only used for comparison with alternatives.

Our Filter algorithms' files

The following are files which are used in our algorithms' implementations, and can all be found in the filterCode subdirectory.

  • filterGaussianMean.m: Our algorithm for estimating the mean of a Gaussian
  • filterGaussianCov.m: Our algorithm for estimating the covariance of a Gaussian
    • findMaxPoly.m: Finds the structured degree-two polynomial which is maximized by the data
    • flatten.m and sharpen.m: Convert between matrix and vector representations
  • filterGaussianCovTuned.m: A version of our covariance estimation algorithm which is tuned to select hyperparameters
  • mahalanobis.m: Compute Mahalanobis rescaling of a matrix

Test files, for demonstrating performance of our estimators for mean and covariance. These can be found in the testCode subdirectory:

  • testGaussianMean.m: Tests our mean estimation algorithm
  • testGaussianCov.m: Tests our covariance estimation algorithm
  • testGenomicData.m: Tests our covariance estimation algorithm on semi-synthetic genome dataset

Other algorithms' files

The following are files which are used for comparison with other competing algorithms, and can be found in the comparisonCode subdirectory:

Algorithms for estimating the mean of a Gaussian:

  • ransacGaussianMean.m: A RANSAC-based method
  • geoMedianGaussianMean.m: Geometric median
  • pruneGaussianMean.m: Coordinate-wise median, followed by naive pruning of distant points

Algorithms for estimating the covariance of a Gaussian:

  • pruneGaussianCov.m: Naive pruning of distant points
  • ransacMVE.m: A RANSAC-based method
    • MVE.m: Approximating the MVE for a small dataset
  • ADPCP.m: Principal Component Pursuit by Alternating Directions, from Robust Principal Component Analysis?
    • specThresh.m and shrinkage.m: Singular value thresholding and shrinkage operators
    • norm_nuc.m: Compute the nuclear norm of a matrix

Comparison files, for evaluating and comparing the performance of estimators for mean and covariance:

  • testGeoMedian.m: Demonstrates that the geometric median incurs an O(sqrt(d)) loss in accuracy
  • compareMeanEstimators.m: Compares mean estimation algorithms
  • compareCovEstimators.m: Compares covariance estimation algorithms
  • compareGenomicData.m: Compares covariance estimation algorithms to semi-synthetic genome dataset

Reproducibility

Figures in the paper can be approximately reproduced by running the following scripts, which are in the comparisonCode subdirectory:

  • Figure 1: compareMeanEstimators.m
  • Figure 2: compareCovEstimators.m
  • Figures 3 and 4: compareGenomicData.m

Reference

This repository is an implementation of our paper Being Robust (in High Dimensions) Can Be Practical from ICML 2017, authored by Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart.

See also our original theory paper, Robust Estimators in High Dimensions without the Computational Intractability, which appeared in FOCS 2016.