Coresets for Archetypal Analysis

This repository contains the source code of the paper Coresets for Archetypal Analysis.

Abstract

Archetypal analysis (AA) represents instances as linear mixtures of prototypes (the archetypes) that lie on the boundary of the convex hull of the data. Archetypes are thus often better interpretable than factors computed by other matrix factorization techniques. However, the interpretability comes with high computational cost due to additional convexity-preserving constraints. In this paper, we propose efficient coresets for archetypal analysis. Theoretical guarantees are derived by showing that quantization errors of k-means upper bound archetypal analysis; the computation of a provable absolute-coreset can be performed in only two passes over the data. Empirically, we show that the coresets lead to improved performance on several data sets.

[Figure: visualization of the approach]

Prerequisites

Optionally, build the nnls module. It uses a higher maximum number of iterations, which improves the stability of the non-negative least squares solver that Archetypal Analysis relies on.

$ bash build_nnls.sh
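
For context on why the iteration limit matters, the following is a minimal sketch of how a non-negative least squares solver can be used to compute the convex weights of one data point with respect to a set of archetypes. It is not the repository's code: it uses SciPy's stock `scipy.optimize.nnls` rather than the custom-built module, and the function name `simplex_weights`, the `penalty` value, and the toy data are assumptions made for illustration. The sum-to-one constraint is only enforced approximately, by appending a heavily weighted row to the system.

```python
import numpy as np
from scipy.optimize import nnls


def simplex_weights(Z, x, penalty=200.0, maxiter=None):
    """Approximate argmin_a ||x - Z.T @ a||_2 subject to a >= 0, sum(a) = 1.

    Z is a (k, d) array of archetypes and x is a (d,) data point.
    `penalty` is a hypothetical tuning constant, not a value from the paper.
    """
    k = Z.shape[0]
    # Append a heavily weighted row of ones so that sum(a) is pushed towards 1.
    A = np.vstack([Z.T, penalty * np.ones((1, k))])
    b = np.concatenate([x, [penalty]])
    # maxiter=None lets SciPy pick its default iteration limit.
    a, _residual = nnls(A, b, maxiter=maxiter)
    return a


# Toy example: three archetypes spanning a triangle in the plane.
Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = np.array([0.25, 0.25])
a = simplex_weights(Z, x, maxiter=1000)
print(a)
```

If the solver stops at its iteration limit before converging, the returned weights may be inaccurate, which is presumably the instability the custom build is meant to reduce.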

The code was tested with the following versions:

  • python 3.7.3
  • numpy 1.16.4
  • scipy 1.3.1
  • sklearn 0.21.2
  • ray 0.7.3

In the paper we used four datasets (ijcnn1, pose, song, and covertype), which are not included in this repository.

You have to download them yourself and specify their location in experiment_settings.py.

Usage

The file example.py shows how to run Archetypal Analysis on the full data set as well as on a coreset and compares the results.
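
As a rough orientation, a coreset in this setting is a small weighted subsample of the data. The sketch below is not the contents of example.py. It assumes a sampling distribution proportional to each point's squared distance to the data mean, which needs two passes over the data (one for the mean, one for the distances); the distribution used in the paper and in this repository may differ. The function name `sample_coreset` and the synthetic data are hypothetical.

```python
import numpy as np


def sample_coreset(X, m, seed=0):
    """Draw m weighted points from X by importance sampling.

    Assumption: sampling probability is proportional to the squared
    distance to the data mean. Returns the sampled points and their weights.
    """
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    mean = X.mean(axis=0)                    # pass 1: data mean
    dist2 = ((X - mean) ** 2).sum(axis=1)    # pass 2: squared distances
    q = dist2 / dist2.sum()                  # sampling distribution
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


# Synthetic data for illustration only.
X = np.random.RandomState(1).normal(size=(1000, 5))
C, w = sample_coreset(X, m=100)
print(C.shape, w.shape)
```

Archetypal Analysis would then be run on `C` with the weights `w` taken into account in the objective, instead of on the full matrix `X`.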

To run experiments similar to those in the paper, use

$ python3 run_experiment.py NAME_OF_DATASET

To replicate the experiments from the paper, run

$ python3 run_experiment.py ijcnn1
$ python3 run_experiment.py pose
$ python3 run_experiment.py song
$ python3 run_experiment.py covertype
