Sparse Projection Pursuit Analysis (SPPA)

Kurtosis-based projection pursuit analysis (PPA) was developed as an alternative exploratory data analysis algorithm. Variance- and distance-based methods such as PCA, HCA, and kNN look for informative projections or groupings of high-dimensional data. Ordinary PPA instead searches for interesting projections by optimizing (here, minimizing) the kurtosis of the projected data. However, if the sample-to-variable ratio is too low, ordinary PPA can "overmodel" the data by finding spurious combinations of the original variables that give a low kurtosis value. To overcome this, the data can be compressed with PCA before applying PPA, typically down to about a 10:1 sample-to-variable ratio. To make PPA independent of PCA, we have developed a sparse implementation of PPA (SPPA), in which subsets of the original variables are selected using a genetic algorithm. This repository contains MATLAB code for applying SPPA to high-dimensional data, examples of SPPA in use, and the corresponding paper published on SPPA. Below is a figure from our recent paper that shows the basic approach of the algorithm.
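To make the kurtosis criterion concrete, here is a minimal Python sketch (standard library only). It is not taken from SPPA.m. The function names are hypothetical, kurtosis is assumed to be the non-excess moment ratio m4/m2^2, and the data are synthetic. It scores projections that use only a subset of the original variables, which is the kind of sparse projection SPPA searches over. A projection that reveals two groups gives a low kurtosis (near 1), while a projection onto pure noise gives a value near 3.

```python
import random
import statistics

def kurtosis(scores):
    """Non-excess kurtosis m4 / m2**2 (about 3 for normal data, 1 for two equal point masses)."""
    mean = statistics.fmean(scores)
    m2 = statistics.fmean([(s - mean) ** 2 for s in scores])
    m4 = statistics.fmean([(s - mean) ** 4 for s in scores])
    return m4 / m2 ** 2

def sparse_projection(rows, subset, weights):
    """Score each sample on a unit-length projection vector that is non-zero only on `subset`."""
    norm = sum(w * w for w in weights) ** 0.5
    return [sum(w * row[j] for j, w in zip(subset, weights)) / norm for row in rows]

def max_subset_size(n_samples, ratio=10):
    """Hypothetical helper: largest subset keeping roughly `ratio` samples per variable."""
    return max(1, n_samples // ratio)

# Synthetic data: variable 0 separates two groups; variables 1-4 are pure noise.
rng = random.Random(0)
rows = []
for i in range(200):
    centre = 3.0 if i % 2 else -3.0
    rows.append([centre + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(4)])

k_cluster = kurtosis(sparse_projection(rows, [0], [1.0]))  # projection onto the group variable
k_noise = kurtosis(sparse_projection(rows, [2], [1.0]))    # projection onto a noise variable
print(f"cluster variable: {k_cluster:.2f}, noise variable: {k_noise:.2f}")
print("max subset size for 200 samples:", max_subset_size(200))
```

With many more variables than samples, a search over unrestricted weight vectors can also drive the kurtosis down using noise alone. Limiting each projection to a small subset of variables, for example about `max_subset_size(n)` of them under the assumed 10:1 rule, is one way to keep the search from overmodeling without first compressing the data with PCA.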

MATLAB function

SPPA.m is a MATLAB function to perform sparse kurtosis-based projection pursuit using a genetic algorithm.

Citing this algorithm

Please cite Sparse Projection Pursuit Analysis: An Alternative for Exploring Multivariate Chemical Data (2020).

Structure of this repository

The master branch of this repository contains the original SPPA code (version 1.0) implemented for the work published in Sparse Projection Pursuit Analysis: An Alternative for Exploring Multivariate Chemical Data (2020). If available, enhancements to the original code can be found in additional branches named with a corresponding version number.

Current branches

Master - Original SPPA code (version 1.0)
Version 1.1 - Improved selection of the initial population to ensure maximum coverage of the variables. If the population size is sufficient, each variable is selected at least n times, and the remaining individuals are selected at random without repetition. This is more equitable than the original version, which might exclude some variables and over-represent others.
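The version 1.1 initialization can be illustrated with the Python sketch below. It is a hypothetical reconstruction, not code from the version 1.1 branch. It assumes that each GA individual is a set of `subset_size` distinct variable indices and that coverage is obtained by cutting `min_count` random permutations of all variables into individuals. It also reads "without repetition" as no variable appearing twice within one individual.

```python
import math
import random

def initial_population(n_vars, subset_size, pop_size, min_count, rng=random):
    """Hypothetical coverage-first initial population for a variable-selection GA.

    Each individual is a sorted tuple of `subset_size` distinct variable indices.
    """
    if not 0 < subset_size <= n_vars:
        raise ValueError("subset_size must be between 1 and n_vars")
    population = []
    per_round = math.ceil(n_vars / subset_size)  # individuals needed to cover every variable once
    if pop_size >= min_count * per_round:  # population is large enough for full coverage
        for _ in range(min_count):
            order = list(range(n_vars))
            rng.shuffle(order)  # each round covers every variable exactly once
            for start in range(0, n_vars, subset_size):
                chunk = order[start:start + subset_size]
                if len(chunk) < subset_size:
                    # Top up a short final chunk with other, distinct variables.
                    others = [v for v in range(n_vars) if v not in chunk]
                    chunk += rng.sample(others, subset_size - len(chunk))
                population.append(tuple(sorted(chunk)))
    # Remaining individuals: random subsets with no repeated variable inside an individual.
    while len(population) < pop_size:
        population.append(tuple(sorted(rng.sample(range(n_vars), subset_size))))
    return population

pop = initial_population(n_vars=10, subset_size=3, pop_size=20, min_count=2, rng=random.Random(1))
counts = [sum(v in ind for ind in pop) for v in range(10)]
print(min(counts))  # at least 2
```

In this example, two coverage rounds of four individuals each use 8 of the 20 slots, and the other 12 individuals are drawn at random. Every one of the 10 variables therefore appears at least twice. If `pop_size` were smaller than `min_count * per_round`, this sketch would fall back to purely random individuals.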

Literature related to PPA

Literature related to SPPA

Sparse Projection Pursuit Analysis: An Alternative for Exploring Multivariate Chemical Data (2020)

Examples

To be completed. Please see demo.m for a quick demonstration of using SPPA to explore a salmon plasma nuclear magnetic resonance (NMR) spectroscopy data set.
