Unsupervised and Supervised Fourier Feature Selection (UFFS and SFFS)

This repository provides implementation of the Fourier-based feature selection algorithms — Fourier-Orth or UFFS and SFFS proposed in the following papers:

Finding Relevant Information via a Discrete Fourier Expansion
Mohsen Heidari, Jithin K. Sreedharan, Gil Shamir, and Wojciech Szpankowski
International Conference on Machine Learning (ICML), 2021
Information Sufficiency via Fourier Expansion
Mohsen Heidari, Jithin K. Sreedharan, Gil Shamir, and Wojciech Szpankowski
IEEE International Symposium on Information Theory (ISIT), 2021

Prerequisites

Python 3.5+ with NumPy, SciPy, scikit-learn, Cython support.
GCC/Clang

Installation

Compile the Cython files associated to Fourier-Orth and SFFS algorithms from the src folder

python setup.py build_ext --inplace

Using the Fourier-based feature selection algorithms

The following function shows how to use the UFFS (Fourier-Orth) and SFFS in a sequence.

import fourier_learning
import numpy as np

def SFFS(X, y, t, k, fourier_orth_params):
    """Perform feature selection based on SFFS algorithm (Algorithm 1 in the paper)

    Args:
        X: Input data; NumPy 2-dimensional array of size (no. of data samples, no. of features)
        y: Output data; Numpy 1-dimensional array of size no. of data samples
        t: Depth parameter of SFFS; int
        k: No. of desired features after feature selection; int
        fourier_orth_params: Parameters of Fourier-Orth (Procedure 1) algorithm; dictionary

    Returns:
        features_selected: Selected feature indices; list of integers.
    """
    # Perform Fourier-Orth orthogonalization
    # This step is not necessary and can be avoided. 
    # We can run SFFS directly without Fourier-Orth step
    fourier_orth_options = fourier_learning.OptionsUnsupervisedFourierFS(**fourier_orth_params)
    sel_features_fourier_orth = fourier_learning.UnsupervisedFourierFS(X, fourier_orth_options)

    # Fourier-Orth can be considered as an unsupervised feature selection algorithm
    # It act as a preprocessing step before applying SFFS
    X = X[:, sel_features_fourier_orth]
    mean_emp = np.mean(X, axis=0)
    std_emp = np.std(X, ddof=1, axis=0)

    # Perform SFFS
    fourier_featsel = fourier_learning.SupervisedFourierFS(
        k, mean_emp, std_emp, approx="depth_based", depth=DEPTH_T_SFFS)
    feats_selected = fourier_featsel.fit(X, y)

    return features_selected[:k]

Parameters of the Fourier-Orth procedure are:

max_depth (int): Depth parameter of Fourier-Orth procedure. We perform Fourier-Orth in a sequential way —- the procedure is first run with depth 1, then the training set is filtered with the selected features from depth 1 and is passed to depth 2 of the procedure. This process is repeated until the max_depth is reached.
cluster_sizes (list of ints): At each depth, to reduce the demand for high computational resources, we partition the features into batches, run Fouier-Orth on each batch and finally combine the results. This parameter specifies the sizes of clusters at each depth
norm_epsilon (list of floats): Threshold epsilon at step 9 in the procedure. The parameter lists epsilon at different depths.
selection_thresholds (list of floats): Equivalent to threshold epsilon at step 15, for each depth. We either use selection_thresholds as it is when employing a technique fof explained variance or 1-selection_thresholds if we use exact step 15.

An example is given below:

fourier_orth_params = {
    "max_depth": 3,
    "cluster_sizes": [-1, 50, 31],
    "selection_thresholds": [0.95, 0.995, 0.999],
    "norm_epsilon": [0.001, 0.001, 0.0001]
}

Citing the work

If you find our algorithms useful, please cite the following the papers:

@InProceedings{sfs_icml21,
  title     = {Finding Relevant Information via a Discrete Fourier Expansion},
  author    =  {Mohsen Heidari and Jithin K. Sreedharan and Gil Shamir and Wojciech Szpankowski},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  year      = {2021},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}

@InProceedings{uffs_isit21,
  title     = {Information Sufficiency via Fourier Expansion},
  author    = {Mohsen Heidari and Jithin K. Sreedharan and Gil Shamir and Wojciech Szpankowski},
  booktitle = {Proc. IEEE Int. Symp. Information Theory (ISIT)},
  year      = {2021},
}

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src

src

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Repository files navigation

Unsupervised and Supervised Fourier Feature Selection (UFFS and SFFS)

Prerequisites

Installation

Using the Fourier-based feature selection algorithms

Citing the work

About

Releases

Packages

Languages

License

jithin-k-sreedharan/Fourier_feature_selection

Folders and files

Latest commit

History

Repository files navigation

Unsupervised and Supervised Fourier Feature Selection (UFFS and SFFS)

Prerequisites

Installation

Using the Fourier-based feature selection algorithms

Citing the work

About

Resources

License

Stars

Watchers

Forks

Languages