This repository provides implementation of the Fourier-based feature selection algorithms — Fourier-Orth or UFFS and SFFS proposed in the following papers:
-
Finding Relevant Information via a Discrete Fourier Expansion
Mohsen Heidari, Jithin K. Sreedharan, Gil Shamir, and Wojciech Szpankowski
International Conference on Machine Learning (ICML), 2021 -
Information Sufficiency via Fourier Expansion
Mohsen Heidari, Jithin K. Sreedharan, Gil Shamir, and Wojciech Szpankowski
IEEE International Symposium on Information Theory (ISIT), 2021
- Python 3.5+ with NumPy, SciPy, scikit-learn, Cython support.
- GCC/Clang
Compile the Cython files associated to Fourier-Orth and SFFS algorithms from the src
folder
python setup.py build_ext --inplace
The following function shows how to use the UFFS (Fourier-Orth) and SFFS in a sequence.
import fourier_learning
import numpy as np
def SFFS(X, y, t, k, fourier_orth_params):
"""Perform feature selection based on SFFS algorithm (Algorithm 1 in the paper)
Args:
X: Input data; NumPy 2-dimensional array of size (no. of data samples, no. of features)
y: Output data; Numpy 1-dimensional array of size no. of data samples
t: Depth parameter of SFFS; int
k: No. of desired features after feature selection; int
fourier_orth_params: Parameters of Fourier-Orth (Procedure 1) algorithm; dictionary
Returns:
features_selected: Selected feature indices; list of integers.
"""
# Perform Fourier-Orth orthogonalization
# This step is not necessary and can be avoided.
# We can run SFFS directly without Fourier-Orth step
fourier_orth_options = fourier_learning.OptionsUnsupervisedFourierFS(**fourier_orth_params)
sel_features_fourier_orth = fourier_learning.UnsupervisedFourierFS(X, fourier_orth_options)
# Fourier-Orth can be considered as an unsupervised feature selection algorithm
# It act as a preprocessing step before applying SFFS
X = X[:, sel_features_fourier_orth]
mean_emp = np.mean(X, axis=0)
std_emp = np.std(X, ddof=1, axis=0)
# Perform SFFS
fourier_featsel = fourier_learning.SupervisedFourierFS(
k, mean_emp, std_emp, approx="depth_based", depth=DEPTH_T_SFFS)
feats_selected = fourier_featsel.fit(X, y)
return features_selected[:k]
Parameters of the Fourier-Orth procedure are:
max_depth
(int): Depth parameter of Fourier-Orth procedure. We perform Fourier-Orth in a sequential way —- the procedure is first run with depth 1, then the training set is filtered with the selected features from depth 1 and is passed to depth 2 of the procedure. This process is repeated until themax_depth
is reached.cluster_sizes
(list of ints): At each depth, to reduce the demand for high computational resources, we partition the features into batches, run Fouier-Orth on each batch and finally combine the results. This parameter specifies the sizes of clusters at each depthnorm_epsilon
(list of floats): Thresholdepsilon
at step 9 in the procedure. The parameter listsepsilon
at different depths.selection_thresholds
(list of floats): Equivalent to thresholdepsilon
at step 15, for each depth. We either useselection_thresholds
as it is when employing a technique fof explained variance or 1-selection_thresholds
if we use exact step 15.
An example is given below:
fourier_orth_params = {
"max_depth": 3,
"cluster_sizes": [-1, 50, 31],
"selection_thresholds": [0.95, 0.995, 0.999],
"norm_epsilon": [0.001, 0.001, 0.0001]
}
If you find our algorithms useful, please cite the following the papers:
@InProceedings{sfs_icml21,
title = {Finding Relevant Information via a Discrete Fourier Expansion},
author = {Mohsen Heidari and Jithin K. Sreedharan and Gil Shamir and Wojciech Szpankowski},
booktitle = {Proceedings of the 38th International Conference on Machine Learning},
year = {2021},
series = {Proceedings of Machine Learning Research},
publisher = {PMLR},
}
@InProceedings{uffs_isit21,
title = {Information Sufficiency via Fourier Expansion},
author = {Mohsen Heidari and Jithin K. Sreedharan and Gil Shamir and Wojciech Szpankowski},
booktitle = {Proc. IEEE Int. Symp. Information Theory (ISIT)},
year = {2021},
}