Feature Selection with Quantum Computing

This repository contains the source code for the article "Towards Feature Selection for Ranking and Classification Exploiting Quantum Annealers" published at SIGIR 2022. See the websites of our quantum computing group for more information on our teams and works.

Here we explain how to install the dependencies, set up the connection to the D-Wave Leap quantum cloud services, and run the experiments included in this repository.

If you want to cite us or use our repository, you can use the following BibTeX entry:

@inproceedings{DBLP:conf/sigir/DacremaMN0FC22,
  author    = {Maurizio {Ferrari Dacrema} and Fabio Moroni and Riccardo Nembrini and Nicola Ferro and Guglielmo Faggioli and Paolo Cremonesi},
  editor    = {Enrique Amig{\'{o}} and Pablo Castells and Julio Gonzalo and Ben Carterette and J. Shane Culpepper and Gabriella Kazai},
  title     = {Towards Feature Selection for Ranking and Classification Exploiting Quantum Annealers},
  booktitle = {{SIGIR} '22: The 45th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022},
  pages     = {2814--2824},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3477495.3531755},
  doi       = {10.1145/3477495.3531755},
}

Installation

NOTE: This repository requires Python 3.8 and has been developed for Linux.

We suggest installing all the required packages into a new Python environment. After checking out the repository, enter the repository folder and run the following commands to create a new environment:

If you're using conda:

conda create -n QFeatureSelection python=3.8 anaconda
conda activate QFeatureSelection

If you run the experiments from the terminal, it may be necessary to add this project to the PYTHONPATH environment variable:
export PYTHONPATH=$PYTHONPATH:/path/to/project/folder

Then, make sure the environment is activated and install all the required packages through pip:

pip install -r requirements.txt

D-Wave Setup

To use the D-Wave cloud services you must first sign up for D-Wave Leap and get your API token.

Then, you need to run the following command in the newly created Python environment:

dwave setup

This is a guided setup for the D-Wave Ocean SDK. When asked to select non-open-source packages to install, answer y and install at least D-Wave Drivers. The D-Wave Problem Inspector package is not required, but it can be useful for analysing the solutions of problems solved directly on the QPU.

Then, continue the configuration. You can set custom properties or keep the default ones, as we suggest. The only exception is the Authentication token field, where you should paste the API token obtained from the D-Wave Leap dashboard.

You should now be able to connect to D-Wave cloud services. In order to verify the connection, you can use the following command, which will send a test problem to D-Wave's QPU:

dwave ping

PyMIToolbox Setup

PyMIToolbox is a Python wrapper for the C library MIToolbox, which is used to compute mutual information.
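For readers unfamiliar with the quantity, the following is a minimal pure-Python sketch of the mutual information between two discrete variables, estimated from empirical frequencies. It is only an illustration and does not use PyMIToolbox or reproduce its API; the function name mutual_information and the choice of base-2 logarithms (bits) are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences.

    Illustrative helper only; it is not part of PyMIToolbox.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("sequences must not be empty")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))) with empirical frequencies
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


feature = [0, 0, 1, 1]
label = [0, 0, 1, 1]   # identical to the feature
noise = [0, 1, 0, 1]   # independent of the feature
print(mutual_information(feature, label))
print(mutual_information(feature, noise))
```

A feature that fully determines the label carries the maximum information about it, while an independent one carries none, which is why mutual information is a common relevance measure in feature selection.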

Download

To use PyMIToolbox you first need to download and compile the MIToolbox library in the PyMIToolbox directory. To download the MIToolbox source code, run the following commands:

cd PyMIToolbox/
wget https://github.com/Craigacp/MIToolbox/archive/refs/tags/v3.0.2.zip

Unzip the file with:

unzip v3.0.2.zip

and rename the extracted folder with:

mv MIToolbox-3.0.2 MIToolbox

Building the C library

Now, go into the MIToolbox directory and compile the C library. If you are on Linux or macOS run the following command:

cd MIToolbox/
make x64

On Windows, install MinGW, add the MinGW binaries to the PATH, and run:

make x64_win

This produces a compiled library file (.so on Linux/macOS, .dll on Windows) in the PyMIToolbox/MIToolbox/ folder. If you don't see the file, it may have been compiled to another directory and should be moved to the correct folder.
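As a quick sanity check after the build, a small script along these lines can list the shared libraries present in the folder. This is a hedged sketch: the helper find_compiled_libraries is hypothetical and not part of the repository, and it only looks at file extensions rather than at a specific library name.

```python
from pathlib import Path


def find_compiled_libraries(folder):
    """Return the .so and .dll files found directly inside `folder`.

    Hypothetical helper for illustration; returns an empty list if the
    folder does not exist.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix in {".so", ".dll"}
    )


# Run from the root folder of the project.
libraries = find_compiled_libraries("PyMIToolbox/MIToolbox")
if libraries:
    print("Found:", ", ".join(p.name for p in libraries))
else:
    print("No compiled library found; check the build output directory.")
```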

Running Classification Experiments

To run the experiments, enter the root folder of the project, activate the environment and run the following script:

conda activate QFeatureSelection
python run_feature_selection.py

This Python script will automatically download and split the datasets used in the experiments. The resulting splits are saved in the results_classification/[dataset_name]/data directory.

The script will then run all the experiments: the baselines and the QUBO methods, with both classical and quantum-based solvers. All the results will be saved in the results_classification/[dataset_name]/[method_name] directory.
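To give an idea of what a QUBO-based feature selection looks like, here is a minimal, generic sketch solved by brute force on three features. It is not the formulation implemented in this repository (see the article for that): the function names, the toy importance and redundancy values, and the penalty weight are all assumptions made for illustration. The diagonal rewards important features, the off-diagonal terms discourage redundant pairs, and a quadratic penalty pushes the number of selected features towards k.

```python
from itertools import product


def build_qubo(importance, redundancy, k, penalty):
    """Build a QUBO as a dict {(i, j): weight} with i <= j (to be minimised)."""
    n = len(importance)
    qubo = {}
    for i in range(n):
        # penalty * (sum(x) - k)^2 contributes (1 - 2k) to each x_i and
        # 2 to each pair x_i * x_j (the constant k^2 is dropped).
        qubo[(i, i)] = -importance[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            qubo[(i, j)] = redundancy[i][j] + 2 * penalty
    return qubo


def energy(qubo, x):
    """Value of the QUBO objective for a binary assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())


def brute_force_minimum(qubo, n):
    """Enumerate all 2^n assignments; only feasible for very small n."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(qubo, x))


# Toy values, made up for this example.
importance = [0.9, 0.8, 0.1]
redundancy = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
qubo = build_qubo(importance, redundancy, k=2, penalty=2.0)
selection = brute_force_minimum(qubo, n=3)
print(selection)
```

With these toy values the two most important features are highly redundant, so the minimum selects the first and third features instead. On realistic problem sizes exhaustive enumeration is infeasible, which is where classical heuristics and quantum annealers come in.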

NOTE: Running all the experiments requires a significant amount of QPU time and will exhaust all the free time given with the developer plan on D-Wave Leap. If the available time runs out, the experiments will produce errors or invalid selections. We suggest selecting a limited number of datasets at a time.

For each dataset the script will also generate a dataframe summarizing the results, and at the end of all experiments it will generate summary tables in LaTeX format.

Within each dataset folder, the file result_dataset_summary.csv will contain one row for each feature selection method and, for QUBO methods, for each QUBO solver. The row is the one with the best validation score across all the target numbers of features (i.e., k) for that experiment.
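The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the keys method, solver, k and validation_score are hypothetical and may not match the actual columns of result_dataset_summary.csv, and the rows are made-up toy values.

```python
def best_rows(rows):
    """For each (method, solver) pair, keep the row with the highest validation score."""
    best = {}
    for row in rows:
        key = (row["method"], row["solver"])
        if key not in best or row["validation_score"] > best[key]["validation_score"]:
            best[key] = row
    return list(best.values())


# Toy rows with hypothetical column names and made-up scores.
rows = [
    {"method": "method_a", "solver": "solver_x", "k": 5, "validation_score": 0.70},
    {"method": "method_a", "solver": "solver_x", "k": 10, "validation_score": 0.75},
    {"method": "method_a", "solver": "solver_y", "k": 5, "validation_score": 0.72},
    {"method": "method_a", "solver": "solver_y", "k": 10, "validation_score": 0.68},
]
for row in best_rows(rows):
    print(row["method"], row["solver"], row["k"], row["validation_score"])
```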

Running Ranking Experiments

To run the Ranking experiments on LETOR, you first need to download the datasets:

Download OHSUMED

Download MQ2007

Download MQ2008

After downloading, unzip them in the folder data/letor/ in order to have the structure data/letor/[dataset_name].
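Before launching the experiments, the expected layout can be checked with a short script like the one below. It is a sketch only: the helper missing_datasets is hypothetical, and the folder names OHSUMED, MQ2007 and MQ2008 are assumptions based on the dataset names above, so adjust them to whatever names the archives actually extract to.

```python
from pathlib import Path


def missing_datasets(root, dataset_names):
    """Return the dataset names that have no folder under `root`."""
    root = Path(root)
    return [name for name in dataset_names if not (root / name).is_dir()]


# Assumed folder names; run from the root folder of the project.
missing = missing_datasets("data/letor", ["OHSUMED", "MQ2007", "MQ2008"])
if missing:
    print("Missing dataset folders:", ", ".join(missing))
else:
    print("All dataset folders found.")
```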

RankLib

To run the Learning to Rank algorithm you need the RankLib library. The experiments use RankLib 2.17, which you can download here. Place the downloaded RankLib-2.17.jar file in the RankLib/ directory.

Running

To run the experiments, enter the root folder of the project, activate the environment and run the following script:

conda activate QFeatureSelection
python run_letor_feature_selection.py

The script will run all the experiments: the baselines and the QUBO methods, with both classical and quantum-based solvers. All the results will be saved in the results_ranking/[dataset_name]/[method_name] directory.

NOTE: Running all the experiments requires a significant amount of QPU time and will exhaust all the free time given with the developer plan on D-Wave Leap. If the available time runs out, the experiments will produce errors or invalid selections. We suggest selecting a limited number of datasets at a time.

For each dataset the script will also generate a dataframe summarizing the results. Within each dataset folder, the file result_dataset_summary.csv will contain all the feature selection information (slightly differently from the classification experiments).

The results of the ranking algorithm are instead saved in the results_ranking/processed/[dataset_name]_eval.csv files.

Note that the ranking experiments have been tested only on a Unix system with bash. Running them on Windows may require additional setup.
