This repository contains the code of the paper
T. Fuchs, F. Kalinke, K. Böhm (2025): "Partial-Label Learning with a Reject Option", Transactions on Machine Learning Research
This document provides (1) an outline of the repository structure and (2) steps to reproduce the experiments including setting up a virtual environment.
- The folder `data` contains all datasets used within our work.
  - The subfolder `realworld-datasets` contains commonly used real-world datasets for partial-label learning, which are initially provided by Min-Ling Zhang.
  - The subfolder `ucipp` contains UCI datasets used in our controlled experiments. The files are provided by Luis Paulo on GitHub.
- The folder `experiments` contains the data to run all experiments. This directory is initially empty. Run `python script_create_data.py` to populate it.
- The folder `partial_label_learning` contains the code of the experiments.
  - The subfolder `related_work` contains all implementations of related-work algorithms and our method.
  - `config.py` contains configurations.
  - `data.py` contains utility methods to generate and load data.
  - `pll_classifier_base.py` is the base class for all our implementations.
  - `result.py` contains utility methods to save the experiments' results.
- The folder `plots` contains all the plots that appear in the paper.
- The folder `reference_models` contains source code for supervised models such as the LeNet architecture.
- The folder `results` contains the results of all experiments. This directory is initially empty. Run `python script_run_all.py` to populate it.
- The folder `saved_models` contains saved variational auto-encoders for the MNIST datasets, which are used by our nearest-neighbor method.

Additionally, there are the following files in the root directory:

- `.gitignore`
- `LICENSE` describes the repository's licensing.
- `README.md` is this document.
- `requirements.txt` lists all required `pip` packages for reproducibility.
- `script_create_data.py` is a Python script that creates all experimental data.
- `script_create_plots.py` is a Python script that creates all plots and tables in the paper.
- `script_run_all.py` runs all experimental configurations in the `experiments` folder.
- `script_train_vae.py` trains a variational auto-encoder on the MNIST training datasets.
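As a quick sanity check after cloning, the top-level layout described above can be verified with a short script. This is only a sketch; `missing_dirs` is a hypothetical helper and not part of the repository:

```python
from pathlib import Path

# Top-level folders listed above; `experiments` and `results` start out
# empty but should still exist before the scripts are run.
EXPECTED_DIRS = [
    "data", "experiments", "partial_label_learning", "plots",
    "reference_models", "results", "saved_models",
]

def missing_dirs(root: str = ".") -> list[str]:
    """Return the expected top-level folders that are absent under `root`."""
    root_path = Path(root)
    return [d for d in EXPECTED_DIRS if not (root_path / d).is_dir()]
```

Running `missing_dirs()` from the repository root should return an empty list once everything is in place.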
Before running scripts to reproduce the experiments, you need to set up an environment with all the necessary dependencies. Our code is implemented in Python (version 3.11.5; other versions, including lower ones, might also work).
We used virtualenv (version 20.24.3; other versions might also work) to create an environment for our experiments.
First, install the correct Python version yourself.
Next, install virtualenv:
| Linux + MacOS (bash-like) | Windows (PowerShell) |
| --- | --- |
| `python -m pip install virtualenv==20.24.3` | `python -m pip install virtualenv==20.24.3` |
To create a virtual environment for this project, you have to clone this repository first. Thereafter, change the working directory to this repository's root folder. Run the following commands to create the virtual environment and install all necessary dependencies:
| Linux + MacOS (bash-like) | Windows (PowerShell) |
| --- | --- |
| `python -m venv venv`<br>`source venv/bin/activate`<br>`python -m pip install --upgrade pip`<br>`python -m pip install -r requirements.txt` | `python -m venv venv`<br>`.\venv\Scripts\Activate.ps1`<br>`python -m pip install --upgrade pip`<br>`python -m pip install -r requirements.txt` |
Make sure that you created the virtual environment as stated above.
The script `script_create_data.py` creates all experimental settings, including the artificial noise.
The script `script_run_all.py` runs all the experiments.
Running all experiments takes two to three days on a system with 48 cores and one NVIDIA GeForce RTX 3090.
| Linux + MacOS (bash-like) | Windows (PowerShell) |
| --- | --- |
| `python script_create_data.py`<br>`python script_run_all.py` | `python script_create_data.py`<br>`python script_run_all.py` |
This creates `.parquet.gz` files in the `results` folder containing the results of all experiments.
The experiments' results are compressed `.parquet` files.
You can easily read any of them with pandas.
```python
import pandas as pd

results = pd.read_parquet("results/xyz.parquet.gz")
```

To obtain plots from the data, use the Python script `script_create_plots.py`.
Note that this script requires a working installation of LaTeX on your local system.
Use the following snippets to generate all plots in the paper.
Generating all of them takes about half an hour.
| Linux + MacOS (bash-like) | Windows (PowerShell) |
| --- | --- |
| `python script_create_plots.py` | `python script_create_plots.py` |