iFlipper: Label Flipping for Individual Fairness

As machine learning becomes prevalent, mitigating any unfairness present in the training data becomes critical. Among the various notions of fairness, this paper focuses on the well-known individual fairness, which states that similar individuals should be treated similarly. While individual fairness can be improved when training a model (in-processing), we contend that fixing the data before model training (pre-processing) is a more fundamental solution. In particular, we show that label flipping is an effective pre-processing technique for improving individual fairness.
Our system iFlipper solves the optimization problem of minimally flipping labels given a limit to the individual fairness violations, where a violation occurs when two similar examples in the training data have different labels. We first prove that the problem is NP-hard. We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips. We also propose techniques for bringing the linear programming solution closer to optimal without exceeding the violations limit. Experiments on real datasets show that iFlipper significantly outperforms other pre-processing baselines in terms of individual fairness and accuracy on unseen test sets. In addition, iFlipper can be combined with in-processing techniques for even better results.
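
To make the problem statement above concrete, here is a minimal sketch on toy data. It counts violations as similar pairs with different labels and then finds the fewest flips by exhaustive search. This is a brute-force illustration, not iFlipper's linear programming algorithm; the function names, the pair-list representation, and the data are all hypothetical.

```python
from itertools import product

def count_violations(labels, similar_pairs):
    """Count similar pairs whose two examples carry different labels."""
    return sum(1 for i, j in similar_pairs if labels[i] != labels[j])

def min_flips_brute_force(labels, similar_pairs, max_violations):
    """Try every binary labeling and keep the one with the fewest flips
    that stays within the violation limit. Exponential in len(labels)."""
    best = None
    for candidate in product((0, 1), repeat=len(labels)):
        if count_violations(candidate, similar_pairs) > max_violations:
            continue
        flips = sum(a != b for a, b in zip(labels, candidate))
        if best is None or flips < best[0]:
            best = (flips, list(candidate))
    return best

# Toy data (hypothetical): 4 examples, similar pairs form a chain.
labels = [1, 0, 1, 1]
similar_pairs = [(0, 1), (1, 2), (2, 3)]

print(count_violations(labels, similar_pairs))          # 2
print(min_flips_brute_force(labels, similar_pairs, 0))  # (1, [1, 1, 1, 1])
```

The search visits all 2^n labelings, so it is only usable on a handful of examples, which is one way to see why an approximate method is needed for real datasets.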

Setup

Requirements

Create a conda environment (python=3.8.11) and install the dependencies by running setup.sh.

Manual installation

You can also install the packages manually with pip and conda.

conda install jupyter
conda install scikit-learn
conda install -c conda-forge aif360
pip install FALCONN
conda install pytorch==1.12.1 -c pytorch

The expected package versions are listed in requirements.txt.
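
As a quick sanity check after installation, a small script along these lines prints the installed version of each package so it can be compared against requirements.txt by hand. It is a sketch only: the distribution names are assumed from the install commands above (in particular, PyTorch is assumed to be registered as "torch"), and the helper name is hypothetical.

```python
from importlib import metadata

# Distribution names assumed from the install commands above;
# "torch" is the name PyTorch is assumed to register under.
PACKAGES = ["jupyter", "scikit-learn", "aif360", "FALCONN", "torch"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```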

License for Optimization Solver

The MOSEK and CPLEX optimization packages are both free for students and academics. To install them, follow the guidelines on each package's page below.

# MOSEK
https://www.mosek.com/products/academic-licenses/
https://www.mosek.com/downloads/

# CPLEX
https://community.ibm.com/community/user/datascience/blogs/xavier-nodet1/2020/07/09/cplex-free-for-students

Datasets

Download and pre-process the datasets using IBM’s AI Fairness 360 toolkit: https://github.com/Trusted-AI/AIF360.

Hardware

Most of the experiments are CPU-intensive; only a small portion (the gradient method) uses PyTorch, which is more GPU-intensive. The reported runtimes were measured on:

CPU: Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
GPU: NVIDIA TITAN Xp × 2

Using iFlipper

Code description

The paper consists of four main experiments, each with its dedicated section. The script for each experiment is listed below next to its section number in the paper.

4.3.1: baseline.py
4.3.2: runtime.py
4.4: solution.py
4.5: ablation.py

Run any of these Python files with the --help argument to see its detailed usage.

All-In-One script

You can run bash run.sh to generate all figures from the paper. Results are saved in results/. The runtime of each .py file is recorded in results/out.txt, which run.sh generates.

Disclaimer

Runtime

Solving an Integer Linear Programming (ILP) problem to optimality can be time-consuming, particularly with the Euclidean distance measure. In such cases, it is not uncommon for the ILP solver to run for several days before reaching the optimal solution.
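
For intuition about why the integer version is slow, the problem can be written roughly as follows. The notation is assumed for illustration and may differ from the paper's: $y'_i$ are the original labels, $y_i$ the flipped labels, $S$ the set of similar pairs, and $m$ the violations limit.

```latex
\begin{aligned}
\min_{y \in \{0,1\}^n} \quad & \sum_{i=1}^{n} \lvert y_i - y'_i \rvert \\
\text{s.t.} \quad & \sum_{(i,j) \in S} \lvert y_i - y_j \rvert \le m
\end{aligned}
```

The constraint $y \in \{0,1\}^n$ is what makes the problem combinatorial. A standard linear programming relaxation replaces it with $y \in [0,1]^n$, which is far cheaper to solve but may return fractional labels that must then be converted back to binary values.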

Loading pre-trained model

The baselines in the paper ([1], [2]) also take a significant amount of time to train for comparison with iFlipper, so we have saved the trained models in the baselines/ directory. You can either use the pre-trained models or train them yourself by setting the --load_LFR and --load_iFair options as needed.

Possible errors

If you see the error message "Could not load the Qt platform plugin 'xcb'" on Linux, update the system's packages and install the necessary dependencies:

sudo apt-get update
sudo apt-get install -y qt5-default libxcb-xinerama0-dev libxcb-xinerama0 libxkbcommon-x11-0

References

[1]: Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013, May). Learning fair representations. In International Conference on Machine Learning (pp. 325-333). PMLR.

[2]: Lahoti, P., Gummadi, K. P., & Weikum, G. (2019, April). iFair: Learning individually fair data representations for algorithmic decision making. In 2019 IEEE 35th International Conference on Data Engineering (ICDE) (pp. 1334-1345). IEEE.
