Existing work on fairness modeling commonly assumes that sensitive attributes are fully available for all instances, which may not hold in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, it becomes necessary to manually annotate the sensitive attributes of a subset of the training data for bias mitigation. However, selecting appropriate instances for annotation is nontrivial: skewed distributions across sensitive groups can lead to a sub-optimal solution that still preserves discrimination. In this work, we propose APOD, an end-to-end framework that actively selects a small portion of representative instances for annotation and maximally mitigates algorithmic bias with limited annotated sensitive information.
An example of a binary classification task (positive class denoted as gray + and •, negative class as red + and •) with two sensitive groups is shown in the following figure. In the left figure, the positive instances (gray +) are significantly fewer than the negative instances (red +) in group 0, which leads to a classification boundary that deviates from the perfectly fair boundary. An intuitive way to annotate sensitive attributes is random selection. However, randomly selected instances follow the same skewed distribution across sensitive groups, so the bias is still preserved in the classification model, as shown in the middle figure.
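To make the point about random selection concrete, the toy sketch below (our assumption, not code from this repo) draws a population whose sensitive attribute is skewed 10/90 across the two groups, then annotates a uniformly random subset. The annotated subset reproduces roughly the same skew, so the minority group remains under-represented:

```python
import numpy as np

rng = np.random.default_rng(42)

# Population with a skewed sensitive attribute: ~10% group 0, ~90% group 1.
a = rng.choice([0, 1], size=2000, p=[0.1, 0.9])

# Random annotation: pick 200 instances uniformly at random.
picked = rng.choice(len(a), size=200, replace=False)
frac_group0 = (a[picked] == 0).mean()

# frac_group0 stays close to the population's 0.1 skew, so random
# annotation does not help rebalance the sensitive groups.
print(frac_group0)
```

This is why APOD selects instances actively rather than uniformly at random.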
As shown in the following figure, APOD integrates penalization of discrimination (POD) and active instance selection (AIS) in a unified and iterative framework. Specifically, in each iteration, POD focuses on debiasing the classifier f on the partially annotated dataset (x, y, a) ∈ S and (x, y) ∈ U, while AIS selects the optimal instance (x*, y*) from the unannotated dataset U that best promotes bias mitigation. The sensitive attribute of the selected instance is then annotated by human experts: (x*, y*) → (x*, y*, a*). After that, the instance is moved from the unannotated dataset, U ← U \ {(x*, y*)}, to the annotated dataset, S ← S ∪ {(x*, y*, a*)}, for debiasing the model in the next iteration.
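The iterative loop described above can be sketched as follows. This is a minimal illustration, not the repo's implementation: `train_fn`, `score_fn`, and `oracle` are hypothetical placeholders standing in for POD debiasing, AIS scoring, and the human annotator, respectively.

```python
def apod_loop(U, oracle, n_rounds, train_fn, score_fn):
    """Sketch of the APOD annotate-and-debias loop.

    U        : list of (x, y) pairs without sensitive attributes
    oracle   : callable (x, y) -> a, stands in for human annotation
    train_fn : callable (S, U) -> model, stands in for POD debiasing
    score_fn : callable (model, x, y) -> float, stands in for AIS scoring
    """
    U = list(U)
    S = []                        # annotated dataset of (x, y, a) triples
    model = train_fn(S, U)        # initial (possibly biased) classifier
    for _ in range(n_rounds):
        if not U:
            break
        # AIS: select the unannotated instance with the highest score.
        x_star, y_star = max(U, key=lambda xy: score_fn(model, *xy))
        a_star = oracle(x_star, y_star)        # human annotation step
        U.remove((x_star, y_star))             # U <- U \ {(x*, y*)}
        S.append((x_star, y_star, a_star))     # S <- S ∪ {(x*, y*, a*)}
        model = train_fn(S, U)                 # POD: debias with new S
    return model, S, U

# Toy usage with trivial placeholders (larger x scores higher):
U0 = [(i, i % 2) for i in range(10)]
model, S, U = apod_loop(
    U0,
    oracle=lambda x, y: x % 2,       # toy "sensitive attribute"
    n_rounds=3,
    train_fn=lambda S, U: None,      # placeholder "model"
    score_fn=lambda m, x, y: x,      # toy acquisition score
)
# S == [(9, 1, 1), (8, 0, 0), (7, 1, 1)]
```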
torch >= 1.9.0
scikit-learn >= 0.24.2
bash script/apd/medical.sh
bash script/fal/medical_fal.sh
bash script/DRO/medical_DRO.sh
bash script/lff/medical_lff.sh
cd test_script
python apd_test_eop.py
python fal_test.py
python lff_test.py
cd ../
cd plot
python acc_eop_plot_sota.py
cd ../