This archive includes code and data for the manuscript: "Fairness and Unfairness in Binary and Multiclass Classification: Quantifying, Calculating, and Bounding" by Sivan Sabato, Eran Treister and Elad Yom-Tov.
The code is released under the MIT Open Source license.
*.m files are Matlab code files, which we ran using Matlab R2021b. *.py files are Python 3.9 code files.
- README - this file
- find_lb.m - the implementation of Alg. 1 from the paper. Type "help find_lb" for input and output arguments.
- The following scripts use the data file USCensus1990raw.data.mat, which can be downloaded from the following link: https://archive.ics.uci.edu/ml/datasets/US+Census+Data+(1990)
- census_commands.m - the script for running the experiments for binary classification with beta = 1 on the US Census data set.
- census_commands_beta.m - the script for running the experiments for binary classification with variable beta on the US Census data set.
- cancer.m - a script for running the experiments on the classifiers generated from search-engine data.
- poll_run.m - a script for running the poll experiments. This script requires downloading the relevant data; see the explanation of how to do this in the comment at the top of the script.
- cancer_mortality.m - a script for running the cancer mortality experiments. This script requires downloading the relevant data; see the explanation of how to do this in the comment at the top of the script.
- simpleClassifierAnalysis.m - an auxiliary file
- generate_compas_classifier.py - generates the classifier for the COMPAS dataset and saves the information to a csv file.
- compas.m - generate output for the COMPAS experiments
- adultRunExp.m - generate output for the Adult experiments
- cancer_data.m - the statistics of the classifiers generated by the search-engine data in the first reported experiment. Used by the cancer.m script
- calcunfairness.m
- calculate_classifier.m
- find_best_alphas.m
The following files contain the experiment results that were used to generate the graphs in the paper.
- census_commands_tree.csv: results of census_commands.m with decision tree classifiers
- census_commands_linear.csv: results of census_commands.m with linear classifiers
- discdata_tree.csv: results of census_commands_beta.m with decision tree classifiers
- discdata_linear.csv: results of census_commands_beta.m with linear classifiers
- cancer.csv: results of cancer.m
- polls.csv: results of poll_run.m
- cancer_mortality.csv: results of cancer_mortality.m
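
The result files listed above are csv files, so they can be inspected without Matlab. The following is a minimal Python sketch and is not part of the archive. The helper name summarize_csv is hypothetical, and the sketch assumes that the first row of each file is a header row, which this README does not confirm.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (header, number_of_data_rows) for a csv file.

    Assumption: the first row is a header row; the README does not state this.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


if __name__ == "__main__":
    # File names are taken from the list above; missing files are reported, not fatal.
    for name in ["census_commands_tree.csv", "cancer.csv", "polls.csv"]:
        if Path(name).exists():
            header, rows = summarize_csv(name)
            print(f"{name}: {rows} data rows, columns = {header}")
        else:
            print(f"{name}: not found in the current directory")
```

Run it from the directory that holds the csv files. If a file turns out to have no header row, the first data row is reported as the header and the row count is one too low.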
- runexps.py - the main Python script for running the experiments. It uses the modules in the following files:
- localmin.py
- solve_large.py
- load_mat_params.py
- run_census_multiclass.m - the script for generating the multiclass classifiers for the US Census multiclass experiments. This script uses the data file USCensus1990raw.data.mat (see above for how to obtain it).
- Files for generating the input for the Natality data set experiments:
- runner_read_births_data.m - reads the relevant data from the input data set file. This file can be downloaded here: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2017us.zip
- runner_test_train_split_births.m - splits the data into train and test
- runner_model_births.m - generates a classifier from the variables generated by the previous scripts. The classifier type is determined by the variable 'classifier_type'.
- get_labor_params.m - generates a data file for the experiments from the classifier file generated by the previous script.
- education.py - reads the US Education data file (downloaded from here: https://data.ers.usda.gov/reports.aspx?ID=17829) and saves a data file to be used in the experiments.
- ukelections.py - reads the UK elections data file (downloaded from here: https://commonslibrary.parliament.uk/research-briefings/cbp-8647/) and saves a data file to be used in the experiments.
- census_commands_multiclass.m - a script used by run_census_multiclass.m.
- calculate_classifier_multiclass.m - a script used by run_census_multiclass.m.