# Building Datasets

In most of our examples, we use dataset_loader to avoid boilerplate code when training fair classifiers. 
This notebook sets out how to create similar code for new datasets.

For evaluating and fitting fair classifiers we require access to the group each datapoint is assigned to and the target (i.e. ground-truth) label the classifier is trying to predict. 

For sklearn, classifiers assume that they only recieve the data used to predict, and as the target labels should never be passed with the rest of the data, and the groups should only be passed if the classifier uses them directly (i.e. if we are not using infered attributes).



There are four important cases:
1. Fair Classifiers using autogluon.
    Create a dataframe or tabular dataset containing all data used for classification, target labels, and groups.
    Autogluon takes pandas dataframes or their own internal tabular dataset type and only uses the columns the model was trained on to classify the data.
    When using infered attributes you should ensure that neither the classifier predicting groups nor the classifier predicting target labels has access to the groups or target labels at training time. This is taken care for you automatically by using  `oxonfair.inferred_attribute_builder`.
2. Fair Classifiers using Sklearn with known attributes.
    Create a dataset by calling `oxonfair.build_data_dict` with two arguments - the target labels `y` and the data `X` used by the classifier. 
3. Fair Classifiers using Sklearn with inferred attributes. 
    Create a dataset by calling `oxonfair.build_data_dict` with three arguments - the target labels `y`, the data `X` used by the classifier, and the groups. 
4. Fair Classifiers using Deep networks.
    Create a classifier by calling `oxonfair.DeepFairPredictor` with three arguments - the target labels, the predictions made by the classifier, and the groups. See [this notebook for examples](quickstart_DeepFairPredictor_computer_vision.ipynb).

## AutoGluon Example


In [1]:
from autogluon.tabular import TabularDataset, TabularPredictor
import oxonfair
from oxonfair import FairPredictor, inferred_attribute_builder 
from oxonfair.utils import group_metrics as gm
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
# Train base classifier
predictor = TabularPredictor(label='class').fit(train_data=train_data, time_limit=3)

  from .autonotebook import tqdm as notebook_tqdm


No path specified. Models will be saved in: "AutogluonModels/ag-20240617_142353"


No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets.
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='best_quality'   : Maximize accuracy. Default time_limit=3600.
	presets='high_quality'   : Strong accuracy with fast inference speed. Default time_limit=3600.
	presets='good_quality'   : Good accuracy with very fast inference speed. Default time_limit=3600.
	presets='medium_quality' : Fast training time, ideal for initial prototyping.


Beginning AutoGluon training ... Time limit = 3s


AutoGluon will save models to "AutogluonModels/ag-20240617_142353"


AutoGluon Version:  1.1.0
Python Version:     3.10.13
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 23.5.0: Wed May  1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020
CPU Count:          10
Memory Avail:       8.10 GB / 16.00 GB (50.6%)
Disk Space Avail:   363.56 GB / 460.43 GB (79.0%)


Train Data Rows:    39073


Train Data Columns: 14


Label Column:       class


AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).


	2 unique label values:  [' <=50K', ' >50K']


	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])


Problem Type:       binary


Preprocessing data ...


Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K


	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.


Using Feature Generators to preprocess the data ...


Fitting AutoMLPipelineFeatureGenerator...


	Available Memory:                    8318.38 MB


	Train Data (Original)  Memory Usage: 21.86 MB (0.3% of available memory)


	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.


	Stage 1 Generators:


		Fitting AsTypeFeatureGenerator...


			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.


	Stage 2 Generators:


		Fitting FillNaFeatureGenerator...


	Stage 3 Generators:


		Fitting IdentityFeatureGenerator...


		Fitting CategoryFeatureGenerator...


			Fitting CategoryMemoryMinimizeFeatureGenerator...


	Stage 4 Generators:


		Fitting DropUniqueFeatureGenerator...


	Stage 5 Generators:


		Fitting DropDuplicatesFeatureGenerator...


	Types of features in original data (raw dtype, special dtypes):


		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


	Types of features in processed data (raw dtype, special dtypes):


		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('int', ['bool']) : 1 | ['sex']


	0.1s = Fit runtime


	14 features in original data used to generate 14 features in processed data.


	Train Data (Processed) Memory Usage: 2.09 MB (0.0% of available memory)


Data preprocessing and feature engineering runtime = 0.18s ...


AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'


	To change this, specify the eval_metric parameter of Predictor()


Automatically generating train/validation split with holdout_frac=0.0639828014229775, Train Rows: 36573, Val Rows: 2500


User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}


Fitting 13 L1 models ...


Fitting model: KNeighborsUnif ... Training model for up to 2.82s of the 2.82s of remaining time.


	0.7752	 = Validation score   (accuracy)


	1.47s	 = Training   runtime


	0.1s	 = Validation runtime


Fitting model: KNeighborsDist ... Training model for up to 1.24s of the 1.24s of remaining time.


	0.766	 = Validation score   (accuracy)


	0.03s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: LightGBMXT ... Training model for up to 1.14s of the 1.14s of remaining time.


Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.



	Ran out of time, early stopping on iteration 70. Best iteration is:
	[58]	valid_set's binary_error: 0.1328


	0.8672	 = Validation score   (accuracy)


	2.64s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: WeightedEnsemble_L2 ... Training model for up to 2.82s of the -1.63s of remaining time.


	Ensemble Weights: {'LightGBMXT': 1.0}


	0.8672	 = Validation score   (accuracy)


	0.02s	 = Training   runtime


	0.0s	 = Validation runtime


AutoGluon training complete, total runtime = 4.74s ... Best model: "WeightedEnsemble_L2"


TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240617_142353")


In [2]:
# Modify predictor to enforce fairness over the train_data with respect to groups given by the column 'sex'
fpredictor = FairPredictor(predictor,train_data,'sex')
# Maximize accuracy while enforcing that the demographic parity (the difference in positive decision rates between men and women is at most 0.02)
fpredictor.fit(gm.accuracy,gm.demographic_parity,0.02)
#Evaluate per group on test data
fpredictor.evaluate_groups(test_data)

Unnamed: 0_level_0,Unnamed: 1_level_0,Accuracy,Balanced Accuracy,F1 score,MCC,Precision,Recall,ROC AUC,Positive Count,Negative Count,Positive Label Rate,Positive Prediction Rate
Unnamed: 0_level_1,Groups,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
original,Overall,0.863036,0.77172,0.674453,0.597347,0.773438,0.597929,0.918163,2318.0,7451.0,0.237281,0.183437
original,Female,0.934631,0.760266,0.638655,0.61712,0.785124,0.538244,0.938701,353.0,2936.0,0.107327,0.073579
original,Male,0.826698,0.765123,0.680512,0.571345,0.771613,0.608651,0.896625,1965.0,4515.0,0.303241,0.239198
original,Maximum difference,0.107933,0.004857,0.041857,0.045775,0.013511,0.070408,0.042075,1612.0,1579.0,0.195913,0.165619
updated,Overall,0.843177,0.736263,0.617191,0.532205,0.733373,0.532787,0.819932,2318.0,7451.0,0.237281,0.172382
updated,Female,0.894193,0.859737,0.623377,0.587947,0.504378,0.815864,0.938701,353.0,2936.0,0.107327,0.173609
updated,Male,0.817284,0.722584,0.615335,0.542526,0.850854,0.481934,0.896625,1965.0,4515.0,0.303241,0.171759
updated,Maximum difference,0.076909,0.137153,0.008042,0.045421,0.346475,0.33393,0.042075,1612.0,1579.0,0.195913,0.00185


# Autogluon with inferered attributes 

In [3]:
predictor2, protected = inferred_attribute_builder(train_data, 'class', 'sex', time_limit=3)
# Modify predictor to enforce fairness over the train_data with respect to groups given by the column 'sex'
fpredictor = FairPredictor(predictor2, train_data, 'sex', inferred_groups=protected)
# Maximize accuracy while enforcing that the demographic parity (the difference in positive decision rates between men and women is at most 0.02)
fpredictor.fit(gm.accuracy,gm.demographic_parity,0.02)
#Evaluate per group on test data
fpredictor.evaluate_groups(test_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20240617_142358"


No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets.
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='best_quality'   : Maximize accuracy. Default time_limit=3600.
	presets='high_quality'   : Strong accuracy with fast inference speed. Default time_limit=3600.
	presets='good_quality'   : Good accuracy with very fast inference speed. Default time_limit=3600.
	presets='medium_quality' : Fast training time, ideal for initial prototyping.


Beginning AutoGluon training ... Time limit = 3s


AutoGluon will save models to "AutogluonModels/ag-20240617_142358"


AutoGluon Version:  1.1.0
Python Version:     3.10.13
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 23.5.0: Wed May  1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020
CPU Count:          10
Memory Avail:       6.16 GB / 16.00 GB (38.5%)
Disk Space Avail:   363.53 GB / 460.43 GB (79.0%)


Train Data Rows:    39073


Train Data Columns: 13


Label Column:       class


AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).


	2 unique label values:  [' <=50K', ' >50K']


	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])


Problem Type:       binary


Preprocessing data ...


Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K


	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.


Using Feature Generators to preprocess the data ...


Fitting AutoMLPipelineFeatureGenerator...


	Available Memory:                    6323.39 MB


	Train Data (Original)  Memory Usage: 19.53 MB (0.3% of available memory)


	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.


	Stage 1 Generators:


		Fitting AsTypeFeatureGenerator...


	Stage 2 Generators:


		Fitting FillNaFeatureGenerator...


	Stage 3 Generators:


		Fitting IdentityFeatureGenerator...


		Fitting CategoryFeatureGenerator...


			Fitting CategoryMemoryMinimizeFeatureGenerator...


	Stage 4 Generators:


		Fitting DropUniqueFeatureGenerator...


	Stage 5 Generators:


		Fitting DropDuplicatesFeatureGenerator...


	Types of features in original data (raw dtype, special dtypes):


		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('object', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


	Types of features in processed data (raw dtype, special dtypes):


		('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


		('int', [])      : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


	0.2s = Fit runtime


	13 features in original data used to generate 13 features in processed data.


	Train Data (Processed) Memory Usage: 2.05 MB (0.0% of available memory)


Data preprocessing and feature engineering runtime = 0.23s ...


AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'


	To change this, specify the eval_metric parameter of Predictor()


Automatically generating train/validation split with holdout_frac=0.0639828014229775, Train Rows: 36573, Val Rows: 2500


User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}


Fitting 13 L1 models ...


Fitting model: KNeighborsUnif ... Training model for up to 2.77s of the 2.77s of remaining time.


	0.7752	 = Validation score   (accuracy)


	0.04s	 = Training   runtime


	0.02s	 = Validation runtime


Fitting model: KNeighborsDist ... Training model for up to 2.71s of the 2.7s of remaining time.


	0.766	 = Validation score   (accuracy)


	0.03s	 = Training   runtime


	0.02s	 = Validation runtime


Fitting model: LightGBMXT ... Training model for up to 2.64s of the 2.64s of remaining time.


	Ran out of time, early stopping on iteration 124. Best iteration is:
	[68]	valid_set's binary_error: 0.1288


	0.8712	 = Validation score   (accuracy)


	2.67s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: WeightedEnsemble_L2 ... Training model for up to 2.77s of the -0.06s of remaining time.


	Ensemble Weights: {'LightGBMXT': 1.0}


	0.8712	 = Validation score   (accuracy)


	0.02s	 = Training   runtime


	0.0s	 = Validation runtime


AutoGluon training complete, total runtime = 3.16s ... Best model: "WeightedEnsemble_L2"


TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240617_142358")


No path specified. Models will be saved in: "AutogluonModels/ag-20240617_142401"


No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets.
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='best_quality'   : Maximize accuracy. Default time_limit=3600.
	presets='high_quality'   : Strong accuracy with fast inference speed. Default time_limit=3600.
	presets='good_quality'   : Good accuracy with very fast inference speed. Default time_limit=3600.
	presets='medium_quality' : Fast training time, ideal for initial prototyping.


Beginning AutoGluon training ... Time limit = 3s


AutoGluon will save models to "AutogluonModels/ag-20240617_142401"


AutoGluon Version:  1.1.0
Python Version:     3.10.13
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 23.5.0: Wed May  1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020
CPU Count:          10
Memory Avail:       5.80 GB / 16.00 GB (36.2%)
Disk Space Avail:   363.52 GB / 460.43 GB (79.0%)


Train Data Rows:    39073


Train Data Columns: 13


Label Column:       sex


AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).


	2 unique label values:  [' Female', ' Male']


	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])


Problem Type:       binary


Preprocessing data ...


Selected class <--> label mapping:  class 1 =  Male, class 0 =  Female


	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( Male) vs negative ( Female) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.


Using Feature Generators to preprocess the data ...


Fitting AutoMLPipelineFeatureGenerator...


	Available Memory:                    5956.42 MB


	Train Data (Original)  Memory Usage: 19.53 MB (0.3% of available memory)


	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.


	Stage 1 Generators:


		Fitting AsTypeFeatureGenerator...


	Stage 2 Generators:


		Fitting FillNaFeatureGenerator...


	Stage 3 Generators:


		Fitting IdentityFeatureGenerator...


		Fitting CategoryFeatureGenerator...


			Fitting CategoryMemoryMinimizeFeatureGenerator...


	Stage 4 Generators:


		Fitting DropUniqueFeatureGenerator...


	Stage 5 Generators:


		Fitting DropDuplicatesFeatureGenerator...


	Types of features in original data (raw dtype, special dtypes):


		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('object', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


	Types of features in processed data (raw dtype, special dtypes):


		('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


		('int', [])      : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


	0.2s = Fit runtime


	13 features in original data used to generate 13 features in processed data.


	Train Data (Processed) Memory Usage: 2.05 MB (0.0% of available memory)


Data preprocessing and feature engineering runtime = 0.23s ...


AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'


	To change this, specify the eval_metric parameter of Predictor()


Automatically generating train/validation split with holdout_frac=0.0639828014229775, Train Rows: 36573, Val Rows: 2500


User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}


Fitting 13 L1 models ...


Fitting model: KNeighborsUnif ... Training model for up to 2.77s of the 2.77s of remaining time.


	0.674	 = Validation score   (accuracy)


	0.05s	 = Training   runtime


	0.02s	 = Validation runtime


Fitting model: KNeighborsDist ... Training model for up to 2.67s of the 2.67s of remaining time.


	0.7208	 = Validation score   (accuracy)


	0.03s	 = Training   runtime


	0.03s	 = Validation runtime


Fitting model: LightGBMXT ... Training model for up to 2.61s of the 2.61s of remaining time.


	Ran out of time, early stopping on iteration 108. Best iteration is:
	[35]	valid_set's binary_error: 0.1476


	0.8524	 = Validation score   (accuracy)


	2.63s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: WeightedEnsemble_L2 ... Training model for up to 2.77s of the -0.12s of remaining time.


	Ensemble Weights: {'LightGBMXT': 0.733, 'KNeighborsDist': 0.267}


	0.8652	 = Validation score   (accuracy)


	0.03s	 = Training   runtime


	0.0s	 = Validation runtime


AutoGluon training complete, total runtime = 3.23s ... Best model: "WeightedEnsemble_L2"


TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240617_142401")


Unnamed: 0_level_0,Unnamed: 1_level_0,Accuracy,Balanced Accuracy,F1 score,MCC,Precision,Recall,ROC AUC,Positive Count,Negative Count,Positive Label Rate,Positive Prediction Rate
Unnamed: 0_level_1,Groups,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
original,Overall,0.866721,0.779485,0.685962,0.609715,0.777899,0.61346,0.918873,2318.0,7451.0,0.237281,0.187123
original,Female,0.936151,0.771087,0.653465,0.629785,0.782609,0.560907,0.94107,353.0,2936.0,0.107327,0.076923
original,Male,0.831481,0.77258,0.691525,0.584217,0.777143,0.622901,0.897262,1965.0,4515.0,0.303241,0.243056
original,Maximum difference,0.104669,0.001493,0.03806,0.045568,0.005466,0.061994,0.043808,1612.0,1579.0,0.195913,0.166132
updated,Overall,0.847681,0.737878,0.622335,0.543907,0.755857,0.528904,0.834367,2318.0,7451.0,0.237281,0.166035
updated,Female,0.906659,0.848028,0.640094,0.600067,0.546,0.773371,0.914831,353.0,2936.0,0.107327,0.152022
updated,Male,0.817747,0.723778,0.617428,0.543701,0.849376,0.484987,0.862494,1965.0,4515.0,0.303241,0.173148
updated,Maximum difference,0.088912,0.124249,0.022666,0.056366,0.303376,0.288384,0.052337,1612.0,1579.0,0.195913,0.021126


## SKlearn with known attributes.

In [4]:
import xgboost
import pandas as pd
data=pd.concat((train_data, test_data))
target = data['class']!= ' <=50K'
data.drop('class',inplace=True,axis=1)
data = pd.get_dummies(data)
# get_dumies must be called on all data.
# Otherwise it breaks if either train and test contain a feature missing from the other.
training_data = data.sample(frac=0.7)
training_target = target.iloc[training_data.index]
testing_data = data.drop(training_data.index)
testing_target = target.iloc[testing_data.index]

### Unlike autogluon much of sklearn breaks if you pass round dataframes 
### containing features such as target labels, and groups that are not used by the classifier.
### We pass dictionaries that represent the entire dataset to get round this.
### They contain 'target' 'data', 'groups' (optional), and 'factor' (optional)
    
training_set = oxonfair.build_data_dict(training_target,training_data)
testing_set = oxonfair.build_data_dict(testing_target, testing_data) 
#train base classifier
classifier = xgboost.XGBClassifier()
classifier.fit(y=training_target, X=training_data)


In [5]:
# Modify predictor to enforce fairness over the train_data with respect to groups given by the column 'sex'
fpredictor = FairPredictor(classifier, training_set, 'sex_ Female')
# Maximize accuracy while enforcing that the demographic parity (the difference in positive decision rates between men and women is at most 0.02)
fpredictor.fit(gm.accuracy, gm.demographic_parity,0.02)
#Evaluate per group on test data
fpredictor.evaluate_groups(testing_set)

Unnamed: 0_level_0,Unnamed: 1_level_0,Accuracy,Balanced Accuracy,F1 score,MCC,Precision,Recall,ROC AUC,Positive Count,Negative Count,Positive Label Rate,Positive Prediction Rate
Unnamed: 0_level_1,Groups,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
original,Overall,0.848938,0.742409,0.631163,0.553226,0.765369,0.537,0.880708,2527.0,7972.0,0.24069,0.168873
original,False,0.818143,0.743533,0.645656,0.537888,0.75951,0.561487,0.868309,2098.0,5012.0,0.295077,0.218143
original,True,0.913544,0.701361,0.549923,0.541227,0.806306,0.417249,0.863653,429.0,2960.0,0.126586,0.065506
original,Maximum difference,0.0954,0.042172,0.095733,0.00334,0.046796,0.144238,0.004656,1669.0,2052.0,0.168491,0.152637
updated,Overall,0.836937,0.703423,0.568331,0.505734,0.783183,0.445983,0.797238,2527.0,7972.0,0.24069,0.137061
updated,False,0.804219,0.689733,0.552987,0.494518,0.847441,0.410391,0.868309,2098.0,5012.0,0.295077,0.142897
updated,True,0.905577,0.783503,0.624413,0.570435,0.628842,0.620047,0.863653,429.0,2960.0,0.126586,0.124816
updated,Maximum difference,0.101357,0.093771,0.071427,0.075918,0.218599,0.209656,0.004656,1669.0,2052.0,0.168491,0.018082


In [6]:
# We can also make predictions on data that doesn't have target labels attached.
fpredictor.predict(testing_data)

array([0, 0, 0, ..., 0, 0, 1])

## SKlearn with infered groups

In [7]:
y_train = training_set['target']
groups_train = training_set['data']['sex_ Female']
X_train = training_set['data'].drop('sex_ Female', axis=1)
training_set = oxonfair.build_data_dict(y_train, X_train,groups_train)
y_test = testing_set['target']
groups_test = testing_set['data']['sex_ Female']
X_test = testing_set['data'].drop('sex_ Female', axis=1)
test_set = oxonfair.build_data_dict(y_test, X_test,groups_test)

#train base classifiers
classifier = xgboost.XGBClassifier()
classifier.fit(y=y_train,X=X_train)
group_classifier = xgboost.XGBClassifier()
group_classifier.fit(y=groups_train,X=X_train)

In [8]:
# Modify predictor to enforce fairness over the train_data with respect to groups given by the column 'sex'
fpredictor = FairPredictor(classifier, training_set, inferred_groups=group_classifier)
# Maximize accuracy while enforcing that the demographic parity (the difference in positive decision rates between men and women is at most 0.02)
fpredictor.fit(gm.accuracy,gm.demographic_parity,0.02)
#Evaluate per group on test data
fpredictor.evaluate_groups(test_set)

Some groups were not assigned, we only saw: [1 2]


Unnamed: 0_level_0,Unnamed: 1_level_0,Accuracy,Balanced Accuracy,F1 score,MCC,Precision,Recall,ROC AUC,Positive Count,Negative Count,Positive Label Rate,Positive Prediction Rate
Unnamed: 0_level_1,Groups,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
original,Overall,0.848938,0.742409,0.631163,0.553226,0.765369,0.537,0.880708,2527.0,7972.0,0.24069,0.168873
original,False,0.818143,0.743533,0.645656,0.537888,0.75951,0.561487,0.868309,2098.0,5012.0,0.295077,0.218143
original,True,0.913544,0.701361,0.549923,0.541227,0.806306,0.417249,0.863653,429.0,2960.0,0.126586,0.065506
original,Maximum difference,0.0954,0.042172,0.095733,0.00334,0.046796,0.144238,0.004656,1669.0,2052.0,0.168491,0.152637
updated,Overall,0.836937,0.703423,0.568331,0.505734,0.783183,0.445983,0.797238,2527.0,7972.0,0.24069,0.137061
updated,False,0.804219,0.689733,0.552987,0.494518,0.847441,0.410391,0.868309,2098.0,5012.0,0.295077,0.142897
updated,True,0.905577,0.783503,0.624413,0.570435,0.628842,0.620047,0.863653,429.0,2960.0,0.126586,0.124816
updated,Maximum difference,0.101357,0.093771,0.071427,0.075918,0.218599,0.209656,0.004656,1669.0,2052.0,0.168491,0.018082


In [9]:
# We can also make predictions on data that doesn't have target labels attached.
fpredictor.predict(test_set['data'])

array([0, 0, 0, ..., 0, 0, 1])