Fairness-aware machine learning toolkit with a scikit-learn compatible API.
scikit-fair (skfair) is a Python library for fairness-aware binary classification. It covers the full pipeline — preprocessing, evaluation, auditing, comparison, and experimentation — and integrates seamlessly with scikit-learn and imbalanced-learn workflows.
Documentation: https://jmcfig.github.io/scikit-fair/
```bash
pip install scikit-fair
```

Or install from source:

```bash
git clone https://github.com/jmcfig/scikit-fair.git
cd scikit-fair
pip install -e .
```

Requirements: Python >= 3.9, numpy >= 1.22, pandas >= 1.5, scikit-learn >= 1.3, imbalanced-learn >= 0.12, cvxpy >= 1.3.
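To verify the installation, import the package (assuming it exposes the conventional `__version__` attribute):

```python
import skfair

print(skfair.__version__)
```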
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from skfair.datasets import load_adult
from skfair.preprocessing import Massaging
from skfair.metrics import accuracy, disparate_impact, statistical_parity_difference
# 1. Load data
X, y = load_adult(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 2. Baseline — no fairness preprocessing
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
sens = X_test["sex"].values
print(f"Baseline — Accuracy: {accuracy(y_test.values, y_pred):.3f} "
f"DI: {disparate_impact(y_test.values, y_pred, sens):.3f} "
f"SPD: {statistical_parity_difference(y_test.values, y_pred, sens):.3f}")
# 3. Apply Massaging to reduce label bias
sampler = Massaging(sens_attr="sex", priv_group=1)
X_fair, y_fair = sampler.fit_resample(X_train, y_train)
clf_fair = LogisticRegression(max_iter=1000)
clf_fair.fit(X_fair, y_fair)
y_pred_fair = clf_fair.predict(X_test)
print(f"Fair — Accuracy: {accuracy(y_test.values, y_pred_fair):.3f} "
f"DI: {disparate_impact(y_test.values, y_pred_fair, sens):.3f} "
f"SPD: {statistical_parity_difference(y_test.values, y_pred_fair, sens):.3f}")| Class | Family | Reference |
|---|---|---|
Reweighing |
Weighting | Kamiran & Calders (2012) |
FairBalance |
Weighting | Yu et al. (2024) |
ReweighingClassifier |
Meta-estimator | — |
FairBalanceClassifier |
Meta-estimator | — |
Massaging |
Label modification | Kamiran & Calders (2012) |
FairwayRemover |
Undersampling | Fairway (2019) |
FairOversampling |
Oversampling | Dablan et al. |
FairSmote |
Oversampling | Chakraborty et al. (2021) |
FAWOS |
Oversampling | Salazar et al. (2021) |
HeterogeneousFOS |
Oversampling | Sonoda et al. (2023) |
DisparateImpactRemover |
Feature transformation | Feldman et al. (2015) |
OptimizedPreprocessing |
Feature transformation | Calmon et al. (2017) |
LearningFairRepresentations |
Feature transformation | Zemel et al. (2013) |
FairMask |
Meta-estimator | Peng et al. (2021) |
IntersectionalBinarizer |
Utility | — |
DropColumns |
Utility | — |
Each family of algorithms has its own API contract.
Label-modification and oversampling methods return a resampled dataset. They extend `imblearn.BaseSampler` and work directly inside an `imblearn.Pipeline`.
```python
from skfair.preprocessing import FairSmote
sampler = FairSmote(sens_attr="sex", random_state=0)
X_resampled, y_resampled = sampler.fit_resample(X_train, y_train)
```

`Reweighing` and `FairBalance` return the original `X` unchanged alongside a weight Series. Pass the weights to your classifier via `sample_weight`.
```python
from sklearn.linear_model import LogisticRegression

from skfair.preprocessing import Reweighing
rw = Reweighing(sens_attr="sex", priv_group=1)
X_unchanged, weights = rw.fit_transform(X_train, y_train)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_unchanged, y_train, sample_weight=weights)
```

`ReweighingClassifier` and `FairBalanceClassifier` encapsulate the weighting step inside a full sklearn-compatible classifier, including `sample_weight` handling.
```python
from sklearn.linear_model import LogisticRegression

from skfair.preprocessing import ReweighingClassifier
clf = ReweighingClassifier(
    estimator=LogisticRegression(max_iter=1000),
    sens_attr="sex",
    priv_group=1,
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```

`DisparateImpactRemover`, `OptimizedPreprocessing`, and `LearningFairRepresentations` transform `X` directly and slot into `sklearn.Pipeline` as standard transformers.
```python
from skfair.preprocessing import DisparateImpactRemover
repair = DisparateImpactRemover(
sensitive_attribute="sex",
repair_columns=["age", "hours-per-week"],
lambda_param=1.0,
)
X_repaired = repair.fit_transform(X_train)
```

Combine preprocessing with downstream estimators, optionally using `DropColumns` to remove the sensitive attribute just before the classifier.
```python
from sklearn.linear_model import LogisticRegression
from imblearn.pipeline import Pipeline
from skfair.preprocessing import FairSmote, DropColumns
pipe = Pipeline([
("fair_smote", FairSmote(sens_attr="sex", random_state=42)),
("drop_sens", DropColumns("sex")), #optional
("classifier", LogisticRegression(solver="liblinear", max_iter=1000, random_state=42)),
])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
```

**Tip:** We recommend always using `imblearn.pipeline.Pipeline`: it extends sklearn's `Pipeline` with `fit_resample` support, so it works with all scikit-fair methods (transformers, samplers, and meta-estimators) without needing to switch imports.
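For example, a transformer, a sampler, and the final estimator can all be chained in a single imblearn pipeline. This is a sketch reusing the classes shown above; the parameter values are illustrative:

```python
from sklearn.linear_model import LogisticRegression
from imblearn.pipeline import Pipeline

from skfair.preprocessing import DisparateImpactRemover, FairSmote, DropColumns

pipe = Pipeline([
    # feature transformation, then resampling, then the classifier
    ("repair", DisparateImpactRemover(
        sensitive_attribute="sex",
        repair_columns=["age", "hours-per-week"],
        lambda_param=0.8,
    )),
    ("fair_smote", FairSmote(sens_attr="sex", random_state=42)),
    ("drop_sens", DropColumns("sex")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
```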
Define complex, multi-column privilege criteria with `IntersectionalBinarizer`.
```python
from skfair.preprocessing import IntersectionalBinarizer
binarizer = IntersectionalBinarizer(
privileged_definition={"race": "White", "sex": "Male"},
group_col_name="_is_privileged",
)
X_with_group = binarizer.fit_transform(X_train)
```
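The new column can then serve as the sensitive attribute downstream. A sketch, assuming the binary group column is accepted wherever `sens_attr` is:

```python
from skfair.preprocessing import FairSmote

# resample with respect to the intersectional group rather than a single column
sampler = FairSmote(sens_attr="_is_privileged", random_state=0)
X_res, y_res = sampler.fit_resample(X_with_group, y_train)
```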
Nine group-fairness metrics and nine performance metrics share a unified signature: `metric(y_true, y_pred, sensitive_attr)`.

| Function | Definition | Perfect value |
|---|---|---|
| `disparate_impact` | P(Y=1\|S=0) / P(Y=1\|S=1) | 1.0 |
| `statistical_parity_difference` | P(Y=1\|S=0) - P(Y=1\|S=1) | 0.0 |
| `equal_opportunity_difference` | TPR(S=0) - TPR(S=1) | 0.0 |
| `equal_opportunity_ratio` | TPR(S=0) / TPR(S=1) | 1.0 |
| `average_odds_difference` | 0.5 × [(FPR diff) + (TPR diff)] | 0.0 |
| `true_negative_rate_difference` | TNR(S=0) - TNR(S=1) | 0.0 |
| `false_negative_rate_difference` | FNR(S=0) - FNR(S=1) | 0.0 |
| `predictive_equality` | FPR(S=0) / FPR(S=1) | 1.0 |
| `accuracy_parity` | Acc(S=0) / Acc(S=1) | 1.0 |
The nine performance metrics are `accuracy`, `true_positive_rate`, `false_positive_rate`, `true_negative_rate`, `false_negative_rate`, `balanced_accuracy`, `precision`, `recall`, and `f1_score`.
```python
from skfair.metrics import (
    disparate_impact,
    statistical_parity_difference,
    equal_opportunity_difference,
    predictive_equality,
    accuracy,
    balanced_accuracy,
    precision,
    recall,
    f1_score,
)
sens = X_test["sex"].values
print(f"Accuracy: {accuracy(y_test.values, y_pred):.3f}")
print(f"Balanced accuracy: {balanced_accuracy(y_test.values, y_pred):.3f}")
print(f"Precision: {precision(y_test.values, y_pred):.3f}")
print(f"Recall: {recall(y_test.values, y_pred):.3f}")
print(f"F1 score: {f1_score(y_test.values, y_pred):.3f}")
print(f"Disparate impact: {disparate_impact(y_test.values, y_pred, sens):.3f}")
print(f"Stat. parity diff: {statistical_parity_difference(y_test.values, y_pred, sens):.3f}")
print(f"Equal opp. diff: {equal_opportunity_difference(y_test.values, y_pred, sens):.3f}")
print(f"Pred. equality: {predictive_equality(y_test.values, y_pred, sens):.3f}")Five standard fairness benchmarks are bundled.
| Loader | Samples | Features | Sensitive attribute | Label |
|---|---|---|---|---|
| `load_adult` | 48,842 | 14 | sex (1 = male) | income > 50k |
| `load_german` | 1,000 | 20 | sex | credit risk |
| `load_heart_disease` | 740 | 13 | sex | heart disease |
| `load_compas` | ~7,214 | 11 | sex, race | two-year recidivism |
| `load_ricci` | 118 | 5 | Race | promotion eligibility |
```python
from skfair.datasets import load_adult, load_german, load_heart_disease, load_compas, load_ricci
X, y = load_adult(preprocessed=True)
X, y = load_german()
X, y = load_heart_disease()
X, y = load_compas()
X, y = load_ricci()
```

The audit module provides data-level and prediction-level fairness analysis.
`BiasAuditor` examines sensitive-group proportions, target rates, and feature distributions before training.
```python
from skfair.audit import BiasAuditor
auditor = BiasAuditor(X_train, y_train, sens_attr="sex")
print(auditor.group_proportions())
print(auditor.target_rate_by_group())
auditor.plot_summary()
```

`FairnessAuditor` evaluates how fair a model's predictions are across groups.
```python
from skfair.audit import FairnessAuditor
fa = FairnessAuditor(y_test.values, y_pred, X_test["sex"].values)
print(fa.performance_by_group())
print(fa.fairness_metrics())
fa.plot_fairness_radar()
```

The comparison module provides a `ComparisonReport` for comparing multiple preprocessing methods across datasets and classifiers.
`ComparisonReport` expects a DataFrame with the following columns:

| Column | Required | Description |
|---|---|---|
| `dataset` | yes | Dataset name (e.g. "adult", "compas") |
| `method` | yes | Preprocessing method name (e.g. "Massaging", "FairSmote") |
| `classifier` | yes | Classifier name (e.g. "LogReg") |
| `{metric}` | yes (at least one) | Value for each metric (e.g. accuracy, spd) |
| `{metric}_std` | no | Standard deviation; included when `Experiment(std=True)`, not used by plots |
This is the format returned by `Experiment.run()`, but you can also build it manually.
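For illustration, a minimal results frame can be assembled by hand; the numbers below are made up:

```python
import pandas as pd

# one row per (dataset, method, classifier) combination
results_df = pd.DataFrame({
    "dataset":    ["adult", "adult", "compas", "compas"],
    "method":     ["Massaging", "FairSmote", "Massaging", "FairSmote"],
    "classifier": ["LogReg", "LogReg", "LogReg", "LogReg"],
    "accuracy":   [0.84, 0.83, 0.67, 0.66],
    "spd":        [-0.10, -0.06, -0.12, -0.07],
})
```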
```python
from skfair.comparison import ComparisonReport
report = ComparisonReport(results_df)
# Summary tables — pivot of metric means per method, averaged over classifiers
tables = report.summary_tables()
# Performance bar charts
report.plot_metric_bar(metric="accuracy")
# Fairness bar chart for a single metric
report.plot_metric_bar(metric="spd")
# Accuracy vs |fairness| scatter — ideally a method sits in the top-right corner
report.plot_tradeoff(fairness_metric="spd", performance_metric="accuracy")
# Heatmap ranking methods per dataset across all metrics
report.plot_ranking()
# Or generate all plots at once
report.plot_all(fairness_metric="spd")
# Export a self-contained HTML report
report.to_html("report.html")The experimentation module automates dataset x method x classifier experiments with cross-validation.
```python
from skfair.experimentation import Experiment
exp = Experiment(
datasets=["adult", "compas"],
methods=["Massaging", "FairSmote", "ReweighingClassifier"],
n_splits=5,
)
results = exp.run()
# Generate a ComparisonReport
report = exp.to_report()
report.plot_metric_bar(metric="accuracy")Experiments can also be configured via YAML files:
```python
exp = Experiment.from_config("config.yaml")
results = exp.run()
```
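A sketch of what `config.yaml` might contain, assuming the keys mirror the `Experiment` constructor arguments shown above:

```yaml
# hypothetical config mirroring Experiment(datasets=..., methods=..., n_splits=...)
datasets:
  - adult
  - compas
methods:
  - Massaging
  - FairSmote
  - ReweighingClassifier
n_splits: 5
```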
The `examples/` folder contains step-by-step Jupyter notebooks that walk through every module:

| Notebook | Description |
|---|---|
| `01_datasets` | Loading, exploring, and preprocessing the bundled datasets |
| `02_methods` | Using fairness methods: transformers, samplers, and meta-estimators |
| `03_audit` | Pre-model bias analysis and post-model fairness auditing |
| `04_comparison` | Comparing methods side-by-side with `ComparisonReport` |
| `05_experiment` | Running cross-validated experiments with `Experiment` |
| `05a_experiment_config` | Configuring experiments from Python and YAML |
| `05b_custom_datasets` | Using custom (user-provided) datasets in experiments |
| `06_benchmark` | Full-scale benchmark driven by a YAML config file |
BSD 3-Clause. See LICENSE for details.