# Equality of Opportunity
Equality of Opportunity (EoP) is a metric for assessing the fairness of a predictive model, especially a binary classifier. It requires that individuals who actually belong to the positive class have the same chance of receiving a positive prediction regardless of the group they belong to (e.g., demographic groups defined by gender or ethnicity). In other words, the model's true positive rate should be the same across groups.
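Written out, with $\hat{Y}$ the model's prediction, $Y$ the true label and $A$ the sensitive attribute, the condition for two groups $a$ and $b$ is the standard textbook formulation below. A common way to report it as a single number is the gap between the two true positive rates, which is 0 when the condition holds; the exact scaling used by fairlib may differ from this plain difference.

$$
P(\hat{Y} = 1 \mid Y = 1, A = a) = P(\hat{Y} = 1 \mid Y = 1, A = b)
\qquad\Longleftrightarrow\qquad
\mathrm{TPR}_a - \mathrm{TPR}_b = 0
$$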

In [1]:
import sys
import os

sys.path.append(os.path.abspath(os.path.join('..')))

In [2]:
import fairlib as fl
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

INFO:fairlib:fairlib loaded


## Dataset
We create the training data as a `DataFrame` from fairlib.

In [3]:
df = fl.DataFrame({
    "name":     ["Alice", "Bob", "Carla", "Davide", "Elena", "Francesco", "Giorgia", "Hanna", "Luca", "Maria",
                 "Marco", "Chiara", "Tommaso", "Silvia", "Antonio", "Rosa", "Giovanni", "Lucia", "Fabio", "Anna"],
    "age":      [25, 32, 45, 29, 34, 38, 27, 50, 31, 44,
                 22, 48, 24, 39, 28, 52, 26, 43, 23, 54],
    "sex":      ["F", "M", "F", "M", "F", "M", "F", "F", "M", "F",
                 "M", "F", "M", "F", "M", "F", "M", "F", "M", "F"],
    "income":   [40000, 38000, 43400, 43000, 48000, 49000, 42000, 41000, 47000, 40000,
                 52000, 39000, 53000, 45000, 51000, 38000, 55000, 40000, 56000, 37000]
})

We then create a second dataframe with the same columns, which will be used later to test the model.

In [4]:
df_test = fl.DataFrame({
    "name":     ["Andrea", "Bianca", "Claudio", "Daniela", "Emanuele", "Franca", "Gabriele", "Irene", "Leonardo", "Margherita",
                 "Nicola", "Olivia", "Pietro", "Quirina", "Raffaele", "Sara", "Simone", "Teresa", "Umberto", "Veronica"],
    "age":      [27, 46, 25, 21, 46, 50, 24, 41, 29, 48,
                 22, 44, 26, 55, 39, 39, 23, 51, 21, 19],
    "sex":      ["F", "M", "M", "F", "M", "F", "F", "F", "M", "F",
                 "M", "F", "M", "M", "M", "F", "M", "F", "F", "F"],
    "income":   [51000, 39000, 52000, 36000, 48000, 37000, 54000, 40000, 50000, 38000,
                 53000, 41000, 55000, 35000, 56000, 42000, 57000, 38000, 58000, 39000]
})

In [5]:
df.sensitive = {"age", "sex"}
df.targets = {"income"}

df_test.sensitive = {"age", "sex"}
df_test.targets = {"income"}

## Pre-processing data
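Binarization: to compute the fairness metrics, some fields must first be made binary and/or categorical. The cell below drops the `name` column and turns `sex`, `age` and `income` into the binary features `male`, `age<38` and `income>45k`.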

In [6]:
for data in [df, df_test]:
    data.drop("name", axis=1, inplace=True)
    data.discretize(
        ("male", data.sex == 'M'),
        age=("age<38", lambda age: age < 38),
        income=("income>45k", lambda income: income > 45000),
        in_place=True,
    )
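For readers who want to see what this binarization amounts to, here is a minimal sketch of the same three rules in plain pandas, applied to the first five training rows. It is not fairlib's implementation: the helper `binarize` is hypothetical, and storing each rule as a 0/1 column named after the notebook's feature labels is an assumption about the resulting layout.

```python
import pandas as pd

# First five rows of the training data above (name column already dropped)
raw = pd.DataFrame({
    "age": [25, 32, 45, 29, 34],
    "sex": ["F", "M", "F", "M", "F"],
    "income": [40000, 38000, 43400, 43000, 48000],
})

def binarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the notebook's three rules as 0/1 columns."""
    return pd.DataFrame({
        "male": (frame["sex"] == "M").astype(int),           # sex == 'M'
        "age<38": (frame["age"] < 38).astype(int),           # age < 38
        "income>45k": (frame["income"] > 45000).astype(int), # income > 45000
    })

binary = binarize(raw)
print(binary)
```

Each original attribute collapses to a single indicator, which is exactly what the metric needs: a binary target (`income>45k`) and binary sensitive groups (`male`, `age<38`).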

In [7]:
X_train, y_train = df.separate_columns(as_array=True)

In [8]:
X_test, y_test = df_test.separate_columns(as_array=True)
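The notebook does not show the shapes returned by `separate_columns(as_array=True)`. As a hedged illustration of the split it performs, the sketch below separates a binarized frame into a 2-D feature matrix and a target vector with pandas. The helper `split_features_target` is hypothetical; the point is that scikit-learn estimators expect `y` with shape `(n_samples,)`, and a column vector of shape `(n_samples, 1)` triggers a `DataConversionWarning` during `fit`, which `ravel()` avoids.

```python
import pandas as pd

def split_features_target(frame: pd.DataFrame, target: str):
    """Hypothetical helper: split a binarized frame into X (2-D) and y (1-D)."""
    X = frame.drop(columns=[target]).to_numpy()
    # Selecting with a list yields a column vector of shape (n_samples, 1) ...
    y_column = frame[[target]].to_numpy()
    # ... which ravel() flattens to the (n_samples,) shape scikit-learn expects
    y = y_column.ravel()
    return X, y

# Binary columns for the first five training rows
binary = pd.DataFrame({
    "male": [0, 1, 0, 1, 0],
    "age<38": [1, 1, 0, 1, 1],
    "income>45k": [0, 0, 0, 0, 1],
})
X, y = split_features_target(binary, "income>45k")
print(X.shape, y.shape)  # (5, 2) (5,)
```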

## Model Training
A standard binary classifier, here a random forest, is trained on the binarized data.

In [9]:
model = RandomForestClassifier(random_state=42, n_estimators=130)
model.fit(X_train, y_train)


In [10]:
y_pred = model.predict(X_test)

In [11]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nEquality of opportunity:\n", df_test.equality_of_opportunity(y_pred))

Accuracy: 0.75

Equality of opportunity:
 {(income>45k=1, male): 1.2857142857142858, (income>45k=1, not(male)): -1.2857142857142858, (income>45k=1, age<38): -1.375, (income>45k=1, not(age<38)): 1.375}
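To make the definition concrete, the sketch below computes the textbook quantity directly: the true positive rate inside a binary group minus the rate outside it. It uses only NumPy, the function names are hypothetical, and the labels in the example are made up rather than taken from this notebook. Note that fairlib reports values with magnitude above 1 (e.g. 1.2857), which a difference of two rates cannot produce, so the library evidently applies a different scaling or definition than this plain gap.

```python
import numpy as np

def true_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of actual positives (y_true == 1) that were predicted positive."""
    positives = y_true == 1
    if not positives.any():
        return float("nan")  # TPR is undefined for a group with no positives
    return float(np.mean(y_pred[positives] == 1))

def equal_opportunity_gap(y_true, y_pred, group) -> float:
    """TPR of members (group == 1) minus TPR of non-members (group == 0).

    A value of 0 means both groups have the same true positive rate.
    """
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    members = group == 1
    tpr_in = true_positive_rate(y_true[members], y_pred[members])
    tpr_out = true_positive_rate(y_true[~members], y_pred[~members])
    return tpr_in - tpr_out

# Hypothetical labels, predictions and a binary sensitive attribute (e.g. male)
y_true = [1, 1, 1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
male = [1, 1, 1, 1, 0, 0, 0, 0]
print(equal_opportunity_gap(y_true, y_pred, male))  # TPR 2/3 vs 1/3
```

A positive gap means the group marked 1 receives correct positive predictions more often than the rest; swapping the group encoding flips the sign, mirroring the paired positive and negative entries in the output above.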
