# Fairness Warnings Example
When fair-classification methods are called, warnings related to fairness violations may be raised.
The two fairness warning classes provide easy access to methods that check data sets and classifiers for imbalances.

Two types of warnings can be raised: one when the data set is skewed, the other when the classifier misses a fairness bound.

In [1]:
import sys
import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
sys.path.append("../")
 
from fairensics.methods import FairDisparateImpact, AccurateDisparateImpact
from fairensics.methods.fairness_warnings import FairnessBoundsWarning, DataSetSkewedWarning

In [2]:
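# helper function to convert the generated arrays into an aif360 data set
def to_aif_dataset(X, X_protected, y):
    """Build a BinaryLabelDataset with "age" as the protected attribute."""
    # one column each for the protected attribute and the label
    d = {"age": X_protected, "label": y}
    # one column per unprotected feature
    for i in range(np.shape(X)[1]):
        d["feature_"+str(i)] = X[:, i]

    df = pd.DataFrame(data=d)

    aif_df = BinaryLabelDataset(df=df,
                                label_names=["label"],
                                protected_attribute_names=["age"])
    return aif_df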

# 1. Data Set Warnings
Data set warnings should be checked before the classifier's ``fit`` method is called. They are implemented in the class ``DataSetSkewedWarning`` and are triggered by calling its ``check_dataset`` method. Class attributes store the warning thresholds; if a threshold is ``None``, the corresponding check is skipped.

Three different types of skewness are distinguished:
- label skewness (unbalanced ratio of different label classes)
- attribute class skewness (unbalanced ratio of protected attribute classes)
- label and attribute class skewness (unbalanced ratio for each combination of protected attribute and label class)
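
To make the three kinds of skewness concrete, the following minimal sketch quantifies each one as the share of the least frequent class. The helper names `minority_fraction` and `skew_fractions` are hypothetical, and the measure is an assumption chosen for illustration; the exact quantity that ``DataSetSkewedWarning`` compares against its thresholds may differ.

```python
from collections import Counter


def minority_fraction(values):
    """Return the share of the least frequent value in `values`."""
    counts = Counter(values)
    return min(counts.values()) / len(values)


def skew_fractions(protected, labels):
    """Compute one imbalance measure per skewness type (illustrative only)."""
    return {
        # unbalanced ratio of label classes
        "label": minority_fraction(labels),
        # unbalanced ratio of protected attribute classes
        "attribute": minority_fraction(protected),
        # unbalanced ratio of (protected attribute, label) combinations
        "attribute_and_label": minority_fraction(list(zip(protected, labels))),
    }


# 100 samples: 10 positive labels, protected attribute alternating 0/1
labels = [1] * 10 + [0] * 90
protected = [0, 1] * 50
fractions = skew_fractions(protected, labels)
print(fractions)
```

A small fraction signals a skewed data set. Note that this sketch only counts combinations that occur at least once, so a combination that is missing entirely would have to be handled separately.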

## 1.1 Skewed Labels

In [3]:
# create a data set with skewed labels
n_samples = 100
n_pos_label = 10

X = np.random.rand(n_samples, 3)
X_protected = np.random.randint(0, 2, size=n_samples)
y = np.hstack((np.ones(n_pos_label), 
               np.zeros(n_samples-n_pos_label)))

In [4]:
skew_labels = to_aif_dataset(X, X_protected, y)
data_warning = DataSetSkewedWarning(skew_labels)
data_warning.check_dataset()



## 1.2 Unbalanced Protected Attribute Classes

In [5]:
# create a data set with unbalanced class count
n_samples = 100
n_pos_label = 10

X = np.random.rand(n_samples, 3)
X_protected = np.hstack((np.ones(n_pos_label), np.zeros(n_samples-n_pos_label)))
y = np.random.randint(0, 2, size=n_samples)

In [6]:
skew_prot_attr = to_aif_dataset(X, X_protected, y)
data_warning = DataSetSkewedWarning(skew_prot_attr)
data_warning.check_dataset()



## 1.3 Unbalanced Combination of Protected Attribute and Label

In [7]:
# create a data set in which the combinations of protected attribute and label are unbalanced
n_samples = 100
n_pos_label = 10

X = np.random.rand(n_samples, 3)
X_protected = np.hstack((np.ones(n_pos_label), np.zeros(n_samples-n_pos_label)))
y = np.random.randint(0, 2, size=n_samples)

In [8]:
skew_prot_attr = to_aif_dataset(X, X_protected, y)
data_warning = DataSetSkewedWarning(skew_prot_attr)
data_warning.check_dataset()



## 1.4 Redefining the Default Bounds

In [9]:
data_warning = DataSetSkewedWarning(skew_prot_attr)

data_warning.POSITIVE_NEGATIVE_CLASS_FRACTION = .1 # default is .4
data_warning.POSITIVE_NEGATIVE_LABEL_FRACTION = .2 # default is .4
data_warning.CLASS_LABEL_FRACTION = .05 # default is .4
data_warning.check_dataset()



# 2. Classifier Warnings
Classifier warnings are checked after the classifier is trained. Again, the thresholds are stored in class attributes, and a check is only executed if its bound is not ``None``.

Thresholds can be provided for both ratios and differences.
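
As an illustration of the two threshold types, the sketch below computes both the ratio and the difference of the positive prediction rate between two groups. The function names and the choice of `0` as the unprivileged group are assumptions made for this example and are not taken from the library.

```python
def positive_rate(predictions, protected, group):
    """Share of positive predictions among the samples belonging to `group`."""
    in_group = [p for p, a in zip(predictions, protected) if a == group]
    return sum(in_group) / len(in_group)


def ratio_and_difference(predictions, protected, unprivileged=0, privileged=1):
    """Compare the positive rates of both groups as a ratio and a difference."""
    rate_unpriv = positive_rate(predictions, protected, unprivileged)
    rate_priv = positive_rate(predictions, protected, privileged)
    # assumes the privileged group receives at least one positive prediction
    return rate_unpriv / rate_priv, rate_unpriv - rate_priv


protected = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
ratio, difference = ratio_and_difference(predictions, protected)
print(ratio, difference)
```

A ratio close to 1 and a difference close to 0 both indicate similar treatment of the two groups, which is why a ratio bound is a lower limit (for example .8) while a difference bound limits the absolute gap (for example .1).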

## 2.1 Using Default Bounds

In [10]:
n_samples = 100
n_pos_label = 10

X = np.random.rand(n_samples, 3)
X_protected = np.hstack((np.ones(n_pos_label), np.zeros(n_samples-n_pos_label)))

In [11]:
# defining predictions
y = np.random.randint(0, 2, size=n_samples)

predicted_dataset = to_aif_dataset(X, X_protected, y)

In [12]:
# raw data set (true labels, drawn independently of the predictions)
y_new = np.random.randint(0, 2, size=n_samples)

raw_dataset = to_aif_dataset(X, X_protected, y_new)

In [13]:
clf_warning = FairnessBoundsWarning(raw_dataset, predicted_dataset)
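# run the classifier checks on clf_warning; check_bounds is assumed to be
# the counterpart of check_dataset (the cell previously re-ran data_warning)
clf_warning.check_bounds()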



## 2.2 Defining New Bounds
If a bound is set to ``None``, the metric is not checked.
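
The skip-on-``None`` behaviour can be sketched with the standard `warnings` module. The function `check_ratio_bounds` and the metric names are hypothetical; this is a minimal illustration of the idea for ratio bounds, not the library's implementation.

```python
import warnings


def check_ratio_bounds(metrics, bounds):
    """Warn for every ratio below its bound; bounds set to None are skipped."""
    violated = []
    for name, bound in bounds.items():
        if bound is None:
            continue  # metric is not checked
        if metrics[name] < bound:
            warnings.warn(f"{name} = {metrics[name]:.2f} is below the bound {bound}")
            violated.append(name)
    return violated


metrics = {"disparate_impact_ratio": 0.5, "fpr_ratio": 0.6}
bounds = {"disparate_impact_ratio": 0.8, "fpr_ratio": None}

# record the warnings so they can be inspected instead of printed
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    violated = check_ratio_bounds(metrics, bounds)
```

Here only the disparate impact ratio triggers a warning: the false positive rate ratio is also below .8, but its bound is ``None``, so it is never evaluated.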

In [14]:
clf_warning = FairnessBoundsWarning(raw_dataset, predicted_dataset)

clf_warning.DISPARATE_IMPACT_RATIO_BOUND = None # default is .8
clf_warning.FPR_RATIO_BOUND = None # default is .8
clf_warning.FNR_RATIO_BOUND = None # default is .8
clf_warning.ERROR_RATIO_BOUND = None # default is .8

clf_warning.EO_DIFFERENCE_BOUND = .2 # default is .1

clf_warning.FPR_DIFFERENCE_BOUND = None # default is None
clf_warning.FNR_DIFFERENCE_BOUND = None # default is None
clf_warning.ERROR_DIFFERENCE_BOUND = .3 # default is None

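# run the classifier checks with the redefined bounds; check_bounds is assumed
# to be the counterpart of check_dataset (the cell previously re-ran data_warning)
clf_warning.check_bounds()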

