# Task 1
Let's name the variables as in the lecture:

- $Y$: whether the student will find XAI useful in the future (1 means yes)
- $\hat{Y}$: whether the student is enrolled in the training (1 means enrolled)
- $A$: population, either B (blue, privileged) or R (red, unprivileged)

## Demographic parity

$P(\hat{Y} = 1 \mid A = B) = 0.65$

$P(\hat{Y} = 1 \mid A = R) = 0.5$

$\frac{P(\hat{Y} = 1 \mid A = B)}{P(\hat{Y} = 1 \mid A = R)} = \frac{0.65}{0.5} = 130\%$

## Equal opportunity

$P(\hat{Y} = 1 \mid A = B, Y = 1) = 0.75$

$P(\hat{Y} = 1 \mid A = R, Y = 1) = 0.5$

$\frac{P(\hat{Y} = 1 \mid A = B, Y = 1)}{P(\hat{Y} = 1 \mid A = R, Y = 1)} = \frac{0.75}{0.5} = 150\%$

## Predictive rate parity

$P(Y = 1 \mid A = B, \hat{Y} = 1) = \frac{60}{65} \approx 0.923$

$P(Y = 1 \mid A = R, \hat{Y} = 1) = 0.5$

$\frac{P(Y = 1 \mid A = B, \hat{Y} = 1)}{P(Y = 1 \mid A = R, \hat{Y} = 1)} = \frac{60/65}{0.5} = \frac{24}{13} \approx 185\%$
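The three ratios can be double-checked with exact fractions. The snippet below is only that arithmetic, using the probabilities given above; the dictionary keys are just labels chosen for this sketch.

```python
from fractions import Fraction

# (blue, red) group-conditional rates taken from the task
rates = {
    "demographic parity": (Fraction(65, 100), Fraction(1, 2)),     # P(Y_hat=1 | A)
    "equal opportunity": (Fraction(75, 100), Fraction(1, 2)),      # P(Y_hat=1 | A, Y=1)
    "predictive rate parity": (Fraction(60, 65), Fraction(1, 2)),  # P(Y=1 | A, Y_hat=1)
}

# Ratio of the privileged (blue) rate to the unprivileged (red) rate
ratios = {name: blue / red for name, (blue, red) in rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} ~ {float(ratio):.1%}")
# demographic parity: 13/10 ~ 130.0%
# equal opportunity: 3/2 ~ 150.0%
# predictive rate parity: 24/13 ~ 184.6%
```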

## How can this decision rule be changed to improve its fairness?

Suppose we keep the same model, but in the blue group we enroll all students we originally did not enroll plus a random 20% of those we originally enrolled. Out of 100 blue students we would then enroll, on average, 32 who will find XAI useful and 16 who will not. All three criteria become more equal:

- demographic parity: 48% vs. 50% instead of 65% vs. 50%,
- equal opportunity: 40% vs. 50% instead of 75% vs. 50%,
- predictive rate parity: 66.7% vs. 50% instead of 92.3% vs. 50%.
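The counts behind these numbers can be reconstructed if we assume a blue group of exactly 100 students (an assumption; the task only gives rates). The rates above then imply 65 enrolled students, 60 of whom find XAI useful ($60/65$), and 80 useful-finding students overall ($60/80 = 0.75$). A sketch of the new rule's expected outcome:

```python
from fractions import Fraction

# Blue group, ASSUMING 100 students; counts derived from the rates above
n = 100
enrolled_pos, enrolled_neg = 60, 5            # originally enrolled: Y=1 / Y=0
not_enrolled_pos, not_enrolled_neg = 20, 15   # originally not enrolled: Y=1 / Y=0

# New rule: enroll everyone previously rejected plus a random 20% of those
# previously enrolled; the values below are expected counts
keep = Fraction(1, 5)
new_pos = not_enrolled_pos + keep * enrolled_pos   # 20 + 12 = 32
new_neg = not_enrolled_neg + keep * enrolled_neg   # 15 + 1 = 16

demographic_parity = (new_pos + new_neg) / n                       # P(Y_hat=1 | A=B)
equal_opportunity = new_pos / (enrolled_pos + not_enrolled_pos)    # P(Y_hat=1 | A=B, Y=1)
predictive_rate = new_pos / (new_pos + new_neg)                    # P(Y=1 | A=B, Y_hat=1)
false_positive_rate = new_neg / (enrolled_neg + not_enrolled_neg)  # P(Y_hat=1 | A=B, Y=0)

print(demographic_parity, equal_opportunity, predictive_rate, false_positive_rate)
# 12/25 2/5 2/3 4/5
```

The last value is the telling one: a blue student who will not find XAI useful is enrolled with probability 0.8, twice as often as one who will (0.4), which is what makes this assignment worse than random.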

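Long story short, we need to deliberately use the model to make an assignment that is worse than random for the blue group: a blue student who will find XAI useful is now less likely to be enrolled (40%) than one who will not (80%).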


# Task 2

We will explore the fairness of a model predicting income (Adult dataset) with respect to gender.

## Part 2

```python
part2()
```

```
Bias detected in 2 metrics: FPR, STP

Conclusion: your model is not fair because 2 or more criteria exceeded acceptable limits set by epsilon.

Ratios of metrics, based on 'Male'. Parameter 'epsilon' was set to 0.8 and therefore metrics should be within (0.8, 1.25)
            TPR       ACC       PPV       FPR       STP
Female  0.83196  1.117788  1.023899  0.217391  0.304721
```


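The statistical parity coefficient is 0.304721, which is outside the four-fifths rule. This means that the model predicts an income above 50K for women at only about 30% of the rate at which it predicts it for men.

The false positive rate ratio, 0.217391, is also outside the range, which is the second detected bias.

The equal opportunity coefficient is 0.83196, which is within the four-fifths rule.

The predictive parity coefficient is 1.023899, which is also within the [0.8, 1.25] range.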
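As far as we understand dalex's `fairness_check`, each reported number is a confusion-matrix metric of the unprivileged group divided by the same metric of the privileged group (here `'Male'`), with predictions obtained by thresholding probabilities at the default cutoff of 0.5. The helpers below (`group_metrics` and `fairness_ratios` are hypothetical names, not dalex functions) sketch that computation on made-up toy labels.

```python
import numpy as np

def group_metrics(y_true, y_pred):
    """Confusion-matrix metrics for one subgroup (boolean labels and predictions)."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "TPR": tp / (tp + fn),           # equal opportunity
        "ACC": (tp + tn) / len(y_true),  # accuracy equality
        "PPV": tp / (tp + fp),           # predictive parity
        "FPR": fp / (fp + tn),           # predictive equality
        "STP": (tp + fp) / len(y_true),  # statistical parity (positive prediction rate)
    }

def fairness_ratios(y_true, y_pred, group, privileged):
    """Each unprivileged group's metrics divided by the privileged group's metrics.

    No guard against empty groups or zero denominators (sketch only).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    group = np.asarray(group)
    is_priv = group == privileged
    base = group_metrics(y_true[is_priv], y_pred[is_priv])
    ratios = {}
    for g in np.unique(group[~is_priv]):
        in_g = group == g
        metrics = group_metrics(y_true[in_g], y_pred[in_g])
        ratios[str(g)] = {name: metrics[name] / base[name] for name in metrics}
    return ratios

# Made-up toy data: 4 men followed by 4 women
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
gender = ["Male"] * 4 + ["Female"] * 4
print(fairness_ratios(y_true, y_pred, gender, privileged="Male"))
# Female ratios: TPR 0.5, ACC 1.0, PPV 1.5, FPR 0.0, STP 1/3
```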


## Part 3

```python
part3()
```

```
Bias detected in 2 metrics: PPV, STP

Conclusion: your model is not fair because 2 or more criteria exceeded acceptable limits set by epsilon.

Ratios of metrics, based on 'Male'. Parameter 'epsilon' was set to 0.8 and therefore metrics should be within (0.8, 1.25)
             TPR       ACC       PPV       FPR       STP
Female  1.007634  1.143036  0.797235  0.884615  0.471014
```


Now let's compare the original model, trained on the full dataset, with a model trained without the protected attributes and the variables closely related to them (for example, being a husband indicates being male, so `relationship` and `marital-status` are dropped along with `gender`, `age` and `race`).

The statistical parity coefficient is 0.471014, which is still outside the four-fifths rule, as before. On the other hand, it improved compared to the original XGBoost trained on the full data (0.304721).

The equal opportunity coefficient is 1.007634. It was already within the four-fifths rule before, but now it is even closer to 1.

The predictive parity coefficient is 0.797235, so we now fall slightly outside the [0.8, 1.25] range. This appears to be the price for the large improvements in the other ratios, such as TPR, FPR and STP (FPR, for instance, went from 0.217391 to 0.884615).


## Part 4

```python
part4()
```

```
Bias detected in 1 metric: STP

Conclusion: your model cannot be called fair because 1 criterion exceeded acceptable limits set by epsilon.
It does not mean that your model is unfair but it cannot be automatically approved based on these metrics.

Ratios of metrics, based on 'Male'. Parameter 'epsilon' was set to 0.8 and therefore metrics should be within (0.8, 1.25)
             TPR       ACC       PPV    FPR       STP
Female  1.233962  1.120627  0.824356  0.875  0.560847
```


In Part 4, we improve the fairness of the original model by retraining it with sample weights computed by reweighting (`dalex.fairness.reweight`).
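Reweighting gives each (gender, label) combination the weight $w(a, y) = \frac{P(A=a)\,P(Y=y)}{P(A=a, Y=y)}$, so that in the weighted training data the label is statistically independent of gender. This is the reweighing scheme of Kamiran and Calders; we believe `dalex.fairness.reweight` follows the same idea, but the function below (`reweigh`, a hypothetical name) is our own NumPy sketch on made-up toy data, not the dalex implementation.

```python
import numpy as np

def reweigh(protected, y):
    """Per-sample weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    protected = np.asarray(protected)
    y = np.asarray(y)
    n = len(y)
    weights = np.empty(n, dtype=float)
    for group in np.unique(protected):
        in_group = protected == group
        for label in np.unique(y):
            has_label = y == label
            cell = in_group & has_label
            n_cell = cell.sum()
            if n_cell > 0:
                # expected count under independence divided by the observed count
                weights[cell] = in_group.sum() * has_label.sum() / (n * n_cell)
    return weights

# Made-up toy data: positive rate 4/6 for men, 1/4 for women, 5/10 overall
gender = np.array(["Male"] * 6 + ["Female"] * 4)
label = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])
w = reweigh(gender, label)

# Weighted positive rate per group: both come out at the overall rate 0.5
# (up to floating-point rounding)
for g in ["Male", "Female"]:
    mask = gender == g
    print(g, np.sum(w[mask] * label[mask]) / np.sum(w[mask]))
```

Note that the weights sum to the number of samples, so the overall scale of the training loss does not change; only the relative importance of the (gender, label) cells does.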

The statistical parity coefficient is 0.560847, which is the only ratio outside the four-fifths rule. It improved compared to both previous models.

The equal opportunity coefficient is 1.233962. This is higher than in any previous model: the ratio is now above 1 by roughly as much as the original model's was below 1, but it is still within the four-fifths rule.

The predictive parity coefficient is 0.824356. It decreased considerably compared to the original model (1.023899), but in exchange every ratio except STP is now within the four-fifths range. It is also better than the model from Part 3 (0.797235).
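The epsilon rule printed by dalex ("metrics should be within (0.8, 1.25)") can be reproduced directly on the reported ratios. The snippet below uses a hypothetical helper, `outside_range`, with the Female/Male ratios copied from the three `fairness_check` outputs, and flags the same metrics that dalex flagged.

```python
# Female/Male ratios copied from the three fairness_check outputs
reported = {
    "Base XGBoost": {"TPR": 0.83196, "ACC": 1.117788, "PPV": 1.023899, "FPR": 0.217391, "STP": 0.304721},
    "No protected variables": {"TPR": 1.007634, "ACC": 1.143036, "PPV": 0.797235, "FPR": 0.884615, "STP": 0.471014},
    "Reweighted": {"TPR": 1.233962, "ACC": 1.120627, "PPV": 0.824356, "FPR": 0.875, "STP": 0.560847},
}

def outside_range(ratios, epsilon=0.8):
    """Names of the metrics whose ratio is not within (epsilon, 1 / epsilon)."""
    return [name for name, r in ratios.items() if not (epsilon < r < 1 / epsilon)]

flagged = {model: outside_range(ratios) for model, ratios in reported.items()}
for model, metrics in flagged.items():
    print(f"{model}: {metrics}")
# Base XGBoost: ['FPR', 'STP']
# No protected variables: ['PPV', 'STP']
# Reweighted: ['STP']
```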


## Part 5

```python
stats, plot = part5()
```

![Fairness check of the three models (ratios for 'Female' relative to 'Male')](img/p5.png)

```python
stats
```

| Model | recall | precision | f1 | accuracy | auc |
|---|---|---|---|---|---|
| Base XGBoost | 0.591062 | 0.797497 | 0.678935 | 0.864278 | 0.925081 |
| XGBoost no protected groups or hints on groups | 0.393761 | 0.83542 | 0.535244 | 0.833982 | 0.843579 |
| XGBClassifier with Reweight | 0.548904 | 0.82197 | 0.658241 | 0.861617 | 0.922079 |


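Both models with higher precision than the base model are also fairer than it, although the ordering is not exact: the model without protected attributes has the highest precision, while the reweighted model has the best STP ratio. The reweighted model and the original model have similar accuracy. The model with a few columns removed has far worse recall, F1 score and AUC and slightly worse accuracy; it is fairer than the base model, but largely by being a worse model.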

In conclusion, we don't necessarily have to lose much model quality to get better fairness, as the comparison of the base XGBoost with the same model trained with reweighting shows.
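To put a number on this comparison, the snippet below subtracts the base model's scores from the reweighted model's, using the values from the table above (nothing is recomputed from the models).

```python
# Scores copied from the model_performance table above
base = {"recall": 0.591062, "precision": 0.797497, "f1": 0.678935, "accuracy": 0.864278, "auc": 0.925081}
reweighted = {"recall": 0.548904, "precision": 0.82197, "f1": 0.658241, "accuracy": 0.861617, "auc": 0.922079}

# Change of each metric when switching from the base model to the reweighted one
change = {metric: round(reweighted[metric] - base[metric], 4) for metric in base}
print(change)
```

Recall drops by about 4 points and F1 by about 2, accuracy and AUC move by about 0.3 points or less, and precision even improves by about 2.4 points.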

# Appendix
## Prepare data

```javascript
%%javascript
// hack to fix export (load d3, jQuery and Plotly through require.js)
require.config({
  paths: {
    d3: 'https://cdnjs.cloudflare.com/ajax/libs/d3/5.9.2/d3',
    jquery: 'https://code.jquery.com/jquery-3.4.1.min',
    plotly: 'https://cdn.plot.ly/plotly-latest.min'
  },

  shim: {
    plotly: {
      deps: ['d3', 'jquery'],
      exports: 'plotly'
    }
  }
});
```


```python
import pandas as pd
import numpy as np
```

```python
import random
import os

RND_SEED = 123

def seed_everything(seed=RND_SEED):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)

seed_everything()
```

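```python
df = pd.read_csv("adult.csv")
# Binary target: True if the income is above 50K
df["income_over_50"] = df["income"] == ">50K"
# Store every string column as pandas 'category' (needed by XGBoost's enable_categorical).
# Assigning column by column also changes the dtype on newer pandas, where
# df.loc[:, mask] = ... sets values in place and keeps the 'object' dtype.
for col in df.select_dtypes(["object"]).columns:
    df[col] = df[col].astype("category")
# Shuffle the rows (reproducible because the global NumPy seed is set above)
df = df.sample(frac=1).reset_index(drop=True)
df
```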

| | age | workclass | fnlwgt | education | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income | income_over_50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | Private | 117700 | HS-grad | 9 | Divorced | Adm-clerical | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K | False |
| 1 | 19 | Private | 351757 | 10th | 6 | Never-married | Other-service | Unmarried | White | Male | 0 | 0 | 30 | El-Salvador | <=50K | False |
| 2 | 31 | Federal-gov | 101345 | HS-grad | 9 | Never-married | Handlers-cleaners | Own-child | White | Female | 0 | 0 | 40 | United-States | <=50K | False |
| 3 | 25 | Private | 324854 | Bachelors | 13 | Never-married | Sales | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K | False |
| 4 | 36 | Private | 245521 | 7th-8th | 4 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 35 | Mexico | <=50K | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 48837 | 42 | Federal-gov | 37997 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | False |
| 48838 | 65 | Self-emp-not-inc | 326936 | HS-grad | 9 | Married-civ-spouse | Sales | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K | False |
| 48839 | 44 | Self-emp-inc | 229466 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 50 | United-States | >50K | True |
| 48840 | 35 | Private | 265954 | Bachelors | 13 | Never-married | Handlers-cleaners | Not-in-family | Black | Male | 0 | 0 | 40 | United-States | <=50K | False |


```python
df["income_over_50"].astype('int').describe()
```

```
count    48842.000000
mean         0.239282
std          0.426649
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: income_over_50, dtype: float64
```

```python
from sklearn.model_selection import train_test_split

data = df.drop(columns=["income", "income_over_50"])
label = df["income_over_50"]

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.1, random_state=RND_SEED)
```


## Part 1

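```python
import xgboost
import dalex as dx

def pf_xgboost_classifier_categorical(model, df):
    # Work on a copy so that the data held by the explainer is not modified
    df = df.copy()
    # XGBoost's categorical support needs 'category' dtype, not plain strings
    for col in df.select_dtypes(["object"]).columns:
        df[col] = df[col].astype("category")
    # Probability of the positive class (income above 50K)
    return model.predict_proba(df)[:, 1]

def train_and_explain(model, X, Y, X_e, Y_e, label):
    # Fit the model, then wrap it in a dalex explainer evaluated on (X_e, Y_e)
    model.fit(X, Y)
    explainer = dx.Explainer(model, X_e, Y_e, predict_function=pf_xgboost_classifier_categorical, label=label, verbose=False)
    return explainer
```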



```python
from sklearn.metrics import accuracy_score

model = xgboost.XGBClassifier(
    enable_categorical=True,
    eval_metric="logloss",
    tree_method="hist",
)
explainer = train_and_explain(model, X_train, y_train, X_test, y_test, "Base XGBoost")
accuracy_score(y_test, model.predict(X_test))
```

```
0.8642784032753327
```

## Part 2

```python
# Gender is the protected attribute; 'Male' is the privileged (reference) group
protected_variable = X_test.gender
privileged_group = "Male"

fobject = explainer.model_fairness(
    protected=protected_variable,
    privileged=privileged_group,
)

def part2():
    return fobject.fairness_check()
```

## Part 3

```python
# Drop gender, the columns that reveal it (marital-status, relationship),
# and the other sensitive attributes age and race
dropped_columns = ["gender", "marital-status", "relationship", "age", "race"]
X_simplified = X_train.drop(columns=dropped_columns)
X_test_simp = X_test.drop(columns=dropped_columns)

model_simp = xgboost.XGBClassifier(
    enable_categorical=True,
    eval_metric="logloss",
    tree_method="hist",
)

explanation_simp = train_and_explain(model_simp, X_simplified, y_train, X_test_simp, y_test, "XGBoost no protected groups or hints on groups")

# The protected variable still comes from the full test set
fobject_simp = explanation_simp.model_fairness(
    protected=protected_variable,
    privileged=privileged_group,
)

def part3():
    return fobject_simp.fairness_check()
```

```python
accuracy_score(y_test, model_simp.predict(X_test_simp))
```

```
0.8339815762538383
```

## Part 4

```python
from dalex.fairness import reweight
from sklearn.base import clone

protected_variable_train = X_train.gender

# Reweighting: sample weights that balance the label across genders
sample_weight = reweight(
    protected_variable_train,
    y_train,
    verbose=False,
)
# Fresh, unfitted copy of the original model (same hyperparameters)
model_reweight = clone(model)
model_reweight.fit(X_train, y_train, sample_weight=sample_weight)
explainer_reweight = dx.Explainer(
    model_reweight,
    X_test,
    y_test,
    predict_function=pf_xgboost_classifier_categorical,
    label='XGBClassifier with Reweight',
    verbose=False,
)
fobject_reweight = explainer_reweight.model_fairness(
    protected=protected_variable,
    privileged=privileged_group,
)

def part4():
    return fobject_reweight.fairness_check()
```

## Part 5

```python
def part5():
    # Performance metrics of the three models in one table
    model_stats = pd.concat([
        explainer.model_performance().result,
        explanation_simp.model_performance().result,
        explainer_reweight.model_performance().result,
    ], axis=0)
    # Fairness check plot comparing the three models
    plot = fobject.plot([fobject_simp, fobject_reweight], show=False)
    plot = plot.update_layout(
        autosize=False, width=800, height=450,
        legend=dict(yanchor="bottom", y=1, xanchor="left", x=1),
    )
    return model_stats, plot
```