<b> References </b>

1. https://github.com/EthicalML/xai/tree/master/examples
(Published here for learning purposes only)

This library is developed and maintained by The Institute for Ethical Machine Learning
(https://github.com/EthicalML)

2. https://towardsdatascience.com/identifying-and-correcting-label-bias-in-machine-learning-ed177d30349e

<h1> Assessing Bias in Algorithms </h1>


When attempting to assess bias in algorithms, researchers commonly look at four key metrics:

<h2> Demographic parity </h2>
<p>A classifier should make positive predictions on a protected population group at the same rate as on the entire population.</p>


<h2> Disparate impact </h2>
<p> Similar to demographic parity, but the classifier does not know which protected population groups exist or which data points belong to them.</p>


<h2> Equal opportunity </h2>
<p> A classifier should have the same true positive rate on a protected population group as on the entire population.</p>

<h2> Equalized odds </h2>
<p> A classifier should have the same true positive rate and the same false positive rate on a protected population group as on the entire population.</p>
<p> Each high-level metric is expressed as a non-negative number that describes how close the classifier is to full fairness, with a score of 0 representing no bias.</p>
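
The definitions above can be turned into numbers in more than one way. As a minimal sketch, assuming binary labels, binary predictions and a boolean mask marking the protected group, the following expresses three of the metrics as absolute gaps between the group and the whole population. The helper names `rate` and `fairness_gaps` are illustrative and not part of xai, and taking the larger of the two rate gaps for equalized odds is only one possible convention.

```python
import numpy as np

def rate(mask, values):
    """Mean of `values` over the rows selected by `mask` (NaN if none)."""
    return float(values[mask].mean()) if mask.any() else float("nan")

def fairness_gaps(y_true, y_pred, in_group):
    """Gaps between a protected group and the whole population (0 = no gap)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    in_group = np.asarray(in_group).astype(bool)
    everyone = np.ones_like(in_group)

    # Demographic parity: positive prediction rate, group vs. population.
    dp_gap = abs(rate(in_group, y_pred) - rate(everyone, y_pred))
    # Equal opportunity: true positive rate, group vs. population.
    tpr_gap = abs(rate(in_group & y_true, y_pred) - rate(y_true, y_pred))
    # Equalized odds additionally compares the false positive rate.
    fpr_gap = abs(rate(in_group & ~y_true, y_pred) - rate(~y_true, y_pred))
    return {
        "demographic_parity": dp_gap,
        "equal_opportunity": tpr_gap,
        "equalized_odds": max(tpr_gap, fpr_gap),
    }

# Toy data: 8 people, the first 4 belong to the protected group.
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
in_group = [1, 1, 1, 1, 0, 0, 0, 0]
gaps = fairness_gaps(y_true, y_pred, in_group)
print(gaps)
```

A value of 0 for a metric means the group and the population are treated identically by that measure; larger values indicate a larger gap.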

In [None]:
!pip install xai
!pip install xai_data
import sys, os
import pandas as pd
import numpy as np
from collections import defaultdict
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.pipeline import make_pipeline

# Use below for charts in dark jupyter theme

THEME_DARK = False

if THEME_DARK:
    # This is used if Jupyter Theme dark is enabled. 
    # The theme chosen can be activated with jupyter theme as follows:
    # >>> jt -t oceans16 -T -nfs 115 -cellw 98% -N  -kl -ofs 11 -altmd
    font_size = '20.0'
    dark_theme_config = {
        "ytick.color" : "w",
        "xtick.color" : "w",
        "text.color": "white",
        'font.size': font_size,
        'axes.titlesize': font_size,
        'axes.labelsize': font_size, 
        'xtick.labelsize': font_size, 
        'ytick.labelsize': font_size, 
        'legend.fontsize': font_size, 
        'figure.titlesize': font_size,
        'figure.figsize': [20, 7],
        'figure.facecolor': "#384151",
        'legend.facecolor': "#384151",
        "axes.labelcolor" : "w",
        "axes.edgecolor" : "w"
    }
    plt.rcParams.update(dark_theme_config)

sys.path.append("..")

import xai
import xai.data

In [None]:
categorical_cols = ["gender", "workclass", "education", "education-num", "marital-status",
                   "occupation", "relationship", "ethnicity", "loan"]
csv_columns = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status",
                   "occupation", "relationship", "ethnicity", "gender", "capital-gain", "capital-loss",
                   "hours-per-week", "loan"]

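Dataset description:
Listing of attributes (names as in the original dataset listing; judging by the column lists above, `race`, `sex` and `y` appear in the loaded frame as `ethnicity`, `gender` and `loan`):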

1. y >50K, <=50K. 
2. age	 continuous. 

3. workclass	 Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov,Without-pay, Never-worked. 

4. fnlwgt	 continuous. 

5. education	 Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool. 

6. education-num	 continuous. 

7. marital-status	 Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse. 

8. occupation	 Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces. 

9. relationship	 Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. 

10. race	 White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. 

11. sex	 Female, Male. 

12. capital-gain	 continuous. 

13. capital-loss	 continuous. 

14. hours-per-week	 continuous. 

15. native-country	 United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.

In [None]:
df = xai.data.load_census()
df.tail()

In [None]:
target = "loan"
protected = ["ethnicity", "gender", "age"]

Here, we look at the gender imbalance, a trait that we definitely do not want any model built on this data to learn and carry forward.
View class imbalances for protected columns:

In [None]:

df_groups = xai.imbalance_plot(df, "gender", categorical_cols=categorical_cols)

To see how the gender imbalance relates to the loan outcome:

In [None]:
groups = xai.imbalance_plot(df, "gender", "loan", categorical_cols=categorical_cols)
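
The counts behind such an imbalance plot can also be inspected directly. The following is a rough sketch with pandas on a hypothetical toy frame that reuses the `gender` and `loan` column names; the values are made up for illustration and are not taken from the census data.

```python
import pandas as pd

# Hypothetical toy frame using the same column names as the census data.
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "loan":   [">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K"],
})

# Number of rows for every (gender, loan) combination.
counts = toy.groupby(["gender", "loan"]).size().unstack(fill_value=0)
# Share of each loan outcome within each gender.
shares = counts.div(counts.sum(axis=1), axis=0)
print(counts)
print(shares)
```

Looking at shares rather than raw counts separates two effects: one gender being under-represented overall, and one gender receiving the positive outcome less often.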

In [None]:
bal_df = xai.balance(df, "gender", "loan", upsample=0.8, categorical_cols=categorical_cols)
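
To make the idea of upsampling concrete, here is a minimal sketch that resamples small groups with replacement until they reach a fraction of the largest group's size. It assumes that this is roughly what an `upsample=0.8` setting aims for; the function `upsample_groups` is hypothetical and xai's own implementation may work differently.

```python
import pandas as pd

def upsample_groups(df, cols, fraction=0.8, seed=0):
    """Resample small groups (with replacement) up to fraction * largest group."""
    sizes = df.groupby(cols).size()
    target = int(fraction * sizes.max())
    parts = []
    for _, part in df.groupby(cols):
        if len(part) < target:
            # Draw rows with replacement until the group reaches the target size.
            part = part.sample(n=target, replace=True, random_state=seed)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)

# Toy data: 5 rows per male group, 1 row per female group.
toy = pd.DataFrame({
    "gender": ["Male"] * 10 + ["Female"] * 2,
    "loan": [">50K"] * 5 + ["<=50K"] * 5 + [">50K", "<=50K"],
})
balanced = upsample_groups(toy, ["gender", "loan"])
group_sizes = balanced.groupby(["gender", "loan"]).size()
print(group_sizes)
```

Upsampling duplicates existing rows, so it reduces the imbalance a model sees during training without adding new information about the smaller groups.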

In [None]:
_ = xai.correlations(df, include_categorical=True, plot_type="matrix")

In [None]:
proc_df = xai.normalize_numeric(bal_df)
proc_df = xai.convert_categories(proc_df)
x = proc_df.drop("loan", axis=1)
y = proc_df["loan"]

x_train, y_train, x_test, y_test, train_idx, test_idx = \
    xai.balanced_train_test_split(
            x, y, "gender", 
            min_per_group=300,
            max_per_group=300,
            categorical_cols=categorical_cols)

x_train_display = bal_df[train_idx]
x_test_display = bal_df[test_idx]

print("Number of test examples: ", x_test.shape[0])

df_test = x_test_display.copy()
df_test["loan"] = y_test

_= xai.imbalance_plot(df_test, "gender", "loan", categorical_cols=categorical_cols)
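
A balanced test split can be sketched as drawing the same number of rows from every group and leaving the rest for training. The helper `balanced_test_mask` below is a hypothetical illustration with a single `per_group` parameter; it does not reproduce the `min_per_group` and `max_per_group` behaviour of `xai.balanced_train_test_split`.

```python
import numpy as np
import pandas as pd

def balanced_test_mask(df, cols, per_group, seed=0):
    """Boolean mask selecting `per_group` random rows from every group."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(df), dtype=bool)
    # GroupBy.indices maps each group key to the row positions in that group.
    for _, positions in df.groupby(cols).indices.items():
        if len(positions) < per_group:
            raise ValueError("group too small for the requested test size")
        mask[rng.choice(positions, size=per_group, replace=False)] = True
    return mask

toy = pd.DataFrame({
    "gender": ["Male"] * 8 + ["Female"] * 4,
    "loan": [0, 1] * 6,
})
test_mask = balanced_test_mask(toy, ["gender", "loan"], per_group=2)
test_df, train_df = toy[test_mask], toy[~test_mask]
print(test_df.groupby(["gender", "loan"]).size())
```

With equal group sizes in the test set, per-group metrics are computed on the same number of examples and are easier to compare.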

We can also analyse the interaction between inference results and input features. For this, we will train a neural network with a single hidden layer.

In [None]:
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, mean_squared_error, roc_curve, auc

# Embedding is imported from keras.layers; the older
# keras.layers.embeddings path is not available in recent Keras releases.
from keras.layers import Input, Dense, Flatten, \
    Concatenate, concatenate, Dropout, Lambda, Embedding
from keras.models import Model, Sequential

def build_model(X):
    input_els = []
    encoded_els = []
    dtypes = list(zip(X.dtypes.index, map(str, X.dtypes)))
    for k,dtype in dtypes:
        input_els.append(Input(shape=(1,)))
        if dtype == "int8":
            e = Flatten()(Embedding(X[k].max()+1, 1)(input_els[-1]))
        else:
            e = input_els[-1]
        encoded_els.append(e)
    encoded_els = concatenate(encoded_els)

    layer1 = Dropout(0.5)(Dense(100, activation="relu")(encoded_els))
    out = Dense(1, activation='sigmoid')(layer1)

    # build and compile the model (training happens later via model.fit)
    model = Model(inputs=input_els, outputs=[out])
    model.compile(optimizer="adam", loss='binary_crossentropy', metrics=['accuracy'])
    return model


def f_in(X, m=None):
    """Preprocess input so it can be provided to a function"""
    if m:
        return [X.iloc[:m,i] for i in range(X.shape[1])]
    else:
        return [X.iloc[:,i] for i in range(X.shape[1])]

def f_out(probs, threshold=0.5):
    """Convert probabilities into classes"""
    return list((probs >= threshold).astype(int).T[0])


In [None]:
model = build_model(x_train)

model.fit(f_in(x_train), y_train, epochs=50, batch_size=512)

In [None]:
score = model.evaluate(f_in(x_test), y_test, verbose=1)
print("Loss: %.4f" % score[0])
print("Accuracy: %.2f%%" % (score[1]*100))

In [None]:
probabilities = model.predict(f_in(x_test))
pred = f_out(probabilities)

In [None]:
_= xai.metrics_plot(
        y_test, 
        probabilities)

In [None]:
df.head()

Identify metric imbalances grouped by protected columns

In [None]:
_ = xai.metrics_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=["gender", "ethnicity"],
    categorical_cols=categorical_cols)
# Note how recall for the Black male group is low while accuracy is high.
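
Low recall alongside high accuracy is typical for a group with few positive examples. The sketch below computes accuracy, precision and recall per group with plain pandas on made-up data; `group_metrics` is an illustrative helper, not an xai function.

```python
import pandas as pd

def group_metrics(y_true, y_pred, groups):
    """Accuracy, precision and recall for each value of `groups`."""
    frame = pd.DataFrame({"y": y_true, "pred": y_pred, "group": groups})
    rows = {}
    for name, part in frame.groupby("group"):
        tp = int(((part["y"] == 1) & (part["pred"] == 1)).sum())
        fp = int(((part["y"] == 0) & (part["pred"] == 1)).sum())
        fn = int(((part["y"] == 1) & (part["pred"] == 0)).sum())
        rows[name] = {
            "accuracy": float((part["y"] == part["pred"]).mean()),
            "precision": tp / (tp + fp) if tp + fp else float("nan"),
            "recall": tp / (tp + fn) if tp + fn else float("nan"),
        }
    return pd.DataFrame(rows).T

# Group "b" has a single positive, so missing it barely moves accuracy.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 6
metrics = group_metrics(y_true, y_pred, groups)
print(metrics)
```

In this toy example group "b" reaches an accuracy of 5/6 while its recall is 0, because the only positive example in the group is missed.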


In [None]:
_ = [xai.metrics_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=[p],
    categorical_cols=categorical_cols) for p in protected]

In [None]:
xai.confusion_matrix_plot(y_test, pred)

In [None]:
xai.confusion_matrix_plot(y_test, pred, scaled=False)

In [None]:
_ = xai.roc_plot(y_test, probabilities)

In [None]:
_ = [xai.roc_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=[p],
    categorical_cols=categorical_cols) for p in protected]
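
The per-group ROC curves can be summarised by one number per group, the area under the curve (AUC). As a rough sketch, AUC can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The helpers `auc_score` and `auc_by_group` are illustrative names and the data is made up.

```python
import numpy as np

def auc_score(y_true, scores):
    """ROC AUC as the chance that a random positive outscores a random negative."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def auc_by_group(y_true, scores, groups):
    """AUC computed separately for each distinct value in `groups`."""
    y_true, scores, groups = map(np.asarray, (y_true, scores, groups))
    return {str(g): auc_score(y_true[groups == g], scores[groups == g])
            for g in np.unique(groups)}

y_true = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.1, 0.6, 0.7, 0.4, 0.3]
groups = ["a"] * 4 + ["b"] * 4
aucs = auc_by_group(y_true, scores, groups)
print(aucs)
```

A large difference in AUC between groups suggests the model ranks examples much better for one group than for another, regardless of the decision threshold.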

In [None]:
_= xai.pr_plot(y_test, probabilities)

In [None]:
_ = [xai.pr_plot(
    y_test, 
    probabilities, 
    df=x_test_display, 
    cross_cols=[p],
    categorical_cols=categorical_cols) for p in protected]

In [None]:
d = xai.smile_imbalance(
    y_test, 
    probabilities)

In [None]:
d[["correct", "incorrect"]].sum().plot.bar()

In [None]:
d = xai.smile_imbalance(
    y_test, 
    probabilities,
    threshold=0.75,
    display_breakdown=True)

In [None]:
display_bars = ["true-positives", "true-negatives", 
                "false-positives", "false-negatives"]
d[display_bars].sum().plot.bar()

In [None]:
d = xai.smile_imbalance(
    y_test, 
    probabilities,
    bins=9,
    threshold=0.75,
    manual_review=0.00001,
    display_breakdown=False)

In [None]:
d[["correct", "incorrect", "manual-review"]].sum().plot.bar()
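
The idea of splitting predictions by probability bin, and setting aside those close to the threshold for manual review, can be sketched as follows. This is a hypothetical re-creation with pandas, not xai's implementation; in particular, treating `manual_review` as a band of that width on either side of the threshold is an assumption.

```python
import numpy as np
import pandas as pd

def bin_outcomes(y_true, probs, bins=10, threshold=0.5, manual_review=None):
    """Count correct / incorrect / manual-review predictions per probability bin."""
    y_true = np.asarray(y_true).astype(int)
    probs = np.asarray(probs, dtype=float).ravel()
    pred = (probs >= threshold).astype(int)
    # Assumption: predictions within `manual_review` of the threshold
    # are handed to a human instead of being counted as right or wrong.
    review = np.zeros(len(probs), dtype=bool)
    if manual_review is not None:
        review = np.abs(probs - threshold) <= manual_review
    edges = np.linspace(0.0, 1.0, bins + 1)
    bin_id = np.clip(np.digitize(probs, edges) - 1, 0, bins - 1)
    frame = pd.DataFrame({
        "bin": bin_id,
        "correct": (pred == y_true) & ~review,
        "incorrect": (pred != y_true) & ~review,
        "manual-review": review,
    })
    return frame.groupby("bin").sum().reindex(range(bins), fill_value=0)

y_true = [0, 0, 1, 1, 1, 0]
probs = [0.05, 0.45, 0.55, 0.95, 0.30, 0.85]
table = bin_outcomes(y_true, probs, bins=5, threshold=0.5, manual_review=0.1)
totals = table.sum()
print(table)
```

Widening the review band moves uncertain predictions out of the correct and incorrect columns, trading automation for human effort.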

In [None]:
def get_avg(x, y):
    return model.evaluate(f_in(x), y, verbose=0)[1]

imp = xai.feature_importance(x_test, y_test, get_avg)

imp.head()
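
One common way to obtain feature importances from a scoring function such as `get_avg` is permutation importance: shuffle one column at a time and measure how much the score drops. The sketch below assumes this technique for illustration and may differ from what `xai.feature_importance` does internally; `permutation_importance` and `rule_accuracy` are hypothetical names, and the scorer is a simple stand-in for the trained model.

```python
import numpy as np
import pandas as pd

def permutation_importance(x, y, score_fn, repeat=5, seed=0):
    """Average drop in score_fn(x, y) when one column at a time is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = score_fn(x, y)
    drops = {}
    for col in x.columns:
        scores = []
        for _ in range(repeat):
            shuffled = x.copy()
            # Break the link between this column and the target.
            shuffled[col] = rng.permutation(shuffled[col].to_numpy())
            scores.append(score_fn(shuffled, y))
        drops[col] = baseline - float(np.mean(scores))
    return pd.Series(drops).sort_values(ascending=False)

# Stand-in scorer (hypothetical): accuracy of the rule "predict 1 when signal > 0".
def rule_accuracy(x, y):
    return float(((x["signal"] > 0).astype(int) == y).mean())

data_rng = np.random.default_rng(1)
toy_x = pd.DataFrame({"signal": data_rng.normal(size=200),
                      "noise": data_rng.normal(size=200)})
toy_y = (toy_x["signal"] > 0).astype(int)
toy_imp = permutation_importance(toy_x, toy_y, rule_accuracy)
print(toy_imp)
```

Columns the scorer never uses, such as `noise` here, show an importance of zero, while shuffling `signal` destroys most of the accuracy.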