# Lab 4 - Detecting and Mitigating Bias in Machine Learning
Week 4 - PM034A Machine learning for socio-technical systems <br>

By <b> Nadia Metoui* </b> <br>
Faculty of Technology, Policy, and Management (TPM)<br>

<small>*Acknowledgement: Part of this lab is loosely based on the code developed by <i><b>Agathe Balayn</b></i> and <i><b>Seda Gürses</b></i>.</small>

***Learning Objectives***<br>
Examine the impact of ML-based solutions and interventions on individuals, organisations, and society.<br>
Apply ML <b>Operational Fairness tools</b> in real-world socio-technical examples.

<H2> </H2>

# Part I. Pre-processing.

In this part of the assignment, you will explore a use case in which a bank wants to develop an ML-based ADM (automated decision-making) system to decide whether to <b>grant</b> or <b>not to grant</b> a loan to a given applicant. To do so, the bank uses historical data containing multiple application records, characterized by information about the loan applicants (e.g., age, gender, personal situation) and information about the loan (e.g., amount, duration, purpose). Each application is labeled <i><b>good credit</b></i> if the loan was reimbursed, or <i><b>bad credit</b></i> if the loan was not reimbursed or if there were several issues with the reimbursement.

To simulate this scenario, we will build a classifier to distinguish between good and bad loans (or credits). We will train the classifier using the <i><b>German credit data</b></i>. You can find information about the dataset and its attributes here: https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc<br>
You can download the dataset here:
https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data<br>


- Step 1: Set-up (Provided)
- Step 2: Explore and familiarize with the dataset
- Step 3: Protected attributes, proxies
- Step 4: Representation Bias, Disparities and Skews.


<H3>Step 1: Set-up</H3>

You first need to install the required libraries for this part.  The main libraries are the `aif360` and `sklearn` ones. We also recommend using `numpy` or `pandas` to easily manipulate and explore the data.

<div class="alert alert-block alert-danger">
<b>Note:</b> Uncomment and run the next cell if you have not previously installed the libraries.
</div>


<b>Installing required libraries</b>

In [1]:
# #uncomment if you need to install the libraries
!pip install aif360
!pip install fairlearn



Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting aif360
  Downloading aif360-0.5.0-py3-none-any.whl (214 kB)
Installing collected packages: aif360
Successfully installed aif360-0.5.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting fairlearn
  Downloading fairlearn-0.8.0-py3-none-any.whl (235 kB)
Installing collected packages: fairlearn
Successfully installed fairlearn-0.8.0


<b>Loading required libraries</b>

In [2]:
# Libraries for data processing and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from IPython.display import Markdown, display

#ML libraries
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from sklearn.pipeline import make_pipeline

np.random.seed(0)

# Fairness tool: IBM AI Fairness 360
from aif360.datasets import GermanDataset





<b>Download the German Credit Data set</b><br>
In the following, we will download the data set and its documentation from the UCI website (https://archive.ics.uci.edu) and place them in the folder where aif360 expects to find them.

**Option 1 Google Colab:**<br>
Run the following cell to download the dataset in Google Colab.

In [3]:
#Download the German Credit DataSet
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc
!cp german.data /usr/local/lib/python3.8/dist-packages/aif360/data/raw/german/german.data
!cp german.doc /usr/local/lib/python3.8/dist-packages/aif360/data/raw/german/german.doc

--2022-12-07 16:10:19--  https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 79793 (78K) [application/x-httpd-php]
Saving to: ‘german.data’


2022-12-07 16:10:20 (488 KB/s) - ‘german.data’ saved [79793/79793]

--2022-12-07 16:10:20--  https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4679 (4.6K) [application/x-httpd-php]
Saving to: ‘german.doc’


2022-12-07 16:10:20 (96.7 MB/s) - ‘german.doc’ saved [4679/4679]



**Option 2: Local environment**<br>
<div class="alert alert-block alert-danger">
<b>Note:</b> If you are working on your local environment you will have to manually add the files "german.doc" and "german.data" to the folder 
"dist-packages/aif360/data/raw/german/" under your python path.<br>
You can find the files in the lab folder on github or download them from: <br>
<a href="https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data">https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data</a> <br>
<a href="https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc">https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc</a>
</div> 
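
The exact location of this folder depends on your Python installation. As a minimal sketch (the helper name `find_aif360_german_dir` is ours and is not part of aif360), the following uses only the standard library to look up where the package is installed, without importing it. The sub-path `data/raw/german` is taken from the Colab commands above.

```python
import importlib.util
from pathlib import Path

def find_aif360_german_dir():
    """Return the expected raw-data folder for the German dataset,
    or None if aif360 is not installed in this environment."""
    spec = importlib.util.find_spec("aif360")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at aif360/__init__.py; the data folder sits next to it
    return Path(spec.origin).parent / "data" / "raw" / "german"

german_dir = find_aif360_german_dir()
if german_dir is None:
    print("aif360 is not installed in this environment.")
else:
    print("Copy german.data and german.doc into:", german_dir)
```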

<b>Loading the dataset</b>

Here, we will load the <i><b>German credit data</b></i> in a format that is compatible with the use of the <i><b>AIF360 toolkit</b></i>. For this, you need to make use of the already implemented class of the toolkit `GermanDataset()`.

Because the data available is encoded in a complex way, we provide you with the code to preprocess it, in the function `custom_preprocessing()`. We also provide you with an example on how to actually load the data using the `GermanDataset()` class, in `preproc_and_load_data_german()`. 

In [4]:
def preproc_and_load_data_german():
    """
    Load and pre-process german credit dataset.
    Args: -
    Returns:
        GermanDataset: An instance of GermanDataset with required pre-processing.
    """
    def custom_preprocessing(df):
        """ Custom pre-processing for German Credit Data
        """

        def group_credit_hist(x):
            if x in ['A30', 'A31', 'A32']:
                return 'None/Paid'
            elif x == 'A33':
                return 'Delay'
            elif x == 'A34':
                return 'Other'
            else:
                return 'NA'

        def group_employ(x):
            if x == 'A71':
                return 'Unemployed'
            elif x in ['A72', 'A73']:
                return '1-4 years'
            elif x in ['A74', 'A75']:
                return '4+ years'
            else:
                return 'NA'

        def group_savings(x):
            if x in ['A61', 'A62']:
                return '<500'
            elif x in ['A63', 'A64']:
                return '500+'
            elif x == 'A65':
                return 'Unknown/None'
            else:
                return 'NA'

        def group_status(x):
            if x in ['A11', 'A12']:
                return '<200'
            elif x in ['A13']:
                return '200+'
            elif x == 'A14':
                return 'None'
            else:
                return 'NA'
        
        def group_personal_status(x):
            if x in ['A91']:
                return 'divorced/separated'
            elif x in ['A92']:
                return 'divorced/separated/married'
            elif x in ['A93', 'A95']:
                return 'single'
            elif x in ['A94']:
                return 'married/widowed'
            else:
                return 'NA'

        def group_foreign_worker(x):
            if x in ['A201']:
                return 'yes'
            elif x in ['A202']:
                return 'no'
            else:
                return 'NA'

        #print(df)
        #print(df.shape)
        #print(df.isnull().sum().sum())
        #print(df.isin(['NA']).sum(axis=0))
        status_map = {'A91': 1.0, 'A93': 1.0, 'A94': 1.0,
                    'A92': 0.0, 'A95': 0.0}
        
        df['sex'] = df['personal_status'].replace(status_map)
        

        # group credit history, savings, and employment
        df['credit_history'] = df['credit_history'].apply(lambda x: group_credit_hist(x))
        df['savings'] = df['savings'].apply(lambda x: group_savings(x))
        df['employment'] = df['employment'].apply(lambda x: group_employ(x))
        #df['age'] = df['age'].apply(lambda x: np.float(x >= 26))
        df['status'] = df['status'].apply(lambda x: group_status(x))
        df['personal_status'] = df['personal_status'].apply(lambda x: group_personal_status(x))
        df['foreign_worker'] = df['foreign_worker'].apply(lambda x: group_foreign_worker(x))
        #print(df.isin(['NA']).sum(axis=0))
        
        # print(df)
        # uncomment if you want to save a version of the processed data
        #df.to_csv("german_credit_data_processed.csv")
        return df

    # Feature partitions
    XD_features = ['number_of_credits', 'telephone',
                     'foreign_worker', 'people_liable_for', 'skill_level', 'credit_history', 'installment_plans', 'residence_since', 'property', 'other_debtors', 'purpose', 'savings', 'employment', 'sex', 'age', 'personal_status', 'month']
    D_features = ['sex', 'age'] 
    Y_features = ['credit']
    X_features = list(set(XD_features)-set(D_features))
    categorical_features = ['installment_plans', 'telephone',
                     'foreign_worker', 'skill_level', 'credit_history', 'property', 
                            'other_debtors', 'purpose', 'savings', 'employment', 'personal_status']

    # privileged classes
    all_privileged_classes = {"sex": [1.0],
                              "age": lambda x: x > 25}

    # protected attribute maps
    all_protected_attribute_maps = {"sex": {1.0: 'Male', 0.0: 'Female'},
                                    "age": {1.0: 'Old', 0.0: 'Young'}}

    return GermanDataset(
        label_name=Y_features[0],
        favorable_classes=[1],
        protected_attribute_names=D_features,
        privileged_classes=[all_privileged_classes[x] for x in D_features],
        instance_weights_name=None,
        categorical_features=categorical_features,
        features_to_keep=X_features+Y_features+D_features,
        features_to_drop=[],
        metadata={ 'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}],
                   'protected_attribute_maps': [all_protected_attribute_maps[x]
                                for x in D_features]},
        custom_preprocessing=custom_preprocessing)

<br><br>

<H3>Step 2: Explore and familiarize with the dataset</H3>

<b>Q1: Analyse the dataset and answer the following:</b> 
- What is the number of records?
- What is the number of attributes present with the preprocessing we provided? 
- What is the list of attribute names?
- Are there missing values that could create biases?

In [5]:
# Instantiating the German credit dataset
dataset_gcredit = preproc_and_load_data_german()

<div class="alert alert-block alert-info">
<b>Tip:</b> The documentation of "AIF360 - German credit data" dataset  can be found <a href="https://aif360.readthedocs.io/en/latest/modules/generated/aif360.datasets.GermanDataset.html">[HERE]</a>. </div> 


Take a look at the AIF360 documentation and use the existing methods and attributes to explore the dataset instance. For example, you can access the features with:<br> `dataset_gcredit.features`.

You are also free to transform the dataset into a pandas dataframe to extract the needed information.
Use <br>
    `pd_gdata = pd.DataFrame(dataset_gcredit.features, columns=dataset_gcredit.feature_names)` <br>
    to create the pandas dataframe.
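
As a rough illustration of the kind of exploration Q1 asks for, here is a sketch on a small toy dataframe. The values are hypothetical and are not taken from the German credit data; the same calls should apply to a dataframe such as `pd_gdata`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for pd_gdata (hypothetical values, not the German credit data)
toy = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 48.0],
    "sex": [1.0, 0.0, 1.0, 0.0],
    "month": [6.0, 12.0, 24.0, 36.0],
})

n_records, n_features = toy.shape       # number of rows, number of columns
feature_names = list(toy.columns)       # attribute names
missing_per_column = toy.isna().sum()   # number of NaN values per attribute

print(n_records, n_features)
print(feature_names)
print(missing_per_column)
```

Note that `isna()` only counts true missing values (NaN). Values that were recoded to a placeholder string during pre-processing would need a separate check.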

- 

In [6]:
### Some possible explorations ###
# Number of records:

# Number of features:

# Feature names:

# Number of missing values for each attribute ...

<br><br>

<H3>Step 3: Pre-processing: Protected attributes, proxies.</H3>

<b>Q2: Identification of protected attributes</b>

a) Study the dataset and its documentation and identify which attributes might raise unfairness concerns and should be considered protected (according to the law). Explain why, in your opinion, these attributes are protected, and provide examples of bias or unfairness for each identified attribute.

<div class="alert alert-block alert-info">
<b>Tip:</b> 

Take a look at the following documents<br>
<a href="https://www.equalityhumanrights.com/en/equality-act/protected-characteristics">(1) 
Protected characteristics | Equality and Human Rights Commission (UK, 2021)</a><br>
<a href="https://rm.coe.int/discrimination-artificial-intelligence-and-algorithmic-decision-making/1680925d73">(2) Discrimination, Artificial Intelligence, and Algorithmic Decision-Making (2018)</a><br>
<a href="http://ec.europa.eu/social/BlobServlet?docId=1691&langId=en&usg=AOvVaw3vI30bO3jisairH2Z7-nSl">(3) Age discrimination and European Law (2005)</a>. 
</div>



-

b) Study the dataset and its documentation and identify any further "non-protected" attributes that could cause unfairness. Explain your reasoning and provide examples of bias or unfairness related to each attribute.

-

<b>Q3:  Identification of "spurious" proxies </b>

a) Find the proxies for the attribute "sex".

b) Find proxies for one additional protected attribute you identified in Q2-a.

c) In your opinion, why do we want to identify proxies for protected attributes in a dataset? How should you handle the proxies?

<div class="alert alert-block alert-info">
<b> Tip: </b>A proxy attribute <i>Ap</i>  is an attribute that has a similar distribution as another attribute <i>Ax</i>, so having access to the proxy attribute <i>Ap</i> provides a good knowledge of the other attribute <i>Ax</i>. For instance, in the US the zipcode is a powerful proxy for race and education, the zipcode combined with websites visited is an even more powerful proxy, names in certain languages are strong proxies for gender, etc.<br>

The simplest way to identify proxy attributes for a protected attribute <i>Ax</i> is to compute the correlation of <i>Ax</i> with each of the other attributes in the dataset. The higher the absolute value of the correlation, the higher the likelihood that an attribute is a proxy of <i>Ax</i>.<br>

You can use the `corr()` function of the pandas library to compute the correlation between two attributes
</div> 
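
The correlation-based search described in the tip can be sketched as follows. The helper `rank_proxies` and the toy columns are hypothetical names introduced for illustration, and the sketch assumes that every column is numeric (which should hold for the one-hot encoded AIF360 features).

```python
import pandas as pd

def rank_proxies(df, protected, top=5):
    """Rank the other columns by absolute Pearson correlation with `protected`."""
    corr = df.corr()[protected].drop(protected)
    return corr.abs().sort_values(ascending=False).head(top)

# Toy data with hypothetical values: "proxy" tracks "sex" closely, "noise" does not
toy = pd.DataFrame({
    "sex":   [1, 1, 1, 0, 0, 0, 1, 0],
    "proxy": [1, 1, 0, 0, 0, 0, 1, 0],
    "noise": [3, 1, 2, 3, 1, 2, 2, 2],
})

ranked = rank_proxies(toy, "sex")
print(ranked)
```

Correlation only captures linear, pairwise relationships, so a low value does not rule out a proxy that works in combination with other attributes.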


   

<H3>Step 4: Representation Bias, Disparities and Skews.</H3>

<br>
<b>Q4: Representation biases: Representation Disparity</b>

a) Is the dataset we are working with representative of the German population with regard to age? Add any needed code or analysis to briefly justify your answer.<br>
b) Is the dataset we are working with representative of the German population with regard to gender? Add any needed code or analysis to briefly justify your answer.

<b>(Optional)</b><br>
c) Look at the joint distribution of the attributes for sex and personal_status=divorced/separated/married. Does the dataset seem to be representative of the German population?<br>
d) Similarly, look at the distribution of foreign workers. Does the dataset seem to be representative of the German population?<br>

<div class="alert alert-block alert-info">
<b> Tip: </b> You can find demographic information from Wikipedia <a href=https://en.wikipedia.org/wiki/Demographics_of_Germany>[Here]</a>
    
Go to the section <b><i>Demographic statistics</i></b> and take a closer look at the most recent <b><i>Age structure</i></b> data (it should be from 2018). Use this data to build a distribution of the German population across age, then across gender, and compare it to the distributions from <b><i>the German credit data</i></b> we are working with.

It is up to you how you justify your answer; however, using visualizations (e.g., plots and diagrams) will earn more points.
</div>
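
One possible way to set up the comparison is sketched below. The helper `representation_gap`, the age bins, the toy ages, and above all the population shares are hypothetical placeholders: replace them with the dataset's ages and with the figures you read from the reference table.

```python
import numpy as np
import pandas as pd

def representation_gap(ages, bin_edges, labels, population_share):
    """Compare the share of each age group in a sample with a reference share."""
    counts, _ = np.histogram(ages, bins=bin_edges)
    sample_share = counts / counts.sum()
    table = pd.DataFrame(
        {"sample": sample_share,
         "population": [population_share[label] for label in labels]},
        index=labels,
    )
    table["difference"] = table["sample"] - table["population"]
    return table

# Hypothetical inputs, for illustration only
toy_ages = [19, 23, 24, 30, 31, 42, 45, 58, 63, 70]
bin_edges = [18, 25, 55, 120]
labels = ["18-24", "25-54", "55+"]
placeholder_population = {"18-24": 0.1, "25-54": 0.5, "55+": 0.4}  # NOT real statistics

gap = representation_gap(toy_ages, bin_edges, labels, placeholder_population)
print(gap)
```

Calling `gap[["sample", "population"]].plot(kind="bar")` would then give a side-by-side bar chart of the two distributions.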
    

In [7]:
# write your code here 
# 


Don't forget to analyse your findings

<b>Q5: Representation Bias: Outcome Skews </b> 

Is there a skew towards certain groups:<br>
a) Analyse the dataset and report the number of males/females with bad/good credit. Do the same for "old"/"young" people in the dataset. For a fair comparison, normalize these numbers over the total number of males/females and "old"/"young" people respectively. To do so, you can rescale each group to 50 individuals.

b) Briefly describe your findings and explain the impact (on fairness), if any, of using this dataset as training data.

<div class="alert alert-block alert-info">
<b> Tip: </b> We provide a function that computes the normalised count per attribute and label. You are free to use it or to implement your own method.
    
`getNormalizedCount(pd_train_data, protected_attribute, label)`
</div>
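
If you prefer to implement your own method, one possible sketch uses `pd.crosstab` with `normalize="index"`, which gives the share of each label within each group; multiplying by 50 puts the result on the same scale of 50 individuals per group. The toy values below are hypothetical, and the encoding (sex: 1.0 = male, 0.0 = female; credit: 1.0 = good, 2.0 = bad) follows the pre-processing used in this lab.

```python
import pandas as pd

# Toy data with hypothetical values
toy = pd.DataFrame({
    "sex":    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    "credit": [1.0, 1.0, 1.0, 2.0, 1.0, 2.0],
})

# Share of each credit label within each sex group (each row sums to 1)
shares = pd.crosstab(toy["sex"], toy["credit"], normalize="index")

# Rescale so that each group corresponds to 50 individuals
per_50 = shares * 50
print(per_50)
```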
    

In [2]:
# Normalised count per attribute and label
def getNormalizedCount(pd_train_data, protected_attribute, label):
    """Count records per (attribute value, label) pair, rescaled so that
    every attribute group corresponds to 50 individuals."""
    # Raw counts per (attribute value, label) pair
    unnormalized_count = pd_train_data[[protected_attribute, label]].value_counts()
    # Group size per attribute value
    counts = pd_train_data[protected_attribute].value_counts().to_dict()
    # Work on a float copy so that the rescaled values fit the dtype
    normalized_count = unnormalized_count.astype(float)
    for attribute_value, credit_value in unnormalized_count.keys():
        normalized_count[attribute_value, credit_value] *= 50 / counts[attribute_value]
    return normalized_count

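# Add the credit labels to the dataframe (pd_gdata is the dataframe created in Step 2).
# labels has shape (n, 1), so flatten it to a 1-D column.
pd_gdata["credit"] = dataset_gcredit.labels.ravel()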

In [3]:
### YOUR ANSWER HERE ###
# ADD code here to print the AGE-CREDIT distribution


# ADD code here to print the SEX-CREDIT distribution


# ADD code here to visualise the results for both you can use stacked bar plots from pandas toolkit
#<your dataframe>.size().unstack().plot(kind='bar', stacked=True)


Don't forget to analyse your findings

<br><br>

# Part II. Observational Fairness Metrics.



Step 1: Re-process the data. <br>
Step 2: Building a Classifier. <br>
Step 3: Classification threshold and fairness constraint

<H3>Step 1: Re-process the data.</H3>

**Option 1: Re-process the data for age fairness**<br>
Uncomment the following block to re-process the data and set age as a protected attribute.

In [None]:
# def preproc_and_load_data_german():
#     """
#     Load and pre-process german credit dataset.
#     Args: -
#     Returns:
#         GermanDataset: An instance of GermanDataset with required pre-processing.
#     """
#     def custom_preprocessing(df):
#         """ Custom pre-processing for German Credit Data
#         """

#         def group_credit_hist(x):
#             if x in ['A30', 'A31', 'A32']:
#                 return 'None/Paid'
#             elif x == 'A33':
#                 return 'Delay'
#             elif x == 'A34':
#                 return 'Other'
#             else:
#                 return 'NA'

#         def group_employ(x):
#             if x == 'A71':
#                 return 'Unemployed'
#             elif x in ['A72', 'A73']:
#                 return '1-4 years'
#             elif x in ['A74', 'A75']:
#                 return '4+ years'
#             else:
#                 return 'NA'

#         def group_savings(x):
#             if x in ['A61', 'A62']:
#                 return '<500'
#             elif x in ['A63', 'A64']:
#                 return '500+'
#             elif x == 'A65':
#                 return 'Unknown/None'
#             else:
#                 return 'NA'

#         def group_status(x):
#             if x in ['A11', 'A12']:
#                 return '<200'
#             elif x in ['A13']:
#                 return '200+'
#             elif x == 'A14':
#                 return 'None'
#             else:
#                 return 'NA'
        
#         def group_personal_status(x):
#             if x in ['A91']:
#                 return 'divorced/separated'
#             elif x in ['A92']:
#                 return 'divorced/separated/married'
#             elif x in ['A93', 'A95']:
#                 return 'single'
#             elif x in ['A94']:
#                 return 'married/widowed'
#             else:
#                 return 'NA'

#         def group_foreign_worker(x):
#             if x in ['A201']:
#                 return 'yes'
#             elif x in ['A202']:
#                 return 'no'
#             else:
#                 return 'NA'

#         #print(df)
#         #print(df.shape)
#         #print(df.isnull().sum().sum())
#         #print(df.isin(['NA']).sum(axis=0))
#         status_map = {'A91': 1.0, 'A93': 1.0, 'A94': 1.0,
#                     'A92': 0.0, 'A95': 0.0}
        
#         df['sex'] = df['personal_status'].replace(status_map)
        

#         # group credit history, savings, and employment
#         df['credit_history'] = df['credit_history'].apply(lambda x: group_credit_hist(x))
#         df['savings'] = df['savings'].apply(lambda x: group_savings(x))
#         df['employment'] = df['employment'].apply(lambda x: group_employ(x))
#         #df['age'] = df['age'].apply(lambda x: np.float(x >= 26))
#         df['status'] = df['status'].apply(lambda x: group_status(x))
#         df['personal_status'] = df['personal_status'].apply(lambda x: group_personal_status(x))
#         df['foreign_worker'] = df['foreign_worker'].apply(lambda x: group_foreign_worker(x))
#         #print(df.isin(['NA']).sum(axis=0))
        
#         # print(df)
#         # uncomment if you want to save a version of the processed data
#         #df.to_csv("german_credit_data_processed.csv")
#         return df

#     # Feature partitions
#     XD_features = ['number_of_credits', 'telephone',
#                      'foreign_worker', 'people_liable_for', 'skill_level', 'credit_history',\
#                    'installment_plans', 'residence_since', 'property', 'other_debtors', \
#                    'purpose', 'savings', 'employment', 'sex', 'age', 'month']
#     D_features = ['age'] 
#     Y_features = ['credit']
#     X_features = list(set(XD_features)-set(D_features))
#     categorical_features = ['installment_plans', 'telephone',
#                      'foreign_worker', 'skill_level', 'credit_history', 'property',\
#                             'other_debtors', 'purpose', 'savings', 'employment']

#     # privileged classes
#     all_privileged_classes = {"age": lambda x: x > 25}

#     # protected attribute maps
#     all_protected_attribute_maps = {"age": {1.0: 'Old', 0.0: 'Young'}}

#     return GermanDataset(
#         label_name=Y_features[0],
#         favorable_classes=[1],
#         protected_attribute_names=D_features,
#         privileged_classes=[all_privileged_classes[x] for x in D_features],
#         instance_weights_name=None,
#         categorical_features=categorical_features,
#         features_to_keep=X_features+Y_features+D_features,
#         features_to_drop=["sex", "personal_status=divorced/separated/married"],
#         metadata={ 'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}],
#                    'protected_attribute_maps': [all_protected_attribute_maps[x]
#                                 for x in D_features]},
#         custom_preprocessing=custom_preprocessing)



In [None]:
# dataset_orig = preproc_and_load_data_german()

<b>Q6: Option 1</b>

a) Set the privileged and unprivileged age groups based on the findings of question Q5-a (answer in the text, then add the values 0 or 1 to the code below). Provide a very brief justification for your answer.


In [None]:
# Add the code for question a) here
# code = 1: old (above 25)
# code = 0: young (25 or under)

#privileged_code = #add the right code here
#unprivileged_code = #add the right code here

In [None]:
# # We start by defining the privileged and unprivileged groups
# privileged_groups = [{'age': privileged_code}] 
# unprivileged_groups = [{'age': unprivileged_code}] 

**Option 2: Re-process the data for gender fairness**<br>
Run the following block to re-process the data and set sex as a protected attribute.

In [None]:
def preproc_and_load_data_german():
    """
    Load and pre-process german credit dataset.
    Args: -
    Returns:
        GermanDataset: An instance of GermanDataset with required pre-processing.
    """
    def custom_preprocessing(df):
        """ Custom pre-processing for German Credit Data
        """

        def group_credit_hist(x):
            if x in ['A30', 'A31', 'A32']:
                return 'None/Paid'
            elif x == 'A33':
                return 'Delay'
            elif x == 'A34':
                return 'Other'
            else:
                return 'NA'

        def group_employ(x):
            if x == 'A71':
                return 'Unemployed'
            elif x in ['A72', 'A73']:
                return '1-4 years'
            elif x in ['A74', 'A75']:
                return '4+ years'
            else:
                return 'NA'

        def group_savings(x):
            if x in ['A61', 'A62']:
                return '<500'
            elif x in ['A63', 'A64']:
                return '500+'
            elif x == 'A65':
                return 'Unknown/None'
            else:
                return 'NA'

        def group_status(x):
            if x in ['A11', 'A12']:
                return '<200'
            elif x in ['A13']:
                return '200+'
            elif x == 'A14':
                return 'None'
            else:
                return 'NA'
        
        def group_personal_status(x):
            if x in ['A91']:
                return 'divorced/separated'
            elif x in ['A92']:
                return 'divorced/separated/married'
            elif x in ['A93', 'A95']:
                return 'single'
            elif x in ['A94']:
                return 'married/widowed'
            else:
                return 'NA'

        status_map = {'A91': 1.0, 'A93': 1.0, 'A94': 1.0,
                    'A92': 0.0, 'A95': 0.0}
        
        df['sex'] = df['personal_status'].replace(status_map)
        

        # group credit history, savings, and employment
        df['credit_history'] = df['credit_history'].apply(lambda x: group_credit_hist(x))
        df['savings'] = df['savings'].apply(lambda x: group_savings(x))
        df['employment'] = df['employment'].apply(lambda x: group_employ(x))
        #df['age'] = df['age'].apply(lambda x: np.float(x >= 26))
        df['status'] = df['status'].apply(lambda x: group_status(x))
        df['personal_status'] = df['personal_status'].apply(lambda x: group_personal_status(x))
        
        return df

    # Feature partitions
    XD_features = ['number_of_credits', 'telephone',
                     'foreign_worker', 'people_liable_for', 'skill_level', 'credit_history',\
                   'installment_plans', 'residence_since', 'property', 'other_debtors', \
                   'purpose', 'savings', 'employment', 'sex', 'age', 'month']
    D_features = ['sex'] 
    Y_features = ['credit']
    X_features = list(set(XD_features)-set(D_features))
    categorical_features = ['installment_plans', 'telephone',
                     'foreign_worker', 'skill_level', 'credit_history', 'property',\
                            'other_debtors', 'purpose', 'savings', 'employment']

    # privileged classes
    all_privileged_classes = {"sex": [1.0]}

    # protected attribute maps
    all_protected_attribute_maps = {"sex": {1.0: 'Male', 0.0: 'Female'}}

    return GermanDataset(
        label_name=Y_features[0],
        favorable_classes=[1],
        protected_attribute_names=D_features,
        privileged_classes=[all_privileged_classes[x] for x in D_features],
        instance_weights_name=None,
        categorical_features=categorical_features,
        features_to_keep=X_features+Y_features+D_features,
        features_to_drop=[],
        metadata={ 'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}],
                   'protected_attribute_maps': [all_protected_attribute_maps[x]
                                for x in D_features]},
        custom_preprocessing=custom_preprocessing)



In [None]:
dataset_orig = preproc_and_load_data_german()

<b>Q6: Option 2</b>

a) Set the privileged and unprivileged sex groups based on the findings of question Q5-a (answer in the text, then add the values 0 or 1 to the code below). Provide a very brief justification for your answer.


-

In [None]:
# Add the code for question a) here
# code = 1: male
# code = 0: female

#privileged_code = #add the right code here
#unprivileged_code = #add the right code here

In [None]:
# We start by defining the privileged and unprivileged groups
privileged_groups = [{'sex': privileged_code}]
unprivileged_groups = [{'sex': unprivileged_code}]

#### Preparation for training a classifier.
As we will train a classifier, we need to divide the data into training, validation, and test sets.
They use 60%, 20%, and 20% of the whole data, respectively.
We will use the following code to do so.

In [None]:
dataset_orig_train, dataset_orig_val, dataset_orig_test = \
    dataset_orig.split([0.6, 0.8], shuffle=True, seed=1)
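
The list `[0.6, 0.8]` gives cumulative cut points rather than set sizes: the first 60% of the shuffled records go to training, the records between 60% and 80% to validation, and the remainder to testing. The sketch below illustrates this convention with plain NumPy on a hypothetical dataset of 50 records; it is not AIF360's internal implementation.

```python
import numpy as np

n = 50                        # hypothetical number of records
cut_points = [0.6, 0.8]       # cumulative fractions, as passed to split()

rng = np.random.default_rng(1)
indices = rng.permutation(n)  # shuffled record indices

# Convert the fractions to positions in the shuffled index array
boundaries = [int(round(p * n)) for p in cut_points]
train_idx, val_idx, test_idx = np.split(indices, boundaries)

print(len(train_idx), len(val_idx), len(test_idx))  # 30 10 10
```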

#### Training a classifier.
As we want to automate the decision process, we need to learn a classifier. We make the choice of using a logistic regression classifier, that we train with the following code.

In [None]:
model = make_pipeline(StandardScaler(),
                      LogisticRegression(solver='liblinear', random_state=1))
fit_params = {'logisticregression__sample_weight': dataset_orig_train.instance_weights}

lr_orig = model.fit(dataset_orig_train.features, dataset_orig_train.labels.ravel(), **fit_params)
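
The key `logisticregression__sample_weight` follows scikit-learn's `<step name>__<argument>` convention for passing fit arguments to one step of a pipeline, where `make_pipeline` names each step after its lowercased class name. A minimal sketch on synthetic data (the features, labels, and uniform weights below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                    # synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic binary labels
weights = np.ones(len(y))                       # uniform weights (assumption)

toy_model = make_pipeline(StandardScaler(),
                          LogisticRegression(solver='liblinear', random_state=1))
print(list(toy_model.named_steps))  # ['standardscaler', 'logisticregression']

# The weights are routed to LogisticRegression.fit only; the scaler ignores them
toy_model.fit(X, y, logisticregression__sample_weight=weights)
proba = toy_model.predict_proba(X)  # one column per class, each row sums to 1
```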

#### Q7: Decision threshold and accuracy
#### *Determine the decision threshold to use for this logistic regression classifier, explain your method, and report the threshold. Then report the test accuracy and the balanced accuracy (see the hint below) on the test set for this threshold.* 
*Hint: Because this dataset might be class-imbalanced, you should use the average of the true positive rate and the true negative rate (the balanced accuracy) instead of the accuracy.*

*Hint: We provide you with the "test()" function in order to compute various metrics on a dataset, for different thresholds. You can instantiate these thresholds with "thresh_arr = np.linspace(0.01, 0.99, 100)".*
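
One common approach, sketched here under stated assumptions, is to pick the threshold that maximizes the balanced accuracy on the validation set and only then report metrics on the test set. The helpers `balanced_accuracy` and `best_threshold` and the toy scores are hypothetical and independent of AIF360.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Average of the true positive rate and the true negative rate."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tpr = (y_pred & y_true).sum() / y_true.sum()
    tnr = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return (tpr + tnr) / 2

def best_threshold(y_true, scores, thresh_arr):
    """Return the first threshold with the highest balanced accuracy, and that value."""
    bal_acc = [balanced_accuracy(y_true, scores > t) for t in thresh_arr]
    best = int(np.argmax(bal_acc))
    return thresh_arr[best], bal_acc[best]

# Toy validation data (hypothetical): True = good credit, scores = P(good credit)
y_val = np.array([1, 1, 1, 1, 1, 1, 0, 0], dtype=bool)
scores_val = np.array([0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.62, 0.3])
thresh_arr = np.linspace(0.01, 0.99, 100)

thresh, score = best_threshold(y_val, scores_val, thresh_arr)
print(f"best threshold = {thresh:.2f}, balanced accuracy = {score:.2f}")
```

Selecting the threshold on the validation set keeps the test set untouched, so the reported test metrics are not biased by the selection step.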

In [None]:
from collections import defaultdict

# ClassificationMetric is used in test() below
from aif360.metrics import ClassificationMetric

def test(dataset, model, thresh_arr):
    dataset_pred = dataset.copy(deepcopy=True)
    pos_ind = np.where(model.classes_ == dataset.favorable_label)[0][0]
    dataset_pred.scores = model.predict_proba(dataset_pred.features)[:,pos_ind].reshape(-1,1)
    
    metric_arrs = defaultdict(list)
    for thresh in thresh_arr:
        fav_inds = dataset_pred.scores > thresh
        dataset_pred.labels[fav_inds] = dataset_pred.favorable_label
        dataset_pred.labels[~fav_inds] = dataset_pred.unfavorable_label

        # Computation of various metrics:
        metric = ClassificationMetric(
                dataset, dataset_pred,
                unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
        metric_arrs['acc'].append(metric.accuracy())
        metric_arrs['bal_acc'].append((metric.true_positive_rate()
                                     + metric.true_negative_rate()) / 2)
        metric_arrs['avg_odds_diff'].append(metric.average_odds_difference())
        metric_arrs['disp_imp'].append(metric.disparate_impact())
        metric_arrs['stat_par_diff'].append(metric.statistical_parity_difference())
        metric_arrs['eq_opp_diff'].append(metric.equal_opportunity_difference())
        metric_arrs['theil_ind'].append(metric.theil_index())
        metric_arrs['precision_prot'].append(metric.precision(False))
        metric_arrs['precision_unprot'].append(metric.precision(True))
        metric_arrs['recall_prot'].append(metric.recall(False))
        metric_arrs['recall_unprot'].append(metric.recall(True))
        metric_arrs['num_TP'].append(metric.num_true_positives())
        metric_arrs['num_FP'].append(metric.num_false_positives())
        metric_arrs['num_TN'].append(metric.num_true_negatives())
        metric_arrs['num_FN'].append(metric.num_false_negatives())
        metric_arrs['positive_predictive_value_prot'].append(metric.positive_predictive_value(False))
        metric_arrs['positive_predictive_value_unprot'].append(metric.positive_predictive_value(True))
        metric_arrs['negative_predictive_value_prot'].append(metric.negative_predictive_value(False))
        metric_arrs['negative_predictive_value_unprot'].append(metric.negative_predictive_value(True))
        metric_arrs['false_negative_rate_difference'].append(metric.false_negative_rate_difference())
        metric_arrs['false_positive_rate_difference'].append(metric.false_positive_rate_difference())
        metric_arrs['disparate_impact'].append(metric.disparate_impact())
        metric_arrs['statistical_parity_difference'].append(metric.statistical_parity_difference())
        
        # privileged=False selects the unprivileged (protected) group, as in the metrics above.
        metric_arrs['proba_positive_prot'].append(metric.num_pred_positives(False)/metric.num_instances(False))
        metric_arrs['proba_positive_unprot'].append(metric.num_pred_positives(True)/metric.num_instances(True))

        metric_arrs['manual_statistical_parity_difference'].append(metric.num_pred_positives(False)/metric.num_instances(False)-metric.num_pred_positives(True)/metric.num_instances(True))
        metric_arrs['manual_disparate_impact'].append((metric.num_pred_positives(False)/metric.num_instances(False)) / (metric.num_pred_positives(True)/metric.num_instances(True)))
    return metric_arrs

thresh_arr = np.linspace(0.01, 0.99, 100)

In [None]:
# Code for Q7:


Take a minute to observe the results.



#### Q8: *Confusion matrix*
The confusion matrix for this problem looks like the following (rows: actual outcome, columns: predicted label):

                      | good credit (low risk) | bad credit (high risk)
        repaid        | TP                     | FN
        did not repay | FP                     | TN
                      
#### *a) Give the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for the test set at a threshold of 0.68. Hint: you can use the AIF360 `ClassificationMetric` class to get these numbers, or use certain outputs of the `test()` function.*
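
To make the four cells of the confusion matrix concrete, the following sketch counts them by hand with NumPy. The labels and scores are hypothetical toy values (not the German credit data), so the resulting numbers are not the answer to a).

```python
import numpy as np

# Hypothetical toy data (assumption): 1 = repaid, 0 = did not repay.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
scores = np.array([0.9, 0.75, 0.6, 0.7, 0.3, 0.8, 0.5, 0.4])

thresh = 0.68
y_pred = (scores > thresh).astype(int)   # 1 = predicted good credit

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # repaid, predicted good credit
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # repaid, predicted bad credit
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # did not repay, predicted good credit
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # did not repay, predicted bad credit

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```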


**Optional**
Different stakeholders of the system care about different scores, all of which can be expressed with the numbers in the confusion matrix. We study these scores in the following questions.

#### *b) The bank granting the loans might want to know, among the applicants classified as good credit by the system (i.e. those who will be granted a loan), what proportion will actually repay. How would you express this ratio in terms of TP, FP, FN, TN? Compute it.*

#### *c) The bank might also want to know out of those labeled bad credit, how many would actually repay. How would you express this ratio in terms of TP, FP, FN, TN? Compute it.*

#### *d) A bank client would like to know the probability of being incorrectly classified as high risk when they are not. How would you express this ratio in terms of TP, FP, FN, TN? Compute it.*

#### *e) A classifier is often evaluated in terms of precision and recall. Compute the two scores. Reminder: precision is TP / (TP + FP) and recall is TP / (TP + FN). To what extent are these scores satisfactory?*
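
The precision and recall formulas in the reminder can be sketched as a small helper. The function name `precision_recall` and the counts passed to it are hypothetical; replace them with the counts you obtained for the test set.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    # Return NaN when a denominator is zero (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return precision, recall

# Hypothetical counts (assumption), for illustration only.
precision, recall = precision_recall(tp=3, fp=1, fn=2)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```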

In [None]:
# Code for Q8 

#### Q9: Fairness metrics

In Q8, we evaluated the classifier as is usually done in machine learning. It appears more or less satisfactory depending on the metric. But what happens if we now evaluate the classifier separately for the protected and non-protected groups? Reminder: in all the following questions, the protected group is female and the non-protected group is male.

Most group fairness metrics consist of (1) computing one or several of these scores (and their average) separately for the protected and non-protected groups, and then (2) computing the difference or ratio of these scores. A summary of the main existing fairness metrics can be found in this paper: https://fairware.cs.umass.edu/papers/Verma.pdf.

We now take a look at some of these metrics. Hint: for these questions, you can either again make use of the outputs of the "test()" function, or draw the confusion matrices for the two groups separately and compute them manually.

#### *a) We are interested in the positive predictive value (TP / (TP + FP)). Compute the difference between its values for the protected and non-protected groups. Provide an interpretation of this metric.*

#### *b) We are interested in the negative predictive value (TN / (TN + FN)). Compute the difference between its values for the protected and non-protected groups. Provide an interpretation of this metric.*

#### *c) We are interested in one of the error rate balance measures, the one relying on false negatives (FN / (TP + FN)). Compute the difference between its values for the protected and non-protected groups. Provide an interpretation of this metric.*

#### *d) We are interested in one of the error rate balance measures, the one relying on false positives (FP / (TN + FP)). Compute the difference between its values for the protected and non-protected groups. Provide an interpretation of this metric, especially in comparison with the metric in c).*
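
The two-step recipe described above (compute a score per group, then take the difference) can be sketched as follows for the positive predictive value and the two error rates. The data and the names `group_rates` and `protected` are hypothetical. The difference is taken as protected minus non-protected, which is an assumption chosen to match the manual statistical parity difference in `test()`.

```python
import numpy as np

def group_rates(y_true, y_pred):
    """Return PPV, FNR and FPR computed from binary labels and predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return {
        "ppv": tp / (tp + fp),   # positive predictive value
        "fnr": fn / (tp + fn),   # false negative rate
        "fpr": fp / (tn + fp),   # false positive rate
    }

# Hypothetical toy data (assumption): first 4 rows protected, last 6 non-protected.
y_true = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 0])
protected = np.array([True] * 4 + [False] * 6)

rates_prot = group_rates(y_true[protected], y_pred[protected])
rates_unprot = group_rates(y_true[~protected], y_pred[~protected])

# Step (2): difference of each score, protected minus non-protected.
diffs = {name: rates_prot[name] - rates_unprot[name] for name in rates_prot}
print(diffs)
```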

Disparate impact and statistical parity difference are two other closely related fairness metrics, relying on the probabilities of getting a positive outcome (i.e. getting a good credit label in our scenario) for the protected and non-protected groups. Disparate impact is the ratio of these two probabilities, while statistical parity difference is their difference.
#### *e) Report the values of these two metrics, explain how they relate to the confusion matrix (how they are computed), and explain how you would interpret them, in particular in which cases someone might choose to focus on these metrics.*

#### *f) Looking at this new information compared to Q8, would you use the classifier in practice? Why?*
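
As a minimal sketch of the definitions of disparate impact and statistical parity difference given above, the following NumPy snippet computes both from predicted labels and group membership. The predictions and the `protected` mask are hypothetical toy values. Both metrics are oriented as protected relative to non-protected, matching the manual versions in `test()`.

```python
import numpy as np

# Hypothetical toy predictions (assumption): 1 = predicted good credit.
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 0])
protected = np.array([True] * 4 + [False] * 6)

# Probability of a positive prediction within each group.
p_prot = np.mean(y_pred[protected] == 1)
p_unprot = np.mean(y_pred[~protected] == 1)

stat_par_diff = p_prot - p_unprot   # difference of the two probabilities
disp_imp = p_prot / p_unprot        # ratio of the two probabilities

print(f"statistical parity difference = {stat_par_diff:.3f}")
print(f"disparate impact = {disp_imp:.3f}")
```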

In [None]:
# Q9  code


Answers Q9 Here

<H3>Step 3: Classification threshold and fairness constraint</H3>
The choice of decision threshold does not only impact measures of accuracy, but can also impact the fairness of the classifier. That is what we study here.


#### Q10: Disparate impact
Ideally, the disparate impact is equal to 1, but it can take values above and below 1. The value is below 1 when the classifier advantages the non-protected group, and above 1 when it advantages the protected one. Because of that, we cannot directly compare it to a measure of accuracy, since their interpretations (and ranges) differ. 
Hence a more easily interpretable measure of unfairness is the distance of the score to 1. 
However, because the disparate impact is a ratio of probabilities, values above 1 are not directly comparable to values below 1: the ratio stays between 0 and 1 when the non-protected group is advantaged, but is unbounded when the protected group is advantaged. Values above 1 also "overcorrect" the ratio (the two groups are still not treated similarly). For that reason, we do not study the disparate impact directly but 1 − min(disparate impact, 1/disparate impact), which treats both directions symmetrically.
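
The transformation 1 − min(disparate impact, 1/disparate impact) can be sketched as a small NumPy helper. The name `disparate_impact_error` and the sample values are hypothetical, chosen only to show that a ratio and its inverse map to the same unfairness score.

```python
import numpy as np

def disparate_impact_error(disp_imp):
    """Return 1 - min(DI, 1/DI): 0 means parity, values near 1 mean strong disparity."""
    disp_imp = np.asarray(disp_imp, dtype=float)
    # A disparate impact of 0 gives 1/0 = inf; the minimum is then 0 and the error 1.
    with np.errstate(divide="ignore"):
        inverse = 1.0 / disp_imp
    return 1.0 - np.minimum(disp_imp, inverse)

# Hypothetical disparate impact values (assumption), for illustration only.
values = np.array([0.5, 0.8, 1.0, 1.25, 2.0])
errors = disparate_impact_error(values)
print(errors)
```

With this transformation, a disparate impact of 0.8 and one of 1.25 receive the same score, so the curve can be read like an error rate alongside the balanced accuracy.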

<i>Disparate impact simplified formula</i><br>

𝑃𝑟(𝑌̂ =pos_label|𝐷=unprivileged)/𝑃𝑟(𝑌̂ =pos_label|𝐷=privileged)<br>

#### *a) We plot, for the validation dataset, 1 − min(disparate impact, 1/disparate impact) and the balanced accuracy for thresholds between 0.01 and 0.99. Where would you set the threshold based on this new metric? (You don't need to report a very specific threshold; an approximate number is enough.) Explain your reasoning.*

#### *b) Recall the threshold you had set earlier. Do these thresholds match? Reflect on that.*

#### *c) Append to this plot the two curves for the two scores that compose the ratio in the calculation of the disparate impact. <br> Hint: you can find these scores in the outputs of the test() function.*

#### *Assuming we want a balanced accuracy above 0.65, where would you set the thresholds when observing these metrics (you don't need to report very specific thresholds; approximate numbers are enough)? Explain your reasoning. <br>Hint: you might want to set different thresholds depending on whether you maximize each of the two metrics or focus on disparate impact.*

#### *d) Do the thresholds match for the different cases in c)? Reflect on that.*
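
One possible way to turn the constraint in c) into code is sketched below: among the thresholds whose balanced accuracy exceeds 0.65, pick the one with the smallest 1 − min(disparate impact, 1/disparate impact). All arrays are hypothetical toy values (prefixed with `toy_`), and this selection rule is only one reasonable reading of the question, not the expected answer.

```python
import numpy as np

# Hypothetical per-threshold metrics (assumption), e.g. as read from test() outputs.
toy_thresholds = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
toy_bal_acc = np.array([0.60, 0.66, 0.70, 0.68, 0.62])
toy_disp_imp_err = np.array([0.05, 0.20, 0.15, 0.08, 0.02])

min_bal_acc = 0.65
feasible = toy_bal_acc > min_bal_acc
if not feasible.any():
    raise ValueError("no threshold satisfies the balanced accuracy constraint")

# Threshold that maximizes balanced accuracy, ignoring fairness.
best_acc_idx = int(np.argmax(toy_bal_acc))

# Threshold that minimizes the disparate impact error among feasible thresholds.
masked_err = np.where(feasible, toy_disp_imp_err, np.inf)
best_fair_idx = int(np.argmin(masked_err))

print("max balanced accuracy at", toy_thresholds[best_acc_idx])
print("min disparate impact error (feasible) at", toy_thresholds[best_fair_idx])
```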

Add as many cells as needed

In [None]:
# Code 

Answers

---------------

<H3>To Go further: Existing tools and Metrics</H3>



**Fairlearn Framework**<br>
Provides a rich observational fairness library of metrics and mitigation approaches.<br>
Provides many tools to measure and mitigate bias and to explain ML outputs.<br>
For more information, check the website and tutorials of <a href=https://fairlearn.org/><b>Fairlearn</b></a>.
<br><br>

**IBM AI Fairness 360** <br>
Provides many tools to measure and mitigate bias and to explain ML outputs.<br>
For more information, see the documentation of <a href=https://aif360.readthedocs.io/en/latest/index.html><b>AIF360</b></a>.

<br><br>