The subgroups that MDSS identifies:
- Privileged groups: `Non-caucasian`, `less than 25`, `Male`
- Unprivileged groups: `Non-caucasian`, `Female`

In this exercise we compare the bias between the following pairs of subgroups:

1. Privileged by MDSS vs. the opposite in one attribute:
   - Non-caucasian, less than 25, Male vs. Non-caucasian, less than 25, Female
   - Non-caucasian, less than 25, Male vs. Caucasian, less than 25, Male
2. Unprivileged by MDSS vs. the opposite in one attribute:
   - Non-caucasian, Female vs. Caucasian, Female
   - Non-caucasian, Female vs. Non-caucasian, Male
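The comparison pairs listed above can also be generated rather than typed by hand. The following is a minimal sketch, assuming the 0/1 encoding used in the cells below (`race` 0 = Non-caucasian, `sex` 0 = Male, `age_cat` 0 = less than 25). The helper name `one_attribute_opposites` is hypothetical and only flips the binary attributes `sex` and `race`, since `age_cat` has three levels.

```python
def one_attribute_opposites(group, binary_attrs=("sex", "race")):
    """Return copies of `group` with exactly one binary attribute flipped (0 <-> 1)."""
    opposites = []
    for attr in binary_attrs:
        if attr in group:
            flipped = dict(group)           # copy so the original group is untouched
            flipped[attr] = 1 - group[attr]
            opposites.append(flipped)
    return opposites


# Subgroups reported by MDSS, written with the notebook's encoding (assumed).
privileged_by_mdss = {"race": 0, "age_cat": 0, "sex": 0}  # Non-caucasian, less than 25, Male
unprivileged_by_mdss = {"race": 0, "sex": 1}              # Non-caucasian, Female

for base in (privileged_by_mdss, unprivileged_by_mdss):
    for other in one_attribute_opposites(base):
        print(base, "vs.", other)
```

Each resulting dictionary can be wrapped in a list to match the `unprivileged_groups` / `privileged_groups` arguments used later.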


In [130]:
import itertools

import numpy as np
import pandas as pd

from aif360.metrics import BinaryLabelDatasetMetric, MDSSClassificationMetric
from aif360.detectors import bias_scan

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_compas
from aif360.datasets import StandardDataset


In [131]:
# Encoding: race 0 = Non-caucasian, 1 = Caucasian; sex 0 = Male, 1 = Female; age_cat 0 = less than 25
non_cocausian_less25_male = [{'race': 0, 'age_cat': 0, 'sex': 0}]
non_cocausian_less25_female = [{'race': 0, 'age_cat': 0, 'sex': 1}]
cocausian_less25_male = [{'race': 1, 'age_cat': 0, 'sex': 0}]

non_cocausian_female = [{'race': 0, 'sex': 1}]
cocausian_female = [{'race': 1, 'sex': 1}]
non_cocausian_male = [{'race': 0, 'sex': 0}]

## Non-caucasian, less than 25, Male And Non-caucasian, less than 25, Female


In [132]:
dataset_orig = load_preproc_data_compas()
dataset_orig_df = pd.DataFrame(dataset_orig.features, columns=dataset_orig.feature_names)

age_cat = np.argmax(dataset_orig_df[['age_cat=Less than 25', 'age_cat=25 to 45',
                                     'age_cat=Greater than 45']].values, axis=1).reshape(-1, 1)
priors_count = np.argmax(dataset_orig_df[['priors_count=0', 'priors_count=1 to 3',
                                          'priors_count=More than 3']].values, axis=1).reshape(-1, 1)
c_charge_degree = np.argmax(dataset_orig_df[['c_charge_degree=M', 'c_charge_degree=F']].values, axis=1).reshape(-1, 1)

features = np.concatenate((dataset_orig_df[['sex', 'race']].values, age_cat, priors_count,
                           c_charge_degree, dataset_orig.labels), axis=1)
feature_names = ['sex', 'race', 'age_cat', 'priors_count', 'c_charge_degree']

df = pd.DataFrame(features, columns=feature_names + ['two_year_recid'])

dataset = StandardDataset(df, label_name='two_year_recid', favorable_classes=[0],
                 protected_attribute_names=['sex', 'race', 'age_cat'],
                 privileged_classes=[[1], [1], [1]],
                 instance_weights_name=None)

In [133]:
dataset_orig_train, dataset_orig_test = dataset.split([0.7], shuffle=True, seed=0)

metric_train = BinaryLabelDatasetMetric(dataset_orig_train,
                             unprivileged_groups=non_cocausian_less25_female,
                             privileged_groups=non_cocausian_less25_male)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_train.mean_difference())
metric_test = BinaryLabelDatasetMetric(dataset_orig_test,
                             unprivileged_groups=non_cocausian_less25_female,
                             privileged_groups=non_cocausian_less25_male)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_test.mean_difference())


Train set: Difference in mean outcomes between unprivileged and privileged groups = 0.148845
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.328246
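For reference, `mean_difference()` is the rate of favorable labels in the group passed as unprivileged minus the same rate in the group passed as privileged, so a positive value means the "unprivileged" group has the higher favorable rate. The sketch below recomputes that quantity on plain dictionaries. The helper name `mean_difference_sketch` and the toy records are made up for illustration; `favorable=0` mirrors `favorable_classes=[0]` above.

```python
def mean_difference_sketch(records, unprivileged, privileged,
                           label="two_year_recid", favorable=0):
    """Favorable-label rate of the unprivileged group minus that of the privileged group."""

    def matches(row, group):
        # A row belongs to a group when it agrees on every attribute the group names.
        return all(row[key] == value for key, value in group.items())

    def favorable_rate(group):
        rows = [row for row in records if matches(row, group)]
        return sum(row[label] == favorable for row in rows) / len(rows)

    return favorable_rate(unprivileged) - favorable_rate(privileged)


# Toy data (not from COMPAS): 2 of 3 females and 1 of 3 males have the favorable label 0.
toy = [
    {"sex": 1, "two_year_recid": 0},
    {"sex": 1, "two_year_recid": 0},
    {"sex": 1, "two_year_recid": 1},
    {"sex": 0, "two_year_recid": 0},
    {"sex": 0, "two_year_recid": 1},
    {"sex": 0, "two_year_recid": 1},
]
print(mean_difference_sketch(toy, unprivileged={"sex": 1}, privileged={"sex": 0}))
```

The test-set value of 0.328246 agrees with the subgroup means reported further down: 0.651163 (observed non-recidivism for Non-caucasian, less than 25, female) minus 0.322917 (the same for male).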


In [134]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(solver='lbfgs', C=1.0, penalty='l2', random_state=0)
clf.fit(dataset_orig_train.features, dataset_orig_train.labels.flatten())

In [135]:
dataset_bias_test_prob = clf.predict_proba(dataset_orig_test.features)[:, 0]

In [136]:
df = pd.DataFrame(dataset_orig_test.features, columns=dataset_orig_test.feature_names)
df['observed'] = pd.Series(dataset_orig_test.labels.flatten(), index=df.index)
df['probabilities'] = pd.Series(dataset_bias_test_prob, index=df.index)
df.head()

   sex  race  age_cat  priors_count  c_charge_degree  observed  probabilities
0  1.0   1.0      2.0           2.0              1.0       1.0       0.552951
1  1.0   0.0      1.0           0.0              1.0       0.0       0.740959
2  0.0   1.0      0.0           1.0              1.0       0.0       0.374728
3  0.0   0.0      2.0           2.0              1.0       1.0       0.444487
4  0.0   1.0      1.0           1.0              0.0       1.0       0.584908


In [137]:
dataset_bias_test = dataset_orig_test.copy()
dataset_bias_test.scores = dataset_bias_test_prob
dataset_bias_test.labels = dataset_orig_test.labels

In [138]:
test_df = dataset_bias_test.convert_to_dataframe()[0]
test_df['model_not_recid'] = dataset_bias_test.scores.flatten()
test_df['observed_not_recid'] = 1 - test_df['two_year_recid']

In [139]:
# Non-Caucasian, less than 25, male prediction vs observation 
test_df[(test_df['sex'] == 0) & (test_df['race'] == 0) & (test_df['age_cat'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.426994
observed_not_recid    0.322917
dtype: float64

In [140]:
# Non-Caucasian, less than 25, female prediction vs observation 
test_df[(test_df['sex'] == 1) & (test_df['race'] == 0) & (test_df['age_cat'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.537471
observed_not_recid    0.651163
dtype: float64
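The two cells above repeat the same filter-then-average pattern, and later sections repeat it again for other subgroups. A small helper keeps the subgroup definition in one place and also reports the gap between observation and prediction. This is a sketch on plain dictionaries; the helper name `prediction_vs_observation` and the toy rows are hypothetical, and in this notebook the rows could come from `test_df.to_dict('records')`.

```python
from statistics import mean


def prediction_vs_observation(rows, **conditions):
    """Average predicted and observed non-recidivism for rows matching `conditions`."""
    subset = [row for row in rows
              if all(row[key] == value for key, value in conditions.items())]
    # statistics.mean raises StatisticsError if no row matches the conditions.
    predicted = mean(row["model_not_recid"] for row in subset)
    observed = mean(row["observed_not_recid"] for row in subset)
    return {
        "n": len(subset),
        "model_not_recid": predicted,
        "observed_not_recid": observed,
        "gap": observed - predicted,  # > 0: the model under-predicts the favorable outcome
    }


# Toy rows (made up) with the same column names as test_df.
toy_rows = [
    {"sex": 1, "race": 0, "model_not_recid": 0.5, "observed_not_recid": 1.0},
    {"sex": 1, "race": 0, "model_not_recid": 0.7, "observed_not_recid": 0.0},
    {"sex": 0, "race": 0, "model_not_recid": 0.4, "observed_not_recid": 0.0},
]
print(prediction_vs_observation(toy_rows, sex=1, race=0))
```

The sign of `gap` gives a quick first impression of which direction a subgroup deviates before running the MDSS scores.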

In [141]:
mdss_classified_1 = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=non_cocausian_less25_male,
                                           privileged_groups=non_cocausian_less25_female)
group1_unprivileged_score = mdss_classified_1.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified_1.score_groups(privileged=True)
print(group1_privileged_score)

-0.0
-0.0


Based on the results (0.0 for both scores), there is no evidence that `Non-caucasian, less than 25, male` is unprivileged, and also no evidence that `Non-caucasian, less than 25, female` is privileged.
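One reading of the zero scores, assuming the metric uses its default Bernoulli scoring function: each call asks whether a subgroup's observed odds differ from the predicted odds by some multiplicative factor q, and it searches only one side of q = 1. When the subgroup deviates in the other direction, the best factor is q = 1 and the score is zero. The sketch below illustrates that idea with a simple grid search over q. The function name, the grid, and the toy data are assumptions; it omits the penalty term and is not aif360's implementation.

```python
import math


def bernoulli_score_sketch(observed, predicted, direction):
    """Best log-likelihood ratio over odds multipliers q on one side of q = 1."""
    if direction == "positive":       # observed odds higher than predicted
        candidates = [1 + k / 100 for k in range(0, 1001)]   # q in [1, 11]
    elif direction == "negative":     # observed odds lower than predicted
        candidates = [k / 100 for k in range(1, 101)]        # q in (0, 1]
    else:
        raise ValueError("direction must be 'positive' or 'negative'")

    def log_likelihood_ratio(q):
        # Alternative: odds of every record multiplied by q. Null: q = 1.
        return sum(y * math.log(q) - math.log(1 - p + q * p)
                   for y, p in zip(observed, predicted))

    # q = 1 is always a candidate, so the score is never below zero.
    return max(log_likelihood_ratio(q) for q in candidates)


# Toy subgroup: the event occurs in 3 of 4 records, but the model predicts 0.5 each time.
observed = [1, 1, 1, 0]
predicted = [0.5, 0.5, 0.5, 0.5]
print(bernoulli_score_sketch(observed, predicted, "positive"))  # positive score
print(bernoulli_score_sketch(observed, predicted, "negative"))  # zero: wrong direction
```

Under this reading, a zero score says only that the subgroup does not deviate in the direction being scanned, which is why the same pair is scored again below with the roles swapped.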

In [142]:
mdss_classified_2 = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=non_cocausian_less25_female,
                                           privileged_groups=non_cocausian_less25_male)
group1_unprivileged_score = mdss_classified_2.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified_2.score_groups(privileged=True)
print(group1_privileged_score)

1.2296
4.6526


Based on the results, there is evidence that `Non-caucasian, less than 25, male` is privileged, and also evidence that `Non-caucasian, less than 25, female` is unprivileged.
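The two scoring cells above differ only in which group is passed as unprivileged and which as privileged, and the same pair of cells recurs for every comparison in this notebook. A small wrapper can run both assignments at once. This is a sketch that uses only the constructor keywords and the `score_groups(privileged=...)` call already shown above; the helper name `score_both_directions` and the result keys are hypothetical.

```python
def score_both_directions(metric_cls, dataset, classified, group_a, group_b):
    """Score a pair of groups twice, swapping the unprivileged/privileged roles."""
    results = {}
    for name, unprivileged, privileged in (
        ("a_as_unprivileged", group_a, group_b),
        ("b_as_unprivileged", group_b, group_a),
    ):
        metric = metric_cls(dataset, classified,
                            unprivileged_groups=unprivileged,
                            privileged_groups=privileged)
        results[name] = {
            "unprivileged_score": metric.score_groups(privileged=False),
            "privileged_score": metric.score_groups(privileged=True),
        }
    return results


# Usage in this notebook (assumes the objects defined in the cells above):
# scores = score_both_directions(MDSSClassificationMetric,
#                                dataset_orig_test, dataset_bias_test,
#                                non_cocausian_less25_female,
#                                non_cocausian_less25_male)
```

Passing the metric class as an argument keeps the helper independent of aif360, so it can be exercised with a stand-in class.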

## Non-caucasian, less than 25, Male And Caucasian, less than 25, Male


In [158]:
dataset_orig_train, dataset_orig_test = dataset.split([0.7], shuffle=True, seed=0)

metric_train = BinaryLabelDatasetMetric(dataset_orig_train,
                             unprivileged_groups=cocausian_less25_male,
                             privileged_groups=non_cocausian_less25_male)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_train.mean_difference())
metric_test = BinaryLabelDatasetMetric(dataset_orig_test,
                             unprivileged_groups=cocausian_less25_male,
                             privileged_groups=non_cocausian_less25_male)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_test.mean_difference())


Train set: Difference in mean outcomes between unprivileged and privileged groups = 0.095904
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.145438


In [144]:
# Non-Caucasian, less than 25, male prediction vs observation 
test_df[(test_df['sex'] == 0) & (test_df['race'] == 0) & (test_df['age_cat'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.426994
observed_not_recid    0.322917
dtype: float64

In [145]:
# Caucasian, less than 25, male prediction vs observation 
test_df[(test_df['sex'] == 0) & (test_df['race'] == 1) & (test_df['age_cat'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.462076
observed_not_recid    0.468354
dtype: float64

In [146]:
mdss_classified_2 = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=non_cocausian_less25_male,
                                           privileged_groups=cocausian_less25_male)
group1_unprivileged_score = mdss_classified_2.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified_2.score_groups(privileged=True)
print(group1_privileged_score)

-0.0
-0.0


Based on the results (0.0 for both scores), there is no evidence that `Non-caucasian, less than 25, male` is unprivileged, and also no evidence that `Caucasian, less than 25, male` is privileged.

In [147]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=cocausian_less25_male,
                                           privileged_groups=non_cocausian_less25_male)
group1_unprivileged_score = mdss_classified.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified.score_groups(privileged=True)
print(group1_privileged_score)

0.0067
4.6526


Based on the results, there is evidence that `Non-caucasian, less than 25, male` is privileged (score 4.6526). The score for `Caucasian, less than 25, male` is very low (0.0067, close to 0), so there is little evidence of bias against this group; it does not appear to be unprivileged in this comparison.

## Non-caucasian, Female And Non-caucasian, Male


In [148]:
dataset_orig_train, dataset_orig_test = dataset.split([0.7], shuffle=True, seed=0)

metric_train = BinaryLabelDatasetMetric(dataset_orig_train,
                             unprivileged_groups=non_cocausian_female,
                             privileged_groups=non_cocausian_male)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_train.mean_difference())
metric_test = BinaryLabelDatasetMetric(dataset_orig_test,
                             unprivileged_groups=non_cocausian_female,
                             privileged_groups=non_cocausian_male)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_test.mean_difference())


Train set: Difference in mean outcomes between unprivileged and privileged groups = 0.164657
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.232482


In [149]:
# Non-Caucasian, female prediction vs observation 
test_df[(test_df['sex'] == 1) & (test_df['race'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.563478
observed_not_recid    0.668639
dtype: float64

In [150]:
# Non-caucasian, male prediction vs observation 
test_df[(test_df['sex'] == 0) & (test_df['race'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.469250
observed_not_recid    0.436157
dtype: float64

In [152]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=non_cocausian_male,
                                           privileged_groups=non_cocausian_female)
group1_unprivileged_score = mdss_classified.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified.score_groups(privileged=True)
print(group1_privileged_score)

-0.0
-0.0


Based on the results (0.0 for both scores), there is no evidence that `Non-caucasian, male` is unprivileged, and also no evidence that `Non-caucasian, female` is privileged.

In [151]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=non_cocausian_female,
                                           privileged_groups=non_cocausian_male)
group1_unprivileged_score = mdss_classified.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified.score_groups(privileged=True)
print(group1_privileged_score)

4.3036
1.9281


Based on the results, there is evidence that `Non-caucasian, female` is unprivileged (score 4.3036), and also evidence that `Non-caucasian, male` is privileged (score 1.9281).

## Non-caucasian, Female And Caucasian, Female


In [161]:
dataset_orig_train, dataset_orig_test = dataset.split([0.7], shuffle=True, seed=0)

metric_train = BinaryLabelDatasetMetric(dataset_orig_train,
                             unprivileged_groups=non_cocausian_female,
                             privileged_groups=cocausian_female)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_train.mean_difference())
metric_test = BinaryLabelDatasetMetric(dataset_orig_test,
                             unprivileged_groups=non_cocausian_female,
                             privileged_groups=cocausian_female)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_test.mean_difference())


Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.035810
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.025282


In [162]:
# Non-Caucasian, female prediction vs observation 
test_df[(test_df['sex'] == 1) & (test_df['race'] == 0)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.563478
observed_not_recid    0.668639
dtype: float64

In [155]:
# Caucasian, female prediction vs observation 
test_df[(test_df['sex'] == 1) & (test_df['race'] == 1)][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.681477
observed_not_recid    0.643357
dtype: float64

In [165]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=cocausian_female,
                                           privileged_groups=non_cocausian_female)
group1_unprivileged_score = mdss_classified.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified.score_groups(privileged=True)
print(group1_privileged_score)

-0.0
-0.0


Based on the results (0.0 for both scores), there is no evidence that `Caucasian, female` is unprivileged, and also no evidence that `Non-caucasian, female` is privileged.

In [166]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                                           unprivileged_groups=non_cocausian_female,
                                           privileged_groups=cocausian_female)
group1_unprivileged_score = mdss_classified.score_groups(privileged=False)
print(group1_unprivileged_score)
group1_privileged_score = mdss_classified.score_groups(privileged=True)
print(group1_privileged_score)

4.3036
0.5258


Based on the results, there is evidence that `Non-caucasian, female` is unprivileged (score 4.3036). The score for `Caucasian, female` is much lower (0.5258), so the evidence that this group is privileged is weaker.