# Integration demo

This file demonstrates the integration between aequitas-lib and the context-service.
The context-service is the connection point between the tools of the AEQUITAS project; it holds information about
current projects and their elements.

This integration is handled by aequitas.gateway.
To test it, a context-service must be running.
Since the context-service is not part of the D3.1/D4.1 demonstrator (but will be part of D3.2/D4.2),
a mock context-service is provided as a placeholder at:
https://github.com/aequitas-aod/prototype-api

### Usage:
* Clone the git repository.
* (Optional) Create a virtual environment.
* Run: pip install -r requirements
* Run: python aeq-api/server.py

This will start a context-service mockup on http://localhost:6060
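
Before running the gateway cells, it can help to check that the mockup is actually listening. The following is a minimal sketch using only the standard library; `service_is_up` is a hypothetical helper and not part of aequitas-lib, and the host and port are taken from the URL above.

```python
import socket

def service_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False

# Assumption: fall back to file-system mode when the mockup is not reachable.
fs_only = not service_is_up("localhost", 6060)
print("file-system mode:", fs_only)
```

This only checks that the port accepts connections; it does not verify that the process behind it is the context-service mockup.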

### Running the examples without the context-service

The gateway also provides a file-system mode, enabled by passing filesystem=True to the gateway.
In this mode, element data is persisted to a directory structure of JSON files instead of to the context-service.
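
To illustrate what persisting elements as JSON files can look like, here is a small stand-alone sketch. The directory layout (`<root>/<project>/<element_key>/<version>.json`) and the helper names `save_element_to_fs` and `load_element_from_fs` are assumptions made for this example; the layout the gateway actually uses is not documented here and may differ.

```python
import json
import tempfile
from pathlib import Path

def save_element_to_fs(root, project, element_key, payload, version="default"):
    """Write payload as JSON to <root>/<project>/<element_key>/<version>.json (hypothetical layout)."""
    target = Path(root) / project / element_key / f"{version}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=4))
    return target

def load_element_from_fs(root, project, element_key, version="default"):
    """Read the JSON file written by save_element_to_fs back into a dict."""
    target = Path(root) / project / element_key / f"{version}.json"
    return json.loads(target.read_text())

# Round trip in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    params = {"class_attribute": {"name": "income", "positive_value": ">50K"}}
    path = save_element_to_fs(root, "demonstrator", "dataset", params)
    restored = load_element_from_fs(root, "demonstrator", "dataset")

print(path.name, restored == params)
```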



In [2]:
import sys
sys.path.append("../")

In [3]:
import numpy as np
import pandas as pd
pd.set_option('display.width', 500)
from aequitas.engine import Aequitas, NpEncoder
import aequitas.tools.data_manip as dm
import aequitas.tools as tools

In [4]:
from aequitas.gateway import Gateway

In [5]:
#Import dataset
dataset_name="Census_Income_Dataset.csv"
dataset_directory="../datasets/"+dataset_name
dataset = pd.read_csv(dataset_directory)

In [6]:
# Dataset Pre-Processing

# remove fnlwgt column (per instructions)
dataset = dataset.drop('fnlwgt', axis=1)

# remove education column since educational-num carries the same information
dataset = dataset.drop('education', axis=1)

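# impute the missing values: "?" marks a missing entry, so convert it to NaN
# and then fill each column with its most frequent value
num_data = dataset.shape[0]
col_names = dataset.columns
for c in col_names:
    dataset[c] = dataset[c].replace("?", np.nan)  # np.NaN was removed in NumPy 2.0
dataset = dataset.apply(lambda x: x.fillna(x.value_counts().index[0]))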
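
The imputation step above can be tried on a toy frame to see its effect. This sketch uses made-up data, not the Census dataset, and uses `Series.mode()` as an equivalent way of picking the most frequent value.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative only): "?" marks a missing entry.
toy = pd.DataFrame({
    "workclass": ["Private", "?", "Private", "Local-gov"],
    "age": [25, 38, 28, 44],
})

# Step 1: turn the "?" placeholder into a real missing value.
toy = toy.replace("?", np.nan)

# Step 2: fill each column with its most frequent value (the mode).
toy = toy.apply(lambda col: col.fillna(col.mode().iloc[0]))

print(toy)
```

Mode imputation keeps the pipeline simple but strengthens the majority category, which is worth keeping in mind when the imputed column is later used as a sensitive attribute.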

In [7]:
# Regroup the race feature into two values: White and Minority
groups = [['White'], ['Black','Asian-Pac-Islander','Other','Amer-Indian-Eskimo']]
labels=['White','Minority']
dataset["race"]=dm.merge_values(dataset["race"],groups,labels)
print("Unique values: ",dataset["race"].unique())

Unique values:  ['Minority' 'White']
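
For readers without the library at hand, the regrouping can be approximated with plain pandas. `merge_values_sketch` below is a hypothetical stand-in written for this example; it is not the implementation of `dm.merge_values`, whose internals are not shown here.

```python
import pandas as pd

def merge_values_sketch(series, groups, labels):
    """Replace every value listed in groups[i] with labels[i]; leave other values unchanged."""
    mapping = {value: label for group, label in zip(groups, labels) for value in group}
    return series.map(mapping).fillna(series)

groups = [["White"], ["Black", "Asian-Pac-Islander", "Other", "Amer-Indian-Eskimo"]]
labels = ["White", "Minority"]

toy_race = pd.Series(["White", "Black", "Other", "White"])
merged = merge_values_sketch(toy_race, groups, labels)
print(merged.tolist())
```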


In [8]:
# We are going to demonstrate a few examples of parameter files for an Aequitas Object

# Empty parameters file (Example 1)
parameters={
}
Aeq_dataset=Aequitas(dataset,parameters)
Aeq_dataset.structure(verbose=True)

Dataset:
        Column Name Data Type Column Type (suggestion)  Number_Values                                             Values
0               age     int64               Continuous             74                                                  -
1         workclass      text      Categorical/Ordinal              8  [Private, Local-gov, Self-emp-not-inc, Federal...
2   educational-num     int64      Categorical/Ordinal             16  [7, 9, 12, 10, 6, 15, 4, 13, 14, 16, 3, 11, 5,...
3    marital-status      text      Categorical/Ordinal              7  [Never-married, Married-civ-spouse, Widowed, D...
4        occupation      text      Categorical/Ordinal             14  [Machine-op-inspct, Farming-fishing, Protectiv...
5      relationship      text      Categorical/Ordinal              6  [Own-child, Husband, Not-in-family, Unmarried,...
6              race      text                   Binary              2                                  [Minority, White]
7            gender    

In [9]:
#Basic parameters file (Example 2)
parameters={
    "class_attribute":{
        "name": 'income',
    },
}
Aeq_dataset=Aequitas(dataset,parameters)
Aeq_dataset.descriptive_stats(verbose=True)

Proportions: (income)
              0
<=50K  0.760718
>50K   0.239282



In [10]:
# A parameters file without any expectations of privileged groups  (Example 3)
parameters={
    "class_attribute":{
        "name": 'income',
        "positive_value":'>50K'
    },
    "sensitive_attributes":
    [
        {
            "name": 'gender',
        },
        {
            "name": 'race',
        }
    ]
}
Aeq_dataset=Aequitas(dataset,parameters)
Aeq_dataset.descriptive_stats(verbose=True)

Proportions: (income)
              0
<=50K  0.760718
>50K   0.239282

Proportions: (gender)
               0
Male    0.668482
Female  0.331518

Proportions: (race)
                 0
White     0.855043
Minority  0.144957

Outcome distribution by group:
           <=50K      >50K
Female  0.890749  0.109251
Male    0.696233  0.303767

Outcome distribution by group:
             <=50K      >50K
Minority  0.847458  0.152542
White     0.746013  0.253987


Association between gender and race.
Contingency Table:
race    Minority  White
gender                 
Female      3165  13027
Male        3915  28735

Chi-squared statistic: 497.9678182429906
Cramer's V: 0.10087228311688282
Degrees of Freedom: 1
p-value: 2.6310785315092373e-110
There is a statistically significant association between gender and race.

Association between gender and income.
Contingency Table:
income  <=50K  >50K
gender             
Female  14423  1769
Male    22732  9918

Chi-squared statistic: 2248.847679013691
Cramer's
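
The association figures printed for gender and race can be checked by hand. The sketch below recomputes them from the contingency table under two assumptions inferred from the printed numbers, not from the library source: that the chi-squared statistic uses Yates' continuity correction for the 2x2 table, and that Cramér's V uses the bias-corrected form.

```python
import math

# Contingency table from the output above: rows = gender, columns = race.
table = [[3165, 13027],   # Female: Minority, White
         [3915, 28735]]   # Male:   Minority, White

def chi2_yates_2x2(t):
    """Chi-squared statistic for a 2x2 table with Yates' continuity correction."""
    (a, b), (c, d) = t
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)

def cramers_v_corrected(chi2, n, rows, cols):
    """Bias-corrected Cramér's V for a rows x cols table."""
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (rows - 1) * (cols - 1) / (n - 1))
    rows_corr = rows - (rows - 1) ** 2 / (n - 1)
    cols_corr = cols - (cols - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(rows_corr - 1, cols_corr - 1))

n = sum(sum(row) for row in table)
chi2 = chi2_yates_2x2(table)
v = cramers_v_corrected(chi2, n, rows=2, cols=2)
print(round(chi2, 4), round(v, 6))
```

Under these assumptions the results agree with the printed statistic (about 497.97) and Cramér's V (about 0.1009). With nearly 49,000 rows, even this weak association yields a very small p-value, so the effect size is the more informative number.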

In [11]:
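# Gateway integration (Mattias addition)
# Connect to the context-service mockup started as described above
gw = Gateway('demonstrator', host='http://localhost:6060/')
# Set fs_only to True to persist to the file system instead of the context-service
fs_only = False

gw.save_element(Aeq_dataset.parameters, element_key="dataset", filesystem=fs_only)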

In [12]:
# Let's go forward with a more detailed fairness analysis

# Let's split the dataset into training and test samples
training_sample,test_sample = dm.split_dataset(dataset,ratio=0.3, random_state=123)

In [13]:
# Define a parameters file with privileged groups
parameters={
    "class_attribute":{
        "name": 'income',
        "positive_value":'>50K'
    },
    "sensitive_attributes":
    [
        {
            "name": 'gender',
            "privileged_group":'Male'
        },
    ]
}
# Define two Aequitas Objects
Aeq_training=Aequitas(training_sample,parameters)
Aeq_test=Aequitas(test_sample,parameters)

In [14]:
# Get data on Aeq_training object
Aeq_training.structure()
Aeq_training.descriptive_stats()

# You can use the following techniques without defining privileged groups. In that case the results are displayed
# as if every value could be privileged.
Aeq_training.statistical_parity(verbose=True)
Aeq_training.disparate_impact(verbose=True)

Probabilities:
          Male    Female
>50K  0.303156  0.106205

Statistical/Demographic Parity:
Outcome:  >50K
      Male    Female
Male   0.0  0.196951


Probabilities:
          Male    Female
>50K  0.303156  0.106205

Disparate Impact:
Outcome:  >50K
      Male   Female
Male   1.0  0.35033
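
Both measures follow directly from the two probabilities printed above. A minimal sketch in plain Python; the function names are chosen for this example, and the 0.8 threshold is the commonly cited "four-fifths" rule of thumb, not something the library enforces as far as this demo shows.

```python
def statistical_parity_difference(p_privileged, p_unprivileged):
    """Difference in positive-outcome rates: 0 means parity."""
    return p_privileged - p_unprivileged

def disparate_impact_ratio(p_privileged, p_unprivileged):
    """Ratio of positive-outcome rates: 1 means parity."""
    return p_unprivileged / p_privileged

# P(income = '>50K' | gender) on the training sample, copied from the output above.
p_male, p_female = 0.303156, 0.106205

spd = statistical_parity_difference(p_male, p_female)
di = disparate_impact_ratio(p_male, p_female)
passes_four_fifths = di >= 0.8

print(round(spd, 6), round(di, 5), passes_four_fifths)
```

The difference of about 0.197 and the ratio of about 0.35 match the tables above; a ratio this far below 0.8 indicates a strong imbalance in the raw training labels.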




In [15]:
# Define appropriate transformations for dataset
transform_dictionary = {
    "income": {
        "encode": "labeling",
        "labels": {
            "<=50K": 0,
            ">50K": 1, 
        }
    },
    "gender": {
        "encode": "labeling",
        "labels": {
            "Female": 0,
            "Male": 1, 
        }
    },
    "race": {
        "encode": "labeling",
        "labels": {
            "Minority": 0,
            "White": 1, 
        } 
    },
    "workclass": {
        "encode": "labeling",
        "scaling": "min-max"
    },
    "marital-status": {
        "encode": "labeling",
        "scaling": "min-max"
    },
    "occupation": {
        "encode": "labeling", 
        "scaling": "min-max"
    },
    "relationship": {
        "encode": "labeling", 
        "scaling": "min-max"
    },
    "native-country": {
        "encode": "labeling", 
        "scaling": "min-max"
    },
    "age":{
        "scaling": "standard"
    },
    "educational-num":{
        "scaling": "min-max"
    },
    "capital-gain":{
        "scaling": "standard"
    },
    "capital-loss":{
        "scaling": "standard"
    },
    "hours-per-week":{
        "scaling": "standard"
    }
}

# add transform instructions for techniques
Aeq_training.transform_instructions(transform_dictionary)
Aeq_test.transform_instructions(transform_dictionary)
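
The three kinds of instruction in the dictionary (label encoding, min-max scaling, standard scaling) can be sketched in plain Python. This is an illustration of the general techniques only. How the library orders unlabeled categories and which standard deviation it uses are assumptions here: sorted order and population standard deviation.

```python
from statistics import mean, pstdev

def label_encode(values, labels=None):
    """Map categories to integers; without explicit labels, use sorted order (assumption)."""
    if labels is None:
        labels = {v: i for i, v in enumerate(sorted(set(values)))}
    return [labels[v] for v in values], labels

def min_max(values):
    """Rescale to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# "income": explicit labels, no scaling.
income, _ = label_encode(["<=50K", ">50K", "<=50K"], {"<=50K": 0, ">50K": 1})
# "workclass": label encoding followed by min-max scaling.
workclass_codes, _ = label_encode(["Private", "Local-gov", "State-gov"])
workclass_scaled = min_max(workclass_codes)
# "age": standard scaling only.
age_scaled = standardize([25, 35, 45])

print(income, workclass_scaled, [round(a, 4) for a in age_scaled])
```

Keeping the label mapping around, as `label_encode` does by returning it, is what makes a later inverse transform possible.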

In [16]:

# mitigate bias in the data with massaging; uniform_sampling or preferential_sampling can also be used
Aeq_training_unbiased=Aeq_training.mitigation(method='massaging', sensitive_attribute='gender')

# check statistical parity on new unbiased object
Aeq_training_unbiased.statistical_parity(verbose=True)

Probabilities:
          Male    Female
>50K  0.237753  0.237881

Statistical/Demographic Parity:
Outcome:  >50K
      Male    Female
Male   0.0 -0.000128
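
Massaging, as described by Kamiran and Calders, relabels a small number of training records so that both groups end up with the same positive rate. Unprivileged negatives that a ranker scores highest are promoted, and privileged positives it scores lowest are demoted. The sketch below shows the idea on toy data with hand-written scores; it is not the library's implementation, and the record layout (`group`, `label`, `score`) is an assumption for this example.

```python
def massage(records, privileged, positive, negative):
    """Return relabelled copies of records with equalised positive rates (sketch)."""
    priv = [i for i, r in enumerate(records) if r["group"] == privileged]
    unpriv = [i for i, r in enumerate(records) if r["group"] != privileged]
    priv_pos = [i for i in priv if records[i]["label"] == positive]
    unpriv_pos = [i for i in unpriv if records[i]["label"] == positive]
    unpriv_neg = [i for i in unpriv if records[i]["label"] != positive]

    # Number of label swaps needed to equalise the two positive rates.
    m = round((len(unpriv) * len(priv_pos) - len(priv) * len(unpriv_pos)) / len(records))
    m = max(0, min(m, len(priv_pos), len(unpriv_neg)))

    # Promote the most promising unprivileged negatives,
    # demote the least convincing privileged positives.
    promote = sorted(unpriv_neg, key=lambda i: records[i]["score"], reverse=True)[:m]
    demote = sorted(priv_pos, key=lambda i: records[i]["score"])[:m]

    out = [dict(r) for r in records]
    for i in promote:
        out[i]["label"] = positive
    for i in demote:
        out[i]["label"] = negative
    return out

def positive_rate(records, group, positive):
    members = [r for r in records if r["group"] == group]
    return sum(r["label"] == positive for r in members) / len(members)

# Toy data: score stands for a ranker's estimate of P(label = '>50K').
toy = [
    {"group": "Male", "label": ">50K", "score": 0.9},
    {"group": "Male", "label": ">50K", "score": 0.8},
    {"group": "Male", "label": ">50K", "score": 0.6},
    {"group": "Male", "label": "<=50K", "score": 0.3},
    {"group": "Female", "label": ">50K", "score": 0.7},
    {"group": "Female", "label": "<=50K", "score": 0.55},
    {"group": "Female", "label": "<=50K", "score": 0.4},
    {"group": "Female", "label": "<=50K", "score": 0.2},
]
massaged = massage(toy, privileged="Male", positive=">50K", negative="<=50K")
print(positive_rate(massaged, "Male", ">50K"), positive_rate(massaged, "Female", ">50K"))
```

Because one label is promoted for every label demoted, the overall positive rate is unchanged, which is consistent with both groups landing near 0.238 in the output above.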




In [17]:
# Let's train a classifier and evaluate it on the test sample

# transform object's dataset to numeric values
Aeq_training_unbiased.transform()
Aeq_test.transform()

# define classifier parameters
classifier_type="Decision_Tree"
classifier_params={
    "random_state":42, 
    "min_samples_leaf":10
}
class_attribute=Aeq_training_unbiased.parameters["class_attribute"]["name"]

# Train a classifier on training sample
clf=tools.train_classifier(Aeq_training_unbiased.dataset,class_attribute,classifier_type,classifier_params)

# Test classifier on test sample
predicted_test_sample, _, _, _= tools.test_classifier(clf,Aeq_test.dataset,class_attribute,verbose=True)

# Inverse transform the training and test samples back to their original values
Aeq_training_unbiased.inverse_transform()
Aeq_test.inverse_transform()

Classifier Accuracy: 0.80


In [18]:
# define a new prediction test sample
Aeq_predicted_test=Aeq_test.copy()
Aeq_predicted_test.set_dataset(predicted_test_sample)
Aeq_predicted_test.inverse_transform()

# check statistical parity on new prediction test sample
Aeq_predicted_test.statistical_parity(verbose=True)

Probabilities:
          Male    Female
>50K  0.204854  0.221832

Statistical/Demographic Parity:
Outcome:  >50K
      Male    Female
Male   0.0 -0.016979




In [19]:
prediction=np.array(Aeq_predicted_test.dataset[class_attribute])

# check equal opportunity / equal odds on test sample
Aeq_test.equal_opportunity(prediction,verbose=True)
Aeq_test.equal_odds(prediction,verbose=True)

Confusion Metrics:  (Positive_outcome='>50K')
          Female         Male
TP    404.000000  1461.000000
TN   3611.000000  6266.000000
FP    671.000000   548.000000
FN    160.000000  1532.000000
TPR     0.716312     0.488139
TNR     0.843298     0.919577
FPR     0.156702     0.080423
FNR     0.283688     0.511861
FDR     0.624186     0.272773
FOR     0.042429     0.196461
PPV     0.375814     0.727227
NPV     0.957571     0.803539
RPP     0.221832     0.204854
RNP     0.778168     0.795146
ACC     0.828518     0.787907
Equality of Opportunity:  (Positive_outcome='>50K')
        Female  Male
Male -0.228173   0.0

Confusion Metrics:  (Positive_outcome='>50K')
          Female         Male
TP    404.000000  1461.000000
TN   3611.000000  6266.000000
FP    671.000000   548.000000
FN    160.000000  1532.000000
TPR     0.716312     0.488139
TNR     0.843298     0.919577
FPR     0.156702     0.080423
FNR     0.283688     0.511861
FDR     0.624186     0.272773
FOR     0.042429     0.196461
PPV
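
The rates in the confusion tables follow from the four counts per group. A minimal sketch recomputing a few of them, together with the equality-of-opportunity gap. The false-positive-rate gap is included because equalized odds, in its standard definition, compares both TPR and FPR; how the library reports it is cut off in the output above.

```python
def rates(tp, tn, fp, fn):
    """Derive common rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),          # true positive rate (recall)
        "FPR": fp / (fp + tn),          # false positive rate
        "PPV": tp / (tp + fp),          # precision
        "ACC": (tp + tn) / (tp + tn + fp + fn),
    }

# Counts copied from the output above (positive outcome '>50K').
female = rates(tp=404, tn=3611, fp=671, fn=160)
male = rates(tp=1461, tn=6266, fp=548, fn=1532)

# Privileged group (Male) minus unprivileged group (Female).
tpr_gap = male["TPR"] - female["TPR"]
fpr_gap = male["FPR"] - female["FPR"]

print(round(tpr_gap, 6), round(fpr_gap, 6))
```

The negative TPR gap shows that, after massaging the training data, the classifier finds true high earners among women more often than among men, while also producing more false positives for women.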

In [20]:
#display parameters file
Aeq_training_unbiased.display()

#save parameters file to remote server
gw.save_element(Aeq_training_unbiased.parameters, element_key="dataset", version="unbiased")

Aequitas Dataset parameters:
{
    "class_attribute": {
        "name": "income",
        "positive_value": ">50K"
    },
    "sensitive_attributes": [
        {
            "name": "gender",
            "privileged_group": "Male"
        }
    ],
    "Mitigation": "True",
    "Mitigation_technique": "massaging",
    "transform_dictionary": {
        "income": {
            "encode": "labeling",
            "labels": {
                "<=50K": 0,
                ">50K": 1
            }
        },
        "gender": {
            "encode": "labeling",
            "labels": {
                "Female": 0,
                "Male": 1
            }
        },
        "race": {
            "encode": "labeling",
            "labels": {
                "Minority": 0,
                "White": 1
            }
        },
        "workclass": {
            "encode": "labeling",
            "scaling": "min-max"
        },
        "marital-status": {
            "encode": "labeling",
            "scali

In [21]:
#display parameters file
Aeq_training.display()

#save parameters file to remote server
gw.save_element(Aeq_training.parameters, element_key="dataset", version="training")

Aequitas Dataset parameters:
{
    "class_attribute": {
        "name": "income",
        "positive_value": ">50K"
    },
    "sensitive_attributes": [
        {
            "name": "gender",
            "privileged_group": "Male"
        }
    ],
    "Mitigation": "False",
    "proportions": {
        "income": {
            "<=50K": 0.7622042177308491,
            ">50K": 0.2377957822691509
        },
        "gender": {
            "Male": 0.668138875076779,
            "Female": 0.3318611249232209
        }
    },
    "outcome_distribution_by_group": {
        "income/gender": {
            "Female": {
                "<=50K": 0.8937951701040014,
                ">50K": 0.1062048298959986
            },
            "Male": {
                "<=50K": 0.6968436720220637,
                ">50K": 0.30315632797793635
            }
        }
    },
    "contingency": [
        {
            "attribute1": "gender",
            "attribute2": "income",
            "contingency_table": {
   

In [22]:
#display parameters file
Aeq_test.display()

#save parameters file to remote server
gw.save_element(Aeq_test.parameters, element_key="dataset", version="test")

Aequitas Dataset parameters:
{
    "class_attribute": {
        "name": "income",
        "positive_value": ">50K"
    },
    "sensitive_attributes": [
        {
            "name": "gender",
            "privileged_group": "Male"
        }
    ],
    "Mitigation": "False",
    "transform_dictionary": {
        "income": {
            "encode": "labeling",
            "labels": {
                "<=50K": 0,
                ">50K": 1
            }
        },
        "gender": {
            "encode": "labeling",
            "labels": {
                "Female": 0,
                "Male": 1
            }
        },
        "race": {
            "encode": "labeling",
            "labels": {
                "Minority": 0,
                "White": 1
            }
        },
        "workclass": {
            "encode": "labeling",
            "scaling": "min-max"
        },
        "marital-status": {
            "encode": "labeling",
            "scaling": "min-max"
        },
        "occup