# Statistical parity difference

$P(Y = 1 \mid D = \text{unprivileged}) - P(Y = 1 \mid D = \text{privileged})$

The example uses the German credit dataset. The protected attribute is "age": applicants aged 25 or older form the privileged group, and applicants younger than 25 form the unprivileged group. The prediction target is the "credit" column, where 1 is the favorable outcome (credit is granted). Note that the raw german.data file codes the unfavorable outcome as 2, not 0.
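To make the formula concrete before loading the real data, here is a minimal sketch on a hand-made table. The eight rows, the variable names, and the 1/2 label coding are made up for illustration and are not taken from the German credit dataset; only the age threshold of 25 follows the text above.

```python
import pandas as pd

# Toy data (hypothetical): 3 applicants younger than 25, 5 aged 25 or older.
toy = pd.DataFrame({
    "age": [22, 23, 24, 30, 40, 50, 60, 35],
    "credit": [1, 2, 2, 1, 1, 2, 1, 1],  # 1 = favorable, 2 = unfavorable (assumed coding)
})

is_privileged = toy["age"] >= 25   # same threshold as in the text
is_favorable = toy["credit"] == 1  # rows with the favorable outcome

# The mean of a boolean Series is the share of True values,
# i.e. the favorable-outcome rate within each group.
p_unpriv = is_favorable[~is_privileged].mean()  # 1 of 3 rows
p_priv = is_favorable[is_privileged].mean()     # 4 of 5 rows

toy_spd = p_unpriv - p_priv
print(round(toy_spd, 4))  # -0.4667
```

A negative value means the unprivileged group receives the favorable outcome less often than the privileged group; a value of 0 would mean both groups have the same rate.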



In [3]:
import pandas as pd

column_names = ['status', 'month', 'credit_history',
    'purpose', 'credit_amount', 'savings', 'employment',
    'investment_as_income_percentage', 'personal_status',
    'other_debtors', 'residence_since', 'property', 'age',
    'installment_plans', 'housing', 'number_of_credits',
    'skill_level', 'people_liable_for', 'telephone',
    'foreign_worker', 'credit']
filepath = '/Users/hkromer/01_Projects/27.Fairness_Bias/AIF360/aif360/data/raw/german/german.data'
df = pd.read_csv(filepath, sep=' ', header=None, names=column_names)

m = df['age'] >= 25  # boolean mask: age >= 25 is treated as the privileged group
priviledged = df[m]
unpriviledged = df[~m]

priviledged.shape, unpriviledged.shape

((851, 21), (149, 21))

In [4]:
def statistical_parity_difference(df_priviledged, df_unpriviledged, favorable_class):
    r"""
    Compute the statistical parity difference between two groups.

    Inputs
    ------------
    df_priviledged : df
        Dataframe containing the rows of the privileged group.
    df_unpriviledged : df
        Dataframe containing the rows of the unprivileged group.
    favorable_class : tuple
        Tuple of (column, value).
        Index 0: name of the dataframe column that holds the target.
        Index 1: value in that column that counts as the favorable (positive) outcome.

    .. math::
       Pr(Y = 1 | D = \text{unprivileged})
       - Pr(Y = 1 | D = \text{privileged})
    """
    column, value = favorable_class

    # share of favorable outcomes in the privileged group
    mask = df_priviledged[column] == value
    ratio_priviledged = df_priviledged[mask].shape[0] / df_priviledged.shape[0]

    # share of favorable outcomes in the unprivileged group
    mask = df_unpriviledged[column] == value
    ratio_unpriviledged = df_unpriviledged[mask].shape[0] / df_unpriviledged.shape[0]

    return ratio_unpriviledged - ratio_priviledged


In [5]:
favorable_class = ('credit', 1.0)
statistical_parity_difference(priviledged, unpriviledged, favorable_class)

-0.12854990969960323

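The result is about -0.13: in this dataset, the unprivileged group (younger than 25) receives the favorable outcome at a rate roughly 13 percentage points lower than the privileged group.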
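The same quantity can also be computed from a single dataframe and a boolean group mask, without splitting the data first. The sketch below is one possible variant, not the notebook's own method: the helper name `spd_from_mask`, its parameters, and the toy rows are hypothetical. It also refuses to run when one of the groups is empty, since a rate is undefined in that case.

```python
import pandas as pd


def spd_from_mask(df, privileged_mask, label_column, favorable_value):
    """Hypothetical helper: favorable rate of unprivileged rows minus privileged rows."""
    # Both groups need at least one row, otherwise a rate is undefined.
    if privileged_mask.all() or not privileged_mask.any():
        raise ValueError("both groups must contain at least one row")

    # 1.0 where the row has the favorable outcome, 0.0 otherwise.
    favorable = (df[label_column] == favorable_value).astype(float)

    # Readable group labels instead of True/False.
    group = privileged_mask.map({True: "privileged", False: "unprivileged"})

    # The mean of the 0/1 indicator per group is the favorable-outcome rate.
    rates = favorable.groupby(group).mean()
    return rates["unprivileged"] - rates["privileged"]


# Toy data (made up for illustration): 4 rows younger than 25, 6 rows aged 25 or older.
toy = pd.DataFrame({
    "age": [19, 21, 24, 24, 25, 31, 47, 52, 66, 70],
    "credit": [1, 2, 1, 2, 1, 1, 2, 1, 1, 1],
})

result = spd_from_mask(toy, toy["age"] >= 25, "credit", 1)
print(round(result, 4))
```

Here the unprivileged rate is 2/4 and the privileged rate is 5/6, so the difference is again negative. Passing the mask instead of two pre-split frames keeps the group definition in one place, which makes it easier to try a different threshold.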