# Fairness Checking: Linear Program (Statistical Parity)
This notebook uses PuLP to solve the linear program outlined in the "Fairness Checking" document. NOTE: we treated that document's definitions of pi_a and pi_a' as typos, because they would otherwise make the objective function nonlinear. We instead take pi_a to be the number of records with A = a, which removes the multiplication by w_i from each term of pi_a (and likewise for pi_a').
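For reference, the program solved in the cells below can be written out as follows. This is a transcription of the notebook's code rather than a quotation from the "Fairness Checking" document; the index notation ($A_i$, $X_i$, $n$) is our own shorthand.

$$
\max_{w}\;\; \frac{1}{\pi_a}\sum_{i:\,A_i=a} w_i\, f(X_i)\;-\;\frac{1}{\pi_{a'}}\sum_{i:\,A_i=a'} w_i\, f(X_i)
\qquad \text{subject to}\quad \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,
$$

where $\pi_a = |\{i : A_i = a\}|$ and $\pi_{a'} = |\{i : A_i = a'\}|$ are constants, so the objective is linear in $w$.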

In [42]:
import pandas as pd
import numpy as np
import pulp

## Import Predicted Data
We take A (the protected attribute) to be the 'race' variable, with a = 0 and a' = 1. The last column, 'prediction', is our f(X) variable, which is either 0 or 1.

In [43]:
df = pd.read_csv('predicted_dataset_scores.csv')

In [44]:
df.head(5)

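| Unnamed: 0.2 | Unnamed: 0 | Unnamed: 0.1 | sex | age | race | juv_fel_count | juv_misd_count | juv_other_count | priors_count | two_year_recid | c_charge_degree_F | c_charge_degree_M | risk_recid | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 34 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | 2 | 1 | 24 | 1 | 0 | 0 | 1 | 4 | 1 | 1 | 0 | 0 | 1 |
| 3 | 3 | 5 | 1 | 44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 4 | 6 | 1 | 41 | 0 | 0 | 0 | 0 | 14 | 1 | 1 | 0 | 1 | 1 |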


In [45]:
# Our protected variable A is race, with a = 0 or a = 1.
# (The original np.asarray(a) call discarded its result, so convert explicitly.)
a = df['race'].to_numpy()

# Row positions belonging to each group, as plain Python ints
a_0_indices = np.flatnonzero(a == 0).tolist()
a_1_indices = np.flatnonzero(a == 1).tolist()

In [46]:
# Our pi variables are simply the number of records with a = 0 and a = 1
pi_0 = len(a_0_indices)
pi_1 = len(a_1_indices)

In [47]:
print(pi_0)
print(pi_1)

2987
3172


In [48]:
# Our predicted variable f(X) is under 'prediction,' with f(X) = 0 or f(X) = 1
f_X = df['prediction']
np.asarray(f_X)

array([0, 0, 1, ..., 0, 0, 1])
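As a point of comparison for the linear program that follows, the unweighted statistical parity gap is the difference between the two groups' mean predictions. It is the same expression as the LP objective with every w_i set to 1 (which does not satisfy the LP's sum-to-one constraint, so it is only a reference quantity). The sketch below is not part of the original analysis; the function name `statistical_parity_gap` and the toy lists are hypothetical and chosen only for illustration.

```python
def statistical_parity_gap(groups, predictions):
    """Return mean(prediction | group == 0) - mean(prediction | group == 1)."""
    preds_0 = [f for g, f in zip(groups, predictions) if g == 0]
    preds_1 = [f for g, f in zip(groups, predictions) if g == 1]
    if not preds_0 or not preds_1:
        raise ValueError("both groups must be non-empty")
    return sum(preds_0) / len(preds_0) - sum(preds_1) / len(preds_1)


# Toy data (hypothetical, not the notebook's dataset)
toy_groups = [0, 0, 0, 1, 1]
toy_predictions = [1, 0, 1, 1, 0]
gap = statistical_parity_gap(toy_groups, toy_predictions)
print(gap)
```

On the notebook's data, the same function could be called as `statistical_parity_gap(df['race'], df['prediction'])`, assuming both columns contain only 0s and 1s.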

## Create Linear Program
We use the PuLP package to create our linear program.

In [49]:
# Our w variables in the objective, one per record
# The lower-bound constraint w_i >= 0 is set here with lowBound=0
w = pulp.LpVariable.dicts("w", range(len(f_X)), lowBound=0, cat='Continuous')

In [50]:
# Define the linear program as a maximization problem
# (PuLP does not allow spaces in problem names, so use underscores)
model = pulp.LpProblem("Statistical_Parity_Fairness_Checking", pulp.LpMaximize)

In [51]:
# Objective Function
model += pulp.lpSum(
    [(1./pi_0) * w[index] * f_X[index] for index in a_0_indices] +
    [- (1./pi_1) * w[index] * f_X[index] for index in a_1_indices]
)

In [52]:
# Constraint that the w's all sum to 1
model += pulp.lpSum([w[i] for i in range(len(f_X))]) == 1

In [53]:
# Solve the linear program
model.solve()

1

In [54]:
pulp.LpStatus[model.status]

'Optimal'

In [57]:
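# Print the variables with a nonzero weight in the solution.
# A small tolerance avoids exact floating-point comparison with 0.
for i in range(len(f_X)):
    value = w[i].varValue
    if value is not None and abs(value) > 1e-9:
        print(w[i])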

w_4695


In [58]:
# The optimal value of the objective
pulp.value(model.objective)

0.00033478406427854036
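A linear objective maximized over the constraint set used here (w_i >= 0, sum of w_i = 1) attains its optimum at a vertex, that is, with all weight on a single record whose objective coefficient is largest. That is consistent with the solver output above: only one variable is nonzero, and the reported objective agrees with 1/pi_0 = 1/2987 to the printed precision. The sketch below computes that closed-form optimum without a solver; the function name `simplex_lp_optimum` and the toy lists are hypothetical, and this is a sanity check rather than a replacement for the PuLP model.

```python
def simplex_lp_optimum(race, prediction):
    """Closed-form optimum of  max sum_i c_i * w_i  s.t.  sum_i w_i = 1, w_i >= 0.

    c_i mirrors the notebook's objective:  f_i / pi_0 for race == 0,
    -f_i / pi_1 for race == 1 (and 0 for any other value, as an assumption).
    Returns (index of a maximizing record, optimal objective value).
    """
    pi_0 = sum(1 for r in race if r == 0)
    pi_1 = sum(1 for r in race if r == 1)
    coeffs = []
    for r, f in zip(race, prediction):
        if r == 0:
            coeffs.append(f / pi_0)
        elif r == 1:
            coeffs.append(-f / pi_1)
        else:
            coeffs.append(0.0)
    # All weight goes on one record with the largest coefficient
    best = max(range(len(coeffs)), key=coeffs.__getitem__)
    return best, coeffs[best]


# Toy data (hypothetical, not the notebook's dataset)
toy_race = [0, 0, 1, 1, 1]
toy_pred = [0, 1, 1, 0, 1]
best_index, best_value = simplex_lp_optimum(toy_race, toy_pred)
print(best_index, best_value)
```

When several records tie for the largest coefficient, any of them (or any mixture of them) is optimal, so the index returned here need not match the one a solver reports even though the objective value does.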