# Lab 3: Exploring Fairness When Training Models

In this lab, we will detect bias that can be introduced when training classifiers, and we will try to mitigate it by removing the sensitive attribute, by pre-processing the training data, and by post-processing the predictions.

This notebook has five stages in which we will:
1. Import the data and split it into train/test sets.
2. Train a classifier to predict credit using the original data, including the sensitive feature.
3. Train the same classifier on the original data without the sensitive feature.
4. Pre-process the data using the reweighing algorithm and train a classifier using the reweighed data.
5. Post-process the predictions using the equality of odds (equalized odds) algorithm.

For the predictions from steps 2 through 5, we will measure bias using fairness metrics including mean difference, disparate impact, false positive rate, and false negative rate.


In [1]:
!pip install numba==0.48
!pip install aif360==0.2.2

Collecting numba==0.48
  Downloading numba-0.48.0-1-cp37-cp37m-manylinux2014_x86_64.whl (3.5 MB)
Collecting llvmlite<0.32.0,>=0.31.0dev0
  Downloading llvmlite-0.31.0-cp37-cp37m-manylinux1_x86_64.whl (20.2 MB)
Installing collected packages: llvmlite, numba
  Attempting uninstall: llvmlite
    Found existing installation: llvmlite 0.34.0
    Uninstalling llvmlite-0.34.0:
      Successfully uninstalled llvmlite-0.34.0
  Attempting uninstall: numba
    Found existing installation: numba 0.51.2
    Uninstalling numba-0.51.2:
      Successfully uninstalled numba-0.51.2
Successfully installed llvmlite-0.31.0 numba-0.48.0
Collecting aif360==0.2.2
  Downloading aif360-0.2.2-py2.py3-none-any.whl (56.4 MB)
Installing collected packages: aif360
Successfully installed aif360-0.2.2


In [2]:
import matplotlib.pyplot as plt 
%matplotlib inline

import random
random.seed(6)

import sys
import warnings

import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

from aif360.algorithms.preprocessing import Reweighing
from aif360.datasets import GermanDataset, StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.postprocessing import EqOddsPostprocessing


## Step 1: Load the data

The German Credit Risk dataset, prepared by Prof. Hofmann, contains 1000 entries with 20 categorical/symbolic attributes. In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes. The original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

### 1.1 Read in the aif360 dataset

In [3]:
dataset_orig = GermanDataset(protected_attribute_names=['age'],
                             privileged_classes=[lambda x: x >= 25],      # age >= 25 is considered privileged
                             features_to_drop=['personal_status', 'sex'])

# Store definitions of privileged and unprivileged groups
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
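The `privileged_classes=[lambda x: x >= 25]` argument turns the numeric age column into a binary protected attribute, which is why the group definitions above use `{'age': 1}` and `{'age': 0}`. A minimal pandas sketch of that mapping, assuming (as aif360's `StandardDataset` does) that rows satisfying the predicate become 1.0 (privileged) and all others 0.0; the ages are made up for illustration:

```python
import pandas as pd

# Hypothetical ages, for illustration only
ages = pd.Series([22, 25, 47, 19], name="age")

# Rows where the predicate holds are privileged (1.0); the rest are unprivileged (0.0)
is_privileged = ages.apply(lambda x: x >= 25)
age_binary = is_privileged.astype(float)

print(age_binary.tolist())  # [0.0, 1.0, 1.0, 0.0]
```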

### 1.2 Split into train/test sets

In [4]:
# Split original data into train and test data
train_orig, test_orig = dataset_orig.split([0.8], shuffle=True, seed=10)

# Convert to dataframes
train_orig_df, _ = train_orig.convert_to_dataframe()
test_orig_df, _ = test_orig.convert_to_dataframe()

print("Train set: ", train_orig_df.shape)
print("Test set: ", test_orig_df.shape)

Train set:  (800, 58)
Test set:  (200, 58)


In [5]:
train_orig_df.head()

Unnamed: 0,month,credit_amount,investment_as_income_percentage,residence_since,age,number_of_credits,people_liable_for,status=A11,status=A12,status=A13,status=A14,credit_history=A30,credit_history=A31,credit_history=A32,credit_history=A33,credit_history=A34,purpose=A40,purpose=A41,purpose=A410,purpose=A42,purpose=A43,purpose=A44,purpose=A45,purpose=A46,purpose=A48,purpose=A49,savings=A61,savings=A62,savings=A63,savings=A64,savings=A65,employment=A71,employment=A72,employment=A73,employment=A74,employment=A75,other_debtors=A101,other_debtors=A102,other_debtors=A103,property=A121,property=A122,property=A123,property=A124,installment_plans=A141,installment_plans=A142,installment_plans=A143,housing=A151,housing=A152,housing=A153,skill_level=A171,skill_level=A172,skill_level=A173,skill_level=A174,telephone=A191,telephone=A192,foreign_worker=A201,foreign_worker=A202,credit
841,21.0,2993.0,3.0,2.0,1.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
956,30.0,3656.0,4.0,4.0,1.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
544,12.0,1255.0,4.0,4.0,1.0,2.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
173,8.0,1414.0,4.0,2.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0
759,12.0,691.0,4.0,3.0,1.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0


As a reminder of what we did last week, let's calculate two fairness metrics (`mean_difference` and `disparate_impact`) on the training data (hint: use `BinaryLabelDatasetMetric`):

In [6]:
# your code here

train_orig_metrics = BinaryLabelDatasetMetric(train_orig, unprivileged_groups=unprivileged_groups, 
                                              privileged_groups=privileged_groups)
print("Mean Difference = %f" % train_orig_metrics.mean_difference())
print("Disparate Impact = %f" % train_orig_metrics.disparate_impact())

Mean Difference = -0.175661
Disparate Impact = 0.760365
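Both metrics compare the rate of favorable outcomes (good credit) between groups: mean difference is P(favorable | unprivileged) - P(favorable | privileged), and disparate impact is the ratio of the same two rates. The NumPy sketch below recomputes them by hand. The group counts it uses (68 of 122 unprivileged and 497 of 678 privileged applicants labeled good credit) are not read from the data; they are the unique integer counts consistent with the 800 training rows, the 565 good-credit labels shown later, and the two metrics printed above.

```python
import numpy as np

def mean_difference(labels, groups, favorable=1, unprivileged=0, privileged=1):
    """P(favorable | unprivileged) - P(favorable | privileged)."""
    fav = labels == favorable
    return fav[groups == unprivileged].mean() - fav[groups == privileged].mean()

def disparate_impact(labels, groups, favorable=1, unprivileged=0, privileged=1):
    """P(favorable | unprivileged) / P(favorable | privileged)."""
    fav = labels == favorable
    return fav[groups == unprivileged].mean() / fav[groups == privileged].mean()

# Counts inferred from the printed training-set metrics (see lead-in).
# Groups: 0 = age < 25, 1 = age >= 25. Labels: 1 = good credit, 2 = bad credit.
groups = np.array([0] * 122 + [1] * 678)
labels = np.array([1] * 68 + [2] * 54 + [1] * 497 + [2] * 181)

# Should match the values printed by BinaryLabelDatasetMetric above
print("Mean Difference = %f" % mean_difference(labels, groups))
print("Disparate Impact = %f" % disparate_impact(labels, groups))
```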


## Step 2: Train a classifier to predict credit using the original data

We will train a logistic regression model with a fixed set of hyperparameters to predict good/bad credit risk. Then, we'll see how well this basic model does on some fairness metrics.

### 2.1 Training and evaluating a logistic regression model 
First, we need to split our data into the explanatory variables (x) and the outcome variable (y). We will recode the outcome so that the values are 0 (= bad credit) and 1 (= good credit), which makes good credit the positive class for scikit-learn's classifier and metric functions.

In [7]:
x_train = train_orig_df.drop("credit", axis=1)
y_train = train_orig_df.credit.replace({2:0}) 

x_test = test_orig_df.drop("credit", axis=1)
y_test = test_orig_df.credit.replace({2:0})

print("Outcomes: ")
y_train.value_counts()

Outcomes: 


1.0    565
0.0    235
Name: credit, dtype: int64

Let's specify the logistic regression model:

In [8]:
# Set up the logistic regression model
initial_lr = LogisticRegression(C=0.5, penalty="l1", solver='liblinear')

Next, we can fit our model using the training data:

In [9]:
# your code here
initial_lr = initial_lr.fit(x_train, y_train, sample_weight=None)

Now that we have a trained model, we should evaluate it. For now, we'll look at the AUC as well as accuracy when we use a cutoff of 0.5 (that is, predicted probabilities above 0.5 are interpreted as good credit and all others as bad credit; scikit-learn's `predict()` applies this cutoff for us).

In [10]:
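# predict() already applies the 0.5 cutoff and returns hard 0/1 labels
y_pred = initial_lr.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
# Note: an AUC computed from hard labels reflects only that single cutoff
auc = roc_auc_score(y_test, y_pred)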

print("accuracy: ", accuracy)
print("AUC", auc)

accuracy:  0.725
AUC 0.6487179487179487
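Because `predict()` returns hard 0/1 class labels rather than probabilities, the AUC in the cell above is computed from labels, which reduces the ROC curve to a single operating point. A sketch of the more usual approach, scoring with `predict_proba()` instead; it uses a synthetic dataset from scikit-learn's `make_classification` as a stand-in for the credit data (an assumption for illustration), so its numbers will not match the output above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit data: any 0/1 target behaves the same way
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(C=0.5, penalty="l1", solver="liblinear").fit(X_tr, y_tr)

# Column 1 of predict_proba is P(y = 1), because model.classes_ is sorted as [0, 1]
y_prob = model.predict_proba(X_te)[:, 1]
y_label = (y_prob >= 0.5).astype(int)  # apply the 0.5 cutoff explicitly

accuracy = accuracy_score(y_te, y_label)
auc_from_scores = roc_auc_score(y_te, y_prob)               # uses the full ranking of scores
auc_from_labels = roc_auc_score(y_te, model.predict(X_te))  # single operating point

print("accuracy:", accuracy)
print("AUC (probabilities):", auc_from_scores)
print("AUC (hard labels):", auc_from_labels)
```

Accuracy needs a cutoff, but AUC is meant to summarize performance across all cutoffs, which is why it should be fed the probabilities.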


### 2.2 Evaluating bias in our predictions

Let's put our data back into an aif360 dataset format, so that we can use all of the fairness metrics provided by the package. For now, we'll evaluate bias on the training data. This mimics the development process we'd use in any real application.

First, we'll get predicted values using the model we just trained and use them to replace the `credit` column in a copy of the data frame. We'll use 0.5 as the threshold as before.

In [11]:
# Copy the dataset
train_preds_df = train_orig_df.copy()
# Calculate predicted values
train_preds_df['credit'] = initial_lr.predict(x_train)
# Recode the predictions so that they match the format that the dataset was originally provided in 
# (1 = good credit, 2 = bad credit)
train_preds_df['credit'] = train_preds_df.credit.replace({0:2})

Then we'll create an object of the aif360 StandardDataset class. You can read more about this in the documentation:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.datasets.StandardDataset.html

In [12]:
orig_aif360 = StandardDataset(train_orig_df,
                              label_name='credit',
                              protected_attribute_names=['age'], 
                              privileged_classes=[[1]], favorable_classes=[1])
preds_aif360 = StandardDataset(train_preds_df,
                               label_name='credit',
                               protected_attribute_names=['age'], 
                               privileged_classes=[[1]], favorable_classes=[1])

Now, let's calculate some fairness metrics for `orig_aif360` and `preds_aif360`. Calculate the mean difference and disparate impact below (again using `BinaryLabelDatasetMetric`):

In [13]:
# your code here
pred_metrics = BinaryLabelDatasetMetric(preds_aif360,
                                        unprivileged_groups=unprivileged_groups, 
                                        privileged_groups=privileged_groups)
print("Mean Difference = %f" % pred_metrics.mean_difference())
print("Disparate Impact = %f" % pred_metrics.disparate_impact())

Mean Difference = -0.302021
Disparate Impact = 0.638215


Recall from last week that we identified bias in the training data. We should therefore not find it surprising that we have bias in a model trained on that data.

Now, since we have true values and predicted values, let's compare the false negative rate (1 minus the true positive rate) and the false positive rate by group. This is similar to ProPublica's analysis of the COMPAS recidivism risk scores. We can use aif360's `ClassificationMetric` class to do this.

Note that aif360 is picky about what goes into `ClassificationMetric`: the dataset of true labels and the dataset of predictions must be identical except for their labels (or scores), which is the reason for all the inefficient copying of datasets above.
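These group metrics come from per-group confusion matrices. A NumPy sketch of the quantities printed below, assuming 0/1-coded labels with 1 as the favorable outcome (the aif360 datasets here use 1 = good and 2 = bad, so they would need recoding first); the function name and the toy arrays are hypothetical, not part of aif360:

```python
import numpy as np

def group_error_rates(y_true, y_pred, groups, group_value):
    """Error rate, FNR, and FPR for the rows whose group equals group_value (1 = positive)."""
    mask = groups == group_value
    t, p = y_true[mask], y_pred[mask]
    fn = np.sum((t == 1) & (p == 0))
    fp = np.sum((t == 0) & (p == 1))
    return {
        "error_rate": (fn + fp) / mask.sum(),
        "fnr": fn / np.sum(t == 1),  # false negatives / actual positives
        "fpr": fp / np.sum(t == 0),  # false positives / actual negatives
    }

# Toy data: group 1 = privileged, group 0 = unprivileged
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 1, 0])
groups = np.array([1, 1, 1, 1, 0, 0, 0, 0])

priv = group_error_rates(y_true, y_pred, groups, 1)
unpriv = group_error_rates(y_true, y_pred, groups, 0)

# Same conventions as the printed metrics: unprivileged minus / over privileged
print("Error rate difference =", unpriv["error_rate"] - priv["error_rate"])
print("False negative rate ratio =", unpriv["fnr"] / priv["fnr"])
```

If a group has a rate of zero in the denominator (as the privileged group's FPR does in this toy data), the corresponding ratio is undefined, which is worth keeping in mind when reading ratio metrics.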

In [14]:
orig_vs_preds_metrics = ClassificationMetric(orig_aif360, preds_aif360,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("\nError rate difference (unprivileged error rate - privileged error rate)= %f" % orig_vs_preds_metrics.error_rate_difference())

print("\nFalse negative rate for privileged groups = %f" % orig_vs_preds_metrics.false_negative_rate(privileged=True))
print("False negative rate for unprivileged groups = %f" % orig_vs_preds_metrics.false_negative_rate(privileged=False))
print("False negative rate ratio = %f" % orig_vs_preds_metrics.false_negative_rate_ratio())

print("\nFalse positive rate for privileged groups = %f" % orig_vs_preds_metrics.false_positive_rate(privileged=True))
print("False positive rate for unprivileged groups = %f" % orig_vs_preds_metrics.false_positive_rate(privileged=False))
print("False positive rate ratio = %f" % orig_vs_preds_metrics.false_positive_rate_ratio())


Error rate difference (unprivileged error rate - privileged error rate)= 0.095314

False negative rate for privileged groups = 0.072435
False negative rate for unprivileged groups = 0.294118
False negative rate ratio = 4.060458

False positive rate for privileged groups = 0.580110
False positive rate for unprivileged groups = 0.314815
False positive rate ratio = 0.542681


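This confirms it: our model is even *more* biased than the original credit labels. Applicants under 25 who are good credit risks are about four times as likely as older applicants to be wrongly classified as bad risks (false negative rate 0.294 vs. 0.072).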

Let's try to fix that.

## Step 3: Train a classifier to predict credit using the original data, excluding the sensitive feature

We've talked several times in class about how removing a sensitive attribute is not enough. Let's see whether that holds in practice.
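The usual reason is that other features act as proxies for the dropped attribute. A small synthetic sketch of the effect, with all data simulated: a hypothetical binary group membership `g` shifts both a `proxy` feature and the chance of a favorable label, and a model trained without `g` still produces very different favorable-prediction rates across groups because it learns from the proxy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical simulated data: g = 1 privileged, g = 0 unprivileged
g = rng.integers(0, 2, size=n)
proxy = g + rng.normal(0, 0.5, size=n)  # feature strongly correlated with g
noise = rng.normal(0, 1, size=n)        # feature unrelated to g
# Favorable label (1) is more likely for the privileged group
y = (rng.random(n) < np.where(g == 1, 0.8, 0.3)).astype(int)

# Train WITHOUT the group column: only the proxy and the noise feature
X_no_group = np.column_stack([proxy, noise])
model = LogisticRegression().fit(X_no_group, y)
y_hat = model.predict(X_no_group)

rate_unpriv = y_hat[g == 0].mean()
rate_priv = y_hat[g == 1].mean()
print("Correlation of proxy with g:", np.corrcoef(proxy, g)[0, 1])
print("Mean difference of predictions:", rate_unpriv - rate_priv)
print("Disparate impact of predictions:", rate_unpriv / rate_priv)
```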

In [15]:
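from sklearn.base import clone

x_train_noage = x_train.drop("age", axis=1)
x_test_noage  = x_test.drop("age", axis=1)

# Fit a fresh copy of the model: fit() modifies an estimator in place,
# so calling initial_lr.fit() here would overwrite the model from Step 2
lr_noage = clone(initial_lr).fit(x_train_noage,
                                 y_train,
                                 sample_weight=None)

y_pred_noage = lr_noage.predict(x_test_noage)

# predict() already applies the 0.5 cutoff and returns hard 0/1 labels
accuracy = accuracy_score(y_test, y_pred_noage)
auc = roc_auc_score(y_test, y_pred_noage)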

print("accuracy: ", accuracy)
print("AUC", auc)

accuracy:  0.74
AUC 0.6717948717948719


Note that the accuracy of our model is *slightly* better than before (0.74 vs. 0.725, a difference of three of the 200 test cases): excluding the sensitive feature did not cost us any accuracy here.

Now let's check the same bias metrics again. See the above code cell where we created `orig_vs_preds_metrics`. 

In [16]:
preds_df_noage = train_orig_df.copy()
preds_df_noage['credit'] = lr_noage.predict(x_train.drop('age', axis=1))
preds_df_noage['credit'] = preds_df_noage.credit.replace({0:2})

In [26]:
# your code here

noage_preds_aif360 = StandardDataset(preds_df_noage,
                                     label_name='credit',
                                     protected_attribute_names=['age'],
                                     privileged_classes=[[1]], favorable_classes=[1])

noage_preds_metrics = BinaryLabelDatasetMetric(noage_preds_aif360,
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)

print("Mean Difference = %f" % noage_preds_metrics.mean_difference())
print("Disparate Impact = %f" % noage_preds_metrics.disparate_impact())


orig_vs_noage_preds_metrics = ClassificationMetric(orig_aif360,
                                                   noage_preds_aif360,
                                                   unprivileged_groups=unprivileged_groups,
                                                   privileged_groups=privileged_groups)

print("Error rate difference (unprivileged error rate - privileged error rate) = %f" %orig_vs_noage_preds_metrics.error_rate_difference())
print()

print("False negative rate for privileged groups = %f" %orig_vs_noage_preds_metrics.false_negative_rate(privileged=True))
print("False negative rate for unprivileged groups = %f" %orig_vs_noage_preds_metrics.false_negative_rate(privileged=False))
print("False negative ratio = %f" %orig_vs_noage_preds_metrics.false_negative_rate_ratio())

print()
print("False positive rate for privileged groups = %f" %orig_vs_noage_preds_metrics.false_positive_rate(privileged=True))
print("False positive rate for unprivileged groups = %f" %orig_vs_noage_preds_metrics.false_positive_rate(privileged=False))
print("False positive ratio = %f" %orig_vs_noage_preds_metrics.false_positive_rate_ratio())

Mean Difference = -0.146453
Disparate Impact = 0.821090
Error rate difference (unprivileged error rate - privileged error rate) = 0.157116

False negative rate for privileged groups = 0.080483
False negative rate for unprivileged groups = 0.220588
False negative ratio = 2.740809

False positive rate for privileged groups = 0.541436
False positive rate for unprivileged groups = 0.537037
False positive ratio = 0.991875


Scroll up -- how do these numbers for our model that doesn't use age compare to the model that *does* use it?

**Write your comparison in this text cell:**
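The model that doesn't use age has a disparate impact closer to 1 (0.821 vs. 0.638) and a smaller mean difference (-0.146 vs. -0.302), so removing age has mitigated bias to some extent. The false positive rates are now nearly equal (ratio 0.99 vs. 0.54), and the false negative rate ratio fell from 4.06 to 2.74. However, the error rate difference grew (0.157 vs. 0.095), and young applicants who are good credit risks are still almost three times as likely to be misclassified as bad risks, so removing the sensitive attribute alone was not enough.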

## Step 4: Preprocess the data using the reweighing algorithm, then train a classifier to predict credit using the re-weighted data

In [27]:
# Fit the weights to our training data
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
RW_fit = RW.fit(train_orig)

In [28]:
# Pull the actual values of the weights for the training data
train_reweighed = RW_fit.transform(train_orig)
training_weights = train_reweighed.instance_weights
training_weights[:10]

array([0.96345573, 0.96345573, 0.96345573, 0.96345573, 1.1003453 ,
       0.96345573, 0.96345573, 1.1003453 , 0.66365741, 1.1003453 ])
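Reweighing (Kamiran and Calders) gives every training example in a given (group, label) cell the weight P(group) · P(label) / P(group, label), so that group and label look statistically independent in the weighted data. A sketch that computes the four weights from counts; the function name is hypothetical. The counts (122 unprivileged applicants, 68 of them good credit; 678 privileged applicants, 497 of them good credit) are inferred from the training-set mean difference, disparate impact, and label counts printed earlier rather than read from the data, but they reproduce the weights printed above.

```python
def reweighing_weights(counts):
    """counts maps (group, label) -> n; returns (group, label) -> P(group)P(label)/P(group, label)."""
    n = sum(counts.values())
    groups = {g for g, _ in counts}
    labels = {y for _, y in counts}
    n_group = {g: sum(counts[(g, y)] for y in labels) for g in groups}
    n_label = {y: sum(counts[(g, y)] for g in groups) for y in labels}
    # (n_group/n) * (n_label/n) / (n_gy/n) simplifies to n_group * n_label / (n * n_gy)
    return {
        (g, y): n_group[g] * n_label[y] / (n * counts[(g, y)])
        for (g, y) in counts
    }

# Inferred training-set counts: group 1 = age >= 25, label 1 = good credit, 2 = bad credit
counts = {(1, 1): 497, (1, 2): 181, (0, 1): 68, (0, 2): 54}
weights = reweighing_weights(counts)
for key in sorted(weights):
    print(key, round(weights[key], 8))
```

Cells that are under-represented relative to independence (such as young applicants with good credit) get weights above 1, and over-represented cells get weights below 1.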

In [29]:
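from sklearn.base import clone

# Fit a fresh copy of the model on the no-age features, weighting each
# training example by its reweighing weight (fit() would otherwise modify
# the shared estimator in place)
lr_weights = clone(initial_lr).fit(x_train_noage,
                                   y_train,
                                   sample_weight=training_weights)

y_pred_weights = lr_weights.predict(x_test_noage)

accuracy = accuracy_score(y_test, y_pred_weights)
auc = roc_auc_score(y_test, y_pred_weights)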

print("accuracy: ", accuracy)
print("AUC", auc)

accuracy:  0.755
AUC 0.6868945868945869


Our accuracy and AUC are slightly higher again. Note that, like the model in Step 3, this model excludes age as a feature. Following the process above, let's see whether the fairness metrics changed.

In [30]:
train_preds_df_weights = train_orig_df.copy()
train_preds_df_weights['credit'] = lr_weights.predict(x_train.drop('age', axis=1))
train_preds_df_weights['credit'] = train_preds_df_weights.credit.replace({0:2})

In [31]:
# your code here

preds_weights_aif360 = StandardDataset(train_preds_df_weights,
                                       label_name='credit',
                                       protected_attribute_names=['age'],
                                       privileged_classes=[[1]], favorable_classes=[1])

preds_weights_metrics = BinaryLabelDatasetMetric(preds_weights_aif360,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

print("Mean Difference = %f" % preds_weights_metrics.mean_difference())
print("Disparate Impact = %f" % preds_weights_metrics.disparate_impact())


orig_vs_preds_weights_metrics = ClassificationMetric(orig_aif360,
                                                     preds_weights_aif360,
                                                     unprivileged_groups=unprivileged_groups,
                                                     privileged_groups=privileged_groups)

print("Error rate difference (unprivileged error rate - privileged error rate) = %f" %orig_vs_preds_weights_metrics.error_rate_difference())
print()

print("False negative rate for privileged groups = %f" %orig_vs_preds_weights_metrics.false_negative_rate(privileged=True))
print("False negative rate for unprivileged groups = %f" %orig_vs_preds_weights_metrics.false_negative_rate(privileged=False))
print("False negative ratio = %f" %orig_vs_preds_weights_metrics.false_negative_rate_ratio())

print()
print("False positive rate for privileged groups = %f" %orig_vs_preds_weights_metrics.false_positive_rate(privileged=True))
print("False positive rate for unprivileged groups = %f" %orig_vs_preds_weights_metrics.false_positive_rate(privileged=False))
print("False positive ratio = %f" %orig_vs_preds_weights_metrics.false_positive_rate_ratio())


Mean Difference = -0.062189
Disparate Impact = 0.924573
Error rate difference (unprivileged error rate - privileged error rate) = 0.151869

False negative rate for privileged groups = 0.074447
False negative rate for unprivileged groups = 0.132353
False negative ratio = 1.777822

False positive rate for privileged groups = 0.546961
False positive rate for unprivileged groups = 0.629630
False positive ratio = 1.151141


How do these numbers compare to the numbers above?
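Disparate impact is now 0.925, closer to 1 than for the no-age model (0.821) or the original model (0.638), and the mean difference has shrunk to -0.062, so by these outcome-based measures we have mitigated bias further. The false negative rate ratio also fell (1.78 vs. 2.74). However, the false positive rates have moved apart again (ratio 1.15, now higher for the unprivileged group, compared with 0.99 for the no-age model), and the error rate difference (0.152) is about the same as for the no-age model (0.157).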

## Step 5: Post-process the predictions from the reweighed model using the equality of odds algorithm

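The equality of odds (equalized odds) algorithm adjusts a classifier's predictions so that the privileged and unprivileged groups have equal false negative rates *and* equal false positive rates. (Equal false negative rates also mean equal true positive rates, since the true positive rate is 1 minus the false negative rate.) For a classifier that outputs predicted probabilities, the algorithm determines *two* threshold probabilities for each group. Above the upper threshold, all members of the group are assigned to the positive class, and below the lower threshold, all members of the group are assigned to the negative class. Between the two thresholds, individuals are randomly assigned a class. The aif360 implementation used below, `EqOddsPostprocessing`, starts from predicted labels instead: it solves a linear program to find the probabilities with which to flip each group's predicted labels.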
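The two-threshold rule for one group can be sketched as follows. The function name, threshold values, and mixing probability are hypothetical placeholders: in Hardt et al.'s method they are chosen separately for each group so that both groups end up with the same true and false positive rates, and that fitting step is not shown here.

```python
import numpy as np

def two_threshold_predict(scores, lower, upper, p_middle, rng):
    """Randomized rule for one group: 1 above `upper`, 0 below `lower`,
    and 1 with probability `p_middle` for scores in [lower, upper]."""
    scores = np.asarray(scores, dtype=float)
    preds = (scores > upper).astype(int)
    middle = (scores >= lower) & (scores <= upper)
    preds[middle] = (rng.random(middle.sum()) < p_middle).astype(int)
    return preds

# Hypothetical thresholds for one group; a fitted method would choose them per group
rng = np.random.default_rng(47)
scores = np.array([0.05, 0.30, 0.45, 0.55, 0.70, 0.95])
preds = two_threshold_predict(scores, lower=0.4, upper=0.6, p_middle=0.5, rng=rng)
print(preds)  # first two always 0, last two always 1, middle two random
```

The randomization in the middle band is what lets each group hit a target pair of error rates that no single deterministic threshold could reach.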

For details, see M. Hardt, E. Price, and N. Srebro, “Equality of Opportunity in Supervised Learning,” Conference on Neural Information Processing Systems, 2016.

Which definitions of fairness does this post-processing algorithm contradict?

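It contradicts statistical (demographic) parity: when the groups have different base rates of good credit, as they do here, equal true and false positive rates give equal rates of favorable predictions only if the two rates are equal to each other, that is, only if the classifier is uninformative. Because it uses group membership at decision time, it also conflicts with fairness through unawareness (the approach from Step 3), and because it assigns some individuals randomly, two applicants with identical features can receive different decisions, which conflicts with individual fairness.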

In [32]:
# Transform our predictions using the aif360 implementation of the equality of odds algorithm
eq_odds = EqOddsPostprocessing(unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups, seed=47)
preds_weights_eq_odds_aif360 = eq_odds.fit_predict(orig_aif360, preds_weights_aif360)



Again, calculate fairness metrics:



In [33]:
# Calculate fairness metrics
preds_weights_eq_odds_metrics = BinaryLabelDatasetMetric(
    preds_weights_eq_odds_aif360,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

orig_vs_preds_weights_eq_odds_metrics = ClassificationMetric(
    orig_aif360,
    preds_weights_eq_odds_aif360,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

print("Mean difference = %f" % preds_weights_eq_odds_metrics.mean_difference())
print("Disparate Impact = %f" % preds_weights_eq_odds_metrics.disparate_impact())

print("\nError rate difference (unprivileged error rate - privileged error rate)= %f" % orig_vs_preds_weights_eq_odds_metrics.error_rate_difference())

print("\nFalse negative rate for privileged groups = %f" % orig_vs_preds_weights_eq_odds_metrics.false_negative_rate(privileged=True))
print("False negative rate for unprivileged groups = %f" % orig_vs_preds_weights_eq_odds_metrics.false_negative_rate(privileged=False))
print("False negative rate ratio = %f" % orig_vs_preds_weights_eq_odds_metrics.false_negative_rate_ratio())

print("\nFalse positive rate for privileged groups = %f" % orig_vs_preds_weights_eq_odds_metrics.false_positive_rate(privileged=True))
print("False positive rate for unprivileged groups = %f" % orig_vs_preds_weights_eq_odds_metrics.false_positive_rate(privileged=False))
print("False positive rate ratio = %f" % orig_vs_preds_weights_eq_odds_metrics.false_positive_rate_ratio())


Mean difference = 0.006722
Disparate Impact = 5.557377

Error rate difference (unprivileged error rate - privileged error rate)= -0.165990

False negative rate for privileged groups = 0.997988
False negative rate for unprivileged groups = 1.000000
False negative rate ratio = 1.002016

False positive rate for privileged groups = 0.000000
False positive rate for unprivileged groups = 0.018519
False positive rate ratio = inf




What's changed in these metrics? How could the algorithm have caused that? 

In [34]:
# Test how accuracy has changed
print("\nAccuracy (on training data) before equality of odds algorithm = %f" % orig_vs_preds_weights_metrics.accuracy())
print("\nAccuracy (on training data) after equality of odds algorithm = %f" % orig_vs_preds_weights_eq_odds_metrics.accuracy())


Accuracy (on training data) before equality of odds algorithm = 0.776250

Accuracy (on training data) after equality of odds algorithm = 0.293750


**Write your answer in this text cell:**
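The mean difference is now close to zero (0.0067) and the false negative rates are almost equal (0.998 vs. 1.000), but only because the post-processed model almost never predicts good credit: in both groups the false negative rate is essentially 1 and the false positive rate essentially 0. The disparate impact of 5.56 and the infinite false positive rate ratio are artifacts of dividing by rates that are at or near zero, not signs of improvement. Accuracy on the training data dropped from 0.776 to 0.294, which is exactly the share of bad-credit applicants in the training set (235/800 = 0.29375). The algorithm can cause this because it randomly reassigns predicted labels within each group so that the groups' error rates match, and the trivial rule that labels every applicant as a bad credit risk satisfies equality of odds exactly (false negative rate 1 and false positive rate 0 in every group). Here the post-processing ended up almost at that rule, trading away nearly all of the model's usefulness for equal error rates.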
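The accuracy printed above after post-processing (0.293750) can be checked against a trivial rule with simple arithmetic. A rule that labels every applicant as a bad credit risk has a false negative rate of 1 and a false positive rate of 0 within any group, so it satisfies equality of odds exactly, and its accuracy equals the share of bad-credit labels. A sketch using the training-set label counts printed in Step 2 (565 good, 235 bad):

```python
import numpy as np

# Training labels from Step 2: 565 good credit (1) and 235 bad credit (0)
y_true = np.array([1] * 565 + [0] * 235)
y_all_negative = np.zeros_like(y_true)  # predict "bad credit" for everyone

accuracy = (y_true == y_all_negative).mean()
fnr = np.mean(y_all_negative[y_true == 1] == 0)  # share of good risks rejected
fpr = np.mean(y_all_negative[y_true == 0] == 1)  # share of bad risks accepted
print("accuracy =", accuracy, "FNR =", fnr, "FPR =", fpr)
```

Because these rates do not depend on which applicants fall in which group, the same values hold for the privileged and unprivileged groups separately.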
