# Debugging Biosbias
- **Task**: For Biosbias, the task is to predict the occupation described in a given bio paragraph, i.e., whether the person is a surgeon (class 0) or a nurse (class 1).
- **Problem**: Because of the gender imbalance within each occupation, a classifier tends to exploit gender information when making predictions. As a result, bios of female surgeons and male nurses are often misclassified. We quantify the model's bias using two metrics, the False Positive Equality Difference (**FPED**) and the False Negative Equality Difference (**FNED**); for details, see [Dixon et al., 2018](https://dl.acm.org/doi/pdf/10.1145/3278721.3278729).
- **Solution**: To reduce the model's bias, we use our framework to identify the features that detect gender information rather than occupation, and then disable them.
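
Following Dixon et al. (2018), FPED sums, over groups t, the absolute difference between the overall false positive rate and the false positive rate within group t; FNED does the same for false negative rates. The snippet below is a minimal NumPy sketch of the two metrics, assuming the groups are the genders in `gender_test` and that class 1 (nurse) is treated as the positive class. It is not the implementation inside `find`, which is not shown here.

```python
import numpy as np

def error_rates(y_true, y_pred, positive=1):
    """Return (false positive rate, false negative rate) for one set of predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_pos = y_true == positive
    pred_pos = y_pred == positive
    fpr = np.sum(pred_pos & ~is_pos) / max(np.sum(~is_pos), 1)  # FP / (FP + TN)
    fnr = np.sum(~pred_pos & is_pos) / max(np.sum(is_pos), 1)   # FN / (FN + TP)
    return float(fpr), float(fnr)

def fped_fned(y_true, y_pred, groups, positive=1):
    """FPED = sum_t |FPR - FPR_t| and FNED = sum_t |FNR - FNR_t| over groups t."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    fpr_all, fnr_all = error_rates(y_true, y_pred, positive)
    fped = fned = 0.0
    for g in np.unique(groups):
        in_group = groups == g
        fpr_g, fnr_g = error_rates(y_true[in_group], y_pred[in_group], positive)
        fped += abs(fpr_all - fpr_g)
        fned += abs(fnr_all - fnr_g)
    return fped, fned

# Toy example: labels (1 = nurse, assumed positive), predictions and genders.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
gender = ['F', 'M', 'F', 'M', 'F', 'M']
print(fped_fned(y_true, y_pred, gender))
```

Both metrics are 0 when every gender group has the same error rates as the whole test set, so lower values indicate less gender bias.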

In [1]:
# Notebook setup
%matplotlib inline

import pickle
import os
import datetime
import random
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
sess = InteractiveSession(config=config)

plt.rcParams['figure.figsize'] = [14, 7]
os.environ['PYTHONHASHSEED'] = '0' # Affects only Python processes started after this line, not the running kernel

# Set random seeds to make results reproducible
the_seed = 1234
np.random.seed(the_seed)
random.seed(the_seed)
from keras import backend as K
tf.set_random_seed(the_seed)
K.set_session(sess)

Using TensorFlow backend.


In [2]:
import find

## Settings

- GloVe word embeddings: Replace the string on the second line of the next cell with the path to your GloVe embeddings file, which can be downloaded [here](http://nlp.stanford.edu/data/glove.6B.zip).

In [3]:
EMBEDDING_DIM = 300
EMBEDDING_PATH = f"GLoVe/glove.6B.{EMBEDDING_DIM}d.txt" # Path to your glove embeddings
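
A GloVe text file has one word per line, followed by its vector components separated by spaces. The following is a hypothetical sketch of how an embedding matrix like the one returned by `find.get_embedding_matrix` could be built; the real function is not shown. The model summary later reports 120,000,600 embedding parameters, i.e. 400,002 × 300, which suggests two rows beyond the 400,000 GloVe words. Here we assume they are a padding row at index 0 and an unknown-word row at the end; the names `build_embedding_matrix`, `<PAD>` and `<UNK>` are illustrative.

```python
import io
import numpy as np

def build_embedding_matrix(lines, embedding_dim, pad_initialisation="zeros", seed=0):
    """Hypothetical sketch: build (matrix, vocab_size, index2word, word2index).

    Assumed layout: row 0 = padding, rows 1..N = GloVe words, row N+1 = unknown word.
    """
    rng = np.random.default_rng(seed)
    words, vectors = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != embedding_dim + 1:
            continue  # skip malformed lines
        words.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))
    index2word = ["<PAD>"] + words + ["<UNK>"]
    word2index = {word: idx for idx, word in enumerate(index2word)}
    vocab_size = len(index2word)
    matrix = np.zeros((vocab_size, embedding_dim), dtype=np.float32)
    if vectors:
        matrix[1:len(words) + 1] = np.stack(vectors)
    if pad_initialisation != "zeros":  # assumed alternative: small random padding vector
        matrix[0] = rng.normal(scale=0.1, size=embedding_dim)
    matrix[-1] = rng.normal(scale=0.1, size=embedding_dim)  # assumed random UNK vector
    return matrix, vocab_size, index2word, word2index

# Usage with a tiny in-memory "GloVe file" of 3-dimensional vectors.
fake_glove = io.StringIO("the 0.1 0.2 0.3\nsurgeon 0.4 0.5 0.6\n")
toy_matrix, toy_vocab_size, toy_index2word, toy_word2index = build_embedding_matrix(fake_glove, 3)
print(toy_vocab_size, toy_word2index["surgeon"], toy_matrix.shape)  # 4 2 (4, 3)
```

To read the real file, pass an open file handle instead, e.g. `with open(EMBEDDING_PATH, encoding="utf-8") as f: build_embedding_matrix(f, EMBEDDING_DIM)`.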

- Dataset

In [4]:
DATA_PATH = 'preprocessed_data/'
MAIN_DATASET = 'Biosbias2'
SECOND_DATASET = None
THIRD_DATASET = None
GENDER_BIAS = True

- Model

In [5]:
MODEL_PATH = 'trained_models/'
MODEL_ARCH = 'CNN'
MAXLEN = 150
FILTERS = [(10, 2), (10, 3), (10, 4)] # Ten filters of each window size [2,3,4]
BATCH_SIZE = 128
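
Each `(n, k)` pair in `FILTERS` gives `n` filters with window size `k`. With no padding and stride 1 (which the output shapes in the model summary are consistent with), a window of size `k` over `MAXLEN` tokens yields `MAXLEN - k + 1` positions, and each filter has `k * EMBEDDING_DIM` weights plus one bias. The arithmetic sketch below reproduces those shapes and parameter counts. Counting one feature per filter gives the 30 features listed later; how `find.num_all_filters` computes this is assumed, not shown.

```python
def conv1d_summary(filters, maxlen, embedding_dim):
    """Output length and parameter count of each Conv1D layer (no padding, stride 1)."""
    rows = []
    for n_filters, window in filters:
        out_len = maxlen - window + 1                      # number of window positions
        params = (window * embedding_dim + 1) * n_filters  # weights + one bias per filter
        rows.append((window, out_len, n_filters, params))
    return rows

# Same values as FILTERS, MAXLEN and EMBEDDING_DIM above.
toy_filters = [(10, 2), (10, 3), (10, 4)]
for window, out_len, n_filters, params in conv1d_summary(toy_filters, maxlen=150, embedding_dim=300):
    print(f"window {window}: output (None, {out_len}, {n_filters}), {params} params")

# After global max pooling, each filter contributes one feature.
num_features = sum(n_filters for n_filters, _ in toy_filters)
print("features:", num_features)  # features: 30
```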

## Model creation and training

In [6]:
# 0. Load GloVe embeddings
embedding_matrix, vocab_size, index2word, word2index = find.get_embedding_matrix(EMBEDDING_PATH, EMBEDDING_DIM, pad_initialisation = "zeros")

Loading Glove Model


400000it [00:28, 13845.05it/s]


Done. 400000  words loaded!


In [7]:
# 1. Load datasets and prepare inputs
# 1.1 Main dataset
data_1 = pickle.load(open(DATA_PATH + f'all_data_{MAIN_DATASET}.pickle', 'rb'))
class_names = data_1['class_names']
X_train_1, X_validate_1, X_test_1 = find.get_data_matrix(data_1['text_train'], word2index, MAXLEN), \
                                    find.get_data_matrix(data_1['text_validate'], word2index, MAXLEN), \
                                    find.get_data_matrix(data_1['text_test'], word2index, MAXLEN)
y_test_1 = data_1['y_test']
gender_test_1 = data_1['gender_test'] if GENDER_BIAS else None

# 1.2 Second dataset
if SECOND_DATASET is not None:
    data_2 = pickle.load(open(DATA_PATH + f'all_data_{SECOND_DATASET}.pickle', 'rb'))
    X_test_2, y_test_2 = find.get_data_matrix(data_2['text_test'], word2index, MAXLEN), data_2['y_test']
    gender_test_2 = data_2['gender_test'] if GENDER_BIAS else None
else:
    X_test_2, y_test_2, gender_test_2 = None, None, None

# 1.3 Third dataset
if THIRD_DATASET is not None:
    data_3 = pickle.load(open(DATA_PATH + f'all_data_{THIRD_DATASET}.pickle', 'rb'))
    X_test_3, y_test_3 = find.get_data_matrix(data_3['text_test'], word2index, MAXLEN), data_3['y_test']
    gender_test_3 = data_3['gender_test'] if GENDER_BIAS else None
else:
    X_test_3, y_test_3, gender_test_3 = None, None, None

100%|██████████| 3832/3832 [00:01<00:00, 3520.74it/s]
100%|██████████| 1277/1277 [00:00<00:00, 3828.27it/s]
100%|██████████| 1278/1278 [00:00<00:00, 4087.22it/s]
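
`find.get_data_matrix` turns each bio into a fixed-length row of word indices. Its implementation is not shown; the sketch below is one plausible version, assuming whitespace tokenisation, lowercasing (the GloVe 6B vocabulary is lowercase), truncation and padding at the end of the sequence, padding index 0, and an optional unknown-word index. The function name `texts_to_index_matrix` is hypothetical.

```python
import numpy as np

def texts_to_index_matrix(texts, word2index, maxlen, pad_index=0, unk_index=None):
    """Hypothetical sketch: map texts to a (len(texts), maxlen) matrix of word indices.

    Unknown words map to `unk_index`, or are dropped if it is None.
    """
    matrix = np.full((len(texts), maxlen), pad_index, dtype=np.int64)
    for row, text in enumerate(texts):
        ids = []
        for token in text.lower().split():
            idx = word2index.get(token, unk_index)
            if idx is not None:
                ids.append(idx)
        ids = ids[:maxlen]                   # truncate long bios
        if ids:
            matrix[row, :len(ids)] = ids     # remaining positions stay as padding
    return matrix

toy_vocab = {"<PAD>": 0, "she": 1, "is": 2, "a": 3, "surgeon": 4, "<UNK>": 5}
toy_X = texts_to_index_matrix(["She is a surgeon", "He is a nurse"], toy_vocab, maxlen=5, unk_index=5)
print(toy_X)
```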


In [8]:
# 2. Create the result directory
if not os.path.exists(MODEL_PATH):
    os.makedirs(MODEL_PATH)
result_folder = MAIN_DATASET + '_' + MODEL_ARCH + '_' + datetime.datetime.now().strftime("%Y%m%d%H%M%S") + '/'
result_path = MODEL_PATH + result_folder
os.mkdir(result_path)

In [9]:
# 3. Create a model
if MODEL_ARCH == 'CNN':
    model = find.get_CNN_model(vocab_size, EMBEDDING_DIM, embedding_matrix, MAXLEN, class_names, FILTERS)
else:
    raise ValueError(f"Unsupported model architecture: {MODEL_ARCH}")

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 150, 300)     120000600   input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 149, 10)      6010        embedding_1[0][0]                
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 148, 10)      9010        embedding_1[0][0]                
__________________________________________________________________________________________________
c
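
Each of the 30 features is the maximum, over all positions in the bio, of one convolutional filter's response to an n-gram window. The NumPy sketch below shows that computation for a single document, assuming global max pooling (the truncated summaries hint at a `global_max...` layer) and a ReLU activation, which is not visible in the truncated output. It also records which window produced each maximum, which is what makes the features interpretable as n-gram detectors.

```python
import numpy as np

def conv_max_features(embedded, kernels, biases):
    """Max-pooled Conv1D features for one document (ReLU activation assumed).

    embedded: (seq_len, dim) word vectors of the document
    kernels:  list of (window, dim, n_filters) weight arrays, one per window size
    biases:   list of (n_filters,) bias arrays
    Returns the concatenated feature vector and, for each feature, the start
    position of the n-gram window that produced its maximum.
    """
    features, positions = [], []
    seq_len = embedded.shape[0]
    for kernel, bias in zip(kernels, biases):
        window = kernel.shape[0]
        # All windows of `window` consecutive word vectors: (n_positions, window, dim)
        windows = np.stack([embedded[i:i + window] for i in range(seq_len - window + 1)])
        # Filter response at every position: (n_positions, n_filters)
        acts = np.maximum(np.einsum("pwd,wdf->pf", windows, kernel) + bias, 0.0)
        features.append(acts.max(axis=0))      # global max pooling over positions
        positions.append(acts.argmax(axis=0))  # which n-gram won
    return np.concatenate(features), np.concatenate(positions)

rng = np.random.default_rng(0)
toy_embedded = rng.normal(size=(150, 300))  # one bio: MAXLEN x EMBEDDING_DIM
toy_kernels = [rng.normal(size=(w, 300, 10)) for w in (2, 3, 4)]
toy_biases = [np.zeros(10) for _ in range(3)]
toy_features, toy_positions = conv_max_features(toy_embedded, toy_kernels, toy_biases)
print(toy_features.shape)  # (30,)
```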

In [10]:
# 4. Train the model
history = find.model_train(model, result_path + f'trained_{MODEL_ARCH}.h5', X_train_1, data_1['y_train'], X_validate_1, data_1['y_validate'], BATCH_SIZE, epochs = 300)

Train on 3832 samples, validate on 1277 samples
Epoch 1/300
 - 1s - loss: 0.6753 - acc: 0.6931 - val_loss: 0.2848 - val_acc: 0.8998

Epoch 00001: val_loss improved from inf to 0.28477, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN.h5
Epoch 2/300
 - 0s - loss: 0.1922 - acc: 0.9324 - val_loss: 0.1528 - val_acc: 0.9475

Epoch 00002: val_loss improved from 0.28477 to 0.15276, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN.h5
Epoch 3/300
 - 0s - loss: 0.1255 - acc: 0.9564 - val_loss: 0.1264 - val_acc: 0.9601

Epoch 00003: val_loss improved from 0.15276 to 0.12636, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN.h5
Epoch 4/300
 - 0s - loss: 0.1023 - acc: 0.9650 - val_loss: 0.1165 - val_acc: 0.9577

Epoch 00004: val_loss improved from 0.12636 to 0.11650, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CN
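
The log lines above look like the output of Keras's `ModelCheckpoint` callback with `save_best_only=True`: the model is saved only when `val_loss` improves on the best value so far. Whether `find.model_train` also stops before the 300 epochs is not visible in the truncated log. The pure-Python sketch below replays the checkpoint rule on the logged validation losses and, as an assumption, adds Keras-style early stopping with a `patience` parameter; `replay_checkpoints` is a hypothetical name.

```python
import math

def replay_checkpoints(val_losses, patience=None):
    """Return (epochs at which a checkpoint is saved, last epoch run).

    A checkpoint is saved whenever val_loss beats the best value so far.
    Early stopping after `patience` epochs without improvement is an
    assumption added for illustration (patience=None disables it).
    """
    best = math.inf
    saved, epochs_without_improvement = [], 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
            saved.append(epoch)
        else:
            epochs_without_improvement += 1
            if patience is not None and epochs_without_improvement >= patience:
                return saved, epoch
    return saved, len(val_losses)

# First four validation losses from the training log of the original model.
print(replay_checkpoints([0.28477, 0.15276, 0.12636, 0.11650]))
# Hypothetical losses that stop improving after epoch 2.
print(replay_checkpoints([0.30, 0.20, 0.25, 0.22, 0.21], patience=3))
```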

In [11]:
# 5. Evaluate the model
if not GENDER_BIAS:
    find.evaluate_all(model, class_names, BATCH_SIZE, X_test_1, y_test_1, X_test_2, y_test_2, X_test_3, y_test_3, result_path = result_path, model_name = 'original')
else:
    find.evaluate_all_gender(model, class_names, BATCH_SIZE, X_test_1, y_test_1, gender_test_1, X_test_2, y_test_2, gender_test_2, result_path = result_path, model_name = 'original')

Evaluate with the original test set:
{'per_class': {0: {'all_positive': 722,
                   'all_true': 731,
                   'class_f1': 0.9607708189951822,
                   'class_name': 'surgeon',
                   'class_precision': 0.9667590027700831,
                   'class_recall': 0.9548563611491108,
                   'true_positive': 698},
               1: {'all_positive': 556,
                   'all_true': 547,
                   'class_f1': 0.9483227561196738,
                   'class_name': 'nurse',
                   'class_precision': 0.9406474820143885,
                   'class_recall': 0.9561243144424132,
                   'true_positive': 523}},
 'total': {'accuracy': 0.9553990610328639,
           'macro_f1': 0.9545959536911285,
           'macro_precision': 0.9537032423922358,
           'macro_recall': 0.955490337795762,
           'micro_f1': 0.9553990610328639,
           'micro_precision': 0.9553990610328639,
           'micro_recall': 0.95539906
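
The per-class numbers above follow directly from the counts: precision is `true_positive / all_positive`, recall is `true_positive / all_true`, and F1 is their harmonic mean. The check below recomputes them from the printed counts. It also suggests how `macro_f1` is defined here: the printed value matches the harmonic mean of `macro_precision` and `macro_recall`, not the arithmetic mean of the per-class F1 scores (about 0.95455). Micro precision, recall and F1 all equal accuracy, as expected for single-label classification.

```python
def class_metrics(true_positive, all_positive, all_true):
    """Precision, recall and F1 of one class from its confusion counts."""
    precision = true_positive / all_positive  # correct among predicted as this class
    recall = true_positive / all_true         # correct among truly this class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts copied from the evaluation output of the original model.
surgeon = class_metrics(true_positive=698, all_positive=722, all_true=731)
nurse = class_metrics(true_positive=523, all_positive=556, all_true=547)

accuracy = (698 + 523) / (731 + 547)
macro_precision = (surgeon[0] + nurse[0]) / 2
macro_recall = (surgeon[1] + nurse[1]) / 2
# Two common definitions of macro F1:
f1_of_macro_pr = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)
mean_of_class_f1 = (surgeon[2] + nurse[2]) / 2

print(f"accuracy {accuracy:.4f}, F1 of macro P/R {f1_of_macro_pr:.4f}, mean class F1 {mean_of_class_f1:.4f}")
```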

## Model understanding and debugging

In [12]:
# 6. Generate wordclouds
settings = {
    'model_arch': MODEL_ARCH,
    'filters': FILTERS,
    'maxlen': MAXLEN,
    'result_path': result_path,
    'index2word': index2word,
    'embedding_dim': EMBEDDING_DIM,
    'batch_size': BATCH_SIZE
}
all_wordclouds = find.generate_wordclouds(model, X_train_1, settings, max_examples = 2000)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
embedded_text_input (InputLayer (None, 150, 300)     0                                            
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 149, 10)      6010        embedded_text_input[0][0]        
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 148, 10)      9010        embedded_text_input[0][0]        
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, 147, 10)      12010       embedded_text_input[0][0]        
__________________________________________________________________________________________________
global_max

100%|██████████| 16/16 [00:00<00:00, 35.68it/s]
100%|██████████| 30/30 [00:08<00:00,  3.36it/s]
100%|██████████| 30/30 [00:05<00:00,  5.39it/s]
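
`find.generate_wordclouds` summarises each feature by the words that activate it most. Its internals are not shown; one plausible approach, sketched below, is to take, for every training example, the n-gram window that produced the feature's max-pooled value and add that value to the n-gram's score. The resulting `{ngram: score}` dictionaries could then be rendered with the `wordcloud` package's `WordCloud.generate_from_frequencies`. The function `ngram_scores`, its input format, and the assumption that features are ordered as in `FILTERS` (ten 2-gram, ten 3-gram, then ten 4-gram filters) are all hypothetical.

```python
from collections import defaultdict

def ngram_scores(docs, windows_per_feature):
    """Accumulate, per feature, how strongly each n-gram activated it.

    docs: iterable of (tokens, feature_values, argmax_positions) per document,
          where feature_values[j] is feature j's max-pooled activation and
          argmax_positions[j] is the start index of the window that produced it.
    windows_per_feature: window size of each feature, e.g. [2]*10 + [3]*10 + [4]*10.
    Returns a list (one per feature) of {ngram: summed activation} dicts.
    """
    scores = [defaultdict(float) for _ in windows_per_feature]
    for tokens, values, positions in docs:
        for j, window in enumerate(windows_per_feature):
            if values[j] <= 0:
                continue  # feature did not fire on this document
            start = positions[j]
            ngram = " ".join(tokens[start:start + window])
            scores[j][ngram] += values[j]
    return [dict(s) for s in scores]

toy_docs = [
    (["she", "is", "a", "surgeon"], [0.9, 0.0], [0, 1]),
    (["she", "works", "as", "a", "nurse"], [0.5, 0.7], [0, 3]),
]
toy_scores = ngram_scores(toy_docs, [2, 2])
print(toy_scores)
```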


- Get input from a human

In [13]:
is_feature_enabled = [True] * find.num_all_filters(FILTERS) # One flag per CNN feature; all start enabled

In [14]:
# UI components from ipywidgets
import ipywidgets as wgt

def update_screen(feature_idx):
    """Show the word cloud and the final-layer weights of the selected feature."""
    show_action_panel(feature_idx)
    wordcloud = all_wordclouds[feature_idx]
    f, ax = plt.subplots(figsize=(14, 7))
    ax.imshow(wordcloud, interpolation='bilinear')
    ax.axis("off")
    
    W = model.layers[-1].get_weights()[0] # Weight matrix of the final layer
    find.visualize_weights(W, feature_idx, class_names, show = False)
    plt.show()

def update_action(action):
    """Record the human decision for the currently selected feature."""
    feature_idx = feature_radio_button.value
    if action == 'enabled':
        print('enable')
        is_feature_enabled[feature_idx] = True
    elif action == 'disabled':
        print('disable')
        is_feature_enabled[feature_idx] = False
    else:
        raise ValueError(f"Unknown action: {action}")
    
def show_action_panel(feature_idx):
    """Sync the action radio buttons with the stored status of the feature."""
    action_radio_button.description = f'Current status of feature {feature_idx}:'
    action_radio_button.value = 'enabled' if is_feature_enabled[feature_idx] else 'disabled'
    
# One radio option per CNN feature
feature_radio_button = wgt.RadioButtons(options=list(range(len(is_feature_enabled))), value=0, description='Feature:', disabled=False)
action_radio_button = wgt.RadioButtons(options=['enabled', 'disabled'],
    value = 'enabled' if is_feature_enabled[feature_radio_button.value] else 'disabled',
    description = f'Current status of feature {feature_radio_button.value}:',
    style = {'description_width': 'initial'},
    disabled = False
)

wgt.interactive_output(update_action, {'action':action_radio_button})
out = wgt.interactive_output(update_screen, {'feature_idx':feature_radio_button})

In [15]:
# 7. Get input from a human
# Inspect the word clouds of these features and disable irrelevant ones using the radio buttons under the bar plot.
# In particular, to reduce the model's bias, disable the features that detect gender information rather than occupation.
# When you are satisfied, proceed to the next cell.
display(wgt.HBox([feature_radio_button, wgt.VBox([out, action_radio_button])]))

HBox(children=(RadioButtons(description='Feature:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…

In [20]:
print(f"Total: {len(is_feature_enabled)} features \nEnabled: {sum(is_feature_enabled)} features \nDisabled: {len(is_feature_enabled)-sum(is_feature_enabled)} features")
print(f"Disabled features: {[i for i,s in enumerate(is_feature_enabled) if not s]}")

Total: 30 features 
Enabled: 19 features 
Disabled: 11 features
Disabled features: [0, 3, 4, 5, 8, 9, 13, 17, 18, 20, 21]


## Creating and fine-tuning an improved classifier

In [21]:
# 8. Create an improved model
# 8.1 Copy the existing CNN features
model_improved = find.get_CNN_model(vocab_size, EMBEDDING_DIM, embedding_matrix, MAXLEN, class_names, 
                                    FILTERS, trainable_filters = False)
model_improved.set_weights(model.get_weights()) 

# 8.2 Apply human decisions to disable irrelevant features
for idx, enable in enumerate(is_feature_enabled):
    if not enable:
        model_improved.layers[-1].disable_mask(idx)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_3 (Embedding)         (None, 150, 300)     120000600   input_3[0][0]                    
__________________________________________________________________________________________________
conv1d_10 (Conv1D)              (None, 149, 10)      6010        embedding_3[0][0]                
__________________________________________________________________________________________________
conv1d_11 (Conv1D)              (None, 148, 10)      9010        embedding_3[0][0]                
__________________________________________________________________________________________________
conv1d_12 
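
Disabling a feature is meant to stop it from influencing the prediction. `disable_mask` belongs to the final layer built by `find`, and its exact behaviour is not shown; the sketch below assumes it multiplies each feature by a 0/1 mask before a softmax dense layer. Under that assumption, a disabled feature's weights receive zero gradient, so fine-tuning (with the convolutional filters frozen via `trainable_filters = False`) can only re-weight the remaining features. The function names are hypothetical; the disabled indices are those chosen in step 7.

```python
import numpy as np

def masked_logits(features, weights, bias, enabled):
    """Final-layer logits after zeroing out disabled features (assumed behaviour)."""
    mask = np.asarray(enabled, dtype=float)
    return (features * mask) @ weights + bias

def weight_gradient(features, weights, bias, enabled, labels):
    """Gradient of the mean softmax cross-entropy loss w.r.t. the weights."""
    mask = np.asarray(enabled, dtype=float)
    x = features * mask
    logits = x @ weights + bias
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(labels)), labels] -= 1.0  # softmax minus one-hot
    return x.T @ probs / len(labels)

disabled = [0, 3, 4, 5, 8, 9, 13, 17, 18, 20, 21]  # features disabled in step 7
toy_enabled = [i not in disabled for i in range(30)]

rng = np.random.default_rng(0)
pooled = np.abs(rng.normal(size=(4, 30)))  # 4 bios, 30 max-pooled features
toy_weights = rng.normal(size=(30, 2))     # 30 features -> 2 classes
toy_bias = np.zeros(2)
toy_labels = np.array([0, 1, 1, 0])

grad = weight_gradient(pooled, toy_weights, toy_bias, toy_enabled, toy_labels)
print(np.abs(grad[disabled]).max())  # prints 0.0: disabled rows are never updated
```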

In [22]:
# 9. Fine-tuning the improved model
history = find.model_train(model_improved, result_path + f'trained_{MODEL_ARCH}_improved.h5', X_train_1, data_1['y_train'], X_validate_1, data_1['y_validate'], BATCH_SIZE, epochs = 300)

Train on 3832 samples, validate on 1277 samples
Epoch 1/300
 - 0s - loss: 0.1082 - acc: 0.9564 - val_loss: 0.1360 - val_acc: 0.9397

Epoch 00001: val_loss improved from inf to 0.13604, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN_improved.h5
Epoch 2/300
 - 0s - loss: 0.0617 - acc: 0.9919 - val_loss: 0.1193 - val_acc: 0.9538

Epoch 00002: val_loss improved from 0.13604 to 0.11925, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN_improved.h5
Epoch 3/300
 - 0s - loss: 0.0569 - acc: 0.9940 - val_loss: 0.1182 - val_acc: 0.9546

Epoch 00003: val_loss improved from 0.11925 to 0.11820, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN_improved.h5
Epoch 4/300
 - 0s - loss: 0.0543 - acc: 0.9943 - val_loss: 0.1172 - val_acc: 0.9546

Epoch 00004: val_loss improved from 0.11820 to 0.11718, saving model to trained_models/Biosbias2_CNN_20220522173805/trained_CNN_improved.h5
Epoch 5/300
 - 0s - loss: 0.0518 - acc: 0.9943 - val_los

In [23]:
# 10. Evaluate the improved model
if not GENDER_BIAS:
    find.evaluate_all(model_improved, class_names, BATCH_SIZE, X_test_1, y_test_1, X_test_2, y_test_2, X_test_3, y_test_3, result_path = result_path, model_name = 'debugged')
else:
    find.evaluate_all_gender(model_improved, class_names, BATCH_SIZE, X_test_1, y_test_1, gender_test_1, X_test_2, y_test_2, gender_test_2, result_path = result_path, model_name = 'debugged')

Evaluate with the original test set:
{'per_class': {0: {'all_positive': 717,
                   'all_true': 731,
                   'class_f1': 0.9530386740331491,
                   'class_name': 'surgeon',
                   'class_precision': 0.9623430962343096,
                   'class_recall': 0.9439124487004104,
                   'true_positive': 690},
               1: {'all_positive': 561,
                   'all_true': 547,
                   'class_f1': 0.9386281588447652,
                   'class_name': 'nurse',
                   'class_precision': 0.9269162210338681,
                   'class_recall': 0.9506398537477148,
                   'true_positive': 520}},
 'total': {'accuracy': 0.94679186228482,
           'macro_f1': 0.9459510539058925,
           'macro_precision': 0.9446296586340888,
           'macro_recall': 0.9472761512240626,
           'micro_f1': 0.94679186228482,
           'micro_precision': 0.94679186228482,
           'micro_recall': 0.9467918622848