# Integrated gradients and attribution score comparison for fine-tuned BERT

This notebook contains the code used to compute and visualise attribution scores with the integrated gradients method for the fine-tuned BERT classifier. Three caption models can be analysed: NIC, NIC+ and NIC+Equalizer. All models were trained with seed 0 and are evaluated on the same test set (the same image ids).

The **weights and prediction files** (covering the test set) for the fine-tuned BERT models can be downloaded from https://drive.google.com/drive/folders/1IulefU8uuaS-RcA8hOT7kFXj2oBb-hN-?usp=sharing and have to be placed in the local folder bias_data_for_ig/BERT_ft/.

The visualisations might not render as expected in a local Jupyter notebook, so Google Colab may be needed to view them. When using Google Colab, you can add a shortcut to the Drive folder and mount your Drive by uncommenting the code below. In that case, change the paths used when loading the models accordingly.

Credits to Ruben Winastwan for providing a tutorial on how to use Captum for BERT Models. https://towardsdatascience.com/interpreting-the-prediction-of-bert-model-for-text-classification-5ab09f8ef074

In [1]:
%%capture
#!pip install transformers
#!pip install captum

In [2]:
# Only the imports that are actually used in this notebook
import pickle

import numpy as np
import torch
import torch.nn as nn
from captum.attr import LayerIntegratedGradients
from captum.attr import visualization as viz
from transformers import BertModel, BertTokenizer



In [3]:
# from google.colab import drive
# drive.mount('/content/drive')

Mounted at /content/drive


# Tokenization

In [7]:
# Instantiate tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


# Model architecture from model.py file of Hirota et al. 

(https://github.com/rebnej/lick-caption-bias)

In [5]:
class BERT_Classifier(nn.Module):

    def __init__(self, args, cls_hid_dim):

        super(BERT_Classifier, self).__init__()
        hid_size = 256

        mlp = []
        mlp.append(nn.BatchNorm1d(cls_hid_dim))
        mlp.append(nn.Linear(cls_hid_dim, hid_size, bias=True))
        mlp.append(nn.BatchNorm1d(hid_size))
        mlp.append(nn.LeakyReLU())
        mlp.append(nn.Linear(hid_size, 2, bias=True))

        self.mlp = nn.Sequential(*mlp)

    def forward(self, input):
        return self.mlp(input)



class BERT_GenderClassifier(nn.Module):

    def __init__(self, args, tokenizer):
        
        super(BERT_GenderClassifier, self).__init__()
        self.lang_model = BertModel.from_pretrained('bert-base-uncased') 
        self.classifier = BERT_Classifier(args, 768)
        self.tokenizer = tokenizer


    def forward(self, input_ids, attention_mask, token_type_ids=None):

        """Forward

        return: logits, not probs
        """
      
        outputs = self.lang_model(input_ids, attention_mask=attention_mask, output_hidden_states=True)
        last_hidden_states = outputs.last_hidden_state
        cls_hidden_state = last_hidden_states[:, 0, :] #(batchsize, hid_size)
        logits = self.classifier(cls_hidden_state)
  
        return logits

# Load caption model's Parameters 

All models are trained with seed 0. 

Load the NIC+Equalizer trained model. 


In [None]:
args = None
nic_equalizer_model = BERT_GenderClassifier(args, tokenizer)
nic_equalizer_model.load_state_dict(torch.load('/bias_data_for_ig/BERT_ft/BERT_ft_nic_equalizer_seed0.pt', map_location=torch.device('cpu')))
nic_equalizer_model.eval()

Load the NIC+ trained model

In [None]:
nic_plus_model = BERT_GenderClassifier(args, tokenizer)
nic_plus_model.load_state_dict(torch.load('/bias_data_for_ig/BERT_ft/BERT_ft_nic_plus_seed0.pt', map_location=torch.device('cpu')))
nic_plus_model.eval()

Load the NIC trained model

In [None]:
nic_model = BERT_GenderClassifier(args, tokenizer)
nic_model.load_state_dict(torch.load('/bias_data_for_ig/BERT_ft/BERT_ft_nic_seed0.pt', map_location=torch.device('cpu')))
nic_model.eval()

# Define model input and output

The input for integrated gradients is the embedding of the caption tokens. Embeddings are needed because discrete token ids are not differentiable. The outputs are the softmax probabilities of the male (0) and female (1) labels.
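
As a rough illustration of what integrated gradients computes, the sketch below applies the method to a toy differentiable function in plain Python. The toy model `f`, its weights, and the helper names are assumptions made for this example and are not part of the notebook. Captum's `LayerIntegratedGradients` performs the analogous computation on the BERT embedding layer, and its integration rule may differ from the midpoint sum used here.

```python
import math

# Hypothetical toy "model": sigmoid of a weighted sum (weights are made up)
WEIGHTS = [2.0, -1.0, 0.5]

def f(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def grad_f(x):
    # Analytic gradient of the toy model with respect to each input
    s = f(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]

def integrated_gradients(x, baseline, n_steps=50):
    # Midpoint Riemann sum over the straight path from baseline to x
    total = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad_f(point))]
    # Scale the averaged gradient by the input difference
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline)

# Completeness: the attributions should sum to roughly f(x) - f(baseline)
delta = sum(attr) - (f(x) - f(baseline))
print("attributions:", attr)
print("convergence delta:", delta)
```

The printed difference corresponds to the quantity that `return_convergence_delta=True` reports in the cells below: how far the summed attributions are from the change in model output between baseline and input.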

In [11]:
# Define nic+ model output
def nic_plus_model_output(inputs, attention_mask = None):
  return torch.softmax(nic_plus_model(inputs, attention_mask = attention_mask), dim=-1)

# Define nic+ model input
nic_plus_model_input = nic_plus_model.lang_model.embeddings

In [12]:
# Define NIC+equalizer model output
def nic_equalizer_model_output(inputs, attention_mask = None):
  return torch.softmax(nic_equalizer_model(inputs, attention_mask = attention_mask), dim=-1)
  
# Define NIC+equalizer model input
nic_equalizer_model_input = nic_equalizer_model.lang_model.embeddings


In [13]:
# Define nic model output
def nic_model_output(inputs, attention_mask = None):
  return torch.softmax(nic_model(inputs, attention_mask = attention_mask), dim=-1)

# Define nic model input
nic_model_input = nic_model.lang_model.embeddings


# Instantiate Integrated Gradients Method

In [14]:
NIC_plus_lig = LayerIntegratedGradients(nic_plus_model_output, nic_plus_model_input)
NIC_equalizer_lig = LayerIntegratedGradients(nic_equalizer_model_output, nic_equalizer_model_input)
NIC_lig = LayerIntegratedGradients(nic_model_output, nic_model_input)

# Construct Original and Baseline Input

The same encoding steps are used as in bias_dataset.py.
We use the padding token as the baseline, since padding is expected to be neutral with respect to the prediction.
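
The layout of such a baseline can be sketched without loading the tokenizer. The helper name `build_baseline` is hypothetical, and the special-token ids are hard-coded to the values of the `bert-base-uncased` vocabulary so that the snippet runs on its own; in the notebook they are read from the tokenizer instead.

```python
# Assumed ids of [CLS], [SEP] and [PAD] in the bert-base-uncased vocabulary
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def build_baseline(num_caption_tokens, max_length=64):
    # [CLS] + one padding token per caption token + [SEP]
    ids = [CLS_ID] + [PAD_ID] * num_caption_tokens + [SEP_ID]
    # Fill up with padding so the baseline has the same length as the input
    ids += [PAD_ID] * (max_length - len(ids))
    return ids

baseline = build_baseline(num_caption_tokens=3, max_length=8)
print(baseline)  # [101, 0, 0, 0, 102, 0, 0, 0]
```

Keeping [CLS] and [SEP] in the baseline means that only the caption tokens differ between input and baseline, so the special tokens themselves receive no attribution.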

In [23]:
def new_construct_input_and_baseline(text):
    '''
    Method to preprocess given text and create baseline sentences.
    '''

    # Hirota default max_length = 64
    max_length = 64
    baseline_token_id = tokenizer.pad_token_id 
    sep_token_id = tokenizer.sep_token_id 
    cls_token_id = tokenizer.cls_token_id 

    # remove [CLS] and [SEP] from prediction string
    text = text.split()[1:-1]
    # method in bias_dataset.py of Hirota et al.
    encoded_dict = tokenizer.encode_plus(text, max_length=max_length, padding='max_length', return_attention_mask=True, return_tensors='pt', truncation=True, add_special_tokens=True)

    text_ids = encoded_dict['input_ids']
    attention_mask = encoded_dict['attention_mask']

    text_ids = text_ids.view(max_length)
    attention_mask = attention_mask.view(max_length)
 
    # token list for visualisation
    token_list = tokenizer.convert_ids_to_tokens(text_ids)

    
    text_ids = text_ids.unsqueeze(0)

    # baseline is [CLS] + one padding token per caption token + [SEP] + padding (to match max_length)
    baseline_input_ids = [cls_token_id] + [baseline_token_id] * len(text) + [sep_token_id] 
    baseline_input_ids += [baseline_token_id] * (max_length - len(baseline_input_ids))
    baseline_input_ids = torch.tensor([baseline_input_ids], device='cpu')
   
    return text_ids, baseline_input_ids, token_list, attention_mask

# Compute Attribution for Each Token

In [15]:
def summarize_attributions(attributions):
    # attribution per token is sum over all embedding dimensions
    attributions = attributions.sum(dim=-1).squeeze(0)
    # normalized
    attributions = attributions / torch.norm(attributions)
    
    return attributions 
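
The effect of this summary step can be checked on a small example in plain Python. The function `summarize` and the toy values are assumptions for illustration only; the notebook applies the same two steps to a tensor of shape (1, tokens, embedding dimensions).

```python
import math

def summarize(attributions):
    # attributions: one list of per-dimension values for each token
    per_token = [sum(dims) for dims in attributions]
    # Divide by the L2 norm so that the token scores form a unit vector
    norm = math.sqrt(sum(v * v for v in per_token))
    return [v / norm for v in per_token]

# Made-up attributions for three tokens with two embedding dimensions each
toy = [[0.2, 0.1], [-0.4, 0.0], [0.3, 0.3]]
scores = summarize(toy)
print("token scores:", scores)
print("sum of token scores:", sum(scores))
```

Because the scores are normalised by their L2 norm and not by their sum, the total over all tokens is not limited to the range from -1 to 1, which is why the attribution scores reported further down can exceed 1 in magnitude.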

In [16]:
def interpret_nic_equalizer_text(text, true_class, attributions_dicc):
    '''
    Works for the NIC+equalizer model.
    Computes and visualises the attributions for the given text.
    Also requires a dictionary containing all previous attributions for analysis.
    '''
    # Process input text and create the corresponding baseline
    input_ids, baseline_input_ids, all_tokens, attention_mask = new_construct_input_and_baseline(text)
    # Number of tokens including [CLS] and [SEP]; used to cut off the padding
    length = len(text.split())
    attention_mask = attention_mask.unsqueeze(0)
    # Get the prediction of NIC+equalizer once and reuse it below
    probs = torch.softmax(nic_equalizer_model(input_ids, attention_mask), dim=-1)
    predicted = torch.argmax(probs)
    # Compute attributions towards class 1 (female)
    attributions, delta = NIC_equalizer_lig.attribute(inputs=input_ids[0][:length].unsqueeze(0),
                                    baselines=baseline_input_ids[0][:length].unsqueeze(0),
                                    return_convergence_delta=True,
                                    n_steps=50,
                                    additional_forward_args=(attention_mask[0][:length].unsqueeze(0)),
                                    target=1
                                    )
    attributions_sum = summarize_attributions(attributions)
    # Visualisation record
    score_vis = viz.VisualizationDataRecord(
                        word_attributions = attributions_sum[:length],
                        pred_prob = torch.max(probs),
                        pred_class = predicted,
                        true_class = true_class,
                        attr_class = text,
                        attr_score = attributions_sum.sum(),
                        raw_input_ids = all_tokens[:length],
                        convergence_score = delta)
    viz.visualize_text([score_vis])

    # If the prediction is correct, add the attribution to the dictionary for analysis
    if true_class == predicted:
        attributions_dicc[str(true_class)].append(np.abs(float(attributions_sum.sum())))
        # print("attribution dict NIC+Equalizer", attributions_dicc)

In [17]:
def interpret_NIC_plus_text(text, true_class, attributions_dicc):
    '''
    Works for the NIC+ model.
    Computes and visualises the attributions for the given text.
    Also requires a dictionary containing all previous attributions for analysis.
    '''
    # Process input text and create the corresponding baseline
    input_ids, baseline_input_ids, all_tokens, attention_mask = new_construct_input_and_baseline(text)
    # Number of tokens including [CLS] and [SEP]; used to cut off the padding
    length = len(text.split())
    attention_mask = attention_mask.unsqueeze(0)
    # Get the prediction of NIC+ once and reuse it below
    probs = torch.softmax(nic_plus_model(input_ids, attention_mask), dim=-1)
    predicted = torch.argmax(probs)
    # Compute attributions towards class 1 (female)
    attributions, delta = NIC_plus_lig.attribute(inputs=input_ids[0][:length].unsqueeze(0),
                                    baselines=baseline_input_ids[0][:length].unsqueeze(0),
                                    return_convergence_delta=True,
                                    additional_forward_args=(attention_mask[0][:length].unsqueeze(0)),
                                    n_steps=50,
                                    target=1,
                                    )
    attributions_sum = summarize_attributions(attributions)
    score_vis = viz.VisualizationDataRecord(
                        word_attributions = attributions_sum[:length],
                        pred_prob = torch.max(probs),
                        pred_class = predicted,
                        true_class = true_class,
                        attr_class = text,
                        attr_score = attributions_sum.sum(),
                        raw_input_ids = all_tokens[:length],
                        convergence_score = delta)
    viz.visualize_text([score_vis])
    # If the prediction is correct, add the attribution to the dictionary for analysis
    if true_class == predicted:
        attributions_dicc[str(true_class)].append(np.abs(float(attributions_sum.sum())))
        # print("attribution dict NIC+", attributions_dicc)

In [18]:
def interpret_NIC_text(text, true_class, attributions_dicc):
    '''
    Works for the NIC model.
    Computes and visualises the attributions for the given text.
    Also requires a dictionary containing all previous attributions for analysis.
    '''
    # Process input text and create the corresponding baseline
    input_ids, baseline_input_ids, all_tokens, attention_mask = new_construct_input_and_baseline(text)
    # Number of tokens including [CLS] and [SEP]; used to cut off the padding
    length = len(text.split())
    attention_mask = attention_mask.unsqueeze(0)
    # Get the prediction of NIC once and reuse it below
    probs = torch.softmax(nic_model(input_ids, attention_mask), dim=-1)
    predicted = torch.argmax(probs)
    # Compute attributions towards class 1 (female)
    attributions, delta = NIC_lig.attribute(inputs=input_ids[0][:length].unsqueeze(0),
                                    baselines=baseline_input_ids[0][:length].unsqueeze(0),
                                    return_convergence_delta=True,
                                    additional_forward_args=(attention_mask[0][:length].unsqueeze(0)),
                                    n_steps=50,
                                    target=1,
                                    )
    attributions_sum = summarize_attributions(attributions)
    score_vis = viz.VisualizationDataRecord(
                        word_attributions = attributions_sum[:length],
                        pred_prob = torch.max(probs),
                        pred_class = predicted,
                        true_class = true_class,
                        attr_class = text,
                        attr_score = attributions_sum.sum(),
                        raw_input_ids = all_tokens[:length],
                        convergence_score = delta)
    viz.visualize_text([score_vis])
    # If the prediction is correct, add the attribution to the dictionary for analysis
    if true_class == predicted:
        attributions_dicc[str(true_class)].append(np.abs(float(attributions_sum.sum())))
        # print("attribution dict NIC", attributions_dicc)

# Interpret Test Caption

Create dictionaries storing all attributions scores for female and male prediction for analysis

In [20]:
# 1: female class, 0: male class
attributions_nic_plus_model = {
   '1':[],
   '0':[],
}

attributions_equalizer_model = {
   '1':[],
   '0':[],
}

attributions_nic_model = {
   '1':[],
   '0':[],
}
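
How these dictionaries are filled can be sketched in isolation. The helper `record_if_correct` is a hypothetical stand-in for the last lines of the interpret functions above; the three example scores are taken from the NIC+equalizer outputs shown further down.

```python
def record_if_correct(attributions, true_class, predicted_class, attribution_sum):
    # Only correctly classified captions contribute to the analysis
    if true_class == predicted_class:
        # The absolute value is stored, keyed by the true class as a string
        attributions[str(true_class)].append(abs(attribution_sum))

scores = {'1': [], '0': []}
record_if_correct(scores, 1, 1, 1.04)   # correct female prediction
record_if_correct(scores, 1, 0, -1.65)  # misclassified, skipped
record_if_correct(scores, 0, 0, -0.85)  # correct male prediction
print(scores)  # {'1': [1.04], '0': [0.85]}
```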

In [21]:
# Load the prediction files; the context managers close the files after reading
with open('/content/drive/MyDrive/bias_project/BERT_ft/BERT_ft_predictions_nicequalizer_seed0.pkl', 'rb') as handle:
    predictions_nic_equalizer = pickle.load(handle)
with open('/content/drive/MyDrive/bias_project/BERT_ft/BERT_ft_predictions_nic_plus_seed0.pkl', 'rb') as handle:
    predictions_nic_plus = pickle.load(handle)
with open('/content/drive/MyDrive/bias_project/BERT_ft/BERT_ft_predictions_nic_seed0.pkl', 'rb') as handle:
    predictions_nic = pickle.load(handle)

### **Warning**: the visualisations of feature attributions below might not render as expected in a local Jupyter notebook. Use of Google Colab is recommended.



# NIC+equalizer examples


In [30]:
num_examples = 5
count = 0
for entry in predictions_nic_equalizer:
    text = entry['input_sent']
    true_class = entry['target'].item()
    interpret_nic_equalizer_text(text, true_class, attributions_equalizer_model)
    count += 1
    if count == num_examples:
      break

True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.78),[CLS] a [MASK] sitting on a bed with a laptop . [SEP],1.04,[CLS] a [MASK] sitting on a bed with a laptop . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.93),[CLS] a [MASK] sitting on a bed with a teddy bear . [SEP],2.19,[CLS] a [MASK] sitting on a bed with a teddy bear . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.95),[CLS] a train on a track near a platform . [SEP],-1.65,[CLS] a train on a track near a platform . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,0 (0.64),[CLS] a group of people riding on the backs of elephants . [SEP],-0.85,[CLS] a group of people riding on the backs of elephants . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.84),[CLS] a [MASK] brushing [MASK] teeth in a bathroom . [SEP],1.49,[CLS] a [MASK] brushing [MASK] teeth in a bathroom . [SEP]
,,,,


In [31]:
num_examples = 5
count = 0
for entry in predictions_nic_plus:
    text = entry['input_sent']
    true_class = entry['target'].item()
    interpret_NIC_plus_text(text, true_class, attributions_nic_plus_model)
    count += 1
    if count == num_examples:
      break

True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.94),[CLS] a [MASK] sitting on a bed in a room . [SEP],1.47,[CLS] a [MASK] sitting on a bed in a room . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.72),[CLS] a [MASK] sitting on a couch holding a nintendo wii controller . [SEP],0.9,[CLS] a [MASK] sitting on a couch holding a nintendo wii controller . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.64),[CLS] a [MASK] standing next to a train on a train track . [SEP],-1.75,[CLS] a [MASK] standing next to a train on a train track . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,0 (0.52),[CLS] a group of people riding on the backs of elephants . [SEP],-0.52,[CLS] a group of people riding on the backs of elephants . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.87),[CLS] a [MASK] in a bathroom brushing [MASK] teeth . [SEP],1.42,[CLS] a [MASK] in a bathroom brushing [MASK] teeth . [SEP]
,,,,


In [32]:
num_examples = 5
count = 0
for entry in predictions_nic:
    text = entry['input_sent']
    true_class = entry['target'].item()
    interpret_NIC_text(text, true_class, attributions_nic_model)
    count += 1
    if num_examples == count:
      break

True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.63),[CLS] a [MASK] sitting on a couch with a laptop computer . [SEP],0.38,[CLS] a [MASK] sitting on a couch with a laptop computer . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.82),[CLS] a [MASK] is holding a banana in [MASK] hands . [SEP],1.58,[CLS] a [MASK] is holding a banana in [MASK] hands . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.95),[CLS] a train is pulling into a train station . [SEP],-1.89,[CLS] a train is pulling into a train station . [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
0.0,0 (0.57),[CLS] a [MASK] and a [MASK] are riding an elephant [SEP],-0.49,[CLS] a [MASK] and a [MASK] are riding an elephant [SEP]
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.86),[CLS] a [MASK] is holding a toothbrush in [MASK] mouth . [SEP],1.33,[CLS] a [MASK] is holding a [UNK] in [MASK] mouth . [SEP]
,,,,


# Calculating total attribution scores

The functions below visualise the attribution scores for each caption model discussed. These results were used to obtain the average attribution scores for BERT-ft, which are reported in Table 9 in the appendix.

In [40]:
def visualise_caption_model(attributions_dict, model_predictions, examples=100):
  """
  Visualise attributions of the NIC+equalizer model for the first `examples` captions
  """
  for i, entry in enumerate(model_predictions, start=1):
    image = entry['image_id'].item()
    print("MSCOCO image id of caption below", image)
    url = "https://cocodataset.org/#explore?id=" + str(image)
    print(url)
    model_caption = entry['input_sent']
    true_class = entry['target'].item()
    interpret_nic_equalizer_text(model_caption, true_class, attributions_dict)
    if i == examples:
      break


def visualise_nic_plus_model(attributions_dict, model_predictions, examples=100):
  """
  Visualise attributions of the NIC+ model for the first `examples` captions
  """
  for i, entry in enumerate(model_predictions, start=1):
    image = entry['image_id'].item()
    print("MSCOCO image id of caption below", image)
    url = "https://cocodataset.org/#explore?id=" + str(image)
    print(url)
    model_caption = entry['input_sent']
    true_class = entry['target'].item()
    interpret_NIC_plus_text(model_caption, true_class, attributions_dict)
    if i == examples:
      break


def visualise_nic_model(attributions_dict, model_predictions, examples=100):
  """
  Visualise attributions of the NIC model for the first `examples` captions
  """
  for i, entry in enumerate(model_predictions, start=1):
    image = entry['image_id'].item()
    print("MSCOCO image id of caption below", image)
    url = "https://cocodataset.org/#explore?id=" + str(image)
    print(url)
    model_caption = entry['input_sent']
    true_class = entry['target'].item()
    interpret_NIC_text(model_caption, true_class, attributions_dict)
    if i == examples:
      break



In [42]:
#Run and save dictionary to pickle NIC+equalizer  
print("###############################################################################")

print('NIC+equalizer model')
visualise_caption_model(attributions_equalizer_model, predictions_nic_equalizer, examples=3)
with open('nic_plus_equalizer_attributions.pickle', 'wb') as handle:
    pickle.dump(attributions_equalizer_model, handle, protocol=pickle.HIGHEST_PROTOCOL)

print("###############################################################################")
print('NIC+ model')

#Run and save dictionary to pickle NIC+ 
visualise_nic_plus_model(attributions_nic_plus_model, predictions_nic_plus, examples=3)
with open('nic_plus_attributions.pickle', 'wb') as handle:
    pickle.dump(attributions_nic_plus_model, handle, protocol=pickle.HIGHEST_PROTOCOL)

print("###############################################################################")
print('NIC model')
#Run and save dictionary to pickle NIC
visualise_nic_model(attributions_nic_model, predictions_nic, examples=3)
with open('nic_attributions.pickle', 'wb') as handle:
    pickle.dump(attributions_nic_model, handle, protocol=pickle.HIGHEST_PROTOCOL)

###############################################################################
NIC+equalizer model
MSCOCO image id of caption below 238799
https://cocodataset.org/#explore?id=238799


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.78),[CLS] a [MASK] sitting on a bed with a laptop . [SEP],1.04,[CLS] a [MASK] sitting on a bed with a laptop . [SEP]
,,,,


MSCOCO image id of caption below 577277
https://cocodataset.org/#explore?id=577277


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.93),[CLS] a [MASK] sitting on a bed with a teddy bear . [SEP],2.19,[CLS] a [MASK] sitting on a bed with a teddy bear . [SEP]
,,,,


MSCOCO image id of caption below 172021
https://cocodataset.org/#explore?id=172021


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.95),[CLS] a train on a track near a platform . [SEP],-1.65,[CLS] a train on a track near a platform . [SEP]
,,,,


###############################################################################
NIC+ model
MSCOCO image id of caption below 238799
https://cocodataset.org/#explore?id=238799


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.94),[CLS] a [MASK] sitting on a bed in a room . [SEP],1.47,[CLS] a [MASK] sitting on a bed in a room . [SEP]
,,,,


MSCOCO image id of caption below 577277
https://cocodataset.org/#explore?id=577277


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.72),[CLS] a [MASK] sitting on a couch holding a nintendo wii controller . [SEP],0.9,[CLS] a [MASK] sitting on a couch holding a nintendo wii controller . [SEP]
,,,,


MSCOCO image id of caption below 172021
https://cocodataset.org/#explore?id=172021


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.64),[CLS] a [MASK] standing next to a train on a train track . [SEP],-1.75,[CLS] a [MASK] standing next to a train on a train track . [SEP]
,,,,


###############################################################################
NIC model
MSCOCO image id of caption below 238799
https://cocodataset.org/#explore?id=238799


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.63),[CLS] a [MASK] sitting on a couch with a laptop computer . [SEP],0.38,[CLS] a [MASK] sitting on a couch with a laptop computer . [SEP]
,,,,


MSCOCO image id of caption below 577277
https://cocodataset.org/#explore?id=577277


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (0.82),[CLS] a [MASK] is holding a banana in [MASK] hands . [SEP],1.58,[CLS] a [MASK] is holding a banana in [MASK] hands . [SEP]
,,,,


MSCOCO image id of caption below 172021
https://cocodataset.org/#explore?id=172021


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.95),[CLS] a train is pulling into a train station . [SEP],-1.89,[CLS] a train is pulling into a train station . [SEP]
,,,,


# Average attribution scores towards male and female.

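This calculation is based on the captions (examples) processed above. The dictionaries are extended by every call to the interpret functions, so a caption that is processed more than once (for example when a cell is re-run) is also counted more than once; re-create the dictionaries before computing final averages.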
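
The averaging itself is a plain mean over the stored absolute scores per class. The sketch below uses made-up values and a hypothetical helper `average_scores`; unlike `np.mean`, which returns `nan` with a warning for an empty list, it returns `None` when a class has no correctly classified captions.

```python
from statistics import mean

def average_scores(attributions):
    # Mean absolute attribution per class, rounded to three decimals
    return {
        label: (round(mean(scores), 3) if scores else None)
        for label, scores in attributions.items()
    }

# Made-up scores for illustration, not results from the notebook
toy = {'1': [1.0, 2.0], '0': []}
averages = average_scores(toy)
print(averages)  # {'1': 1.5, '0': None}
```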

In [54]:
print('NIC+equalizer: average male attribution score', round(np.mean(attributions_equalizer_model['0']), 3))
print('NIC+equalizer: average female attribution score', round(np.mean(attributions_equalizer_model['1']), 3))


NIC+equalizer: average male attribution score 0.852
NIC+equalizer: average female attribution score 1.6


In [53]:
print('NIC+: average male attribution score', round(np.mean(attributions_nic_plus_model['0']), 3))
print('NIC+: average female attribution score', round(np.mean(attributions_nic_plus_model['1']), 3))


NIC+: average male attribution score 0.517
NIC+: average female attribution score 1.218


In [52]:
print('NIC: average male attribution score', round(np.mean(attributions_nic_model['0']), 3))
print('NIC: average female attribution score', round(np.mean(attributions_nic_model['1']), 3))

NIC: average male attribution score 0.49
NIC: average female attribution score 0.956
