# Threshold Analysis

## Reasoning for Adjusting Dataset

As mentioned in [this notebook](), the CrowS-Pairs dataset has been criticised.

While fixing the structure and intent of some of the sentence pairs, I also found another flaw in the theory behind the bias scores this dataset produces.

The theory behind this score is logical at a surface level. A language model assigns a probability to any given sentence, reflecting how likely that sequence of words is to appear together. If a language model assigns a higher probability to 'The doctor and his patient' than to 'The doctor and her patient', it is fair to assume that the model holds some stereotypical gender associations.
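
The comparison described above can be sketched in a few lines. This is a minimal illustration and not the CrowS-Pairs scoring code, which is more involved: the per-token probabilities are invented, and `sentence_log_prob` is a hypothetical helper that simply sums token log-probabilities.

```python
import math

def sentence_log_prob(token_probs):
    """Sum of per-token log-probabilities (natural log) for one sentence."""
    return sum(math.log(p) for p in token_probs)

# Hypothetical per-token probabilities, invented purely for illustration.
# Only the pronoun position (fourth token) differs between the two sentences.
stereo_tokens = [0.20, 0.05, 0.30, 0.10, 0.02]      # 'The doctor and his patient'
antistereo_tokens = [0.20, 0.05, 0.30, 0.04, 0.02]  # 'The doctor and her patient'

stereo_score = sentence_log_prob(stereo_tokens)
antistereo_score = sentence_log_prob(antistereo_tokens)

# The unthresholded comparison: whichever sentence scores higher "wins".
prefers_stereotype = stereo_score > antistereo_score
print(prefers_stereotype)
```

With these made-up numbers the model would be counted as preferring the stereotype sentence, regardless of how large or small the gap is.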

However, the score cannot handle sentence pairs that receive very similar probabilities, or cases where a model assigns a much higher probability to the non-stereotype sentence.

Thresholds were introduced so that the bias of a language model can be measured more accurately.

## Background

While implementing the code provided by the [CrowS-Pairs GitHub](https://github.com/nyu-mll/crows-pairs) repository, I discovered two edge cases that CrowS-Pairs does not handle well. They occur when a language model assigns an almost identical probability to each sentence, or when it assigns a much higher probability to the non-stereotyped sentence.

The dataset forces language models to choose one sentence or the other, when the ideal outcome is for the model to assign both the same probability. Two sentences receiving exactly the same probability is hugely unlikely: each sentence probability is built from many small terms, and the results are so minuscule that they have to be stored as log probabilities. Two sentences could therefore receive probabilities within 0.001% of each other, yet under the CrowS-Pairs approach the model is still counted as showing stereotypical behaviour if the stereotype sentence scores even slightly higher.
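
Both points can be illustrated with plain Python. The sketch below uses invented numbers only: it shows a product of many small probabilities underflowing to zero (hence log probabilities), and a pair of hypothetical log-probability scores whose gap is about 0.001% but which a strict comparison still labels as stereotypical.

```python
import math

# 1) Why log probabilities: multiplying many small terms underflows.
token_probs = [0.01] * 200                          # hypothetical token probabilities
raw_product = math.prod(token_probs)                # too small for a float, becomes 0.0
log_score = sum(math.log(p) for p in token_probs)   # stays representable

# 2) A near-tie: hypothetical log-probability scores for one sentence pair.
stereo_log_prob = -42.00000
antistereo_log_prob = -42.00001

# Ratio of the two probabilities, recovered from the difference of the logs.
ratio = math.exp(stereo_log_prob - antistereo_log_prob)
percent_gap = (ratio - 1) * 100

# A strict comparison ignores how small the gap is.
counted_as_stereotype = stereo_log_prob > antistereo_log_prob

print(raw_product, round(log_score, 2))
print(f"{percent_gap:.4f}%", counted_as_stereotype)
```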

The CrowS-Pairs benchmark also does not account for a model assigning a substantially higher probability to the non-stereotype sentence. For example, given the two sentences 'The doctor and his patient' and 'The doctor and her patient', the stereotypical behaviour would be for the model to favour the male pronoun. But how should CrowS-Pairs treat a language model that assigns a much higher probability to the female pronoun? That model is not following a stereotype, but it is still promoting an unfair gender association.

To counteract these issues, I have implemented a series of thresholds that are applied on top of the CrowS-Pairs dataset. Sentence pairs with very similar probabilities are then considered neutral. This new neutral measure is the true metric for how unbiased a language model is, as it excludes cases where the model promotes unjust associations, even if they are not stereotypes.
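
A minimal sketch of how such a threshold could work is shown below. The actual rule lives in `calculate_all_scores` (imported from `functions` and not shown in this notebook), so everything here is an assumption: `classify_pair` is a hypothetical helper, the gap is assumed to be measured relative to the larger magnitude of the two log-probability scores, and the scores themselves are invented.

```python
def classify_pair(stereo_score, antistereo_score, threshold=0.05):
    """Label one sentence pair as 'neutral', 'stereotype' or 'antistereotype'.

    Assumed rule (hypothetical): the gap between the two scores is taken
    relative to the larger magnitude of the two log-probability scores.
    """
    largest = max(abs(stereo_score), abs(antistereo_score))
    if largest == 0:
        return 'neutral'
    relative_gap = abs(stereo_score - antistereo_score) / largest
    if relative_gap <= threshold:
        return 'neutral'
    return 'stereotype' if stereo_score > antistereo_score else 'antistereotype'

# Invented (stereotype, anti-stereotype) log-probability scores.
pairs = [(-40.0, -40.5), (-30.0, -45.0), (-50.0, -35.0)]

labels = [classify_pair(s, a) for s, a in pairs]
neutral_score = 100 * labels.count('neutral') / len(labels)  # percentage of pairs

print(labels, round(neutral_score, 1))
```

Under this assumed rule, a threshold of 0 reduces to the plain comparison (the first pair would be labelled a stereotype), while a larger threshold moves near-ties into the neutral group.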

## Importing Relevant Packages/Datasets and Assigning Functions

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

from functions import *
from variables import *

In [3]:
master_dataframe = pd.read_csv('master_output_file.csv')

In [4]:
def get_list_of_scores(list_of_models, score_type, threshold=0, bias_type=None):
    
    scores = []
    
    for model in list_of_models:
        
        # Raising an exception if there is an unrecognised model
        if model not in all_models:
            raise ValueError("List of models provided contains unrecognised model")
        
        # Raising an error if there is an unrecognised score type
        if score_type not in score_types:
            if score_type + '_score' in score_types:
                score_type += '_score'
            else:
                raise ValueError("Score type must be 'neutral', 'bias', 'nonbias', 'stereotype' or 'antistereotype'")
                
        # Raising an error if there is an unrecognised bias_type
        if bias_type:
            if bias_type not in bias_types:
                raise ValueError("Bias type must be one of: 'race-color', 'gender', 'socioeconomic', 'nationality', 'religion', 'age', 'sexual-orientation', 'physical-appearance', 'disability'")

        # Filtering the master dataset to only include results obtained by *this* model
        model_dataframe = master_dataframe[master_dataframe['model_name']==model]
        
        # Filtering the model dataframe to only include a particular bias type if specified
        if bias_type:
            model_dataframe = model_dataframe[model_dataframe['bias_type']==bias_type]

        # Getting the requested score for this model
        score = calculate_all_scores(model_dataframe, threshold)[score_type]
        scores.append(score)
        
    return scores
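
The score-type check in `get_list_of_scores` accepts either the full name (`'neutral_score'`) or the short form (`'neutral'`). A standalone sketch of that normalisation is given below. `normalise_score_type` is a hypothetical helper, and the contents of `SCORE_TYPES` are an assumption based on the dictionary keys used elsewhere in this notebook; the real list is `score_types` from `variables`.

```python
# Assumed contents of `score_types` (the real list is defined in `variables`).
SCORE_TYPES = ['neutral_score', 'bias_score', 'nonbias_score',
               'stereotype_score', 'antistereotype_score']

def normalise_score_type(score_type, valid=SCORE_TYPES):
    """Return the full score-type name, accepting the short form as well."""
    if score_type in valid:
        return score_type
    if score_type + '_score' in valid:
        return score_type + '_score'
    raise ValueError(f"Unrecognised score type: {score_type!r}")

print(normalise_score_type('neutral'))
print(normalise_score_type('bias_score'))
```

Doing this once before the loop over models, rather than on every iteration, would give the same result and would also validate the arguments when the model list is empty.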

In [5]:
def get_list_of_all_scores(list_of_models, threshold=0, bias_type=None):
    
    scores = {
        'neutral_score' : [],
        'bias_score' : [],
        'nonbias_score' : [],
        'stereotype_score' : [],
        'antistereotype_score' : []
    }
    
    for model in list_of_models:
        
        # Raising an exception if there is an unrecognised model
        if model not in all_models:
            raise ValueError("List of models provided contains unrecognised model")

        # Raising an error if there is an unrecognised bias_type
        if bias_type:
            if bias_type not in bias_types:
                raise ValueError("Bias type must be one of: 'race-color', 'gender', 'socioeconomic', 'nationality', 'religion', 'age', 'sexual-orientation', 'physical-appearance', 'disability'")

        # Filtering the master dataset to only include results obtained by *this* model
        model_dataframe = master_dataframe[master_dataframe['model_name']==model]
        
        # Filtering the model dataframe to only include a particular bias type if specified
        if bias_type:
            model_dataframe = model_dataframe[model_dataframe['bias_type']==bias_type]

        # Getting all scores for this model
        all_scores = calculate_all_scores(model_dataframe, threshold)
        scores['neutral_score'].append(all_scores['neutral_score'])
        scores['bias_score'].append(all_scores['bias_score'])
        scores['nonbias_score'].append(all_scores['nonbias_score'])
        scores['stereotype_score'].append(all_scores['stereotype_score'])
        scores['antistereotype_score'].append(all_scores['antistereotype_score'])
        
    return scores

In [6]:
def get_bias_type_scores(model, threshold=0.05):
    
    model_dataframe = master_dataframe[master_dataframe['model_name']==model]
    
    scores = {
        'neutral_score' : [],
        'bias_score' : [],
        'nonbias_score' : [],
        'stereotype_score' : [],
        'antistereotype_score' : []
    }
    
    for bias_type in bias_types:
        bias_model_dataframe = model_dataframe[model_dataframe['bias_type']==bias_type]
        all_scores = calculate_all_scores(bias_model_dataframe, threshold)
        scores['neutral_score'].append(all_scores['neutral_score'])
        scores['bias_score'].append(all_scores['bias_score'])
        scores['nonbias_score'].append(all_scores['nonbias_score'])
        scores['stereotype_score'].append(all_scores['stereotype_score'])
        scores['antistereotype_score'].append(all_scores['antistereotype_score'])
    
    return scores

In [11]:
def slider_chart(list_of_models=all_models, score_type='neutral_score', bias_type=None, step_size=0.01, threshold_range=0.4):

    fig = go.Figure()

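    # Add traces, one for each slider step. Counting steps as integers keeps the
    # number of traces in line with the slider labels (i * step_size) below;
    # np.arange with a float step can return one element more or fewer than expected.
    n_steps = int(round(threshold_range / step_size))
    for i in range(n_steps + 1):

        # Assigning the Threshold
        threshold = i * step_size

        # Calculating scores of the requested type at this threshold
        scores = get_list_of_scores(list_of_models, score_type, threshold, bias_type)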

        fig.add_trace(
            go.Bar(
                visible=False,
                name='Score at Threshold ' + str(round(threshold*100, 2)), 
                x=[display_model_names[model] for model in list_of_models], 
                y=[score/100 for score in scores]))

    # Make the second trace (index 1) visible, matching the slider's initial position (active=1)
    fig.data[1].visible = True

    # Create and add slider
    steps = []
    label_format = "{:.0%}"
    for i in range(len(fig.data)):
        step = dict(
            method="update",
            args=[{"visible": [False] * len(fig.data)},
                  {"title": display_score_types[score_type] + "s for " + str(len(list_of_models)) + " Models at Threshold: " + label_format.format(i * step_size)}], # layout attribute
            label = label_format.format(i * step_size))
        step["args"][0]["visible"][i] = True  # Toggle i'th trace to "visible"
        steps.append(step)

    sliders = [dict(
        active=1,
        currentvalue={"prefix": "Threshold: "},
        pad={"t": 150},
        steps=steps
    )]

    fig.update_layout(
        sliders=sliders,
        title= display_score_types[score_type] + "s for " + str(len(list_of_models)) + " Models at Various Thresholds",
        yaxis_title="Scores",
        yaxis_tickformat = ".0%",
        xaxis_title="Language Models"
    )

    fig.update_yaxes(range=[0,1])

    fig.show()
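
The slider in `slider_chart` works by giving each step a visibility mask with exactly one `True` entry, so that moving the slider shows one trace and hides the rest. The sketch below reproduces only that step-building logic with plain dictionaries, so it runs without Plotly; `build_slider_steps` is a hypothetical helper and not part of the notebook.

```python
def build_slider_steps(n_traces, step_size=0.01, label_format="{:.0%}"):
    """Build one slider step per trace, each showing only its own trace."""
    steps = []
    for i in range(n_traces):
        visible = [False] * n_traces
        visible[i] = True  # only the i'th trace is shown at this step
        steps.append({
            "method": "update",
            "args": [{"visible": visible}],
            "label": label_format.format(i * step_size),
        })
    return steps

steps = build_slider_steps(n_traces=5)
for step in steps:
    print(step["label"], step["args"][0]["visible"])
```

Because the labels are computed as `i * step_size`, the number of traces added to the figure has to match the number of threshold values, otherwise the labels drift out of step with the data they describe.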

## Graphing Slider Bar Charts for Each Metric

### 1: Neutral Score

In [12]:
slider_chart(score_type='neutral_score')

### 2: Bias Score

In [16]:
slider_chart(score_type='bias_score')

### 3: Nonbias Score

In [17]:
slider_chart(score_type='nonbias_score')