## Debiasing Word Embeddings - Gender Bias

In this notebook, I debias word embeddings following the method described in the paper [1].

**Problem**: Embeddings trained on human-generated corpora have been shown to inherit strong gender stereotypes that reflect social constructs.

**Challenge**: Preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence.

[1] *Bolukbasi, T., Chang, K. W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 4356–4364.*

### Steps in this notebook

- [Load GloVe pre-trained vectors](#step1)
- [Solving word analogies](#step2)
- [Check gender bias in word vectors](#step3)
- [Bias neutralization](#step4)

### Libraries

In [23]:
# Importing important Libraries.
import numpy as np

## Step 1 - Load GloVe pre-trained vectors

<a id='step1'>Load GloVe pre-trained vectors </a>

The GloVe word embeddings are taken from Jeffrey Pennington, Richard Socher, and Christopher D. Manning (https://nlp.stanford.edu/projects/glove/).

We use the 50-dimensional vectors from the `glove.6B` release, which were pre-trained on Wikipedia 2014 and Gigaword 5. The model maps each word to a vector that captures semantic meaning.

#### Auxiliary functions

In [35]:
# This function returns the vocabulary and a Python dictionary containing the word embeddings.
def glove_vecs(glove_file):
    """Function Parameters: Path to glove vector.txt file"""

    # Set of all words in the vocabulary
    words = set()
    # Dictionary to store word:vector pair
    word_to_vec_map = {}

    with open(glove_file, encoding="utf8") as f:
        # Reading the file line by line; each line holds a word followed by its space-separated vector values.
        for line in f:
            try:
                # strip to remove surrounding whitespace | split into separate elements on whitespace
                line = line.strip().split()
                # First element is the word
                curr_word = line[0]
                # Remaining elements are the components of its vector
                curr_vec = np.array(line[1:], dtype=np.float64)
            except (IndexError, ValueError):
                continue  # badly formatted line (empty, or with non-numeric values)
            # Adding the word to the set and the word:vector pair to the dictionary
            words.add(curr_word)
            word_to_vec_map[curr_word] = curr_vec

    return words, word_to_vec_map

### Read GloVe pre-trained vectors

In [50]:
words, word_to_vec_map = glove_vecs('datasets/glove.6B.50d.txt') # Read GloVe pre-trained vectors as Python dictionary word_to_vec_map

The following have been loaded:
- `words`: set of words in the vocabulary.
- `word_to_vec_map`: dictionary mapping words to their GloVe vector representation.
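
To make the file format concrete, here is a minimal sketch that parses GloVe-style text from an in-memory string instead of the real file. The three sample lines and the names `sample`, `toy_words` and `toy_map` are made up for illustration; only the "word followed by space-separated floats" layout is taken from the loader above.

```python
import io

import numpy as np

# Hypothetical sample in GloVe text format: a word, then its vector components.
sample = io.StringIO(
    "king 0.5 0.1 -0.3\n"
    "queen 0.4 0.2 -0.1\n"
    "broken_line 0.1 not_a_number\n"
)

toy_words = set()
toy_map = {}
for raw in sample:
    parts = raw.strip().split()
    try:
        vec = np.array(parts[1:], dtype=np.float64)
    except ValueError:
        continue  # skip lines whose values are not numeric
    toy_words.add(parts[0])
    toy_map[parts[0]] = vec

print(sorted(toy_words))      # ['king', 'queen']
print(toy_map["king"].shape)  # (3,)
```

The malformed third line is skipped, so the vocabulary and the dictionary stay in sync.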

## Step 2 - Solving word analogies

<a id='step2'>Solving word analogies </a>

**Problem**: Given an analogy such as man : woman :: boy : ?, find the word in the vocabulary that best fits in place of '?'. Candidates are compared using cosine similarity:

$$
\text{similarity} = \cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{\lVert\vec{a}\rVert \, \lVert\vec{b}\rVert}
$$

where $\vec{a}$ and $\vec{b}$ are the vectors being compared. Cosine similarity ranges from $-1$ (opposite directions) to $1$ (same direction) and ignores vector magnitude. It is not a proper distance metric: the derived cosine distance $1 - \cos(\theta)$ violates the triangle inequality. However, for certain problems (as shown below) it is a solid choice.
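
The properties of cosine similarity can be checked numerically on small vectors. The sketch below is only an illustration: the helper `cos_sim` and the 2-D vectors are assumptions made here, not part of the notebook.

```python
import numpy as np


def cos_sim(a, b):
    """Cosine of the angle between a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
c = np.array([0.0, 1.0])

# Rescaling a vector does not change the similarity.
print(np.isclose(cos_sim(a, b), cos_sim(10 * a, b)))  # True

# Opposite directions give -1.
print(cos_sim(a, -a))  # -1.0

# Cosine distance 1 - cos(theta) can break the triangle inequality:
# going a -> b -> c is "shorter" than going a -> c directly.
d_ab = 1 - cos_sim(a, b)
d_bc = 1 - cos_sim(b, c)
d_ac = 1 - cos_sim(a, c)
print(d_ac > d_ab + d_bc)  # True
```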

#### Auxiliary functions

In [38]:
# Finding cosine similarity between 2 vectors.
def cosine_similarity(a , b ):
    """Function Parameters: a , b are 2 different vectors whose cosine similarity is to be found."""
    
    cos_similarity = np.dot(a , b) / (np.linalg.norm(a) * np.linalg.norm(b))
        
    return cos_similarity

In [39]:
# Finding distance between 2 vectors using the L2 norm.
def L2distance(a , b):
    """Function Parameters: a , b are 2 different vectors whose Euclidean (L2) distance is to be found."""
    
    l2_distance = np.linalg.norm(a - b)
    
    return l2_distance

In [40]:
def word_analogy(a, b, c, word_to_vec_map):
    """Function Parameters: a , b , c are 3 words
       word_to_vec_map: dictionary of word vectors.
       Returns the word w that best completes the analogy a:b :: c:w."""
    
    # Converting a, b and c to lower case.
    a = a.lower()
    b = b.lower()
    c = c.lower()
    
    # Finding the vectors for the given words:
    a_vec = word_to_vec_map[a]
    b_vec = word_to_vec_map[b]
    c_vec = word_to_vec_map[c]
    
    # Getting all the words from the dictionary
    words = word_to_vec_map.keys()
    
    # Setting maximum cosine similarity to negative infinity and the best word to None
    maximum_cosine_similarity = -np.inf
    best_word = None
    
    # Looping over all the words to find the best fit for the analogy
    for w in words:
        
        # Skip a, b and c
        if w in [a,b,c]:
            continue
            
        # Comparing the direction a -> b with the direction c -> w
        cos_similarity = cosine_similarity(b_vec - a_vec , word_to_vec_map[w] - c_vec ) 
        
        if cos_similarity > maximum_cosine_similarity:
            # Overwriting maximum_cosine_similarity 
            maximum_cosine_similarity = cos_similarity
            # Saving the best word giving maximum_cosine_similarity
            best_word = w
    
    
    return best_word

### Compute word Analogy

In [41]:
word_analogy('italy', 'italian', 'america', word_to_vec_map)

'american'
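
The loop in `word_analogy` scores one candidate at a time, which is slow over a large vocabulary. As a rough sketch, the same search can be done with one matrix operation. The function name `word_analogy_vectorized`, the zero-norm guard and the toy 2-D vectors are assumptions made for this illustration; the real GloVe vectors are not used.

```python
import numpy as np


def word_analogy_vectorized(a, b, c, word_to_vec_map):
    """Return the word w maximizing cos(e_b - e_a, e_w - e_c), excluding a, b and c."""
    a, b, c = a.lower(), b.lower(), c.lower()
    candidates = [w for w in word_to_vec_map if w not in (a, b, c)]
    target = word_to_vec_map[b] - word_to_vec_map[a]
    # One row per candidate: e_w - e_c
    diffs = np.stack([word_to_vec_map[w] for w in candidates]) - word_to_vec_map[c]
    norms = np.linalg.norm(diffs, axis=1) * np.linalg.norm(target)
    # Guard against a zero denominator; such candidates get similarity 0.
    norms = np.where(norms == 0, np.inf, norms)
    sims = diffs @ target / norms
    return candidates[int(np.argmax(sims))]


# Hypothetical toy embeddings: second coordinate plays the role of "gender".
toy_vectors = {
    "man": np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "boy": np.array([2.0, 0.0]),
    "girl": np.array([2.0, 1.0]),
    "apple": np.array([0.0, -1.0]),
}

print(word_analogy_vectorized("man", "woman", "boy", toy_vectors))  # girl
```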

## Step 3 - Check gender bias in Word Vectors

We look for gender-stereotypical pairs.

<a id='step3'>Check gender bias in word vectors </a>

A vector $g = e_{woman}-e_{man}$ is computed, where $e_{woman}$ is the word vector for the word *woman* and $e_{man}$ is the word vector for the word *man*. The resulting vector $g$ roughly encodes the concept of "gender". In the code below it is stored in the variable `x`.
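
The paper [1] combines several definitional pairs rather than relying on a single one. As a rough illustration only, the sketch below averages the difference vectors of a few pairs and scales the result to unit length. Plain averaging is an assumption made here and is not the paper's procedure; the toy 3-D vectors and the names `average_direction`, `toy_pairs_map` and `g_avg` are hypothetical.

```python
import numpy as np


def average_direction(pairs, word_to_vec_map):
    """Mean of (first - second) difference vectors, scaled to unit length."""
    diffs = [word_to_vec_map[first] - word_to_vec_map[second] for first, second in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)


# Hypothetical toy embeddings in which the second coordinate encodes gender.
toy_pairs_map = {
    "woman": np.array([1.0, 1.0, 0.0]),
    "man": np.array([1.0, -1.0, 0.0]),
    "girl": np.array([0.0, 1.0, 1.0]),
    "boy": np.array([0.0, -1.0, 1.0]),
}

g_avg = average_direction([("woman", "man"), ("girl", "boy")], toy_pairs_map)
print(g_avg)  # [0. 1. 0.]
```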

In [54]:
x = word_to_vec_map['woman'] - word_to_vec_map['man']
x

array([-0.087144  ,  0.2182    , -0.40986   , -0.03922   , -0.1032    ,
        0.94165   , -0.06042   ,  0.32988   ,  0.46144   , -0.35962   ,
        0.31102   , -0.86824   ,  0.96006   ,  0.01073   ,  0.24337   ,
        0.08193   , -1.02722   , -0.21122   ,  0.695044  , -0.00222   ,
        0.29106   ,  0.5053    , -0.099454  ,  0.40445   ,  0.30181   ,
        0.1355    , -0.0606    , -0.07131   , -0.19245   , -0.06115   ,
       -0.3204    ,  0.07165   , -0.13337   , -0.25068714, -0.14293   ,
       -0.224957  , -0.149     ,  0.048882  ,  0.12191   , -0.27362   ,
       -0.165476  , -0.20426   ,  0.54376   , -0.271425  , -0.10245   ,
       -0.32108   ,  0.2516    , -0.33455   , -0.04371   ,  0.01258   ])

#### Gender-specific names

In [61]:
# Let us see the similarity between some gender-specific names and the vector x
names = ['ronaldo' , 'jack', 'marie' , 'priya']
for name in names:
    print(name + ': ' + str(cosine_similarity(word_to_vec_map[name], x)))

print("\nWe see that male names have negative similarity and female names have positive similarity. That's OK because\nthe vector x is woman - man")

ronaldo: -0.31244796850329437
jack: -0.16566299861636427
marie: 0.3155979353960729
priya: 0.17632041839009396
 
We see that male names have negative similarity and female names have positive similarity. That's OK because
the vector x is woman - man


#### Professions and other common words

In [62]:
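# Let us see the similarity between x and some words that should mostly be gender-neutral
# (grandfather and grandmother are gender-specific by definition and serve as a reference).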
common_words = ['technology' , 'engineer' , 'doctor','grandfather','grandmother','literature']
for word in common_words:
    print(word + ': ' + str(cosine_similarity(word_to_vec_map[word], x)))

print("\nWe see that words like technology , engineer are inclined towards man while literature is inclined towards woman.")

technology: -0.13193732447554293
engineer: -0.0803928049452407
doctor: 0.11895289410935043
grandfather: 0.02362979845086787
grandmother: 0.38460143637418603
literature: 0.06472504433459927

We see that words like technology , engineer are inclined towards man while literature is inclined towards woman.


## Step 4 - Bias Neutralization

<a id='step4'>Bias Neutralization </a>

In [45]:
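def neutralize(word , x , word_to_vec_map):
    """Function Parameters: word is the word to neutralize
       x: vector of the bias direction
       word_to_vec_map: dictionary of word vectors."""
    # Extracting word vector from the dictionary.
    w = word_to_vec_map[word]
    
    # Finding the component of w along the bias direction (projection of w onto x)
    bias_component = np.dot(w,x) * x /np.square((np.linalg.norm(x)))
    
    # Removing that component leaves a vector orthogonal to x
    w_unbiased = w - bias_component
    
    return w_unbiased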

### Neutralize a word

In [46]:
w = "literature"

print("cosine similarity between " + w + " and x, before neutralizing: ", cosine_similarity(word_to_vec_map[w], x))

e_unbiased = neutralize(w, x, word_to_vec_map)

print("cosine similarity between " + w + " and x, after neutralizing: ", cosine_similarity(e_unbiased, x))

cosine similarity between literature and x, before neutralizing:  0.06472504433459927
cosine similarity between literature and x, after neutralizing:  2.9721586407425153e-17
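
The value of about `3e-17` after neutralizing is floating-point rounding: subtracting the projection onto `x` leaves a vector that is orthogonal to `x` up to machine precision. A minimal check on random stand-in vectors (the names `direction`, `vector`, `projection` and `residual` are hypothetical and no GloVe data is used):

```python
import numpy as np

rng = np.random.default_rng(0)
direction = rng.normal(size=50)  # stand-in for the gender vector
vector = rng.normal(size=50)     # stand-in for a word vector

# Projection of vector onto direction, and what remains after removing it
projection = np.dot(vector, direction) / np.dot(direction, direction) * direction
residual = vector - projection

# The residual has (numerically) no component along direction.
print(np.isclose(np.dot(residual, direction), 0.0))  # True

# The two parts add back up to the original vector.
print(np.allclose(residual + projection, vector))  # True
```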


### Equalize word pairs

In [47]:
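def equalize(w1, w2, bias_axis, word_to_vec_map):
    """Function Parameters: w1 , w2 are the 2 words of a gender-specific pair (eg. man , woman)
       bias_axis: vector of the bias direction
       word_to_vec_map: dictionary of word vectors."""
    # Extracting vectors from dictionary.
    w1_vec = word_to_vec_map[w1]
    
    w2_vec = word_to_vec_map[w2]
    
    # Midpoint of the pair. The equations implemented below follow the equalization step of the paper [1]
    mu = (w1_vec + w2_vec) / 2
    
    # Projection of mu onto bias_axis, and its component orthogonal to bias_axis
    mu_B = np.dot(mu,bias_axis) * bias_axis / np.square(np.linalg.norm(bias_axis))
    mu_orth = mu - mu_B
    
    # Projections of the two word vectors onto bias_axis
    w1_vecB = np.dot(w1_vec,bias_axis) * bias_axis / np.square(np.linalg.norm(bias_axis))
    w2_vecB = np.dot(w2_vec,bias_axis) * bias_axis / np.square(np.linalg.norm(bias_axis))
    
    # Rescaling the bias components around mu_B.
    # Note: np.absolute is element-wise, so the division below is element-wise too;
    # the paper [1] divides by a Euclidean norm instead, which would make e1 and e2 exactly symmetric.
    w1_vecB_corrected = np.sqrt(np.absolute(1 - np.square(np.linalg.norm(mu_orth)))) * (w1_vecB - mu_B) / np.absolute((w1_vec - mu_orth) - mu_B) 
    w2_vecB_corrected = np.sqrt(np.absolute(1 - np.square(np.linalg.norm(mu_orth)))) * (w2_vecB - mu_B) / np.absolute((w2_vec - mu_orth) - mu_B)
    
    # Adding back the shared orthogonal component
    e1 = w1_vecB_corrected  + mu_orth
    e2 = w2_vecB_corrected  + mu_orth
    
    return e1 , e2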

In [48]:
print("cosine similarities before equalizing:")
print("cosine_similarity(word_to_vec_map[\"man\"], gender) = ", cosine_similarity(word_to_vec_map["man"], x))
print("cosine_similarity(word_to_vec_map[\"woman\"], gender) = ", cosine_similarity(word_to_vec_map["woman"], x))
print()
e1, e2 = equalize("man", "woman", x, word_to_vec_map)
print("cosine similarities after equalizing:")
print("cosine_similarity(e1, gender) = ", cosine_similarity(e1, x))
print("cosine_similarity(e2, gender) = ", cosine_similarity(e2, x))

cosine similarities before equalizing:
cosine_similarity(word_to_vec_map["man"], gender) =  -0.11711095765336832
cosine_similarity(word_to_vec_map["woman"], gender) =  0.3566661884627037

cosine similarities after equalizing:
cosine_similarity(e1, gender) =  -0.7165727525843935
cosine_similarity(e2, gender) =  0.7396596474928909
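
The two similarities after equalizing are close but not exactly opposite. The `equalize` function divides by an element-wise `np.absolute`, whereas the paper [1] divides the bias component by its Euclidean norm. Below is a minimal sketch of the norm-based variant, run on random stand-in vectors rather than GloVe data. The name `equalize_norm` and the test vectors are hypothetical, and the absolute value under the square root is kept from the notebook as an assumption so that non-unit vectors do not produce NaN.

```python
import numpy as np


def equalize_norm(w1_vec, w2_vec, bias_axis):
    """Equalize two vectors around bias_axis, dividing by a Euclidean norm."""
    axis_sq = np.dot(bias_axis, bias_axis)
    mu = (w1_vec + w2_vec) / 2
    mu_B = np.dot(mu, bias_axis) / axis_sq * bias_axis
    mu_orth = mu - mu_B
    w1_B = np.dot(w1_vec, bias_axis) / axis_sq * bias_axis
    w2_B = np.dot(w2_vec, bias_axis) / axis_sq * bias_axis
    # abs() kept from the notebook (assumption): inputs here are not unit-length
    scale = np.sqrt(np.abs(1 - np.dot(mu_orth, mu_orth)))
    e1 = scale * (w1_B - mu_B) / np.linalg.norm(w1_B - mu_B) + mu_orth
    e2 = scale * (w2_B - mu_B) / np.linalg.norm(w2_B - mu_B) + mu_orth
    return e1, e2


def cos_between(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(1)
axis = rng.normal(size=50)  # stand-in for the gender vector
v1 = rng.normal(size=50)    # stand-ins for the two word vectors
v2 = rng.normal(size=50)

eq1, eq2 = equalize_norm(v1, v2, axis)

# The equalized vectors have the same length and mirror each other along axis.
print(np.isclose(np.linalg.norm(eq1), np.linalg.norm(eq2)))        # True
print(np.isclose(cos_between(eq1, axis), -cos_between(eq2, axis)))  # True
```

With this variant the two cosine similarities always have equal magnitude and opposite sign, because both vectors share `mu_orth` and differ only in the sign of their component along the axis.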
