# Introduction

In this lab we explore approaches for reducing bias in word embeddings.  This lab is closely based on the article "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" by Bolukbasi et al., and is adapted from a lab in the Deep Learning course on Coursera.

### Module imports

In [1]:
%tensorflow_version 1.x

from keras import layers
from keras import models
from keras import optimizers
import os
import time
from google.colab import drive
import matplotlib.pyplot as plt
import pickle
import math
import numpy as np

TensorFlow 1.x selected.


Using TensorFlow backend.


## Set up -- getting the data and word embeddings
I have shared the necessary files with you in the same Google Drive folder we used for lab 9.  To get the data into Colab, run the code cell below and click the link that is displayed.  A new browser tab will open where you authorize Colab to access your Google Drive.  Then copy the authorization code that is displayed (a sequence of numbers and letters) and paste it into the input box that appears below the code cell.


In [2]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# exist_ok=True makes this cell safe to re-run if the directory already exists
os.makedirs("/content/stat344ne_glove/", exist_ok=True)

In [0]:
!unzip -uq "/content/drive/My Drive/stat344ne_imdb/glove.6B.50d.txt.zip" -d "/content/stat344ne_glove/glove/"

### Load word embeddings

We are working here with the GloVe (**Glo**bal **Ve**ctors for word representation) word embeddings.  The code for loading them is the same as in labs 9 and 10.

In [5]:
glove_dir = "/content/stat344ne_glove/glove"

embeddings_index = {}
# each line of the file is a word followed by its 50 embedding coefficients
with open(os.path.join(glove_dir, 'glove.6B.50d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))

Found 400000 word vectors.
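To make the file format concrete, here is a minimal sketch of the same parsing logic applied to two made-up lines held in memory.  The words, the coefficient values, and the name `toy_index` are illustrative assumptions; real lines in `glove.6B.50d.txt` have 50 coefficients rather than 3.

```python
import io
import numpy as np

# Two made-up lines in GloVe text format: a token followed by its coefficients.
# (Hypothetical values; only 3 coefficients per word to keep the example short.)
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.5 0.0 -1.5\n")

toy_index = {}
for line in sample:
    values = line.split()
    # first field is the word, the remaining fields are the coefficients
    toy_index[values[0]] = np.asarray(values[1:], dtype='float32')

print(toy_index['cat'])
print(toy_index['cat'].shape)
```

Each entry is a one-dimensional array, which is why later functions reshape embeddings explicitly when they need column vectors.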


### Cosine Similarity

We're going to explore how bias shows up in word embeddings as evidenced by analogies.  Here's a utility function to calculate cosine similarity, from lab 9.

In [0]:
def cos_similarity(v, w):
  '''
  Calculate cosine similarity of vectors v and w

  Arguments:
   - v: vector of shape (d,); a column vector of shape (d, 1) also works,
        in which case the result is a (1, 1) array
   - w: vector with the same shape as v
  
  Return:
   - cosine similarity of v and w
  '''
  # dot product of v and w, divided by the product of their norms
  result = np.dot(v.T, w) / (np.linalg.norm(v) * np.linalg.norm(w))
  return result

Based on this function, we found analogies by examining the cosine similarity of differences between pairs of word embeddings.  For example, the analogy "paris is to france as rome is to italy" is reflected in the large cosine similarity between the difference $(e_{paris} - e_{france})$ and the difference $(e_{rome} - e_{italy})$:

In [7]:
cos_similarity(
    embeddings_index.get('paris') - embeddings_index.get('france'),
    embeddings_index.get('rome') - embeddings_index.get('italy')
)

0.67514807
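As a quick sanity check on how to read these values, here is a small sketch with made-up two-dimensional vectors.  The helper `cos_sim` restates the same formula for one-dimensional arrays so the cell runs on its own; the vectors are illustrative assumptions, not word embeddings.

```python
import numpy as np

def cos_sim(v, w):
    # same formula as cos_similarity, restated so this cell is self-contained
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

a = np.array([1.0, 0.0])

same_direction = cos_sim(a, np.array([3.0, 0.0]))   # parallel vectors: 1
perpendicular = cos_sim(a, np.array([0.0, 2.0]))    # orthogonal vectors: 0
opposite = cos_sim(a, np.array([-0.5, 0.0]))        # opposite directions: -1

print(same_direction, perpendicular, opposite)
```

Cosine similarity depends only on direction, not length, so a value of about 0.68 for the two difference vectors above says they point in broadly similar directions.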

#### 1. Function to complete analogies

Suppose we want to find analogies.  For example, what word completes the analogy "Paris is to France as Sydney is to \_\_\_\_\_"?  More generally, we will seek the best word to fill in the blank in the analogy "\<`word_a`\> is to \<`word_b`\> as \<`word_c`\> is to \_\_\_\_\_".

One way to find an answer is with the following algorithm:

1. Initialize `best_word = None` and `best_score = -100` (any number less than -1 would do)
2. For each `word_d` in the `embeddings_index`,

  a. If the `word_d` is one of `word_a`, `word_b`, or `word_c`, `continue` (skip the rest of this iteration of the loop; we don't want to complete the analogy using one of the input words.)

  b. Compute `current_score` as the cosine similarity of $(e_{<word\_a>} - e_{<word\_b>})$ and $(e_{<word\_c>} - e_{<word\_d>})$

  c. If the `current_score` is greater than `best_score`, set `best_word = word_d` and `best_score = current_score`.

3. Return `best_word`

Fill in the missing steps in the `complete_analogy` function below to implement this algorithm:

In [0]:
def complete_analogy(word_a, word_b, word_c, embeddings_index):
    """
    Performs the word analogy task as explained above: a is to b as c is to ____. 
    
    Arguments:
     - word_a: a word, string
     - word_b: a word, string
     - word_c: a word, string
     - embeddings_index: dictionary that maps words to their corresponding vectors. 
    
    Returns:
    best_word: the word such that e_a - e_b is close to e_c - e_best_word, as
      measured by cosine similarity
    """
    
    # convert words to lowercase
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    
    # Get the word embeddings e_a, e_b and e_c (≈1-3 lines)
    e_a = embeddings_index.get(word_a)
    e_b = embeddings_index.get(word_b)
    e_c = embeddings_index.get(word_c)
    
    # Initialize
    words = embeddings_index.keys() # all possible words in the embeddings index
    best_score = -100               # Initialize best_score to a large negative number
    best_word = None                # Initialize best_word to None

    # loop over the whole word vector set
    for word_d in words:        
        # to avoid best_word being one of the input words, skip the input words
        # there are many ways to do this; the simplest is to compare word_d to
        # each of the other words one at a time with ==, and check all three
        # using or.
        if word_d == word_a or word_d == word_b or word_d == word_c:
            continue
        
        # Find the embedding for word_d
        e_d = embeddings_index.get(word_d)

        # Compute cosine similarity between the vector (e_a - e_b) and the
        # vector (e_c - e_d)
        current_score = cos_similarity(e_a - e_b, e_c - e_d)
        
        # If the current_score is more than the best_score seen so far,
        # update the best score and best word
        if current_score > best_score:
            best_score = current_score
            best_word = word_d
    
    return best_word
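The loop above calls `cos_similarity` once per word, which can be slow over 400,000 words.  As a hedged sketch of one possible alternative (not part of the original lab), the same scores can be computed for all candidate words at once with NumPy array operations.  The function name `complete_analogy_vectorized` and the toy two-dimensional embeddings are assumptions made for illustration, and the sketch assumes no candidate has exactly the same embedding as `word_c` (which would give a zero-length difference).

```python
import numpy as np

def complete_analogy_vectorized(word_a, word_b, word_c, index):
    # candidate words: everything except the three input words
    words = [w for w in index if w not in (word_a, word_b, word_c)]
    E = np.stack([index[w] for w in words])        # shape (n_words, d)

    target = index[word_a] - index[word_b]         # e_a - e_b, shape (d,)
    diffs = index[word_c] - E                      # e_c - e_d for every d

    # cosine similarity of each row of diffs with target
    norms = np.linalg.norm(diffs, axis=1) * np.linalg.norm(target)
    scores = diffs @ target / norms

    # argmax returns the first maximum, matching the strict > in the loop
    return words[int(np.argmax(scores))]

# Hypothetical toy embeddings, chosen so the answer is easy to verify by hand
toy = {
    'paris': np.array([1.0, 1.0]),
    'france': np.array([1.0, 0.0]),
    'sydney': np.array([3.0, 1.0]),
    'australia': np.array([3.0, 0.0]),
    'banana': np.array([0.0, 5.0]),
}

answer = complete_analogy_vectorized('paris', 'france', 'sydney', toy)
print(answer)
```

In the toy data, `paris - france` and `sydney - australia` are both `[0, 1]`, so their cosine similarity is 1 and `'australia'` wins over `'banana'`.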

#### 2. Analogy for locations
Use your function to complete the analogy "Paris is to France as Sydney is to \_\_\_\_\_."

In [9]:
complete_analogy('paris', 'france', 'sydney', embeddings_index)

'australia'

#### 3. Analogy for professions
Let's look at the problematic example of gendered professions we saw in the lecture: use your function to complete the analogy "man is to doctor as woman is to \_\_\_\_\_."  Also compute the cosine similarity for the word embedding differences in this analogy.

In [10]:
# complete the analogy
complete_analogy('man', 'doctor', 'woman', embeddings_index)

'nurse'

In [11]:
# find the cosine similarity for the word embedding differences in
# the completed analogy.
cos_similarity(
    embeddings_index.get('man') - embeddings_index.get('doctor'),
    embeddings_index.get('woman') - embeddings_index.get('nurse')
)

0.6831788

## A couple of useful features in Python
For the next problem, I want to encourage you to use two useful features in Python: list comprehensions, and simultaneous loops over multiple lists with `zip`.  Here is a brief introduction to those two features:

### List comprehensions

A list comprehension is basically a way to easily create a list with a for loop.  Here's an example of creating a list of the squares of integers from 0 to 9.

In [12]:
# Method 1: what you might usually do

# create an empty list
result1 = []

# in a for loop, append the results one at a time
for i in range(10):
  result1.append(i**2)

result1

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [13]:
# Method 2: using a list comprehension
# the format is [<quantity for one entry of the list> for <loop specification>]
result2 = [i**2 for i in range(10)]
result2

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

### Iterating over multiple lists with `zip`

Suppose I have two lists of length 5, and I want to write a for loop that processes the corresponding entries.  For example, the following code concatenates corresponding entries of two lists and prints the results out:

In [14]:
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['v', 'w', 'x', 'y', 'z']

for i in range(5):
  print(list1[i] + "_" + list2[i])

a_v
b_w
c_x
d_y
e_z


The function `zip` lets us do this without introducing the index variable `i`, by directly referencing the letters we want to work with.  This doesn't necessarily make your code shorter, but it can make it more readable and easier to understand.  Behind the scenes, `zip` returns an iterator that yields tuples of corresponding values from its arguments.

In [15]:
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['v', 'w', 'x', 'y', 'z']

for letter1, letter2 in zip(list1, list2):
  print(letter1 + "_" + letter2)

a_v
b_w
c_x
d_y
e_z
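One behavior of `zip` worth keeping in mind: when its arguments have different lengths, it stops at the end of the shortest one without raising an error.  The short sketch below uses made-up lists (`short` and `longer` are illustrative names) to show this.

```python
short = ['a', 'b', 'c']
longer = [1, 2, 3, 4, 5]

# zip stops once the shortest input is exhausted; 4 and 5 are silently dropped
pairs = list(zip(short, longer))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```

This matters when pairing up word lists: if one list is accidentally shorter than the other, the extra words are ignored rather than reported.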


### Putting together list comprehensions and zip
These ideas can also be combined, as in the following example:

In [16]:
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['v', 'w', 'x', 'y', 'z']

# use a list comprehension combined with zip to iterate over the two lists at once
result  = [letter1 + "_" + letter2 for letter1, letter2 in zip(list1, list2)]
result

['a_v', 'b_w', 'c_x', 'd_y', 'e_z']

#### 4. Identify a "gender direction" in the word embedding space.

In the original article, this was done by:

1. Finding the differences in embeddings between many gendered word pairs such as "woman" and "man", "mother" and "father", and so on.
2. Conducting Principal Component Analysis (PCA) to find a single direction in the word embedding space that captures most of the variability in these differences in word pairs.

Not everyone in this class has seen PCA, so we will simplify this second step by taking the mean instead; this will work nearly as well.  Here's our version of this procedure:

1. Create a list of the differences in embeddings between multiple gendered word pairs such as "woman" and "man", "mother" and "father", and so on.
2. Use `np.mean()` to compute the average across all word pair differences in the list.

For step 1, we will use the word pairs used in the article by Bolukbasi et al. (Figure 2), other than the pair 'Mary' and 'John', which are not both included in the GloVe word embedding.  These are defined in the lists `female_words` and `male_words` below.

In [17]:
# Word pairs used to define the gender direction in Bolukbasi et al.
female_words = ['she', 'her', 'woman', 'herself', 'daughter', 'mother',
  'gal', 'girl', 'female']
male_words = ['he', 'his', 'man', 'himself', 'son', 'father',
  'guy', 'boy', 'male']

# Step 1. Create a list with the word embedding differences between
# corresponding word pairs.  For example, the first component of the list should
# be the difference between the embeddings for 'she' and 'he'.  There are many
# ways to do this, but the easiest is with a list comprehension based on the
# zipped female_words and male_words.  No matter how you do it, you will need to
# call embeddings_index.get() twice in each iteration of the loop (once for a
# 'female' word and once for a corresponding 'male' word).
word_diffs = [embeddings_index.get(f) - embeddings_index.get(m) for f, m in zip(female_words, male_words)]

# Step 2. Call np.mean() on your list of word differences
# You will need to specify axis = 0
g = np.mean(word_diffs, axis = 0)
g

array([ 0.14136212,  0.29730654, -0.10995344,  0.09161666, -0.20763223,
        0.6295574 ,  0.1969412 ,  0.02930557,  0.5208999 , -0.09071961,
        0.08637322, -0.47587886,  0.5303101 ,  0.19009914,  0.15338446,
        0.04164857, -0.47131667,  0.03828232,  0.51945066,  0.09632424,
        0.3348771 ,  0.47196   ,  0.18238276,  0.28140032,  0.21842666,
        0.25748554, -0.07429378,  0.17835577, -0.11268745, -0.29692107,
       -0.37787175,  0.20825645,  0.03342777,  0.06117866, -0.16547376,
       -0.12674703, -0.16060212, -0.18647201,  0.16859788, -0.27167857,
       -0.06561244, -0.24910797,  0.5795281 , -0.44391817,  0.15341921,
       -0.28378934,  0.28439152, -0.23881333,  0.08010322, -0.02405944],
      dtype=float32)
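To give a feel for why the mean can stand in for a principal direction, here is a rough sketch on synthetic data.  Everything in it is an assumption made for illustration: the made-up direction `true_dir`, the noise level, and the use of the leading right singular vector of the uncentered difference matrix as a simplified stand-in for PCA (it is not claimed to be the exact procedure of the article).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "pair differences": 9 vectors in 5 dimensions that mostly point
# along one hypothetical direction, with a little noise added.
true_dir = np.array([1.0, 0.5, 0.0, -0.5, 0.0])
toy_diffs = np.array([s * true_dir for s in np.linspace(0.8, 1.2, 9)])
toy_diffs = toy_diffs + 0.05 * rng.standard_normal(toy_diffs.shape)

# Direction from the mean, as in this lab
g_mean = toy_diffs.mean(axis=0)

# Direction from the leading right singular vector (simplified PCA stand-in)
_, _, vt = np.linalg.svd(toy_diffs, full_matrices=False)
g_svd = vt[0]

cos = np.dot(g_mean, g_svd) / (np.linalg.norm(g_mean) * np.linalg.norm(g_svd))
# the sign of a singular vector is arbitrary, so compare the absolute value
print(abs(cos))
```

When the differences largely agree in direction, as in this toy data, the two approaches give nearly parallel vectors.  They can disagree more when the differences point in several distinct directions.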

#### 5. Here are the cosine similarities of several names with the gender direction vector g you obtained in part 4.  Explain what a positive and negative cosine similarity indicates, and why.

In [18]:
name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle',
  'reza', 'katy', 'yasmin']

for w in name_list:
    print (w, cos_similarity(embeddings_index.get(w), g))

john -0.42817587
marie 0.2553097
sophie 0.35658687
ronaldo -0.3242363
priya 0.2692463
rahul -0.15097639
danielle 0.34718466
reza -0.10132656
katy 0.30913565
yasmin 0.2528625


Since we calculated the gender direction vector as the mean of the differences of word embeddings of 'female' words minus 'male' words, a positive cosine similarity indicates a word pointing in the direction of 'female' words and a negative cosine similarity indicates a word pointing in the direction of 'male' words.

#### 6. Compute the cosine similarities of your gender direction vector g and the words 'nurse' and 'doctor'.

In [19]:
cos_similarity(embeddings_index.get('nurse'), g)

0.2858767

In [20]:
cos_similarity(embeddings_index.get('doctor'), g)

0.003048606

#### 7. The model has learned a clear association between the word 'nurse' and gender, which we may prefer to neutralize.  Recall that this is done by taking the original word embedding and subtracting its orthogonal projection onto the gender direction vector:

$$e^{corrected} = e^{original} - \frac{g g^T}{g^T g}e^{original}$$

Complete the functions below to implement the neutralize method.  Because we will need the orthogonal projection operation again below, I've pulled that step out into a separate function.

In [0]:
def orth_proj(e, g):
  '''
  Arguments:
   - e: a column vector
   - g: a column vector
  
  Returns:
   - the orthogonal projection of e onto the subspace spanned by g
  '''
  # Enforce that e and g are column vectors.  This prevents bugs that may occur
  # if one of them is a row vector or has only one dimension (e.g. shape (50,))
  e = e.reshape((e.shape[0], 1))
  g = g.reshape((g.shape[0], 1))

  # Calculate the orthogonal projection.
  # You will need to call np.dot() multiple times.
  # You can split this over multiple lines if you prefer (for example, you might
  # calculate the projection matrix P first).
  proj = np.dot(np.dot(g, g.T) / np.dot(g.T, g), e)

  # return
  return proj



def neutralize(e, g):
  '''
  Neutralize the embedding e along the subspace defined by the vector g

  Arguments:
   - e: a word embedding to neutralize
   - g: a vector defining a bias subspace
  
  Returns:
   - e minus the orthogonal projection of e onto the subspace spanned by g
  '''
  # Enforce that e and g are column vectors.  This prevents bugs that may occur
  # if one of them is a row vector or has only one dimension (e.g. shape (50,))
  e = e.reshape((e.shape[0], 1))
  g = g.reshape((g.shape[0], 1))

  # Calculate neutralized result as the difference between the original vector e
  # and its orthogonal projection onto g (you should call orth_proj).
  e_corrected = e - orth_proj(e, g)

  # return
  return e_corrected[:,0]
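A small worked example may help check the projection formula by hand.  The sketch below uses made-up two-dimensional vectors (`g_toy` and `e_toy` are illustrative assumptions) and writes the projection with one-dimensional arrays, where $\frac{g g^T}{g^T g} e$ reduces to $\frac{g^T e}{g^T g} g$.

```python
import numpy as np

g_toy = np.array([2.0, 0.0])   # hypothetical bias direction along the first axis
e_toy = np.array([3.0, 4.0])   # hypothetical embedding

# orthogonal projection of e_toy onto g_toy: (g.e / g.g) * g
proj = (np.dot(g_toy, e_toy) / np.dot(g_toy, g_toy)) * g_toy   # [3., 0.]

# neutralized vector: what is left after removing the component along g_toy
e_neutral = e_toy - proj                                        # [0., 4.]

print(proj, e_neutral, np.dot(e_neutral, g_toy))
```

The neutralized vector has a dot product of 0 with `g_toy`, so its cosine similarity with the bias direction is 0, while its component orthogonal to `g_toy` is unchanged.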

#### 8. Call the function you defined above to neutralize the original word embedding for 'nurse'.  Calculate the cosine similarity for the updated embedding vector and the gender direction g.

In [22]:
e_original = embeddings_index.get('nurse')
e_corrected = neutralize(e_original, g)
cos_similarity(e_corrected, g)

9.369015e-09

#### 9. Update the embedding for nurse with the corrected embedding.  Then call the `complete_analogy` function again with the updated `embeddings_index` to complete the analogy "Man is to doctor as woman is to \_\_\_\_\_\_\_."

Note that because of how we defined the `complete_analogy` function above, it is not allowed to return 'doctor'.

In [0]:
embeddings_index['nurse'] = e_corrected

In [24]:
complete_analogy('man', 'doctor', 'woman', embeddings_index)

'physician'

#### 10. We now turn to our second example from lecture: gender association among the words 'babysit', 'grandmother', and 'grandfather'. Compute the cosine similarity among the following vectors (you will call `cos_similarity` 3 times, once for each bullet point).

 * $e_{babysit}$ and the gender direction vector $g$
 * $e_{babysit}$ and $e_{grandmother}$
 * $e_{babysit}$ and $e_{grandfather}$

In [25]:
cos_similarity(embeddings_index.get('babysit'), g)

0.2294906

In [26]:
cos_similarity(embeddings_index.get('babysit'), embeddings_index.get('grandmother'))

0.3096933

In [27]:
cos_similarity(embeddings_index.get('babysit'), embeddings_index.get('grandfather'))

0.14852814

From the first comparison above, we see that the model associates the word 'babysit' with the female direction in the word embedding space.  If we want to remove this association, we would do so with a neutralize step.  

From the next two comparisons, we see that the model associates the word 'babysit' with both 'grandmother' and 'grandfather', but the association with 'grandmother' is stronger.  If we want to address this stronger association, we would do so with an equalize step.

#### 11. Create a corrected word embedding for 'babysit' by calling `neutralize`.  Then, compute the cosine similarity among the following vectors (you will call `cos_similarity` 3 times, once for each bullet point).

 * $e^{corrected}_{babysit}$ and the gender direction vector $g$
 * $e^{corrected}_{babysit}$ and $e_{grandmother}$
 * $e^{corrected}_{babysit}$ and $e_{grandfather}$

In [0]:
e_corrected = neutralize(embeddings_index.get('babysit'), g)

In [29]:
cos_similarity(e_corrected, g)

2.3395359e-08

In [30]:
cos_similarity(e_corrected, embeddings_index.get('grandmother'))

0.2563601

In [31]:
cos_similarity(e_corrected, embeddings_index.get('grandfather'))

0.19900998

After the neutralize step, 'babysit' is not strongly associated with the gender direction vector.  However, it is still more strongly associated with 'grandmother' than 'grandfather'.  We need to use an equalize step to address this.

#### 12. Implement an equalize step.

The calculations to do are listed in the equations below.  These are the equations given in Step 2a of Bolukbasi et al., but broken down into more steps.


$
\begin{align*}
\mu &= \frac{e_{1} + e_{2}}{2}\tag{1} \\
\mu^{B} &= \frac {g g^T}{g^T g} \mu \tag{2} \\
\mu^B_{\perp} &= \mu - \mu^{B} \tag{3} \\
e_{1}^B &= \frac{g g^T}{g^T g} e_{1} \tag{4} \\
e_{2}^{B} &= \frac {g g^T}{g^T g} e_{2} \tag{5} \\
c &= \sqrt{ |{1 - ||\mu^B_{\perp} ||^2} |} \tag{6} \\
e_{1}^{corrected} &= c * \frac{e_{1}^{B} - \mu^B} {||e_{1}^{B} - \mu^B||} \tag{7} \\
e_{2}^{corrected} &= c * \frac{e_{2}^{B} - \mu^B} {||e_{2}^{B} - \mu^B||} \tag{8} \\
e_1^{equalized} &= e_{1}^{corrected} + \mu^B_{\perp} \tag{9} \\
e_2^{equalized} &= e_{2}^{corrected} + \mu^B_{\perp} \tag{10}\end{align*}
$

Fill in these calculations in the equalize function below.

In [0]:
def equalize(e1, e2, g):
  '''
  Debias gender specific words by following the equalize method described in
  Bolukbasi et al.
  
  Arguments:
    - e1: first word embedding to equalize
    - e2: second word embedding to equalize
    - g: a vector defining a bias subspace
  
  Returns:
    - e1_equalized, e2_equalized: equalized versions of e1 and e2
  '''
  # Enforce that e1, e2, and g are column vectors.  This prevents bugs that may occur
  # if one of them is a row vector or has only one dimension (e.g. shape (50,))
  e1 = e1.reshape((e1.shape[0], 1))
  e2 = e2.reshape((e2.shape[0], 1))
  g = g.reshape((g.shape[0], 1))

  # Step 1: Compute the mean of e1 and e2
  # (you could just add them up and divide by 2)
  mu = (e1 + e2) / 2

  # Step 2: Compute the orthogonal projection of mu onto the bias subspace
  # using your orth_proj function from above
  mu_B = orth_proj(mu, g)

  # Step 3: Compute the component of mu that's orthogonal to the bias subspace
  mu_orth = mu - mu_B

  # Steps 4 and 5: Compute e1_B and e2_B by calling orth_proj
  e1_B = orth_proj(e1, g)
  e2_B = orth_proj(e2, g)
  
  # Step 6: Calculate the coefficient for the equalized embeddings
  c = np.sqrt(np.abs(1. - np.dot(mu_orth.T, mu_orth)))

  # Steps 7 and 8: Calculate the corrected components along the bias direction
  # (the orthogonal component is added back in steps 9 and 10)
  corrected_e1_B = c * (e1_B - mu_B) / np.linalg.norm(e1_B - mu_B)
  corrected_e2_B = c * (e2_B - mu_B) / np.linalg.norm(e2_B - mu_B)

  # Steps 9 and 10: Obtain the final equalized embeddings by adding the orthogonal component
  e1_equalized = corrected_e1_B + mu_orth
  e2_equalized = corrected_e2_B + mu_orth

  # return
  return e1_equalized[:, 0], e2_equalized[:, 0]
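To see what equalizing achieves, here is a hedged sketch that restates steps 1 through 10 with one-dimensional arrays and applies them to made-up three-dimensional vectors.  The names `proj`, `equalize_1d`, `g_toy`, and `neutral` and all numeric values are assumptions for illustration; this is a compact restatement for checking the algebra, not a replacement for `equalize`.

```python
import numpy as np

def proj(v, g):
    # orthogonal projection of v onto the direction g (1-D arrays)
    return (np.dot(g, v) / np.dot(g, g)) * g

def equalize_1d(e1, e2, g):
    mu = (e1 + e2) / 2                                    # step 1
    mu_B = proj(mu, g)                                    # step 2
    mu_orth = mu - mu_B                                   # step 3
    e1_B, e2_B = proj(e1, g), proj(e2, g)                 # steps 4 and 5
    c = np.sqrt(np.abs(1.0 - np.dot(mu_orth, mu_orth)))   # step 6
    e1_c = c * (e1_B - mu_B) / np.linalg.norm(e1_B - mu_B)  # step 7
    e2_c = c * (e2_B - mu_B) / np.linalg.norm(e2_B - mu_B)  # step 8
    return e1_c + mu_orth, e2_c + mu_orth                 # steps 9 and 10

g_toy = np.array([1.0, 0.0, 0.0])      # hypothetical bias direction
e1 = np.array([0.6, 0.5, 0.1])         # hypothetical embedding 1
e2 = np.array([-0.2, 0.3, 0.3])        # hypothetical embedding 2
neutral = np.array([0.0, 0.7, -0.2])   # a vector orthogonal to g_toy

e1_eq, e2_eq = equalize_1d(e1, e2, g_toy)

# both equalized vectors have the same dot product with the neutral vector
print(np.dot(e1_eq, neutral), np.dot(e2_eq, neutral))
# and mirror-image components along the bias direction
print(np.dot(e1_eq, g_toy), np.dot(e2_eq, g_toy))
```

After equalizing, the two vectors share the same component orthogonal to the bias direction and differ only in the sign of their component along it.  In this toy example the orthogonal part has norm below 1, so both results have unit length; that need not hold for embeddings with larger norms.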

#### 13. Obtain the equalized embeddings for 'grandmother' and 'grandfather'.  Then calculate the cosine similarities between the following embeddings (one call to `cos_similarity` for each bullet point):
 * `e_corrected` (the neutralized embedding of 'babysit' from part 11) and `e_gmother_equalized`
 * `e_corrected` (the neutralized embedding of 'babysit' from part 11) and `e_gfather_equalized`
 * `g` and `e_gmother_equalized`
 * `g` and `e_gfather_equalized`

 

In [0]:
e_gmother_equalized, e_gfather_equalized = equalize(embeddings_index.get('grandmother'), embeddings_index.get('grandfather'), g)

In [34]:
cos_similarity(e_corrected, e_gmother_equalized)

0.17213492

In [35]:
cos_similarity(e_corrected, e_gfather_equalized)

0.1721349

In [36]:
cos_similarity(g, e_gmother_equalized)

0.6975457

In [37]:
cos_similarity(g, e_gfather_equalized)

-0.69754577