# Debiasing word embeddings

Word embeddings are word vectors that carry meaning: words with similar meanings have vectors that lie close to each other in the vector space.

**After completing this lab you will be able to:**

- Use and load pre-trained word vectors
- Measure similarity of word vectors using cosine similarity
- Solve word analogy problems such as Man is to Woman as Boy is to ____ using word embeddings
- Reduce gender bias in word embeddings by modifying word embeddings to remove gender stereotypes, such as the association between the words *receptionist* and *female*

## <font color='darkblue'>Word embeddings</font>

Word embedding is a method for representing words as vectors. Word embeddings are popular in machine learning and natural language processing tasks. Despite their success in downstream tasks such as cyberbullying detection, sentiment analysis, and question retrieval, they exhibit gender stereotypes, which raises concerns because their widespread use can amplify these biases.


Word embeddings are trained on word co-occurrence statistics from a text dataset. After training, each word $w$ is represented as a $d$-dimensional word vector $\vec{w} \in \mathbb{R}^d$.

#### Word embedding properties:
* Words with similar semantic meaning will be close to each other
* The difference between word embedding vectors can represent relationships between words. For example, given the analogy "man is to king as woman is to $x$" (denoted as $man:king :: woman:x$), simple arithmetic on the embedding vectors finds that $x = queen$ is the best answer because $\vec{man} - \vec{woman} \approx \vec{king} - \vec{queen}$. Similarly, for the analogy $Paris:France :: Nairobi:x$, it finds that $x = Kenya$. These embeddings can also amplify sexism implicit in text. For instance, $\vec{man} - \vec{woman} \approx \vec{computer \space programmer} - \vec{homemaker}$: the same system that produced reasonable answers to the previous examples offensively answers "man is to computer programmer as woman is to $x$" with $x = homemaker$.
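
The arithmetic behind such analogies can be illustrated with a toy example. The 2-dimensional vectors below are invented for illustration only (they are not GloVe values); the first coordinate loosely stands for royalty and the second for gender.

```python
import numpy as np

# Hypothetical 2-d embeddings, made up for illustration only (not GloVe values)
toy = {
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
}

# When the analogy holds, the two difference vectors point the same way
man_minus_woman = toy["man"] - toy["woman"]
king_minus_queen = toy["king"] - toy["queen"]
print(man_minus_woman, king_minus_queen)
```

With real embeddings the two differences are only approximately equal, which is why the lab compares them with cosine similarity rather than testing for equality.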

Run the following cell to load the required modules.

In [None]:
import os
import json
import numpy as np
from pathlib import Path
from sklearn.decomposition import PCA

## <font color='darkblue'>Download and Load word vectors</font>
Because training word embeddings requires substantial computational resources, we will use pre-trained 50-dimensional GloVe word embeddings to represent words.

Run the following cells to download and load the word embeddings.

In [None]:
def download_glove_vectors():
    '''
    Download the GloVe vectors
    Arguments:
        None
    Returns:
        file_name (String): The absolute path of the downloaded 50-dimensional
        GloVe word vector representations
    '''

    if not Path('data').is_dir():
        print("Downloading the embeddings ...")
        !wget --quiet https://nlp.stanford.edu/data/glove.6B.zip
        print("Embeddings downloaded.")

        # Unzip it
        print("Unzipping the downloaded file ...")
        !unzip -q glove.6B.zip -d data/
        print("File unzipped.")

    # Resolve relative to the working directory (equals /content/data/... on Colab)
    return str(Path('data/glove.6B.50d.txt').resolve())

In [None]:
def get_glove_vectors(glove_file):
    '''
    Read the word vectors in glove_file
    Arguments:
        glove_file (String): The absolute path to the downloaded glove word embeddings
    Returns:
        words (Set): The words (vocabulary) in the pretrained glove word embeddings
        word_to_vector_map (Dict): A dictionary mapping each word to its embedding vector
    '''

    words = set()
    word_to_vector_map = {}
    # The GloVe vocabulary contains non-ASCII tokens, so read the file as UTF-8
    with open(glove_file, 'r', encoding='utf-8') as file_handle:
        for line in file_handle:
            line = line.strip().split()
            current_word = line[0]
            words.add(current_word)
            current_word_vector = line[1:]
            word_to_vector_map[current_word] = np.array(current_word_vector, dtype=np.float64)

    return words, word_to_vector_map

In [None]:
# Load sets of words in the vocabulary and a dictionary mapping words to their GloVe vectors
words, word_to_vector_map = get_glove_vectors(download_glove_vectors())
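
For reference, each line of the GloVe text file holds a token followed by its space-separated vector components. The sketch below parses a single made-up line in that format (the token and values are invented and much shorter than a real 50-dimensional entry).

```python
import numpy as np

# A made-up line in the GloVe text format: a token followed by its components
sample_line = "example 0.1 -0.2 0.3"

# The first field is the word; the remaining fields are the vector components
token, *components = sample_line.strip().split()
vector = np.array(components, dtype=np.float64)

print(token, vector.shape)
```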

## <font color='darkblue'>Operations on word embeddings</font>

### Task 1 - Cosine similarity

Similarity between two words represented as word vectors $u$ and $v$ can be measured by their cosine similarity:

$$\text{CosineSimilarity(u, v)} = \frac {u \cdot v} {||u||_2 ||v||_2} = cos(\theta) \tag{1}$$

Where:

$u \cdot v$ is the dot (inner) product of the two vectors

$||u||_2$ is the length of the vector $u$, also called its Euclidean length or Euclidean norm, defined as $||u||_2 = \sqrt{u_1^2 + \dots + u_n^2}$

The normalized similarity between $u$ and $v$ is the cosine of the angle $\theta$ between the two vectors. The cosine similarity of $u$ and $v$ is close to 1 when the two vectors point in nearly the same direction, close to 0 when they are nearly orthogonal, and negative when they point in opposing directions.
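
As a small worked instance of equation 1 (the vectors are chosen only for illustration), take $u = (1, 0)$ and $v = (1, 1)$:

$$\text{CosineSimilarity}(u, v) = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1^2 + 0^2} \, \sqrt{1^2 + 1^2}} = \frac{1}{\sqrt{2}} \approx 0.707$$

which corresponds to an angle of $\theta = 45^\circ$ between the two vectors.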

**Note**: In this lab we refer to a word and its embedding (i.e., its word vector) interchangeably.


***
**<font color='red'>Task 1a:</font>** Implement equation 1 in the `cosine_similarity()` function below. <br> Hint: check out the numpy documentation on [np.dot](https://numpy.org/doc/stable/reference/generated/numpy.dot.html), [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html), and [np.sqrt](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html). Depending on how you choose to implement it, you can check out [np.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html).
***


In [None]:
def cosine_similarity(vector1, vector2):
    """
    Calculates the cosine similarity of two word vectors - vector1 and vector2
    Arguments:
        vector1 (ndarray): A word vector having shape (n,)
        vector2 (ndarray): A word vector having shape (n,)
    Returns:
        cosine_similarity (float): The cosine similarity between vector1 and vector2
    """

    # Start Code Here #
    # Compute the dot product between vector1 and vector2 (~ 1 line)
    dot = None

    # Compute the Euclidean norm or length of vector1 (~ 1 line)
    norm_vector1 = None

    # Compute the Euclidean norm or length of vector2 (~ 1 line)
    norm_vector2 = None

    # Compute the cosine similarity as defined in equation 1 (~ 1 line)
    cosine_similarity = None
    # End Code Here #

    return cosine_similarity


In [None]:
# Run this cell to obtain and report your answers
man = word_to_vector_map["man"]
woman = word_to_vector_map["woman"]
cat = word_to_vector_map["cat"]
dog = word_to_vector_map["dog"]
orange = word_to_vector_map["orange"]
england = word_to_vector_map["england"]
london = word_to_vector_map["london"]
edinburgh = word_to_vector_map["edinburgh"]
scotland = word_to_vector_map["scotland"]

print(f"Cosine similarity between man and woman: {cosine_similarity(man, woman)}")
print(f"Cosine similarity between cat and dog: {cosine_similarity(cat, dog)}")
print(f"Cosine similarity between cat and orange: {cosine_similarity(cat, orange)}")
print(f"Cosine similarity between england - london and edinburgh - scotland: {cosine_similarity(england - london, edinburgh - scotland)}")

**<font color='red'>Task 1b:</font>** In the code cell below, try out 3 inputs of your own and report your inputs and outputs.

In [None]:
# Start code here #
None = None
None = None
None = None
None = None
None = None
None = None

print(f"Cosine similarity between None and None: {None}")
print(f"Cosine similarity between None and None: {None}")
print(f"Cosine similarity between None and None: {None}")
# End code here #

### Task 2 - Word analogy

In an analogy task, you are given an analogy in the form "i is to j as k is to ___". Your task is to complete this sentence.

For example, suppose you are given "man is to king as woman is to $l$" (denoted as $man:king :: woman:l$). You are to find the word $l$ that best completes the analogy. Simple arithmetic on the embedding vectors finds that $l = queen$ is the best answer because the embedding vectors of words $i$, $j$, $k$, and $l$, denoted as $e_i$, $e_j$, $e_k$, $e_l$, have the following relationship:
$$e_j - e_i \approx e_l - e_k$$

Cosine similarity can be used to measure the similarity between $e_j - e_i$ and $e_l - e_k$.
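
The search can be illustrated on a toy vocabulary. This is a minimal sketch under stated assumptions: the 2-dimensional vectors and the helper `cos` are invented for illustration and are not part of the lab code or the GloVe data.

```python
import numpy as np

# Hypothetical 2-d embeddings, made up for illustration only (not GloVe values)
toy = {
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def cos(a, b):
    # Cosine of the angle between a and b
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

i, j, k = "man", "king", "woman"
target = toy[j] - toy[i]  # e_j - e_i

# Score every remaining word l by how well e_l - e_k matches e_j - e_i
scores = {l: cos(target, toy[l] - toy[k]) for l in toy if l not in (i, j, k)}
best = max(scores, key=scores.get)
print(best, scores)
```

Excluding the three input words from the candidates matters: otherwise a word from the question itself could be returned as the answer.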

***
**<font color='red'>Task 2a:</font>** To perform word analogies, implement `answer_analogy()` below.
***


In [None]:
def answer_analogy(word_i, word_j, word_k, word_to_vector_map):
    """
    Performs word analogy as described above
    Arguments:
        word_i (String): A word
        word_j (String): A word
        word_k (String): A word
        word_to_vector_map (Dict): A dictionary of words as key and its associated embedding vector as value
    Returns:
        best_word (String): The word l for which e_l - e_k is as close as possible to e_j - e_i, as measured by cosine similarity
    """

    # Convert words to lowercase
    word_i = word_i.lower()
    word_j = word_j.lower()
    word_k = word_k.lower()

    # Start code here #
    try:
        # Get the embedding vectors of word_i (~ 1 line)
        embedding_vector_of_word_i = None
    except KeyError:
        print(f"{word_i} is not in our vocabulary. Please try a different word.")
        return

    try:
        # Get the embedding vectors of word_j (~ 1 line)
        embedding_vector_of_word_j = None
    except KeyError:
        print(f"{word_j} is not in our vocabulary. Please try a different word.")
        return

    try:
        # Get the embedding vectors of word_k (~ 1 line)
        embedding_vector_of_word_k = None
    except KeyError:
        print(f"{word_k} is not in our vocabulary. Please try a different word.")
        return
    # End code here #

    # Get all the words in our word to vector map i.e our vocabulary
    words = word_to_vector_map.keys()
    max_cosine_similarity = -1000                           # Initialize to a large negative number
    best_word = None                                        # Note: Do not change this None. Keeps track of the word that best answers the analogy.

    # Since we are looping through the whole vocabulary, if we encounter a word
    # that is the same as our input, that word becomes the best_word. To avoid
    # that we skip the input word.
    input_words = set([word_i, word_j, word_k])

    for word in words:
        if word in input_words:
            continue

        # Start code here #
        # Compute cosine similarity  (~ 1 line)
        similarity = None

        # Have we seen a cosine similarity bigger than max_cosine_similarity?
        # If so, update max_cosine_similarity to the current cosine similarity
        # and update best_word to the current word (~ 3 lines)
        if None > None:
            max_cosine_similarity = None
            best_word = None
        # End code here #

    return best_word

***
**<font color='red'>Task 2b:</font>** Test your implementation by running the code cell below. What are your observations? What do you observe about the last two outputs?
***


In [None]:
analogies = [('france', 'french', 'germany'),
             ('england', 'london', 'japan'),
             ('boy', 'girl', 'man'),
             ('man', 'doctor', 'woman'),
             ('small', 'smaller', 'big')]
for analogy in analogies:
    best_word = answer_analogy(*analogy, word_to_vector_map)
    if best_word:
        print(f"{analogy[0]} -> {analogy[1]} :: {analogy[2]} -> {best_word}")

***
**<font color='red'>Task 2c:</font>** Try your own analogies by completing and executing the code cell below. Find two that work and one that does not. Report your inputs and outputs.
***

In [None]:
my_analogies = [(None, None, None), (None, None, None), (None, None, None), (None, None, None)]
for analogy in my_analogies:
    best_word = answer_analogy(*analogy, word_to_vector_map)
    print(f"{analogy[0]} -> {analogy[1]} :: {analogy[2]} -> {best_word}")

### Task 3 - Geometry of Gender and Bias in Word Embeddings: Occupational stereotypes
In this task, we will examine the biases present in the word embeddings, i.e., which words are closer to $she$ than to $he$, by evaluating whether the GloVe embeddings hold stereotypes about occupation words. Determine gender bias by projecting each occupation onto the $she - he$ direction: compute the dot product between each occupation word embedding and the $she - he$ vector normalized by its Euclidean norm (see Task 1).

$$occupation\_word_i \cdot \frac{she - he}{||she - he||_2} \tag{2}$$

Notice that equation 2 resembles the numerator of equation 1: it is the dot product of $occupation\_word_i$ and the normalized difference between $she$ and $he$, with no division by the norm of the occupation vector.
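
The projection can be sketched with toy numbers. The 2-dimensional vectors below are invented for illustration and are not GloVe values; the variable names are likewise only illustrative.

```python
import numpy as np

# Hypothetical 2-d vectors, made up for illustration only (not GloVe values)
she = np.array([1.0, 2.0])
he = np.array([1.0, -2.0])
occupation = np.array([3.0, 0.5])

direction = she - he  # points from he towards she
unit_direction = direction / np.linalg.norm(direction)

# Signed length of the occupation vector along the she - he direction:
# positive values lean towards she, negative values towards he
projection = float(occupation @ unit_direction)
print(projection)
```

Sorting occupations by this signed value places the words most associated with $he$ at one end of the list and those most associated with $she$ at the other.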

Run the cells below to download and view the occupations.

In [None]:
def download_occupations():
    if not Path('debiaswe').is_dir():
        print("Downloading occupation list ...")
        !git clone -q https://github.com/tolga-b/debiaswe.git
        print("Occupation list downloaded.")

    # Resolve relative to the working directory (equals /content/debiaswe/... on Colab)
    return str(Path('debiaswe/data/professions.json').resolve())


def view_occupations(occupations_file):
    with open(occupations_file, 'r') as file_handle:
        occupations = json.load(file_handle)

        for occupation in occupations:
            print(occupation[0])

In [None]:
occupations_file = download_occupations()

In [None]:
view_occupations(occupations_file)

***
**<font color='red'>Task 3a:</font>** Complete the `get_occupation_stereotypes()` below.
***

In [None]:
def get_occupation_stereotypes(she, he, occupations_file, word_to_vector_map, verbose=False):
    """
    Computes the words that are closest to she and he in the GloVe embeddings
    Arguments:
        she (String): A word
        he (String): A word
        occupations_file (String): The path to the occupation file
        word_to_vector_map (Dict): A dictionary mapping words to embedding vectors
    Returns:
        most_similar_words (Tuple(List[Tuple(Float, String)], List[Tuple(Float, String)])):
        A tuple of the list of the most similar occupation words to she and he with their associated similarity
    """

    # Read occupations
    with open(occupations_file, 'r') as file_handle:
        occupations = json.load(file_handle)

    # Extract occupation words
    occupation_words = [occupation[0] for occupation in occupations]

    # Start code here #
    # Get embedding vector of she (~ 1 line)
    embedding_vector_she = None
    # Get embedding vector of he (~ 1 line)
    embedding_vector_he = None
    # Get the vector difference between embedding vectors of she and he (~ 1 line)
    vector_difference_she_he = None
    # Get the normalized difference (~ 1 line)
    normalized_difference_she_he = None
    # End code here #

    # Store the cosine similarities
    similarities = []

    for word in occupation_words:
        # Start code here #
        try:
            # Get the embedding vector of the current occupation word (~ 1 line)
            occupation_word_embedding_vector = None
            # Compute equation 2: the dot product between the embedding vector of the occupation word and the normalized she - he vector (~ 1 line)
            similarity = None
            similarities.append((similarity, word))
        except KeyError:
            if verbose:
                print(f"{word} is not in our vocabulary.")
        # End code here #

    most_similar_words = sorted(similarities)

    return most_similar_words[:20], most_similar_words[-20:]



***
**<font color='red'>Task 3b:</font>** Execute the cell below and report your results.

1) Do the GloVe word embeddings propagate bias? Why?

2) From the list associated with she, list the occupations that reflect gender stereotypes.

3) Compare your list from 2 to the occupations closest to he. What are your conclusions?

Exclude businesswoman from your list.
***

In [None]:
he, she = get_occupation_stereotypes('she', 'he', occupations_file, word_to_vector_map)
print("Occupations closest to he:")
for occupation in he:
    print(f"{occupation[0], occupation[1]}")

print("\nOccupations closest to she:")
for occupation in she:
    print(f"{occupation[0], occupation[1]}")


### Task 4 - Debiasing word embeddings

**Gender Specific words**

Words that are associated with a gender by definition. For example, brother, sister, businesswoman or businessman.

**Gender neutral words**

The remaining words, which are not specific to a gender, are gender neutral. For example, flight attendant or shoes. The complement of the gender specific words can be taken as the gender neutral words.

**Step 1 - Identify gender subspace, i.e., identify the direction of the embedding that captures the bias**

To robustly estimate bias, we use the gender specific words to learn a gender subspace in the embedding. To identify the gender subspace, we consider the vector differences of gender specific word pairs, such as $\vec{she} - \vec{he}$, $\vec{woman} - \vec{man}$ or $\vec{her} - \vec{his}$. This identifies a **gender direction or bias subspace** $g \in \mathbb{R}^d$ that captures gender in the embedding.

**Note:** We will use $g$ and $bias\_direction$ interchangeably in this lab.

In [None]:
gender = word_to_vector_map['she'] - word_to_vector_map['he']
print(gender)

The gender subspace can be captured more accurately by taking the difference vectors of several gender pairs and computing their principal components (PCs). The top PC, denoted by the unit vector $g$, captures the gender subspace.
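
The idea can be sketched on synthetic data. The example below is an assumption-laden illustration: the 3-dimensional difference vectors are made up, and the principal components are computed with NumPy's SVD rather than scikit-learn's `PCA` to keep the sketch dependency-light.

```python
import numpy as np

# Hypothetical difference vectors for three word pairs in a 3-d space
# (made up for illustration): they vary mostly along the first axis
differences = np.array([
    [2.0, 0.1, 0.0],
    [1.8, -0.1, 0.1],
    [2.2, 0.0, -0.1],
])

# Centring each pair around its midpoint yields +d/2 and -d/2 for each difference d
samples = np.vstack([differences / 2, -differences / 2])

# PCA via SVD: the rows of vt are the principal directions, strongest first
samples = samples - samples.mean(axis=0)
_, singular_values, vt = np.linalg.svd(samples, full_matrices=False)
top_direction = vt[0]

print(top_direction, np.linalg.norm(top_direction))
```

The top direction is a unit vector that lies almost entirely along the first axis. Its sign is arbitrary, so only the axis it spans is meaningful.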

In [None]:
def get_gender_subspace(pairs, word_to_vector_map, num_components=10):
    """
    Compute the gender subspace by computing the principal components of
    ten gender pair vectors.
    Arguments:
        pairs (List[Tuple(String, String)]): A list of gender specific word pairs
        word_to_vector_map (Dict): A dictionary mapping words to embedding vectors
        num_components (Int): The number of principal components to compute. Defaults to 10
    Returns:
        gender_subspace (ndarray): The gender bias subspace(or direction) of shape (embedding dimension,)
    """

    matrix = []
    for word_1, word_2 in pairs:
        embedding_vector_word_1 = word_to_vector_map[word_1]
        embedding_vector_word_2 = word_to_vector_map[word_2]
        center = (embedding_vector_word_1 + embedding_vector_word_2) / 2
        matrix.append(embedding_vector_word_1 - center)
        matrix.append(embedding_vector_word_2 - center)

    matrix = np.array(matrix)
    pca = PCA(n_components=num_components)
    pca.fit(matrix)

    pcs = pca.components_                  # Sorted by decreasing explained variance
    eigenvalues = pca.explained_variance_  # Eigenvalues
    gender_subspace = pcs[0]               # The first element has the highest eigenvalue
    return gender_subspace

In [None]:
gender_specific_pairs = [
    ('she', 'he'),
    ('her', 'his'),
    ('woman', 'man'),
    ('mary', 'john'),
    ('herself', 'himself'),
    ('daughter', 'son'),
    ('mother', 'father'),
    ('gal', 'guy'),
    ('girl', 'boy'),
    ('female', 'male')
]
gender_direction = get_gender_subspace(gender_specific_pairs, word_to_vector_map)
print(gender_direction)

***
**<font color='red'>Task 4a:</font>** Run the cell below to compute the similarity between the gender embedding and the embedding vectors of male and female names. What can you observe?
***

In [None]:
print('Names and their similarities with simple gender subspace')
names = ["mary", "john", "sweta", "david", "kazim", "angela"]
for name in names:
    print(name, cosine_similarity(word_to_vector_map[name], gender))

print()
print('Names and their similarities with PCA based gender subspace')
names = ["mary", "john", "sweta", "david", "kazim", "angela"]
for name in names:
    print(name, cosine_similarity(word_to_vector_map[name], gender_direction))

***
**<font color='red'>Task 4b:</font>** Quantify direct and indirect biases between words and the gender embedding by running the following cell. What is your observation?
***

In [None]:
words = ["engineer", "science", "pilot", "technology", "lipstick", "arts", "singer", "computer", "receptionist", "fashion", "doctor", "literature"]
for word in words:
    print(word, cosine_similarity(word_to_vector_map[word], gender_direction))

**Step 2 - Neutralize gender neutral words**

This step ensures that gender neutral words have zero component in the gender subspace. That is, it takes a vector such as $e_{fashion}$ and zeroes out its component in the direction of $g$ to produce $e_{fashion}^{debiased}$.

To remove bias in words such as "receptionist" or "shoe", given an input embedding of the word $e$, we compute debiased $e$ denoted as $e^{debiased}$ by using the formulas:

$$e^{bias\_component} = \frac{e \cdot bias\_direction}{||bias\_direction||_2^2} * bias\_direction\tag{3}$$

$$e^{debiased} = e - e^{bias\_component}\tag{4}$$

Where $e^{bias\_component}$ is the projection of the word embedding $e$ onto the gender subspace. Because the gender subspace here is spanned by a single unit vector, it is simply a direction. It follows that $e^{debiased}$ is the projection of $e$ onto the subspace orthogonal to that direction.

$||g||_2^2$ is the squared Euclidean norm of $g$, given by:

$$||g||_2^2 = {\sum}_i \space g_i^2$$
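
As a small worked instance of equations 3 and 4 (the numbers are chosen only for illustration), take $e = (3, 4)$ and $g = (1, 0)$, so that $||g||_2^2 = 1$:

$$e^{bias\_component} = \frac{3 \cdot 1 + 4 \cdot 0}{1} * (1, 0) = (3, 0)$$

$$e^{debiased} = (3, 4) - (3, 0) = (0, 4)$$

The result satisfies $e^{debiased} \cdot g = 0$, i.e., the debiased vector has no component left along the bias direction.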


***
**<font color='red'>Task 4c:</font>** Implement `neutralize()` below by implementing the formulas above. Hint: see [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html).
***

In [None]:
def neutralize(word, gender_direction, word_to_vector_map):
    """
    Project the vector of word onto the gender subspace to remove the bias of "word"
    Arguments:
        word (String): A word to debias
        gender_direction (ndarray): Numpy array of shape (embedding size (50), ) which is the bias axis
        word_to_vector_map (Dict): A dictionary mapping words to embedding vectors

    Returns:
        debiased_word (ndarray): the vector representation of the neutralized input word
    """

    # Start code here #
    # Get the vector representation of word (~ 1 line)
    embedding_of_word = None

    # Compute the projection of word onto the gender direction, eq. 3 (~ 1 line)
    projection_of_word_onto_gender = None

    # Neutralize word, eq. 4 (~ 1 line)
    debiased_word = None
    # End code here #

    return debiased_word


***
**<font color='red'>Task 4d:</font>** Test your implementation by running the code cell below. What is your observation?
***

In [None]:
word = "babysit"
print(f"Before neutralization, cosine similarity between {word} and gender is: {cosine_similarity(word_to_vector_map[word], gender_direction)}")

debiased_word = neutralize(word, gender_direction, word_to_vector_map)
print(f"After neutralization, cosine similarity between {word} and gender is: {cosine_similarity(debiased_word, gender_direction)}")

**Step 3 - Equalize**

This step equalizes sets of gender specific words outside the gender subspace. The goal is to ensure that every gender neutral word is equidistant from all the words in each set, so that gender specific words are not biased with respect to neutral words.

For example, consider the set {woman, man}. If the neutral word "babysit" is closer to "woman" than to "man", neutralizing "babysit" reduces the gender stereotype associated with babysitting but does not make "babysit" equidistant from "woman" and "man".

Given a pair of gender specific words $w_1$ and $w_2$ to debias, with embeddings $e_{w_1}$ and $e_{w_2}$, equalization can be achieved with the following equations:

$$ \mu = \frac{e_{w_1} + e_{w_2}}{2} \tag{5}$$

$$ \mu_{B} = \frac{\mu \cdot bias\_direction}{||bias\_direction||_2^2} * bias\_direction \tag{6}$$

$$ v = \mu - \mu_B \tag{7}$$

$$ e_{w_1B} = \frac{e_{w_1} \cdot bias\_direction}{||bias\_direction||_2^2} * bias\_direction \tag{8}$$

$$ e_{w_2B} = \frac{e_{w_2} \cdot bias\_direction}{||bias\_direction||_2^2} * bias\_direction \tag{9}$$

$$ e_{w_1B}^{new} = \sqrt{|1 - ||v||_2^2|} * \frac{e_{w_1B} - \mu_B}{||(e_{w_1} - v) - \mu_B||_2} \tag{10}$$

$$  e_{w_2B}^{new} = \sqrt{|1 - ||v||_2^2|} * \frac{e_{w_2B} - \mu_B}{||(e_{w_2} - v) - \mu_B||_2} \tag{11}$$

$$ e_1 = v + e_{w_1B}^{new} \tag{12}$$

$$ e_2 = v + e_{w_2B}^{new} \tag{13}$$
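
A short consequence of equations 5 to 13, sketched here as a sanity check. Since $e_{w_1B} - \mu_B$ and $e_{w_2B} - \mu_B$ are the projections of $e_{w_1} - \mu$ and $e_{w_2} - \mu$ onto the bias direction, and $e_{w_1} - \mu = -(e_{w_2} - \mu)$, we have:

$$e_{w_1B} - \mu_B = -(e_{w_2B} - \mu_B), \qquad ||(e_{w_1} - v) - \mu_B||_2 = ||e_{w_1} - \mu||_2 = ||e_{w_2} - \mu||_2$$

$$\Rightarrow \quad e_{w_1B}^{new} = -e_{w_2B}^{new}, \qquad e_1 \cdot bias\_direction = -\, e_2 \cdot bias\_direction$$

The two equalized embeddings therefore share the same component $v$ outside the bias direction and differ only by equal and opposite components along it.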

***
**<font color='red'>Task 5a:</font>** Implement `equalization()` below by implementing the formulas above.
***

In [None]:
def equalization(equality_set, bias_direction, word_to_vector_map):
    """
    Equalize the pair of gender specific words in the equality set ensuring that
    any neutral word is equidistant to all words in the equality set.
    Arguments:
        equality_set (Tuple(String, String)): a tuple of strings of gender specific
        words to debias e.g ("grandmother", "grandfather")
        bias_direction (ndarray): numpy array of shape (embedding dimension,). The
        embedding vector representing the bias direction
        word_to_vector_map (Dict):  A dictionary mapping words to embedding vectors
    Returns:
        embedding_word_a (ndarray): numpy array of shape (embedding dimension,). The
        embedding vector representing the first word
        embedding_word_b (ndarray): numpy array of shape (embedding dimension,). The
        embedding vector representing the second word
    """

    # Start code here #
    # Get the vector representation of the word pair by unpacking equality_set (~ 3 lines)
    word_a, word_b = None
    embedding_word_a = None
    embedding_word_b = None

    # Compute the mean (eq. 5) of embedding_word_a and embedding_word_b (~ 1 line)
    mean = None

    # Compute the projection of the mean onto the bias direction (eq. 6) (~ 1 line)
    mean_B = None

    # Compute the projection of the mean onto the orthogonal subspace (eq. 7) (~ 1 line)
    mean_orthogonal = None

    # Compute the projection of the embedding of word a onto the bias direction (eq. 8) (~ 1 line)
    embedding_word_a_on_bias_direction = None

    # Compute the projection of the embedding of word b onto the bias direction (eq. 9) (~ 1 line)
    embedding_word_b_on_bias_direction = None

    # Re-embed embedding of word a using eq. 10 (~ 1 long line)
    new_embedding_word_a_on_bias_direction = None

    # Re-embed embedding of word b using eq. 11 (~ 1 long line)
    new_embedding_word_b_on_bias_direction = None

    # Equalize embedding of word a using eq. 12 (~ 1 line)
    embedding_word_a =  None

    # Equalize embedding of word b using eq. 13 (~ 1 line)
    embedding_word_b = None

    # End code here #

    return embedding_word_a, embedding_word_b

***
**<font color='red'>Task 5b:</font>** Test your implementation by running the cell below.
***

In [None]:
print("Cosine similarity before equalization:")
print(f"(embedding vector of father, gender_direction): {cosine_similarity(word_to_vector_map['father'], gender_direction)}")
print(f"(embedding vector of mother, gender_direction): {cosine_similarity(word_to_vector_map['mother'], gender_direction)}")
print()

embedding_word_a, embedding_word_b  = equalization(("father", "mother"), gender_direction, word_to_vector_map)
print("Cosine similarity after equalization:")
print(f"(embedding vector of father, gender_direction): {cosine_similarity(embedding_word_a, gender_direction)}")
print(f"(embedding vector of mother, gender_direction): {cosine_similarity(embedding_word_b, gender_direction)}")

***
**<font color='red'>Task 5c:</font>** Looking at the output of your implementation test above, what can you observe?
***

**References**:
 - The debiasing algorithm is from Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf)
 - The code is partly adapted from Andrew Ng's debiasing word embeddings course on [Coursera](https://www.coursera.org/learn/nlp-sequence-models/lecture/zHASj/debiasing-word-embeddings)
 - The GloVe word embeddings are publicly available at https://nlp.stanford.edu/projects/glove/ and are the work of Jeffrey Pennington, Richard Socher, and Christopher D. Manning.