# Week 5 Replication of Kozlowski et al. 

This is a tutorial to teach you about how word embeddings work and how they can be used to explore the documents we've created. On the whole, you need a large amount of data to train reliable word embeddings. This notebook is more about showing you how useful they can be to understand bias, and make sense of Kozlowski et al. I hoped that larger scale data collection would have been possible, but most of you are working with smaller datasets. Consequently this technique might not be useful now, but it may come in handy during the data sprint.

It is a very useful technique to understand bias in text, which is the theme that this notebook is organized around.

This notebook is a bit advanced, but it will help you make sense of what Kozlowski et al have done.

In [None]:
from drs_word_embedding import *

# **Let's play around with word embeddings**

The reading discussed what word embeddings are, and you may have checked out a few GitHub pages that have some nice explainers on them.

For our class, we want to look at how word embeddings might tell us something about bias, inequality, and related issues such as injustice, sexism, and racism.

There are two ways you can work with word embeddings. You can explore existing models that have been trained on certain kinds of corpora. For example, immediately below we use a GloVe model trained on Wikipedia to explore how to project embeddings onto a 2-D plane for visualisation. These are **pretrained** models. You can, for example, compare embeddings from a GloVe model trained on Wikipedia with one trained on Google News (though this might not be that interesting!).

The second, more interesting, way to work with word embeddings is to create your own and compare it to others. You can use another notebook to do that.

In this notebook, I provide you with the code to

1. Load a pretrained word embedding model and visualise its outputs
2. Replicate (roughly) Kozlowski et al. (2019)'s projection methods. Projection here refers to **plotting selected embeddings on a 2-D space.**
3. Make a basic plot of projected word embeddings. We use matplotlib here to do the plotting.
4. Load your own corpus and make a word embedding model using the word2vec algorithm
5. Use Kozlowski et al.'s projection technique to compare word embeddings between a test model (in my example, subreddits) and a reference model (in my example, the GloVe Wikipedia model).


## **Part 1: Load a pretrained model & Project Embeddings**

Gensim, the package we use for the word2vec algorithm, also has a lot of pretrained models for you to use.

In [None]:
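## Let's look at all the pretrained models available through Gensim

import gensim.downloader as api  # imported here in case the star import above does not provide it

info = api.info()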
for model_name, model_data in sorted(info['models'].items()):
    print(
        '%s (%d records): %s' % (
            model_name,
            model_data.get('num_records', -1),
            model_data['description'][:40] + '...',
        )
    )

## it's not super pretty but you can check out the models below
## we are going to use glove-wiki-gigaword-50
## check out Gensim's documentation for more information on these packages

In [None]:
## Let's get the glove-wiki-gigaword-50 model

model = api.load("glove-wiki-gigaword-50") # you can use any of the models you like; just make sure you copy/paste the string correctly

## it might need to download, so grab a coffee...
## once it is done, you've loaded a word embedding model!

## Exploring a Word Embedding Model

As Kozlowski et al. (2019) discuss, a word embedding model is basically a representation of words and their relationships to one another in a high-dimensional space. This is done by using vectors: each word in our model is represented by its own vector of numbers. These words can then be analysed by using a metric that calculates the distance between two vectors. In embedding models, the *cosine distance* between vector A and vector B is the metric typically used.
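
To make the metric concrete, here is a minimal sketch of cosine distance computed by hand. The three-dimensional vectors are made up for illustration (the GloVe vectors used in this notebook have 50 dimensions), and `cosine_distance` is a hypothetical helper written for this example, not a Gensim function.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


# Toy vectors, invented for this example only
vec_a = [1.0, 0.0, 0.0]
vec_b = [0.0, 1.0, 0.0]  # at a right angle to vec_a
vec_c = [2.0, 0.0, 0.0]  # same direction as vec_a, but twice as long

print(cosine_distance(vec_a, vec_b))  # 1.0: the vectors share no direction
print(cosine_distance(vec_a, vec_c))  # 0.0: same direction, length is ignored
```

The second result shows why this metric suits embeddings: it compares the direction of two vectors and ignores their length.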

Let's do some basic operations with the model to get a sense of how it works. Check out the code cell below.

In [None]:
## we loaded the model above with api.load()
## so now we have a working set of pretrained word vectors
## api.load() returns a Gensim KeyedVectors object, there's plenty of documentation

## Let's see what a word vector looks like.

model["king"]

In [None]:
## Let's see another word vector

model["man"]

In [None]:
## clearly, these are different words with different vectors
## obviously this makes little sense to a human eye. 
## we probably want to know how similar these terms are

## let's calculate the cosine distance between these two terms

model.distance("king","man")

In [None]:
## OK -- these words have a distance of 0.47
## What about the distance between "king" and "woman"?

model.distance("king", "woman")

In [None]:
## Hey! They have a higher distance. That means that MAN is closer to KING than WOMAN is to KING. Makes sense because the noun KING in English is not gender neutral
## We might also want to find all of the words that are similar to KING

model.most_similar("king")

In [None]:
## These are words obviously relating to rulers
## But we can also add and subtract vectors
## In theory, if we subtract the vector for MAN from KING, and add the vector for WOMAN, we should end up with the word QUEEN and related terms

model.most_similar(positive=["king", "woman"], negative=["man"], topn=50)

## Some notes about syntax with this command
## in its simplest form, .most_similar() takes one word and returns the words closest to it
## you can use positive for the vectors you want to add together and negative for the vectors you want to subtract
## you can also add the argument topn=N to get more results. Here we ask for the top 50 results

In [None]:
## let's try something else

mdist = model.distance("president","man")
wdist = model.distance("president","woman")

print("Distance, president --> man: ", mdist)
print("Distance, president --> woman: ", wdist)

In [None]:
## So English tends to associate men with the term president MORE than women. Big surprise.
## add some code cells below and try out some distances between words

model.distance("refugee","european")

In [None]:
model.distance("refugee","african")

In [None]:
model.distance("refugee","jewish")

In [None]:
model.distance("refugee","asian")

In [None]:
model.distance("refugee","american")

So English Wikipedia seems to associate Europeans with being refugees less than it does other groups, except Asians and Americans.

This reflects a Eurocentric geopolitical perspective that the model recognizes in English Wikipedia texts. But to be certain of anything, we would need to run many more tests, and visualise these biases somehow.

We can do that by replicating Kozlowski et al's approach. In this manner, we can explore biases in a textual dataset. 

Add some more queries in code cells below, being careful to follow the syntax, and lowercasing your words (for this model).
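
A misspelled or capitalised word is not in the model's vocabulary, and querying it raises an error. The sketch below shows one way to guard against that. `safe_distance` is a hypothetical helper, and `ToyVectors` is a stand-in with an invented distance so the example runs without downloading anything. It assumes only that the object supports `word in vectors` and `.distance()`, as the Gensim model loaded above does.

```python
def safe_distance(vectors, word_a, word_b):
    """Lowercase both words and return their distance, or None if one is missing."""
    word_a, word_b = word_a.lower(), word_b.lower()
    for word in (word_a, word_b):
        if word not in vectors:
            print(f"'{word}' is not in the vocabulary")
            return None
    return vectors.distance(word_a, word_b)


class ToyVectors:
    """Stand-in for a loaded model, holding invented distances."""

    def __init__(self, distances):
        self._distances = distances
        self._vocab = {word for pair in distances for word in pair}

    def __contains__(self, word):
        return word in self._vocab

    def distance(self, word_a, word_b):
        return self._distances[frozenset((word_a, word_b))]


toy = ToyVectors({frozenset(("refugee", "european")): 0.5})

print(safe_distance(toy, "Refugee", "European"))  # words are lowercased first
print(safe_distance(toy, "refugee", "europaen"))  # typo: reported, returns None
```

With the real model you would call `safe_distance(model, "refugee", "european")` instead of using the stand-in.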

From the perspective of the data used to train the model, we might be able to observe the symbolic spaces at particular boundaries.

In [None]:
model.distance("government","evil")

In [None]:
model.distance("government","good")

In [None]:
## Interesting, but we can only glean so much from one query! 

**So what have we learned so far?**

1. Once a word embedding model (Word2Vec, GloVe, and so on) is trained, every word in its vocabulary has a vector which can be compared to another.
2. Calling .distance("word1","word2") will calculate the cosine distance between the two words' vectors. A higher value means they are more distant, while a lower value means they are more similar.
3. Calling .most_similar() will return the N words most similar to your query. This is also based on the cosine metric, but the value returned is the cosine similarity (1 minus the cosine distance). In this case, higher values are MORE SIMILAR and lower values are LESS SIMILAR.

Note: calling .most_similar() returns a list of python tuples, which is easy to convert to a data table.
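
As a rough illustration of that conversion, the sketch below turns a list of `(word, similarity)` tuples into table rows and adds the matching distance for each word (one minus the similarity). The three results are invented for the example and are not real model output.

```python
# Invented output in the same shape as .most_similar(): (word, cosine similarity)
results = [("queen", 0.85), ("prince", 0.80), ("monarch", 0.75)]

# One dictionary per row, adding the matching cosine distance (1 - similarity)
rows = [
    {"word": word, "similarity": sim, "distance": round(1 - sim, 4)}
    for word, sim in results
]

# Print a simple aligned table
print(f"{'word':<10}{'similarity':>12}{'distance':>10}")
for row in rows:
    print(f"{row['word']:<10}{row['similarity']:>12.2f}{row['distance']:>10.2f}")
```

If you have pandas installed, `pandas.DataFrame(results, columns=["word", "similarity"])` gives you a data table from the same list in one line.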

## **Part 2: Reverse Engineering Projection**

It's not much fun when you just have numbers to work with. So how do we visualise these embeddings in a meaningful way?

Fortunately, Kozlowski et al. (2019) have an approach that we can **roughly** replicate. They select antonym pairs, for example "rich"/"poor" and "man"/"woman", as axes. Then they plot a set of neutral words on a 2-D plane formed by these axes.


**The naive version**

So, we can replicate a **naive** version of their projection approach.

Let's assume we wish to plot two neutral words, "engineer" and "nurse" on an axis "man"/"woman".

That means we need to calculate the distance between "man" and "engineer" and "woman" and "engineer". But we need to project that onto the axis. So what we can do is simply calculate the difference between the cosine distance of (man, engineer) and cosine distance of (woman, engineer). So basically, we use this formula:

     x = model.distance("man","engineer") - model.distance("woman", "engineer")

Where x is simply the x-value for the word engineer if the x-axis goes from man to woman.

Then we just do the same with "nurse", which gives us one x-value for each word. We would expect the x-value for engineer to be lower than the one for nurse, meaning that engineer is more closely associated with men in our model.

We *could* plot this, but we need another axis for a 2-D plane. Let's use rich/poor. We do the same thing, except for the y-axis

     y = model.distance("rich","engineer") - model.distance("poor","engineer")

Where y is simply the y-value for the word engineer on the rich/poor (y) axis.

So now we have actual coordinates that we can plot! Engineer, x, y vs Nurse, x, y. 
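
The two formulas above can be sketched as a single function. This is only an illustration of the arithmetic, not the code behind `naive_projection`. `naive_coordinates` and `toy_distance` are hypothetical names, and the distances are invented so that the example runs without a model.

```python
def naive_coordinates(distance, x_axis, y_axis, word):
    """Return (x, y) for `word`; `distance` is any function taking two words."""
    x = distance(x_axis[0], word) - distance(x_axis[1], word)
    y = distance(y_axis[0], word) - distance(y_axis[1], word)
    return x, y


# Invented distances, for illustration only
toy_distances = {
    ("man", "engineer"): 0.40, ("woman", "engineer"): 0.60,
    ("rich", "engineer"): 0.70, ("poor", "engineer"): 0.80,
    ("man", "nurse"): 0.65, ("woman", "nurse"): 0.35,
    ("rich", "nurse"): 0.85, ("poor", "nurse"): 0.75,
}


def toy_distance(word_a, word_b):
    """Look up an invented distance; stands in for model.distance."""
    return toy_distances[(word_a, word_b)]


for word in ["engineer", "nurse"]:
    x, y = naive_coordinates(toy_distance, ["man", "woman"], ["rich", "poor"], word)
    print(word, round(x, 2), round(y, 2))
```

With a loaded model, you would pass `model.distance` in place of `toy_distance`. A negative x means the word sits closer to the first word of the x axis (here "man"), and a positive x means it sits closer to the second.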

The code cells below basically do this for you, so you don't need to create your own function. But read through to get a sense of how it works.

You need to have the following to project words onto these axes:
1. A word2vec model
2. Two opposing words for your x axis (eg. man/woman)
3. Two opposing words for your y axis (eg. rich/poor)
4. A set of test words, eg names of jobs


In [None]:
## OK, now let's use the naive projection
## first set up your x and y axes

x_axis = ["man","woman"]
y_axis = ["rich","poor"]
test_words = ["custodian","player","cleaner","dentist","secretary","dancer","professor","engineer","driver","teacher","nurse","doctor","lecturer","lawyer","paralegal","athlete","criminal","unemployed"]

naive_projection(x_axis, y_axis, test_words, model,  plot_size=6)

# Improving the projection

Recall that we referred to this projection method as naive. It's not bad, but it could be better. That's because it's not *really* what Kozlowski et al. do.

Instead, they take a large set of word pairs for each axis--for example, not only man/woman, but also he/she and male/female. All of these are related to the concept man/woman, but are different forms that mean something similar. Let's call an axis where we merge multiple, related antonymic pairs a **composite axis**.

Then, they calculate the distances as we did using the formulas above, but take the average across antonym pairs.

So what they do is similar to the following, for the test word "engineer" across multiple axes (man/woman, male/female, he/she):

    x1 = model.distance("man","engineer") - model.distance("woman","engineer")
    x2 = model.distance("he","engineer") - model.distance("she","engineer")
    x3 = model.distance("male","engineer") - model.distance("female","engineer")

But now we have three x values for engineer! We can't plot that. So they simply take the average of these three x values to get one combined x value.

    x_val = average(x1, x2, x3) # this is pseudocode, won't work immediately

In [None]:
import statistics  # standard library; imported here in case the star import above does not provide it

## calculate the x values for our 'composite' axes for the word engineer
x1 = model.distance("man","engineer") - model.distance("woman","engineer")
x2 = model.distance("he","engineer") - model.distance("she","engineer")
x3 = model.distance("male","engineer") - model.distance("female","engineer")

## note, statistics.mean() takes a single sequence rather than separate values, so pass the values in as a list, with square brackets
x_eng = statistics.mean([x1, x2, x3])

## calculate the x values for our 'composite' axes for the word dancer
x1 = model.distance("man","dancer") - model.distance("woman","dancer")
x2 = model.distance("he","dancer") - model.distance("she","dancer")
x3 = model.distance("male","dancer") - model.distance("female","dancer")

## get the average, our x value for dancer on our composite axis
x_dnc = statistics.mean([x1, x2, x3])

print("x-value for engineer: ", x_eng)
print("x-value for dancer: ", x_dnc)

In [None]:
## now, let's set up our composite axes
## you can put in any words you want, but note the format and the location of square brackets
## here we are hard-coding the dimensions. We could theoretically import them from an external dataframe...

x_dimensions = [
                ["man","woman"],
                ["male","female"],
                ["he","she"],
                ["father","mother"],
                ["him","her"]
] # NOTE: this needs to be a list of lists, and your nested lists should only have ONE pair. same with y.

y_dimensions = [
                ["rich","poor"],
                ["wealthy","impoverished"],
                ["affluent","destitute"],
] # check out the appendices of Kozlowski et al and you will see that they used loads of words here
# you can do as many dimensions as you want in your composite axis. check out a thesaurus!

## these are the same as before
## change to whatever test words you want! just make sure to follow the formatting
test_words = [
    "custodian",
    "player",
    "cleaner",
    "dentist",
    "secretary",
    "dancer",
    "professor",
    "engineer",
    "driver",
    "teacher",
    "nurse",
    "doctor",
    "lecturer",
    "lawyer",
    "paralegal",
    "athlete",
    "criminal",
    "unemployed"
]


## These are your axis labels, make them whatever you like

xlab = "man < 0 < woman"
ylab = "rich < 0 < poor"

## call the function, make your plot

advanced_projection(
    x_dimensions,
    y_dimensions,
    test_words,
    model,
    plot_size=8,
    xlab = xlab,
    ylab = ylab
)

### **Now this looks better! (from a projection perspective, the biases are still evident of course!)**

You can play with some composite axes, words, etc. on the GloVe Wikipedia vectors to look at whether there is some bias. Note that very small changes in the dimensions on the axes (adding or removing a pair) can have a significant impact on the visualization.
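
To see why the choice of pairs matters, here is a small sketch that averages the naive difference over a list of antonym pairs, then repeats the calculation with one pair removed. It is not the code behind `advanced_projection`. `composite_value` and `toy_distance` are hypothetical names, and the distances are invented for illustration.

```python
import statistics


def composite_value(distance, pairs, word):
    """Average the naive difference for `word` over every antonym pair."""
    return statistics.mean(
        [distance(first, word) - distance(second, word) for first, second in pairs]
    )


# Invented distances, for illustration only
toy_distances = {
    ("man", "engineer"): 0.40, ("woman", "engineer"): 0.60,
    ("he", "engineer"): 0.45, ("she", "engineer"): 0.55,
    ("father", "engineer"): 0.70, ("mother", "engineer"): 0.50,
}


def toy_distance(word_a, word_b):
    """Look up an invented distance; stands in for model.distance."""
    return toy_distances[(word_a, word_b)]


pairs = [["man", "woman"], ["he", "she"], ["father", "mother"]]

# All three pairs, then the same axis without the last pair
print(round(composite_value(toy_distance, pairs, "engineer"), 3))
print(round(composite_value(toy_distance, pairs[:2], "engineer"), 3))
```

In this made-up case, dropping a single pair moves the word noticeably further along the axis. Passing `model.distance` instead of `toy_distance` lets you run the same check on real vectors.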

Try to come up with a different experiment about bias in English Wikipedia, for example, looking at different countries and words about geopolitics, or music genres and class categories as Kozlowski et al. do.

However, you could go further. For example, you could download and load a different model, run all of these code cells again, and compare the results. Maybe GloVe Wikipedia has less bias than GloVe Twitter. That's interesting in itself.

This is the end of the tutorial, and you can move on to the next notebook to train your own w2v model, but remember that word embeddings are not so useful for small datasets.