# Bias and fairness in machine learning

In this notebook, we'll take a hands-on approach to the study of bias and fairness in machine learning, focusing primarily on word embeddings. Word embeddings are (comparatively) low-dimensional vector representations of words that attempt to capture semantics (or meaning). Conceptually, word embeddings are motivated by the observation that the meaning of a word is characterized by "the company that it keeps" (i.e., that we can learn about the meaning of words by looking at the contexts in which they appear). In recent years, advances in computation (e.g., GPUs) and machine learning have made it possible to train embeddings on large-scale text data, thereby making embeddings more valuable for commercial, scientific, and other applications.

Like all machine learning models, word embeddings depend on training data, specifically text, which is of course generated by human beings. Consequently, there is growing concern that word embeddings may encode human biases (e.g., stereotypes about particular groups). Our goal in this session will be to see if we can identify such biases in several widely used, pre-trained word embedding models.

Without further ado, let's get started. 

# Roadmap
  * Preliminaries
  * Word embeddings
    * Most similar words
    * Distances between words
    * Vector magic
  * Language models
    * Next word prediction
    * Masked word prediction
  * Exercises
  * Loading a different model

# Preliminaries

Let's start by loading some packages. We'll use gensim to download some pre-trained word embeddings and to run operations on the vectors. We'll use pandas for wrangling some data, numpy for some handy array operations, and itertools to help with some pairwise distance calculations.

In [1]:
# run the following commands to install the needed packages
"""
pip install pandas
pip install gensim
pip install numpy
"""

'\npip install pandas\npip install gensim\npip install numpy\n'

In [2]:
# load some packages
import gensim
import gensim.downloader as api
import itertools
import pandas as pd
import numpy as np

Now let's load our model. This is a set of pre-trained embeddings (i.e., we get a vector for each word). Our package, gensim, gives us easy access to some pre-trained embeddings, and to keep things simple, we'll use those. To get us started, we'll load `glove-wiki-gigaword-50`, which was trained on the text of Wikipedia and the Gigaword news corpus. The 50 means that our vectors are in $\mathbb{R}^{50}$; you don't need to worry about that now, but we'll come back to the dimensions later.

In [3]:
# download and load the model
embeddings = gensim.downloader.load("glove-wiki-gigaword-50")

You can get more information on our model like this.

In [4]:
api.info("glove-wiki-gigaword-50")

{'num_records': 400000,
 'file_size': 69182535,
 'base_dataset': 'Wikipedia 2014 + Gigaword 5 (6B tokens, uncased)',
 'reader_code': 'https://github.com/RaRe-Technologies/gensim-data/releases/download/glove-wiki-gigaword-50/__init__.py',
 'license': 'http://opendatacommons.org/licenses/pddl/',
 'parameters': {'dimension': 50},
 'description': 'Pre-trained vectors based on Wikipedia 2014 + Gigaword, 5.6B tokens, 400K vocab, uncased (https://nlp.stanford.edu/projects/glove/).',
 'preprocessing': 'Converted to w2v format with `python -m gensim.scripts.glove2word2vec -i <fname> -o glove-wiki-gigaword-50.txt`.',
 'read_more': ['https://nlp.stanford.edu/projects/glove/',
  'https://nlp.stanford.edu/pubs/glove.pdf'],
 'checksum': 'c289bc5d7f2f02c6dc9f2f9b67641813',
 'file_name': 'glove-wiki-gigaword-50.gz',
 'parts': 1}

# Word embeddings

Now that we've loaded our word embeddings, we're ready to start running some analyses. Recall that we're just working with a bunch of vectors. Let's check out the vector for the word `test`.

In [5]:
embeddings["test"]

array([ 0.13175 , -0.25517 , -0.067915,  0.26193 , -0.26155 ,  0.23569 ,
        0.13077 , -0.011801,  1.7659  ,  0.20781 ,  0.26198 , -0.16428 ,
       -0.84642 ,  0.020094,  0.070176,  0.39778 ,  0.15278 , -0.20213 ,
       -1.6184  , -0.54327 , -0.17856 ,  0.53894 ,  0.49868 , -0.10171 ,
        0.66265 , -1.7051  ,  0.057193, -0.32405 , -0.66835 ,  0.26654 ,
        2.842   ,  0.26844 , -0.59537 , -0.5004  ,  1.5199  ,  0.039641,
        1.6659  ,  0.99758 , -0.5597  , -0.70493 , -0.0309  , -0.28302 ,
       -0.13564 ,  0.6429  ,  0.41491 ,  1.2362  ,  0.76587 ,  0.97798 ,
        0.58507 , -0.30176 ], dtype=float32)

## Most similar words

What's neat is that since we're working with vectors, we can start to do things like look for words that are similar (by finding nearby vectors). Here's how we would do that in gensim.

In [6]:
embeddings.most_similar("cat")

[('dog', 0.9218006134033203),
 ('rabbit', 0.8487821817398071),
 ('monkey', 0.8041081428527832),
 ('rat', 0.7891963720321655),
 ('cats', 0.7865270376205444),
 ('snake', 0.7798910737037659),
 ('dogs', 0.7795815467834473),
 ('pet', 0.7792249917984009),
 ('mouse', 0.7731667757034302),
 ('bite', 0.7728800177574158)]

In [7]:
embeddings.most_similar("dog")

[('cat', 0.9218006134033203),
 ('dogs', 0.8513158559799194),
 ('horse', 0.7907583117485046),
 ('puppy', 0.7754921317100525),
 ('pet', 0.7724707722663879),
 ('rabbit', 0.7720814347267151),
 ('pig', 0.7490062117576599),
 ('snake', 0.7399188876152039),
 ('baby', 0.7395570874214172),
 ('bite', 0.738793671131134)]

In [8]:
embeddings.most_similar("obama")

[('barack', 0.9674172401428223),
 ('bush', 0.9642480611801147),
 ('clinton', 0.9606045484542847),
 ('mccain', 0.9122934937477112),
 ('dole', 0.8878742456436157),
 ('gore', 0.884803831577301),
 ('hillary', 0.8776552677154541),
 ('rodham', 0.8401790857315063),
 ('kerry', 0.8261429071426392),
 ('biden', 0.8095825910568237)]
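The scores that `most_similar` returns are cosine similarities: the cosine of the angle between two vectors, which equals 1 when they point in exactly the same direction. To make that concrete, here is a minimal sketch that reproduces the ranking idea on three made-up vectors. The numbers and the helper names `cosine_similarity` and `toy_most_similar` are illustrative assumptions; they are not GloVe values or part of gensim.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (made-up numbers, not taken from GloVe).
toy_vectors = {
    "cat": np.array([1.0, 0.9, 0.1]),
    "dog": np.array([0.9, 1.0, 0.2]),
    "car": np.array([0.1, 0.2, 1.0]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def toy_most_similar(word, vectors, topn=2):
    """Rank every other word by its cosine similarity to `word`."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

print(toy_most_similar("cat", toy_vectors))
```

In this toy setup "dog" ranks well above "car" for the query "cat", because its vector points in nearly the same direction.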

Already, you can imagine how we might begin probing for potential biases. For example, we might look at the most similar words for different occupations. 

In [9]:
embeddings.most_similar("surgeon", topn=20)

[('physician', 0.8497322797775269),
 ('cardiologist', 0.7978282570838928),
 ('dentist', 0.795362114906311),
 ('orthopedic', 0.7693870663642883),
 ('neurologist', 0.7677544355392456),
 ('psychiatrist', 0.7599009871482849),
 ('surgeons', 0.7580606937408447),
 ('oncologist', 0.7523747086524963),
 ('pediatric', 0.7517416477203369),
 ('doctor', 0.7479072213172913),
 ('neurosurgeon', 0.7459368705749512),
 ('ophthalmologist', 0.7451258301734924),
 ('pathologist', 0.7448002696037292),
 ('nurse', 0.7376463413238525),
 ('orthopaedic', 0.737062931060791),
 ('internist', 0.7313891649246216),
 ('pediatrician', 0.7174053192138672),
 ('anesthesiologist', 0.7062750458717346),
 ('surgery', 0.7021859884262085),
 ('urologist', 0.6995974183082581)]

In [10]:
embeddings.most_similar("nurse", topn=20)

[('doctor', 0.7977497577667236),
 ('nurses', 0.7752917408943176),
 ('dentist', 0.7731257081031799),
 ('pregnant', 0.7462233901023865),
 ('pediatrician', 0.7452079653739929),
 ('therapist', 0.7396323084831238),
 ('surgeon', 0.7376462817192078),
 ('nursing', 0.7353047728538513),
 ('child', 0.7341340184211731),
 ('counselor', 0.7322410345077515),
 ('teacher', 0.7242345213890076),
 ('patient', 0.7242098450660706),
 ('psychiatrist', 0.7219806909561157),
 ('physician', 0.7205138206481934),
 ('parents', 0.7181951403617859),
 ('mother', 0.7177230715751648),
 ('woman', 0.7155020236968994),
 ('hospital', 0.7076544761657715),
 ('paramedic', 0.7050016522407532),
 ('anesthetist', 0.700419008731842)]

Notice that we see more clearly gendered words associated with nurse. Take a few minutes to enter alternative occupations in the code above. Do you find any interesting differences?

## Distances between words

We can also get more explicit in our queries. Rather than limiting our attention to the most similar words, let's go ahead and narrow in on the relationship among particular word pairs. 

In [11]:
embeddings.distance("surgeon", "he")

0.5610992014408112

In [12]:
embeddings.distance("surgeon", "she")

0.5496048629283905

In [13]:
embeddings.distance("nurse", "he")

0.5194914042949677

In [14]:
embeddings.distance("nurse", "she")

0.3543033003807068
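One way to summarize these four numbers is a simple "gender gap" per occupation: the distance to "he" minus the distance to "she". The sketch below just does that arithmetic on the values printed above (rounded to four decimals). The name `gender_gap` is a hypothetical helper introduced here for illustration, not a standard metric.

```python
# Distances copied from the four cells above, rounded to four decimals.
distances = {
    "surgeon": {"he": 0.5611, "she": 0.5496},
    "nurse": {"he": 0.5195, "she": 0.3543},
}

def gender_gap(word):
    """Distance to "he" minus distance to "she"; positive means closer to "she"."""
    return round(distances[word]["he"] - distances[word]["she"], 4)

for word in distances:
    print(word, gender_gap(word))
```

A gap near zero means the word sits about equally far from both pronouns; a larger positive gap means it sits noticeably closer to "she".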

The distances from "surgeon" to "he" and to "she" are quite similar (0.56 vs. 0.55). But there is a much larger gap between the corresponding distances for "nurse" (0.52 vs. 0.35). As we did before, take a few minutes to explore distances among pairs of words that you think might be a useful diagnostic for biases. To help you explore a broader set of word pairs, here is a little function that will return a matrix of pairwise distances, given a list of words.

In [15]:
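def distance_matrix(embeddings, words):
    """Return a DataFrame of pairwise distances between the given words."""
    # compute the distance for every ordered pair of words
    distances = [embeddings.distance(a, b) for a, b in itertools.product(words, words)]
    mtx = np.array(distances).reshape(len(words), len(words))
    return pd.DataFrame(mtx, index=words, columns=words).round(2)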

You can use the function like so.

In [16]:
distance_matrix(embeddings=embeddings, words=["he", "she", "nurse", "ceo", "engineer"])

| | he | she | nurse | ceo | engineer |
|---|---|---|---|---|---|
| he | 0.0 | 0.11 | 0.52 | 0.61 | 0.51 |
| she | 0.11 | 0.0 | 0.35 | 0.69 | 0.59 |
| nurse | 0.52 | 0.35 | 0.0 | 0.8 | 0.52 |
| ceo | 0.61 | 0.69 | 0.8 | 0.0 | 0.57 |
| engineer | 0.51 | 0.59 | 0.52 | 0.57 | 0.0 |
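For longer word lists, looping over every pair can get slow. A possible alternative, sketched below on toy vectors, is to normalize each vector to unit length and compute all cosine distances with a single matrix product. The function name `cosine_distance_matrix` and the toy array are assumptions made for illustration.

```python
import numpy as np

def cosine_distance_matrix(vectors):
    """Pairwise cosine distances (1 - cosine similarity) between the rows of `vectors`."""
    # scale each row to unit length so that dot products become cosine similarities
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Toy 2-dimensional vectors (made-up numbers).
toy = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])

print(cosine_distance_matrix(toy).round(2))
```

With the real model, the input could be built as `np.array([embeddings[w] for w in words])`, and the result should agree with `distance_matrix` up to rounding.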


## Vector magic

If you've read anything on word embeddings, you've probably seen some examples of analogies, the most famous probably being $queen \approx king - man + woman$. Remember that we're just working with vectors, so we can use vector arithmetic. Let's see if we can replicate the famous king/queen example. We'll start by looking at words similar to "king".

In [17]:
embeddings.most_similar("king", topn=20)

Now we'll do a little arithmetic, using some options built into gensim's `most_similar` method. Here, we're adding the vectors for "king" and "woman" and subtracting the vector for "man".

In [18]:
embeddings.most_similar(positive=["woman", "king"], negative=["man"], topn=20)

[('queen', 0.8523604273796082),
 ('throne', 0.7664334177970886),
 ('prince', 0.759214460849762),
 ('daughter', 0.7473882436752319),
 ('elizabeth', 0.7460219860076904),
 ('princess', 0.7424570322036743),
 ('kingdom', 0.7337412238121033),
 ('monarch', 0.7214491367340088),
 ('eldest', 0.7184861898422241),
 ('widow', 0.7099431157112122),
 ('son', 0.7081551551818848),
 ('father', 0.7072948217391968),
 ('mother', 0.6993737816810608),
 ('emperor', 0.6989730596542358),
 ('grandson', 0.6946032047271729),
 ('wife', 0.6925390362739563),
 ('consort', 0.6895833611488342),
 ('family', 0.6888480186462402),
 ('cousin', 0.6867153644561768),
 ('marriage', 0.6804890632629395)]
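To see what the `positive` and `negative` arguments are doing, here is a simplified sketch of the same idea on made-up vectors: add the positive vectors, subtract the negative ones, and return the closest remaining word. The vectors, the axis interpretation, and the helper `toy_analogy` are all illustrative assumptions. gensim's own implementation differs in some details (for instance, it works with length-normalized vectors), but like this sketch it leaves the input words out of the results.

```python
import numpy as np

# Made-up 3-dimensional vectors; the axes loosely stand for (royalty, gender, fruit).
toy_vectors = {
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man": np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 1.0]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def toy_analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative), skipping the inputs."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

print(toy_analogy(positive=["woman", "king"], negative=["man"], vectors=toy_vectors))
```

On these toy vectors, king - man + woman lands exactly on the vector for "queen", so that is the word returned.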

Pretty impressive! But how can we use this to study bias and fairness in machine learning? Well, we might go back to our example with gendered occupations. What's your guess on what we'll get when we run $doctor - man + woman$?

In [19]:
embeddings.most_similar(positive=["woman", "doctor"], negative=["man"], topn=20)

[('nurse', 0.840464174747467),
 ('child', 0.7663259506225586),
 ('pregnant', 0.7570130228996277),
 ('mother', 0.7517457604408264),
 ('patient', 0.751666247844696),
 ('physician', 0.7507280707359314),
 ('dentist', 0.7360344529151917),
 ('therapist', 0.7342537045478821),
 ('parents', 0.7286345958709717),
 ('surgeon', 0.7165213823318481),
 ('teacher', 0.7138692736625671),
 ('doctors', 0.7117718458175659),
 ('birth', 0.7071055769920349),
 ('psychiatrist', 0.6999903321266174),
 ('girl', 0.6961426138877869),
 ('she', 0.6924219727516174),
 ('her', 0.6886029243469238),
 ('daughter', 0.6861442923545837),
 ('pediatrician', 0.6856350302696228),
 ('toddler', 0.6853212118148804)]

What if we try to probe more directly for what our word vectors think about gender roles? What might we get when we run $role - woman + man$ and $role - man + woman$?

In [20]:
embeddings.most_similar(positive=["man", "role"], negative=["woman"], topn=20)

[('as', 0.783684492111206),
 ('acting', 0.7572517991065979),
 ('future', 0.7556937336921692),
 ('roles', 0.7423116564750671),
 ('action', 0.7377783060073853),
 ('supporting', 0.7366944551467896),
 ('both', 0.7362772226333618),
 ('character', 0.7349316477775574),
 ('success', 0.7314870953559875),
 ("'s", 0.7283881902694702),
 ('well', 0.7250850796699524),
 ('this', 0.7232421040534973),
 ('powers', 0.7225099205970764),
 ('leadership', 0.7218052744865417),
 ('own', 0.7188370823860168),
 ('responsible', 0.7167803049087524),
 ('credited', 0.7158978581428528),
 ('also', 0.7135252356529236),
 ('whose', 0.7121561765670776),
 ('major', 0.7117612361907959)]

In [21]:
embeddings.most_similar(positive=["woman", "role"], negative=["man"], topn=20)

[('roles', 0.7808736562728882),
 ('relationship', 0.7652266621589661),
 ('acting', 0.7191237807273865),
 ('child', 0.7111478447914124),
 ('focuses', 0.6989119648933411),
 ('supporting', 0.697880208492279),
 ('her', 0.6972039341926575),
 ('engagement', 0.6958260536193848),
 ('marriage', 0.6946836113929749),
 ('life', 0.6913840174674988),
 ('’s', 0.6887852549552917),
 ('relations', 0.6872742176055908),
 ('she', 0.6801809668540955),
 ('recognition', 0.6747872233390808),
 ('character', 0.6741839051246643),
 ('focus', 0.6686821579933167),
 ('herself', 0.6652383208274841),
 ('status', 0.6644878387451172),
 ('collaboration', 0.6629896759986877),
 ('part', 0.6629751324653625)]

The results contain some pretty stereotypical gender characterizations, even on a quick glance, with the "man" word list including things like "future", "success", "powers", and "leadership" and the "woman" word list including things like "relationship", "child", "engagement", and "marriage".
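The comparison can be made a little more systematic by setting aside the words that both lists share and looking only at what is unique to each. The sketch below does this with plain Python, using the words copied from the two outputs above. The variable names are just illustrative.

```python
# Words copied from the two top-20 outputs above (scores dropped).
man_list = [
    "as", "acting", "future", "roles", "action", "supporting", "both",
    "character", "success", "'s", "well", "this", "powers", "leadership",
    "own", "responsible", "credited", "also", "whose", "major",
]
woman_list = [
    "roles", "relationship", "acting", "child", "focuses", "supporting",
    "her", "engagement", "marriage", "life", "’s", "relations", "she",
    "recognition", "character", "focus", "herself", "status",
    "collaboration", "part",
]

# words that appear in both lists
shared = set(man_list) & set(woman_list)

# keep the original ranking order while filtering out the shared words
only_man = [w for w in man_list if w not in shared]
only_woman = [w for w in woman_list if w not in shared]

print("shared:", sorted(shared))
print("only in the 'man' list:", only_man)
print("only in the 'woman' list:", only_woman)
```

The same pattern works for any pair of `most_similar` results, for example after extracting the words with `[word for word, score in results]`.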

Take a few minutes and adapt the code above to run some more analogies. Can you find any additional evidence of biases?

# Language models

There are a lot of interesting things we can do with word embeddings. As impressive as they are, though, they're just the tip of the iceberg in terms of what can be (and is being) done with modern natural language processing. In this next section of our notebook, we'll focus on two particular examples: (1) next word prediction and (2) masked word prediction.

## Next word prediction

We're all familiar with next word prediction. This is what's happening behind the scenes any time we run a Google search and see the automatic query suggestions. It's not too tricky to do next word prediction in Python, but it's a bit more involved than fiddling around with word embeddings, and since this isn't a methodological class, writing our own code for a full model would be a bit too much. Fortunately, there are a lot of great online tools that will let us play around with state-of-the-art models.
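That said, the core idea fits in a few lines if we shrink the model drastically. The sketch below is a toy bigram counter: it predicts the next word purely from how often each word followed the previous one in a tiny made-up corpus. This is far simpler than the neural language models used in practice, and the corpus and function name are assumptions for illustration, but it shows how skewed text leads to skewed predictions.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models are trained on billions of words.
corpus = (
    "the nurse said she was tired . "
    "the nurse said she was busy . "
    "the doctor said he was late . "
    "the nurse said she was late ."
)
tokens = corpus.split()

# count how often each word follows each other word
following = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    following[current][nxt] += 1

def predict_next(word, topn=3):
    """Return the most frequent next words after `word`, with relative frequencies."""
    counts = following[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(topn)]

print(predict_next("said"))
```

Because "said" is followed by "she" three times and by "he" once in this corpus, the toy model puts "she" first, with a relative frequency of 0.75.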

Here is an online demonstration using the AllenNLP natural language processing platform. Under the hood, the demonstration uses GPT-2, a widely used neural language model.

[AllenNLP Next Token Demo](https://demo.allennlp.org/next-token-lm?text=AllenNLP%20is%20)

Take a few minutes to play around with the demonstration. Can you find any evidence of biases? How might you adapt some of the occupational examples we tried out above to the next word prediction context? Can you think of any other ways we might probe for biases?

### Bonus
If you want to check next word prediction for a much broader set of state-of-the-art models (in a Google Docs-style environment), here is your chance.

[Huggingface Write With Transformer](https://transformer.huggingface.co/)

## Masked word prediction

Masked word prediction is pretty similar to next word prediction, except here, we're trying to predict a hidden word, typically in the middle of some other text. As with next word prediction, actually implementing masked word prediction is a bit beyond the scope of this class. But again, we're lucky that there are a lot of demos available online, including one from AllenNLP.

[AllenNLP Masked Word Demo](https://demo.allennlp.org/masked-lm?text=The%20doctor%20ran%20to%20the%20emergency%20room%20to%20see%20%5BMASK%5D%20patient)

Once again, take a few minutes to fiddle with the demonstration. Can you find any evidence of biases? What kinds of tests might you do? Hint: the default example given by the AllenNLP creators is already quite revealing. 
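One simple kind of test is to keep the sentence fixed and swap in different occupations, then compare what the model fills in for the mask. The sketch below builds demo links for a few occupations from the default sentence in the link above. It assumes the demo still accepts the `text` query parameter in that form; the helper name `masked_demo_url` and the occupation list are illustrative.

```python
from urllib.parse import quote

# Base URL and default sentence taken from the demo link above.
DEMO_URL = "https://demo.allennlp.org/masked-lm?text="
TEMPLATE = "The {occupation} ran to the emergency room to see [MASK] patient"

def masked_demo_url(occupation):
    """Build a demo link with the given occupation substituted into the sentence."""
    sentence = TEMPLATE.format(occupation=occupation)
    # percent-encode spaces and brackets so the sentence is safe inside a URL
    return DEMO_URL + quote(sentence)

for occupation in ["doctor", "nurse", "surgeon"]:
    print(masked_demo_url(occupation))
```

Opening the links side by side makes it easy to compare which pronouns the model prefers for each occupation.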

# Exercises
  * The examples above were all based on a single set of word vectors, trained on Wikipedia and Gigaword news text. In addition, the word vectors we used were fairly low dimensional. Repeat the exercises above, but using a different model. 
    * How much does the model size (dimensionality) make a difference?
    * Do you notice any differences when you try models trained on a different corpus (e.g., Twitter instead of Wikipedia)?
  * To more systematically uncover biases in word embeddings, previous research has attempted to adapt the Implicit Association Test (IAT), which is a test designed to unearth unconscious biases in humans. While we don't have time right now to do a systematic analysis, the various IATs that have been developed over the years can serve as some inspiration for additional queries on our word vectors. Take a look at the IAT website, [here](https://implicit.harvard.edu/implicit/selectatest.html). Pick a test, and look at the word pairs you're given. Run some distance and/or analogy queries for the different word pairs you're given. Can you find any evidence of biases?
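As a starting point for IAT-inspired queries, here is a much-simplified association score: the average cosine similarity between a target word and one set of attribute words, minus its average similarity to a second set. This is a sketch in the spirit of the association tests mentioned above, not the full statistical procedure from the research literature, and the toy vectors and the function name `association` are assumptions for illustration.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word, attributes_a, attributes_b, vectors):
    """Mean similarity of `word` to set A minus its mean similarity to set B."""
    sim_a = np.mean([cosine_similarity(vectors[word], vectors[a]) for a in attributes_a])
    sim_b = np.mean([cosine_similarity(vectors[word], vectors[b]) for b in attributes_b])
    return float(sim_a - sim_b)

# Toy 2-dimensional vectors (made-up numbers, not taken from GloVe).
toy_vectors = {
    "flower": np.array([1.0, 0.1]),
    "insect": np.array([0.1, 1.0]),
    "pleasant": np.array([0.9, 0.2]),
    "unpleasant": np.array([0.2, 0.9]),
}

for word in ["flower", "insect"]:
    print(word, round(association(word, ["pleasant"], ["unpleasant"], toy_vectors), 2))
```

A positive score means the word sits closer to the first attribute set, a negative score closer to the second. Because the function only needs `vectors[word]` lookups, the loaded `embeddings` object can be passed in place of `toy_vectors`, for example `association("nurse", ["she", "her"], ["he", "him"], embeddings)`.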

# Loading a different model

To load a different model, restart the notebook and change the string `glove-wiki-gigaword-50` in the line of code shown below (it appears near the top of the notebook) to the name of the model you'd like to use.

`embeddings = gensim.downloader.load("glove-wiki-gigaword-50")`

The following models are available in gensim; `wiki` and `twitter` indicate whether the training data come from Wikipedia (plus Gigaword) or Twitter, respectively. The number indicates the dimensionality of the vectors.
  * `glove-wiki-gigaword-50`
  * `glove-wiki-gigaword-100`
  * `glove-wiki-gigaword-200`
  * `glove-wiki-gigaword-300`
  * `glove-twitter-25`
  * `glove-twitter-50`
  * `glove-twitter-100`
  * `glove-twitter-200`
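If you'd rather compare several models in one go instead of restarting the notebook each time, one possible approach is a small helper that loads each model and records the same distance query. The sketch below takes the loading function as an argument so that the logic can be tried without downloading anything; `compare_models` is a hypothetical helper, and passing `api.load` as the loader assumes you are willing to wait for each download.

```python
def compare_models(model_names, word_a, word_b, load):
    """Return {model name: distance between word_a and word_b} for each model.

    `load` is any function that takes a model name and returns an object
    with a `distance(word_a, word_b)` method (e.g., gensim's `api.load`).
    """
    results = {}
    for name in model_names:
        model = load(name)
        results[name] = model.distance(word_a, word_b)
    return results

# Example usage with gensim (downloads each model, so it may take a while):
# import gensim.downloader as api
# compare_models(["glove-wiki-gigaword-50", "glove-twitter-50"], "nurse", "she", load=api.load)
```

Keeping the dimensionality fixed (e.g., the two 50-dimensional models) while changing the corpus, or the reverse, helps separate the two questions posed in the exercises.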