# Bias and Word Embeddings

Jupyter Notebook complementing blog post: 

## What are word embeddings?

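Word embeddings map each word to a dense, real-valued vector and are commonly used as input to neural nets. Their main benefits over one-hot encodings are that they need far fewer dimensions and that related words end up with similar vectors.

We chose GloVe over Word2vec/FastText because its pre-trained data is easy to interact with; results should be somewhat similar.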
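
To make the contrast with one-hot encodings concrete, here is a minimal sketch. The three-dimensional dense vectors are invented by hand for illustration and are not taken from GloVe; `cosine`, `toy_vocab`, `one_hot`, and `dense` are names introduced only for this example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

toy_vocab = ["king", "queen", "banana"]

# One-hot: one dimension per word, with a single 1 in the word's own slot.
one_hot = {w: np.eye(len(toy_vocab))[i] for i, w in enumerate(toy_vocab)}

# Dense vectors: hypothetical values chosen by hand, for illustration only.
dense = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.8, 0.9, 0.1]),
    "banana": np.array([0.1, 0.0, 0.9]),
}

print(cosine(one_hot["king"], one_hot["queen"]))  # distinct one-hot vectors are orthogonal
print(cosine(dense["king"], dense["queen"]))      # related words can sit close together
print(cosine(dense["king"], dense["banana"]))     # unrelated words can sit far apart
```

Distinct one-hot vectors always have cosine similarity 0, so they carry no notion of relatedness; dense vectors can.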

## Interfacing with Pre-Trained Embeddings

Here, we read in the data and create a list of the words contained in the embeddings, a dictionary mapping words to vectors, and a matrix (list) of all the vectors.

In [438]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import wordnet as wn
import re

In [315]:
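vocab = []        # words, in file order
embeddings = {}   # word -> vector
matrix = []       # vectors; row i belongs to vocab[i]
# Each line of the GloVe file is a word followed by its space-separated vector components.
with open('glove.6B/glove.6B.200d.txt', 'r', encoding='utf-8') as f:
    for line in f:
        split_line = line.split()
        word = split_line[0]
        vector = np.asarray([float(i) for i in split_line[1:]])
        embeddings[word] = vector
        vocab.append(word)
        matrix.append(vector)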
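
As a sanity check of the file format, the sketch below runs the same parsing logic on two hand-written lines instead of the real file, then stacks the vectors into a single NumPy array. The sample values and the names `sample`, `toy_vocab`, `toy_matrix`, and `word_to_index` are made up for this example; only the line format (a token followed by space-separated floats) comes from the GloVe text files.

```python
import io
import numpy as np

# Two fake lines in the GloVe text format: token, then space-separated floats.
sample = io.StringIO("the 0.1 0.2 0.3\ncat -0.4 0.5 0.6\n")

toy_vocab, toy_vectors = [], []
for line in sample:
    token, *values = line.split()
    toy_vocab.append(token)
    toy_vectors.append(np.asarray([float(v) for v in values]))

# Stacking once gives a 2-D array, so later calls need not convert a list each time.
toy_matrix = np.vstack(toy_vectors)

# A word -> row lookup avoids scanning the vocabulary list for every query.
word_to_index = {w: i for i, w in enumerate(toy_vocab)}

print(toy_matrix.shape)                      # (number of words, dimensions)
print(toy_matrix[word_to_index["cat"]])
```

Whether the extra dictionary is worth it depends on how often words are looked up by position; for a handful of queries, `vocab.index(word)` is fine.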

We use the definition from Levy and Goldberg (2014) when evaluating analogies. The canonical example is $queen - king \approx woman - man$, where $queen$ is the unknown vector.

We solve the analogy as follows (where $v$ is a word in the vocabulary $V$, and $\cos$ represents cosine similarity): $$\arg\max_{v \in V} \cos(v, woman - man + king)$$ In practice, the three input words are excluded from the candidates.
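
The formula can be tried out on a tiny example. This is only a sketch: the two-dimensional vectors below are invented so that the arithmetic is easy to follow by hand, and `solve_analogy` is a hypothetical helper that searches a five-word toy vocabulary rather than the real embeddings.

```python
import numpy as np

# Hypothetical 2-D vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def cos(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a, b, c, vectors):
    """Return the word v maximising cos(v, a - b + c), excluding a, b and c."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cos(vectors[w], target))

# woman - man + king = (1, -1), which coincides with the toy vector for "queen".
print(solve_analogy("woman", "man", "king", toy))  # queen
```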

In [530]:
def evaluate(word, relation, k = 4, return_list = False):
    """Solves the analogy "relation[0] is to relation[1] as ? is to word".
    Takes in a target word, a list of two words expressing a relation,
    an integer k denoting how many nearest neighbours to retrieve, and whether a list of words should be returned
    or only the first item. k is set to 4 by default so that at least one candidate remains when the target word
    and both relation words are among the neighbours and get filtered out.
    Returns between k - 3 and k words that satisfy the analogy in order of decreasing cosine similarity
    (or only the best one if return_list is False)."""
    if [i for i in [word, relation[0], relation[1]] if i not in embeddings]:
        raise ValueError("Word must be in vocabulary")
    result_vector = embeddings[relation[0]] - embeddings[relation[1]] + embeddings[word]
    nearest_indices = k_nearest_vectors(k, matrix, [result_vector])[0]
    closest_words = [vocab[i] for i in nearest_indices if vocab[i] != word and vocab[i] not in relation]
    if return_list:
        return closest_words
    else:
        return closest_words[0]

We define the following helper function to reduce the search space. 

In [544]:
def k_nearest_vectors(k, mtx, candidate_vector):
    """Takes in an integer value (k), a matrix (2D list) of all the vectors, and a 2D list containing the single vector we want to compare.
    Returns a tuple of two arrays of length k: the indices of the most similar word vectors and the cosine similarities of these vectors,
    both in order of decreasing similarity."""
    cos_similarities = cosine_similarity(mtx, candidate_vector).flatten()
    k_sorted = np.flip(np.argsort(cos_similarities)[-k:], axis = 0)
    cos_sorted = np.flip(np.sort(cos_similarities), axis = 0)[:k]
    return k_sorted, cos_sorted

The following test checks that our helper function behaves as expected: the vector with the highest cosine similarity to a given word's vector should be that word's own vector. Because the function returns indices rather than words, we compare the first returned index with the word's position in the vocabulary.

In [537]:
vocab.index('woman') == k_nearest_vectors(5, matrix, [embeddings['woman']])[0][0]

True
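
The same check can be reproduced without the GloVe file. The sketch below is a NumPy-only variant of the helper, not the notebook's implementation: `top_k_cosine` is a hypothetical name, the four toy rows are invented, and it assumes no row is the zero vector (which would cause a division by zero).

```python
import numpy as np

def top_k_cosine(k, mtx, query):
    """Return (indices, similarities) of the k rows of mtx most similar to query."""
    mtx = np.asarray(mtx, dtype=float)
    query = np.asarray(query, dtype=float)
    # Cosine similarity of every row with the query, computed in one shot.
    sims = mtx @ query / (np.linalg.norm(mtx, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:k]   # indices, most similar first
    return order, sims[order]            # a single sort serves both outputs

toy_matrix = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
indices, sims = top_k_cosine(2, toy_matrix, toy_matrix[0])
print(indices, sims)
```

Querying with row 0 returns index 0 first, mirroring the `vocab.index('woman')` test above.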

## Evaluation

Let's look at a couple of analogies to make sure our evaluation function is working properly. We'll start with the example: "woman" is to "man" as "queen" is to "king."

In [278]:
evaluate('king', ['woman', 'man'])

'queen'

Embeddings can also reveal details like grammatical properties:

In [249]:
evaluate('hard', ['better', 'good']) #comparative adjectives

'harder'

In [250]:
evaluate('duck', ['men', 'man']) #singular/plural

'ducks'

In [251]:
evaluate('they', ['his', 'he']) #possessive forms

'their'

They can also help us answer questions about the world:

In [538]:
evaluate('china', ['moscow', 'russia']) #capitals of countries

'beijing'

In [305]:
evaluate('danzig', ['mumbai', 'bombay']) #former names of cities

'gdańsk'

In [314]:
evaluate('japan', ['europe', 'germany']) #country-continent mapping

'asia'

However, embeddings can also reveal problematic *biases* in language. 

In [317]:
evaluate('nurse', ['man', 'doctor']) #gender based on occupation

'woman'

In [383]:
evaluate('criminal', ['white', 'police']) #racial stereotypes

'black'

In [414]:
evaluate('terrorist', ['india', 'lawful']) #stereotypes of countries

'pakistan'

## Examining Biases

Let's try to figure out why this is the case by looking at some of the mathematical properties of the embeddings.

### 1. Preprocessing

In order to reduce the size of our search space, we're only going to consider words that describe people. The WordNet database organizes words based on semantics, so we're going to look at the _hyponyms_ (words that name subcategories of a given word) of the word "person." In WordNet, each _sense_, or meaning, of a word corresponds to a _synset_: a set of synonyms that share that meaning. A word with several senses therefore belongs to several synsets.

The senses of "person" are listed at the following link; we will use the first one: http://wordnetweb.princeton.edu/perl/webwn?s=person&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=

Although built-in tools that perform tasks like named-entity recognition would be useful for this task, these are statistical/neural models that often require the word to be in a sentence, whereas here we only have isolated words.

In [487]:
def get_hyponyms(synset):
    """Get everything that is considered a subcategory, or hyponym, of a particular noun sense.
    Takes in a WordNet Synset object, returns the set of all Synset objects below it (direct and indirect hyponyms)"""
    hyponyms = set()
    for hyponym in synset.hyponyms():
        hyponyms |= set(get_hyponyms(hyponym)) #Gets union of all the hyponyms of the word
    hyponyms = hyponyms | set(synset.hyponyms())
    return hyponyms

types_of_people = {synset.name().split('.')[0] for synset in get_hyponyms(wn.synset('person.n.01'))}
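
The recursion in `get_hyponyms` computes a transitive closure over the hyponym relation. The sketch below shows the same traversal on a hand-made hierarchy, written iteratively with a `seen` set. The dictionary is a stand-in for WordNet and its entries are illustrative, not WordNet's actual structure; `toy_hyponyms` and `all_hyponyms` are names introduced for this example.

```python
# Hypothetical mini-hierarchy: each key maps to its direct hyponyms.
toy_hyponyms = {
    "person": ["worker", "relative"],
    "worker": ["nurse", "soldier"],
    "relative": ["aunt"],
    "nurse": [], "soldier": [], "aunt": [],
}

def all_hyponyms(root, graph):
    """Collect every direct and indirect hyponym of root (root itself excluded)."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue          # already expanded via another path
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

print(sorted(all_hyponyms("person", toy_hyponyms)))
# ['aunt', 'nurse', 'relative', 'soldier', 'worker']
```

An explicit stack sidesteps Python's recursion limit and avoids expanding a node twice when it is reachable through more than one parent.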

We also hard-code a list of gender-specific words, and add additional logic to a function to tell us if a word is specific to a certain gender.

In [630]:
#From Danielle Sucher's https://github.com/DanielleSucher/Jailbreak-the-Patriarchy
male_words=set(['guy','spokesman','chairman',"men's",'men','him',"he's",'his',
                'boy','boyfriend','boyfriends','boys','brother','brothers','dad','dads','dude','father','fathers',
                'fiance','gentleman','gentlemen','god','grandfather','grandpa','grandson','groom','he','himself',
                'husband','husbands','king','male','man','mr','nephew','nephews','priest','prince','son',
                'sons','uncle','uncles','waiter','widower','widowers'])
female_words=set(['heroine','spokeswoman','chairwoman',"women's",
                  'actress','women',"she's",'her','aunt','aunts','bride',
                  'daughter','daughters','female','fiancee','girl','girlfriend','girlfriends','girls',
                  'goddess','granddaughter','grandma','grandmother','herself','ladies','lady','lady','mom',
                  'moms','mother','mothers','mrs','ms','niece','nieces','priestess','princess','queens','she',
                  'sister','sisters','waitress','widow','widows','wife','wives','woman'])
def gender_neutral(word):
    return (word not in male_words) and (word not in female_words) and (not word.endswith('man')) and (not word.endswith('woman'))
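
One caveat worth checking: the suffix test is a heuristic. Any word ending in "man" is treated as gendered, which also catches words such as "human" and "german"; and since "woman" itself ends in "man", the second suffix check never changes the result. The sketch below uses deliberately tiny stand-ins for the word lists, and the `suffix_exceptions` set is a hypothetical addition, not part of the original notebook.

```python
# Tiny stand-ins for the full lists above, for illustration only.
toy_male = {"he", "man", "king"}
toy_female = {"she", "woman", "queen"}

def gender_neutral_suffix(word):
    """Same logic as gender_neutral, applied to the toy lists."""
    return (word not in toy_male and word not in toy_female
            and not word.endswith("man"))   # also covers words ending in "woman"

# Hypothetical allow-list of neutral words that happen to end in "man".
suffix_exceptions = {"human", "german", "roman"}

def gender_neutral_with_exceptions(word):
    """Treat allow-listed words as neutral before applying the suffix rule."""
    return word in suffix_exceptions or gender_neutral_suffix(word)

for w in ["doctor", "chairwoman", "fireman", "human", "german"]:
    print(w, gender_neutral_suffix(w), gender_neutral_with_exceptions(w))
```

Whether such exceptions matter depends on the analysis; for the neighbour lists below, dropping a few neutral words is unlikely to change the overall picture.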


### 2. Finding the Closest Words

Using our embeddings for people, let's look at the most similar words to a target word, like "man."

In [631]:
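def compare_persons(k, word, gender_bias = False):
    """Returns (word, cosine similarity, rank) tuples for the person-words among the k nearest neighbours of word.
    If gender_bias is True, gender-specific words are filtered out so that only gender-neutral words remain."""
    #Here, we can use the cosine similarity array returned by k_nearest_vectors!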
    k_nearest = k_nearest_vectors(k, matrix, [embeddings[word]])
    people_tuples = [(vocab[k_nearest[0][i]], k_nearest[1][i], i) for i in np.arange(len(k_nearest[0])) 
            if vocab[k_nearest[0][i]] in types_of_people and vocab[k_nearest[0][i]] != word]
    if gender_bias:
        people_tuples = [t for t in people_tuples if gender_neutral(t[0])]
    return people_tuples

In [638]:
compare_persons(300, 'man', gender_bias = True)

[('life', 0.6307883963794974, 12),
 ('soldier', 0.591583998050716, 27),
 ('victim', 0.5786380246447363, 38),
 ('friend', 0.5766715687263276, 40),
 ('hand', 0.5601769983335778, 58),
 ('suspect', 0.5480306904441015, 85),
 ('back', 0.5473999376575065, 87),
 ('hero', 0.5459395422007508, 90),
 ('face', 0.5428986123388589, 98),
 ('killer', 0.5320978301464728, 115),
 ('second', 0.5257842876491682, 130),
 ('shot', 0.5249789668806233, 132),
 ('character', 0.5227359871777943, 138),
 ('great', 0.5185942815595146, 156),
 ('actor', 0.5092920171540738, 178),
 ('child', 0.5035236174166932, 203),
 ('case', 0.502722658076125, 206),
 ('best', 0.49673880713822366, 235),
 ('witness', 0.4931514459178098, 247),
 ('sort', 0.49218978220344056, 251),
 ('player', 0.49023664128163225, 263),
 ('attacker', 0.4891015907744395, 266),
 ('black', 0.48886219780365137, 268),
 ('lover', 0.4882798709468317, 271),
 ('officer', 0.48768857661959075, 274),
 ('head', 0.4851015081745771, 284),
 ('doctor', 0.48345658170396777, 2

In [636]:
compare_persons(200, 'woman', gender_bias = True)

[('victim', 0.6407717678973269, 14),
 ('child', 0.6294301260530981, 16),
 ('lover', 0.5692697895600373, 28),
 ('nurse', 0.5646969077183875, 30),
 ('friend', 0.564160484551198, 31),
 ('life', 0.5593739976157563, 33),
 ('soldier', 0.5581014336367704, 36),
 ('worker', 0.5545941541036009, 39),
 ('prostitute', 0.5371960748744896, 47),
 ('baby', 0.5344769899519755, 52),
 ('teacher', 0.5338314430177431, 53),
 ('housewife', 0.531849122933435, 54),
 ('doctor', 0.5293920825671159, 57),
 ('married', 0.5071280346270718, 82),
 ('birth', 0.5048731795435029, 87),
 ('patient', 0.48858733901003293, 118),
 ('black', 0.4878013038637666, 119),
 ('attacker', 0.4832373010057014, 140),
 ('witness', 0.47893709378973437, 153),
 ('athlete', 0.47582035817977636, 164),
 ('case', 0.47466957549254385, 169),
 ('lawyer', 0.4741562391662266, 170),
 ('stranger', 0.47383268845460397, 173),
 ('student', 0.4716806493229572, 180),
 ('politician', 0.47160690569794794, 181),
 ('journalist', 0.4713477539783594, 184),
 ('maid'

For "man" and "woman," there are a lot of gender-specific words like "son" and "daughter" among the nearest neighbours (these are filtered out of the lists above), but we also see some trends: "prostitute" and "teacher" are much closer to the vector for "woman," whereas "soldier" and "hero" are closer to the vector for "man."
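
One way to make such comparisons less impressionistic is to line up the ranks of words that appear in both lists. The sketch below does this for a handful of tuples copied from the two outputs above (similarities rounded to three decimals); `rank_gaps` is a hypothetical helper, not part of the original analysis.

```python
# (word, similarity, rank) tuples copied from the outputs above, rounded.
man_neighbors = [("life", 0.631, 12), ("soldier", 0.592, 27), ("victim", 0.579, 38),
                 ("friend", 0.577, 40), ("hero", 0.546, 90), ("child", 0.504, 203)]
woman_neighbors = [("victim", 0.641, 14), ("child", 0.629, 16), ("nurse", 0.565, 30),
                   ("friend", 0.564, 31), ("life", 0.559, 33), ("soldier", 0.558, 36)]

def rank_gaps(list_a, list_b):
    """For words in both lists, return (word, rank_a - rank_b), largest gap first."""
    ranks_a = {w: r for w, _, r in list_a}
    ranks_b = {w: r for w, _, r in list_b}
    shared = ranks_a.keys() & ranks_b.keys()
    gaps = [(w, ranks_a[w] - ranks_b[w]) for w in shared]
    # Sort by absolute gap (descending), then alphabetically for a stable order.
    return sorted(gaps, key=lambda t: (-abs(t[1]), t[0]))

for word, gap in rank_gaps(man_neighbors, woman_neighbors):
    print(word, gap)   # positive gap: the word ranks nearer to "woman" than to "man"
```

On this subset, "child" shows by far the largest gap (rank 203 for "man" versus 16 for "woman"), while "soldier" and "friend" differ by only nine positions.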

In [612]:
compare_persons(60, 'citizen')

[('journalist', 0.5365671755406114, 4),
 ('resident', 0.5335940780111741, 5),
 ('american', 0.524323772697212, 7),
 ('immigrant', 0.5149044179025113, 9),
 ('canadian', 0.5087608372053418, 10),
 ('businessman', 0.4919954582587013, 15),
 ('ordinary', 0.49197911067484723, 16),
 ('politician', 0.49193664604583437, 17),
 ('lawyer', 0.4886279012698642, 18),
 ('advocate', 0.4798776509638012, 21),
 ('soldier', 0.4706849981014689, 23),
 ('diplomat', 0.47035736040449666, 24),
 ('woman', 0.4632373958648325, 29),
 ('fellow', 0.4516985015782866, 35),
 ('worker', 0.4481144856053686, 37),
 ('student', 0.443539956291243, 41),
 ('member', 0.4352997475956328, 46),
 ('man', 0.43399099844859074, 50),
 ('jew', 0.42736058974833313, 58)]

In [597]:
compare_persons(100, 'immigrant')

[('migrant', 0.6586581622064355, 2),
 ('worker', 0.5467345173802629, 7),
 ('haitian', 0.5171931693127529, 10),
 ('citizen', 0.5149044179025113, 11),
 ('refugee', 0.4999778243758676, 19),
 ('mexican', 0.4850413018421284, 21),
 ('african-american', 0.4609086396751774, 35),
 ('child', 0.45092261580120474, 42),
 ('laborer', 0.44883905259640056, 46),
 ('jew', 0.43974927338906955, 53),
 ('native', 0.43409641745666205, 58),
 ('slave', 0.4277706757138491, 67),
 ('peasant', 0.42618831955368186, 69),
 ('mother', 0.41915443702011984, 76),
 ('settler', 0.41906893504267106, 77),
 ('ethnic', 0.4190071505342275, 78),
 ('hmong', 0.41789418111914434, 81),
 ('homeless', 0.41577220439169393, 87),
 ('gypsy', 0.4137312893354037, 89),
 ('cuban', 0.41006546518145714, 94)]

For "citizen" and "immigrant," we see terms that are biased toward professional occupations and poverty, respectively. "Immigrant" yielded more nationalities than "citizen." Inputting specific nationalities only outputted vectors for other nationalities (e.g. "american" is near "canadian").

In [605]:
compare_persons(100, 'islam')

[('muslim', 0.63936552999292, 4),
 ('religious', 0.6055856303162457, 6),
 ('prophet', 0.5690866955013705, 10),
 ('fundamentalist', 0.5615563148585314, 12),
 ('imam', 0.5159405941281745, 24),
 ('extremist', 0.5112426507672221, 26),
 ('sufi', 0.4988681944633398, 33),
 ('radical', 0.4985477790676014, 34),
 ('wahhabi', 0.47694639943682626, 53),
 ('shiite', 0.4716709054408388, 58),
 ('militant', 0.4659725130662824, 60),
 ('islamist', 0.46129436122721695, 64),
 ('cleric', 0.4462623660480373, 72),
 ('christian', 0.43232337579933255, 87),
 ('arab', 0.42549909536029623, 97)]

In [610]:
compare_persons(100, 'christianity')

[('religious', 0.5704454681564426, 23),
 ('christian', 0.5650218802997707, 26),
 ('pagan', 0.5551570324668353, 28),
 ('roman', 0.48583317469927545, 65),
 ('catholic', 0.4855725548429508, 66),
 ('convert', 0.48535247510562557, 67),
 ('protestant', 0.4639060874298178, 84),
 ('byzantine', 0.4615717798917954, 86),
 ('pentecostal', 0.4601624667361376, 90)]

Along with religious words, the vector for "islam" is also close to words like "militant" and "extremist," but this trend does not appear for other religions like Christianity. As a whole, this approach allows us to see some trends, but nothing too specific.

### 3. Projections onto Spaces

As detailed in Bolukbasi et al. (2016), if we take the dot product of a word vector and the vector representing the difference between two words in a relation, we can see how strongly that word leans toward one side of the relation in comparison to the rest of the words in our dataset. In linear algebra terms, the vector is being projected onto the direction defined by the two biased words (up to a scaling by the length of the difference vector). For instance, let's see what happens to our embeddings when compared to $she - he$.
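
To unpack the linear algebra: for a direction $d = she - he$, the raw dot product $w \cdot d$ equals the scalar projection of $w$ onto $d$ multiplied by $\lVert d \rVert$. For a fixed direction both give the same ranking of words, but the raw values are not comparable across directions of different length. The sketch below contrasts the two on invented two-dimensional vectors; `scalar_projection` and the placeholder words are illustrative, not part of the notebook.

```python
import numpy as np

def scalar_projection(w, d):
    """Signed length of the component of w along direction d."""
    return float(np.dot(w, d) / np.linalg.norm(d))

# Hypothetical vectors; axis 1 plays the role of the "gender" axis.
she = np.array([0.2, 1.0])
he = np.array([0.2, -1.0])
direction = she - he                      # points from "he" towards "she"

words = {
    "word_a": np.array([0.7, 0.6]),       # leans towards "she"
    "word_b": np.array([0.8, -0.5]),      # leans towards "he"
    "word_c": np.array([0.9, 0.0]),       # neutral along this axis
}

for name, vec in words.items():
    raw = float(np.dot(vec, direction))
    print(name, raw, scalar_projection(vec, direction))
```

Here the direction has length 2, so every raw dot product is exactly twice the corresponding scalar projection.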



In [688]:
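def projection(word1, word2, filter_people = False):
    """Takes the dot product of every word vector with embeddings[word1] - embeddings[word2].
    Returns two lists of (word, score) pairs in ascending order: the 30 lowest scores (leaning toward word2)
    and the 30 highest scores (leaning toward word1). If filter_people is True, only words in types_of_people are scored."""
    dot_products = {}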
    target_vector = embeddings[word1] - embeddings[word2]
    for e in embeddings:
        if (filter_people and e in types_of_people) or not filter_people:
            dot_products[e] = np.dot(embeddings[e], target_vector)
    sorted_by_value = sorted(dot_products.items(), key=lambda kv: kv[1])
    return sorted_by_value[:30], sorted_by_value[-30:]

In [690]:
projection('she', 'he', filter_people = True)

([('shortstop', -7.866294980667501),
  ('quarterback', -7.678623495354001),
  ('punter', -7.2444960515638),
  ('outfielder', -6.9177984875780005),
  ('bowler', -6.8500815299922015),
  ('receiver', -6.735542888795701),
  ('legate', -6.6034683495648405),
  ('linebacker', -6.536085448409601),
  ('catcher', -6.4791908237107805),
  ('coach', -6.333899579356301),
  ('cardinal', -6.297014219020201),
  ('whitey', -6.26096189180314),
  ('pitcher', -6.252590102720001),
  ('tackle', -6.2182935602764005),
  ('cornerback', -6.127229501230001),
  ('spokesman', -6.098534586542301),
  ('halfback', -6.0968848928203),
  ('kicker', -6.053579904265401),
  ('prebendary', -6.0239528710942),
  ('pontifex', -6.0055899539254),
  ('general', -5.948584890111202),
  ('leader', -5.921962464129061),
  ('supremo', -5.874865461852),
  ('rusher', -5.85202198737869),
  ('lineman', -5.8457150827908),
  ('flanker', -5.7509251742503995),
  ('playmaker', -5.744284934085),
  ('hitter', -5.7391065158421),
  ('antipope', -5.6

In [700]:
projection('christianity', 'islam', filter_people=True)

([('islamist', -16.6025316387151),
  ('shiite', -16.199378103045298),
  ('militant', -15.1666539752427),
  ('ayatollah', -15.08684344561948),
  ('malik', -14.793646995352901),
  ('imam', -14.77095608413),
  ('muslim', -14.7235459709965),
  ('pakistani', -14.5791543856673),
  ('khan', -14.3126777920802),
  ('cleric', -14.209159374016199),
  ('extremist', -14.003685587469),
  ('saudi', -13.99863485353275),
  ('uzbek', -13.627520026573553),
  ('palestinian', -13.391144966903198),
  ('terrorist', -13.2252381969951),
  ('mullah', -12.8332396852572),
  ('sheik', -12.721732502827098),
  ('sultan', -12.576778063705),
  ('iraqi', -12.352310480974301),
  ('arab', -12.274642209769802),
  ('mufti', -11.92566948831333),
  ('wahhabi', -11.8990821003601),
  ('terror', -11.7864842051484),
  ('kashmiri', -11.4997480425579),
  ('lebanese', -11.160095618294159),
  ('fundamentalist', -10.9845281794652),
  ('kuwaiti', -10.853081371832799),
  ('algerian', -10.77075050081354),
  ('jordanian', -10.63638702231

In [712]:
projection('citizen', 'immigrant', filter_people = True)

([('immigrant', -21.82061844267234),
  ('migrant', -16.0980524684093),
  ('huguenot', -12.8639495360054),
  ('gypsy', -12.8619533918221),
  ('sephardi', -12.7578259644185),
  ('squatter', -12.035755525410181),
  ('laborer', -11.8791138126678),
  ('refugee', -10.5070822362732),
  ('protestant', -10.4690636154425),
  ('nativist', -10.13034726979675),
  ('bantu', -10.075654766595399),
  ('ethnic', -10.0336247598518),
  ('brick', -9.950020202637),
  ('sharecropper', -9.75597473137455),
  ('peasant', -9.6837831575813),
  ('slave', -9.4864931854122),
  ('menial', -9.2491721379344),
  ('settler', -9.2197836398765),
  ('ashkenazi', -9.1087288814775),
  ('melkite', -9.0962071489494),
  ('sicilian', -9.00402892980575),
  ('haitian', -8.934244488505),
  ('emigrant', -8.8923748715085),
  ('bricklayer', -8.8666260641733),
  ('homeless', -8.8343280949987),
  ('hmong', -8.8019893676988),
  ('creole', -8.6676025781352),
  ('loader', -8.616801787553179),
  ('arawak', -8.43548388444135),
  ('mennonite',

Works Cited:
https://nlp.stanford.edu/projects/glove

https://levyomer.files.wordpress.com/2014/04/linguistic-regularities-in-sparse-and-explicit-word-representations-conll-2014.pdf Note: Although this publication comes from an Israeli university, I only acknowledge the individual researchers and not the institution, as it is complicit in the state of Israel's oppressive settler colonial practices.

https://arxiv.org/pdf/1607.06520.pdf

Further Reading:
Word2Vec paper (Mikolov et al., 2013) https://arxiv.org/pdf/1310.4546.pdf

Using word embeddings to look at stereotypes in historical texts https://arxiv.org/abs/1711.08412
