# Word Embeddings
## As used in ["Word embeddings quantify 100 years of gender and ethnic stereotypes"](https://doi.org/10.1073/pnas.1720347115)

The article by Garg et al. investigates and validates the use of machine-learned word embeddings to study biases in language:

```
"In word-embedding models, each word in a given language is assigned to a high-dimensional vector such that the geometry of the vectors captures semantic relations between the words — e.g., vectors being closer together has been shown to correspond to more similar words."
```

Using models pre-trained on large text corpora, the authors compare the vectors of words relating to gender and ethnicity against "neutral" word categories, such as occupations and adjectives, to measure bias.
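To make the idea of "closer vectors mean more similar words" concrete, here is a minimal, dependency-free sketch of cosine similarity. The three-dimensional vectors are invented for illustration only; real embeddings such as the Google News vectors have hundreds of dimensions, and their values are learned from text.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 = same direction, 0.0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy "embeddings", not taken from any trained model.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])

# "cat" and "dog" point in nearly the same direction, "cat" and "car" do not.
print(f"cat~dog: {cat_dog:.3f}  cat~car: {cat_car:.3f}")
```

The bias measurements later in this notebook rely on the same quantity, computed with NumPy on real vectors.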

### Load model(s) and look at vectors

```
"For contemporary snapshot analysis, we use the standard Google News word2vec vectors trained on the Google News dataset."
```

In [None]:
from gensim.models.keyedvectors import KeyedVectors
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

In [None]:
print("Length of each word's vector (for this model):", len(model["cat"]))
print("Sample vector for 'cat':")
print(model["cat"])

### How to build a model of your own

In [None]:
from gensim.test.utils import common_texts
from gensim.models import Word2Vec

new_model = Word2Vec(sentences=common_texts, vector_size=100, window=5, min_count=1, workers=4)
# new_model.save("word2vec.model")
new_model.wv["computer"]

In [None]:
common_texts

In [None]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from gensim.models import Word2Vec

nltk.download('punkt')  # tokenizer data needed by sent_tokenize and word_tokenize

with open("KafkaMetamorphosis.txt") as f:
    text = f.read().lower()
    sentences_1 = [word_tokenize(s) for s in sent_tokenize(text)]

kafka_model = Word2Vec(sentences=sentences_1, vector_size=100, window=5, min_count=1, workers=4)
# kafka_model.save("word2vec.model")
kafka_model.wv.most_similar("vermin")

In [None]:
sentences_1

### What can you do with word vectors?

The gensim `KeyedVectors` [documentation](https://radimrehurek.com/gensim/models/keyedvectors.html) (API) lists the available methods and how they work.

In [None]:
model.similar_by_word("cat")

In [None]:
model.most_similar("democracy")

In [None]:
# "man" is to "king" as "woman" is to _______

model.most_similar(positive=['woman', 'king'], negative=['man'])

In [None]:
# "paltry" is to "significance" as "banal" is to _______

model.most_similar(positive=['banal', 'significance'], negative=['paltry'])

In [None]:
# "clumsy" is to "botch" as "lazy" is to ________

model.most_similar(positive=['botch', 'lazy'], negative=['clumsy'])

In [None]:
model.doesnt_match("breakfast cereal dinner lunch".split())

In [None]:
model.similarity('woman', 'man')

In [None]:
model.most_similar("freedom", topn=100)

### Computing Average Embeddings

```
"We first compute the average embedding distance between words that represent women—e.g., she, female—and words for occupations—e.g., teacher, lawyer."
```

In [None]:
# Calculate mean of vectors among all words of each category.

import numpy as np

woman_words = ["she", "daughter", "hers", "her", "mother", "woman", "girl", "herself", "female", "sister", "daughters", "mothers", "women",
"girls", "females", "sisters", "aunt", "aunts", "niece", "nieces"]

man_words = ["he", "son", "his", "him", "father", "man", "boy", "himself", "male", "brother", "sons", "fathers", "men", "boys", "males", "brothers", "uncle",
"uncles", "nephew", "nephews"]

occupations = ["janitor", "statistician", "midwife", "bailiff", "auctioneer", "photographer", "geologist", "shoemaker", "athlete", "cashier",
"dancer", "housekeeper", "accountant", "physicist", "gardener", "dentist", "weaver", "blacksmith", "psychologist", "supervisor",
"mathematician", "surveyor", "tailor", "designer", "economist", "mechanic", "laborer", "postmaster", "broker", "chemist", "librarian", "attendant", "clerical", "musician", "porter", "scientist", "carpenter", "sailor", "instructor", "sheriff", "pilot", "inspector", "mason",
"baker", "administrator", "architect", "collector", "operator", "surgeon", "driver", "painter", "conductor", "nurse", "cook", "engineer",
"retired", "sales", "lawyer", "clergy", "physician", "farmer", "clerk", "manager", "guard", "artist", "smith", "official", "police", "doctor",
"professor", "student", "judge", "teacher", "author", "secretary", "soldier"]

prof_occupations = ["statistician", "auctioneer", "photographer", "geologist", "accountant", "physicist", "dentist", "psychologist", "supervisor", "mathematician", "designer", "economist", "postmaster", "broker", "chemist", "librarian", "scientist", "instructor",
"pilot", "administrator", "architect", "surgeon", "nurse", "engineer", "lawyer", "physician", "manager", "official", "doctor", "professor",
"student", "judge", "teacher", "author"]

personality_traits = ['disorganized', 'devious', 'impressionable', 'circumspect', 'impassive', 'aimless', 'effeminate', 'unfathomable', 'fickle', 'unprincipled', 'inoffensive', 'reactive', 'providential', 'resentful', 'bizarre', 'impractical', 'sarcastic', 'misguided', 'imitative', 'pedantic', 'venomous', 'erratic', 'insecure', 'resourceful', 'neurotic', 'forgiving', 'profligate', 'whimsical', 'assertive', 'incorruptible', 'individualistic', 'faithless', 'disconcerting', 'barbaric', 'hypnotic', 'vindictive', 'observant', 'dissolute', 'frightening', 'complacent', 'boisterous', 'pretentious', 'disobedient', 'tasteless', 'sedentary', 'sophisticated', 'regimental', 'mellow', 'deceitful', 'impulsive', 'playful', 'sociable', 'methodical', 'willful', 'idealistic', 'boyish', 'callous', 'pompous', 'unchanging', 'crafty', 'punctual', 'compassionate', 'intolerant', 'challenging', 'scornful', 'possessive', 'conceited', 'imprudent', 'dutiful', 'lovable', 'disloyal', 'dreamy', 'appreciative', 'forgetful', 'unrestrained', 'forceful', 'submissive', 'predatory', 'fanatical', 'illogical', 'tidy', 'aspiring', 'studious', 'adaptable', 'conciliatory', 'artful', 'thoughtless', 'deceptive', 'frugal', 'reflective', 'insulting', 'unreliable', 'stoic', 'hysterical', 'rustic', 'inhibited', 'outspoken', 'unhealthy', 'ascetic', 'skeptical', 'painstaking', 'contemplative', 'leisurely', 'sly', 'mannered', 'outrageous', 'lyrical', 'placid', 'cynical', 'irresponsible', 'vulnerable', 'arrogant', 'persuasive', 'perverse', 'steadfast', 'crisp', 'envious', 'naive', 'greedy', 'presumptuous', 'obnoxious', 'irritable', 'dishonest', 'discreet', 'sporting', 'hateful', 'ungrateful', 'frivolous', 'reactionary', 'skillful', 'cowardly', 'sordid', 'adventurous', 'dogmatic', 'intuitive', 'bland', 'indulgent', 'discontented', 'dominating', 'articulate', 'fanciful', 'discouraging', 'treacherous', 'repressed', 'moody', 'sensual', 'unfriendly', 'optimistic', 'clumsy', 'contemptible', 'focused', 'haughty', 'morbid', 
'disorderly', 'considerate', 'humorous', 'preoccupied', 'airy', 'impersonal', 'cultured', 'trusting', 'respectful', 'scrupulous', 'scholarly', 'superstitious', 'tolerant', 'realistic', 'malicious', 'irrational', 'sane', 'colorless', 'masculine', 'witty', 'inert', 'prejudiced', 'fraudulent', 'blunt', 'childish', 'brittle', 'disciplined', 'responsive', 'courageous', 'bewildered', 'courteous', 'stubborn', 'aloof', 'sentimental', 'athletic', 'extravagant', 'brutal', 'manly', 'cooperative', 'unstable', 'youthful', 'timid', 'amiable', 'retiring', 'fiery', 'confidential', 'relaxed', 'imaginative', 'mystical', 'shrewd', 'conscientious', 'monstrous', 'grim', 'questioning', 'lazy', 'dynamic', 'gloomy', 'troublesome', 'abrupt', 'eloquent', 'dignified', 'hearty', 'gallant', 'benevolent', 'maternal', 'paternal', 'patriotic', 'aggressive', 'competitive', 'elegant', 'flexible', 'gracious', 'energetic', 'tough', 'contradictory', 'shy', 'careless', 'cautious', 'polished', 'sage', 'tense', 'caring', 'suspicious', 'sober', 'neat', 'transparent', 'disturbing', 'passionate', 'obedient', 'crazy', 'restrained', 'fearful', 'daring', 'prudent', 'demanding', 'impatient', 'cerebral', 'calculating', 'amusing', 'honorable', 'casual', 'sharing', 'selfish', 'ruined', 'spontaneous', 'admirable', 'conventional', 'cheerful', 'solitary', 'upright', 'stiff', 'enthusiastic', 'petty', 'dirty', 'subjective', 'heroic', 'stupid', 'modest', 'impressive', 'orderly', 'ambitious', 'protective', 'silly', 'alert', 'destructive', 'exciting', 'crude', 'ridiculous', 'subtle', 'mature', 'creative', 'coarse', 'passive', 'oppressed', 'accessible', 'charming', 'clever', 'decent', 'miserable', 'superficial', 'shallow', 'stern', 'winning', 'balanced', 'emotional', 'rigid', 'invisible', 'desperate', 'cruel', 'romantic', 'agreeable', 'hurried', 'sympathetic', 'solemn', 'systematic', 'vague', 'peaceful', 'humble', 'dull', 'expedient', 'loyal', 'decisive', 'arbitrary', 'earnest', 'confident', 'conservative', 'foolish', 
'moderate', 'helpful', 'delicate', 'gentle', 'dedicated', 'hostile', 'generous', 'reliable', 'dramatic', 'precise', 'calm', 'healthy', 'attractive', 'artificial', 'progressive', 'odd', 'confused', 'rational', 'brilliant', 'intense', 'genuine', 'mistaken', 'driving', 'stable', 'objective', 'sensitive', 'neutral', 'strict', 'angry', 'profound', 'smooth', 'ignorant', 'thorough', 'logical', 'intelligent', 'extraordinary', 'experimental', 'steady', 'formal', 'faithful', 'curious', 'reserved', 'honest', 'busy', 'educated', 'liberal', 'friendly', 'efficient', 'sweet', 'surprising', 'mechanical', 'clean', 'critical', 'criminal', 'soft', 'proud', 'quiet', 'weak', 'anxious', 'solid', 'complex', 'grand', 'warm', 'slow', 'false', 'extreme', 'narrow', 'dependent', 'wise', 'organized', 'pure', 'directed', 'dry', 'obvious', 'popular', 'capable', 'secure', 'active', 'independent', 'ordinary', 'fixed', 'practical', 'serious', 'fair', 'understanding', 'constant', 'cold', 'responsible', 'deep', 'religious', 'private', 'simple', 'physical', 'original', 'working', 'strong', 'modern', 'determined', 'open', 'political', 'difficult', 'knowledge', 'kind']

average_woman_words = np.mean(np.array([model[word] for word in woman_words if word in model]), axis = 0)
average_man_words = np.mean(np.array([model[word] for word in man_words if word in model]), axis = 0)
average_occupations = np.mean(np.array([model[word] for word in occupations if word in model]), axis = 0)
average_prof_occupations = np.mean(np.array([model[word] for word in prof_occupations if word in model]), axis = 0)

In [None]:
print(len(average_woman_words))
print(len(average_man_words))
print(len(average_occupations))
print(len(average_prof_occupations))

In [None]:
# Calculate how close the occupations are to the 'man' and 'woman' words.
def cossim(v1, v2, signed = True):
    # Cosine similarity: 1 means same direction, -1 means opposite direction.
    c = np.dot(v1, v2)/np.linalg.norm(v1)/np.linalg.norm(v2)
    if not signed:
        return abs(c)
    return c

def calc_distance_between_vectors(vec1, vec2, distype = ''):
    # distype='norm' returns the Euclidean distance (smaller = closer);
    # any other value returns the cosine similarity (larger = closer).
    if distype == 'norm':
        return np.linalg.norm(np.subtract(vec1, vec2))
    else:
        return cossim(vec1, vec2)

occupations_to_woman = calc_distance_between_vectors(average_occupations, average_woman_words)
occupations_to_man = calc_distance_between_vectors(average_occupations, average_man_words)
print("Occupation similarities (women, men):", occupations_to_woman, occupations_to_man)

prof_occupations_to_woman = calc_distance_between_vectors(average_prof_occupations, average_woman_words)
prof_occupations_to_man = calc_distance_between_vectors(average_prof_occupations, average_man_words)
print("Professional occupation similarities (women, men):", prof_occupations_to_woman, prof_occupations_to_man)

```
"A natural metric for the embedding bias is the average distance for women minus the average distance for men. If this value is negative, then the embedding more closely associates the occupations with men."
```
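The sign of the bias depends on whether the underlying measure is a similarity or a distance. In this notebook, `calc_distance_between_vectors` returns cosine similarity by default (larger means closer), so a negative "women minus men" value means the word sits nearer the men vectors. With `distype='norm'` it returns a Euclidean distance (smaller means closer), and the same subtraction has the opposite reading. The sketch below shows both on hypothetical two-dimensional vectors chosen for illustration only.

```python
import numpy as np

def cosine_similarity(v1, v2):
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def euclidean_distance(v1, v2):
    return float(np.linalg.norm(v1 - v2))

# Hypothetical group averages and one occupation vector (not from a trained model).
avg_woman = np.array([1.0, 0.0])
avg_man = np.array([0.0, 1.0])
occupation = np.array([0.2, 0.9])  # deliberately placed nearer to avg_man

# Similarity-based bias: negative means the occupation is closer to the men average.
similarity_bias = cosine_similarity(occupation, avg_woman) - cosine_similarity(occupation, avg_man)

# Distance-based bias: positive means the occupation is closer to the men average.
distance_bias = euclidean_distance(occupation, avg_woman) - euclidean_distance(occupation, avg_man)

print("similarity-based bias:", round(similarity_bias, 3))
print("distance-based bias:  ", round(distance_bias, 3))
```

When switching `distype`, it is worth re-checking which sign corresponds to which group before interpreting the tables that follow.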

In [None]:
embedding_bias = occupations_to_woman - occupations_to_man
print(embedding_bias)

In [None]:
import pandas as pd

output = []

for occupation in occupations:
    if occupation not in model:  # skip words missing from the model's vocabulary
        continue
    man = calc_distance_between_vectors(model[occupation], average_man_words)
    woman = calc_distance_between_vectors(model[occupation], average_woman_words)
    output.append([occupation, woman - man])

pd.set_option('display.max_rows', 1000)
display(pd.DataFrame(output, columns=["Occupation", "Woman Bias"]))

In [None]:
output = []

for occupation in prof_occupations:
    if occupation not in model:  # skip words missing from the model's vocabulary
        continue
    man = calc_distance_between_vectors(model[occupation], average_man_words)
    woman = calc_distance_between_vectors(model[occupation], average_woman_words)
    output.append([occupation, woman - man])

pd.set_option('display.max_rows', 1000)
display(pd.DataFrame(output, columns=["Occupation", "Woman Bias"]))

In [None]:
output = []

for t in personality_traits:
    if t not in model:  # skip words missing from the model's vocabulary
        continue
    man = calc_distance_between_vectors(model[t], average_man_words)
    woman = calc_distance_between_vectors(model[t], average_woman_words)
    output.append([t, woman - man])

pd.set_option('display.max_rows', 1000)
display(pd.DataFrame(output, columns=["Trait", "Woman Bias"]))

### Temporal Data

Load sample embeddings trained on the COHA corpus, one model per decade.
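Each decade's embeddings are stored as a pair of files: a NumPy array of vectors (`<decade>-w.npy`) and a pickled vocabulary list (`<decade>-vocab.pkl`). Since the loading cells below repeat the same steps for each decade, the pattern could be wrapped in a small helper. The function name `load_decade_vectors` and its sanity check are assumptions added for illustration, not part of the original notebook.

```python
import pickle
import numpy as np

def load_decade_vectors(prefix):
    """Load '<prefix>-w.npy' and '<prefix>-vocab.pkl' and return (vocab, vectors).

    Hypothetical helper: it assumes row i of the array is the vector
    for the i-th word in the vocabulary list.
    """
    vectors = np.load(f"{prefix}-w.npy")
    with open(f"{prefix}-vocab.pkl", "rb") as f:
        vocab = pickle.load(f)
    if len(vocab) != vectors.shape[0]:
        raise ValueError(
            f"{prefix}: {len(vocab)} words but {vectors.shape[0]} vectors"
        )
    return vocab, vectors
```

A call such as `vocab, vectors = load_decade_vectors("1850")` would return the same two objects that the cells below pass to `KeyedVectors(vectors.shape[1])` and `add_vectors(vocab, vectors)`.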

In [None]:
import pickle
import numpy as np
from gensim.models.keyedvectors import KeyedVectors

with open("1850-w.npy", "rb") as f:
    vectors = np.lib.format.read_array(f)

with open("1850-vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

model1850s = KeyedVectors(vectors.shape[1])
model1850s.add_vectors(vocab, vectors)

In [None]:
with open("1900-w.npy", "rb") as f:
    vectors = np.lib.format.read_array(f)

with open("1900-vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

model1900s = KeyedVectors(vectors.shape[1])
model1900s.add_vectors(vocab, vectors)

In [None]:
with open("1950-w.npy", "rb") as f:
    vectors = np.lib.format.read_array(f)

with open("1950-vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

model1950s = KeyedVectors(vectors.shape[1])
model1950s.add_vectors(vocab, vectors)

In [None]:
with open("2000-w.npy", "rb") as f:
    vectors = np.lib.format.read_array(f)

with open("2000-vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

model2000s = KeyedVectors(vectors.shape[1])
model2000s.add_vectors(vocab, vectors)

In [None]:
def compile_data(fname):
    with open(fname) as fin:
        rlist = [n.replace("\n","") for n in fin.readlines()]
        return rlist

white_names = compile_data("./data/names_white.txt")

asian_names = compile_data("./data/names_asian.txt")

black_names = compile_data("./data/names_black.txt")

words_christianity = compile_data("./data/words_christianity.txt")

words_islam = compile_data("./data/words_islam.txt")

adjectives_otherization = compile_data("./data/adjectives_otherization.txt")





## Racial Otherization

In [None]:
import pandas as pd

# Calculate distances between otherization and 'Black,' 'Asian,' and 'White' words.
def cossim(v1, v2, signed = True):
    c = np.dot(v1, v2)/np.linalg.norm(v1)/np.linalg.norm(v2)
    if not signed:
        return abs(c)
    return c

def calc_distance_between_vectors(vec1, vec2, distype = ''):
    if distype == 'norm':
        return np.linalg.norm(np.subtract(vec1, vec2))
    else:
        return cossim(vec1, vec2)

# models = {}
output = []

for decade, decade_model in [("1850", model1850s),("1900", model1900s),("1950", model1950s), ("2000", model2000s)]:
    # A separate loop variable keeps the Google News `model` loaded earlier from being overwritten.
    average_white_names = np.mean(np.array([decade_model[word] for word in white_names if word in decade_model]), axis = 0)
    average_asian_names = np.mean(np.array([decade_model[word] for word in asian_names if word in decade_model]), axis = 0)
    average_black_names = np.mean(np.array([decade_model[word] for word in black_names if word in decade_model]), axis = 0)
    average_adjectives_otherization = np.mean(np.array([decade_model[word] for word in adjectives_otherization if word in decade_model]), axis = 0)
#     models[decade] = [average_adjectives_otherization, average_asian_names, average_white_names,]
   
    row = [decade]
    other_to_white = calc_distance_between_vectors(average_adjectives_otherization, average_white_names)
    other_to_black = calc_distance_between_vectors(average_adjectives_otherization, average_black_names)
    other_to_asian = calc_distance_between_vectors(average_adjectives_otherization, average_asian_names)
    row.append(other_to_asian - other_to_white)
    row.append(other_to_black - other_to_white)
    output.append(row)

pd.set_option('display.max_rows', 1000)
display(pd.DataFrame(output, columns=["Decade", "Asian Otherization Bias", "Black Otherization Bias"]))

## Religious Otherization

In [None]:
import pandas as pd

# Calculate distances between otherization and 'Muslim,' and 'Christian' words.
def cossim(v1, v2, signed = True):
    c = np.dot(v1, v2)/np.linalg.norm(v1)/np.linalg.norm(v2)
    if not signed:
        return abs(c)
    return c

def calc_distance_between_vectors(vec1, vec2, distype = ''):
    if distype == 'norm':
        return np.linalg.norm(np.subtract(vec1, vec2))
    else:
        return cossim(vec1, vec2)

# models = {}
output = []

for decade, decade_model in [("1850", model1850s),("1900", model1900s),("1950", model1950s), ("2000", model2000s)]:
    # A separate loop variable keeps the Google News `model` loaded earlier from being overwritten.
    average_christian = np.mean(np.array([decade_model[word] for word in words_christianity if word in decade_model]), axis = 0)
    average_islamic = np.mean(np.array([decade_model[word] for word in words_islam if word in decade_model]), axis = 0)
    average_adjectives_otherization = np.mean(np.array([decade_model[word] for word in adjectives_otherization if word in decade_model]), axis = 0)
#     models[decade] = [average_adjectives_otherization, average_asian_names, average_white_names,]
   
    row = [decade]
    other_to_christian = calc_distance_between_vectors(average_adjectives_otherization, average_christian)
    other_to_islamic = calc_distance_between_vectors(average_adjectives_otherization, average_islamic)
    row.append(other_to_islamic - other_to_christian)
    output.append(row)

pd.set_option('display.max_rows', 1000)
display(pd.DataFrame(output, columns=["Decade", "Islam Otherization Bias"]))

## Personality traits - Ethnicity

In [None]:

# Calculate distances between personality traits and 'White,' 'Asian,' and 'Black' names.
def cossim(v1, v2, signed = True):
    c = np.dot(v1, v2)/np.linalg.norm(v1)/np.linalg.norm(v2)
    if not signed:
        return abs(c)
    return c

def calc_distance_between_vectors(vec1, vec2, distype = ''):
    if distype == 'norm':
        return np.linalg.norm(np.subtract(vec1, vec2))
    else:
        return cossim(vec1, vec2)

decades_and_models = [("1850", model1850s),("1900", model1900s),("1950", model1950s), ("2000", model2000s)]

models = {}
output = []

for decade, decade_model in decades_and_models:
    # A separate loop variable keeps the Google News `model` loaded earlier from being overwritten.
    average_white_names = np.mean(np.array([decade_model[word] for word in white_names if word in decade_model]), axis = 0)
    average_asian_names = np.mean(np.array([decade_model[word] for word in asian_names if word in decade_model]), axis = 0)
    average_black_names = np.mean(np.array([decade_model[word] for word in black_names if word in decade_model]), axis = 0)
    models[decade] = [average_white_names, average_asian_names, average_black_names]

# Keep only the traits that are in the vocabulary and have a non-zero vector in every decade.
available_traits = [p for p in personality_traits
                    if all(p in m and np.any(m[p]) for _, m in decades_and_models)]

for trait in available_traits:
    for decade, decade_model in decades_and_models:
        row = [int(decade), trait]
        white = calc_distance_between_vectors(decade_model[trait], models[decade][0]) # the average vector for White names is at index 0.
        # asian = calc_distance_between_vectors(decade_model[trait], models[decade][1])
        black = calc_distance_between_vectors(decade_model[trait], models[decade][2]) # the average vector for Black names is at index 2.
        # row.append(asian - white)
        row.append(black - white)
        output.append(row)

columnnames = ["decade","trait", "black bias"]
pd.set_option('display.max_rows', 1000)
df = pd.DataFrame(output, columns=columnnames)


In [None]:
#run this to see the data!
display(df)
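The table above is in "long" format, with one row per trait and decade. To compare decades side by side, it could be pivoted so that each decade becomes a column. The sketch below uses a small hypothetical table with the same column names; the numbers are invented and are not results from the models.

```python
import pandas as pd

# Hypothetical rows in the same layout as `df` (decade, trait, black bias).
toy_rows = [
    [1850, "calm", -0.02],
    [2000, "calm", 0.01],
    [1850, "cruel", 0.05],
    [2000, "cruel", 0.02],
]
toy_df = pd.DataFrame(toy_rows, columns=["decade", "trait", "black bias"])

# One row per trait, one column per decade.
wide = toy_df.pivot(index="trait", columns="decade", values="black bias")

# Change in the bias score between the first and last decade.
wide["change"] = wide[2000] - wide[1850]
print(wide)
```

Sorting `wide` by the `change` column would then surface the traits whose association shifted the most between the two decades.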

In [None]:
# Test some words using any of the models loaded above!