## 4. Role of Bias in Word Embeddings (50 mins)

In this unit, we explore both an application and a caveat of word embeddings: cultural bias. Drawing on methods and results from recent articles, we show how word embeddings can carry the historical biases of the corpora they were trained on. We then lead an activity that makes these human biases visible in the vectors and looks at how they can be mitigated.

- 0:00 - 0:10 Algorithmic bias vs human bias 
- 0:10 - 0:40 [Activity 4] Identifying bias in corpora (occupations, gender, ...) [GloVe]
- 0:40 - 0:50 Towards unbiased embeddings; Examine “debiased” embeddings
- 0:50 - 1:00 Concluding remarks and debate

In [172]:
!pip install numpy nltk scikit-learn matplotlib gensim seaborn plotly;
!python -m nltk.downloader all;


In [173]:
import numpy as np
import nltk
# import plotly.plotly as py
import sklearn
import matplotlib.pyplot as plt
import gensim

from IPython.display import HTML


<div style="font-size: 1em; margin: 1em 0 1em 0; border: 1px solid #86989B; background-color: #f7f7f7; padding: 0;">
<p style="margin: 0; padding: 0.1em 0 0.1em 0.5em; color: white; border-bottom: 1px solid #86989B; font-weight: bold; background-color: #AFC1C4;">
Activity
</p>
<p style="margin: 0.5em 1em 0.5em 1em; padding: 0;">
Your turn: Generate a vector of size 10 containing integers from 4 to 8
<br>
<em>
<strong>Hint</strong>: Use the numpy functions
</em>
</p>
</div>


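First, we will work through examples from a 2016 paper on removing gender bias from word embeddings: Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" (https://arxiv.org/pdf/1607.06520.pdf).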


In [181]:
import json
import random
import numpy as np

import debiaswe as dwe
import debiaswe.we as we
from debiaswe.we import WordEmbedding
from debiaswe.data import load_professions
import numpy.linalg as la

In [182]:
from sklearn.metrics.pairwise import cosine_similarity

In [234]:
# We first load our embeddings with the code the authors provide
E = WordEmbedding('./embeddings/w2v_gnews_small.txt')

*** Reading data from ./embeddings/w2v_gnews_small.txt
(26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine


In [235]:
# The authors also provide hand-picked lists: definitional gender pairs, pairs of gender words we want to equalize, and gender-specific words

with open('./data/definitional_pairs.json', "r") as f:
    defs = json.load(f)
print("definitional", defs)

with open('./data/equalize_pairs.json', "r") as f:
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    gender_specific_words = json.load(f)
print("gender specific", len(gender_specific_words), gender_specific_words[:10])

definitional [['woman', 'man'], ['girl', 'boy'], ['she', 'he'], ['mother', 'father'], ['daughter', 'son'], ['gal', 'guy'], ['female', 'male'], ['her', 'his'], ['herself', 'himself'], ['Mary', 'John']]
gender specific 218 ['actress', 'actresses', 'aunt', 'aunts', 'bachelor', 'ballerina', 'barbershop', 'baritone', 'beard', 'beards']


In [236]:
defs

[['woman', 'man'],
 ['girl', 'boy'],
 ['she', 'he'],
 ['mother', 'father'],
 ['daughter', 'son'],
 ['gal', 'guy'],
 ['female', 'male'],
 ['her', 'his'],
 ['herself', 'himself'],
 ['Mary', 'John']]

In [237]:
equalize_pairs

[['monastery', 'convent'],
 ['spokesman', 'spokeswoman'],
 ['Catholic_priest', 'nun'],
 ['Dad', 'Mom'],
 ['Men', 'Women'],
 ['councilman', 'councilwoman'],
 ['grandpa', 'grandma'],
 ['grandsons', 'granddaughters'],
 ['prostate_cancer', 'ovarian_cancer'],
 ['testosterone', 'estrogen'],
 ['uncle', 'aunt'],
 ['wives', 'husbands'],
 ['Father', 'Mother'],
 ['Grandpa', 'Grandma'],
 ['He', 'She'],
 ['boy', 'girl'],
 ['boys', 'girls'],
 ['brother', 'sister'],
 ['brothers', 'sisters'],
 ['businessman', 'businesswoman'],
 ['chairman', 'chairwoman'],
 ['colt', 'filly'],
 ['congressman', 'congresswoman'],
 ['dad', 'mom'],
 ['dads', 'moms'],
 ['dudes', 'gals'],
 ['ex_girlfriend', 'ex_boyfriend'],
 ['father', 'mother'],
 ['fatherhood', 'motherhood'],
 ['fathers', 'mothers'],
 ['fella', 'granny'],
 ['fraternity', 'sorority'],
 ['gelding', 'mare'],
 ['gentleman', 'lady'],
 ['gentlemen', 'ladies'],
 ['grandfather', 'grandmother'],
 ['grandson', 'granddaughter'],
 ['he', 'she'],
 ['himself', 'herself'],

In [238]:
gender_specific_words

['actress',
 'actresses',
 'aunt',
 'aunts',
 'bachelor',
 'ballerina',
 'barbershop',
 'baritone',
 'beard',
 'beards',
 'beau',
 'bloke',
 'blokes',
 'boy',
 'boyfriend',
 'boyfriends',
 'boyhood',
 'boys',
 'brethren',
 'bride',
 'brides',
 'brother',
 'brotherhood',
 'brothers',
 'bull',
 'bulls',
 'businessman',
 'businessmen',
 'businesswoman',
 'chairman',
 'chairwoman',
 'chap',
 'colt',
 'colts',
 'congressman',
 'congresswoman',
 'convent',
 'councilman',
 'councilmen',
 'councilwoman',
 'countryman',
 'countrymen',
 'czar',
 'dad',
 'daddy',
 'dads',
 'daughter',
 'daughters',
 'deer',
 'diva',
 'dowry',
 'dude',
 'dudes',
 'elder_brother',
 'eldest_son',
 'estranged_husband',
 'estranged_wife',
 'estrogen',
 'ex_boyfriend',
 'ex_girlfriend',
 'father',
 'fathered',
 'fatherhood',
 'fathers',
 'fella',
 'fellas',
 'female',
 'females',
 'feminism',
 'fiance',
 'fiancee',
 'fillies',
 'filly',
 'fraternal',
 'fraternities',
 'fraternity',
 'gal',
 'gals',
 'gelding',
 'gentle

In [239]:
# We also get a list of professions
professions = load_professions()

Loaded professions
Format:
word,
definitional female -1.0 -> definitional male 1.0
stereotypical female -1.0 -> stereotypical male 1.0


In [240]:
professions

[['accountant', 0.0, 0.4],
 ['acquaintance', 0.0, 0.0],
 ['actor', 0.8, 0.0],
 ['actress', -1.0, 0.0],
 ['adjunct_professor', 0.0, 0.5],
 ['administrator', 0.0, 0.2],
 ['adventurer', 0.0, 0.5],
 ['advocate', 0.0, -0.1],
 ['aide', 0.0, -0.2],
 ['alderman', 0.7, 0.2],
 ['alter_ego', 0.0, 0.0],
 ['ambassador', 0.0, 0.7],
 ['analyst', 0.0, 0.4],
 ['anthropologist', 0.0, 0.4],
 ['archaeologist', 0.0, 0.6],
 ['archbishop', 0.4, 0.5],
 ['architect', 0.1, 0.6],
 ['artist', 0.0, -0.2],
 ['artiste', -0.1, -0.2],
 ['assassin', 0.1, 0.8],
 ['assistant_professor', 0.1, 0.4],
 ['associate_dean', 0.0, 0.4],
 ['associate_professor', 0.0, 0.4],
 ['astronaut', 0.1, 0.8],
 ['astronomer', 0.1, 0.5],
 ['athlete', 0.0, 0.7],
 ['athletic_director', 0.1, 0.7],
 ['attorney', 0.0, 0.3],
 ['author', 0.0, 0.1],
 ['baker', 0.0, -0.1],
 ['ballerina', -0.5, -0.5],
 ['ballplayer', 0.2, 0.8],
 ['banker', 0.0, 0.6],
 ['barber', 0.5, 0.5],
 ['baron', 0.6, 0.3],
 ['barrister', 0.1, 0.4],
 ['bartender', 0.0, 0.3],
 ['biol

What is the problem we're dealing with here? Where is the 'bias'?

In [241]:
# We identify a gender direction (a one-dimensional gender subspace) by subtracting the 'he' vector from the 'she' vector
gender_vector = E.diff('she','he')
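
To make the idea of a gender direction concrete, here is a minimal sketch with made-up 3-dimensional vectors. It assumes the direction is the normalized difference she - he, so positive cosine values lean towards "she" and negative values towards "he". The toy vectors and the helper names `direction` and `cosine` are illustrative assumptions and are not part of `debiaswe`.

```python
import numpy as np

def direction(a, b):
    """Unit vector pointing from b to a (here: from 'he' to 'she')."""
    d = a - b
    return d / np.linalg.norm(d)

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d "embeddings", invented for illustration only
toy = {
    "she":     np.array([1.0, 0.0, 0.2]),
    "he":      np.array([-1.0, 0.0, 0.2]),
    "actress": np.array([0.8, 0.5, 0.1]),
    "beard":   np.array([-0.6, 0.7, 0.0]),
    "table":   np.array([0.0, 1.0, 0.3]),
}

g = direction(toy["she"], toy["he"])
scores = {w: cosine(v, g) for w, v in toy.items()}
for w, s in scores.items():
    print(f"{w:8s} {s:+.3f}")
```

In this toy setup "actress" scores positive, "beard" negative, and "table" has no component along the direction at all. Real embeddings are noisier, which is what the next cell lets us inspect.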

In [242]:
# Using cosine similarity, see how gender-specific words (such as 'boyfriend' or 'actress') relate to the gender vector
for w in gender_specific_words:
    if w in E.index:
        print(w, cosine_similarity(E.v(w).reshape(1, -1), gender_vector.reshape(1, -1)))

actress [[0.35235137]]
actresses [[0.35750276]]
aunt [[0.21072564]]
aunts [[0.223503]]
bachelor [[0.01410255]]
ballerina [[0.2527647]]
barbershop [[-0.02640718]]
baritone [[-0.11957256]]
beard [[-0.25878197]]
beards [[-0.13258722]]
beau [[0.15536301]]
bloke [[-0.15578036]]
blokes [[-0.01713428]]
boy [[-0.05544763]]
boyfriend [[0.2210198]]
boyfriends [[0.3152038]]
boyhood [[-0.30272686]]
boys [[0.0115604]]
brethren [[-0.15830205]]
bride [[0.19153392]]
brides [[0.30411255]]
brother [[-0.16182595]]
brotherhood [[-0.12884653]]
brothers [[-0.14913867]]
bull [[-0.07443305]]
bulls [[-0.0096081]]
businessman [[-0.20206767]]
businessmen [[-0.12944545]]
businesswoman [[0.359654]]
chairman [[-0.1800002]]
chairwoman [[0.36205798]]
chap [[-0.14431384]]
colt [[0.03022994]]
colts [[0.0358269]]
congressman [[-0.08690432]]
congresswoman [[0.2329951]]
convent [[0.2271874]]
councilman [[-0.06791365]]
councilmen [[-0.02734063]]
councilwoman [[0.3120814]]
countryman [[-0.3090027]]
countrymen [[-0.2185572]]

In [243]:
#Activity: Make your own list of words that you think will be telling.


In [244]:
# We now look at how these 'biases' are reflected in our hand-picked set of profession-related words.
for w in [p[0] for p in professions]:
    if w in E.index:
        print(w, cosine_similarity(E.v(w).reshape(1, -1), gender_vector.reshape(1, -1)))

accountant [[0.02344247]]
acquaintance [[-0.0223289]]
actor [[-0.06750044]]
actress [[0.35235137]]
adjunct_professor [[0.01924637]]
administrator [[0.07803117]]
adventurer [[-0.05271553]]
advocate [[0.08000673]]
aide [[-0.00559208]]
alderman [[0.01872149]]
alter_ego [[-0.03154503]]
ambassador [[-0.0218579]]
analyst [[-0.05954619]]
anthropologist [[0.08572732]]
archaeologist [[-0.00342097]]
archbishop [[-0.0563061]]
architect [[-0.16785556]]
artist [[0.07598083]]
artiste [[0.094669]]
assassin [[-0.0850487]]
assistant_professor [[0.10646202]]
associate_dean [[0.08549082]]
associate_professor [[0.09039041]]
astronaut [[0.05733159]]
astronomer [[-0.04710868]]
athlete [[-0.02273922]]
athletic_director [[-0.0060405]]
attorney [[0.00519307]]
author [[0.07351885]]
baker [[0.0845248]]
ballerina [[0.2527647]]
ballplayer [[-0.12088909]]
banker [[-0.06638345]]
barber [[-0.11345601]]
baron [[-0.11973347]]
barrister [[0.00180577]]
bartender [[0.0565259]]
biologist [[-0.00424051]]
bishop [[-0.0282890

As humanists and social scientists, we might find such discrepancies (or rather, such consistent biases) fascinating. Many computer scientists, by contrast, found them horrifying, because models built on these embeddings can inherit the same biases. So they have tried to correct them.

In [245]:
E.v('physicist')

array([ 0.0123079 ,  0.00184618,  0.159561  ,  0.0151482 , -0.0153375 ,
        0.138353  , -0.00100199, -0.0888693 ,  0.0355982 , -0.0643798 ,
        0.0097832 , -0.0676619 , -0.0304226 , -0.0355982 , -0.108057  ,
        0.0989681 ,  0.045697  ,  0.04191   ,  0.0598353 , -0.0416575 ,
       -0.0552909 ,  0.0517563 , -0.0388803 , -0.0472118 ,  0.0227223 ,
       -0.00984632, -0.0229747 , -0.0436773 , -0.0994731 , -0.0873545 ,
       -0.0393853 , -0.0401427 , -0.0316849 ,  0.0124973 ,  0.0651372 ,
        0.00735318,  0.00965697,  0.0578156 ,  0.0054281 ,  0.0487267 ,
       -0.0214599 ,  0.00046944,  0.0383754 ,  0.0878595 ,  0.00839462,
       -0.0823051 , -0.0165368 , -0.00735318,  0.00216177, -0.0823051 ,
       -0.083315  ,  0.0363556 ,  0.00105722, -0.0656421 , -0.0363556 ,
       -0.0527662 ,  0.0525137 , -0.0848298 , -0.0383754 , -0.17067   ,
       -0.00416575, -0.0228485 , -0.010225  ,  0.0633699 ,  0.074731  ,
        0.0497365 ,  0.00119923,  0.120681  ,  0.0335785 , -0.02

In [246]:
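# Normalize the 'physicist' vector to unit length
physicist = E.v('physicist') / la.norm(E.v('physicist'))

In [247]:
# Scalar projection of 'physicist' onto the gender direction
factor = np.dot(physicist, gender_vector)


In [248]:
# The part of 'physicist' that lies along the gender direction
bias_comp = np.multiply(gender_vector, factor)


In [249]:
# Remove that part; what remains is orthogonal to the gender direction
debiased = physicist - bias_comp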


In [250]:
cosine_similarity(debiased.reshape(1,-1), gender_vector.reshape(1,-1))

array([[-7.450581e-09]], dtype=float32)

In [251]:
cosine_similarity(physicist.reshape(1,-1), gender_vector.reshape(1,-1))

array([[-0.06018062]], dtype=float32)
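
The cells above remove from `physicist` its component along the gender direction. Writing $\vec{w}$ for the normalized word vector and $\vec{g}$ for the gender direction, the update is $\vec{w}' = \vec{w} - (\vec{w} \cdot \vec{g})\,\vec{g}$. This leaves $\vec{w}'$ orthogonal to $\vec{g}$ only if $\vec{g}$ has unit length, which the near-zero cosine above suggests is the case here. A minimal sketch on random vectors follows; the helper name `remove_component` is hypothetical, and it normalizes the direction itself rather than relying on that assumption.

```python
import numpy as np

def remove_component(w, g):
    """Return w minus its projection onto the direction g."""
    g_unit = g / np.linalg.norm(g)           # make sure the direction has unit length
    return w - np.dot(w, g_unit) * g_unit    # subtract the component along g

rng = np.random.default_rng(0)
w = rng.normal(size=300)
w = w / np.linalg.norm(w)                    # stand-in for a normalized word vector
g = rng.normal(size=300)                     # stand-in for a gender direction

w_debiased = remove_component(w, g)

# The remaining vector has (numerically) no component along g
print(np.dot(w_debiased, g / np.linalg.norm(g)))
```

Note that the result is no longer unit length, so it would need to be renormalized before being compared with other normalized vectors by dot product.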

In [270]:
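# Run this cell before debias(), which appears to modify E in place. profession_words is not defined above; we assume it is the list of profession names.
profession_words = [p[0] for p in professions]
before_debiasing = sorted([(w, cosine_similarity(E.v(w).reshape(1, -1), gender_vector.reshape(1, -1))) for w in profession_words], key=lambda x: x[1][0, 0])
before_debiasing[:15], before_debiasing[-15:]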

([('congressman', array([[-0.4073272]], dtype=float32)),
  ('businessman', array([[-0.3975184]], dtype=float32)),
  ('councilman', array([[-0.3079872]], dtype=float32)),
  ('dad', array([[-0.28662366]], dtype=float32)),
  ('statesman', array([[-0.21665451]], dtype=float32)),
  ('salesman', array([[-0.11345413]], dtype=float32)),
  ('monk', array([[-0.07300488]], dtype=float32)),
  ('handyman', array([[-0.07216395]], dtype=float32)),
  ('minister', array([[-0.05075533]], dtype=float32)),
  ('trader', array([[-0.04121685]], dtype=float32)),
  ('commissioner', array([[-0.03801057]], dtype=float32)),
  ('archbishop', array([[-0.0366865]], dtype=float32)),
  ('skipper', array([[-0.03591242]], dtype=float32)),
  ('surgeon', array([[-0.03531513]], dtype=float32)),
  ('firebrand', array([[-0.03490256]], dtype=float32))],
 [('janitor', array([[0.04433245]], dtype=float32)),
  ('registered_nurse', array([[0.04522654]], dtype=float32)),
  ('paralegal', array([[0.04842081]], dtype=float32)),
  ('p

In [252]:
from debiaswe.debias import debias

In [256]:
debias(E, gender_specific_words, defs, equalize_pairs)

26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
{('Colt', 'Filly'), ('SONS', 'DAUGHTERS'), ('Fraternity', 'Sorority'), ('uncle', 'aunt'), ('GENTLEMEN', 'LADIES'), ('Gentlemen', 'Ladies'), ('GRANDSON', 'GRANDDAUGHTER'), ('He', 'She'), ('Wives', 'Husbands'), ('brother', 'sister'), ('CONGRESSMAN', 'CONGRESSWOMAN'), ('FRATERNITY', 'SORORITY'), ('nephew', 'niece'), ('DUDES', 'GALS'), ('Dudes', 'Gals'), ('his', 'her'), ('dads', 'moms'), ('Gentleman', 'Lady'), ('Catholic_Priest', 'Nun'), ('BOYS', 'GIRLS'), ('Councilman', 'Councilwoman'), ('Twin_Brother', 'Twin_Sister'), ('fatherhood', 'motherhood'), ('His', 'Her'), ('Ex_Girlfriend', 'Ex_Boyfriend'), ('gelding', 'mare'), ('father', 'mother'), ('BROTHER', 'SISTER'), ('businessman', 'businesswoman'), ('FELLA', 'GRANNY'), ('BUSINESSMAN', 'BUSINESSWOMAN'), ('colt', 'filly'), ('Himself', 'Herself'), ('Man', 'Woman'), ('Gelding', 'Mare'), ('catholic_priest', 'nun'), ('sons', 'daughters'), ('son', 'daughter'), ('Brot
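
Besides projecting the gender direction out of gender-neutral words, the paper's hard-debiasing procedure equalizes pairs such as the ones printed above, so that the two words of a pair differ only along the gender direction. A rough sketch of that idea follows. It is a simplified illustration that assumes unit-length word vectors, and the function name `equalize_pair` is hypothetical rather than part of `debiaswe`.

```python
import numpy as np

def equalize_pair(a, b, g):
    """Make unit vectors a and b differ only along the direction g."""
    g = g / np.linalg.norm(g)
    mean = (a + b) / 2
    y = mean - np.dot(mean, g) * g               # shared part, orthogonal to g
    z = np.sqrt(max(0.0, 1.0 - np.dot(y, y)))    # size along g that restores unit length
    if np.dot(a - b, g) < 0:                     # keep each word on its original side
        z = -z
    return y + z * g, y - z * g

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

a, b, g = (unit(rng.normal(size=5)) for _ in range(3))
a_eq, b_eq = equalize_pair(a, b, g)

# A vector with no gender component is now equally similar to both words
n = rng.normal(size=5)
n = n - np.dot(n, g) * g
print(np.dot(n, a_eq), np.dot(n, b_eq))
```

The design choice to rebuild both vectors from a shared part `y` is what makes any gender-neutral vector equidistant from the two words of the pair.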

In [273]:
# Recompute the same similarities now that debias() has been applied to E
after_debiasing = sorted([(w, cosine_similarity(E.v(w).reshape(1, -1), gender_vector.reshape(1, -1))) for w in profession_words], key=lambda x: x[1][0, 0])



In [275]:
after_debiasing[:15], after_debiasing[-15:]

([('congressman', array([[-0.4073272]], dtype=float32)),
  ('businessman', array([[-0.3975184]], dtype=float32)),
  ('councilman', array([[-0.3079872]], dtype=float32)),
  ('dad', array([[-0.28662366]], dtype=float32)),
  ('statesman', array([[-0.21665451]], dtype=float32)),
  ('salesman', array([[-0.11345413]], dtype=float32)),
  ('monk', array([[-0.07300488]], dtype=float32)),
  ('handyman', array([[-0.07216395]], dtype=float32)),
  ('minister', array([[-0.05075533]], dtype=float32)),
  ('trader', array([[-0.04121685]], dtype=float32)),
  ('commissioner', array([[-0.03801057]], dtype=float32)),
  ('archbishop', array([[-0.0366865]], dtype=float32)),
  ('skipper', array([[-0.03591242]], dtype=float32)),
  ('surgeon', array([[-0.03531513]], dtype=float32)),
  ('firebrand', array([[-0.03490256]], dtype=float32))],
 [('janitor', array([[0.04433245]], dtype=float32)),
  ('registered_nurse', array([[0.04522654]], dtype=float32)),
  ('paralegal', array([[0.04842081]], dtype=float32)),
  ('p

In [95]:
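v_gender = gender_vector  # assumption: v_gender is not defined above; we reuse the she-he direction
a_gender_debiased = E.best_analogies_dist_thresh(v_gender)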

Computing neighbors
Mean: 10.218597434053665
Median: 7.0


In [96]:
sp = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])



In [97]:
sp[:30]

[(-0.41963243, 'congressman'),
 (-0.40675846, 'businessman'),
 (-0.32398775, 'councilman'),
 (-0.30967087, 'dad'),
 (-0.21665451, 'statesman'),
 (-0.11345412, 'salesman'),
 (-0.073004864, 'monk'),
 (-0.072163954, 'handyman'),
 (-0.04946822, 'minister'),
 (-0.043583844, 'archbishop'),
 (-0.040207192, 'bishop'),
 (-0.038332455, 'commissioner'),
 (-0.035724368, 'surgeon'),
 (-0.033133984, 'trader'),
 (-0.032377183, 'observer'),
 (-0.032095812, 'neurosurgeon'),
 (-0.031450085, 'priest'),
 (-0.031133948, 'skipper'),
 (-0.029659139, 'lawmaker'),
 (-0.029511206, 'commander'),
 (-0.029176578, 'poet'),
 (-0.029120361, 'citizen'),
 (-0.028854534, 'analyst'),
 (-0.02802065, 'captain'),
 (-0.027623912, 'diplomat'),
 (-0.02708204, 'colonel'),
 (-0.027062505, 'vice_chancellor'),
 (-0.026749752, 'firebrand'),
 (-0.025862658, 'legislator'),
 (-0.02563687, 'saint')]

In [38]:
# Load the 50-dimensional GloVe vectors into a dict mapping each word to a (1, 50) array
with open('./glove/glove.6B.50d.txt', 'rb') as lines:
    w2v = {line.split()[0].decode('utf-8'): np.array(list(map(float, line.split()[1:]))).reshape(1, 50)
           for line in lines}
    


$S_{(a,b)}(x,y) =
  \left\{
\begin{array}{ll}
      \cos\left(\vec{a} - \vec{b},\ \vec{x} - \vec{y}\right) & \text{if } \left\Vert \vec{x} - \vec{y} \right\Vert \leq \delta \\
      0 & \text{otherwise} \\
\end{array}
\right. $
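
The score above can be sketched directly in NumPy. This illustrates the formula only and is not the code behind `best_analogies_dist_thresh`. The function name `analogy_score`, the default `delta=1.0`, and the toy vectors are assumptions made for the example.

```python
import numpy as np

def analogy_score(a, b, x, y, delta=1.0):
    """Score how well the pair (x, y) mirrors the seed pair (a, b)."""
    seed = a - b
    cand = x - y
    dist = np.linalg.norm(cand)
    # Pairs further apart than delta score 0. Treating identical vectors
    # (dist == 0) the same way is a choice made here to avoid dividing by zero.
    if dist > delta or dist == 0.0:
        return 0.0
    return float(np.dot(seed, cand) / (np.linalg.norm(seed) * dist))

# Toy 2-d vectors, invented for illustration only
she, he = np.array([0.6, 0.2]), np.array([0.1, 0.2])
queen, king = np.array([0.5, 0.7]), np.array([0.1, 0.7])
far_x, far_y = np.array([2.0, 0.0]), np.array([-2.0, 0.0])

close_score = analogy_score(she, he, queen, king)   # parallel to the seed and close together
far_score = analogy_score(she, he, far_x, far_y)    # parallel but too far apart
print(close_score, far_score)
```

The threshold keeps the two words of a candidate pair semantically close, so that a high score reflects a difference mainly along the seed direction rather than two unrelated words.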

In [82]:
a_gender = E.best_analogies_dist_thresh(v_gender)

Computing neighbors
Mean: 10.219732808538016
Median: 7.0


In [84]:
print(a_gender[:30])

[('she', 'he', 1.0000001), ('herself', 'himself', 0.9213449), ('her', 'his', 0.90778077), ('woman', 'man', 0.7530545), ('daughter', 'son', 0.67479837), ('businesswoman', 'businessman', 0.65976363), ('girl', 'boy', 0.6581324), ('actress', 'actor', 0.6525245), ('chairwoman', 'chairman', 0.6397418), ('heroine', 'hero', 0.62939006), ('mother', 'father', 0.60740584), ('spokeswoman', 'spokesman', 0.5979808), ('sister', 'brother', 0.5973292), ('girls', 'boys', 0.5955373), ('sisters', 'brothers', 0.59103584), ('queen', 'king', 0.584144), ('niece', 'nephew', 0.5641366), ('councilwoman', 'councilman', 0.5600652), ('motherhood', 'fatherhood', 0.55260265), ('women', 'men', 0.5513933), ('petite', 'lanky', 0.55130965), ('ovarian_cancer', 'prostate_cancer', 0.5463923), ('Anne', 'John', 0.5452819), ('schoolgirl', 'schoolboy', 0.54329777), ('granddaughter', 'grandson', 0.53478867), ('aunt', 'uncle', 0.5305544), ('matriarch', 'patriarch', 0.516279), ('twin_sister', 'twin_brother', 0.5126212), ('mom', 'd

In [85]:
sp = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])

