# DEMO
Demo for our presentation "Enhancing Fairness: Debiasing Word Embeddings", based on:

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai.
"Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." 2016.

https://github.com/tolga-b/debiaswe.git

### NOTES FOR US, REMOVE LATER
- structure is close to the original "tutorial" repo
- add, polish, and individualize the code
- add introduction text
- add step headlines, explanations, comments, etc.


In [1]:
from utils.wordembedding import WordEmbedding
from utils.util import load_professions, debias
import json

In [2]:
embedding = WordEmbedding('./data/w2v_gnews_small.txt')

# load professions; each entry holds the word plus a definitional and a stereotypical gender score
professions = load_professions()
profession_words = [p[0] for p in professions]

*** Reading data from ./data/w2v_gnews_small.txt
(26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
Loaded professions
Format:
word,
definitional female -1.0 -> definitional male 1.0
stereotypical female -1.0 -> stereotypical male 1.0


In [3]:
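# gender direction: difference between the 'she' and 'he' vectors,
# so positive projections lean toward 'she' and negative ones toward 'he'
v_gender = embedding.diff('she', 'he')

# TODO: also show this for other seed pairs?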
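
To explore the open question about other seed pairs, the sketch below computes a direction for two pairs and checks how well they agree. It is a minimal illustration on made-up 3-dimensional vectors; the helper `direction` is hypothetical, and it assumes (not confirmed here) that `embedding.diff` behaves like a normalized vector difference.

```python
import numpy as np

def direction(vec_a, vec_b):
    """Unit vector pointing from vec_b toward vec_a (hypothetical helper)."""
    d = np.asarray(vec_a, dtype=float) - np.asarray(vec_b, dtype=float)
    return d / np.linalg.norm(d)

# Toy vectors, invented for illustration only (not taken from the embedding)
toy = {
    "she": np.array([0.9, 0.1, 0.3]),
    "he": np.array([0.1, 0.1, 0.3]),
    "woman": np.array([0.8, 0.5, 0.2]),
    "man": np.array([0.2, 0.4, 0.2]),
}

seed_pairs = [("she", "he"), ("woman", "man")]
directions = {pair: direction(toy[pair[0]], toy[pair[1]]) for pair in seed_pairs}

# Cosine similarity between the two candidate gender directions;
# values close to 1 mean both seed pairs point the same way.
cos = float(directions[("she", "he")] @ directions[("woman", "man")])
print(f"cosine between seed-pair directions: {cos:.3f}")
```

With the real embedding, the same comparison could be run on the word vectors returned by `embedding.v(...)`.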

In [4]:
# gender analogies along the gender direction
a_gender = embedding.n_analogies(v_gender)

for (a, b, _) in a_gender:
    print(f"{a} - {b}")

# TODO: display this more nicely, and maybe highlight especially critical pairs
# TODO: maybe add a function to explore relations like "x is to y as v is to w",
# where the user supplies x, y and v and the system suggests w


Computing neighbors
Mean: 10.219808500170306
Median: 7.0
she - he
herself - himself
her - his
woman - man
daughter - son
businesswoman - businessman
girl - boy
actress - actor
chairwoman - chairman
heroine - hero
mother - father
spokeswoman - spokesman
sister - brother
girls - boys
sisters - brothers
queen - king
niece - nephew
councilwoman - councilman
motherhood - fatherhood
women - men
petite - lanky
ovarian_cancer - prostate_cancer
Anne - John
schoolgirl - schoolboy
granddaughter - grandson
aunt - uncle
matriarch - patriarch
twin_sister - twin_brother
mom - dad
lesbian - gay
husband - younger_brother
gal - dude
lady - gentleman
sorority - fraternity
mothers - fathers
grandmother - grandfather
blouse - shirt
soprano - baritone
queens - kings
Jill - Greg
daughters - sons
grandma - grandpa
volleyball - football
diva - superstar
mommy - kid
Sarah - Matthew
hairdresser - barber
softball - baseball
goddess - god
Aisha - Jamal
waitress - waiter
princess - prince
filly - colt
mare - geldin
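
The idea noted in the cell above, an explorer for "x is to y as v is to w", could be sketched as follows. This uses the common vector-offset approach (pick the word closest to `y - x + v`), which is not necessarily what `n_analogies` does. The function `suggest_w` and the toy vectors are hypothetical and only illustrate the mechanics.

```python
import numpy as np

def suggest_w(vectors, x, y, v):
    """Suggest w for 'x is to y as v is to w' via cosine similarity to y - x + v."""
    target = vectors[y] - vectors[x] + vectors[v]
    target = target / np.linalg.norm(target)
    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if word in (x, y, v):
            continue  # never suggest one of the query words
        score = float(vec @ target / np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy vectors, invented for illustration only
toy_vectors = {
    "man": np.array([1.0, 0.0, 1.0]),
    "woman": np.array([-1.0, 0.0, 1.0]),
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([-1.0, 1.0, 0.0]),
    "apple": np.array([0.0, -1.0, 0.0]),
}

suggestion = suggest_w(toy_vectors, "man", "king", "woman")
print(f"man is to king as woman is to {suggestion}")
```

For the demo, `vectors` could be replaced by a lookup into the loaded embedding, restricted to a candidate vocabulary to keep the loop fast.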

In [5]:
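# project each profession onto the gender direction and sort by projection
sp = sorted([(embedding.v(w).dot(v_gender), w) for w in profession_words])

# 20 most negative ('he'-leaning) and 20 most positive ('she'-leaning) professions
sp[0:20], sp[-20:]

# TODO: display this more nicely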

([(-0.23798445, 'maestro'),
  (-0.21665451, 'statesman'),
  (-0.2075867, 'skipper'),
  (-0.20267203, 'protege'),
  (-0.20206761, 'businessman'),
  (-0.19492394, 'sportsman'),
  (-0.1883635, 'philosopher'),
  (-0.18073657, 'marksman'),
  (-0.17289859, 'captain'),
  (-0.16785555, 'architect'),
  (-0.16702038, 'financier'),
  (-0.1631364, 'warrior'),
  (-0.15280864, 'major_leaguer'),
  (-0.15001443, 'trumpeter'),
  (-0.14718868, 'broadcaster'),
  (-0.14637241, 'magician'),
  (-0.14401692, 'fighter_pilot'),
  (-0.13782284, 'boss'),
  (-0.137182, 'industrialist'),
  (-0.13684887, 'pundit')],
 [(0.19714224, 'interior_designer'),
  (0.20833439, 'housekeeper'),
  (0.21560374, 'stylist'),
  (0.2236317, 'bookkeeper'),
  (0.23776127, 'maid'),
  (0.24125955, 'nun'),
  (0.2478258, 'nanny'),
  (0.24929334, 'hairdresser'),
  (0.24946158, 'paralegal'),
  (0.25276464, 'ballerina'),
  (0.2571882, 'socialite'),
  (0.26647124, 'librarian'),
  (0.27317622, 'receptionist'),
  (0.27540293, 'waitress'),
  (0.
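
One possible way to present the projections more readably is a small side-by-side table of the extremes. The sketch below is a suggestion only: `format_extremes` is a hypothetical helper, and the four sample values are copied from the output above.

```python
def format_extremes(scored, n=2):
    """Side-by-side table of the n lowest and n highest (score, word) pairs."""
    ordered = sorted(scored)
    lowest, highest = ordered[:n], ordered[::-1][:n]
    lines = [f"{'he-leaning':<24}{'she-leaning':<24}".rstrip()]
    for (lo_s, lo_w), (hi_s, hi_w) in zip(lowest, highest):
        left = f"{lo_w} ({lo_s:+.3f})"
        right = f"{hi_w} ({hi_s:+.3f})"
        lines.append(f"{left:<24}{right}")
    return "\n".join(lines)

# A few (projection, word) pairs taken from the output above
sample = [
    (-0.23798445, "maestro"),
    (-0.21665451, "statesman"),
    (0.27317622, "receptionist"),
    (0.27540293, "waitress"),
]

table = format_extremes(sample, n=2)
print(table)
```

In the notebook, `format_extremes(sp, n=20)` would cover the same ranges as `sp[0:20], sp[-20:]`.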

# NOTE: Maybe we should also include a demo for racial bias, as in the original repo?

# DEBIASING:

In [6]:
# load the definitional pairs, equalize pairs, and gender-specific words used for debiasing
with open('./data/definitional_pairs.json', "r") as f:
    defs = json.load(f)
print("definitional", defs)

with open('./data/equalize_pairs.json', "r") as f:
    equalize_pairs = json.load(f)

with open('./data/gender_specific_seed.json', "r") as f:
    gender_specific_words = json.load(f)
print("gender specific", len(gender_specific_words), gender_specific_words[:10])

# TODO: should we really print it like this?

definitional [['woman', 'man'], ['girl', 'boy'], ['she', 'he'], ['mother', 'father'], ['daughter', 'son'], ['gal', 'guy'], ['female', 'male'], ['her', 'his'], ['herself', 'himself'], ['Mary', 'John']]
gender specific 218 ['actress', 'actresses', 'aunt', 'aunts', 'bachelor', 'ballerina', 'barbershop', 'baritone', 'beard', 'beards']


In [7]:
debias(embedding, gender_specific_words, defs, equalize_pairs)

26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
{('males', 'females'), ('grandpa', 'grandma'), ('nephew', 'niece'), ('CONGRESSMAN', 'CONGRESSWOMAN'), ('king', 'queen'), ('GRANDPA', 'GRANDMA'), ('Sons', 'Daughters'), ('GRANDSONS', 'GRANDDAUGHTERS'), ('male', 'female'), ('Wives', 'Husbands'), ('Nephew', 'Niece'), ('SPOKESMAN', 'SPOKESWOMAN'), ('Councilman', 'Councilwoman'), ('gelding', 'mare'), ('GELDING', 'MARE'), ('SCHOOLBOY', 'SCHOOLGIRL'), ('Businessman', 'Businesswoman'), ('GRANDFATHER', 'GRANDMOTHER'), ('Grandfather', 'Grandmother'), ('Prince', 'Princess'), ('brother', 'sister'), ('Fatherhood', 'Motherhood'), ('KING', 'QUEEN'), ('Grandsons', 'Granddaughters'), ('fraternity', 'sorority'), ('sons', 'daughters'), ('he', 'she'), ('CATHOLIC_PRIEST', 'NUN'), ('PRINCE', 'PRINCESS'), ('Kings', 'Queens'), ('boy', 'girl'), ('testosterone', 'estrogen'), ('GENTLEMEN', 'LADIES'), ('BUSINESSMAN', 'BUSINESSWOMAN'), ('SON', 'DAUGHTER'), ('Dads', 'Moms'), ('himself
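
The `debias` call is used as a black box here. To give an intuition for the "neutralize" step described in the paper (removing a word vector's component along the gender direction), here is a minimal sketch on toy vectors. It is an illustration under assumptions, not the code in `utils.util`: the function name `neutralize`, the renormalization, and the example vectors are all hypothetical.

```python
import numpy as np

def neutralize(vec, direction):
    """Remove the component of vec along direction, then rescale to unit length."""
    direction = direction / np.linalg.norm(direction)
    projected = vec - (vec @ direction) * direction
    return projected / np.linalg.norm(projected)

# Toy example: g plays the role of the gender direction
g = np.array([1.0, 0.0, 0.0])
word = np.array([0.6, 0.8, 0.0])

neutral = neutralize(word, g)
print("projection before:", float(word @ g))
print("projection after: ", float(neutral @ g))
```

The paper's second step, equalizing pairs such as those printed above, is not covered by this sketch.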

In [8]:
# profession analysis after debiasing: same projection as in cell [5]
# (note: v_gender was computed before debiasing)
sp_debiased = sorted([(embedding.v(w).dot(v_gender), w) for w in profession_words])

sp_debiased[0:20], sp_debiased[-20:]

# TODO: display this more nicely; since it mirrors cell [5], maybe show a before/after comparison?

([(-0.4196324, 'congressman'),
  (-0.40675837, 'businessman'),
  (-0.32398775, 'councilman'),
  (-0.30967087, 'dad'),
  (-0.21665451, 'statesman'),
  (-0.11345412, 'salesman'),
  (-0.07300486, 'monk'),
  (-0.072163954, 'handyman'),
  (-0.04946824, 'minister'),
  (-0.043583885, 'archbishop'),
  (-0.040207215, 'bishop'),
  (-0.038332503, 'commissioner'),
  (-0.03572438, 'surgeon'),
  (-0.03313399, 'trader'),
  (-0.03237723, 'observer'),
  (-0.032095872, 'neurosurgeon'),
  (-0.03145013, 'priest'),
  (-0.031133903, 'skipper'),
  (-0.029659167, 'lawmaker'),
  (-0.029511217, 'commander')],
 [(0.029965678, 'teenager'),
  (0.03023706, 'instructor'),
  (0.030946188, 'student'),
  (0.03111694, 'paralegal'),
  (0.032039404, 'bookkeeper'),
  (0.032434635, 'cinematographer'),
  (0.034329087, 'graphic_designer'),
  (0.03470566, 'lifeguard'),
  (0.035666507, 'janitor'),
  (0.035971917, 'drummer'),
  (0.042120144, 'wrestler'),
  (0.043902352, 'hairdresser'),
  (0.04813312, 'firefighter'),
  (0.2377612
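
A before/after comparison could look like the sketch below, which lists each word's projection before and after debiasing together with the change in magnitude. The helper `compare_projections` is hypothetical; the sample values are copied from the two profession outputs above.

```python
def compare_projections(before, after):
    """Rows of (word, before, after, change in |projection|) for shared words."""
    rows = []
    for word in sorted(before.keys() & after.keys()):
        b, a = before[word], after[word]
        rows.append((word, b, a, abs(a) - abs(b)))
    return rows

# Projections taken from the outputs of the two profession cells
before = {"skipper": -0.2075867, "hairdresser": 0.24929334, "businessman": -0.20206761}
after = {"skipper": -0.031133903, "hairdresser": 0.043902352, "businessman": -0.40675837}

rows = compare_projections(before, after)
for word, b, a, delta in rows:
    print(f"{word:<14}{b:+.3f} -> {a:+.3f}  (|proj| change {delta:+.3f})")
```

In the notebook, the two dictionaries could be built as `{w: s for s, w in sp}` and `{w: s for s, w in sp_debiased}`. In this sample, the projections of `skipper` and `hairdresser` shrink toward zero, while `businessman` moves further from zero.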

In [9]:
# gender analogies after debiasing (stored under a new name so the
# pre-debiasing results in a_gender stay available for comparison)
a_gender_debiased = embedding.n_analogies(v_gender)

for (a, b, _) in a_gender_debiased:
    print(f"{a} - {b}")

# TODO: as above, make this nicer for the demo, maybe as a before/after comparison

Computing neighbors
Mean: 10.218597434053665
Median: 7.0
grandma - grandpa
sister - brother
gals - dudes
mother - father
daughter - son
schoolgirl - schoolboy
girl - boy
females - males
businesswoman - businessman
women - men
girls - boys
spokeswoman - spokesman
filly - colt
mothers - fathers
princess - prince
daughters - sons
estrogen - testosterone
queen - king
niece - nephew
motherhood - fatherhood
grandmother - grandfather
councilwoman - councilman
she - he
female - male
her - his
ex_boyfriend - ex_girlfriend
ovarian_cancer - prostate_cancer
aunt - uncle
woman - man
moms - dads
chairwoman - chairman
herself - himself
twin_sister - twin_brother
ladies - gentlemen
queens - kings
mare - gelding
mom - dad
granddaughters - grandsons
sorority - fraternity
congresswoman - congressman
convent - monastery
sisters - brothers
husbands - wives
granddaughter - grandson
actress - actor
lesbian - gay
compatriot - countryman
husband - younger_brother
gal - dude
hers - theirs
heroine - protagonist
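
To compare the analogy lists from before and after debiasing, a simple set difference may be enough. The sketch below assumes both lists are available as sequences of `(a, b)` pairs; the names `pairs_before`, `pairs_after`, and `diff_analogies` are hypothetical, and the sample pairs are a short excerpt from the outputs above, so the result says nothing about the full lists.

```python
def diff_analogies(before, after):
    """Return (pairs only in before, pairs only in after), each sorted."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)

# Short excerpts from the two analogy outputs (not the full lists)
pairs_before = [("she", "he"), ("petite", "lanky"), ("volleyball", "football")]
pairs_after = [("she", "he"), ("convent", "monastery")]

removed, added = diff_analogies(pairs_before, pairs_after)
print("only before debiasing:", removed)
print("only after debiasing: ", added)
```

In the notebook, the pairs could be extracted from the analogy results with `[(a, b) for (a, b, _) in ...]`.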
