# INFORMATION RETRIEVAL PROJECT

---
## Gender stereotypes in parliamentary speeches

In word embedding models, each word is mapped to a high-dimensional vector such that the geometry of the vectors captures semantic relations between words: for example, vectors that lie closer together have been shown to correspond to more similar words. Recent work in machine learning shows that word embeddings also capture common stereotypes, because these stereotypes are likely to be present, even if subtly, in the large corpora used for training. The embedding algorithm learns them automatically, which can be problematic in many contexts when the embeddings are then used in sensitive applications such as search ranking, product recommendation, or translation. An important line of research therefore develops algorithms to debias word embeddings.
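To make "closer together" concrete, the sketch below computes cosine similarity, the measure used throughout this notebook, on three toy vectors. The vectors are invented for illustration and are not taken from the trained models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings", invented for illustration only.
toy_vectors = {
    "madre": [0.9, 0.1, 0.3],
    "padre": [0.8, 0.2, 0.4],
    "energia": [0.1, 0.9, 0.0],
}

sim_related = cosine_similarity(toy_vectors["madre"], toy_vectors["padre"])
sim_unrelated = cosine_similarity(toy_vectors["madre"], toy_vectors["energia"])

print(f"madre ~ padre:   {sim_related:.3f}")
print(f"madre ~ energia: {sim_unrelated:.3f}")
```

Words pointing in a similar direction get a similarity near 1, while unrelated words score much lower.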

This project uses word embeddings to study historical trends, specifically trends in gender and ethnic stereotypes in Italian parliamentary speeches from 1948 to 2020.

In [1]:
import pymongo
import numpy as np
import pandas as pd
from gensim.models import KeyedVectors
from gensim.models import Word2Vec
from tqdm.auto import tqdm
import pickle
import os
from itertools import product
import copy
import warnings
import math
import gensim
import matplotlib.pylab as plt
from six import string_types
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.pairwise import cosine_similarity
from numba import jit
from collections import defaultdict



# 2. ANALYSIS OF GENDER STEREOTYPES BY YEARS

# 1) PRELIMINARY ANALYSIS
Create a group of gendered words, compute their mean vector, and retrieve the words most similar to that vector.
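The idea can be sketched on toy data as follows. This is not the code of `calculate_avg_vector` or `print_similar_to_avg_gender`, whose implementation is not shown here. The helper names, the toy vectors, and the choice to exclude the seed words from the ranking are all assumptions made for illustration.

```python
import math


def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar_to_vector(target, embeddings, exclude=(), topn=3):
    """Rank words by cosine similarity to `target`, skipping words in `exclude`."""
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "donna": [0.9, 0.1, 0.2],
    "madre": [0.8, 0.2, 0.3],
    "sorella": [0.85, 0.15, 0.1],
    "nutrice": [0.7, 0.2, 0.25],
    "energia": [0.1, 0.9, 0.1],
    "ufficio": [0.2, 0.7, 0.6],
}

female_seed = ["donna", "madre", "sorella"]  # hypothetical seed list
female_avg = average_vector([toy_embeddings[w] for w in female_seed])
neighbours = most_similar_to_vector(female_avg, toy_embeddings, exclude=female_seed)

for word, score in neighbours:
    print(f"{word}: {score:.3f}")
```

With real models the ranking would be computed over the whole vocabulary, as in the cells that follow.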

In [4]:
!ls we_models

W2V_by_years_1948_1968
W2V_by_years_1948_1968.trainables.syn1neg.npy
W2V_by_years_1948_1968.wv.vectors.npy
W2V_by_years_1968_1985
W2V_by_years_1968_1985.trainables.syn1neg.npy
W2V_by_years_1968_1985.wv.vectors.npy
W2V_by_years_1985_2000
W2V_by_years_1985_2000.trainables.syn1neg.npy
W2V_by_years_1985_2000.wv.vectors.npy
W2V_by_years_2000_2020
W2V_by_years_2000_2020.trainables.syn1neg.npy
W2V_by_years_2000_2020.wv.vectors.npy


In [5]:
!ls -lh we_models

total 4187512
-rw-r--r--  1 Niki  staff    22M 27 Giu 11:34 W2V_by_years_1948_1968
-rw-r--r--  1 Niki  staff   400M 27 Giu 11:36 W2V_by_years_1948_1968.trainables.syn1neg.npy
-rw-r--r--  1 Niki  staff   400M 27 Giu 11:38 W2V_by_years_1948_1968.wv.vectors.npy
-rw-r--r--  1 Niki  staff    15M 27 Giu 11:38 W2V_by_years_1968_1985
-rw-r--r--  1 Niki  staff   271M 27 Giu 11:39 W2V_by_years_1968_1985.trainables.syn1neg.npy
-rw-r--r--  1 Niki  staff   271M 27 Giu 11:41 W2V_by_years_1968_1985.wv.vectors.npy
-rw-r--r--  1 Niki  staff    10M 27 Giu 11:41 W2V_by_years_1985_2000
-rw-r--r--  1 Niki  staff   194M 27 Giu 11:42 W2V_by_years_1985_2000.trainables.syn1neg.npy
-rw-r--r--  1 Niki  staff   194M 27 Giu 11:43 W2V_by_years_1985_2000.wv.vectors.npy
-rw-r--r--  1 Niki  staff   7,1M 27 Giu 11:43 W2V_by_years_2000_2020
-rw-r--r--  1 Niki  staff   131M 27 Giu 11:43 W2V_by_years_2000_2020.trainables.syn1neg.npy
-rw-r--r--  1 Niki  staff   131M 27 Giu 11:44 W2V_by_years_2000_2020.wv.vector

In [3]:
from INFORET_project import calculate_avg_vector, print_similar_to_avg_gender 
from INFORET_project import Analogies, Most_Similar_Avg_Gender
from INFORET_project.data import gendered_neutral_words

In [12]:
W2V_models = !ls we_models/W2V_by_years_????_????

for path in W2V_models:
    print(f"\nYears: {path[-9:]}")
    model = KeyedVectors.load(path)
    
    for gender in ['male','female']:
        print(f"\nMost similar words to {gender} vector:")
        print_similar_to_avg_gender(model.wv,gender)


Years: 1948_1968

Most similar words to male vector:
('maschiare', 0.6602274179458618)
('figlio', 0.6437594890594482)
('Uscito', 0.6435737609863281)
('ventenne', 0.6424250602722168)
('Tuo', 0.6405723094940186)
('figliare', 0.6390287280082703)
('maritare', 0.6381059885025024)
('orbare', 0.6376898288726807)
('riabbracciare', 0.6346984505653381)
('settantenne', 0.630131185054779)
('Settembrini', 0.6298233866691589)
('giovanissime', 0.6251242160797119)
('figliuolo', 0.6234318614006042)
('Partono', 0.6232901215553284)
('fraticello', 0.6180324554443359)
('figliuola', 0.6157432198524475)
('Olgiati', 0.6155548095703125)

Most similar words to female vector:
('nutrice', 0.6796956062316895)
('figliola', 0.6751199960708618)
('bambino', 0.6639881730079651)
('maschio', 0.65898597240448)
('figlio', 0.6538442373275757)
('spose', 0.6526853442192078)
('giovanissime', 0.645454466342926)
('settantenne', 0.6447814106941223)
('ragazza', 0.6414586305618286)
('nubile', 0.6408020257949829)
('fidanzato', 0.64

- FEMALE VECTOR: 13 words related to family
- MALE VECTOR: 5 words related to family, 6 proper names

# 2) ANALOGIES

In [177]:
#pip install fastdist

Note: you may need to restart the kernel to use updated packages.


In [2]:
#conda install numba

In [2]:
import pymongo
import numpy as np
import pandas as pd
from gensim.models import KeyedVectors
from gensim.models import Word2Vec
from tqdm.auto import tqdm
import pickle
import os
from itertools import product

In [7]:
model = KeyedVectors.load('we_models/W2V_by_years_1948_1968')

In [4]:
from INFORET_project.data import gendered_neutral_words

In [7]:
analogies = Analogies(model.wv)

In [189]:
pd.set_option("display.max_rows", 100, "display.max_columns", 100)

analogies.generate_analogies(n_analogies=100, seed=['uomo','donna'], use_avg_gender=False,
                            multiple=False, delta=1., restrict_vocab=10000,
                            unrestricted=True)

| | uomo | donna | distance | score | most_x | most_y | match |
|---|---|---|---|---|---|---|---|
| 0 | uomo | uomini | 0.946525 | 0.483156 | uomini | casalingo | False |
| 1 | giovane | donna | 0.997564 | 0.419548 | uomini | giovane | False |
| 2 | figliare | madre | 0.905048 | 0.418259 | figliare | figliare | False |
| 3 | ammalare | ammalato | 0.912308 | 0.333209 | ammalato | ammalare | False |
| 4 | classe | lavoratrice | 0.956242 | 0.317785 | lavoratrice | classe | False |
| 5 | amicare | compagno | 0.955529 | 0.285504 | compagno | amicare | False |
| 6 | ufficiare | ufficio | 0.849572 | 0.281914 | ufficio | ufficiare | False |
| 7 | suo | loro | 0.938783 | 0.279944 | loro | suo | False |
| 8 | cittadino | lavoratore | 0.973228 | 0.274711 | lavoratore | cittadino | False |
| 9 | dicastero | Ministeri | 0.909511 | 0.261613 | Ministeri | dicastero | False |


In [194]:
analogies.generate_analogies(n_analogies=100, use_avg_gender=True,
                            multiple=False, delta=1., restrict_vocab=10000,
                            unrestricted=True)

| | female_avg | male_avg | distance | score | most_x | most_y | match |
|---|---|---|---|---|---|---|---|
| 0 | madre | padre | 0.867867 | 0.530152 | figliare | mamma | False |
| 1 | sorella | fratello | 0.909467 | 0.447607 | concittadino | figliare | False |
| 2 | ella | egli | 0.942863 | 0.425736 | Gli | Ella | False |
| 3 | lei | lui | 0.722058 | 0.412102 | me | me | False |
| 4 | donna | giovane | 0.997564 | 0.386546 | giovane | casalingo | False |
| 5 | moglie | figliare | 0.845687 | 0.288969 | figliare | maritare | False |
| 6 | Ella | Egli | 0.842111 | 0.286905 | Egli | Ella | False |
| 7 | lavoratrice | operaio | 0.828033 | 0.27455 | operaio | lavoratrice | False |
| 8 | lode | elogiare | 0.973672 | 0.262296 | elogiare | lode | False |
| 9 | termoelettrico | energia | 0.966241 | 0.25893 | energia | termoelettrico | False |


----

### Create a list of adjectives to test with analogies
Inspired by https://github.com/nikhgarg/EmbeddingDynamicStereotypes and the WEAT test

In [2]:
from INFORET_project.data import gendered_neutral_words
from INFORET_project import Analogies_Distance_Bias

In [8]:
score = Analogies_Distance_Bias(model.wv,
                                gender_female='donna',
                                gender_male='uomo',
                                type_most_similar = 'cosmul')
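The `'cosmul'` option presumably refers to the multiplicative analogy objective (3CosMul) that gensim exposes as `most_similar_cosmul`. How `Analogies_Distance_Bias` uses it internally is not shown here, so the following is only a sketch of the objective itself on toy vectors. The function names and the vectors are hypothetical.

```python
import math

EPSILON = 1e-6  # small constant that avoids division by zero


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def shifted_cosine(u, v):
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    return (1.0 + cosine_similarity(u, v)) / 2.0


def cosmul_scores(embeddings, positive, negative):
    """Score every word outside the query with the 3CosMul objective."""
    query_words = set(positive) | set(negative)
    scores = {}
    for word, vec in embeddings.items():
        if word in query_words:
            continue
        numerator = math.prod(shifted_cosine(vec, embeddings[p]) for p in positive)
        denominator = math.prod(shifted_cosine(vec, embeddings[n]) for n in negative)
        scores[word] = numerator / (denominator + EPSILON)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Toy embeddings, invented for illustration (axes: male, female, royalty).
toy_embeddings = {
    "uomo": [1.0, 0.0, 0.1],
    "donna": [0.0, 1.0, 0.1],
    "re": [1.0, 0.0, 1.0],
    "regina": [0.0, 1.0, 1.0],
    "tavolo": [0.3, 0.3, 0.0],
}

# "uomo is to re as donna is to ?"
ranking = cosmul_scores(toy_embeddings, positive=["donna", "re"], negative=["uomo"])
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

Because the objective is a ratio of similarities, scores are not bounded by 1, which would be consistent with values slightly above 1 in the analogy lists printed further down.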

In [9]:
top_bias = score.get_top_bias(pred_positive_word='adj_appearance')


Word: rozzo
Similarity of 'male' analogies to 'male': 0.3707369565963745, to 'female': 0.11216678470373154
Bias for 'male' analogies: 0.2585701644420624
Similarity of 'female' analogies to 'male': 0.17590370774269104, to 'female': 0.38686612248420715
Bias for 'female' analogies: 0.2109624147415161

Word: brutto
Similarity of 'male' analogies to 'male': 0.31335216760635376, to 'female': 0.06841546297073364
Bias for 'male' analogies: 0.24493670463562012
Similarity of 'female' analogies to 'male': 0.16295498609542847, to 'female': 0.38321977853775024
Bias for 'female' analogies: 0.22026479244232178

Word: sensuale
Similarity of 'male' analogies to 'male': 0.39317917823791504, to 'female': 0.16171547770500183
Bias for 'male' analogies: 0.2314637005329132
Similarity of 'female' analogies to 'male': 0.19913843274116516, to 'female': 0.4105088710784912
Bias for 'female' analogies: 0.21137043833732605

Word: piacevole
Similarity of 'male' analogies to 'male': 0.3499279022216797, to 'female': 
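The "Bias" lines in this output appear to be the difference between the similarity to the analogies' own gender and the similarity to the other gender. The check below reproduces the two values printed for 'rozzo' from the similarities printed alongside them. The function name is hypothetical, and the reading of the formula is inferred from the numbers rather than from the project code.

```python
# Similarities copied from the printed output for the word 'rozzo'.
male_analogies = {"to_male": 0.3707369565963745, "to_female": 0.11216678470373154}
female_analogies = {"to_male": 0.17590370774269104, "to_female": 0.38686612248420715}


def directional_bias(sim_to_own_gender, sim_to_other_gender):
    """Difference between similarity to the own gender and to the other one."""
    return sim_to_own_gender - sim_to_other_gender


male_bias = directional_bias(male_analogies["to_male"], male_analogies["to_female"])
female_bias = directional_bias(female_analogies["to_female"], female_analogies["to_male"])

print(f"Bias for 'male' analogies:   {male_bias:.4f}")
print(f"Bias for 'female' analogies: {female_bias:.4f}")
```

Both results agree with the printed biases up to floating-point rounding.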

In [144]:
score.print_top_analogies()


Word: rozzo

Positive gender: male


[('rozzo', 1.055098056793213),
 ('inquisitore', 0.9339007139205933),
 ('araldo', 0.9232358336448669),
 ('utilitarismo', 0.891850471496582),
 ('concettualismo', 0.8837195038795471),
 ('esagitare', 0.876812756061554),
 ('nichilismo', 0.8724641799926758),
 ('apologeta', 0.8691055774688721),
 ('astuto', 0.867689311504364),
 ('hobby', 0.8633082509040833)]

Positive gender: female


[('rozzo', 0.9477760195732117),
 ('spudorato', 0.8033801317214966),
 ('ragazza', 0.7914453148841858),
 ('manganare', 0.7875456809997559),
 ('semioccupati', 0.7796444296836853),
 ('atrocemente', 0.7769207954406738),
 ('casalingo', 0.7768971920013428),
 ('lavoratrice', 0.775959312915802),
 ('inumane', 0.7758634686470032),
 ('corredo', 0.7737535834312439)]


Word: brutto

Positive gender: male


[('brutto', 1.0333833694458008),
 ('bruttare', 0.8925676941871643),
 ('hobby', 0.8543740510940552),
 ('araldo', 0.8458457589149475),
 ('lapsus', 0.8430330753326416),
 ('eclettico', 0.8411643505096436),
 ('infausto', 0.8395583629608154),
 ('abbagliare', 0.8371183276176453),
 ('inquisitore', 0.8352816700935364),
 ('neologismo', 0.8335784673690796)]

Positive gender: female


[('brutto', 0.9676918387413025),
 ('ragazza', 0.866706907749176),
 ('casalingo', 0.7819167375564575),
 ('bello', 0.7818884253501892),
 ('bambina', 0.7782366275787354),
 ('coniugato', 0.7708193063735962),
 ('matricola', 0.7671971321105957),
 ('passeggiatrice', 0.7608211040496826),
 ('porcheria', 0.7598817944526672),
 ('disoccupato', 0.7572304606437683)]


Word: sensuale

Positive gender: male


[('sensuale', 0.9746366739273071),
 ('inquisitore', 0.9382933974266052),
 ('abilitA', 0.9084437489509583),
 ('araldo', 0.9040325880050659),
 ('compassato', 0.8978230953216553),
 ('scontroso', 0.8938275575637817),
 ('concettualismo', 0.892277717590332),
 ('immanentismo', 0.8908933401107788),
 ('anticonformismo', 0.8903072476387024),
 ('utilitarismo', 0.8880628943443298)]

Positive gender: female


[('sensuale', 1.0260200500488281),
 ('ragazza', 0.8629611134529114),
 ('corredo', 0.8596376776695251),
 ('traviata', 0.8430247902870178),
 ('sospensi', 0.8368242979049683),
 ('giovanissime', 0.8325833678245544),
 ('sifilitico', 0.8237883448600769),
 ('nubile', 0.8186115026473999),
 ('psicopatia', 0.8184255957603455),
 ('nutrice', 0.8179838061332703)]


Word: piacevole

Positive gender: male


[('piacevole', 0.9957322478294373),
 ('inquisitore', 0.892082929611206),
 ('lapsus', 0.8710498809814453),
 ('contradittore', 0.8668596744537354),
 ('gioviale', 0.8614170551300049),
 ('humour', 0.8534638285636902),
 ('esagitare', 0.8497864007949829),
 ('araldo', 0.8459872007369995),
 ('dolciastro', 0.8455997705459595),
 ('amicare', 0.8428431749343872)]

Positive gender: female


[('piacevole', 1.0042823553085327),
 ('ragazza', 0.828667938709259),
 ('lcro', 0.7932448983192444),
 ('cinematografo', 0.7870739102363586),
 ('porcheria', 0.767779529094696),
 ('domenicale', 0.7661802768707275),
 ('banchetto', 0.7628483176231384),
 ('cast', 0.7585383653640747),
 ('bambina', 0.7575772404670715),
 ('colonia', 0.7568740844726562)]


Word: splendido

Positive gender: male


[('splendido', 1.0988596677780151),
 ('araldo', 0.8920515775680542),
 ('adamantino', 0.8848455548286438),
 ('insignire', 0.8829899430274963),
 ('parlatore', 0.8752474784851074),
 ('ideatore', 0.8722020387649536),
 ('meraviglioso', 0.8639161586761475),
 ('inquisitore', 0.8602320551872253),
 ('anticipatore', 0.8598359823226929),
 ('splendente', 0.8557685017585754)]

Positive gender: female


[('splendido', 0.910031259059906),
 ('tessitrici', 0.7881715297698975),
 ('santificare', 0.7821335792541504),
 ('ragazza', 0.7709360718727112),
 ('piscina', 0.7648804783821106),
 ('bello', 0.7640078663825989),
 ('banchetto', 0.7638187408447266),
 ('tendopoli', 0.7621203064918518),
 ('casale', 0.7620607614517212),
 ('esangui', 0.7591239809989929)]


Word: carino

Positive gender: male


[('carino', 0.9065060019493103),
 ('araldo', 0.7999435067176819),
 ('Iiberth', 0.7970865964889526),
 ('abilitA', 0.7970033288002014),
 ('stop', 0.796446681022644),
 ('Eiar', 0.7945173978805542),
 ('opt', 0.7943747043609619),
 ('chiazzare', 0.789283037185669),
 ('potentissimo', 0.7889971137046814),
 ('dismesso', 0.7870283722877502)]

Positive gender: female


[('carino', 1.103132724761963),
 ('lir', 0.8603341579437256),
 ('manganare', 0.8489342927932739),
 ('nutrice', 0.8451550602912903),
 ('erie', 0.8418395519256592),
 ('sospensi', 0.8314362168312073),
 ('autocolonne', 0.8293560147285461),
 ('prado', 0.8279494047164917),
 ('galizia', 0.8247182965278625),
 ('ragazza', 0.8246026039123535)]


Word: bello

Positive gender: male


[('bello', 0.9577351212501526),
 ('insignire', 0.8513430953025818),
 ('immaginifico', 0.8499125242233276),
 ('araldo', 0.8472561240196228),
 ('inquisitore', 0.8444651961326599),
 ('amicare', 0.8373761773109436),
 ('poesia', 0.8351735472679138),
 ('ideatore', 0.8270019888877869),
 ('epoque', 0.8204033970832825),
 ('saggista', 0.8187044262886047)]

Positive gender: female


[('bello', 1.0441265106201172),
 ('ragazza', 0.8436667323112488),
 ('Sovraintendenza', 0.8315069675445557),
 ('cinematografo', 0.7831221222877502),
 ('Belle', 0.7825037837028503),
 ('arto', 0.7816190123558044),
 ('matricola', 0.7788771390914917),
 ('Accademie', 0.7769147157669067),
 ('tessitrici', 0.7747092247009277),
 ('Antichita', 0.7585965991020203)]


Word: frivolo

Positive gender: male


[('frivolo', 1.0699609518051147),
 ('inquisitore', 0.8924795389175415),
 ('lapsus', 0.8859121799468994),
 ('memorialista', 0.8813603520393372),
 ('Cartesio', 0.8808109760284424),
 ('immaginifico', 0.8763315677642822),
 ('Rabelais', 0.8754727244377136),
 ('Rensi', 0.8678659200668335),
 ('Dudan', 0.8640059232711792),
 ('arguto', 0.863005518913269)]

Positive gender: female


[('frivolo', 0.9346103668212891),
 ('monacare', 0.7878923416137695),
 ('ragazza', 0.78507000207901),
 ('saletta', 0.7839958071708679),
 ('casale', 0.7780316472053528),
 ('puerpera', 0.7722499370574951),
 ('pergamena', 0.7699514031410217),
 ('scialbo', 0.7654573917388916),
 ('Magneti', 0.7654496431350708),
 ('tiravolisti', 0.764329195022583)]


Word: magro

Positive gender: male


[('magro', 0.9405244588851929),
 ('gruzzolo', 0.8010366559028625),
 ('beffardamente', 0.7944894433021545),
 ('avvizzire', 0.782113254070282),
 ('anemizza', 0.7795281410217285),
 ('striminzire', 0.7769705057144165),
 ('morbido', 0.7707599997520447),
 ('popolaresco', 0.7705590128898621),
 ('inaridimento', 0.7640417814254761),
 ('tiepido', 0.7607002854347229)]

Positive gender: female


[('magro', 1.0632330179214478),
 ('disoccupato', 0.8367560505867004),
 ('semioccupati', 0.8328951001167297),
 ('misero', 0.8317129015922546),
 ('mietitore', 0.8298977017402649),
 ('mietitura', 0.8225225210189819),
 ('raccoglitrici', 0.8103126883506775),
 ('mondina', 0.8089637756347656),
 ('falcidiando', 0.8052897453308105),
 ('ragazza', 0.7983862161636353)]


Word: grasso

Positive gender: male


[('grasso', 0.9597359895706177),
 ('sesamo', 0.8564077019691467),
 ('caustico', 0.8451469540596008),
 ('arsenico', 0.8167009949684143),
 ('inquisitore', 0.8158231377601624),
 ('combusto', 0.8124383687973022),
 ('oleico', 0.8058986067771912),
 ('solvente', 0.8049765229225159),
 ('commerciabile', 0.8047857284545898),
 ('aroma', 0.8027013540267944)]

Positive gender: female


[('grasso', 1.0419492721557617),
 ('carne', 0.903937816619873),
 ('panificazione', 0.8952084183692932),
 ('suino', 0.8826417922973633),
 ('lana', 0.8692362904548645),
 ('latta', 0.8684764504432678),
 ('salume', 0.866744875907898),
 ('alimentario', 0.8655551075935364),
 ('pollame', 0.863935649394989),
 ('turacciolo', 0.8597216010093689)]

---

- implement ECT
- implement WEAT
- try clustering?

# 3) WEAT and ECT

## WEAT

In [3]:
from INFORET_project import WEAT
from INFORET_project.data import gendered_neutral_words

In [3]:
model = KeyedVectors.load('we_models/W2V_by_years_1948_1968')

In [21]:
WEAT(model.wv, 
     first_target={'name':'career', 'words': gendered_neutral_words['career']},
     second_target={'name':'family', 'words': gendered_neutral_words['family']},
     first_attribute={'name':'donna', 'words': gendered_neutral_words['female']},
     second_attribute={'name':'uomo', 'words': gendered_neutral_words['male']}
)

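# WEAT result (score, effect size, Nt, Na and p-value)
# s: score, the value of the test statistic
# d: effect size, i.e. how strongly the two target sets are separated
# p: p-value. The null hypothesis is that there is no difference between the two sets of target words in
# terms of their relative similarity to the two sets of attribute words.
# Nt: size of the target sets, written as <words per set>x<number of sets> (e.g. 6x2: 6 words for each of 2 targets)
# Na: size of the attribute sets, in the same format (e.g. 8x2: 8 words for each of 2 attributes)

# In the result below p is close to 1, so H0 is not rejected in the tested direction.
# s and d are negative: under the standard WEAT definition, the 'career' words are less
# associated with the female attribute words than the 'family' words are.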

{'Target words': 'career vs. family',
 'Attrib. words': 'donna vs. uomo',
 's': -0.752927340567112,
 'd': -1.6041145,
 'p': 0.9989177489177489,
 'Nt': '10x2',
 'Na': '3x2'}
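For reference, the quantities `s` and `d` can be sketched from the standard WEAT definitions. This is not the project's `WEAT` function: the helper names and toy vectors are hypothetical, and the use of the sample standard deviation in the effect size is an assumption, since implementations differ on this point.

```python
import math
from statistics import mean, stdev


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, attr_a, attr_b):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    sim_a = mean(cosine_similarity(w, a) for a in attr_a)
    sim_b = mean(cosine_similarity(w, b) for b in attr_b)
    return sim_a - sim_b


def weat_statistic(targets_x, targets_y, attr_a, attr_b):
    """s(X, Y, A, B): summed association of X minus summed association of Y."""
    sum_x = sum(association(x, attr_a, attr_b) for x in targets_x)
    sum_y = sum(association(y, attr_a, attr_b) for y in targets_y)
    return sum_x - sum_y


def weat_effect_size(targets_x, targets_y, attr_a, attr_b):
    """d: difference of mean associations, scaled by their pooled spread."""
    assoc_x = [association(x, attr_a, attr_b) for x in targets_x]
    assoc_y = [association(y, attr_a, attr_b) for y in targets_y]
    # Assumption: sample standard deviation over all target words.
    return (mean(assoc_x) - mean(assoc_y)) / stdev(assoc_x + assoc_y)


# Toy 2-dimensional vectors, invented for illustration (axes: female, male).
female_attr = [[1.0, 0.1], [0.9, 0.2]]
male_attr = [[0.1, 1.0], [0.2, 0.9]]
career_targets = [[0.2, 0.8], [0.3, 0.9]]
family_targets = [[0.8, 0.2], [0.9, 0.3]]

statistic = weat_statistic(career_targets, family_targets, female_attr, male_attr)
effect = weat_effect_size(career_targets, family_targets, female_attr, male_attr)
print(f"s = {statistic:.3f}, d = {effect:.3f}")
```

In this toy setup the first target set sits closer to the second attribute set, so both `s` and `d` come out negative, the same sign pattern as in the result above.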

In [22]:
WEAT(model.wv, 
     first_target={'name':'male_stereotypes', 'words': gendered_neutral_words['male_stereotypes']},
     second_target={'name':'female_stereotypes', 'words': gendered_neutral_words['female_stereotypes']},
     first_attribute={'name':'donna', 'words': gendered_neutral_words['female']},
     second_attribute={'name':'uomo', 'words': gendered_neutral_words['male']}
)

{'Target words': 'male_stereotypes vs. female_stereotypes',
 'Attrib. words': 'donna vs. uomo',
 's': -0.3558421954512596,
 'd': -1.1222324,
 'p': 0.9860916860916861,
 'Nt': '8x2',
 'Na': '3x2'}
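The p-value of a WEAT is usually obtained from a permutation test: the pooled target words are re-split into two equal-size groups in every possible way, and p is the share of splits whose statistic exceeds the observed one. The sketch below assumes this one-sided, exhaustive variant and works directly on per-word association scores. The function names and the toy scores are hypothetical.

```python
from itertools import combinations


def split_statistic(pooled, indices_x):
    """Sum of the scores assigned to X minus the sum of those assigned to Y."""
    chosen = set(indices_x)
    sum_x = sum(pooled[i] for i in sorted(chosen))
    sum_y = sum(value for i, value in enumerate(pooled) if i not in chosen)
    return sum_x - sum_y


def weat_permutation_p_value(assoc_x, assoc_y):
    """One-sided p-value: share of equal-size splits with a larger statistic."""
    pooled = list(assoc_x) + list(assoc_y)
    size_x = len(assoc_x)
    observed = split_statistic(pooled, range(size_x))
    greater = 0
    total = 0
    for indices_x in combinations(range(len(pooled)), size_x):
        total += 1
        if split_statistic(pooled, indices_x) > observed:
            greater += 1
    return greater / total


# Toy association scores s(w, A, B), invented for illustration.
assoc_career = [-0.6, -0.5, -0.4]
assoc_family = [0.3, 0.5, 0.6]

p_value = weat_permutation_p_value(assoc_career, assoc_family)
print(f"p = {p_value:.4f}")
```

When the observed split is the most extreme one in the opposite direction, every other split has a larger statistic and p approaches 1. For comparison, 923/924 is about 0.998918, which agrees with the p printed in the first WEAT result, and 924 is the number of ways to split 12 words into two groups of 6. Whether the project enumerates splits this way is an inference, not something the source confirms.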

## ECT

In [2]:
from INFORET_project.utils import fast_cosine_sim, calculate_avg_vector
from INFORET_project import ECT

In [22]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['family'])

ECT for words: ['famiglia', 'figlio', 'matrimonio', 'genitore', 'bambino', 'accudire']

Spearman correlation has value 0.7714 with p-value 0.0724
High correlation --> Low bias

Cosine similarity of 'famiglia' to 'female' is 0.4840, to 'male' is 0.3599
Cosine similarity of 'figlio' to 'female' is 0.4568, to 'male' is 0.3649
Cosine similarity of 'matrimonio' to 'female' is 0.2523, to 'male' is 0.2134
Cosine similarity of 'genitore' to 'female' is 0.4640, to 'male' is 0.4045
Cosine similarity of 'bambino' to 'female' is 0.5482, to 'male' is 0.4164
Cosine similarity of 'accudire' to 'female' is 0.5405, to 'male' is 0.4674
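The Embedding Coherence Test compares how the neutral words rank by similarity to the female mean vector with how they rank by similarity to the male mean vector. As a check, the Spearman correlation can be recomputed from the six pairs of similarities printed above. The helper functions are a minimal sketch that assumes no tied values. They are not the project's `ECT` implementation.

```python
def ranks(values):
    """Rank positions (1 = smallest value); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(xs, ys):
    """Spearman rank correlation via the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Similarities copied from the printed output, in the same word order:
# famiglia, figlio, matrimonio, genitore, bambino, accudire
sims_female = [0.4840, 0.4568, 0.2523, 0.4640, 0.5482, 0.5405]
sims_male = [0.3599, 0.3649, 0.2134, 0.4045, 0.4164, 0.4674]

rho = spearman_no_ties(sims_female, sims_male)
print(f"Spearman correlation: {rho:.4f}")
```

The result matches the printed value of 0.7714. Note that the correlation only looks at rank order, so it stays high even though every word in this list is closer to the female vector than to the male one.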


In [16]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['career'])

ECT for words: ['capo', 'presidente', 'onorevole', 'potere', 'carriera', 'salario', 'lavoro', 'professionale', 'denaro', 'ambizione']

Spearman correlation has value 0.4182 with p-value 0.2291
Moderate correlation --> Moderate bias

Cosine similarity of 'capo' to 'female' is 0.1374, to 'male' is 0.1829
Cosine similarity of 'presidente' to 'female' is 0.1468, to 'male' is 0.2569
Cosine similarity of 'onorevole' to 'female' is 0.1621, to 'male' is 0.2763
Cosine similarity of 'potere' to 'female' is 0.1830, to 'male' is 0.2610
Cosine similarity of 'carriera' to 'female' is 0.2111, to 'male' is 0.1765
Cosine similarity of 'salario' to 'female' is 0.2812, to 'male' is 0.2145
Cosine similarity of 'lavoro' to 'female' is 0.1026, to 'male' is 0.1105
Cosine similarity of 'professionale' to 'female' is 0.2597, to 'male' is 0.2347
Cosine similarity of 'denaro' to 'female' is 0.1057, to 'male' is 0.1724
Cosine similarity of 'ambizione' to 'female' is 0.2159, to 'male' is 0.1939


In [9]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['female_stereotypes'])

ECT for words: ['bello', 'superficiale', 'frivolo', 'sensibile', 'delicato', 'gentile', 'passivo', 'silenzioso', 'insicuro', 'illogico', 'isterico', 'debole', 'irrazionale']

Spearman correlation has value 0.6538 with p-value 0.0153
Moderate correlation --> Moderate bias

Cosine similarity of 'bello' to 'female' is 0.2014, to 'male' is 0.2699
Cosine similarity of 'superficiale' to 'female' is 0.1012, to 'male' is 0.0804
Cosine similarity of 'frivolo' to 'female' is 0.3124, to 'male' is 0.4547
Cosine similarity of 'sensibile' to 'female' is 0.2410, to 'male' is 0.2043
Cosine similarity of 'delicato' to 'female' is 0.2247, to 'male' is 0.1971
Cosine similarity of 'gentile' to 'female' is 0.2040, to 'male' is 0.2745
Cosine similarity of 'passivo' to 'female' is 0.2332, to 'male' is 0.1659
Cosine similarity of 'silenzioso' to 'female' is 0.2856, to 'male' is 0.2322
Cosine similarity of 'insicuro' to 'female' is 0.3190, to 'male' is 0.2932
Cosine similarity of 'illogico' to 'female' is 0.07

In [18]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['male_stereotypes'])

ECT for words: ['intelligente', 'razionale', 'saggio', 'ambizioso', 'forte', 'crudele', 'intollerante']

Spearman correlation has value 0.9643 with p-value 0.0005
High correlation --> Low bias

Cosine similarity of 'intelligente' to 'female' is 0.2679, to 'male' is 0.3162
Cosine similarity of 'razionale' to 'female' is 0.1441, to 'male' is 0.1110
Cosine similarity of 'saggio' to 'female' is 0.2478, to 'male' is 0.2573
Cosine similarity of 'ambizioso' to 'female' is 0.1682, to 'male' is 0.2052
Cosine similarity of 'forte' to 'female' is 0.2035, to 'male' is 0.2288
Cosine similarity of 'crudele' to 'female' is 0.3126, to 'male' is 0.2876
Cosine similarity of 'intollerante' to 'female' is 0.3143, to 'male' is 0.3642


In [30]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['rage'])

ECT for words: ['intollerante', 'crudele', 'isterico', 'aggressivo', 'brutale', 'odioso', 'cattivo']

Spearman correlation has value 0.8571 with p-value 0.0137
High correlation --> Low bias

Cosine similarity of 'intollerante' to 'female' is 0.2680, to 'male' is 0.2491
Cosine similarity of 'crudele' to 'female' is 0.2685, to 'male' is 0.2354
Cosine similarity of 'isterico' to 'female' is 0.2520, to 'male' is 0.2439
Cosine similarity of 'aggressivo' to 'female' is 0.1572, to 'male' is 0.1096
Cosine similarity of 'brutale' to 'female' is 0.2082, to 'male' is 0.1489
Cosine similarity of 'odioso' to 'female' is 0.2511, to 'male' is 0.1883
Cosine similarity of 'cattivo' to 'female' is 0.1634, to 'male' is 0.1506


In [31]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['kindness'])

ECT for words: ['premuroso', 'sensibile', 'delicato', 'buono', 'bravo']

Spearman correlation has value 0.3000 with p-value 0.6238
Moderate correlation --> Moderate bias

Cosine similarity of 'premuroso' to 'female' is 0.4219, to 'male' is 0.4423
Cosine similarity of 'sensibile' to 'female' is 0.2052, to 'male' is 0.1993
Cosine similarity of 'delicato' to 'female' is 0.1560, to 'male' is 0.1426
Cosine similarity of 'buono' to 'female' is 0.1840, to 'male' is 0.2057
Cosine similarity of 'bravo' to 'female' is 0.1441, to 'male' is 0.3304


In [32]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['active'])

ECT for words: ['ambizioso', 'forte', 'assertivo', 'sicuro']

Spearman correlation has value 0.8000 with p-value 0.2000
High correlation --> Low bias

Cosine similarity of 'ambizioso' to 'female' is 0.1076, to 'male' is 0.1195
Cosine similarity of 'forte' to 'female' is 0.1613, to 'male' is 0.1652
Cosine similarity of 'assertivo' to 'female' is 0.2730, to 'male' is 0.2641
Cosine similarity of 'sicuro' to 'female' is 0.1501, to 'male' is 0.2230


In [33]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['passive'])

ECT for words: ['timido', 'passivo', 'insicuro', 'debole']

Spearman correlation has value 0.4000 with p-value 0.6000
Moderate correlation --> Moderate bias

Cosine similarity of 'timido' to 'female' is 0.1413, to 'male' is 0.0818
Cosine similarity of 'passivo' to 'female' is 0.2038, to 'male' is 0.1974
Cosine similarity of 'insicuro' to 'female' is 0.2750, to 'male' is 0.1942
Cosine similarity of 'debole' to 'female' is 0.2337, to 'male' is 0.2131


In [34]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['intelligence'])

ECT for words: ['intelligente', 'razionale', 'saggio', 'studioso', 'serio']

Spearman correlation has value 0.6000 with p-value 0.2848
Moderate correlation --> Moderate bias

Cosine similarity of 'intelligente' to 'female' is 0.1905, to 'male' is 0.2639
Cosine similarity of 'razionale' to 'female' is 0.0948, to 'male' is 0.1226
Cosine similarity of 'saggio' to 'female' is 0.2104, to 'male' is 0.2332
Cosine similarity of 'studioso' to 'female' is 0.1374, to 'male' is 0.2804
Cosine similarity of 'serio' to 'female' is 0.1004, to 'male' is 0.1440


In [11]:
spearman_corr, pval = ECT(model.wv, 
                          gendered_neutral_words['female'], 
                          gendered_neutral_words['male'], 
                          neutral_words = gendered_neutral_words['dumbness'])

ECT for words: ['illogico', 'irrazionale', 'stupido', 'superficiale']

Spearman correlation has value 1.0000 with p-value 0.0000
High correlation --> Low bias

Cosine similarity of 'illogico' to 'female' is 0.1155, to 'male' is 0.0661
Cosine similarity of 'irrazionale' to 'female' is 0.1216, to 'male' is 0.0806
Cosine similarity of 'stupido' to 'female' is 0.2806, to 'male' is 0.3270
Cosine similarity of 'superficiale' to 'female' is 0.1665, to 'male' is 0.1251


---

In [4]:
model = KeyedVectors.load('we_models/W2V_by_years_1948_1968')

In [6]:
model = KeyedVectors.load('we_models/W2V_by_years_1968_1985')

In [8]:
model = KeyedVectors.load('we_models/W2V_by_years_1985_2000')

In [6]:
model = KeyedVectors.load('we_models/W2V_by_years_2000_2020')