<a href="https://colab.research.google.com/github/yiranamejia/Diplodatos2020/blob/https%2Fcolab.research.google.com%2Fdrive%2F1GWE8PVGwvpo1ZxRket7xszJy3Q9DQABn%23scrollTo%3DyoJDoeIzTrqz/tutorial_mitigar_bias_en_word_embeddings_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Diagnosing and mitigating gender bias in word embeddings**

## Based on the workshop https://learn.responsibly.ai/word-embedding

and on the [`responsibly`](https://docs.responsibly.ai/) toolkit, for auditing and mitigating bias and improving fairness in machine learning systems.

# Disclaimers

In this example we focus on gender bias and simplify it to a binary phenomenon. We understand that this is an oversimplification: a first approximation to a family of mitigation techniques, which need to become considerably more complex to treat bias phenomena as social constructs.

This material is a focused exercise, not a complete perspective on bias in machine learning, fairness, or responsible artificial intelligence.

# Setup

## Install `responsibly`

In [None]:
!pip install --user responsibly



## Validate the installation of `responsibly`

In [None]:
import responsibly

# you should get '0.1.2'
responsibly.__version__

---

If you are working in Colab, it is normal to get the error **`ModuleNotFoundError: No module named 'responsibly'`** after the installation.
<br/> <br/>
Restart the Kernel/Runtime (use the menu at the top or the button in the notebook), skip the installation cell (`!pip install --user responsibly`), and run the previous cell again.

# Playing with the Word2Vec embedding

The [`responsibly`](http://docs.responsibly.ai) package comes with the function `responsibly.we.load_w2v_small`, which returns a [`KeyedVectors`](https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.KeyedVectors) object from [`gensim`](https://radimrehurek.com/gensim/). The model was trained on Google News (100B tokens, a vocabulary of 3 million words, 300-dimensional vectors); we keep only the lowercase vocabulary.

For more information: [Word2Vec](https://code.google.com/archive/p/word2vec/)

## Basic Properties

In [None]:
# ignore warnings
# in general we would not do this, but here we want to focus on something else

import warnings
warnings.filterwarnings('ignore')

In [None]:
from responsibly.we import load_w2v_small

w2v_small = load_w2v_small()

In [None]:
# vocabulary size

len(w2v_small.vocab)

In [None]:
# get the vector of the word "home"

print('home =', w2v_small['home'])

In [None]:
# the dimension of the word embedding is, in this case, 300

len(w2v_small['home'])

In [None]:
# all the word vectors are normalized (= they have norm one as vectors)

from numpy.linalg import norm

norm(w2v_small['home'])

In [None]:
# let's make sure that all the vectors are normalized

from numpy.testing import assert_almost_equal

length_vectors = norm(w2v_small.vectors, axis=1)

assert_almost_equal(actual=length_vectors,
                    desired=1,
                    decimal=5)

## Measuring the similarity between words

We will use the [cosine](https://es.wikipedia.org/wiki/Similitud_coseno) as the similarity measure between words (the higher the cosine, the closer the words).
- It measures the cosine of the angle between two vectors.
- It ranges between 1 (vectors pointing in the same direction) and -1 (opposite vectors).
- In Python, for normalized vectors (NumPy arrays), use the `@` operator: the dot product of two unit vectors is their cosine.

In [None]:
w2v_small['cat'] @ w2v_small['cat']

In [None]:
w2v_small['cat'] @ w2v_small['cats']

In [None]:
from math import acos, degrees

degrees(acos(w2v_small['cat'] @ w2v_small['cats']))

In [None]:
w2v_small['cat'] @ w2v_small['dog']

In [None]:
degrees(acos(w2v_small['cat'] @ w2v_small['dog']))

In [None]:
w2v_small['cat'] @ w2v_small['cow']

In [None]:
degrees(acos(w2v_small['cat'] @ w2v_small['cow']))

In [None]:
w2v_small['cat'] @ w2v_small['graduated']

In [None]:
degrees(acos(w2v_small['cat'] @ w2v_small['graduated']))
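
The cells above can use `@` directly because every vector in `w2v_small` has norm one. As a minimal sketch of the general case, the block below computes the cosine similarity of two made-up 2-D vectors (not taken from Word2Vec), first with the full formula and then as a plain dot product after normalizing. It uses Python lists rather than NumPy arrays so that it runs without the embedding, and the helper names `dot`, `normalize` and `cosine_similarity` are illustrative assumptions, not part of `responsibly` or `gensim`.

```python
from math import acos, degrees, sqrt


def dot(u, v):
    """Dot product of two equally long sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Rescale v to norm one."""
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, whatever their norms."""
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))


# Toy vectors, made up for illustration
u = [3.0, 4.0]
v = [4.0, 3.0]

similarity = cosine_similarity(u, v)

# After normalizing, the plain dot product gives the same number
similarity_normalized = dot(normalize(u), normalize(v))

# The angle between the two vectors, in degrees
angle = degrees(acos(similarity))
```

For vectors that are not normalized, the plain dot product also reflects their lengths, which is why the notebook checks the norms before relying on `@`.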

## Visualizing the word embedding with t-SNE

<small>source: [Google's Seedbank](https://research.google.com/seedbank/seed/pretrained_word_embeddings)</small>

In [None]:
from sklearn.manifold import TSNE
from matplotlib import pylab as plt

# take the words at positions 200 to 599 of the vocabulary (sorted from most to least common)
words = [word for word in w2v_small.index2word[200:600]]

# convert the words to vectors
embeddings = [w2v_small[word] for word in words]

# perform T-SNE
words_embedded = TSNE(n_components=2).fit_transform(embeddings)

# ... and visualize!
plt.figure(figsize=(20, 20))
for i, label in enumerate(words):
    x, y = words_embedded[i, :]
    plt.scatter(x, y)
    plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                 ha='right', va='bottom', size=11)
plt.show()

### Extra: [Tensorflow Embedding Projector](http://projector.tensorflow.org)

## Most similar words

Which words are most similar to a given word?

In [None]:
w2v_small.most_similar('cat')

### EXTRA: Which word doesn't match?

Given a list of words, which one doesn't match? That is, which one is furthest from the mean of the words?

In [None]:
w2v_small.doesnt_match('breakfast cereal dinner lunch'.split())
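
To make "furthest from the mean" concrete, here is a rough sketch of the idea with made-up 2-D vectors. The function `odd_one_out` and the toy vectors are hypothetical, and the sketch is not claimed to reproduce gensim's `doesnt_match` exactly.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def odd_one_out(vectors):
    """Return the word whose unit vector is least similar to the mean unit vector."""
    unit = {word: normalize(vec) for word, vec in vectors.items()}
    dim = len(next(iter(unit.values())))
    mean = normalize([sum(vec[i] for vec in unit.values()) / len(unit)
                      for i in range(dim)])
    return min(unit, key=lambda word: dot(unit[word], mean))


# Toy vectors, made up so that three words point in a similar direction
toy_vectors = {
    'breakfast': [0.9, 0.1],
    'lunch': [0.8, 0.2],
    'dinner': [0.85, 0.15],
    'cereal': [0.2, 0.9],
}

outlier = odd_one_out(toy_vectors)
```

With these toy values, `outlier` is `'cereal'`, the only vector pointing away from the other three.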

## Adding words

![](https://github.com/ResponsiblyAI/word-embedding/blob/master/images/vector-addition.png?raw=1)

<small>source: [Wikipedia](https://commons.wikimedia.org/wiki/File:Vector_add_scale.svg)</small>

In [None]:
# nature + science = ?

w2v_small.most_similar(positive=['nature', 'science'])

## Vector analogies

![](https://www.tensorflow.org/images/linear-relationships.png)
<small>source: [Tensorflow documentation](https://www.tensorflow.org/tutorials/representation/word2vec)</small>

In [None]:
# man:king :: woman:?
# king - man + woman = ?

w2v_small.most_similar(positive=['king', 'woman'],
                       negative=['man'])

In [None]:
w2v_small.most_similar(positive=['big', 'smaller'],
                       negative=['small'])

## A direction in the embedding space can be seen as a relation

# $\overrightarrow{she} - \overrightarrow{he}$
# $\overrightarrow{smaller} - \overrightarrow{small}$
# $\overrightarrow{Spain} - \overrightarrow{Madrid}$


# Diagnosing gender bias in embeddings

Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). [Man is to computer programmer as woman is to homemaker? Debiasing word embeddings](https://arxiv.org/abs/1607.06520). NIPS 2016.

How does gender bias in embeddings affect downstream applications?

![](https://github.com/ResponsiblyAI/word-embedding/blob/master/images/examples-gender-bias-nlp.png?raw=1)

<small>source: Sun, T., Gaut, A., Tang, S., Huang, Y., ElSherief, M., Zhao, J., ... & Wang, W. Y. (2019). [Mitigating Gender Bias in Natural Language Processing: Literature Review](https://arxiv.org/pdf/1906.08976.pdf). arXiv preprint arXiv:1906.08976.</small>


## Let's test some properties with expressions that we know are strongly marked for gender



In [None]:
# she:sister :: he:?
# sister - she + he = ?

w2v_small.most_similar(positive=['sister', 'he'],
                       negative=['she'])

```
queen-king
waitress-waiter
sister-brother
mother-father
ovarian_cancer-prostate_cancer
convent-monastery
```

In [None]:
w2v_small.most_similar(positive=['nurse', 'he'],
                       negative=['she'])

```
sewing-carpentry
nurse-doctor
blond-burly
giggle-chuckle
sassy-snappy
volleyball-football
registered_nurse-physician
interior_designer-architect
feminism-conservatism
vocalist-guitarist
diva-superstar
cupcakes-pizzas
housewife-shopkeeper
softball-baseball
cosmetics-pharmaceuticals
petite-lanky
charming-affable
hairdresser-barber
```

It seems that generating analogies is not the most suitable way to observe bias in embeddings, because of the observer's paradox: the method itself introduces bias and forces the production of gender stereotypes!

Nissim, M., van Noord, R., van der Goot, R. (2019). [Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor](https://arxiv.org/abs/1905.09866).
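
One concrete way the method forces an answer: gensim's `most_similar` never returns a word that was part of the query, so `doctor` can never be the answer to *he:doctor :: she:?*. The sketch below illustrates the effect with made-up 3-D vectors. The function `analogy`, its `exclude_query` parameter and the toy vectors are hypothetical and only mimic the vector-offset idea; they are not gensim's implementation.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def analogy(vectors, positive, negative, exclude_query=True):
    """Return the word closest (by cosine) to sum(positive) - sum(negative)."""
    unit = {word: normalize(vec) for word, vec in vectors.items()}
    dim = len(next(iter(unit.values())))
    target = normalize([sum(unit[w][i] for w in positive)
                        - sum(unit[w][i] for w in negative)
                        for i in range(dim)])
    query = set(positive) | set(negative)
    candidates = [w for w in unit if not (exclude_query and w in query)]
    return max(candidates, key=lambda w: dot(unit[w], target))


# Hypothetical vectors; the dimensions loosely stand for (gender, person, medical)
toy_vectors = {
    'he': [-0.1, 1.0, 0.0],
    'she': [0.1, 1.0, 0.0],
    'doctor': [-0.1, 0.3, 1.0],
    'nurse': [0.9, 0.3, 0.8],
}

# he:doctor :: she:?  (doctor - he + she)
restricted = analogy(toy_vectors, positive=['doctor', 'she'], negative=['he'])
unrestricted = analogy(toy_vectors, positive=['doctor', 'she'], negative=['he'],
                       exclude_query=False)
```

With these toy values the unrestricted search returns `'doctor'`, while excluding the query words leaves `'nurse'` as the answer, even though it is clearly further from the target vector.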


## What does the analogy give us? The gender direction!

# $\overrightarrow{she} - \overrightarrow{he}$

In [None]:
gender_direction = w2v_small['she'] - w2v_small['he']

gender_direction /= norm(gender_direction)

In [None]:
gender_direction @ w2v_small['architect']

In [None]:
gender_direction @ w2v_small['interior_designer']

With all the caveats about oversimplifying the phenomenon, we can see that the word *architect* appears in more contexts with *he* than with *she*, and vice versa for *interior designer*.

Based on this property, we can compute the gender direction using several pairs of words that we know are strongly marked for gender:

- woman - man
- girl - boy
- she - he
- mother - father
- daughter - son
- gal - guy
- female - male
- her - his
- herself - himself
- Mary - John
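
Bolukbasi et al. combine such pairs with PCA over the pair differences. As a simpler stand-in (an assumption of this sketch, not what the paper or `responsibly` does), the block below averages the normalized difference vectors of a few pairs. The vectors are made up and the function name `direction_from_pairs` is hypothetical.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def direction_from_pairs(vectors, pairs):
    """Average the unit difference vectors of (female, male) word pairs."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for female, male in pairs:
        diff = normalize([a - b for a, b in zip(vectors[female], vectors[male])])
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)


# Toy 3-D vectors, made up for illustration; the first axis loosely encodes gender
toy_vectors = {
    'she': [0.9, 0.1, 0.3], 'he': [-0.9, 0.1, 0.3],
    'woman': [0.8, 0.5, 0.0], 'man': [-0.8, 0.4, 0.1],
    'mother': [0.7, 0.2, 0.6], 'father': [-0.7, 0.3, 0.5],
}
pairs = [('she', 'he'), ('woman', 'man'), ('mother', 'father')]

toy_direction = direction_from_pairs(toy_vectors, pairs)

# Projections on the direction: positive for 'she', negative for 'he'
she_projection = dot(toy_direction, toy_vectors['she'])
he_projection = dot(toy_direction, toy_vectors['he'])
```

Using several pairs makes the direction less dependent on the quirks of any single pair, such as the other meanings and usages of *she* and *he*.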

## Try some words
Reflection: are you doing exploratory analysis or systematic evaluation?

In [None]:
gender_direction @ w2v_small['word']

Projections

In [None]:
from responsibly.we import GenderBiasWE

w2v_small_gender_bias = GenderBiasWE(w2v_small, only_lower=True)

In [None]:
w2v_small_gender_bias.positive_end, w2v_small_gender_bias.negative_end

In [None]:
# gender direction
w2v_small_gender_bias.direction[:10]

In [None]:
from responsibly.we.data import BOLUKBASI_DATA

neutral_profession_names = BOLUKBASI_DATA['gender']['neutral_profession_names']

In [None]:
neutral_profession_names[:8]

Note: `actor` is in the list of neutral profession names while `actress` is not, apparently because the use of the word has changed over time and it is now more neutral, compared for example with waiter-waitress (see [Wikipedia - The term Actress](https://en.wikipedia.org/wiki/Actor#The_term_actress)).

In [None]:
len(neutral_profession_names)

In [None]:
# the same as using the @ operator with the bias direction

w2v_small_gender_bias.project_on_direction(neutral_profession_names[0])

Let's visualize the projections of the professions (neutral and gender-specific) on the gender direction.

In [None]:
import matplotlib.pylab as plt

f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_bias.plot_projection_scores(n_extreme=20, ax=ax);

EXTRA: Demo - Visualizing gender bias with [Word Clouds](http://wordbias.umiacs.umd.edu/)

The projections of the profession words on the gender direction correspond to occupation data broken down by gender, as can be seen in the percentage of women in various professions according to the 2017 population survey of the Labor Force Statistics: https://arxiv.org/abs/1804.06876

In [None]:
from operator import itemgetter  # 🛠️ For idiomatic sorting in Python

from responsibly.we.data import OCCUPATION_FEMALE_PRECENTAGE

sorted(OCCUPATION_FEMALE_PRECENTAGE.items(), key=itemgetter(1))

In [None]:
f, ax = plt.subplots(1, figsize=(10, 8))

w2v_small_gender_bias.plot_factual_association(ax=ax);

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). [Word embeddings quantify 100 years of gender and ethnic stereotypes](https://www.pnas.org/content/pnas/115/16/E3635.full.pdf). Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.

![](https://github.com/ResponsiblyAI/word-embedding/blob/master/images/gender-bias-over-decades.png?raw=1)

<small>Data: Google Books/Corpus of Historical American English (COHA)</small>

## Direct bias measure

1. Project each of the neutral profession names on the gender direction
2. Compute the absolute value of each projection
3. Average them

In [None]:
# high-level function in responsibly

w2v_small_gender_bias.calc_direct_bias()

In [None]:
# what responsibly does internally:

neutral_profession_projections = [w2v_small[word] @ w2v_small_gender_bias.direction
                                  for word in neutral_profession_names]

abs_neutral_profession_projections = [abs(proj) for proj in neutral_profession_projections]

sum(abs_neutral_profession_projections) / len(abs_neutral_profession_projections)

**Attention**: the direct bias measure makes strong assumptions about the neutral words.

## [EXTRA] Indirect bias measure
Similarity due to projection on the same "gender direction".

In [None]:
w2v_small_gender_bias.generate_closest_words_indirect_bias('softball',
                                                           'football')
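
One way to read "similarity through the gender direction" is the indirect bias measure defined by Bolukbasi et al.: the share of the similarity between two unit vectors $w$ and $v$ that disappears once the gender component is removed from both, $\beta(w, v) = \left(w \cdot v - \frac{w_\perp \cdot v_\perp}{\|w_\perp\| \|v_\perp\|}\right) / (w \cdot v)$. The sketch below evaluates that formula on made-up 3-D vectors. The function names are hypothetical, and it is an assumption here that `responsibly` follows the same definition.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def remove_component(v, direction):
    """Remove from v its projection on a unit-length direction."""
    projection = dot(v, direction)
    return [a - projection * d for a, d in zip(v, direction)]


def indirect_bias(w, v, direction):
    """Share of the similarity between w and v that is due to the direction."""
    w, v, direction = normalize(w), normalize(v), normalize(direction)
    similarity = dot(w, v)
    w_perp = normalize(remove_component(w, direction))
    v_perp = normalize(remove_component(v, direction))
    return (similarity - dot(w_perp, v_perp)) / similarity


# Toy gender direction, made up for illustration
toy_direction = [1.0, 0.0, 0.0]

# These two vectors only have the gender component in common
bias_shared_gender = indirect_bias([0.6, 0.8, 0.0], [0.6, 0.0, 0.8], toy_direction)

# These two vectors have no gender component at all
bias_no_gender = indirect_bias([0.0, 1.0, 0.0], [0.0, 0.6, 0.8], toy_direction)
```

In the first toy case all of the similarity comes from the shared gender component, so the measure is close to 1. In the second case removing the gender component changes nothing, so it is close to 0.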

# Mitigating bias

> We intentionally do not reference the resulting embeddings as "debiased" or free from all gender bias, and
prefer the term "mitigating bias" rather than "debiasing," to guard against the misconception that the resulting
embeddings are entirely "safe" and need not be critically evaluated for bias in downstream tasks. <small>James-Sorenson, H., & Alvarez-Melis, D. (2019). [Probabilistic Bias Mitigation in Word Embeddings](https://arxiv.org/pdf/1910.14497.pdf). arXiv preprint arXiv:1910.14497.</small>


## Neutralize

When we neutralize, we remove the gender projection from all the words except the gender-specific ones, and then we normalize.

**Attention**: a strong prerequisite is having the list of gender-specific words.
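
As a minimal sketch of the neutralize step for a single gender direction, assuming unit-length vectors: subtract from the vector its projection on the direction, then rescale to norm one. The vector values and the function name `neutralize` are made up for illustration; the `debias` call below is the library's own implementation and may differ in its details.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def neutralize(vector, direction):
    """Remove the component of vector along direction, then renormalize."""
    direction = normalize(direction)
    projection = dot(vector, direction)
    return normalize([a - projection * d for a, d in zip(vector, direction)])


# Toy gender direction and toy word vector, made up for illustration
toy_direction = [1.0, 0.0, 0.0]
toy_home = normalize([0.3, 0.8, 0.5])

toy_home_neutral = neutralize(toy_home, toy_direction)

projection_before = dot(toy_home, toy_direction)
projection_after = dot(toy_home_neutral, toy_direction)
```

After the step, `projection_after` is zero (up to floating-point error) while the vector still has norm one.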

In [None]:
w2v_small_gender_debias = w2v_small_gender_bias.debias(method='neutralize', inplace=False)

In [None]:
print('home:',
      'before =', w2v_small_gender_bias.model['home'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['home'] @ w2v_small_gender_debias.direction)

In [None]:
print('man:',
      'before =', w2v_small_gender_bias.model['man'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.direction)

In [None]:
print('woman:',
      'before =', w2v_small_gender_bias.model['woman'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.direction)

In [None]:
w2v_small_gender_debias.calc_direct_bias()

In [None]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_debias.plot_projection_scores(n_extreme=20, ax=ax);

In [None]:
f, ax = plt.subplots(1, figsize=(10, 8))

w2v_small_gender_debias.plot_factual_association(ax=ax);

## [EXTRA] Equalize

The words in the list of gender-specific words (for example `man` and `woman`) may have different projections on the gender direction. This can result in different similarities to neutral words such as `kitchen`.

In [None]:
w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
BOLUKBASI_DATA['gender']['equalize_pairs'][:10]
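
As a sketch of what equalizing does, following the equalize step described by Bolukbasi et al. for a single gender direction: both words of a pair receive the same component orthogonal to the direction (the neutralized mean of the pair) and mirrored components along it, so that they end up equally similar to any neutralized word. The vectors and the function name `equalize_pair` are made up for illustration, and the sketch is not claimed to match the `responsibly` implementation line by line.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]


def project(v, direction):
    """Projection of v on a unit-length direction."""
    scale = dot(v, direction)
    return [scale * d for d in direction]


def equalize_pair(first, second, direction):
    """Equalize two unit vectors with respect to a single bias direction."""
    direction = normalize(direction)
    mean = [(a + b) / 2 for a, b in zip(first, second)]
    mean_bias = project(mean, direction)
    # Shared part: the mean of the pair with its bias component removed
    shared = [m - mb for m, mb in zip(mean, mean_bias)]
    # Length left for the bias component so that the result has norm one
    bias_length = sqrt(1 - dot(shared, shared))
    equalized = []
    for word in (first, second):
        offset = [wb - mb for wb, mb in zip(project(word, direction), mean_bias)]
        offset = normalize(offset)
        equalized.append([s + bias_length * o for s, o in zip(shared, offset)])
    return equalized


# Toy vectors, made up for illustration; toy_kitchen has no gender component
toy_direction = [1.0, 0.0, 0.0]
toy_man = normalize([-0.8, 0.5, 0.2])
toy_woman = normalize([0.6, 0.3, 0.6])
toy_kitchen = normalize([0.0, 0.4, 0.9])

man_eq, woman_eq = equalize_pair(toy_man, toy_woman, toy_direction)

gap_before = abs(dot(toy_man, toy_kitchen) - dot(toy_woman, toy_kitchen))
gap_after = abs(dot(man_eq, toy_kitchen) - dot(woman_eq, toy_kitchen))
```

In this toy setting `gap_before` is clearly non-zero while `gap_after` vanishes: after equalizing, the two words differ only along the gender direction.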

## Hard debiasing: Neutralize and Equalize

In [None]:
w2v_small_gender_debias = w2v_small_gender_bias.debias(method='hard', inplace=False)

In [None]:
print('home:',
      'before =', w2v_small_gender_bias.model['home'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['home'] @ w2v_small_gender_debias.direction)

In [None]:
print('man:',
      'before =', w2v_small_gender_bias.model['man'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.direction)

In [None]:
print('woman:',
      'before =', w2v_small_gender_bias.model['woman'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.direction)

In [None]:
w2v_small_gender_debias.calc_direct_bias()

In [None]:
w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.model['kitchen']

In [None]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_debias.plot_projection_scores(n_extreme=20, ax=ax);

After mitigating the bias, the performance of the resulting embedding on standard benchmarks is not strongly affected.

In [None]:
w2v_small_gender_bias.evaluate_word_embedding()

In [None]:
w2v_small_gender_debias.evaluate_word_embedding()

# Exploring other types of bias in word embeddings

### Racial bias

We will use the class [`responsibly.we.BiasWordEmbedding`](http://docs.responsibly.ai/word-embedding-bias.html#ethically.we.bias.BiasWordEmbedding). `GenderBiasWE` is a subclass of `BiasWordEmbedding`.

In [None]:
from responsibly.we import BiasWordEmbedding

w2v_small_racial_bias = BiasWordEmbedding(w2v_small, only_lower=True)

Identify the racial direction using the `sum` method

In [None]:
white_common_names = ['Emily', 'Anne', 'Jill', 'Allison', 'Laurie', 'Sarah', 'Meredith', 'Carrie',
                      'Kristen', 'Todd', 'Neil', 'Geoffrey', 'Brett', 'Brendan', 'Greg', 'Matthew',
                      'Jay', 'Brad']

black_common_names = ['Aisha', 'Keisha', 'Tamika', 'Lakisha', 'Tanisha', 'Latoya', 'Kenya', 'Latonya',
                      'Ebony', 'Rasheed', 'Tremayne', 'Kareem', 'Darnell', 'Tyrone', 'Hakim', 'Jamal',
                      'Leroy', 'Jermaine']

w2v_small_racial_bias._identify_direction('Whites', 'Blacks',
                                          definitional=(white_common_names, black_common_names),
                                          method='sum')

Use the neutral profession names to measure the racial bias.

In [None]:
neutral_profession_names = BOLUKBASI_DATA['gender']['neutral_profession_names']

In [None]:
neutral_profession_names[:10]

In [None]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_racial_bias.plot_projection_scores(neutral_profession_names, n_extreme=20, ax=ax);

Compute the direct bias measure

In [None]:
# Your Code Here...

Keep exploring the racial bias

In [None]:
# Your Code Here...

# Resources

## [Doing Data Science Responsibly - Resources](https://handbook.responsibly.ai/appendices/resources.html)

In particular:

- CVPR 2020 - [FATE Tutorial](https://youtu.be/-xGvcDzvi7Q) [Video]

- fast.ai - [Algorithmic Bias (NLP video 16)](https://youtu.be/pThqge9QDn8) [Video]

-  Solon Barocas, Moritz Hardt, Arvind Narayanan - [Fairness and machine learning - Limitations and Opportunities](https://fairmlbook.org/) [Textbook]



## Non-Technical Overview with More Downstream Application Examples
- [Google - Text Embedding Models Contain Bias. Here's Why That Matters.](https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html)
- [Kai-Wei Chang (UCLA) - What It Takes to Control Societal Bias in Natural Language Processing](https://www.youtube.com/watch?v=RgcXD_1Cu18)
- Sun, T., Gaut, A., Tang, S., Huang, Y., ElSherief, M., Zhao, J., ... & Wang, W. Y. (2019). [Mitigating Gender Bias in Natural Language Processing: Literature Review](https://arxiv.org/pdf/1906.08976.pdf). arXiv preprint arXiv:1906.08976.

## Additional Related Work

- **Understanding Bias**
  - Ethayarajh, K., Duvenaud, D., & Hirst, G. (2019, July). [Understanding Undesirable Word Embedding Associations](https://arxiv.org/pdf/1908.06361.pdf). In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1696-1705). - **Including critical analysis of the current metrics and debiasing methods (quite technical)**

  - Brunet, M. E., Alkalay-Houlihan, C., Anderson, A., & Zemel, R. (2019, May). [Understanding the Origins of Bias in Word Embeddings](https://arxiv.org/pdf/1810.03611.pdf). In International Conference on Machine Learning (pp. 803-811).


- **Discovering Biases**
  - Swinger, N., De-Arteaga, M., Heffernan IV, N. T., Leiserson, M. D., & Kalai, A. T. (2019, January). [What are the biases in my word embedding?](https://arxiv.org/pdf/1812.08769.pdf). In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 305-311). ACM.
  
  - Chaloner, K., & Maldonado, A. (2019, August). [Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories](https://www.aclweb.org/anthology/W19-3804). In Proceedings of the First Workshop on Gender Bias in Natural Language Processing (pp. 25-32).


- **Fairness in Classification**
  - Prost, F., Thain, N., & Bolukbasi, T. (2019, August). [Debiasing Embeddings for Reduced Gender Bias in Text Classification](https://arxiv.org/pdf/1908.02810.pdf). In Proceedings of the First Workshop on Gender Bias in Natural Language Processing (pp. 69-75).
  
  - Romanov, A., De-Arteaga, M., Wallach, H., Chayes, J., Borgs, C., Chouldechova, A., ... & Kalai, A. (2019, June). [What's in a Name? Reducing Bias in Bios without Access to Protected Attributes](https://arxiv.org/pdf/1904.05233.pdf). In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4187-4195).


- **Other**
  
  - Zhao, J., Wang, T., Yatskar, M., Cotterell, R., Ordonez, V., & Chang, K. W. (2019, June). [Gender Bias in Contextualized Word Embeddings](https://arxiv.org/pdf/1904.03310.pdf). In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 629-634). [slides](https://jyzhao.net/files/naacl19.pdf)

  - Zhou, P., Shi, W., Zhao, J., Huang, K. H., Chen, M., & Chang, K. W. [Analyzing and Mitigating Gender Bias in Languages with Grammatical Gender and Bilingual Word Embeddings](https://aiforsocialgood.github.io/icml2019/accepted/track1/pdfs/47_aisg_icml2019.pdf). ICML 2019 - AI for Social Good. [Poster](https://aiforsocialgood.github.io/icml2019/accepted/track1/posters/47_aisg_icml2019.pdf)

  - Zhao, J., Mukherjee, S., Hosseini, S., Chang, K. W., & Awadallah, A. [Gender Bias in Multilingual Embeddings](https://www.researchgate.net/profile/Subhabrata_Mukherjee/publication/340660062_Gender_Bias_in_Multilingual_Embeddings/links/5e97428692851c2f52a6200a/Gender-Bias-in-Multilingual-Embeddings.pdf).


##### Complete example of using `responsibly` with Word2Vec, GloVe and fastText: http://docs.responsibly.ai/notebooks/demo-gender-bias-words-embedding.html


## Bias in NLP

Only around a dozen papers had been published in this field until 2019, but nowadays plenty of work is being done. Two venues from back then:
- [1st ACL Workshop on Gender Bias for Natural Language Processing](https://genderbiasnlp.talp.cat/)
- [NAACL 2019](https://naacl2019.org/)
