# Bias in Word Embeddings

- 📺 **Video:** [https://youtu.be/J_227g77Jqg](https://youtu.be/J_227g77Jqg)

## Overview
This video discusses how word embeddings can capture biases present in their training corpora. Because embeddings are learned from real-world text, they can reflect the societal stereotypes and biases found in that text.

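```python
import os
import random

random.seed(0)  # fix the RNG seed so any random sampling is reproducible
CI = os.environ.get('CI') == 'true'  # True when a CI system sets CI=true
```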

## Key ideas
- A well-known example is the analogy “man : computer programmer :: woman : homemaker”, found in embeddings by Bolukbasi et al. (2016). It illustrates gender bias: the model associates men with technical occupations and women with domestic roles.
- The lecture likely quantifies bias with metrics such as the Word Embedding Association Test (WEAT) and gives examples of observed gender, ethnic, and racial biases (e.g., “doctor” lying closer to “he” and “nurse” closer to “she”).
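- The included references underscore this. Bolukbasi et al. (2016) explicitly addressed gender bias in embeddings and proposed a method to “debias” them by removing each gender-neutral word's component along a learned gender direction. However, later work went further: Manzini et al. (2019) extended bias detection and removal beyond binary gender to multiclass settings such as race and religion, and Gonen and Goldberg (2019) argued that such debiasing methods cover up systematic gender biases rather than remove them.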
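
Analogies like the one above are typically solved with vector arithmetic: the answer to a : b :: c : ? is the word whose vector is most similar to b − a + c. The sketch below uses the standard vector-offset rule (often called 3CosAdd) on hypothetical, hand-made 3-d vectors, not real embeddings; the exact analogy-generation procedure used by Bolukbasi et al. may differ, so treat this only as an illustration of how such a result can arise.

```python
import numpy as np

# Hypothetical toy vectors (NOT real embeddings): dimension 0 loosely plays
# the role of a "gender" axis, dimensions 1-2 encode other semantics.
toy = {
    "man":        np.array([ 1.0, 0.1, 0.0]),
    "woman":      np.array([-1.0, 0.1, 0.0]),
    "programmer": np.array([ 0.6, 0.9, 0.2]),
    "homemaker":  np.array([-0.6, 0.8, 0.3]),
    "engineer":   np.array([ 0.5, 0.7, 0.6]),
    "teacher":    np.array([-0.1, 0.5, 0.8]),
}

def unit(v):
    return v / np.linalg.norm(v)

def analogy(a, b, c, vocab):
    """Solve a : b :: c : ? with the vector-offset (3CosAdd) rule: return the
    word whose vector has the highest cosine with b - a + c, excluding the
    three query words themselves."""
    target = unit(unit(vocab[b]) - unit(vocab[a]) + unit(vocab[c]))
    scores = {w: float(unit(v) @ target)
              for w, v in vocab.items() if w not in (a, b, c)}
    return max(scores, key=scores.get)

print(analogy("man", "programmer", "woman", toy))  # → homemaker
```

Excluding the query words matters: without that filter, the nearest vector to b − a + c is often one of the inputs.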
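
WEAT, mentioned above, compares how strongly two sets of target words X and Y (e.g., career vs. family words) associate with two attribute sets A and B (e.g., male vs. female terms). A minimal sketch of its effect size, following the definition of Caliskan et al. (2017) as we understand it: each word w gets s(w, A, B) = mean cosine to A minus mean cosine to B, and the effect size is the difference of the mean s over X and over Y, divided by the standard deviation of s over X ∪ Y. The toy 2-d vectors are hypothetical, the sample standard deviation (`ddof=1`) is an assumption, and the permutation test that WEAT uses for significance is omitted.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    """s(w, A, B): mean cosine of w with attribute set A minus that with B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y with A vs. B,
    divided by the std of associations over X ∪ Y.
    Using the sample std (ddof=1) is an assumption; implementations differ."""
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Hypothetical 2-d toy vectors, not real embeddings.
X = [np.array([0.5, 1.0]), np.array([0.4, 1.0])]    # e.g. engineer, programmer
Y = [np.array([-0.5, 1.0]), np.array([-0.4, 1.0])]  # e.g. nurse, homemaker
A = [np.array([1.0, 0.2]), np.array([0.9, 0.3])]    # e.g. he, man
B = [np.array([-1.0, 0.2]), np.array([-0.9, 0.3])]  # e.g. she, woman

d = weat_effect_size(X, Y, A, B)
# Positive d: X is more associated with A (relative to B) than Y is.
print(f"effect size d = {d:.2f}")
```

Swapping X and Y flips the sign of d, since the denominator is computed over the same union of words.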
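
The “neutralize” step of Bolukbasi et al.'s (2016) hard debiasing can be sketched in a few lines: estimate a gender direction g from definitional pairs such as (he, she) and (man, woman), then subtract each gender-neutral word's projection onto g and re-normalize. As a simplification, this sketch takes g to be the top right singular vector of the stacked pair differences, which for pairs should coincide with the paper's PCA over pair-centered vectors; the paper's “equalize” step is not shown, and all vectors are hypothetical.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def bias_direction(pairs):
    """Top right singular vector of stacked differences of unit-normalized pairs."""
    diffs = np.stack([unit(a) - unit(b) for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]  # unit length; its sign is arbitrary and does not matter below

def neutralize(w, g):
    """Remove w's component along unit direction g, then re-normalize."""
    w = unit(w)
    return unit(w - (w @ g) * g)

# Hypothetical toy vectors standing in for real embeddings.
he, she = np.array([1.0, 0.2, 0.1]), np.array([-1.0, 0.2, 0.1])
man, woman = np.array([0.9, 0.3, 0.0]), np.array([-0.9, 0.3, 0.1])
nurse = np.array([-0.5, 0.8, 0.3])

g = bias_direction([(he, she), (man, woman)])
print("projection before:", round(float(unit(nurse) @ g), 3))
print("projection after: ", round(float(neutralize(nurse, g) @ g), 3))
```

Only words judged gender-neutral (like “nurse”) are neutralized; definitional words such as “he” and “she” keep their gender component.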

## Demo

```python
print('Try the exercises below and follow the linked materials.')
```
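
As a starting point for the demo, the sketch below checks the “doctor closer to he, nurse closer to she” pattern by comparing cosine similarities. The 3-d vectors are hypothetical and hand-picked to mimic a biased embedding; a real demo would load pretrained vectors (e.g., word2vec or GloVe) instead.

```python
import numpy as np

# Hypothetical toy vectors, chosen by hand to mimic a biased embedding.
vecs = {
    "he":     np.array([ 1.0, 0.1, 0.0]),
    "she":    np.array([-1.0, 0.1, 0.0]),
    "doctor": np.array([ 0.4, 0.8, 0.3]),
    "nurse":  np.array([-0.5, 0.8, 0.3]),
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def he_she_gap(word):
    """cos(word, he) - cos(word, she); positive means closer to 'he'."""
    return cos(vecs[word], vecs["he"]) - cos(vecs[word], vecs["she"])

for word in ("doctor", "nurse"):
    gap = he_she_gap(word)
    leaning = "he" if gap > 0 else "she"
    print(f"{word:>6}: gap = {gap:+.3f} -> closer to '{leaning}'")
```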

## Try it
- Modify the demo
- Add a tiny dataset or counter-example
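
One possible counter-example, in the spirit of “Lipstick on a Pig” (Gonen and Goldberg, 2019): neutralizing words against the gender direction drives their projection to zero, yet stereotyped words can still cluster together through other correlated features. In the hypothetical toy dataset below, dimension 0 is the gender direction and dimension 3 is a second feature built, by construction, to correlate with the stereotype; because that correlation is planted by hand, this only illustrates the shape of the argument and is not evidence for it.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def neutralize(w, g):
    """Remove w's component along unit direction g, then re-normalize."""
    w = unit(w)
    return unit(w - (w @ g) * g)

# Hypothetical toy vectors: dim 0 = gender direction, dim 3 = a feature
# that, by construction, is correlated with the stereotype.
g = np.array([1.0, 0.0, 0.0, 0.0])
words = {
    "nurse":        np.array([-0.6, 0.6, 0.1,  0.5]),
    "receptionist": np.array([-0.5, 0.5, 0.2,  0.6]),
    "engineer":     np.array([ 0.6, 0.6, 0.2, -0.5]),
    "programmer":   np.array([ 0.5, 0.5, 0.3, -0.6]),
}
debiased = {w: neutralize(v, g) for w, v in words.items()}

def nearest(word, vocab):
    """Nearest other word by cosine similarity (vectors are unit length)."""
    return max((w for w in vocab if w != word),
               key=lambda w: float(vocab[word] @ vocab[w]))

for w in words:
    print(f"{w:>12}: projection on g = {float(debiased[w] @ g):+.2f}, "
          f"nearest = {nearest(w, debiased)}")
```

After neutralization every projection on g is zero, but each stereotyped word's nearest neighbour is still a word with the same stereotype.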


## References
- [Eisenstein 14.5](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf)
- [A Scalable Hierarchical Distributed Language Model](https://papers.nips.cc/paper/2008/hash/1e056d2b0ebd5c878c550da6ac5d3724-Abstract.html)
- [Neural Word Embedding as Implicit Matrix Factorization](https://papers.nips.cc/paper/2014/file/feab05aa91085b7a8012516bc3533958-Paper.pdf)
- [GloVe: Global Vectors for Word Representation](https://www.aclweb.org/anthology/D14-1162/)
- [Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606)
- [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf)
- [Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings](https://www.aclweb.org/anthology/N19-1062/)
- [Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them](https://www.aclweb.org/anthology/N19-1061/)
- [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](https://www.aclweb.org/anthology/P15-1162/)


*Links only; we do not redistribute slides or papers.*