# Bias in Word Embeddings

- 📺 **Video:** [https://youtu.be/J_227g77Jqg](https://youtu.be/J_227g77Jqg)

## Overview
- Examine how word embeddings capture societal biases present in training corpora.
- Measure bias directions and discuss mitigation strategies.

## Key ideas
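- **Bias direction:** the difference between a gendered word pair (e.g., he - she) defines a direction; a word's projection onto it indicates which side of the pair the word leans toward.
- **WEAT:** the Word Embedding Association Test measures whether two sets of target words (e.g., male vs. female names) differ in how much more similar, by cosine, they are to one set of attribute words (e.g., career terms) than to another (e.g., family terms). It reports an effect size and a permutation-test p-value.
- **Debiasing:** neutralization removes a word's component along the bias direction, and equalization makes gendered pairs symmetric about it. This drives projection-based scores to zero but can hide the bias rather than remove it, because the bias often remains recoverable from word neighborhoods and clusters.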
- **Data reflection:** embedding bias mirrors imbalances in the underlying corpus.
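The WEAT statistic fits in a few lines of NumPy. The following is a minimal sketch, assuming embeddings stored in a plain `dict` from word to vector. The helper names (`association`, `weat`) are ours, the p-value is estimated from random permutations rather than from all equal-size splits, and the standard deviation uses `ddof=1` (implementations differ on this). The 2-D vectors are made up purely to exercise the code.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B, emb):
    """s(w, A, B): mean cosine of w to attribute set A minus mean cosine to set B."""
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))


def weat(X, Y, A, B, emb, n_perm=10_000, seed=0):
    """Return (effect_size, p_value) for target sets X, Y and attribute sets A, B."""
    sx = np.array([association(x, A, B, emb) for x in X])
    sy = np.array([association(y, A, B, emb) for y in Y])
    pooled = np.concatenate([sx, sy])
    # Effect size: gap between mean associations, in units of their spread over X and Y.
    effect = (sx.mean() - sy.mean()) / pooled.std(ddof=1)
    # Test statistic: summed association of X minus summed association of Y.
    observed = sx.sum() - sy.sum()
    # One-sided permutation test: how often a random re-split of X and Y beats the observed value.
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        stat = shuffled[:len(X)].sum() - shuffled[len(X):].sum()
        hits += int(stat > observed)
    return float(effect), hits / n_perm


# Made-up 2-D vectors: "names" placed near one of two attribute words.
emb = {
    'career': np.array([1.0, 0.0]), 'family': np.array([0.0, 1.0]),
    'john': np.array([1.0, 0.1]), 'paul': np.array([1.0, 0.2]),
    'amy': np.array([0.1, 1.0]), 'joan': np.array([0.2, 1.0]),
}
effect, p = weat(['john', 'paul'], ['amy', 'joan'], ['career'], ['family'], emb)
print(f"effect size = {effect:.2f}, p = {p:.3f}")
```

A positive effect size means the X targets lean toward A more than the Y targets do; swapping X and Y flips its sign.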

## Demo
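Build a small embedding space from windowed co-occurrence counts and a truncated SVD, then project profession words onto the he - she direction before and after neutralization, similar to the lecture ([https://youtu.be/mYASY9b9Ec0](https://youtu.be/mYASY9b9Ec0)).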

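```python
import numpy as np

# Toy corpus of sentences mentioning professions alongside "he" and "she".
corpus = [
    'she is a skilled doctor and compassionate leader',
    'he is a brilliant engineer and creative designer',
    'the nurse offered patient support and kindness',
    'the manager coordinated the project with precision',
    'artists create inspiring work with emotion and style',
    'scientists test hypotheses with rigorous experiments',
    'teachers guide students with patience and care',
    'the programmer solved complex problems quickly',
    'she leads the team with empathy',
    'he directs the team with authority'
]

# Count co-occurrences within a window of 2 words on each side.
vocab = sorted(set(' '.join(corpus).split()))
word_to_id = {word: idx for idx, word in enumerate(vocab)}
window = 2
cooc = np.zeros((len(vocab), len(vocab)))
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i == j:
                continue
            cooc[word_to_id[word], word_to_id[words[j]]] += 1

# Truncated SVD: keep the top 10 dimensions, scaled by the square root of the singular values.
u, s, vt = np.linalg.svd(cooc)
embeddings = u[:, :10] * np.sqrt(s[:10])

# Gender direction: difference between the "he" and "she" vectors.
he_vec = embeddings[word_to_id['he']]
she_vec = embeddings[word_to_id['she']]
gender_dir = he_vec - she_vec

def projection(vec):
    """Scalar projection onto gender_dir (positive leans toward 'he', negative toward 'she')."""
    return (vec @ gender_dir) / (np.linalg.norm(gender_dir) + 1e-8)

def neutralize(vec):
    """Subtract the component of vec that lies along gender_dir."""
    bias_component = (vec @ gender_dir) / (np.linalg.norm(gender_dir) ** 2 + 1e-8) * gender_dir
    return vec - bias_component

# Note: the corpus contains 'teachers', not 'teacher', so 'teacher' is out of vocabulary.
profession_words = ['doctor', 'nurse', 'engineer', 'artists', 'manager', 'teacher']
for word in profession_words:
    idx = word_to_id.get(word)
    if idx is None:
        # Report missing words instead of silently scoring them as 0.0.
        print(f"Gender projection of {word:8s}: not in vocabulary")
        continue
    print(f"Gender projection of {word:8s}: {projection(embeddings[idx]):.3f}")

print()
print('After neutralizing along the gender direction:')
for word in profession_words:
    idx = word_to_id.get(word)
    if idx is None:
        print(f"Gender projection of {word:8s}: not in vocabulary")
        continue
    print(f"Gender projection of {word:8s}: {projection(neutralize(embeddings[idx])):.3f}")
```

```
Gender projection of doctor  : -0.000
Gender projection of nurse   : -0.000
Gender projection of engineer: -0.000
Gender projection of artists : -0.000
Gender projection of manager : 0.000
Gender projection of teacher : not in vocabulary

After neutralizing along the gender direction:
Gender projection of doctor  : -0.000
Gender projection of nurse   : -0.000
Gender projection of engineer: -0.000
Gender projection of artists : -0.000
Gender projection of manager : 0.000
Gender projection of teacher : not in vocabulary
```

Every in-vocabulary projection rounds to zero, before and after neutralization, so this toy corpus yields no usable gender direction. The corpus is mirror-symmetric: swapping `she`/`he` together with `skilled`/`brilliant`, `doctor`/`engineer`, `compassionate`/`creative`, `leader`/`designer`, `leads`/`directs` and `empathy`/`authority` maps it onto itself, so the co-occurrence matrix is unchanged by the swap. `he` and `she` can then differ only along singular vectors that the swap flips in sign (up to ties between singular values), and, judging by the output, none of those fall within the 10 retained dimensions; `he_vec` and `she_vec` coincide and `gender_dir` is numerically zero.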

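The demo neutralizes along a single he - she difference. The "Man is to Computer Programmer" paper in the references describes a fuller "hard debiasing" procedure: define the bias subspace as the top principal component(s) of several definitional pairs, each centered on its own mean; neutralize gender-neutral words and rescale them to unit length; and equalize each pair so both words end up equidistant from every neutralized word. Below is a minimal NumPy sketch of those three steps, assuming unit-normalized vectors in a `dict`. The function names and the random stand-in vectors are illustrative, not taken from the paper's code.

```python
import numpy as np


def bias_subspace(pairs, emb, k=1):
    """Top-k principal directions of the pair vectors, each pair centered on its mean."""
    rows = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2
        rows.extend([emb[a] - center, emb[b] - center])
    _, _, vt = np.linalg.svd(np.array(rows), full_matrices=False)
    return vt[:k]  # (k, d) matrix with orthonormal rows


def project(v, B):
    """Component of v that lies in the span of B's orthonormal rows."""
    return B.T @ (B @ v)


def neutralize(v, B):
    """Remove the bias-subspace component of v, then rescale to unit length."""
    v_perp = v - project(v, B)
    return v_perp / np.linalg.norm(v_perp)


def equalize(a, b, emb, B):
    """Make a and b differ only inside the bias subspace, keeping both at unit length."""
    mu = (emb[a] + emb[b]) / 2
    mu_B = project(mu, B)
    nu = mu - mu_B  # shared component outside the bias subspace
    scale = np.sqrt(max(1.0 - nu @ nu, 0.0))
    out = {}
    for w in (a, b):
        w_B = project(emb[w], B) - mu_B
        out[w] = nu + scale * w_B / np.linalg.norm(w_B)
    return out


# Random unit vectors stand in for real embeddings here.
rng = np.random.default_rng(0)
emb = {}
for w in ['he', 'she', 'man', 'woman', 'doctor']:
    v = rng.normal(size=5)
    emb[w] = v / np.linalg.norm(v)

B = bias_subspace([('he', 'she'), ('man', 'woman')], emb)
doctor = neutralize(emb['doctor'], B)
pair = equalize('he', 'she', emb, B)
print('doctor along bias direction:', B @ doctor)
print('he/she along bias direction:', B @ pair['he'], B @ pair['she'])
```

Because the vectors are random, the printed numbers only confirm the geometry: the neutralized word has (numerically) no bias component, and the equalized pair has equal and opposite ones.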

## Try it
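- Modify the demo: change `window`, the number of retained SVD dimensions, or the word list, and watch how the projections move.
- Add a tiny dataset or counter-example: for instance, add sentences in which only `she` appears near `nurse` or only `he` near `engineer`, breaking the corpus symmetry, and check whether a nonzero gender direction emerges.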
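These exercises are easier with the demo pipeline wrapped in a function of the corpus, window size and number of retained dimensions. The following is a sketch under that assumption, not the lecture's code: `build_embeddings` and `gender_report` are hypothetical helper names, the two-sentence corpus is made up so that `she` and `he` get different neighbors, and both pronouns must occur in whatever corpus is passed in.

```python
import numpy as np


def build_embeddings(corpus, window=2, dim=10):
    """Demo pipeline: windowed co-occurrence counts, then a rank-`dim` SVD embedding."""
    vocab = sorted(set(' '.join(corpus).split()))
    word_to_id = {w: i for i, w in enumerate(vocab)}
    cooc = np.zeros((len(vocab), len(vocab)))
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if i != j:
                    cooc[word_to_id[word], word_to_id[words[j]]] += 1
    u, s, _ = np.linalg.svd(cooc)
    dim = min(dim, len(vocab))  # cannot keep more dimensions than words
    return word_to_id, u[:, :dim] * np.sqrt(s[:dim])


def gender_report(corpus, words, window=2, dim=10):
    """Return (|he - she|, {word: projection, or None if out of vocabulary})."""
    word_to_id, emb = build_embeddings(corpus, window, dim)
    gender_dir = emb[word_to_id['he']] - emb[word_to_id['she']]
    norm = float(np.linalg.norm(gender_dir))
    scores = {}
    for w in words:
        idx = word_to_id.get(w)
        scores[w] = None if idx is None else float(emb[idx] @ gender_dir / (norm + 1e-8))
    return norm, scores


# Hypothetical counter-example: 'she' and 'he' now have different neighbors.
skewed = ['she nurses patients', 'he builds bridges']
norm, scores = gender_report(skewed, ['nurses', 'builds', 'doctor'], dim=100)
print(f"|he - she| = {norm:.3f}")
for w, score in scores.items():
    label = 'not in vocabulary' if score is None else f'{score:.3f}'
    print(f"{w:8s}: {label}")
```

Keeping every dimension (`dim` at least the vocabulary size) guarantees a nonzero `he - she` direction whenever the two pronouns have different co-occurrence rows; truncating to 10 dimensions, as the demo does, can discard exactly the directions that separate them. To revisit the demo, pass its `corpus` list extended with a few one-sided sentences.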


## References
- [Eisenstein 14.5](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf)
- [A Scalable Hierarchical Distributed Language Model](https://papers.nips.cc/paper/2008/hash/1e056d2b0ebd5c878c550da6ac5d3724-Abstract.html)
- [Neural Word Embedding as Implicit Matrix Factorization](https://papers.nips.cc/paper/2014/file/feab05aa91085b7a8012516bc3533958-Paper.pdf)
- [GloVe: Global Vectors for Word Representation](https://www.aclweb.org/anthology/D14-1162/)
- [Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606)
- [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf)
- [Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings](https://www.aclweb.org/anthology/N19-1062/)
- [Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them](https://www.aclweb.org/anthology/N19-1061/)
- [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](https://www.aclweb.org/anthology/P15-1162/)


*Links only; we do not redistribute slides or papers.*