# Other Word Embedding Methods

- 📺 **Video:** [https://youtu.be/gpP-depOUwg](https://youtu.be/gpP-depOUwg)

## Overview
- Contrast count-based methods (PPMI, GloVe) with predictive models like skip-gram and CBOW.
- Recognize how each technique factorizes co-occurrence statistics differently.
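
The contrast above can be made concrete by writing the three objectives side by side. The notation is an assumption borrowed from the papers in the references rather than from the lecture: $X_{ij}$ is the co-occurrence count of word $i$ with context $j$, $f$ is GloVe's weighting function, $b_i$ and $\tilde b_j$ are bias terms, and $k$ is the number of negative samples in skip-gram.

$$
\mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}, \qquad \mathrm{PPMI}(w, c) = \max\bigl(0,\ \mathrm{PMI}(w, c)\bigr)
$$

$$
J_{\text{GloVe}} = \sum_{i,j:\,X_{ij} > 0} f(X_{ij}) \bigl(\mathbf{w}_i^\top \tilde{\mathbf{w}}_j + b_i + \tilde b_j - \log X_{ij}\bigr)^2
$$

$$
\mathbf{w}^\top \mathbf{c} = \mathrm{PMI}(w, c) - \log k
$$

The last line is the Levy and Goldberg result for skip-gram with negative sampling: it holds at the optimum when the embedding dimension is large enough, and only approximately otherwise. Read together, PPMI fixes the matrix and leaves the factorization to a separate step such as SVD, GloVe fits log counts directly with a weighted squared error, and skip-gram implicitly factorizes a shifted PMI matrix.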

## Key ideas
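- **PPMI:** positive pointwise mutual information clips negative PMI values to zero, keeping only the word-context pairs that co-occur more often than chance.
- **GloVe:** fits word and context vectors by weighted least squares so that their dot products approximate log co-occurrence counts.
- **Subword models:** fastText represents each word as the sum of its character n-gram vectors, which yields usable embeddings for rare and unseen words.
- **Hybrid strategies:** combine count-based initialization with predictive fine-tuning.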
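
To make the subword idea concrete, here is a minimal sketch of the character n-gram extraction described in the fastText paper. The function name `char_ngrams` and its parameters are hypothetical, chosen for illustration; the boundary markers `<` and `>` and the 3 to 6 n-gram range follow the paper's description, not the fastText library's code.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, wrapped in '<' and '>' boundary markers.

    Illustrative helper (not part of any library). For example,
    char_ngrams("where", 3, 3) gives ['<wh', 'whe', 'her', 'ere', 're>', '<where>'].
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    # The whole wrapped word is kept as one extra unit alongside its n-grams.
    grams.append(wrapped)
    # Drop duplicates while preserving order (short words can repeat the full form).
    return list(dict.fromkeys(grams))


# Two related words share many n-grams, so their vectors share many summands.
shared = set(char_ngrams("patience")) & set(char_ngrams("patient"))
print(sorted(shared))
```

Because a word's vector is built from these pieces, a word never seen in training still gets a representation as long as some of its n-grams were seen.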

## Demo
Compute a PPMI matrix from a toy corpus, reduce it to 5-dimensional embeddings with a truncated SVD, and list each query word's nearest neighbors by cosine similarity to illustrate the relationships outlined in the lecture (https://youtu.be/Pj5HY8zDuhY).

```python
import numpy as np

corpus = [
    'she is a skilled doctor and compassionate leader',
    'he is a brilliant engineer and creative designer',
    'the nurse offered patient support and kindness',
    'the manager coordinated the project with precision',
    'artists create inspiring work with emotion and style',
    'scientists test hypotheses with rigorous experiments',
    'teachers guide students with patience and care',
    'the programmer solved complex problems quickly'
]

# Build the vocabulary and a word -> row index lookup.
vocab = sorted(set(' '.join(corpus).split()))
word_to_id = {word: idx for idx, word in enumerate(vocab)}

# Count co-occurrences within a symmetric window of 2 words on each side.
window = 2
cooc = np.zeros((len(vocab), len(vocab)), dtype=float)
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        target = word_to_id[word]
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i == j:
                continue
            cooc[target, word_to_id[words[j]]] += 1

# PMI = log(P(w, c) / (P(w) P(c))) = log(count * total / (row_total * col_total)).
# The 1e-8 terms keep log() finite for zero counts; those cells come out
# negative and are clipped to 0 when PMI is turned into PPMI.
row_totals = cooc.sum(axis=1, keepdims=True)
col_totals = cooc.sum(axis=0, keepdims=True)
total = cooc.sum()
pmi = np.log((cooc * total + 1e-8) / (row_totals * col_totals + 1e-8))
ppmi = np.maximum(pmi, 0)

# Truncated SVD: keep the top 5 singular directions, scaled by sqrt(singular value).
u, s, vt = np.linalg.svd(ppmi)
embeddings = u[:, :5] * np.sqrt(s[:5])

# Rank the vocabulary by cosine similarity to each query word and print the top 3.
for word in ['doctor', 'nurse', 'engineer', 'artists']:
    idx = word_to_id[word]
    sims = embeddings @ embeddings[idx] / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(embeddings[idx]) + 1e-8)
    ranking = sims.argsort()[::-1]
    neighbors = [(vocab[i], sims[i]) for i in ranking if vocab[i] != word][:3]
    print(f"Neighbors of {word}: {neighbors}")
```


```text
Neighbors of doctor: [('engineer', 0.999999982255126), ('brilliant', 0.9883275031391461), ('skilled', 0.9883275031391461)]
Neighbors of nurse: [('offered', 0.9986717018144635), ('patient', 0.987963449140842), ('support', 0.9741913504659862)]
Neighbors of engineer: [('doctor', 0.999999982255126), ('skilled', 0.9883275031391461), ('brilliant', 0.9883275031391457)]
Neighbors of artists: [('create', 0.9996331580542107), ('inspiring', 0.994443644318663), ('work', 0.980440427739844)]
```
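
The neighbors printed above come from the SVD embeddings only. To compare them with neighbors computed directly on the sparse PPMI rows, a self-contained sketch along the following lines could be used. The helper names `ppmi_matrix` and `top_neighbors` are hypothetical, and zero-count cells are zeroed explicitly here instead of using the `1e-8` smoothing from the demo, so individual values may differ slightly.

```python
import numpy as np

corpus = [
    'she is a skilled doctor and compassionate leader',
    'he is a brilliant engineer and creative designer',
    'the nurse offered patient support and kindness',
    'the manager coordinated the project with precision',
    'artists create inspiring work with emotion and style',
    'scientists test hypotheses with rigorous experiments',
    'teachers guide students with patience and care',
    'the programmer solved complex problems quickly',
]


def ppmi_matrix(sentences, window=2):
    """Return (vocab, PPMI matrix) from symmetric-window co-occurrence counts."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[words[j]]] += 1
    total = counts.sum()
    # Expected count of each pair if words and contexts were independent.
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf (or nan); treat them as 0
    return vocab, np.maximum(pmi, 0.0)


def top_neighbors(matrix, vocab, word, k=3):
    """Return the k words whose rows have the highest cosine similarity to `word`."""
    idx = vocab.index(word)
    norms = np.linalg.norm(matrix, axis=1)
    sims = matrix @ matrix[idx] / (norms * norms[idx] + 1e-8)
    order = [i for i in np.argsort(-sims) if i != idx][:k]
    return [vocab[i] for i in order]


vocab, ppmi = ppmi_matrix(corpus)
u, s, _ = np.linalg.svd(ppmi)
dense = u[:, :5] * np.sqrt(s[:5])  # same 5-dimensional truncation as the demo

for word in ['doctor', 'nurse', 'engineer', 'artists']:
    print(f"{word}: PPMI rows -> {top_neighbors(ppmi, vocab, word)}, "
          f"SVD -> {top_neighbors(dense, vocab, word)}")
```

Cosine similarity on raw PPMI rows is nonzero only for words that share at least one context word, while the low-rank SVD vectors can place words close together even when their contexts do not overlap exactly.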


## Try it
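- Modify the demo: change `window` or the number of SVD dimensions kept and observe how the neighbor lists shift.
- Add a tiny dataset or counter-example, such as sentences that place `doctor` and `engineer` in different contexts, and check whether their near-identical similarity persists.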


## References
- [Eisenstein 14.5](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf)
- [A Scalable Hierarchical Distributed Language Model](https://papers.nips.cc/paper/2008/hash/1e056d2b0ebd5c878c550da6ac5d3724-Abstract.html)
- [Neural Word Embedding as Implicit Matrix Factorization](https://papers.nips.cc/paper/2014/file/feab05aa91085b7a8012516bc3533958-Paper.pdf)
- [GloVe: Global Vectors for Word Representation](https://www.aclweb.org/anthology/D14-1162/)
- [Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606)
- [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf)
- [Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings](https://www.aclweb.org/anthology/N19-1062/)
- [Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them](https://www.aclweb.org/anthology/N19-1061/)
- [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](https://www.aclweb.org/anthology/P15-1162/)


*Links only; we do not redistribute slides or papers.*