# Applying Embeddings, Deep Averaging Networks

- 📺 **Video:** [https://youtu.be/3pwwdHuH0I4](https://youtu.be/3pwwdHuH0I4)

## Overview
- Explore deep averaging networks (DANs), which compose word embeddings by averaging them and feeding the average through feed-forward layers.
- Appreciate how pre-trained embeddings transfer to downstream classification tasks.

## Key ideas
- **Averaging:** simple pooling of embeddings provides surprisingly strong baselines.
- **Nonlinear layers:** additional dense layers refine averaged vectors.
- **Transfer learning:** initialize with pre-trained embeddings to reduce data requirements.
- **Regularization:** dropout applied to embeddings combats co-adaptation.
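
The four ideas above can be sketched as one forward pass in NumPy. This is a minimal illustration, not the lecture's implementation: the sizes, the random weights `W1`/`W2`, and the helper names `word_dropout` and `dan_forward` are all assumptions, the matrix `E` is a random stand-in for pre-trained vectors, and dropout is applied to whole tokens before averaging, which is one reading of the regularization bullet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
vocab_size, emb_dim, hidden_dim, n_classes = 50, 10, 16, 2

# Stand-in for pre-trained embeddings (random here; in practice loaded from disk).
E = rng.normal(scale=0.3, size=(vocab_size, emb_dim))
# Untrained weights of one hidden layer and one output layer.
W1 = rng.normal(scale=0.1, size=(emb_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.1, size=(hidden_dim, n_classes))
b2 = np.zeros(n_classes)


def word_dropout(token_ids, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    token_ids = np.asarray(token_ids)
    keep = rng.random(len(token_ids)) >= p
    if not keep.any():
        keep[rng.integers(len(token_ids))] = True
    return token_ids[keep]


def dan_forward(token_ids, train=False, p_drop=0.3):
    """Average embeddings, apply a tanh hidden layer, return class probabilities.

    Assumes token_ids is non-empty.
    """
    if train:
        token_ids = word_dropout(token_ids, p_drop, rng)
    avg = E[token_ids].mean(axis=0)          # averaging: order-independent pooling
    h = np.tanh(avg @ W1 + b1)               # nonlinear layer refines the average
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()


probs = dan_forward([3, 17, 42, 8])
print(probs.shape, probs.sum())
```

Because the weights are untrained, the probabilities themselves are meaningless; the point is the data flow from token ids to an averaged vector to a class distribution.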

## Demo
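Average randomly initialized embeddings (stand-ins for pre-trained vectors) over each sentence, apply a fixed `tanh` nonlinearity, and fit a logistic-regression classifier on the result to classify sentiment, mirroring the lecture (https://youtu.be/USkdJfi4V64). The vocabulary comes from the toy corpus, so training words outside it are skipped.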

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Toy corpus used only to define the vocabulary.
corpus = [
    'she is a skilled doctor and compassionate leader',
    'he is a brilliant engineer and creative designer',
    'the nurse offered patient support and kindness',
    'the manager coordinated the project with precision',
    'artists create inspiring work with emotion and style',
    'scientists test hypotheses with rigorous experiments',
    'teachers guide students with patience and care',
    'the programmer solved complex problems quickly'
]

vocab = sorted(set(' '.join(corpus).split()))
word_to_id = {word: idx for idx, word in enumerate(vocab)}
emb_dim = 10
rng = np.random.default_rng(42)
# Random vectors stand in for pre-trained embeddings.
embeddings = rng.normal(scale=0.3, size=(len(vocab), emb_dim))

train_texts = [
    'compassionate nurse offered support',
    'creative artist delivered stunning work',
    'patient teacher inspired students',
    'the experiment succeeded with rigor',
    'the engineer solved complex design',
    'kind doctor reassured everyone',
    'the manager mismanaged the schedule',
    'the designer delivered a clumsy layout'
]
labels = [1, 1, 1, 1, 1, 1, 0, 0]  # 1 = positive, 0 = negative

features = []
for sentence in train_texts:
    # Keep only in-vocabulary words; the rest have no embedding.
    ids = [word_to_id[word] for word in sentence.split() if word in word_to_id]
    if not ids:
        features.append(np.zeros(emb_dim))
        continue
    avg = embeddings[ids].mean(axis=0)  # average the word vectors
    hidden = np.tanh(avg)               # fixed nonlinearity, no trained weights
    features.append(hidden)

features = np.vstack(features)
clf = LogisticRegression(max_iter=2000, random_state=0)
clf.fit(features, labels)

# Evaluate on the training set itself; this toy demo has no held-out data.
pred = clf.predict(features)
# zero_division=0 reports 0 for a class that is never predicted, without a warning.
print(classification_report(labels, pred, digits=3, zero_division=0))
```


Output (class 0 is never predicted, so its precision and recall are 0):

```text
              precision    recall  f1-score   support

           0      0.000     0.000     0.000         2
           1      0.750     1.000     0.857         6

    accuracy                          0.750         8
   macro avg      0.375     0.500     0.429         8
weighted avg      0.562     0.750     0.643         8
```
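
One factor worth checking when reading this report is how many training tokens actually have an embedding. The sketch below repeats the demo's `corpus` and `train_texts` and counts in-vocabulary words per sentence; the counters `covered` and `total` are names introduced here for illustration.

```python
corpus = [
    'she is a skilled doctor and compassionate leader',
    'he is a brilliant engineer and creative designer',
    'the nurse offered patient support and kindness',
    'the manager coordinated the project with precision',
    'artists create inspiring work with emotion and style',
    'scientists test hypotheses with rigorous experiments',
    'teachers guide students with patience and care',
    'the programmer solved complex problems quickly'
]
train_texts = [
    'compassionate nurse offered support',
    'creative artist delivered stunning work',
    'patient teacher inspired students',
    'the experiment succeeded with rigor',
    'the engineer solved complex design',
    'kind doctor reassured everyone',
    'the manager mismanaged the schedule',
    'the designer delivered a clumsy layout'
]

vocab = set(' '.join(corpus).split())

total = covered = 0
for sentence in train_texts:
    words = sentence.split()
    missing = [w for w in words if w not in vocab]
    total += len(words)
    covered += len(words) - len(missing)
    print(f"{len(words) - len(missing)}/{len(words)} in vocab; missing: {missing}")

print(f"coverage: {covered}/{total}")
```

Only 21 of the 38 training tokens are in the vocabulary, and the words that carry the negative signal (`mismanaged`, `clumsy`) are among those skipped. Together with the random embeddings and the 6-to-2 label split, this leaves the classifier little to separate the two classes with.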





## Try it
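- Modify the demo: for example, build the vocabulary from `train_texts`, change `emb_dim`, or replace `np.tanh` with another nonlinearity.
- Add a tiny dataset or counter-example: for example, extra negative sentences to balance the 6-to-2 label split.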
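
As one possible starting point for the first exercise, the sketch below swaps the fixed `tanh` plus logistic regression for a small trained hidden layer using scikit-learn's `MLPClassifier`. It is an assumption-laden variant rather than the lecture's model: the vocabulary is built from the training sentences so no word is skipped, the embeddings stay random and frozen, and the hidden size of 16 and the `lbfgs` solver are arbitrary choices.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Same sentences and labels as the demo's training set.
train_texts = [
    'compassionate nurse offered support',
    'creative artist delivered stunning work',
    'patient teacher inspired students',
    'the experiment succeeded with rigor',
    'the engineer solved complex design',
    'kind doctor reassured everyone',
    'the manager mismanaged the schedule',
    'the designer delivered a clumsy layout'
]
labels = np.array([1, 1, 1, 1, 1, 1, 0, 0])

# Assumption: vocabulary comes from the training sentences themselves,
# so every token has a vector.
vocab = sorted({word for text in train_texts for word in text.split()})
word_to_id = {word: idx for idx, word in enumerate(vocab)}

emb_dim = 10
rng = np.random.default_rng(42)
# Random vectors stand in for pre-trained embeddings and are not updated.
embeddings = rng.normal(scale=0.3, size=(len(vocab), emb_dim))


def average_embedding(sentence):
    """Mean of the embeddings of the in-vocabulary words in a sentence."""
    ids = [word_to_id[w] for w in sentence.split() if w in word_to_id]
    if not ids:
        return np.zeros(emb_dim)
    return embeddings[ids].mean(axis=0)


X = np.vstack([average_embedding(t) for t in train_texts])

# One trained tanh hidden layer on top of the averaged vectors.
clf = MLPClassifier(hidden_layer_sizes=(16,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
clf.fit(X, labels)

pred = clf.predict(X)
print(pred)
```

With eight training sentences and no held-out data, any accuracy this prints reflects memorization rather than generalization; comparing it against the demo's report is still a quick way to see what the trained layer and the fuller vocabulary change.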


## References
- [Eisenstein 14.5](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf)
- [A Scalable Hierarchical Distributed Language Model](https://papers.nips.cc/paper/2008/hash/1e056d2b0ebd5c878c550da6ac5d3724-Abstract.html)
- [Neural Word Embedding as Implicit Matrix Factorization](https://papers.nips.cc/paper/2014/file/feab05aa91085b7a8012516bc3533958-Paper.pdf)
- [GloVe: Global Vectors for Word Representation](https://www.aclweb.org/anthology/D14-1162/)
- [Enriching Word Vectors with Subword Information](https://arxiv.org/abs/1607.04606)
- [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf)
- [Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings](https://www.aclweb.org/anthology/N19-1062/)
- [Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them](https://www.aclweb.org/anthology/N19-1061/)
- [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](https://www.aclweb.org/anthology/P15-1162/)


*Links only; we do not redistribute slides or papers.*