<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Embeddings" data-toc-modified-id="Embeddings-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Embeddings</a></span></li><li><span><a href="#Resources" data-toc-modified-id="Resources-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Resources</a></span></li><li><span><a href="#Word2Vec" data-toc-modified-id="Word2Vec-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Word2Vec</a></span><ul class="toc-item"><li><span><a href="#Skip-Gram-Model" data-toc-modified-id="Skip-Gram-Model-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Skip-Gram Model</a></span></li></ul></li><li><span><a href="#GloVe---Global-Vectors-for-Word-Representation" data-toc-modified-id="GloVe---Global-Vectors-for-Word-Representation-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>GloVe - Global Vectors for Word Representation</a></span></li><li><span><a href="#What-Can-We-Do-with-Embeddings?" data-toc-modified-id="What-Can-We-Do-with-Embeddings?-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>What Can We Do with Embeddings?</a></span><ul class="toc-item"><li><span><a href="#Genism-Library" data-toc-modified-id="Genism-Library-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Genism Library</a></span></li><li><span><a href="#Visualize-with-t-SNE-(t-Distributed-Stochastic-Neighbor-Embeddings)" data-toc-modified-id="Visualize-with-t-SNE-(t-Distributed-Stochastic-Neighbor-Embeddings)-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Visualize with t-SNE (t-Distributed Stochastic Neighbor Embeddings)</a></span></li><li><span><a href="#Transfer-Learning" data-toc-modified-id="Transfer-Learning-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>Transfer Learning</a></span></li></ul></li></ul></div>

# Embeddings

<img src='https://developers.google.com/machine-learning/crash-course/images/linear-relationships.svg' width=100%/>

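- Convert words into vectors in a vector space
    + A vector is a mathematical object, so we can measure distances and do arithmetic with words
- It's all about closeness: words with similar meanings should end up near each other
    + Distributional Hypothesis (words that occur in similar contexts tend to have similar meanings): https://en.wikipedia.org/wiki/Distributional_semantics#Distributional_hypothesis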
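
To make "closeness" concrete, here is a minimal sketch using cosine similarity, a common way to compare word vectors. The three-dimensional vectors are made up for illustration (real embeddings are learned and much larger), and `cosine_similarity` is a helper defined here, not a library function.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings, hand-picked so that "dog" and "cat" point
# in a similar direction and "car" points somewhere else.
vectors = {
    "dog": np.array([0.9, 0.8, 0.1]),
    "cat": np.array([0.8, 0.9, 0.1]),
    "car": np.array([0.1, 0.1, 0.9]),
}

sim_dog_cat = cosine_similarity(vectors["dog"], vectors["cat"])
sim_dog_car = cosine_similarity(vectors["dog"], vectors["car"])

print(f"dog vs cat: {sim_dog_cat:.3f}")
print(f"dog vs car: {sim_dog_car:.3f}")
```

With these toy numbers, "dog" is much closer to "cat" than to "car", which is the behavior we hope learned embeddings will show.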

# Resources

- Kaggle Tutorial on Embeddings (using movie ratings):
    * [Embeddings Intro](https://www.kaggle.com/colinmorris/embedding-layers)
    * [Matrix Factorization](https://www.kaggle.com/colinmorris/matrix-factorization)
    * [Using Gensim to Explore Embeddings](https://www.kaggle.com/colinmorris/exploring-embeddings-with-gensim)
    * [Visualizing with t-SNE](https://www.kaggle.com/colinmorris/visualizing-embeddings-with-t-sne)
- Google Embedding Crash Course: https://developers.google.com/machine-learning/crash-course/embeddings

<img src='https://developers.google.com/machine-learning/crash-course/images/EmbeddingExample3-1.svg' width=60%/>

# Word2Vec

## Skip-Gram Model

We essentially use a word to predict the words around it

<img src='https://adventuresinmachinelearning.com/wp-content/uploads/2017/07/Word2Vec-softmax.jpg'/>

- Train an MLP to find the best weights for mapping each word to the words that appear near it
- But since words close to one another usually share context, we're _really_ teaching it context in those weights
- Gut check: words with similar contexts can be swapped for one another
    + EX: "A fluffy **dog** is a great pet" <--> "A fluffy **cat** is a great pet"

Each word ends up with a vector of weights that encodes its contexts: the embedding!
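
The training data for a skip-gram model is a list of (target, context) word pairs. The sketch below shows one way to generate those pairs in plain Python; `skipgram_pairs` and its `window` argument are names chosen for this illustration, not part of any library.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

dog_sentence = "a fluffy dog is a great pet".split()
cat_sentence = "a fluffy cat is a great pet".split()

# Collect the context words seen around "dog" and around "cat"
dog_contexts = sorted(c for t, c in skipgram_pairs(dog_sentence) if t == "dog")
cat_contexts = sorted(c for t, c in skipgram_pairs(cat_sentence) if t == "cat")

print(dog_contexts)
print(cat_contexts)
```

In these two sentences "dog" and "cat" see exactly the same context words, so a model trained on such pairs is pushed toward giving them similar vectors.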

# GloVe - Global Vectors for Word Representation

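- Create a matrix of the probability that word $w_i$ occurs in the **context** of word $w_j$ across the whole corpus: $P(i | j)$
- For each word, find vectors such that $w_i \cdot w_j \approx \log P(i|j)$ (up to bias terms)
- Train the vectors to minimize the error of that approximation over all word pairs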
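
The first step can be sketched in plain Python as follows. This is a simplified illustration, not the GloVe implementation: it uses unweighted counts in a symmetric window, and the helper names `cooccurrence_counts` and `cond_prob` are made up for this example.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """counts[j][i] = number of times word i appears within `window` of word j."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    counts[word][tokens[ctx_pos]] += 1
    return counts

def cond_prob(counts, i, j):
    """Estimate P(i | j): the share of j's context occurrences that are word i."""
    total = sum(counts[j].values())
    return counts[j][i] / total if total else 0.0

corpus = [["a", "fluffy", "dog"], ["a", "fluffy", "cat"]]
counts = cooccurrence_counts(corpus, window=1)

print(cond_prob(counts, "dog", "fluffy"))
print(cond_prob(counts, "a", "fluffy"))
```

The vectors would then be fit so that their dot products track the logarithm of these co-occurrence statistics.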

<img src='https://cdn-images-1.medium.com/max/800/1*UNtsSilztKXjLG99VXxSQw.png' />

# What Can We Do with Embeddings?

## Gensim Library

Main parameters of `Word2Vec`:

- `sentences`: dataset to train on (an iterable of tokenized sentences)
- `vector_size`: how many dimensions each word vector should have (called `size` before gensim 4.0)
- `window`: how many words around the target word to train with
- `min_count`: minimum number of times a word must appear in the corpus to be kept; rarely used words are dropped
- `workers`: number of worker threads used for training


```python
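from gensim.models import Word2Vec

# Assume the corpus is already tokenized and stored in `data` (a list of
# token lists); the regular text preprocessing steps still need to be
# handled before training a Word2Vec model!

# Passing `data` to the constructor builds the vocabulary and trains the model.
# Note: `vector_size` was called `size` before gensim 4.0.
model = Word2Vec(data, vector_size=100, window=5, min_count=1, workers=4)

# Optional: keep training for more passes; `epochs` must be given explicitly.
model.train(data, total_examples=model.corpus_count, epochs=model.epochs)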
```

We can also explore the trained vectors in fun ways, such as solving analogies: https://www.kaggle.com/colinmorris/exploring-embeddings-with-gensim#Exploring-embeddings-with-Gensim
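
An analogy such as "man is to king as woman is to ?" boils down to vector arithmetic followed by a nearest-neighbor search. The sketch below does this by hand with made-up three-dimensional vectors; the `toy` dictionary and the `analogy` helper are assumptions for illustration only. With a trained gensim model, the same kind of query is available through `model.wv.most_similar(positive=[...], negative=[...])`.

```python
import numpy as np

# Hypothetical toy embeddings; the dimensions loosely mean
# (royalty, femininity, fruitiness).
toy = {
    "king":  np.array([0.9, 0.1, 0.0]),
    "queen": np.array([0.9, 0.9, 0.0]),
    "man":   np.array([0.1, 0.1, 0.0]),
    "woman": np.array([0.1, 0.9, 0.0]),
    "apple": np.array([0.0, 0.1, 0.9]),
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_sim = None, -1.0
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# "man" is to "king" as "woman" is to ...?
result = analogy("man", "king", "woman", toy)
print(result)
```

The input words are excluded from the search because the query vector usually stays close to them, which would otherwise make the answer trivial.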

## Visualize with t-SNE (t-Distributed Stochastic Neighbor Embeddings)

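Dimensionality reduction technique (like PCA, but nonlinear)

Tries to keep points that are close together in the original space close together in the low-dimensional map (works for images as well as words)

Can help reveal relationships between words and spot bugs in the embeddings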

![t-sne example](images/t-sne_example.png)
> Example image from _Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed)_ GitHub repo (https://github.com/ageron/handson-ml2/blob/master/08_dimensionality_reduction.ipynb)

**Example of t-SNE visualization:** https://www.kaggle.com/colinmorris/visualizing-embeddings-with-t-sne
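
As a rough sketch of the workflow, the code below projects vectors down to 2D with scikit-learn's `TSNE`. The random clusters are a stand-in for real embeddings (with a trained gensim model you could stack `model.wv[word]` for the words of interest instead), and the `perplexity` value is simply a guess that suits such a small sample.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for real embeddings: two clusters of 50-dimensional vectors
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(20, 50))
cluster_b = rng.normal(loc=1.0, scale=0.1, size=(20, 50))
vectors = np.vstack([cluster_a, cluster_b])

# perplexity must be smaller than the number of samples (40 here)
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
points_2d = tsne.fit_transform(vectors)

print(points_2d.shape)  # one (x, y) point per input vector
```

The resulting `points_2d` array can be passed to any scatter-plot function, with each point labeled by its word.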

## Transfer Learning

- Embeddings usually have hundreds of dimensions, so learning them from scratch takes a lot of text
- Just reuse word embeddings that have already been learned!
    + Unless the text uses very specialized terminology, word contexts will likely carry over within the same language
- Comparable to transfer learning with pretrained CNNs
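
One common way to reuse pretrained vectors is to copy them into the embedding matrix of a new model. The sketch below assumes a tiny made-up `pretrained` dictionary and task vocabulary; in practice the vectors would be loaded from a published file, and words missing from it need a fallback such as the small random initialization used here.

```python
import numpy as np

# Hypothetical pretrained vectors; real ones would be loaded from disk
pretrained = {
    "dog": np.array([0.9, 0.8, 0.1]),
    "cat": np.array([0.8, 0.9, 0.1]),
    "pet": np.array([0.7, 0.7, 0.2]),
}

# Vocabulary of the new task; "zyzzyva" stands in for a term the
# pretrained vectors do not cover
task_vocab = ["<pad>", "dog", "cat", "zyzzyva"]
dim = 3
rng = np.random.default_rng(0)

embedding_matrix = np.zeros((len(task_vocab), dim))
for idx, word in enumerate(task_vocab):
    if word == "<pad>":
        continue  # keep the padding row at zero
    if word in pretrained:
        embedding_matrix[idx] = pretrained[word]  # reuse what was already learned
    else:
        embedding_matrix[idx] = rng.normal(scale=0.1, size=dim)  # fallback for unknown words

print(embedding_matrix)
```

The matrix can then initialize the embedding layer of a downstream model, either frozen or fine-tuned along with the rest of the network.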