## Word2Vec and CBOW (Continuous Bag of Words) 

**Semantic Ambiguity**
- We want similar things to share weights.
- To do this, we need to learn how words are related.
    - Learning this from labels alone would take a lot of labelled data; Word2Vec instead learns from plain, unlabelled text.
- Trend
    - Similar words tend to occur in similar contexts.
    - ![](cbow1.png)
        - If the model learns to predict a word's context, it will treat words that appear in similar contexts, such as "cat" and "kitty", similarly.

**Embeddings**
- We are going to map words to small vectors (embeddings).
    - Words with similar meanings map to vectors that are close to each other.
    - Words with different meanings map to vectors that are far apart.
    - Embeddings solve some of the sparsity problem.
    - All the cat-like things are represented by vectors that are similar.
    - The model can generalize from this pattern of cat-like things.
    - ![](cbow2.png)
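
The mapping from words to small vectors can be sketched as a table look-up. This is a minimal illustration, not a trained model: the vocabulary, the dimension, and the names `one_hot` and `embed` are hypothetical, and the vectors are random placeholders that training would later adjust.

```python
import random

# Hypothetical toy vocabulary; a real one would come from a corpus.
vocab = ["cat", "kitty", "dog", "car"]
word_to_id = {word: i for i, word in enumerate(vocab)}

embedding_dim = 3  # assumed size, chosen small for readability

# One row per word. Values start random and would be adjusted during training.
rng = random.Random(0)
embeddings = [
    [rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)] for _ in vocab
]


def one_hot(word):
    """Sparse representation: as long as the vocabulary, with a single 1."""
    vec = [0] * len(vocab)
    vec[word_to_id[word]] = 1
    return vec


def embed(word):
    """Dense representation: a table look-up returning a short vector."""
    return embeddings[word_to_id[word]]
```

The one-hot vector grows with the vocabulary and says nothing about similarity, while the embedding stays at `embedding_dim` values regardless of vocabulary size.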

**Word2Vec**
- Simple model to embed words.
- Each word in a sentence is mapped to an embedding.
    - We'll use the embedding to predict the context of the word.
    - The context: the words nearby (within a window).
    - We'll use a simple logistic regression to predict the context, as in any supervised learning problem, with the nearby word as the target.
    - ![](cbow3.png)
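
The (word, context) training pairs described above can be sketched as follows. The function name `context_pairs`, the `window` parameter, and the example sentence are assumptions made for illustration; the source does not specify a window size.

```python
def context_pairs(tokens, window=2):
    """Return (word, context_word) training pairs for every position."""
    pairs = []
    for i, word in enumerate(tokens):
        # Clamp the window to the sentence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((word, tokens[j]))
    return pairs


# Hypothetical example sentence.
sentence = "the cat purrs softly".split()
pairs = context_pairs(sentence, window=1)
```

Each pair is one supervised example: the first element is the input word and the second is the target, so no manual labelling is needed.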

**t-SNE**
- One way to see whether embeddings are clustering is to use a nearest-neighbour look-up.
    - ![](cbow4.png)
- Another way is to reduce the dimensionality to 2D.
    - Normally we would use PCA, but here it would lose critical information.
        - ![](cbow5.png)
    - We can preserve the neighbourhood structure using t-SNE.
        - ![](cbow6.png)
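
A nearest-neighbour look-up can be sketched in a few lines. This is an illustration under stated assumptions: the 2-d vectors are hand-picked, hypothetical values rather than trained embeddings, and cosine similarity is assumed as the closeness measure.

```python
import math

# Hypothetical 2-d embeddings, hand-picked so cat-like words point the same way.
embeddings = {
    "cat": [0.9, 0.1],
    "kitty": [0.8, 0.2],
    "dog": [0.6, 0.6],
    "car": [0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, k=2):
    """Return the k words whose vectors are most similar to the given word's."""
    query = embeddings[word]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]
```

With these toy values, `nearest_neighbours("cat")` ranks "kitty" ahead of "dog". Scaling a vector by a positive constant leaves its cosine similarity unchanged, so only direction matters.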

**Further Readings**
- [Visualizing Data using t-SNE](http://jmlr.csail.mit.edu/papers/volume9/vandermaaten08a/vandermaaten08a.pdf)

**Word2Vec Important Notes**
1. Comparing Embeddings.
    - Normally we would use the L2 (Euclidean) distance between two vectors.
    - Cosine similarity is a better measure of closeness here, because the length of an embedding vector is not relevant.
2. Instead of computing the softmax over every word in the vocabulary, we can sample the words that are not the targets.
    - Pick only a handful of them and act as if the other words were not there.
    - This is called sampled softmax; it makes training faster with little or no loss in performance.
        - ![](cbow7.png)
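
The sampling idea in note 2 can be sketched as below. This is a simplified illustration, not a library implementation: negatives are drawn uniformly, and the correction for the sampling distribution that production implementations typically apply is omitted. The function names and the `logits` values are hypothetical.

```python
import math
import random


def sampled_softmax_loss(logits, target, num_sampled, rng):
    """Cross-entropy over the target plus a few sampled non-target words."""
    negatives = [i for i in range(len(logits)) if i != target]
    sampled = rng.sample(negatives, num_sampled)
    candidates = [target] + sampled  # the target sits at position 0

    # Softmax restricted to the candidates; every other word is ignored.
    # (A real model would compute scores only for these candidates.)
    scores = [logits[i] for i in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    return -math.log(exps[0] / sum(exps))


def full_softmax_loss(logits, target):
    """Reference: cross-entropy with the softmax over the whole vocabulary."""
    max_score = max(logits)
    exps = [math.exp(s - max_score) for s in logits]
    return -math.log(exps[target] / sum(exps))


# Hypothetical scores for a six-word vocabulary; word 0 is the true context.
logits = [2.0, 0.5, -1.0, 0.1, 1.5, -0.3]
loss = sampled_softmax_loss(logits, target=0, num_sampled=2, rng=random.Random(0))
```

The sampled version needs only `num_sampled + 1` scores per example instead of one per vocabulary word, which is where the speed-up comes from when the vocabulary is large.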

**Word Analogies**
- ![](cbow8.png)
- ![](cbow9.png)
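
Word analogies are commonly expressed as vector arithmetic on embeddings. The sketch below assumes hand-made, hypothetical vectors (roughly royalty, gender, and an unrelated axis) so the arithmetic is easy to follow; trained embeddings would only satisfy such relations approximately.

```python
import math

# Hypothetical hand-made vectors, not trained embeddings.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man": [0.1, 0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [0.0, 0.0, 0.9],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the word nearest to b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    ]
    # Exclude the three input words so they cannot be returned as the answer.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))
```

With these toy vectors, `analogy("man", "king", "woman")` returns "queen", because subtracting "man" and adding "woman" flips only the gender component.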