In [11]:
import numpy as np

### Word2Vec Embedding Method

* Problems with tf-idf word-context embeddings
    * Long vectors
    * Sparse: most entries are 0 or very close to 0
    * Idea: we want a denser representation that is relatively short
     * Dense vectors work better in machine learning
       * Better lower-dimensional representations
       * Generalize better
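To make the sparsity point concrete, here is a minimal sketch on a toy corpus (the three sentences are made up for illustration and are not part of the lecture data). It builds raw word-context co-occurrence counts with a one-word window and measures the share of zero entries. tf-idf-style weighting rescales the non-zero counts but leaves the zeros in place, so the same sparsity carries over.

```python
import numpy as np

# Toy corpus (hypothetical example, not the lecture data).
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets closed",
]
sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Word-context co-occurrence counts, window of one word on each side.
counts = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in sentences:
    for i, word in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                counts[index[word], index[sent[j]]] += 1

# Each row is one word's vector: as long as the vocabulary, mostly zeros.
sparsity = np.mean(counts == 0)
print(f"vector length: {len(vocab)}, share of zero entries: {sparsity:.0%}")
```

With a realistic vocabulary of tens of thousands of words, each vector gets that long and the share of zeros grows even larger, which is the motivation for short dense vectors.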


### Taxonomy to Encode Similarity
* The missing context can be supplied by a taxonomy
  * A taxonomy is a "knowledge organization system," or a data structure (typically a tree or graph) that encodes the terms used in a subject field and their relationships
  * Can be generic (e.g., an English-language taxonomy) or specific (e.g., a computing taxonomy or the Amazon product taxonomy)
* E.g., the WordNet taxonomy
  * Contains 155,327 English words
    * https://en.wikipedia.org/wiki/WordNet

![](https://www.dropbox.com/s/9kapn0eq6v84g2m/word_net_taxonomy.png?dl=1)
https://escholarship.org/content/qt9j8221x8/qt9j8221x8.pdf

### Problem with Discrete Representations
* Do not convey all the relationships
    * Coffee and Cup are in completely different subtrees of the taxonomy
* Terribly incomplete
* Adding new words requires the work of a taxonomist
  * Building such taxonomies is complex and labor-intensive
* Subjective: they depend on a user's experience and prior beliefs
  * Sometimes the subject of heated debates
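The coffee/cup problem can be illustrated with a small sketch. The taxonomy below is a hand-made toy tree (hypothetical, far smaller than WordNet and not copied from it), and `tree_distance` is an illustrative helper that counts the edges between two terms.

```python
# Toy taxonomy as child -> parent links (hypothetical, not WordNet).
parent = {
    "coffee": "beverage",
    "tea": "beverage",
    "beverage": "food",
    "food": "entity",
    "cup": "container",
    "mug": "container",
    "container": "artifact",
    "artifact": "entity",
}

def path_to_root(term):
    """Return the list of nodes from term up to the root."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(a, b):
    """Number of edges on the path between a and b in the tree."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: depth for depth, node in enumerate(path_b)}
    for depth_a, node in enumerate(path_a):
        if node in depth_in_b:  # first shared ancestor
            return depth_a + depth_in_b[node]
    raise ValueError("terms do not share a root")

print(tree_distance("coffee", "tea"))  # siblings under "beverage"
print(tree_distance("coffee", "cup"))  # only meet at the root
```

In this toy tree, coffee and cup are as far apart as two leaves can be, even though the two words often appear together in text.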

### Why Word2Vec 

* So far, we assigned IDs to words using their positions in the text

* What is the meaning of a word?
  * The idea that is represented by a word
     * By using words, signs, etc., a person expresses an idea.
    
* Ideally, we would like to be able to carry out operations on the meaning of words
 * King - Man + Woman ~ Queen
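As a minimal sketch of what such an operation could look like, the example below uses hand-picked 2-D vectors (hypothetical values chosen for illustration, not learned embeddings), where the first axis loosely stands for "royalty" and the second for "gender". The helper `nearest_word` is also an illustrative name.

```python
import numpy as np

# Hand-picked toy vectors (hypothetical): axis 0 ~ royalty, axis 1 ~ gender.
toy_vectors = {
    "king":   np.array([1.0,  1.0]),
    "queen":  np.array([1.0, -1.0]),
    "man":    np.array([0.0,  1.0]),
    "woman":  np.array([0.0, -1.0]),
    "prince": np.array([0.8,  0.9]),
}

def nearest_word(target, exclude=()):
    """Return the word whose vector is closest (Euclidean) to target."""
    candidates = [w for w in toy_vectors if w not in exclude]
    return min(candidates, key=lambda w: np.linalg.norm(toy_vectors[w] - target))

result = toy_vectors["king"] - toy_vectors["man"] + toy_vectors["woman"]
# The words used in the query are excluded from the candidates.
print(nearest_word(result, exclude=("king", "man", "woman")))
```

Real embeddings are learned from text and have many more dimensions, so the result is only approximately "queen", hence the `~` in the bullet above.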

### Explaining Operations on words using Operation on Colors

* It turns out that we can already do something similar with words representing colors

* Ideally, we want to reason about words the same way we reason about colors
  * Ex. Red + green = yellow 
  * Magenta - Blue = Red
  * Yellow is closer to banana (yellow color) than to green
  * Grey is the average of black and white
  * Banana is to yellow what hunter green is to green
  
![](https://www.dropbox.com/s/aon76xh7qlu1z2y/colors.png?dl=1)

* Inspired by (https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469)




In [2]:
int("FF", 16)

255

### Operation on Colors

* RGB notation can be considered an embedding of the color name.
  * Each color name has 3 values representing the levels of red, green, and blue.
* Those values can be used to find similar colors but also to describe more complex relationships
  * The colors we will be working with can be viewed [here](https://xkcd.com/color/rgb/)
* Keep in mind that the colors were labeled by participants in the `xkcd` color name survey.
  * A color name and its RGB values may not match the color and values you expect
  * We chose this inexact color representation to convey the inexactness of working with words

### XKCD and Emacs

![](https://imgs.xkcd.com/comics/real_programmers.png)

In [5]:
import json
with open("data/xkcd.json") as f:
    color_data = json.load(f)
print(color_data.keys())
print("-" * 40)
print(color_data["colors"][:3])


dict_keys(['description', 'colors'])
----------------------------------------
[{'color': 'cloudy blue', 'hex': '#acc2d9'}, {'color': 'dark pastel green', 'hex': '#56ae57'}, {'color': 'dust', 'hex': '#b2996e'}]


In [6]:
x = '#acc2d9'
from textwrap import wrap
wrap(x[1:], 2)

['ac', 'c2', 'd9']

In [7]:
colors = {col_info["color"]:tuple(wrap(col_info["hex"][1:], 2)) for col_info in color_data["colors"]}
colors["cloudy blue"]

('ac', 'c2', 'd9')

In [12]:
int('ac', 16), int('c2', 16), int('d9', 16), 

(172, 194, 217)

In [13]:
colors = {name:np.array(list(int(hex_v, 16) for hex_v in hex_t)) for name,hex_t in colors.items()}
colors["cloudy blue"]

array([172, 194, 217])

In [14]:
print("These colors were manually labelled by participants")
print(f"Black is {colors['black']}, white is: {colors['white']} and red is {colors['red']}")


These colors were manually labelled by participants
Black is [0 0 0], white is: [255 255 255] and red is [229   0   0]


![](https://www.dropbox.com/s/9k2828pyr0nypla/red.png?dl=1)

In [15]:
np.array([1,2,3]) + np.array([1,1,1])

array([2, 3, 4])

In [16]:
np.array([1,2,3]) - np.array([1,1,1])

array([0, 1, 2])

In [17]:
# import numpy as np
# Compute the Euclidean Distance in numpy
def dist(coord1, coord2):
    # Euclidean distance in numpy. 
    # Function name is explicit and shorter to write
    return np.linalg.norm(coord1 - coord2)
    
dist(colors['red'], colors['blue'])



324.49036965678965

In [18]:
np.mean([[1,2,3], [1,2,3], [1,2,3]], axis =0)

array([1., 2., 3.])

In [133]:
dist(colors['red'], colors['green']) > dist(colors['red'], colors['pink'])

True

In [137]:
def closest(colors, coord, n=10):
    # Names of the n colors whose RGB vectors are nearest to coord
    return sorted(colors, key=lambda name: dist(coord, colors[name]))[:n]

closest(colors, colors['red'], n=5)

['red', 'fire engine red', 'bright red', 'tomato red', 'cherry red']

### Using The Color "Embeddings"

* We can use the embeddings to carry out operations on colors
  * Recall the operations we were interested in:
     * Ex. Red + green = yellow 
     * Magenta - blue = (red + blue) - blue = red
     * Yellow is closer to banana than to green
     * Grey is the average of black and white
     * Banana is to yellow what hunter green is to green


In [141]:
# Red + green = yellow
some_color = colors["red"] + colors["green"]
closest(colors, some_color, n=5)

['squash', 'orangey yellow', 'yellowish orange', 'saffron', 'amber']

In [144]:
some_color = colors["magenta"] - colors["blue"]
closest(colors, some_color, n=5)


['red', 'deep red', 'blood red', 'darkish red', 'dark red']

In [147]:
dist(colors["yellow"], colors["banana"]) < dist(colors["yellow"], colors["green"])


True

In [150]:
some_color =  np.mean([colors['black'], colors['white']], axis=0)
closest(colors, some_color, n=5)


['medium grey', 'purple grey', 'steel grey', 'battleship grey', 'grey purple']

In [152]:
some_color

array([127.5, 127.5, 127.5])

In [161]:
some_color = colors['yellow'] - colors['banana'] + colors['green']

closest(colors, some_color, n=5)


['true green',
 'grassy green',
 'vibrant green',
 'grass green',
 'dark grass green']

### Relationship Between Colors

* Banana yellow is to yellow what hunter green is to green
  * Derived from the exact diagram
  
```
some_color = colors['yellow'] - colors['banana'] + colors['green']
closest(colors, some_color, n=5)

['true green',
 'grassy green',
 'vibrant green',
 'grass green',
 'dark grass green']
```
![](https://www.dropbox.com/s/tjgnw6cwf0kwju8/green_prediction.png?dl=1)

### From Colors to Words

* The operations on colors seen above are possible because the embedding conveys a meaningful representation of the data.
* Similarly, Word2Vec conveys meaningful representations of words
  * Each distinct word has a distinct vector
  * Mathematical operations on the vectors convey semantic similarity 
    * "Semantic" means relating to meaning in language
  
    

### Word2Vec Intuition

* "You shall know a word by the company it keeps" (J. R. Firth, Studies in Linguistic Analysis, 1957)

*  Paris is a city and the _ of France
  * Which of the words can be used to fill in the gap?
     * pretzel, pizza, capital, painting, shame

* We can answer by predicting the likelihood of observing each of the words in the given context

* If you can predict the contexts in which a word should appear, then you understand the meaning of that word.
  * A very powerful concept that is used for building language models
    





### Language Model


* A language model is central to many important natural language processing tasks.    

```Language modeling is the task of assigning a probability to sentences in a language. […] Besides assigning a probability to each sequence of words, the language models also assign a probability for the likelihood of a given word (or a sequence of words) to follow a sequence of words. ``` 

        From `Neural Network Methods in Natural Language Processing`
* Word2Vec is not a language model but is a good tool for reasoning about the semantic relationship between words
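To illustrate the definition quoted above, here is a minimal sketch of a bigram language model built from a toy corpus (the sentences and the `<s>` / `</s>` boundary markers are assumptions made for this example). It estimates the probability of a word given the previous word by counting, then multiplies those probabilities to score a whole sentence. No smoothing is applied, so unseen word pairs get probability 0.

```python
from collections import Counter

# Toy corpus (hypothetical); <s> and </s> mark sentence boundaries.
corpus = [
    "<s> paris is the capital of france </s>",
    "<s> tokyo is the capital of japan </s>",
    "<s> paris is a city </s>",
]
tokens = [s.split() for s in corpus]

# How often each word appears as a "previous word", and each adjacent pair.
unigram_counts = Counter(w for sent in tokens for w in sent[:-1])
bigram_counts = Counter(
    (sent[i], sent[i + 1]) for sent in tokens for i in range(len(sent) - 1)
)

def next_word_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence):
    """Probability of a sentence as a product of bigram probabilities."""
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= next_word_prob(prev, word)
    return prob

print(next_word_prob("the", "capital"))
print(sentence_prob("<s> paris is the capital of japan </s>"))
print(sentence_prob("<s> pizza is the capital of japan </s>"))
```

The model assigns a non-zero probability to a sentence it never saw ("paris is the capital of japan") because every adjacent pair occurs somewhere in the corpus. It has no notion of meaning, which is the gap that embeddings such as Word2Vec try to address.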


### Word2Vec Intuition

* Given a large corpus (“body”) of text 
  * Every word in a fixed vocabulary is represented by a vector
* Go through each position t in the text, which has a center word c and context (“outside”) words o
* Use the similarity of the word vectors for c and o to calculate the probability of o given c (or vice versa)
* Keep adjusting the word vectors to maximize this probability
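The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the window size, the embedding size, the random initial vectors, and the use of a softmax over dot products to turn similarity into a probability are all choices made for this example. The training step that adjusts the vectors is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
text = "paris is the capital of france".split()
vocab = sorted(set(text))
index = {w: i for i, w in enumerate(vocab)}
dim = 4  # embedding size (arbitrary choice for the sketch)

# Two vectors per word: one used as center word, one used as context word.
center_vecs = rng.normal(scale=0.1, size=(len(vocab), dim))
context_vecs = rng.normal(scale=0.1, size=(len(vocab), dim))

def training_pairs(words, window=2):
    """Yield (center, context) pairs for every position t in the text."""
    for t, center in enumerate(words):
        for j in range(max(0, t - window), min(len(words), t + window + 1)):
            if j != t:
                yield center, words[j]

def prob_context_given_center(center):
    """Softmax over dot products: P(o | c) for every word o in the vocabulary."""
    scores = context_vecs @ center_vecs[index[center]]
    exp_scores = np.exp(scores - scores.max())  # shift for numerical stability
    return exp_scores / exp_scores.sum()

pairs = list(training_pairs(text))
probs = prob_context_given_center("capital")
print(pairs[:3])
print(dict(zip(vocab, probs.round(3))))
```

Before training, the probabilities are close to uniform because the vectors are random. Training would nudge the vectors so that the pairs produced by `training_pairs` receive higher probability.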

### Country and Capital Representations
* The sort of relationship we will observe in a corpus between Paris and France will be similar to that observed between Tokyo and Japan 

 * Paris is the capital of France
 * X is the ambassador of France in Paris.
 * The summit was held at the residence of the French head of government in Paris 
* We can imagine scenarios where the sentences above can be used for any country and its capital.
![](https://www.dropbox.com/s/mwdh6z9qc0pflyy/capitals_example.png?dl=1)

See demo at: https://projector.tensorflow.org/


### From Linear Regression to Neural Networks


![](https://www.dropbox.com/s/1irycgrzu6rtk8c/simple_network.png?dl=1)

From [https://www.freecodecamp.org](https://www.freecodecamp.org/news/deep-learning-neural-networks-explained-in-plain-english/)

### Non-Linear Regression

![](https://www.dropbox.com/s/afjm2oqqa4nh7k8/non_linear.png?dl=1)

Neurons in the hidden layer apply non-linear transformations (activation functions) that allow the model to mimic a non-linear function.

From [https://www.freecodecamp.org](https://www.freecodecamp.org/news/deep-learning-neural-networks-explained-in-plain-english/)
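A small sketch of why the activation function matters, assuming a one-hidden-layer network with random weights, no bias terms, and ReLU as the activation (all choices made for this illustration). Without an activation, two stacked linear layers collapse into a single linear layer. With ReLU in between, the output is no longer a linear function of the input.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 1))  # input -> hidden layer (3 neurons)
W2 = rng.normal(size=(1, 3))  # hidden layer -> output
x = np.linspace(-2, 2, 5).reshape(1, -1)  # five 1-D inputs, one per column

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return np.maximum(z, 0)

linear_out = W2 @ (W1 @ x)         # hidden layer without activation
collapsed = (W2 @ W1) @ x          # one equivalent linear layer
nonlinear_out = W2 @ relu(W1 @ x)  # hidden layer with ReLU

print(np.allclose(linear_out, collapsed))
print(linear_out.round(3))
print(nonlinear_out.round(3))
```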



### Neural Networks For classification

![](https://www.dropbox.com/s/zv6vp6u6gghyw2h/network_classification.png?dl=1)
From [https://towardsdatascience.com/coding-up-a-neural-network-classifier-from-scratch-977d235d8a24](https://towardsdatascience.com/coding-up-a-neural-network-classifier-from-scratch-977d235d8a24)

### The Skip-gram Algorithm

![](https://www.dropbox.com/s/ykyjsroxu1utwd0/skipgram.png?dl=1)

### The CBOW Algorithm
![](https://www.dropbox.com/s/sae7f1sp84xuwwy/cbow.png?dl=1)

### Word2Vec Versus More Modern Representations

* Word2Vec embeddings are static


* There is a unique embedding for each word
  * As opposed to recent methods that are context-specific
  * The recent methods are typically based on complex deep learning models
  

* Fast and easy to compute 
  * No need to use labelled text: we use running text as examples
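What "static" means can be shown with a short sketch. The embedding matrix here is filled with random numbers as a stand-in for trained vectors, and the vocabulary and sentences are made up for the example. Because an embedding is a plain table lookup, the word "bank" gets the same vector in both sentences, regardless of its neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "river", "bank", "approved", "loan", "flooded"]
index = {w: i for i, w in enumerate(vocab)}
# Stand-in for a trained embedding matrix (random values, hypothetical).
embedding_matrix = rng.normal(size=(len(vocab), 5))

def embed(sentence):
    """Look up one fixed vector per word, ignoring the surrounding words."""
    return {w: embedding_matrix[index[w]] for w in sentence.split()}

money = embed("the bank approved the loan")
water = embed("the river bank flooded")
print(np.array_equal(money["bank"], water["bank"]))
```

A context-specific method would instead compute the vector for "bank" from the whole sentence, so the two occurrences could differ.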
