### Intro

The idea that a word’s meaning can be understood from its context, that is, from the words that surround it, is the basis for word embeddings. A word embedding is a representation of a word as a numeric vector, which lets us compare how words are used and identify words that occur in similar contexts.

The applications of word embeddings include:
- entity recognition in chatbots
- sentiment analysis
- syntax parsing

### Vectors
Vectors can be many things in many different fields, but ultimately they are containers of information. Depending on the size, or the dimension, of a vector, it can hold varying amounts of data.

The simplest case is a 1-dimensional vector, which stores a single number. Say we want to represent the length of a word with a vector. We can do so as follows:

"cat" ----> [3]
"scrabble" ----> [8]
"antidisestablishmentarianism" ----> [28]

Instead of looking at these three words with our own eyes, we can compare the vectors that represent them by plotting the vectors on a number line.

[Figure: one-dimensional number line with vectors]

We can clearly see that the “cat” vector is much smaller than the “scrabble” vector, which is much smaller than the “antidisestablishmentarianism” vector.

Now let’s say we also want to record the number of vowels in our words, in addition to the number of letters. We can do so using a 2-dimensional vector, where the first entry is the length of the word, and the second entry is the number of vowels:

"cat" ----> [3, 1]
"scrabble" ----> [8, 2]
"antidisestablishmentarianism" ----> [28, 11]

To help visualize these vectors, we can plot them on a two-dimensional grid, where the x-axis is the number of letters, and the y-axis is the number of vowels.

[Figure: two-dimensional grid with vectors]

Here we can see that the vectors for “cat” and “scrabble” lie much closer to each other on the grid than either does to the vector for “antidisestablishmentarianism”. So we could argue that “cat” and “scrabble” are more similar under this representation.
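The two-dimensional vectors above can be rebuilt and compared in a few lines of code. This is a minimal sketch in plain Python: the helper names `word_to_vector` and `euclidean_distance` are made up for illustration, and "closeness" is measured here as the straight-line (Euclidean) distance between the points.

```python
import math

def word_to_vector(word):
    """Return [number of letters, number of vowels] for a word."""
    vowels = set("aeiou")
    return [len(word), sum(1 for ch in word.lower() if ch in vowels)]

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

cat = word_to_vector("cat")                                 # [3, 1]
scrabble = word_to_vector("scrabble")                       # [8, 2]
long_word = word_to_vector("antidisestablishmentarianism")  # [28, 11]

print(euclidean_distance(cat, scrabble))        # about 5.10
print(euclidean_distance(scrabble, long_word))  # about 21.93
```

The much smaller first distance matches what the grid suggests: “cat” and “scrabble” sit close together, while “antidisestablishmentarianism” is far from both.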

While we have shown here only 1-dimensional and 2-dimensional vectors, we are able to have vectors in any number of dimensions. Even 1,000! The tricky part, however, is visualizing them.

Vectors are useful since they help us summarize information about an object using numbers. Then, using the number representation, we can make comparisons between the vector representations of different objects!

### What is a Word Embedding?
A word embedding is a vector representation of a word.
It lets us take the information carried by a word, like its meaning and its part of speech, and convert it into a numeric form that a computer can work with more easily.

For example, we could look at a word embedding for the word “peace”.

[5.2907305, -4.20267, 1.6989858, -1.422668, -1.500128, ...]

Here “peace” is represented by a 96-dimensional vector, with only the first five dimensions shown. Each dimension of the vector captures some information about how the word “peace” is used. We can also look at a word embedding for the word “war”:

[7.2966490, -0.52887750, 0.97479630, -2.9508233, -3.3934135, ...]

By converting the words “war” and “peace” into numeric vectors, we let a computer compare them directly and quantify their similarities and differences.

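We can load a basic English word embedding model using `spaCy` (after `import spacy`) as follows:

`nlp = spacy.load('en')`

**Note**: the convention is to load spaCy models into a variable named `nlp`. The `'en'` shortcut only works in older spaCy releases; from spaCy 3 onward shortcuts were removed, and the full package name of an installed model, such as `en_core_web_sm`, has to be passed instead.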

To get the vector representation of a word, we call the model with the desired word as an argument and read the `.vector` attribute of the result:

`nlp('love').vector`

But how do we compare these vectors? And how do we arrive at these numeric representations?

### Distance
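Distance is at the heart of word embeddings. Before we explain why, let’s look at how the distance between vectors can be measured. Manhattan distance sums the absolute differences between the corresponding entries of two vectors, while Euclidean distance is the straight-line distance between the points they mark.
When working with vectors that have a large number of dimensions, such as word embeddings, Manhattan and Euclidean distances can become rather large, since every additional dimension can add to the total. For that reason, cosine distance is usually preferred.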
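To make the growth with dimension concrete, here is a small sketch in plain Python. The function names are illustrative, and the all-zeros versus all-ones vectors are a contrived example chosen so that the arithmetic is easy to check: Manhattan distance sums the absolute differences per dimension, and Euclidean distance takes the square root of the summed squared differences.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute differences across all dimensions."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two vectors that differ by exactly 1 in every dimension.
for dims in (2, 100, 1000):
    a = [0.0] * dims
    b = [1.0] * dims
    print(dims, manhattan_distance(a, b), round(euclidean_distance(a, b), 2))
# 2 2.0 1.41
# 100 100.0 10.0
# 1000 1000.0 31.62
```

Even though the two vectors differ by the same small amount in every dimension, both distances keep growing as dimensions are added.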

**Cosine distance** is concerned with **the angle between two vectors**, rather than the distance between the points, or ends, of the vectors. It is defined as 1 minus the cosine of that angle. Two vectors that point in the same direction have no angle between them, and have a cosine distance of 0. Two perpendicular vectors have a cosine distance of 1. Two vectors that point in opposite directions, on the other hand, have a cosine distance of 2.
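As a rough check, cosine distance can be computed by hand as 1 minus the cosine of the angle between the vectors, which is the same definition SciPy's `cosine` function uses. The sketch below is plain Python with an illustrative function name. Under this definition the result ranges from 0 (same direction) through 1 (perpendicular) to 2 (opposite directions).

```python
import math

def cosine_distance(a, b):
    """1 - (a . b) / (|a| * |b|) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1, 2], [2, 4]))    # approximately 0: same direction
print(cosine_distance([1, 0], [0, 1]))    # 1.0: perpendicular
print(cosine_distance([1, 2], [-1, -2]))  # approximately 2: opposite directions
```

Note that the lengths of the vectors do not matter: `[1, 2]` and `[2, 4]` differ in size but point the same way, so their cosine distance is (up to floating-point rounding) zero.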

We can easily calculate the Manhattan, Euclidean, and cosine distances between vectors using helper functions from SciPy:
```
import numpy as np
from scipy.spatial.distance import cityblock, euclidean, cosine

vector_a = np.array([1,2,3])
vector_b = np.array([2,4,6])

# Manhattan distance:
manhattan_d = cityblock(vector_a,vector_b) # 6

# Euclidean distance:
euclidean_d = euclidean(vector_a,vector_b) # 3.74

# Cosine distance:
cosine_d = cosine(vector_a,vector_b) # 0.0
```

The idea behind word embeddings is a theory known as the **distributional hypothesis**. This hypothesis states that words that occur in the same contexts tend to have similar meanings. With word embeddings, we map words that appear in the same contexts to nearby places in our vector space (math-speak for the area in which our vectors exist).

The numeric values assigned to a word’s vector have no meaning in their own right. They gain meaning only through comparison: the value of word embeddings comes from seeing how similar or different the vectors of different words are.

Thus the cosine distance between words with similar contexts will be small, and the cosine distance between words that have very different contexts will be large.
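The distributional hypothesis can be illustrated with a deliberately simplified sketch. Real embedding models learn dense vectors from large corpora; here each word is instead represented by raw counts of the words it shares a sentence with. The four-sentence corpus and the helper names are made up for illustration only.

```python
import math
from collections import Counter

# Made-up toy corpus, for illustration only.
sentences = [
    "the cat drinks milk",
    "the kitten drinks milk",
    "the car needs fuel",
    "the truck needs fuel",
]

def context_counts(target, sentences):
    """Count the words that share a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

cat, kitten, car = (context_counts(w, sentences) for w in ("cat", "kitten", "car"))
print(cosine_distance(cat, kitten))  # approximately 0: identical contexts
print(cosine_distance(cat, car))     # about 0.67: only "the" is shared
```

“cat” and “kitten” never appear together, yet they end up with nearly identical vectors because they appear with the same neighbours, while “cat” and “car” share almost no context and end up far apart.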

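**I am saving this helper function here: given a word list and a parallel list of vectors, it returns the 10 words closest to a chosen word by cosine distance. The chosen word itself comes first, since its distance to itself is 0.**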


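```
from scipy.spatial.distance import cosine

def find_closest_words(word_list, vector_list, word_to_check):
    # Look up the target vector once, then sort all words by cosine distance to it.
    target = vector_list[word_list.index(word_to_check)]
    return sorted(word_list,
                  key=lambda w: cosine(target, vector_list[word_list.index(w)]))[:10]
```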