## GloVe vectors
**GloVe** (Global Vectors) is another technique for learning vector representations of words. The main difference from **word2vec** is in what the model is trained on: word2vec learns from a sliding context window around each word, one window at a time, whereas **GloVe** is fit to word-word co-occurrence counts aggregated over the whole corpus. In practice, both methods give similar results.
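
To make the idea of corpus-wide co-occurrence counts concrete, here is a minimal sketch that counts how often two words appear near each other in a toy corpus. The function name `cooccurrence_counts`, the `window` parameter and the two example sentences are illustrative assumptions, and the counting is deliberately simplified (plain unweighted counts); it is not the preprocessing used to build the released GloVe vectors.

```python
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count ordered (word, context) pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[(word, tokens[j])] += 1
    return counts

# Hypothetical toy corpus, for illustration only
toy_corpus = ["the king rules the land", "the queen rules the land"]
counts = cooccurrence_counts(toy_corpus, window=2)
print(counts[("king", "rules")], counts[("rules", "land")])
```

The counts are summed over every sentence before any model is fit, which is the sense in which the statistics are global rather than tied to a single window.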

**GloVe** vectors were first proposed in the following paper:

https://nlp.stanford.edu/pubs/glove.pdf

Below we load **GloVe** vectors pre-trained on a Wikipedia corpus and study some of the semantic relationships between them. This notebook uses the 50-dimensional file `glove.6B.50d.txt`. The pre-trained vectors can be downloaded from https://nlp.stanford.edu/projects/glove/

In [1]:
import numpy as np
import pandas as pd
import csv
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
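# Load the pre-trained GloVe vectors: one word per row, indexed by the word itself.
# keep_default_na=False stops pandas from reading tokens such as "nan" or "null" as missing values.
words = pd.read_table("./glove.6B.50d.txt", sep=" ", index_col=0, header=None, quoting=csv.QUOTE_NONE, keep_default_na=False)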

In [3]:
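# Helper: cosine similarity between two 1-D vectors
def cosine_similarity(A, B):
    dot = np.dot(A, B)
    norma = np.linalg.norm(A)
    normb = np.linalg.norm(B)
    cos = dot / (norma * normb)
    return cos

def get_analogy(word1, analogy1, word2, embeddings):
    """Return the word w that best completes 'word1 : analogy1 :: word2 : w'."""
    # The query words themselves are excluded from the answer
    group = set((word1, analogy1, word2))
    word1_emb = embeddings.loc[word1]
    analogy1_emb = embeddings.loc[analogy1]
    word2_emb = embeddings.loc[word2]
    # Target vector, e.g. king - man + woman
    vec = (analogy1_emb - word1_emb + word2_emb).values
    vec = vec.reshape((1, -1))
    # Cosine similarity between the target and every row of the embedding matrix
    dot = embeddings.values * vec
    norma = np.linalg.norm(vec)
    normb = np.linalg.norm(embeddings.values, axis=1)
    cos_sim = np.sum(dot, axis=1) / (normb * norma)
    # At most three of the top five are query words, so a candidate always remains
    top_five = embeddings.index[np.argsort(cos_sim)[::-1][0:5]]
    for c in top_five:
        if c not in group:
            return c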

In [4]:
print(get_analogy('man', 'king', 'woman', words))
print(get_analogy('man', 'boy',  'woman', words))
print(get_analogy('india', 'delhi', 'china', words))
print(get_analogy('car', 'road', 'ship', words))
print(get_analogy('car', 'road', 'airplane', words))

queen
girl
beijing
harbour
route


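As the outputs above show, even this simple vector arithmetic recovers some interesting analogies: `queen`, `girl` and `beijing` are the expected answers, while `harbour` and `route` are looser but still plausible associations.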
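
The analogy lookup can also be checked by hand on a tiny example that needs no download. The sketch below uses made-up 2-D vectors (hypothetical values chosen so that one axis loosely tracks "royalty" and the other "gender"); the helper name `toy_analogy` and the parameter `k` are likewise assumptions for illustration, not part of the notebook above.

```python
import numpy as np
import pandas as pd

# Hypothetical hand-picked embeddings: column "x" ~ royalty, column "y" ~ gender
toy = pd.DataFrame(
    {"x": [0.0, 0.0, 1.0, 1.0, 0.2], "y": [1.0, -1.0, 1.0, -1.0, 0.9]},
    index=["man", "woman", "king", "queen", "apple"],
)

def toy_analogy(a, b, c, emb, k=5):
    """Return up to k words closest (by cosine similarity) to emb[b] - emb[a] + emb[c]."""
    target = (emb.loc[b] - emb.loc[a] + emb.loc[c]).to_numpy()
    matrix = emb.to_numpy()
    # Cosine similarity of the target with every row at once
    sims = matrix @ target / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(target))
    ranked = emb.index[np.argsort(sims)[::-1]]
    # Drop the query words, as they are trivially close to the target
    return [w for w in ranked if w not in {a, b, c}][:k]

result = toy_analogy("man", "king", "woman", toy)
print(result[0])
```

Here `king - man + woman` equals `(1, -1)`, which is exactly the toy vector for `queen`, so `queen` ranks first. Returning a ranked list rather than a single word makes it easier to see how close the runner-up candidates are.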