- What is transfer learning?
- What are embeddings?
- How are Word2Vec vectors learned?
- What types of relationships do Word2Vec vectors capture?

**Transfer Learning:** the practice of transferring knowledge from a previously trained model to a new task by continuing to train (fine-tuning) it on a new dataset. When we face a difficult problem with only a small dataset, and a similar problem has already been solved using a large dataset with a similar distribution, we can leverage the model learned on the large dataset as the starting point for training on the new, smaller one.
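A minimal NumPy sketch of one common form of this idea, feature extraction: keep the pre-trained part frozen and train only a small new classifier on the target data. The "pre-trained" extractor here is just a fixed random projection (`pretrained_W`, a hypothetical stand-in for weights learned on a large dataset), and the 100-point target dataset is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for weights learned on a large source dataset.
# In real transfer learning these would come from a pre-trained network.
pretrained_W = rng.normal(size=(2, 50))
frozen_copy = pretrained_W.copy()  # kept only to show the extractor is not modified

def extract_features(X):
    """Frozen feature extractor: pretrained_W is never updated below."""
    return np.tanh(X @ pretrained_W)

# Small synthetic target dataset: label 1 when x0 + x1 > 0
X_small = rng.normal(size=(100, 2))
y_small = (X_small.sum(axis=1) > 0).astype(float)

# Train only a new logistic-regression "head" on top of the frozen features
features = extract_features(X_small)
w = np.zeros(features.shape[1])
b = 0.0
learning_rate = 0.1
for _ in range(1000):
    probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    error = probs - y_small  # gradient of the log-loss w.r.t. the logits
    w -= learning_rate * features.T @ error / len(y_small)
    b -= learning_rate * error.mean()

train_accuracy = np.mean((features @ w + b > 0) == (y_small == 1))
print("training accuracy of the new head:", train_accuracy)
print("extractor unchanged:", np.array_equal(pretrained_W, frozen_copy))
```

Fine-tuning some or all of the pre-trained weights (usually with a small learning rate) is the other common variant; freezing them, as here, is the cheaper option when the new dataset is very small.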

**Embeddings:** feature representations, typically extracted from pre-trained networks. Embeddings:
- Are dense, continuous vector representations produced by neural networks that capture pertinent information and exhibit semantic relationships (e.g., related words get nearby vectors)
- Can be learned in a supervised or unsupervised manner
- Are often useful as an unsupervised data representation scheme (similar in spirit to PCA)
- Are sometimes also called 'thought vectors', 'x-vectors', or 'feature vectors'
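To make "dense vector representation" concrete: an embedding layer is essentially a lookup table with one row per word, and semantic closeness between rows is usually measured with cosine similarity. The 3-dimensional vectors below are hand-picked toy values (hypothetical), not learned embeddings.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table (one dense row per word)
vocab = {"cat": 0, "dog": 1, "car": 2}
embedding_table = np.array([
    [0.9, 0.8, 0.1],  # cat
    [0.8, 0.9, 0.2],  # dog
    [0.1, 0.2, 0.9],  # car
])

def embed(word):
    """Look up the dense vector for a word (what an embedding layer does)."""
    return embedding_table[vocab[word]]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embed("cat"), embed("dog")))  # high: related words
print(cosine_similarity(embed("cat"), embed("car")))  # lower: unrelated words
```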

In [None]:
%matplotlib notebook

import random  # Python's RNG, used for random.sample below

import numpy as np
from tqdm import tqdm  # progress bar for the evaluation loop
import gensim


### Download the pre-trained Word2Vec vectors (GoogleNews-vectors-negative300) [from here](https://drive.google.com/uc?id=0B7XkCwpI5KDYNlNUTTlSS21pQmM&export=download) into `data/`

### Download the analogy questions (questions-words.txt) [from here](http://download.tensorflow.org/data/questions-words.txt) into `data/`

![Linear relationships captured by word vectors](https://www.tensorflow.org/images/linear-relationships.png "Linear relationships in word embeddings")
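Word2Vec learns its vectors with a shallow network trained on a self-supervised prediction task; in the skip-gram variant, each word is trained to predict the words that appear within a small window around it (the CBOW variant does the reverse). A minimal NumPy sketch of skip-gram with negative sampling, assuming a tiny made-up corpus, a window of 2, and 2 negatives drawn uniformly (the original Word2Vec draws negatives from the unigram distribution raised to the 3/4 power and trains on vastly more text). It illustrates the training signal only and is not a reproduction of how the GoogleNews vectors were trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hypothetical corpus; real Word2Vec is trained on billions of tokens
corpus = "the king rules the land the queen rules the land".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
token_ids = [word_to_id[w] for w in corpus]

def skipgram_pairs(ids, window=2):
    """Yield (center, context) pairs for every word within `window` positions."""
    for i, center in enumerate(ids):
        for j in range(max(0, i - window), min(len(ids), i + window + 1)):
            if j != i:
                yield center, ids[j]

dim, num_negatives, learning_rate = 10, 2, 0.05
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # word vectors (the ones kept)
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

pairs = list(skipgram_pairs(token_ids))

def expected_loss():
    """SGNS loss averaged over the uniform negative-sampling distribution."""
    total = 0.0
    for center, context in pairs:
        scores = W_out @ W_in[center]  # score of every vocabulary word as a context
        total -= np.log(sigmoid(scores[context]))
        total -= num_negatives * np.mean(np.log(sigmoid(-scores)))
    return total

initial_loss = expected_loss()
for epoch in range(200):
    for center, context in pairs:
        # One observed context word (label 1) plus uniformly sampled negatives (label 0)
        targets = [context] + list(rng.integers(0, len(vocab), size=num_negatives))
        labels = np.array([1.0] + [0.0] * num_negatives)
        v = W_in[center].copy()
        u = W_out[targets]              # fancy indexing returns a copy
        grad = sigmoid(u @ v) - labels  # d(loss)/d(score) for each target
        W_in[center] -= learning_rate * (grad @ u)
        # np.subtract.at accumulates correctly if a target index repeats
        np.subtract.at(W_out, targets, learning_rate * np.outer(grad, v))
final_loss = expected_loss()
print("loss before:", initial_loss, "after:", final_loss)
```

Words that appear in similar contexts receive similar updates, which is the usual intuition for why they end up with nearby vectors and why consistent differences (such as the ones in the figure) can show up as roughly constant vector offsets.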

In [None]:
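# Load the 300-dim GoogleNews vectors (~3M words, needs several GB of RAM; pass limit=N to load only the first N)
model = gensim.models.KeyedVectors.load_word2vec_format('data/GoogleNews-vectors-negative300.bin', binary=True)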

In [None]:
king_vector = model['king']
print(len(king_vector))
print(king_vector)

In [None]:
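analogy_vector = model['king'] - model['man'] + model['woman']  # should land near 'queen'
print(analogy_vector)
# Unlike most_similar, similar_by_vector does not exclude 'king' from the results
print(model.similar_by_vector(analogy_vector, topn=5))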

In [None]:
# Analogy task: king - man + woman ≈ queen (the input words are excluded from the results)
answer = model.most_similar(positive=['woman', 'king'], negative=['man'])
print("king - man + woman = {}".format(answer))
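Under the hood, `most_similar(positive=..., negative=...)` ranks vocabulary words by cosine similarity to a combination of the normalized input vectors (positives added, negatives subtracted) and skips the input words themselves; this scoring is often called 3CosAdd. A minimal NumPy sketch of that idea with a hypothetical 2-D toy vocabulary, where one axis loosely encodes "royalty" and the other "male/female"; gensim's own implementation may differ in details.

```python
import numpy as np

# Hypothetical 2-D toy embeddings: axis 0 ~ "royalty", axis 1 ~ male (+) / female (-)
toy_vectors = {
    "king":   np.array([1.0, 1.0]),
    "queen":  np.array([1.0, -1.0]),
    "man":    np.array([0.0, 1.0]),
    "woman":  np.array([0.0, -1.0]),
    "throne": np.array([1.0, 0.0]),
    "apple":  np.array([-1.0, 0.2]),
}

def unit(v):
    return v / np.linalg.norm(v)

def analogy(positive, negative, vectors, topn=3):
    """3CosAdd: rank words by cosine similarity to sum(positive) - sum(negative)."""
    target = sum(unit(vectors[w]) for w in positive) - sum(unit(vectors[w]) for w in negative)
    target = unit(target)
    scores = {
        word: float(np.dot(unit(vec), target))
        for word, vec in vectors.items()
        if word not in positive and word not in negative  # input words are excluded
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:topn]

print(analogy(positive=["woman", "king"], negative=["man"], vectors=toy_vectors))
```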

In [None]:
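with open('data/questions-words.txt') as f:
    analogy_words = [line.split() for line in f]
# Keep only 4-word analogy lines; section headers such as ": capital-common-countries" have 2 tokens
analogy_words = [words for words in analogy_words if len(words) == 4]
random.seed(0)  # random.sample uses Python's `random` module, so np.random.seed would not make it reproducible
analogy_words = random.sample(analogy_words, 100)
X = [words[:3] for words in analogy_words]  # a, b, c in "a is to b as c is to d"
y = [words[3] for words in analogy_words]   # d, the expected answer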
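The `len(words) == 4` filter works because `questions-words.txt` separates its analogy groups with header lines such as `: capital-common-countries`. A sketch that keeps those headers as category labels instead of discarding them, which would allow reporting accuracy per relationship type; `load_analogies_by_category` is a hypothetical helper and the sample lines are illustrative input in the same format.

```python
from collections import defaultdict

def load_analogies_by_category(lines):
    """Group 4-word analogy questions under the ': section' header that precedes them."""
    by_category = defaultdict(list)
    category = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == ":":
            category = tokens[1]  # e.g. ": capital-common-countries"
        elif len(tokens) == 4:
            by_category[category].append(tokens)
    return dict(by_category)

# Illustrative lines in the same format as questions-words.txt
sample = """: capital-common-countries
Athens Greece Baghdad Iraq
Athens Greece Berlin Germany
: family
boy girl brother sister
""".splitlines()

analogies = load_analogies_by_category(sample)
print({name: len(questions) for name, questions in analogies.items()})
```

With the real file this could be called as `with open('data/questions-words.txt') as f: analogies = load_analogies_by_category(f)`.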

In [None]:
print(X[0], y[0])
print(X[10], y[10])
print(X[50], y[50])

In [None]:
is_correct_list = []
top_5_predictions_list = []
for components, answer in tqdm(zip(X, y), total=len(X)):
    # For an analogy a : b :: c : d, predict d as the word closest to b - a + c
    a, b, c = components
    # most_similar returns (word, similarity) pairs already sorted by descending similarity
    predictions = model.most_similar(positive=[b, c], negative=[a], topn=5)
    top_5_predictions = [word.lower() for word, _ in predictions]
    top_5_predictions_list.append(top_5_predictions)
    is_in_top_5 = 1.0 if answer.lower() in top_5_predictions else 0.0
    is_correct_list.append(is_in_top_5)

In [None]:
for i in range(10):
    components = X[i]
    answer = y[i]
    top5 = top_5_predictions_list[i]
    correct = is_correct_list[i]
    print("Components: {}, Answer: {}, Top5: {}, is_correct: {}".format(components, answer, top5, correct))

In [None]:
print("Accuracy on the analogy task:", np.mean(is_correct_list))
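The accuracy above is a top-k score: a question counts as correct if the expected word appears anywhere in the stored prediction list. A stricter measure is top-1 accuracy, where only the first prediction counts. A small hypothetical helper that computes either from ranked prediction lists; the example data is made up.

```python
def top_k_accuracy(answers, ranked_predictions, k):
    """Fraction of questions whose answer appears among the first k predictions (case-insensitive)."""
    hits = [
        answer.lower() in [p.lower() for p in predictions[:k]]
        for answer, predictions in zip(answers, ranked_predictions)
    ]
    return sum(hits) / len(hits)

# Made-up example: two questions, each with a ranked list of predictions
answers = ["queen", "Paris"]
ranked = [["queen", "princess", "monarch"], ["london", "paris", "rome"]]
print(top_k_accuracy(answers, ranked, k=1))  # 0.5
print(top_k_accuracy(answers, ranked, k=3))  # 1.0
```

With the notebook's variables, `top_k_accuracy(y, top_5_predictions_list, k=1)` would give the top-1 score for the same 100 sampled questions.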