# Explore Word Similarity with Embeddings

This tutorial uses learnt word embeddings to find the words most similar to a given word.

The tutorial has three steps:

1. Implement a cosine similarity function
2. Load the learnt word embeddings
3. Get the top similar words for a given word


<a href="https://colab.research.google.com/github/ksalama/data2cooc2emb2ann/blob/master/text2emb/03-Explore_Word_Similarity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [None]:
import os
import numpy as np

In [None]:
WORKSPACE = './workspace'
embeddings_file_path = os.path.join(WORKSPACE,'embeddings.tsv')

## 1. Cosine Similarity Function

In [None]:
def calculate_consine_similarty(emb1, emb2):
    # Cosine of the angle between the vectors: dot product divided by the product of their L2 norms.
    return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
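
As a quick sanity check of the formula, the sketch below evaluates cosine similarity on three hand-picked pairs of 2-D vectors. The helper name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook. Parallel vectors should score close to 1, perpendicular vectors close to 0, and opposite vectors close to -1.

```python
import numpy as np

def cosine_similarity(a, b):
    """Hypothetical helper: cosine similarity of two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors chosen for illustration only.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # opposite directions

print(same_direction, orthogonal, opposite)
```

Because the score depends only on direction, scaling either vector leaves it unchanged. Note that a zero vector would cause a division by zero here; this sketch assumes all inputs have a non-zero norm.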
    

## 2. Load Embeddings

In [None]:
def load_embeddings(embedding_file_path):
    # Each line is expected to hold a word followed by its tab-separated vector values.
    embedding_lookup = {}
    # Open the path passed as an argument rather than the global variable.
    with open(embedding_file_path) as embedding_file:
        for line in embedding_file:
            parts = line.rstrip('\n').split('\t')
            word = parts[0]
            embedding = [float(v) for v in parts[1:]]
            embedding_lookup[word] = embedding
    return embedding_lookup

In [None]:
embedding_lookup = load_embeddings(embeddings_file_path)
len(embedding_lookup)
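
To make the expected file layout concrete, here is a minimal sketch that writes a two-line TSV file to a temporary directory and reads it back. The layout (a word followed by tab-separated float values on each line) is inferred from the parsing code above. The function name `read_embeddings_tsv`, the file name, and the vector values are made up for illustration.

```python
import os
import tempfile

def read_embeddings_tsv(path):
    """Hypothetical loader: maps each word to its list of float values."""
    lookup = {}
    with open(path) as tsv_file:
        for line in tsv_file:
            parts = line.rstrip('\n').split('\t')
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            lookup[parts[0]] = [float(v) for v in parts[1:]]
    return lookup

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_embeddings.tsv')
    # Toy rows: word<TAB>value<TAB>value<TAB>value
    with open(toy_path, 'w') as tsv_file:
        tsv_file.write('king\t0.5\t0.1\t-0.3\n')
        tsv_file.write('queen\t0.4\t0.2\t-0.2\n')
    toy_lookup = read_embeddings_tsv(toy_path)

print(len(toy_lookup))
print(toy_lookup['king'])
```

The length of the lookup equals the number of distinct words in the file, which is what the cell above reports for the real embeddings.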

## 3. Get Top Similar Words

In [None]:
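def top_similar(word, k):
    # Score every vocabulary word (including the query word itself) against the query.
    input_word_embedding = embedding_lookup[word.lower()]
    outputs = []
    # Use a separate loop variable so the `word` argument is not overwritten.
    for candidate, embedding in embedding_lookup.items():
        similarity = calculate_consine_similarty(input_word_embedding, embedding)
        outputs.append((round(similarity, 3), candidate))

    # Highest similarity first; keep the top k entries.
    return sorted(outputs, reverse=True)[:k]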

In [None]:
word = 'king'
top_similar(word, 15)
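
For larger vocabularies, the per-word loop can become slow. One possible alternative, sketched below under assumptions, stacks all embeddings into a NumPy matrix, normalizes the rows once, and scores every word with a single matrix-vector product. The function name `top_similar_vectorized` and the toy 2-D embeddings are hypothetical, and the sketch assumes no embedding is a zero vector.

```python
import numpy as np

# Toy embeddings for illustration only; real embeddings come from the TSV file.
toy_lookup = {
    'king':  [1.0, 0.0],
    'queen': [0.9, 0.1],
    'apple': [0.0, 1.0],
}

def top_similar_vectorized(query, lookup, k):
    """Hypothetical helper: top-k words by cosine similarity to `query`."""
    words = list(lookup)
    matrix = np.array([lookup[w] for w in words], dtype=float)
    # Normalize each row to unit length so a dot product equals cosine similarity.
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = unit @ unit[words.index(query)]
    # Sort by descending score and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(words[i], float(scores[i])) for i in order]

result = top_similar_vectorized('king', toy_lookup, 2)
print(result)
```

As with the loop-based version, the query word itself appears first with a similarity of about 1; drop the first entry if only other words are wanted.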