# Explore Word Similarity with Embeddings

This tutorial explores word similarity using the learnt word embeddings.

The tutorial covers the following steps:

1. Implement a cosine similarity function
2. Load the learnt word embeddings
3. Get the most similar words for a given word


<a href="https://colab.research.google.com/github/ksalama/data2cooc2emb2ann/blob/master/text2emb/03-Explore_Word_Similarity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [1]:
import os
import numpy as np

In [2]:
WORKSPACE = './workspace'
embeddings_file_path = os.path.join(WORKSPACE,'embeddings.tsv')

## 1. Cosine Similarity Function

In [3]:
def calculate_consine_similarty(emb1, emb2):
    # Dot product of the two vectors divided by the product of their L2 norms.
    return np.dot(emb1, emb2)/(np.linalg.norm(emb1) * np.linalg.norm(emb2))
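
As a quick sanity check of the formula, the sketch below evaluates it on three hand-picked vector pairs. The function name `cosine_similarity` and the example vectors are illustrative assumptions, not part of the notebook; the body mirrors the function defined above.

```python
import numpy as np

def cosine_similarity(a, b):
    # Same formula as above: dot product divided by the product of the L2 norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical vectors chosen so the expected values are easy to reason about.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])      # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])    # opposite direction

print(parallel, orthogonal, opposite)
```

Vectors pointing in the same direction score close to 1, perpendicular vectors score 0, and vectors pointing in opposite directions score close to -1. The measure ignores vector length, so only direction matters.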
    

## 2. Load Embeddings

In [4]:
def load_embeddings(embedding_file_path):
    # Maps each word to its embedding vector (a list of floats).
    embedding_lookup = {}
    with open(embedding_file_path) as embedding_file:
        for line in embedding_file:
            # Each line holds a word followed by tab-separated vector values.
            parts = line.rstrip('\n').split('\t')
            word = parts[0]
            embedding = [float(v) for v in parts[1:]]
            embedding_lookup[word] = embedding
    return embedding_lookup
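
The parsing logic above implies a file layout of one word per line, followed by its vector values, all separated by tabs. The sketch below writes a tiny file in that layout and parses it the same way. The words, values, and the file name `sample_embeddings.tsv` are made up for illustration and are not taken from the real `embeddings.tsv`.

```python
import os
import tempfile

# Hypothetical two-word vocabulary with 3-dimensional vectors, for illustration only.
sample_rows = ["cat\t0.1\t0.2\t0.3\n", "dog\t0.4\t0.5\t0.6\n"]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_embeddings.tsv")
    with open(sample_path, "w") as sample_file:
        sample_file.writelines(sample_rows)

    # Parse the file: first field is the word, remaining fields are the vector.
    sample_lookup = {}
    with open(sample_path) as sample_file:
        for line in sample_file:
            parts = line.rstrip("\n").split("\t")
            sample_lookup[parts[0]] = [float(v) for v in parts[1:]]

print(sample_lookup)
```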


In [5]:
embedding_lookup = load_embeddings(embeddings_file_path)
len(embedding_lookup)

4632

## 3. Get Top Similar Words

In [6]:
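def top_similar(word, k):
    outputs = []

    # Embeddings are looked up by the lowercased input word.
    input_word_embedding = embedding_lookup[word.lower()]

    # Score every vocabulary word against the input word.
    for candidate in embedding_lookup:
        embedding = embedding_lookup[candidate]
        similarity = calculate_consine_similarty(input_word_embedding, embedding)
        outputs.append((similarity, candidate))

    # Highest similarity first; keep the top k (similarity, word) pairs.
    return sorted(outputs, reverse=True)[:k]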
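
The loop above computes one similarity per vocabulary word in Python. For larger vocabularies, one possible alternative is to stack the embeddings into a matrix, normalize each row once, and obtain all similarities with a single matrix-vector product. The following is a minimal sketch of that idea, not the notebook's method; `toy_lookup` and `top_similar_vectorized` are hypothetical names, and the toy vectors stand in for the loaded embeddings.

```python
import numpy as np

# Hypothetical toy lookup, standing in for the embedding_lookup loaded above.
toy_lookup = {
    "king": [1.0, 0.0],
    "queen": [0.9, 0.1],
    "car": [0.0, 1.0],
}

vocabulary = list(toy_lookup)
matrix = np.array([toy_lookup[w] for w in vocabulary])
# Normalize each row to unit length so a dot product equals cosine similarity.
unit_matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def top_similar_vectorized(word, k):
    # Look up the normalized vector of the query word.
    query = unit_matrix[vocabulary.index(word.lower())]
    # One matrix-vector product yields the similarity to every word.
    similarities = unit_matrix @ query
    # Indices of the k largest similarities, highest first.
    best = np.argsort(-similarities)[:k]
    return [(float(similarities[i]), vocabulary[i]) for i in best]

result = top_similar_vectorized("king", 2)
print([w for _, w in result])
```

As with the loop-based version, the query word itself ranks first, because its similarity with itself is 1.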
    

In [18]:
words = ['man', 'girl', 'happy', 'sad', 'movie', 'good', 'king', 'car']
for word in words:
    print("Input word: {}".format(word))
    print("==================")
    print([item[1] for item in top_similar(word, 10)])
    print("")

Input word: man
==================
['man', 'named', 'woman', 'person', 'boy', 'guy', 'young', 'girl', 'older', 'who']

Input word: girl
==================
['girl', 'boy', 'woman', 'young', 'named', 'teenage', 'daughter', 'beautiful', 'man', 'sexy']

Input word: happy
==================
['happy', 'quick', 'sad', 'horny', 'bored', 'believing', 'awake', 'listening', 'ending', 'roll']

Input word: sad
==================
['sad', 'ending', 'happy', 'touching', 'sweet', 'regardless', 'truth', 'uplifting', 'incredibly', 'strangely']

Input word: movie
==================
['movie', 'film', 'this', 'it', 'movies', 'horror', 'so', 'just', 'but', 'films']

Input word: good
==================
['good', 'pretty', 'bad', 'very', 'great', 'job', 'decent', 'guy', 'funny', 'but']

Input word: king
==================
['king', 'stephen', 'arthur', 'captain', 'jimmy', 'hopper', 'kennedy', 'eugene', 'george', 'philip']

Input word: car
==================
['car', 'chase', 'chases', 'accident', 'crash', 'boat', 'crashes', 'driving', 'gun', 'foot']

