## Visualizing Word Embeddings in TensorBoard



In the last section, we learned how to build a word2vec model for generating word embeddings using gensim.
Now we will see how to visualize those embeddings using TensorBoard. Visualizing word embeddings helps us understand the projection space and makes it easier to validate the embeddings. TensorBoard provides a built-in visualizer called the embedding projector for interactively visualizing and analyzing high-dimensional data such as our word embeddings. We will walk through, step by step, how to use TensorBoard's projector to visualize the word embeddings.


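Import the required libraries. Note that this section uses the TensorFlow 1.x API (tf.contrib, tf.logging, and tf.InteractiveSession are not available under these names in TensorFlow 2.x):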

In [1]:
import warnings
warnings.filterwarnings(action='ignore')


import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
tf.logging.set_verbosity(tf.logging.ERROR)

import numpy as np
import gensim
import os

Load the saved model:

In [2]:
file_name = "model/word2vec.model"
model = gensim.models.keyedvectors.KeyedVectors.load(file_name)

After loading the model, we store the number of words to visualize in a variable called max_size. The code below sets it to the vocabulary size minus one:

In [3]:
max_size = len(model.wv.vocab)-1

We learned that the dimension of the word vectors is $V \times N$, that is, the length of the vocabulary ($V$) $\times$ the number of neurons in the hidden layer ($N$). So, we initialize a matrix named w2v with max_size rows (one per word) and model.layer1_size columns, which is the number of neurons in the hidden layer:

In [4]:
w2v = np.zeros((max_size,model.layer1_size))
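To make the $V \times N$ layout concrete, here is a minimal sketch that builds the same kind of matrix from a toy vocabulary. The words and the 3-dimensional vectors are made up for illustration and do not come from the trained model:

```python
import numpy as np

# Hypothetical toy vocabulary with made-up 3-dimensional vectors
# (illustration only, not taken from the word2vec model above).
toy_vectors = {
    "king":   [0.9, 0.1, 0.3],
    "queen":  [0.8, 0.2, 0.4],
    "apple":  [0.1, 0.9, 0.5],
    "orange": [0.2, 0.8, 0.6],
}

V = len(toy_vectors)                       # vocabulary size (rows)
N = len(next(iter(toy_vectors.values())))  # embedding dimension (columns)

# One row per word, one column per embedding dimension.
matrix = np.zeros((V, N))
for i, (word, vec) in enumerate(toy_vectors.items()):
    matrix[i] = vec

print(matrix.shape)
```

Row i of the matrix holds the vector of the i-th word, which is the same convention the w2v matrix follows.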

Now we create a new file called metadata.tsv in which we save all the words in our model, one per line, and we also store the embedding of each word in the corresponding row of the w2v matrix:

In [5]:
if not os.path.exists('projections'):
    os.makedirs('projections')
    
with open("projections/metadata.tsv", 'w+') as file_metadata:
    
    for i, word in enumerate(model.wv.index2word[:max_size]):
        
        #store the embeddings of the word
        w2v[i] = model.wv[word]
        
        #write the word to a file 
        file_metadata.write(word + '\n')
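The projector matches line i of the metadata file to row i of the embedding matrix, so the two must be written in the same order. A single-column metadata file, like the one above, is written without a header row. The sketch below illustrates that alignment with made-up words and vectors. It also writes the vectors to a tab-separated file, which is an extra step not used in this section and is only an assumption about how you might want to inspect or share the data:

```python
import os
import tempfile

# Hypothetical toy data (illustration only).
words = ["king", "queen", "apple"]
vectors = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.1, 0.9, 0.5]]

out_dir = tempfile.mkdtemp()
vectors_path = os.path.join(out_dir, "vectors.tsv")
metadata_path = os.path.join(out_dir, "metadata.tsv")

with open(vectors_path, "w", encoding="utf-8") as f_vec, \
     open(metadata_path, "w", encoding="utf-8") as f_meta:
    for word, vec in zip(words, vectors):
        # One tab-separated vector per line ...
        f_vec.write("\t".join(str(x) for x in vec) + "\n")
        # ... and the matching label on the same line number, no header row.
        f_meta.write(word + "\n")

# Read both files back to confirm the rows line up.
with open(metadata_path, encoding="utf-8") as f:
    labels = f.read().splitlines()
with open(vectors_path, encoding="utf-8") as f:
    rows = f.read().splitlines()

print(len(labels), len(rows))
```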

Next, we initialize the TensorFlow session:

In [6]:
sess = tf.InteractiveSession()

Initialize the TensorFlow variable named embedding, which holds the word embeddings:

In [7]:
with tf.device("/cpu:0"):
    embedding = tf.Variable(w2v, trainable=False, name='embedding')

Initialize all variables:

In [8]:
tf.global_variables_initializer().run()

Create an object of the Saver class, which is used for saving and restoring variables to and from checkpoints:

In [9]:
saver = tf.train.Saver()

Using FileWriter, we write our summaries and events to an event file in the projections directory:

In [10]:
writer = tf.summary.FileWriter('projections', sess.graph)

Initialize the projector configuration and add an embedding entry to it:

In [11]:
config = projector.ProjectorConfig()
embed = config.embeddings.add()

Next, we set tensor_name to embedding, the name of our TensorFlow variable, and metadata_path to the metadata.tsv file that contains the words. The path is given relative to the projections log directory:

In [12]:
embed.tensor_name = 'embedding'
embed.metadata_path = 'metadata.tsv'

And finally, write the projector configuration and save the checkpoint:

In [13]:
projector.visualize_embeddings(writer, config)

saver.save(sess, 'projections/model.ckpt', global_step=max_size)

'projections/model.ckpt-28070'
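The steps above rely on TensorFlow 1.x. As a rough sketch of the same workflow under TensorFlow 2.x, assuming the tensorflow and tensorboard packages are installed, the projector plugin can be imported from tensorboard.plugins and the embedding saved with tf.train.Checkpoint. The toy words, vectors, and temporary log directory below are made up for illustration. With the trained model you would pass the w2v matrix and the projections directory instead:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

# Hypothetical toy data and log directory (illustration only).
words = ["king", "queen", "apple"]
vectors = np.array([[0.9, 0.1, 0.3],
                    [0.8, 0.2, 0.4],
                    [0.1, 0.9, 0.5]])
log_dir = tempfile.mkdtemp()

# Write the labels, one word per line, in the same order as the rows.
with open(os.path.join(log_dir, "metadata.tsv"), "w", encoding="utf-8") as f:
    for word in words:
        f.write(word + "\n")

# Save the embedding matrix in a checkpoint under the attribute name "embedding".
weights = tf.Variable(vectors)
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

# Point the projector at the saved tensor and the metadata file.
config = projector.ProjectorConfig()
embed = config.embeddings.add()
# Object-based checkpoints store the variable under this key.
embed.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embed.metadata_path = "metadata.tsv"
projector.visualize_embeddings(log_dir, config)

print(sorted(os.listdir(log_dir)))
```

In this variant no session, Saver, or FileWriter is needed. The log directory itself is passed to visualize_embeddings, which writes the projector configuration file there.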

Now, open the terminal and run the following command to start TensorBoard. Then open http://localhost:8000 in a browser and switch to the projector view:

tensorboard --logdir=projections --port=8000
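One way to validate embeddings in the projector is to select a word and check whether its nearest neighbors make sense. The same check can be sketched numerically with cosine similarity. The words and vectors below are made up for illustration, and nearest_neighbours is a hypothetical helper, not part of gensim or TensorBoard:

```python
import numpy as np

# Hypothetical toy data (illustration only).
words = ["king", "queen", "apple", "orange"]
vectors = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.6],
])

def nearest_neighbours(query, k=2):
    """Return the k words most similar to `query` by cosine similarity."""
    # Normalise every row to unit length so a dot product equals the cosine.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[words.index(query)]
    # Sort from most to least similar and drop the query word itself.
    order = np.argsort(-sims)
    return [(words[i], float(sims[i])) for i in order if words[i] != query][:k]

print(nearest_neighbours("king", k=1))
print(nearest_neighbours("apple", k=1))
```

With the trained gensim model, model.wv.most_similar offers a comparable lookup over the full vocabulary.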

Thus, visualizing word embeddings in TensorBoard helps us validate them easily. In the next section, we will learn how to convert paragraphs and documents to vectors using two different algorithms called PV-DM and PV-DBOW.