## Venezuela Word2Vec Example

In [1]:
import gensim

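Next, load the pretrained Google News word2vec model: 3 million words and phrases, each represented by a 300-dimensional vector. The file is in word2vec's binary format, hence `binary=True`, and it needs about 3.6 GB of RAM for the vectors alone.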

In [2]:
model = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True)

Next, retrieve the 15 words whose vectors are most similar to "Venezuela" (ranked by cosine similarity) and print them in two columns: the word and its similarity score. Change `topn=15` to display a different number of words.
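For intuition, here is a minimal pure-Python sketch of the ranking behind `most_similar` when you pass a single positive word: score every other word by cosine similarity to the query's vector and keep the `topn` highest. The helpers `cosine` and `most_similar` and the 3-D `toy_vectors` are hypothetical stand-ins; gensim's real method is vectorized over normalized vectors and also supports `negative` words, so treat this as an illustration rather than its implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(vectors, query, topn=3):
    """Rank every word except `query` by cosine similarity to it, highest first."""
    q = vectors[query]
    scores = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical 3-D toy vectors (the Google News vectors have 300 dimensions)
toy_vectors = {
    "Venezuela": [0.9, 0.1, 0.3],
    "Colombia":  [0.8, 0.2, 0.4],
    "Cuba":      [0.6, 0.5, 0.2],
    "banana":    [0.1, 0.9, 0.0],
}

for word, score in most_similar(toy_vectors, "Venezuela", topn=2):
    print(f"{word:>10}  {score:.6f}")
```

With these made-up vectors, "Colombia" ranks above "Cuba" and "banana" falls outside the top two.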

In [3]:
import pandas as pd
result = model.most_similar(positive=['Venezuela'], topn=15)

df = pd.DataFrame(result)
df_fixed = df.to_string(index=False, header=False)
print(df_fixed)

  Venezuelan  0.826737
     Bolivia  0.745722
     Ecuador  0.719962
    Colombia  0.696936
      Chávez  0.689667
      Chavez  0.686324
   Nicaragua  0.681040
  Vene_zuela  0.680571
 Venezuelans  0.677943
   Venezuala  0.674734
    Venzuela  0.672085
        Cuba  0.670436
    Venezula  0.664404
     Caracas  0.664318
   Venezeula  0.661068


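Next, look up the 300-dimensional vector of each of these words and reduce the vectors to 3 dimensions with PCA, fitting the model and transforming the data in one step.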
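What `fit_transform` does can be sketched in a few lines of NumPy: center the data, take its singular value decomposition, and project onto the top right-singular vectors. This sketch uses a hypothetical helper `pca_fit_transform` and random data standing in for the 15 × 300 word-vector matrix; its coordinates should match scikit-learn's up to the sign of each axis, since scikit-learn applies its own sign convention. It also returns the share of variance each kept axis explains (scikit-learn exposes the same numbers as `pca.explained_variance_ratio_`), which is worth checking because three axes may keep only part of the spread between the 15 points.

```python
import numpy as np

def pca_fit_transform(X, n_components=3):
    """Project the rows of X onto their top principal components (PCA via SVD).

    Returns the projected points and the fraction of total variance
    explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are the principal axes, ordered by singular value (largest first)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    projected = X_centered @ Vt[:n_components].T
    variance = S ** 2  # proportional to the variance along each axis
    explained_ratio = variance[:n_components] / variance.sum()
    return projected, explained_ratio

# Hypothetical stand-in for the 15 x 300 matrix of word vectors
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(15, 300))

points, ratio = pca_fit_transform(X_demo, n_components=3)
print(points.shape)  # (15, 3)
print(ratio)
```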

In [4]:
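from sklearn.decomposition import PCA

# Words returned by most_similar, in ranked order; these become the point labels
labels = [word for word, similarity in result]

# Stack their word vectors into a 15 x 300 array (one row per word).
# Indexing with a list of words works in gensim 3.x and 4.x; model.vocab was removed in 4.x.
X = model[labels]

# Reduce 300 dimensions to 3 for plotting (this overwrites the similarity list in `result`)
pca = PCA(n_components=3)
result = pca.fit_transform(X)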

In [5]:
# Check that the reduced data has the expected shape
print(result.shape)
# Expect (topn, 3): one row per word (15 here) and one column per principal component

(15, 3)


Next, set up the data for TensorBoard's 3D Embedding Projector. First, create a log folder and a metadata file that labels each point.
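If you only have TensorFlow 2.x, where `tensorflow.contrib` no longer exists, one alternative not used in this notebook is the standalone Embedding Projector at https://projector.tensorflow.org, which can load a tab-separated file of vectors plus a metadata file. The sketch below writes both; the helper `write_projector_tsv` and the file names are our own choices. It writes a single-column metadata file, which, as we understand the projector's convention, takes no header row (a header row is expected only with two or more columns, as in the "Index"/"Label" file this notebook writes).

```python
import os
import tempfile

def write_projector_tsv(points, labels, out_dir):
    """Write vectors.tsv (tab-separated coordinates, one point per line, no header)
    and metadata.tsv (one label per line, no header because it has one column)."""
    if len(points) != len(labels):
        raise ValueError("need exactly one label per point")
    os.makedirs(out_dir, exist_ok=True)
    vec_path = os.path.join(out_dir, "vectors.tsv")
    meta_path = os.path.join(out_dir, "metadata.tsv")
    with open(vec_path, "w", encoding="utf-8") as f:
        for row in points:
            f.write("\t".join(f"{x:.6f}" for x in row) + "\n")
    with open(meta_path, "w", encoding="utf-8") as f:
        for label in labels:
            f.write(label + "\n")
    return vec_path, meta_path

# Hypothetical 3-D points standing in for the PCA output; in the notebook you
# could pass result.tolist() and labels instead.
demo_points = [[0.12, -0.5, 0.3], [0.4, 0.1, -0.2]]
demo_labels = ["Venezuela", "Caracas"]
vec_path, meta_path = write_projector_tsv(demo_points, demo_labels, tempfile.mkdtemp())
print(vec_path, meta_path)
```

You could also write the full 300-dimensional vectors (`model[labels]`) instead of the PCA output and let the projector run its own dimensionality reduction.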

In [6]:
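import os
import tensorflow as tf
# This notebook uses the TensorFlow 1.x API; tf.contrib does not exist in TensorFlow 2.x
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'
# Uncomment the next two lines if the last cell fails to rename the checkpoint files,
# which can happen when you re-run all of the cells (this deletes the old logs)
#import shutil
#shutil.rmtree(LOG_DIR)
os.makedirs(LOG_DIR, exist_ok=True)  # open() and saver.save() need the folder to exist
metadata = os.path.join(LOG_DIR, 'metadata.tsv')

# TensorFlow variable holding the 3-D PCA coordinates the projector will display
images = tf.Variable(result)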

In [7]:
# Write the metadata file: a header row, then one index/label row per projected point
with open(metadata, 'w', encoding='utf-8') as f:
    f.write('"Index"\t"Label"\n')
    for i, label in enumerate(labels):
        f.write('{}\t"{}"\n'.format(i, label))

In [9]:
# Save the variable to a checkpoint and configure the embedding projector
with tf.Session() as sess:
    saver = tf.train.Saver([images])

    sess.run(images.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))

    # Point the projector at the saved variable and at the file holding its labels
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = images.name
    embedding.metadata_path = 'metadata.tsv'
    writer = tf.summary.FileWriter(LOG_DIR)
    projector.visualize_embeddings(writer, config)  # writes projector_config.pbtxt to LOG_DIR
    writer.close()

To check that the metadata file is correct, open `logs/metadata.tsv` in a text editor such as Notepad. It should start with the headers "Index" and "Label", followed by one row per word with its index and label.
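You can also run the same check from Python. This sketch, with a hypothetical helper `check_metadata`, parses the file with the standard `csv` module (which strips the double quotes this notebook writes around each field) and raises `ValueError` if the header or any index/label pair is off. The demo writes a small file the same way the metadata cell does, using three words from the output above; in the notebook you would call `check_metadata(metadata, labels)` instead.

```python
import csv
import os
import tempfile

def check_metadata(path, expected_labels):
    """Validate a two-column ("Index", "Label") metadata TSV and return its row count."""
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    if not rows or rows[0] != ["Index", "Label"]:
        raise ValueError(f"unexpected header: {rows[:1]}")
    body = rows[1:]
    if len(body) != len(expected_labels):
        raise ValueError(f"expected {len(expected_labels)} rows, found {len(body)}")
    for i, (row, expected) in enumerate(zip(body, expected_labels)):
        if row != [str(i), expected]:
            raise ValueError(f"row {i} is {row}, expected {[str(i), expected]}")
    return len(body)

# Demo: write a small file the same way the metadata cell does
demo_labels = ["Venezuelan", "Bolivia", "Chávez"]
demo_path = os.path.join(tempfile.mkdtemp(), "metadata.tsv")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write('"Index"\t"Label"\n')
    for i, label in enumerate(demo_labels):
        f.write('{}\t"{}"\n'.format(i, label))

print(check_metadata(demo_path, demo_labels))  # 3
```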

In [10]:
# In a terminal with your Anaconda environment activated, cd to this notebook's folder and run:
#     tensorboard --logdir=logs --host localhost
# Then open http://localhost:6006/ in your web browser and go to the Projector dashboard