# Latent Vector Exploration

The Variational Deep Semantic Hashing (VDSH) model learns latent vector representations of documents. These vectors are then used to hash documents, but they are also useful on their own for similarity search. Inspecting the nearest neighbors of a document is also a good sanity check that the network is learning.

In this notebook, I pass each document through the trained encoder to get its latent vector representation, and then I use [FAISS](https://github.com/facebookresearch/faiss) for similarity search. FAISS lets us query for nearest neighbors among the nearly 300k documents almost instantaneously.
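As background on the hashing step mentioned above, here is a minimal sketch of one way latent vectors can be turned into binary codes. It assumes binarization by thresholding each latent dimension at its median, which may differ from what this project's hashing code does. The helper names `binarize_by_median` and `hamming_distances` and the toy data are hypothetical and are not part of `src`.

```python
import numpy as np

def binarize_by_median(vectors):
    """Threshold each latent dimension at its median (assumed scheme)."""
    thresholds = np.median(vectors, axis=0)
    return (vectors > thresholds).astype(np.uint8)

def hamming_distances(codes, query_code):
    """Count differing bits between one code and every row of `codes`."""
    return np.count_nonzero(codes != query_code, axis=1)

# Toy stand-in for real latent vectors: 100 documents, 32 latent dimensions
rng = np.random.default_rng(0)
toy_latent = rng.standard_normal((100, 32))
toy_codes = binarize_by_median(toy_latent)
toy_hamming = hamming_distances(toy_codes, toy_codes[0])
```

Thresholding at the median keeps each bit roughly balanced between 0 and 1 across the corpus, so no single bit is wasted on a value that almost every document shares.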

In [1]:
import faiss
import numpy as np
from gensim.corpora import Dictionary
from src.utils.corpus import load_corpus, generate_tfidf
from src.models.vdsh import VDSH

Using TensorFlow backend.


In [2]:
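corpus = load_corpus()
# Build the vocabulary, dropping tokens that appear in fewer than 100 documents
# (gensim's defaults no_above=0.5 and keep_n=100000 also apply)
dictionary = Dictionary(corpus.bag_of_words)
dictionary.filter_extremes(no_below=100)
# Reassign token ids to close the gaps left by filtering
dictionary.compactify()
X = generate_tfidf(corpus, dictionary)
# Rebuild the network with the vocabulary size as input dimension, then load the trained weights
vdsh = VDSH()
vdsh.build_model(X.shape[1])
vdsh.load_weights('vdsh.hdf5')
# Encode every document into its latent vector
latent_vectors = vdsh.encoder_predict(X)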

In [3]:
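# Exact (brute-force) L2 index; FAISS expects C-contiguous float32 input
latent_vectors = np.ascontiguousarray(latent_vectors, dtype='float32')
index = faiss.IndexFlatL2(latent_vectors.shape[1])
index.add(latent_vectors)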
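`IndexFlatL2` performs an exact, exhaustive search over squared L2 distances. To make that concrete, the sketch below computes the same kind of result in plain NumPy on toy data. The function `brute_force_l2_search` and the `toy_*` arrays are illustrative names only, and this version is far slower than FAISS at the scale of this corpus.

```python
import numpy as np

def brute_force_l2_search(vectors, queries, k):
    """Return (squared L2 distances, row indices) of the k nearest vectors per query."""
    # Expand ||q - v||^2 as ||q||^2 - 2 q.v + ||v||^2 to avoid a 3-D intermediate
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ vectors.T
        + (vectors ** 2).sum(axis=1)
    )
    # Sort each row by distance and keep the k closest
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.take_along_axis(d2, idx, axis=1)
    return dist, idx

# Toy stand-in for the latent vectors: 1000 documents, 32 dimensions
rng = np.random.default_rng(0)
toy_vectors = rng.standard_normal((1000, 32)).astype("float32")
toy_D, toy_I = brute_force_l2_search(toy_vectors, toy_vectors[:1], k=5)
```

As with the FAISS index, querying with a vector that is itself in the index returns that vector first, at a distance of approximately zero.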

## Querying

In [4]:
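target = 283322  # row index of the query document
k = 10  # number of results to retrieve (the query itself is usually one of them)
# D holds squared L2 distances and I the matching row indices, both of shape (1, k)
D, I = index.search(latent_vectors[target].reshape((1, -1)), k)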

In [5]:
def print_paragraph(i):
    doc = corpus.iloc[i]
    print(doc.country_name, doc.year)
    print(doc.text)
    print('\n\n')

In [6]:
print_paragraph(target)
print('Nearest Neighbors:\n')
for i in I[0]:
    if i == target:
        continue
    print_paragraph(i)

United States Of America 2015
We know that ISIL — which emerged out of the chaos of Iraq and Syria — depends on perpetual war to survive, but we also know that they gain adherents because of a poisonous ideology. Part of our job, together, is to work to reject such extremism that infects too many of our young people. Part of that effort must be a continued rejection by Muslims of those who distort Islam to preach intolerance and promote violence. It must also involve a rejection by non-Muslims of the ignorance that equates Islam with terrorism.



Nearest Neighbors:

Bolivia 2014
It is also a discourse of extremist fanaticism.



Jordan 2014
Another critical global focus must be a decisive 
affirmation of mutual respect within and among 
religions and peoples. The teachings of true Islam 
are clear: sectarian conflict and strife are utterly 
condemned. Islam prohibits violence against Christians 
and the other communities that make up each country.



Pakistan 1993
Instead of civility 