## Check which sentences are similar to your test sentence
How similar is "Can I reset the cat" to a sentence in our Quora dataset?
Run the notebook and it will return the sentences most similar to a test sentence.
Then enter your own sentence in the **Enter your own test sentence** cell and re-run that cell.

### Let's import some libraries

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import pandas as pd
import json

### Download the USE module
Although we have already created the embeddings for our test dataset, we still need to download the USE (Universal Sentence Encoder) module to encode new test sentences.<br>
The first time you run this cell the module is downloaded; after that it is cached.<br>
As a result, the first run might take a few seconds to download the module, but later runs will be much faster.<br>
For more info on this see https://www.tensorflow.org/hub/basics
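
If you want to control where the module is cached, TensorFlow Hub reads the `TFHUB_CACHE_DIR` environment variable. The following is a minimal sketch, not part of the original notebook: the helper name `set_hub_cache` and the demo directory are assumptions chosen for illustration, and in practice you would pass a permanent path.

```python
import os
import tempfile

def set_hub_cache(path):
    """Point TensorFlow Hub's module cache at `path` (hypothetical helper)."""
    path = os.path.abspath(os.path.expanduser(path))
    os.makedirs(path, exist_ok=True)
    # The variable is read when a module is loaded,
    # so set it before calling hub.Module(...).
    os.environ["TFHUB_CACHE_DIR"] = path
    return path

# Demo only: a throwaway directory under the system temp folder.
cache_path = set_hub_cache(os.path.join(tempfile.gettempdir(), "tfhub_cache_demo"))
print(cache_path)
```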

In [10]:
module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/3"
embed = hub.Module(module_url)

### Create an interactive TF session

In [3]:
session = tf.InteractiveSession()
session.run(tf.global_variables_initializer())
session.run(tf.tables_initializer())

### Restore the saved embeddings
Load the precomputed embeddings, which are saved in a pickle file.

In [4]:
df = pd.read_pickle("./embeddings.pkl") 

In [5]:
df.head()

| | questions | embeddings |
|---|---|---|
| 0 | Astrology: I am a Capricorn Sun Cap moon and c... | [-0.043053239583969116, 0.05912359803915024, -... |
| 1 | How can I be a good geologist? | [0.04468201845884323, 0.0032161467242985964, 0... |
| 2 | How do I read and find my YouTube comments? | [0.031296346336603165, -0.002411748981103301, ... |
| 3 | What can make Physics easy to learn? | [0.0734066367149353, 0.055123668164014816, 0.0... |
| 4 | What was your first sexual experience like? | [0.05466366186738014, -0.003594229696318507, -... |


### Get the similarity score for our test sentence and the saved embeddings

In [6]:
sts_input1 = tf.placeholder(tf.string, shape=(None))
sts_encode2 = tf.placeholder(tf.float32)

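# Normalize the test-sentence embedding exactly; the raw module output
# is only approximately unit length.
sts_encode1 = tf.nn.l2_normalize(embed(sts_input1), axis=1)

# Dot product with each saved embedding, which equals the cosine similarity
# when both vectors have unit length.
cosine_similarities = tf.reduce_sum(tf.multiply(sts_encode1, sts_encode2), axis=1)
clip_cosine_similarities = tf.clip_by_value(cosine_similarities, 0.0, 1.0)
# Convert the angle between the vectors into a similarity score.
sim_scores = 1.0 - tf.divide(tf.acos(clip_cosine_similarities), np.pi)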

def get_scores(session, text_a, text_b):
    """Return the similarity scores between text_a and the saved embeddings text_b."""
    scores = session.run(
        [sim_scores],
        feed_dict={
            sts_input1: text_a,
            sts_encode2: text_b
        })
    return scores
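
To make the score easier to reason about, here is a minimal pure-Python sketch of the same idea for a single pair of vectors. It is an illustration rather than the notebook's implementation: the function name `angular_similarity` is an assumption, and it normalizes both vectors, whereas the graph above normalizes only the test-sentence embedding.

```python
import math

def angular_similarity(u, v):
    """Return 1 - arccos(cosine similarity) / pi for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_sim = dot / (norm_u * norm_v)
    # Clip to [0, 1], mirroring the clip_by_value step in the graph.
    cos_sim = min(1.0, max(0.0, cos_sim))
    return 1.0 - math.acos(cos_sim) / math.pi

print(angular_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(angular_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Because the cosine is clipped to [0, 1], the score ranges from 0.5 (orthogonal or opposed vectors) to 1.0 (same direction).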

### Check the similarity of the sentence pairs

In [7]:
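def get_results(session, sentence, num):
    """Return the `num` saved questions most similar to `sentence`."""
    examples = df['embeddings'].tolist()
    scores = get_scores(session, [sentence], examples)
    # The 'cosine' column holds the similarity score derived from the cosine.
    df['cosine'] = scores[0].tolist()
    return df.sort_values('cosine', ascending=False).head(n=num)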
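
The ranking step itself is independent of pandas. As a rough sketch under that framing, the same "keep the highest-scoring sentences" logic can be written with the standard library; `top_matches` and the sample data are hypothetical and only serve to illustrate the idea.

```python
import heapq

def top_matches(sentences, scores, num=3):
    """Return the `num` (score, sentence) pairs with the highest scores, best first."""
    return heapq.nlargest(num, zip(scores, sentences), key=lambda pair: pair[0])

# Made-up scores for illustration only.
sample_sentences = ["a", "b", "c", "d"]
sample_scores = [0.2, 0.9, 0.5, 0.7]
print(top_matches(sample_sentences, sample_scores, num=2))
```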

### Print out the result in a nice format

In [8]:
def print_res(test, num=20):
    """Print the `num` best matches for `test`, with scores in red and sentences in blue."""
    res = get_results(session, test, num).round(4)
    res = res.set_index('cosine').drop(columns=['embeddings'])
    print('{}\n'.format(test))
    print('\x1b[31mScore  \x1b[0m: \x1b[34m Matching sentence\x1b[0m')

    # The index is the score; each row holds the matching question.
    for score, row in res.iterrows():
        print('\x1b[31m{:<6} \x1b[0m: \x1b[0m \x1b[34m{}\x1b[0m'.format(score, row['questions']))

### Enter your own test sentence
Open the test dataset and look at some of the sentences. <br>
Then try entering your own sentences and see if the matches make sense. <br>
What about misspellings? Word order? Different words with similar meanings? <br>
How do these affect the cosine similarity score? <br>
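
One way to explore these questions systematically is to generate a few perturbed versions of a sentence and score each one. The sketch below is only a suggestion: `make_variants` is a hypothetical helper, and the perturbations (reversed word order, swapped leading letters) are arbitrary choices.

```python
def make_variants(sentence):
    """Return simple perturbations of `sentence` for probing the similarity score."""
    words = sentence.split()
    return {
        "original": sentence,
        # Reverse the word order to test sensitivity to ordering.
        "reordered": " ".join(reversed(words)),
        # Swap the first two letters of words longer than three characters to mimic typos.
        "misspelled": " ".join(
            w[1] + w[0] + w[2:] if len(w) > 3 else w for w in words
        ),
    }

for name, text in make_variants("what is purpose?").items():
    print('{:<10}: {}'.format(name, text))
```

Each variant can then be passed to `print_res` in the cell below to compare the scores.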

In [9]:
"""
Simply use print_res("what is purpose?") to return the top 20 best matches
Or use print_res("what is purpose?", 100) to choose how many best matches to return
"""
print_res("what is purpose?", 25)

what is purpose?

Score  : Matching sentence
0.8483 : What is purpose of life?
0.8197 : What is the meaning and purpose to life?
0.8129 : What's the purpose of life? What is life actually about?
0.7922 : What do you feel is the purpose of life?
0.7912 : What the meaning of this all life?
0.7879 : What is the meaning of life? Whats our purpose on Earth?
0.7865 : What's are the meaning of life?
0.762  : What is the exact meaning of life?
0.7548 : Do we truly have any purpose in life? Or do we create a purpose to make ourselves feel significant in the very vast world, or to make ourselves feel that our existence in the vast world is required?
0.742  : Why is creativity important?
0.7267 : What is the essence of enlightenment?

### Need some inspiration?
Print some random sentences from the Quora dataset and see if you can alter them and still get a high similarity score.

In [34]:
# Sample five random questions and number them from 1.
for i, question in enumerate(df.sample(n=5)['questions'], start=1):
    print('{:2}: {}'.format(i, question))

 1: Have you ever been fired?
 2: Why do we get sea breezes in summer but rarely in winter?
 3: How will the implementation of GST bill impact the lives of common people?
 4: Is my boyfriend lying about his true feelings for his friend and is he secretly attracted to her?
 5: You have given all statement as correct in UNCCD question in CSE prelim 2016. While many coachings have taken 2nd statement as wrong.?
