# Experimenting with Deep Scite

Recall the model, for reference.

![](deep-scite-model-1.png)

In [1]:
import os
import tensorflow as tf
from deepscite import model
from deepscite import utils
from deepscite import train
import ruamel.yaml

In [2]:
base_dir = "../"
data_dir = os.path.join(base_dir, "data/noon/")

Let's define the parameters we want to use during training/inference.

In [3]:
# Update the `conf` global state that we use in various places in the model.
conf = tf.app.flags.FLAGS

conf.embedded_word_size  = 150
conf.word_vector_size    = 200
conf.conv_size           = 3
conf.conv_stride         = 1
conf.conv_features       = 1
conf.iterations          = 300
conf.learning_rate       = 1e-3
conf.weights_reg_scale   = 1e-6
conf.activity_reg_scale  = 1e-6
conf.embedding_reg_scale = 1e-6
conf.save_path           = os.path.join(base_dir, "./checkpoints/noon")
conf.log_path            = "/tmp/tf-checkpoints/deepscite-noon"
conf.data_dir            = data_dir

checkpoint_path    = os.path.join(base_dir, "checkpoints/noon/")

## Training Step

In [4]:
conf.minibatch_size = 1000
tf.reset_default_graph()
train.main(_)

Initialising new model...
Iteration #0, Loss: 0.6931463479995728, α: 0.5.
Checkpointed: /tmp/tf-checkpoints/deepscite-noon/checkpoint-0.
Iteration #0, Validation-set accuracy: 0.49799999594688416.
Iteration #10, Loss: 1.023919701576233, α: 0.5001515746116638.
Checkpointed: /tmp/tf-checkpoints/deepscite-noon/checkpoint-10.
Iteration #10, Validation-set accuracy: 0.8220000267028809.
Iteration #20, Loss: 0.7489159107208252, α: 0.5026029944419861.
Checkpointed: /tmp/tf-checkpoints/deepscite-noon/checkpoint-20.
Iteration #20, Validation-set accuracy: 0.8389999866485596.
Iteration #30, Loss: 0.6852636337280273, α: 0.5056034922599792.
Checkpointed: /tmp/tf-checkpoints/deepscite-noon/checkpoint-30.
Iteration #30, Validation-set accuracy: 0.8429999947547913.
Iteration #40, Loss: 0.6179088950157166, α: 0.5090567469596863.
Checkpointed: /tmp/tf-checkpoints/deepscite-noon/checkpoint-40.
Iteration #40, Validation-set accuracy: 0.8479999899864197.
Iteration #50, Loss: 0.566624104976654, α: 0.5127959
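
The loss logged at iteration 0 is very close to ln 2. That is what a binary cross-entropy loss gives when an untrained classifier outputs a probability of 0.5. The sketch below checks the arithmetic. It assumes the model's loss is dominated by a cross-entropy term, which the log does not confirm; `binary_cross_entropy` is an illustrative helper, not part of `deepscite`.

```python
import math

def binary_cross_entropy(p, y):
    """Cross-entropy loss for one example with predicted probability p and label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A classifier that outputs 0.5 for every input pays ln(2) per example,
# whatever the true label is.
chance_loss = binary_cross_entropy(0.5, 1)

# Value copied from the "Iteration #0" line of the training log.
logged_loss = 0.6931463479995728

print(chance_loss)
print(abs(chance_loss - logged_loss) < 1e-5)
```

Under that assumption, the iteration-0 validation accuracy of about 0.498 is also what an uninformed classifier would score on a roughly balanced validation set.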

Let's feed a single paper (title, abstract) into DeepScite and see what it thinks.

## Inference step

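Re-arrange the examples below or enter your own title and abstract to see what DeepScite, trained on Noon's data, will think of them! Only the last `title` and `abstract` assigned in the cell are used.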

In [29]:
arxiv_id = "1609.05011"
title = "Convex separation from convex optimization for large-scale problems"
abstract = r"""
    We present a scheme, based on Gilbert's algorithm for quadratic minimization
    [SIAM J. Contrl., vol. 4, pp. 61-80, 1966], to prove separation between a
    point and an arbitrary convex set S⊂ℝ^n via calls to an oracle able to
    perform linear optimizations over S. Compared to other methods, our scheme
    has almost negligible memory requirements and the number of calls to the
    optimization oracle does not depend on the dimensionality n of the underlying
    space. We study the speed of convergence of the scheme under different
    promises on the shape of the set S and/or the location of the point,
    validating the accuracy of our theoretical bounds with numerical
    examples. Finally, we present some applications of the scheme in
    quantum information theory. There we find that our algorithm
    out-performs existing linear programming methods for certain large
    scale problems, allowing us to certify nonlocality in bipartite
    scenarios with up to 42 measurement settings. We apply the algorithm
    to upper bound the visibility of two-qubit Werner states, hence
    improving known lower bounds on Grothendieck's constant K_G(3).
    Similarly, we compute new upper bounds on the visibility of GHZ
    states and on the steerability limit of Werner states for a fixed
    number of measurement settings.
    """



# Bad one
title = "Quantum integration technique for closed quantum groups"
abstract = "Here we present a new integration technique in functional analysis that ..."




# Very good
title = "Quantum algorithm for computing the determinant"
abstract = r"""
    We present a quantum algorithm to compute the determinant in Polynomial
    time. It has long been known ...
"""



We need to convert the text into the format the model expects. Each word is mapped to the index of its vector in the word embedding matrix (i.e. its index in the `vocab.txt` file).

![](deep-scite-model-with-vectors.png)

In [30]:
vocab_list = utils.load_vocabulary(data_dir)
# Map each word to its position in the vocabulary list.
vocab_dict = {w: k for k, w in enumerate(vocab_list)}

In [31]:
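def get_wordids_for(s):
    """Return the vocabulary indices of the known words in `s` as a space-separated string."""
    # Words that are not in the vocabulary are silently dropped.
    r = [vocab_dict[w] for w in utils.to_words(s) if w in vocab_dict]
    if not r:
        raise Exception("Found no words at all!")
    return " ".join(map(str, r))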
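
To make the mapping concrete, here is a minimal self-contained sketch on a toy vocabulary. `toy_vocab` and the `to_words` tokenizer are assumptions made for illustration; the real `utils.to_words` and `vocab.txt` may behave differently.

```python
# Toy stand-ins: `toy_vocab` plays the role of vocab.txt, and `to_words`
# is a hypothetical tokenizer, not the real utils.to_words.
toy_vocab = ["quantum", "algorithm", "determinant", "computing"]
toy_vocab_dict = {word: index for index, word in enumerate(toy_vocab)}

def to_words(text):
    """Lower-case and split on whitespace (an assumed, simplified tokenizer)."""
    return text.lower().split()

def wordids_for(text, vocab):
    """Return the space-separated vocabulary indices of the known words in `text`."""
    ids = [vocab[w] for w in to_words(text) if w in vocab]
    if not ids:
        raise ValueError("Found no words at all!")
    return " ".join(map(str, ids))

toy_ids = wordids_for("Quantum algorithm for computing the determinant", toy_vocab_dict)
print(toy_ids)  # → 0 1 3 2
```

Note that "for" and "the" are missing from the toy vocabulary, so they leave no trace in the output: the id string is shorter than the original sentence.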

In [32]:
inputs = [ {"id": arxiv_id, 
            "wordset_1_ids": get_wordids_for(title), 
            "wordset_2_ids": get_wordids_for(abstract) } ]

## Load the model and emit a prediction

In [33]:
m = model.JointEmbeddingModelForBinaryClassification(conf.embedded_word_size)

# TensorFlow uses a lot of global state. As a result, if we
# wish to re-run this cell many times, we need this statement
# here to ensure nothing is carried over from a previous run.
tf.reset_default_graph()

conf.minibatch_size      = 1 # We're only inputting one piece of data - a single paper.

with tf.Session() as sess:
    
    model_params = m.graph(
        conf.minibatch_size,
        len(vocab_list),
        conf.word_vector_size,
        conf.conv_size,
        conf.conv_stride,
        conf.conv_features
    )
    
    # Load the trained weights
    saver = tf.train.Saver()
    checkpoint = tf.train.latest_checkpoint(checkpoint_path)
    
    if not checkpoint:
        raise Exception("Couldn't find checkpoint at: {}".format(checkpoint_path))
    
    saver.restore(sess, checkpoint)
    
    X1, X2, _, M1, M2, S1, S2, subset = train.get_datapoints(inputs)
    data = {model_params.wordset_1: X1,
            model_params.wordset_2: X2,
            model_params.wordset_1_masks: M1,
            model_params.wordset_2_masks: M2,
            model_params.wordset_1_lengths: S1,
            model_params.wordset_2_lengths: S2}
    

    # Calculate the recommendations
    set1_activations, set2_activations, final_probs, alpha = sess.run([
        tf.squeeze(model_params.conv_wordset_1_activity, [2,3]),
        tf.squeeze(model_params.conv_wordset_2_activity, [2,3]),
        model_params.final_probs,
        model_params.alpha], 
        feed_dict=data)

## With what probability would Noon *scite* this paper?

In [34]:
final_probs[0]

0.95597804

## Why?

In [40]:
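# Keep only in-vocabulary words, exactly as get_wordids_for does, so that
# position k in the activations refers to the same word as position k here.
title_words    = [w for w in utils.to_words(title) if w in vocab_dict]
abstract_words = [w for w in utils.to_words(abstract) if w in vocab_dict]

# Activation magnitude beyond which a word counts as notably good or bad.
threshold = 5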

good_title_words = [ title_words[k] 
                    for k, v in enumerate(set1_activations[0]) if v > threshold]
bad_title_words  = [ title_words[k] 
                    for k, v in enumerate(set1_activations[0]) if v < -threshold]

good_abstract_words = [ abstract_words[k] 
                       for k, v in enumerate(set2_activations[0]) if v > threshold]
bad_abstract_words  = [ abstract_words[k] 
                       for k, v in enumerate(set2_activations[0]) if v < -threshold]

In [41]:
good_title_words

['quantum', 'algorithm']

In [42]:
bad_title_words

[]

In [43]:
good_abstract_words

['quantum', 'algorithm']

In [44]:
bad_abstract_words

[]
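
A fixed threshold of 5 hides how close the remaining words came to it. As an alternative view (a different technique from the thresholding above, not something the notebook does), one can rank the words by activation and look at the top few. The words and activation values below are hypothetical; in the notebook they would come from `title_words` and `set1_activations[0]`.

```python
# Hypothetical per-word activations, made up for illustration.
words = ["quantum", "algorithm", "for", "computing", "the", "determinant"]
activations = [7.2, 6.1, 0.3, 2.4, -0.1, 3.9]

def top_words(words, activations, k=3):
    """Return the k (word, activation) pairs with the largest activations."""
    ranked = sorted(zip(words, activations), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

ranked_top = top_words(words, activations)
print(ranked_top)  # → [('quantum', 7.2), ('algorithm', 6.1), ('determinant', 3.9)]
```

Using `zip` also means a length mismatch between the word list and the activations truncates quietly instead of raising an `IndexError`, which may or may not be what you want.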

## Weighting parameter

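The final probability p is a weighted average of the title and abstract scores, with the weight α learned during training:

$$
    p = \alpha \cdot \text{titles} + (1-\alpha) \cdot \text{abstracts}
$$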

In [28]:
alpha

0.57741034
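
To get a feel for what this value of α means, the sketch below plugs it into the weighting formula. The two scores are hypothetical numbers chosen for illustration, and `combine` is an illustrative helper, not a function from `deepscite`.

```python
alpha = 0.57741034  # the value printed above

def combine(title_score, abstract_score, alpha):
    """Weighted average of the two scores, following p = α·titles + (1-α)·abstracts."""
    return alpha * title_score + (1 - alpha) * abstract_score

# Hypothetical scores: a strong title paired with a lukewarm abstract.
title_score, abstract_score = 0.9, 0.5
p = combine(title_score, abstract_score, alpha)
print(p)
```

With α slightly above 0.5, the title counts for a little more than the abstract, so the combined value lands closer to the title score than a plain average would.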