This is a mockup of a node2vec model built with the stellar graph ml library.

In [None]:
import datetime  # used below to timestamp the model save path

import tensorflow as tf
import numpy as np
import networkx as nx

import stellar_ml as sml

Steps:
- Load a graph with GraphLoader
- Create an instance of the random walk generator (a class imported from the library) and generate random walks on the graph, returning random_walks
- Create an instance of the (node, context) pairs generator (a class imported from the library) and connect it to random_walks
- Build a tensorflow model for word2vec:
    - create placeholders for the input (node,context) pairs
    - create the layers: embeddings, context_softmax - use layers from the library, or layers from tf or keras?
    - create the loss:
            nce_loss = tf.reduce_mean(
                tf.nn.nce_loss(weights=nce_weights,
                               biases=nce_biases,
                               labels=train_context,
                               inputs=embed,
                               num_sampled=num_sampled,
                               num_classes=vocabulary_size))
    - create an optimizer
- Train the word2vec model in a session
    - create minibatches of (node,context) pairs using the generator
    - loop through the minibatches, feeding them to the model
    - optimise the model's loss
    - repeat for n_epoch epochs
- Extract the final node embeddings

Load the graph:

In [None]:
# The loader should support multiple graph formats (EPGM, networkx, graphml) and
# return a networkx graph (probably the most general type, MultiDiGraph).
g = sml.graphloader.load('path/to/graph')

assert isinstance(g, nx.Graph)   # g should be a networkx graph object (MultiDiGraph subclasses nx.Graph)

Create an instance of the graph random walk generator and use it to generate random walks on g:

In [None]:
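numWalks, walkLength, p, q, seed = 10, 80, 1.0, 1.0, 42   # placeholder values, not tuned for any graph
rw = sml.graph_exploration.UniformRandomWalk(g, numWalks, walkLength, p, q, seed)
walks = rw.generate()   # perform random walks on g and return a list of walks (node sequences)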
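
For intuition about what `rw.generate()` might do internally, here is a minimal sketch of uniform random walks, i.e. the p = q = 1 case of node2vec with no return or in-out bias. The function name `uniform_random_walks` and the adjacency-dict input are assumptions made for illustration and are not the sml API; for a networkx graph, `nx.to_dict_of_lists(g)` produces a suitable mapping.

```python
import random

def uniform_random_walks(adjacency, num_walks, walk_length, seed=None):
    """Return num_walks walks of at most walk_length nodes from every node.

    adjacency: dict mapping each node to a list of its neighbours
    (hypothetical input format chosen to keep the sketch dependency-free).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adjacency)
        rng.shuffle(nodes)              # visit start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:      # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# toy graph: a path a - b - c plus an isolated node d
toy_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
toy_walks = uniform_random_walks(toy_adjacency, num_walks=2, walk_length=5, seed=42)
```

Each node serves as a start node once per pass, so the result holds `num_walks * len(adjacency)` walks; walks that hit a node without neighbours are simply shorter.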

In [None]:
RSEED = 42
batch_size = 128
num_skips = 2
skip_window = 1

embedding_size = 128

vocab_size = len({node for walk in walks for node in walk})    # number of unique nodes (walks is a list of node sequences)

# create an instance of the skip-gram batch generator
skipgram_batchgen = sml.SkipGram(data=walks, batch_size=batch_size,
                                 num_skips=num_skips, skip_window=skip_window)
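
As a rough illustration of what a skip-gram batch generator produces, the sketch below turns walks into (node, context) pairs and groups them into minibatches. The names `skipgram_pairs` and `minibatches` are hypothetical, not part of sml. Unlike the `num_skips` parameter above, which presumably limits how many contexts are sampled per node, this sketch emits every context inside the window.

```python
def skipgram_pairs(walks, skip_window):
    """Yield (node, context) pairs for every context within skip_window steps."""
    for walk in walks:
        for i, node in enumerate(walk):
            lo = max(0, i - skip_window)
            hi = min(len(walk), i + skip_window + 1)
            for j in range(lo, hi):
                if j != i:              # a node is not its own context
                    yield node, walk[j]

def minibatches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most batch_size pairs."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                           # final, possibly smaller, batch
        yield batch

example_walks = [["a", "b", "c"], ["c", "d"]]
example_pairs = list(skipgram_pairs(example_walks, skip_window=1))
example_batches = list(minibatches(example_pairs, batch_size=4))
```

With `skip_window=1`, the two example walks yield six pairs, which are split into one batch of four and one of two.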

Create a tensorflow model for word2vec using the sml.W2V_Sampled() class (e.g., the one defined in Andrew's node2vec_pregen.py code, https://github.com/adocherty/node2vec_experiments), assuming this class is part of the sml library:

In [None]:
word2vec = sml.W2V_Sampled(
        embedding_size=embedding_size,
        vocabulary_size=vocab_size,
        batch_size=batch_size,
        val_batch_size=None,
        neg_samples=2,
        save_path="n2v_{}".format(datetime.date.today()),
        learning_rate=0.2
        )

Alternatively, if W2V_Sampled() is NOT a part of the sml library, build a word2vec model (the tensorflow computation graph and the .train method) using tensorflow or Keras layers...
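
To make the training objective concrete independently of that choice, here is a minimal sketch of skip-gram training in plain NumPy. It is not W2V_Sampled and not a tensorflow model: it swaps in a simple negative-sampling loss optimised by per-pair SGD in place of `tf.nn.nce_loss`, and the function name, default hyperparameters, and toy pairs are all assumptions made for illustration.

```python
import numpy as np

def train_skipgram_neg_sampling(pairs, vocabulary_size, embedding_size=8,
                                neg_samples=2, learning_rate=0.05,
                                n_epochs=5, seed=42):
    """Train skip-gram embeddings with negative sampling via plain SGD.

    pairs: list of (node_index, context_index) integer tuples.
    Returns (embeddings, losses), losses being the mean loss per epoch.
    """
    rng = np.random.default_rng(seed)
    # input (node) embeddings start small and random, context weights at zero
    embeddings = rng.uniform(-0.5, 0.5, (vocabulary_size, embedding_size)) / embedding_size
    context_weights = np.zeros((vocabulary_size, embedding_size))
    losses = []
    for _ in range(n_epochs):
        total = 0.0
        for node, context in pairs:
            # the true context first, then uniformly sampled negatives
            negatives = rng.integers(0, vocabulary_size, size=neg_samples)
            targets = np.concatenate(([context], negatives))
            labels = np.zeros(len(targets))
            labels[0] = 1.0
            v = embeddings[node].copy()
            u = context_weights[targets]            # fancy indexing returns a copy
            probs = 1.0 / (1.0 + np.exp(-(u @ v)))  # sigmoid of the logits
            total += -np.log(probs[0] + 1e-10) - np.log(1.0 - probs[1:] + 1e-10).sum()
            grad = probs - labels                   # d(loss)/d(logit) per target
            embeddings[node] -= learning_rate * (grad @ u)
            # subtract.at accumulates correctly when targets contains repeats
            np.subtract.at(context_weights, targets, learning_rate * np.outer(grad, v))
        losses.append(total / len(pairs))
    return embeddings, losses

# toy (node, context) index pairs from a path graph 0 - 1 - 2 - 3
toy_pairs = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
toy_embeddings, toy_losses = train_skipgram_neg_sampling(toy_pairs, vocabulary_size=4)
```

Row `i` of the returned matrix is the embedding of node index `i`. A tensorflow or Keras version would express the same two weight matrices as layers and leave the gradient computation to the optimizer.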

Train the word2vec model by feeding it batches generated with skipgram_batchgen:

In [None]:
freeze_context_indices = None
freeze_indices = None
checkpoint_file = None

with tf.Session() as session, tf.device('/cpu:0'):
    tf.set_random_seed(RSEED)

    word2vec.train(session, skipgram_batchgen,
                   freeze_indices=freeze_indices,
                   freeze_context_indices=freeze_context_indices,
                   restore_from_file=checkpoint_file,
                   n_epochs=10)