# Node Embeddings and Skip Gram Examples

**Purpose:** - To explore embedding methods used in label prediction for social networks. This will include a short exposition on the relation of natural language processing to network analysis.

**Introduction-** Node embedding methods are a commonly used method for node classification for social networks. This modeling method employs a feature engineering method call skip-gram modeling to represent the relationship of each node in the network in $N$ dimensional vector space. This vector can be thought of as a low dimensional representation of each node. Nodes which are more closely associated with one another will be clustered more closely together in our vector representation.




## Root in natural language processing

This method draws from research in natural language processing where we try to anticipate nearby words based on a corpus used to build the model. Consider the following example system:
> The Guadeloupe amazon is a hypothetical extinct species of parrot that is thought to have been endemic to the Lesser Antillean island region of Guadeloupe. Described by 17th- and 18th-century writers, it is thought to have been related to, or possibly the same as, the extant imperial amazon. 

In natural language processing, one strategy we can use is to define a window of size $w$. These will be used to define an association of words. For example, if we use a window of 3 words we get the following:

In [34]:
import io
import os
import re
import shutil
import string
import tensorflow as tf
import numpy as np
from datetime import datetime
from tensorflow.keras import Model, Sequential
from tensorflow.keras.layers import Activation, Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras.layers import Activation, Dense, Dot, Embedding, Flatten, GlobalAveragePooling1D, Reshape
 

text_example = "The Guadeloupe amazon is a hypothetical extinct species of parrot that is thought to have been endemic to the Lesser Antillean island region of Guadeloupe. Described by 17th- and 18th-century writers, it is thought to have been related to, or possibly the same as, the extant imperial amazon."
text_example = text_example.split(" ")

In [2]:
embedding_layer = tf.keras.layers.Embedding(1000, 3)
result = embedding_layer(tf.constant([1,2,3]))
result.numpy()

array([[ 0.04512674,  0.02591044,  0.03332372],
       [ 0.00830823, -0.03147123, -0.00134744],
       [ 0.02704053, -0.03413739, -0.03112757]], dtype=float32)

The next step is to find a way of representing each of these words. This often involves just coding each word to a number due to brevity. After coding each words into a number, we define a window, $w$ to hold each word association. Below, we define the mapping *to* the number. Afterwards, we will define the mapping back. Finally, we translate out original sentence into the numbers.

In [3]:
grams = np.unique(text_example)
word_to_num = {grams[x]:x for x in range(len(grams))}
num_to_word = {x:grams[x] for x in range(len(grams))}
num_sentence = [word_to_num[x] for x in text_example]
print(num_sentence)

[7, 4, 9, 21, 8, 19, 17, 31, 24, 26, 32, 21, 34, 35, 18, 13, 15, 35, 33, 6, 2, 22, 28, 24, 5, 3, 14, 0, 11, 1, 37, 23, 21, 34, 35, 18, 13, 29, 36, 25, 27, 33, 30, 12, 33, 16, 20, 10]


In [4]:
window_size = 2
positive_skip_grams, _ = tf.keras.preprocessing.sequence.skipgrams(
      num_sentence, 
      vocabulary_size=len(grams),
      window_size=3,
      negative_samples=0)
print(len(positive_skip_grams))

264


The operation above broke the results into a list of correspondences in that 3 word window. For instance, if we look up 36, we can see that it corresponds with th

In [5]:
thirty_six = [x for x in positive_skip_grams if 36 in x]

In [6]:
thirty_six

[[25, 36],
 [13, 36],
 [29, 36],
 [36, 13],
 [18, 36],
 [33, 36],
 [27, 36],
 [36, 18],
 [36, 27],
 [36, 33],
 [36, 25],
 [36, 29]]

We have one final step before proceeding onto the final product. An additional processing step is to intentionally create "bad" data-points which will stand in contrast to the associations we just created. This is referred to as negative sampling. For the sake of this example, we will use the tensorflow function tf.random.log_uniform_candidate_sampler which will let us omit the correct context words when sampling:

In [7]:
target_word, context_word = positive_skip_grams[0]

# Set the number of negative samples per positive context. 
num_ns = 4

context_class = tf.reshape(tf.constant(context_word, dtype="int64"), (1, 1))
negative_sampling_candidates, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=context_class, # class that should be sampled as 'positive'
    num_true=1, # each positive skip-gram has 1 positive context class
    num_sampled=num_ns, # number of negative context words to sample
    unique=True, # all the negative samples should be unique
    range_max=len(grams), # pick index of the samples from [0, vocab_size]
    seed=13, # seed for reproducibility
    name="negative_sampling" # name of this operation
)
print(negative_sampling_candidates)
print([num_to_word[index.numpy()] for index in negative_sampling_candidates])

tf.Tensor([11  1 14  7], shape=(4,), dtype=int64)
['and', '18th-century', 'by', 'The']


In [8]:
# Get target and context words for one positive skip-gram.
target_word, context_word = positive_skip_grams[0]

# Set the number of negative samples per positive context. 
num_ns = 4

context_class = tf.reshape(tf.constant(context_word, dtype="int64"), (1, 1))
negative_sampling_candidates, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=context_class, # class that should be sampled as 'positive'
    num_true=1, # each positive skip-gram has 1 positive context class
    num_sampled=num_ns, # number of negative context words to sample
    unique=True, # all the negative samples should be unique
    range_max=len(grams), # pick index of the samples from [0, vocab_size]
    seed=13, # seed for reproducibility
    name="negative_sampling" # name of this operation
)
print(negative_sampling_candidates)
print([num_to_word[index.numpy()] for index in negative_sampling_candidates])

tf.Tensor([13  0 20  1], shape=(4,), dtype=int64)
['been', '17th-', 'imperial', '18th-century']


In [9]:
negative_sampling_candidates

<tf.Tensor: shape=(4,), dtype=int64, numpy=array([13,  0, 20,  1])>

In [10]:
# Add a dimension so you can use concatenation (on the next step).
negative_sampling_candidates = tf.expand_dims(negative_sampling_candidates, 1)

# Concat positive context word with negative sampled words.
context = tf.concat([context_class, negative_sampling_candidates], 0)

# Label first context word as 1 (positive) followed by num_ns 0s (negative).
label = tf.constant([1] + [0]*num_ns, dtype="int64") 

# Reshape target to shape (1,) and context and label to (num_ns+1,).
target = tf.squeeze(target_word)
context = tf.squeeze(context)
label =  tf.squeeze(label)

In [11]:
print(f"target_index    : {target}")
print(f"target_word     : {num_to_word[target_word]}")
print(f"context_indices : {context}")
print(f"context_words   : {[num_to_word[c.numpy()] for c in context]}")
print(f"label           : {label}")

target_index    : 25
target_word     : or
context_indices : [36 13  0 20  1]
context_words   : ['to,', 'been', '17th-', 'imperial', '18th-century']
label           : [1 0 0 0 0]


Now that we have represented our variables, 

Following the template described in the Tensorflow tutorial, we will define a function that performs all of this data transformation for us. This function is described below

In [13]:
window = 3
negative_samples = 4
SEED = 13
sentence = "The Guadeloupe amazon is a hypothetical extinct species of parrot that is thought to have been endemic to the Lesser Antillean island region of Guadeloupe. Described by 17th- and 18th-century writers, it is thought to have been related to, or possibly the same as, the extant imperial amazon."
text_example = sentence.split(" ")
grams = np.unique(text_example)
vocab_size = len(grams)
word_to_num = {grams[x]:x for x in range(len(grams))}
num_to_word = {x:grams[x] for x in range(len(grams))}
num_sentence = [word_to_num[x] for x in text_example]    
sample_table = positive_skip_grams, _ = tf.keras.preprocessing.sequence.skipgrams( num_sentence,  vocabulary_size=len(grams), window_size=window, negative_samples=negative_samples)
print(sample_table)

([[15, 7], [32, 7], [18, 22], [33, 16], [4, 9], [4, 25], [22, 4], [2, 22], [30, 25], [24, 31], [33, 34], [31, 21], [24, 33], [9, 22], [5, 29], [33, 33], [36, 26], [34, 7], [14, 12], [28, 5], [21, 13], [5, 14], [35, 24], [13, 35], [18, 6], [17, 1], [33, 16], [23, 34], [18, 29], [34, 13], [35, 2], [22, 8], [13, 20], [21, 24], [32, 27], [24, 5], [5, 24], [26, 24], [18, 26], [23, 1], [30, 12], [29, 13], [6, 23], [14, 14], [13, 11], [13, 27], [37, 12], [18, 13], [24, 37], [35, 12], [15, 4], [26, 13], [37, 26], [24, 10], [34, 35], [21, 17], [35, 18], [1, 37], [21, 11], [21, 23], [35, 22], [33, 17], [31, 12], [24, 29], [15, 19], [30, 18], [27, 13], [10, 23], [26, 31], [36, 27], [19, 17], [24, 11], [7, 21], [6, 15], [15, 31], [2, 26], [21, 29], [21, 21], [3, 17], [4, 18], [21, 32], [13, 6], [22, 24], [11, 19], [20, 16], [33, 6], [35, 10], [21, 8], [18, 1], [21, 27], [9, 6], [11, 10], [26, 13], [34, 9], [35, 9], [2, 35], [32, 35], [23, 37], [27, 30], [29, 37], [34, 6], [24, 25], [30, 22], [29, 

In [29]:
def generate_training_data(sentence, window , negative_samples = 4 ):
    # [1] Tokenize the sentence that is provided:
    text_example = sentence.split(" ")     
    # [2] create from/to mappings using dictionary objects.
    grams = np.unique(text_example)
    word_to_num = {grams[x]:x for x in range(len(grams))}
    num_to_word = {x:grams[x] for x in range(len(grams))}
    num_sentence = [word_to_num[x] for x in text_example]          
    # [3] Perform preprocessing step:  
    skip_vals=tf.keras.preprocessing.sequence.skipgrams( num_sentence, vocabulary_size=len(grams), window_size=window, negative_samples=negative_samples) 
    labels = [x for x in skip_vals[1]]
    targets = [x[0] for x in skip_vals[0] ]
    contexts = [x[1] for x in skip_vals[0]]   
    return targets, contexts, labels

In [37]:
class Word2Vec(Model):
  def __init__(self, vocab_size, embedding_dim):
    super(Word2Vec, self).__init__()
    self.target_embedding = Embedding(vocab_size, 
                                      embedding_dim,
                                      input_length=1,
                                      name="w2v_embedding", )
    self.context_embedding = Embedding(vocab_size, 
                                       embedding_dim, 
                                       input_length=num_ns+1)
    self.dots = Dot(axes=(3,2))
    self.flatten = Flatten()

  def call(self, pair):
    target, context = pair
    we = self.target_embedding(target)
    ce = self.context_embedding(context)
    dots = self.dots([ce, we])
    return self.flatten(dots) 

embedding_dim = 128
word2vec = Word2Vec(vocab_size, embedding_dim)
word2vec.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

In [43]:
targets, contexts, labels = generate_training_data(sentence,window  = 3)
BATCH_SIZE = 1024
BUFFER_SIZE = 10000
dataset = tf.data.Dataset.from_tensor_slices(((targets, contexts), labels))
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
AUTOTUNE = tf.data.AUTOTUNE
dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
print(dataset)

<BatchDataset shapes: (((1024,), (1024,)), (1024,)), types: ((tf.int32, tf.int32), tf.int32)>
<PrefetchDataset shapes: (((1024,), (1024,)), (1024,)), types: ((tf.int32, tf.int32), tf.int32)>


In [47]:
dataset.

tensorflow.python.data.ops.dataset_ops.PrefetchDataset

In [46]:
#tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="logs")
word2vec.fit(dataset, epochs=20 )

Epoch 1/20


TypeError: 'NoneType' object is not callable

In [9]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_width',
              color='species')
fig.show()

NameError: name 'df' is not defined

In [12]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width', z='petal_width',
              color='species')
fig.show()

In [14]:
type(df)

pandas.core.frame.DataFrame

## Random Walks:



. This based in the fact that one approach for natural language processing views the ordering of words in a manner similar to a graph since each n-gram has a set of words that follow it. Strategies that treat text this way are naturally amenable to domains where we are explicitly working on a network structure.

Methods which employ node embeddings have several fundamental steps:
1. Create a "corpus" of node connections using a random walk.
2. Define a transformation on the list of node connections from **1** which groups node values that are close together with a high number, and nodes that have less of a relationship with a small number.
3. Run a standard machine learning method on the new set of factors from step **2**.

Here we explore the first step in this process: The random choosing of node values in the graph structure. This step is taken to approximate the connections each node has as a list. This carries two advantages:
1. Each node similarity measure has both local (direct) connections, and also expresses higher order connections (indirect). This is known as **Expressivity**.
2. All node pairs don't need to be encoded; we don't have to worry about coding the zero probabilities. This is **Efficiency**.

We will discuss some of the methods used for random walks in the sections below in reference to the paper where they were originally discussed.

### DeepWalk Method

*DeepWalk: Online Learning of Social Representations* uses short random walks. In this case, we define a random walk starting at vertex $V_i$ as $W_i$. This random walk is a stochastic process composed of random variables $W_i^k$ where k denotes the step in the sequence of each random walk.

For this method, a stream of random walks is created. This method has the added advantage of being easy to parallelize and is also less sensitive to changes in the underlying graph than using a larger length random walk.

The implementation of the DeepWalk method is used in the function below:

# Sources:

* [ An Illustrated Explanation of Using SkipGram To Encode The Structure of A Graph  ](  https://medium.com/@_init_/an-illustrated-explanation-of-using-skipgram-to-encode-the-structure-of-a-graph-deepwalk-6220e304d71b#:~:text=DeepWalk%20is%20an%20algorithm%20that,community%20structure%20of%20the%20graph.&text=However%2C%20SkipGram%20is%20an%20algorithm,used%20to%20create%20word%20embeddings)
* [ Word2Vec Tutorial - The Skip-Gram Model ]( http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/)
* [DeepWalk: Online Learning of Social Representations](http://www.perozzi.net/publications/14_kdd_deepwalk.pdf)







