# Node Embeddings and Skip Gram Examples

**Purpose:** To explore embedding methods used for label prediction in social networks, including a short exposition on the relationship between natural language processing and network analysis.

**Introduction:** Node embedding methods are commonly used for node classification in social networks. They rely on a feature engineering technique called skip-gram modeling to represent each node's relationships in the network as a vector in $N$-dimensional space. This vector can be thought of as a low-dimensional representation of each node: nodes that are more closely associated with one another will lie closer together in the vector space.
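Closeness in the vector space is commonly measured with cosine similarity. A minimal sketch using `numpy`, where the node names and their 3-dimensional vectors are made up for illustration rather than learned from a real network:

```python
import numpy as np

# Hypothetical 3-dimensional node embeddings (illustrative values only,
# not learned from any real network).
embeddings = {
    "alice": np.array([0.9, 0.1, 0.3]),
    "bob": np.array([0.8, 0.2, 0.4]),
    "carol": np.array([-0.5, 0.9, 0.1]),
}


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: 1.0 means same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


sim_ab = cosine_similarity(embeddings["alice"], embeddings["bob"])
sim_ac = cosine_similarity(embeddings["alice"], embeddings["carol"])
print(f"alice-bob: {sim_ab:.3f}, alice-carol: {sim_ac:.3f}")
```

Here `alice` and `bob` point in nearly the same direction, so they would be read as closely associated, while `carol` points elsewhere.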




## Roots in natural language processing

This method draws on research in natural language processing, where we try to predict nearby words using a model built from a text corpus. Consider the following example text:
> The Guadeloupe amazon is a hypothetical extinct species of parrot that is thought to have been endemic to the Lesser Antillean island region of Guadeloupe. Described by 17th- and 18th-century writers, it is thought to have been related to, or possibly the same as, the extant imperial amazon. 

In natural language processing, one strategy is to define a window of size $w$: words that fall within the same window are treated as associated with one another. For example, sliding a window of $w = 3$ words across the passage gives the following:

```python
# Split the passage on spaces; punctuation stays attached to the words.
text_example = "The Guadeloupe amazon is a hypothetical extinct species of parrot that is thought to have been endemic to the Lesser Antillean island region of Guadeloupe. Described by 17th- and 18th-century writers, it is thought to have been related to, or possibly the same as, the extant imperial amazon."
text_example = text_example.split(" ")
w = 3  # window size
# Every run of w consecutive words forms one window.
[text_example[x:x + w] for x in range(len(text_example) - w + 1)]
```

```
[['The', 'Guadeloupe', 'amazon'],
 ['Guadeloupe', 'amazon', 'is'],
 ['amazon', 'is', 'a'],
 ['is', 'a', 'hypothetical'],
 ['a', 'hypothetical', 'extinct'],
 ['hypothetical', 'extinct', 'species'],
 ['extinct', 'species', 'of'],
 ['species', 'of', 'parrot'],
 ['of', 'parrot', 'that'],
 ['parrot', 'that', 'is'],
 ['that', 'is', 'thought'],
 ['is', 'thought', 'to'],
 ['thought', 'to', 'have'],
 ['to', 'have', 'been'],
 ['have', 'been', 'endemic'],
 ['been', 'endemic', 'to'],
 ['endemic', 'to', 'the'],
 ['to', 'the', 'Lesser'],
 ['the', 'Lesser', 'Antillean'],
 ['Lesser', 'Antillean', 'island'],
 ['Antillean', 'island', 'region'],
 ['island', 'region', 'of'],
 ['region', 'of', 'Guadeloupe.'],
 ['of', 'Guadeloupe.', 'Described'],
 ['Guadeloupe.', 'Described', 'by'],
 ['Described', 'by', '17th-'],
 ['by', '17th-', 'and'],
 ['17th-', 'and', '18th-century'],
 ['and', '18th-century', 'writers,'],
 ['18th-century', 'writers,', 'it'],
 ['writers,', 'it', 'is'],
 ['it', 'is', 'thought'],
 ['is', 'thought', 'to'],
 ['thought', 'to', 'have'],
 ['to', 'have', 'been'],
 ['have', 'been', 'related'],
 ['been', 'related', 'to,'],
 ['related', 'to,', 'or'],
 ['to,', 'or', 'possibly'],
 ['or', 'possibly', 'the'],
 ['possibly', 'the', 'same'],
 ['the', 'same', 'as,'],
 ['same', 'as,', 'the'],
 ['as,', 'the', 'extant'],
 ['the', 'extant', 'imperial'],
 ['extant', 'imperial', 'amazon.']]
```
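A window of 3 words is the same as taking one context word on each side of a center word. Skip-gram turns each such window into (center, context) training pairs, and the model is then trained to predict the context word from the center word. A minimal sketch of the pair generation, where `skipgram_pairs` is a hypothetical helper and `window` counts words on each side of the center:

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every token that lies within
    `window` positions on either side of each center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = "The Guadeloupe amazon is a hypothetical extinct species".split()
pairs = skipgram_pairs(tokens, window=1)
for pair in pairs[:4]:
    print(pair)
```

With `window=1`, every pair of adjacent words yields two training pairs, one in each direction.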

## Random Walks

This is based on the fact that one approach to natural language processing views the ordering of words much like a graph, since each n-gram has a set of words that follow it. Strategies that treat text this way carry over naturally to domains where we are explicitly working with a network structure.
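Taking $n = 1$, the graph view of text can be made concrete: each word becomes a node with a directed edge to every word that immediately follows it somewhere in the passage. A minimal sketch, with `successor_graph` a hypothetical helper, applied to the first part of the example passage:

```python
from collections import defaultdict


def successor_graph(tokens):
    """Map each word to the set of words that immediately follow it."""
    graph = defaultdict(set)
    for current, following in zip(tokens, tokens[1:]):
        graph[current].add(following)
    return dict(graph)


text = ("The Guadeloupe amazon is a hypothetical extinct species of parrot "
        "that is thought to have been endemic to the Lesser Antillean island")
graph = successor_graph(text.split())
print(sorted(graph["is"]))
print(sorted(graph["to"]))
```

Words that occur more than once, such as "is" and "to", end up with several outgoing edges, which is exactly the branching structure a random walk can explore.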

Methods which employ node embeddings have several fundamental steps:
1. Create a "corpus" of node connections using random walks.
2. Define a transformation on the lists of node connections from step **1** that maps nodes which appear close together to a high number and nodes with a weaker relationship to a low number.
3. Run a standard machine learning method on the new set of features from step **2**.

Here we explore the first step in this process: randomly sampling sequences of nodes from the graph structure. These walks approximate, as lists, the connections each node has. This carries two advantages:
1. The resulting node similarity measure captures both local (direct) connections and higher-order (indirect) connections. This is known as **Expressivity**.
2. Not every node pair needs to be encoded, so we do not have to store the pairs with zero probability. This is known as **Efficiency**.
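Both advantages can be seen on a small example. The sketch below treats each walk as a sentence and counts how often two nodes appear within `window` steps of each other. This is a count-based stand-in for step **2**, not skip-gram itself; the path graph a-b-c-d-e, the walks, and the function name are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(walks, window):
    """Count unordered node pairs that appear within `window` steps of
    each other in the same walk. Only observed pairs are stored."""
    counts = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            # Look ahead at most `window` positions; looking back is not
            # needed because pairs are stored unordered.
            for v in walk[i + 1 : i + 1 + window]:
                if u != v:  # a walk can revisit a node; skip self-pairs
                    counts[tuple(sorted((u, v)))] += 1
    return counts


# Hypothetical walks on the path graph a - b - c - d - e.
walks = [
    ["a", "b", "c", "b", "a"],
    ["c", "d", "e", "d", "c"],
    ["b", "c", "d", "c", "b"],
]
counts = cooccurrence_counts(walks, window=2)

print(counts[("a", "c")])  # a and c are two hops apart
print(len(counts), "of", 5 * 4 // 2, "possible pairs stored")
```

The pair (a, c) receives a nonzero count even though a and c are not adjacent (expressivity), while pairs such as (a, e) that never co-occur are simply absent from the counter rather than stored as zeros (efficiency).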

Below we discuss some of the methods used to generate random walks, with reference to the papers in which they were originally described.

### DeepWalk Method

*DeepWalk: Online Learning of Social Representations* uses short random walks. We denote a random walk rooted at vertex $V_i$ as $W_i$. This walk is a stochastic process composed of random variables $W_i^k$, where $k$ denotes the step in the walk and $W_i^{k+1}$ is a vertex chosen at random from the neighbors of the vertex reached at step $k$.
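Assuming an unweighted, undirected graph in which each step moves to a neighbor chosen uniformly at random, the walk $W_i$ can be written out as follows. The step-0 convention, the neighbor set $\mathcal{N}(v)$, and $\deg(v) = |\mathcal{N}(v)|$ are notation introduced here:

$$
W_i^{0} = V_i, \qquad
P\left(W_i^{k+1} = u \mid W_i^{k} = v\right) =
\begin{cases}
\dfrac{1}{\deg(v)} & \text{if } u \in \mathcal{N}(v),\\[4pt]
0 & \text{otherwise.}
\end{cases}
$$

Because the next vertex depends only on the current one, each walk is a Markov chain on the graph.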

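The method generates a stream of such short walks. This has two added advantages: walk generation is easy to parallelize, since several walkers can explore different parts of the graph at the same time, and because each walk is short, small changes in the underlying graph can be accommodated without recomputing everything, which would be harder with much longer walks.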

The DeepWalk method is implemented in the function below:
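A minimal sketch of DeepWalk's walk-generation loop, assuming the graph is an unweighted adjacency-list `dict` and that each next vertex is drawn uniformly from the current vertex's neighbors. On each pass the vertices are shuffled and one walk is started from every vertex; the paper calls the number of passes $\gamma$ and the walk length $t$. The function names, the toy graph, and the parameter values are hypothetical, and the skip-gram training that DeepWalk performs on the walks is not included.

```python
import random


def random_walk(graph, start, walk_length, rng):
    """Uniform random walk rooted at `start`: each next vertex is drawn
    uniformly from the neighbors of the current vertex."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end (isolated vertex): stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def deepwalk_corpus(graph, walks_per_node, walk_length, seed=0):
    """On each of `walks_per_node` passes (gamma in the paper), shuffle the
    vertices and start one walk of `walk_length` vertices (t) from each."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus


# Hypothetical toy graph: two triangles joined by the edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
corpus = deepwalk_corpus(graph, walks_per_node=2, walk_length=5)
print(len(corpus))
print(corpus[0])
```

Each walk needs only the graph and a random number generator, so walks could be generated independently (for example, one generator per worker), which is what makes the approach easy to parallelize. The resulting `corpus` plays the role of a text corpus; it could, for example, be passed to gensim's `Word2Vec` with `sg=1` (skip-gram) after converting node IDs to strings.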

## Sources

* [An Illustrated Explanation of Using SkipGram To Encode The Structure of A Graph](https://medium.com/@_init_/an-illustrated-explanation-of-using-skipgram-to-encode-the-structure-of-a-graph-deepwalk-6220e304d71b#:~:text=DeepWalk%20is%20an%20algorithm%20that,community%20structure%20of%20the%20graph.&text=However%2C%20SkipGram%20is%20an%20algorithm,used%20to%20create%20word%20embeddings)
* [Word2Vec Tutorial - The Skip-Gram Model](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/)
* [DeepWalk: Online Learning of Social Representations](http://www.perozzi.net/publications/14_kdd_deepwalk.pdf)







