# Basic Node2Vec and Embeddings


<a href="https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Node2VecIntro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we explore a basic example: learning Node2Vec embeddings for our favorite graph, Zachary's Karate Club.



In [None]:
%%capture
!pip3 install node2vec

In [None]:
import networkx as nx
import pandas as pd
from node2vec import Node2Vec as n2v

In [None]:
# Load Zachary's Karate Club as a NetworkX Graph object
KCG = nx.karate_club_graph()

print(KCG.nodes[1])
print(KCG.nodes[33])

# print final assignments
#for node in KCG.nodes:
#  print(str(node+1)+"," + str(KCG.nodes[node]['club']))

nx.draw(KCG, with_labels=True, font_weight='bold')

Next, let us run Node2Vec to create embeddings.

In [None]:
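# Precompute transition probabilities and generate the random walks
g_emb = n2v(KCG, dimensions=2)

WINDOW = 1  # skip-gram context window (max. distance between a node and its context)
MIN_COUNT = 1  # ignore nodes that occur fewer times than this in the walks
BATCH_WORDS = 4  # target batch size (in tokens) passed to the training workers

# Fit the skip-gram (Word2Vec) model on the generated walks
model = g_emb.fit(
    vector_size=2,
    window=WINDOW,
    min_count=MIN_COUNT,
    batch_words=BATCH_WORDS
)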
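
To build some intuition for the walks generated in the cell above, here is a minimal pure-Python sketch of a single Node2Vec-style biased walk on an unweighted graph. It is not the `node2vec` package's implementation; the function name `biased_walk`, the adjacency-dict representation, and the toy graph are assumptions made for illustration. The return parameter `p` and the in-out parameter `q` follow the definitions in the Node2Vec paper.

```python
import random

def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order random walk (hypothetical helper, unweighted graph)."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            # First step: there is no previous node yet, so choose uniformly
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay within distance 1 of the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy graph (assumption): a triangle 0-1-2 with a tail 2-3
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
walk = biased_walk(toy_adj, start=0, length=8, p=1.0, q=0.5, rng=random.Random(42))
print(walk)
```

A small `q` pushes the walk outward (more depth-first exploration), while a small `p` makes it more likely to step back to the node it just left.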

Let us find similar members/nodes:

In [None]:
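# Node IDs are stored as strings in the trained model
input_node = '1'
# Each result is a (node, cosine similarity) tuple
for s in model.wv.most_similar(input_node, topn=10):
    print(s)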
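
`most_similar` ranks the other nodes by the cosine similarity of their embedding vectors. The sketch below reproduces that ranking idea in plain Python, assuming a small hand-made dictionary of 2D vectors; `cosine_similarity`, `rank_by_similarity`, and `toy_vectors` are illustrative names, not part of gensim or node2vec.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(vectors, query, topn=3):
    """Return the topn keys most similar to `query`, excluding the query itself."""
    scores = [
        (key, cosine_similarity(vectors[query], vec))
        for key, vec in vectors.items()
        if key != query
    ]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)[:topn]

# Hand-made toy embeddings (assumption, not output of the model above)
toy_vectors = {
    'a': [1.0, 0.0],
    'b': [2.0, 0.1],   # almost the same direction as 'a'
    'c': [0.0, 1.0],   # orthogonal to 'a'
    'd': [-1.0, 0.0],  # opposite direction to 'a'
}
ranking = rank_by_similarity(toy_vectors, 'a')
for key, score in ranking:
    print(key, round(score, 3))
```

Because cosine similarity only looks at direction, `'b'` ranks first even though its vector is about twice as long as `'a'`.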

Let us combine each node's embedding with the club that member joined after the split.

In [None]:
embeddings = []
for node in KCG.nodes:
    # The model stores node IDs as strings
    embedding = list(model.wv.get_vector(str(node)))
    club = KCG.nodes[node]['club']
    embeddings.append(embedding + [club])

df = pd.DataFrame(embeddings, columns=['x', 'y', 'club'])
print(df)

Let us plot the embedded nodes, colored by the club each member actually joined after the split.

In [None]:
colors = ['red' if x == 'Mr. Hi' else 'blue' for x in df.club]
df.plot.scatter(x='x', y='y', s=50, c=colors)

# Bonus: Dimensionality Reduction

Normally, we want embeddings with more than two dimensions. Unfortunately, high-dimensional spaces are ... hard to visualize. Luckily, we have tools such as [t-SNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) or [UMAP](https://umap-learn.readthedocs.io/en/latest/basic_usage.html) to reduce the dimensionality.

In [None]:
import numpy as np
from sklearn.manifold import TSNE

In [None]:
X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
X_embedded = TSNE(n_components=2, learning_rate='auto',
             init='random', perplexity=3).fit_transform(X)
X_embedded.shape