# Basic Node2Vec and Embeddings


<a href="https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Node2VecIntro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we will explore a basic example using our favorite Karate Club Graph and Node2Vec.

As usual, we first set up and install everything required. The pinned node2vec release needs an older version of numpy than the one preinstalled in Colab, so the runtime has to be restarted after installation (and the noisy install output cannot be captured...).

In [None]:
!pip3 install numpy==1.19 
!pip3 install node2vec==0.4.6
!pip3 install gensim==3.5.0
!pip3 install pandas

In [None]:
import networkx as nx
from node2vec import Node2Vec
import pandas as pd

In [None]:
# Load Zachary's Karate Club as a NetworkX Graph object
KCG = nx.karate_club_graph() # Backup shortcut :-)

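# Each node carries a 'club' attribute: the faction the member joined after the split.
# NetworkX numbers the nodes from 0: node 0 is Mr. Hi, node 33 is the Officer.
print(KCG.nodes[0])
print(KCG.nodes[33])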
#for node in KCG.nodes:
#  print(str(node+1)+"," + str(KCG.nodes[node]['club']))

nx.draw(KCG, with_labels=True, font_weight='bold')

In [None]:
# Generate Random walks
node2vec = Node2Vec(KCG, dimensions=2, walk_length=10, num_walks=50, workers=4)  # Use temp_folder for big graphs

# Embed nodes
model = node2vec.fit(window=4, min_count=1, batch_words=4)  

# Look for the nodes most similar to Mr. Hi (node 0 in NetworkX's 0-based numbering)
model.wv.most_similar('0')  # Node keys in the model are always strings
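
To make the "random walks" step less of a black box, here is a minimal sketch of a Node2Vec-style biased walk on a tiny hand-made graph. It is an illustration, not the library's actual implementation: the helper `biased_walk`, the dictionary `toy_adj`, and the seed are hypothetical names and values chosen for this example. The return parameter `p` and in-out parameter `q` follow the Node2Vec idea of reweighting the next step depending on the previous node; with `p = q = 1` the walk is uniform.

```python
import random

def biased_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Return one walk (list of nodes) over an adjacency dict {node: set_of_neighbors}."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so pick uniformly
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to where we came from
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Hypothetical toy graph: a triangle 0-1-2 with a tail node 3 attached to 2
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

rng = random.Random(42)
# Five walks per start node, converted to string tokens like "sentences" of node names
walks = [
    [str(n) for n in biased_walk(toy_adj, node, walk_length=10, rng=rng)]
    for node in toy_adj
    for _ in range(5)
]
print(walks[0])
```

Walks like these are treated as sentences and handed to a Word2Vec model, which is why the keys in `model.wv` are strings.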

Let us combine each node's embedding with the club that node actually joined after the split.

In [None]:
embeddings = []
for node in KCG.nodes:
  embedding = list(model.wv.get_vector(str(node)))
  club = KCG.nodes[node]['club']
  embeddings.append(embedding + [club])

df = pd.DataFrame(embeddings, columns=['x', 'y', 'club'])
print(df)
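
With the vectors laid out in a table, it may help to see what a similarity lookup does with them. gensim's `most_similar` ranks the other keys by cosine similarity to the query vector; the sketch below reproduces that ranking by hand. The vectors in `toy_vectors` are made up for illustration and are not results from the model above, and the helper names `cosine` and `rank_by_cosine` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_cosine(vectors, key, topn=3):
    """Rank all other keys by cosine similarity to vectors[key], highest first."""
    query = vectors[key]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != key
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Made-up 2-dimensional vectors keyed by node name (strings, as in the model)
toy_vectors = {
    '0': [1.0, 0.1],
    '1': [0.9, 0.2],
    '2': [-0.8, 0.5],
    '3': [0.1, 1.0],
}

ranking = rank_by_cosine(toy_vectors, '0')
for name, score in ranking:
    print(name, round(score, 3))
```

Because cosine similarity depends only on the angle between vectors, a 2-dimensional embedding leaves little room to separate nodes; two dimensions are convenient for plotting, but more dimensions are the usual choice when the similarities themselves matter.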

Let us plot the embedded nodes, colored by the club each member actually joined.

In [None]:
# Red for members of Mr. Hi's club, blue for members of the Officer's club
colors = ['red' if club == 'Mr. Hi' else 'blue' for club in df.club]
df.plot.scatter(x='x', y='y', s=50, c=colors)