# Graph embeddings

As the embedding method we chose Node2Vec. We believe it is an appropriate embedding algorithm, as
it is efficient, scalable, and provides strong performance. Its disadvantages are that it is
transductive and only supports homogeneous graphs. The first limitation is not a problem here, while we handle the second by encoding each node's type in its label.
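
One way to read "encoding node type in its label" is a type prefix, such as the `C:` prefix that the sampling code later in this notebook checks for. The sketch below assumes a `TYPE:name` convention; the helper names, the `L` type code, and the sample names are hypothetical and only illustrate the idea.

```python
def typed_label(node_type, name):
    """Build a node label that carries its type as a prefix, e.g. 'C:Aurora'."""
    return f"{node_type}:{name}"


def node_type_of(label):
    """Recover the type prefix from a label built by typed_label."""
    return label.split(":", 1)[0]


# Two nodes with the same name but different types remain distinct nodes,
# even though the graph itself has no notion of node types.
character = typed_label("C", "Aurora")
location = typed_label("L", "Aurora")
```

Because the type lives in the label, a homogeneous-graph method such as Node2Vec never has to know about it, and the type can still be recovered from the embedding keys afterwards.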

In [1]:
import networkx as nx
import re

TRIPLE_RE = re.compile(r"<([^>]*)><([^>]*)><([^>]*)>")

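def read_graph(filename="graph.txt"):
    """Build an undirected graph from lines of the form <source><relation><target>."""
    G = nx.Graph()
    with open(filename, 'r') as file:
        for line in file:
            m = TRIPLE_RE.match(line.strip())
            if m is None:  # skip blank or malformed lines instead of crashing
                continue
            v, e, u = m.groups()
            G.add_edge(v, u, relation=e)
    return G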

G = read_graph()
len(G.nodes)


5042
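
For reference, `TRIPLE_RE` implies an input file with one triple per line, each field wrapped in angle brackets. The sketch below parses a single line the same way; the sample line and the `parse_triple` helper are made up for illustration and are not taken from `graph.txt`.

```python
import re

TRIPLE_RE = re.compile(r"<([^>]*)><([^>]*)><([^>]*)>")


def parse_triple(line):
    """Return (source, relation, target), or None when the line is not a triple."""
    m = TRIPLE_RE.match(line.strip())
    return m.groups() if m else None


example = parse_triple("<C:Alice><appears_in><M:Some Movie>\n")
blank = parse_triple("\n")
```

A line that does not match yields `None` from `match`, so any caller that unpacks the groups needs to handle that case.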

Unfortunately, the whole graph seems to be too big to train a Node2Vec model successfully, so we will use a subset of it.

In [2]:
import random

def sample_subgraph(G, fraction=0.1):
    """Keep every non-character node and roughly `fraction` of the character ('C:') nodes."""
    nodes = set(G.nodes)
    character_nodes = [n for n in nodes if n.startswith('C:')]
    skipped_characters = set(random.sample(character_nodes, int(len(character_nodes) * (1 - fraction))))
    return G.subgraph(nodes - skipped_characters)

small_G = sample_subgraph(G, fraction=0.1)
len(small_G.nodes)


3148
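
Note that the sampling above uses the global, unseeded `random` module, so the selected characters differ between runs. A minimal sketch of a reproducible variant follows; it works on plain labels so that it runs without networkx, and the function name, the `seed` parameter, and the toy labels are assumptions for illustration.

```python
import random


def pick_kept_labels(labels, fraction=0.1, seed=0):
    """Return the labels that survive sampling: all non-'C:' labels plus a
    seeded random subset of the 'C:' labels."""
    rng = random.Random(seed)  # local generator, independent of global state
    # Sort first so the result does not depend on set or hash ordering.
    characters = sorted(label for label in labels if label.startswith("C:"))
    n_skip = int(len(characters) * (1 - fraction))  # same count as sample_subgraph
    skipped = set(rng.sample(characters, n_skip))
    return set(labels) - skipped


# Toy input: 20 character labels and 2 labels of another (hypothetical) type.
labels = [f"C:{i}" for i in range(20)] + ["M:a", "M:b"]
kept = pick_kept_labels(labels, fraction=0.1, seed=42)
```

The returned set could then be passed to `G.subgraph(...)` in place of `nodes - skipped_characters`.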

In [3]:
import pickle

with open('small_graph.pkl', 'wb') as file:
    pickle.dump(small_G, file)


In [4]:
from node2vec import Node2Vec

node2vec = Node2Vec(small_G, dimensions=64, walk_length=20, num_walks=100, workers=4, p=4.0, q=0.5)
model = node2vec.fit(window=10, min_count=1, batch_words=4)

Computing transition probabilities:   0%|          | 0/3148 [00:00<?, ?it/s]

Generating walks (CPU: 1): 100%|██████████| 25/25 [00:22<00:00,  1.09it/s]
Generating walks (CPU: 2): 100%|██████████| 25/25 [00:22<00:00,  1.09it/s]
Generating walks (CPU: 3): 100%|██████████| 25/25 [00:22<00:00,  1.09it/s]
Generating walks (CPU: 4): 100%|██████████| 25/25 [00:22<00:00,  1.09it/s]
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.
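
The `p` and `q` arguments passed to `Node2Vec` above bias the random walks. As a rough illustration of the standard Node2Vec rule for an unweighted graph, the sketch below computes the unnormalized weights of the next step, given the previous and current node. It is a simplified stand-alone version, not the library's internal code, and the toy graph is hypothetical.

```python
def transition_weights(adj, prev, cur, p, q):
    """Unnormalized weights for stepping from `cur`, having arrived from `prev`."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # step back to the previous node
        elif nxt in adj[prev]:
            weights[nxt] = 1.0      # stay at distance 1 from the previous node
        else:
            weights[nxt] = 1.0 / q  # move outward, to distance 2 from it
    return weights


# Toy undirected graph with edges a-b, a-c, b-c, b-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
w = transition_weights(adj, prev="a", cur="b", p=4.0, q=0.5)
```

With `p=4.0` and `q=0.5`, stepping back gets weight 0.25 and moving outward gets weight 2.0, so walks rarely backtrack and tend to explore away from where they came from.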

In [5]:
import pickle

with open("small_model.pkl", "wb") as f:
    pickle.dump(model, f)

In [6]:
node2vec = Node2Vec(G, dimensions=64, walk_length=20, num_walks=100, workers=4, p=4.0, q=0.5)
model = node2vec.fit(window=10, min_count=1, batch_words=4)

Computing transition probabilities:   0%|          | 0/5042 [00:00<?, ?it/s]

Exception ignored in: <function ResourceTracker.__del__ at 0x7fd4ea71b060>
Traceback (most recent call last):
  File "/usr/lib/python3.13/multiprocessing/resource_tracker.py", line 84, in __del__
  File "/usr/lib/python3.13/multiprocessing/resource_tracker.py", line 93, in _stop
  File "/usr/lib/python3.13/multiprocessing/resource_tracker.py", line 118, in _stop_locked
ChildProcessError: [Errno 10] No child processes
Exception ignored in: <function ResourceTracker.__del__ at 0x7f981cd23060>
Traceback (most recent call last):
  File "/usr/lib/python3.13/multiprocessing/resource_tracker.py", line 84, in __del__
  File "/usr/lib/python3.13/multiprocessing/resource_tracker.py", line 93, in _stop
  File "/usr/lib/python3.13/multiprocessing/resource_tracker.py", line 118, in _stop_locked
ChildProcessError: [Errno 10] No child processes
Exception ignored in: <function ResourceTracker.__del__ at 0x7f9033f13060>
Traceback (most recent call last):
  File "/usr/lib/python3.13/multiprocessing/reso

KeyboardInterrupt: 

In [None]:
with open("model.pkl", "wb") as fp:
    pickle.dump(model, fp)