# Nomination Tutorial

Vertex nomination is a useful method for efficiently proposing vertices in graph B that correspond to a vertex of interest (VOI) in graph A. It returns a list of vertices in graph B, each with an associated probability that it matches the vertex of interest.

Vertex nomination works by repeatedly calling a seeded graph matching (SGM) algorithm. As a result, the two input graphs (A and B) do not need to be the same size, but a subset of their vertices, the seeds, must have a known correspondence between A and B.
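To make "repeatedly calling a seeded graph matching algorithm" concrete, here is a minimal sketch that is not graspologic's implementation. It swaps in SciPy's `scipy.optimize.quadratic_assignment` (FAQ method, with the seeds passed as `partial_match`) as the SGM solver, reruns it from random starting points, and reports how often the VOI is mapped to each vertex of B, in the same spirit as the restart count (`n_init`) that VNviaSGM exposes. The helper name `nominate_by_restarts` is hypothetical, and because SciPy's solver requires equal-size inputs, this sketch only handles graphs with the same number of vertices.

```python
import numpy as np
from scipy.optimize import quadratic_assignment


def nominate_by_restarts(A, B, voi, seeds, n_init=50, random_state=0):
    """Hypothetical helper: estimate P(voi in A matches j in B) as the
    fraction of randomized seeded graph matching runs that map voi to j.

    seeds is an (m, 2) integer array: column 0 holds vertices of A and
    column 1 their known counterparts in B. A and B must be the same size
    (a requirement of SciPy's solver, not of vertex nomination itself).
    """
    rng = np.random.default_rng(random_state)
    counts = np.zeros(B.shape[0])
    for _ in range(n_init):
        # One SGM run (FAQ algorithm) from a random doubly stochastic start
        res = quadratic_assignment(
            A,
            B,
            method="faq",
            options={
                "partial_match": seeds,  # fix the known correspondences
                "maximize": True,  # maximize edge agreement between A and B
                "P0": "randomized",
                "rng": rng,
            },
        )
        counts[res.col_ind[voi]] += 1  # col_ind[i] is B's match for A's i
    probs = counts / n_init
    order = np.argsort(-probs, kind="stable")  # largest probability first
    return [(int(j), float(probs[j])) for j in order if probs[j] > 0]


# Usage on a small shuffled random graph (NumPy-only setup for illustration)
rng = np.random.default_rng(2)
n = 20
upper = np.triu(rng.random((n, n)) < 0.3, k=1)
A = (upper | upper.T).astype(float)  # symmetric adjacency, no self-loops
perm = rng.permutation(n)
B = A[np.ix_(perm, perm)]
g1_to_g2 = np.argsort(perm)  # vertex i of A is vertex g1_to_g2[i] of B
seeds = np.column_stack((np.arange(4), g1_to_g2[:4]))
print(nominate_by_restarts(A, B, voi=5, seeds=seeds)[:3], "true match:", g1_to_g2[5])
```

Counting restarts turns a set of hard one-to-one matchings into a ranked list of candidates with empirical probabilities, which is the shape of output this tutorial works with.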

```python
from graspologic.nominate import VNviaSGM
from graspologic.simulations import er_np
from graspologic.plot import heatmap
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(suppress=True)
```

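```python
# Define parameters
n = 50  # number of vertices in each graph
p = 0.3  # edge probability of the ER graph
num_seeds = 4  # number of vertices with a known correspondence
```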

```python
np.random.seed(2)
G1 = er_np(n=n, p=p)

# Relabel the vertices of G1 with a random permutation to obtain G2
node_shuffle_input = np.random.permutation(n)
G2 = G1[np.ix_(node_shuffle_input, node_shuffle_input)]

heatmap(G1, title="Original ER graph (unshuffled)")
heatmap(G2, title="Shuffled ER graph")
```

```python
# Vertex i of G2 is vertex node_shuffle_input[i] of G1, so collect the
# pairs (node in G1, node in G2) and sort them by the G1 node
kklst = sorted(zip(node_shuffle_input, np.arange(n)))
print("Association from (node G1, node G2): ", kklst)
kklst = np.array(kklst)

# Lookup table: G1 node -> matching G2 node
kklst_dict = {g1: g2 for g1, g2 in kklst}
```
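The sorted list of pairs is simply the inverse of the shuffling permutation: vertex i of G1 sits at position `np.argsort(node_shuffle_input)[i]` of G2. The sketch below checks that equivalence; it builds a symmetric random 0/1 matrix with NumPy as a stand-in for `er_np` (an assumption so the block runs without graspologic), and the name `g1_to_g2` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
# Symmetric 0/1 adjacency matrix standing in for er_np(n=50, p=0.3)
upper = np.triu(rng.random((n, n)) < 0.3, k=1)
G1 = (upper | upper.T).astype(int)

node_shuffle_input = rng.permutation(n)
G2 = G1[np.ix_(node_shuffle_input, node_shuffle_input)]

# Inverse permutation: vertex i of G1 is vertex g1_to_g2[i] of G2
g1_to_g2 = np.argsort(node_shuffle_input)

# Relabeling G2 through the map recovers G1 exactly
assert np.array_equal(G2[np.ix_(g1_to_g2, g1_to_g2)], G1)

# Same rows as the sorted (node G1, node G2) list: row i is (i, g1_to_g2[i])
pairs = np.column_stack((np.arange(n), g1_to_g2))
print(pairs[:5])
```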

```python
voi = 5  # choose a vertex of interest in G1

# Seeds: the first num_seeds G1 vertices and their known G2 counterparts
seeds = [kklst[0:num_seeds, 0], kklst[0:num_seeds, 1]]

VNalg = VNviaSGM()
print(VNalg.fit_predict(G1, G2, voi, seeds))
```

The algorithm returns a nomination list whose rows have the form (index j in G2, probability that j matches the VOI), sorted so that the largest probability comes first. Here the true correspondence is vertex 5 in G1 to vertex 37 in G2, and the model predicts that match with more than 90% confidence.
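Because the list is sorted by probability, the position of the true vertex in it is a natural quality measure, and rank 1 means the correct vertex received the largest probability. The hypothetical helper `rank_of_match` below computes that position; it is run on a toy array whose values are made up for illustration and are not output from VNviaSGM.

```python
import numpy as np


def rank_of_match(nom_lst, true_vertex):
    """Hypothetical helper: return the 1-based position of true_vertex in a
    nomination list of rows (vertex index in G2, probability) sorted by
    probability, largest first. Return None if it was never nominated."""
    nom_lst = np.asarray(nom_lst)
    hits = np.flatnonzero(nom_lst[:, 0] == true_vertex)
    return int(hits[0]) + 1 if hits.size else None


# Toy list for illustration only; not values produced by VNviaSGM
toy = np.array([[3, 0.80], [11, 0.15], [7, 0.05]])
print(rank_of_match(toy, 3))  # 1
print(rank_of_match(toy, 7))  # 3
print(rank_of_match(toy, 4))  # None
```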

#### Treat vertices num_seeds through G1.shape[0] - 1 as VOIs and examine their predictions

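```python
probs = []   # probability assigned to the correct G2 vertex, per VOI
is_max = []  # whether the correct G2 vertex was ranked first, per VOI
vois = []    # VOIs for which a nomination list was returned

VNalg2 = VNviaSGM()
for voi in range(num_seeds, G1.shape[0]):
    nom_lst = VNalg2.fit_predict(G1, G2, voi, [kklst[0:num_seeds, 0], kklst[0:num_seeds, 1]])
    if nom_lst is None:
        continue  # VOI lies outside the seed-induced subgraph; skip it

    expected_g2_vert = kklst_dict[voi]

    # The list is sorted by probability, so row 0 holds the top nomination
    is_max.append(nom_lst[0][0] == expected_g2_vert)
    if not is_max[-1]:
        print(nom_lst, expected_g2_vert)

    # Probability given to the correct vertex (0.0 if it was not nominated)
    pred = [nom for nom in nom_lst if nom[0] == expected_g2_vert]
    probs.append(pred[0][1] if pred else 0.0)

    vois.append(voi)
```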

```python
plt.figure()
plt.bar(vois, probs)
plt.xlabel("Vertex of interest")
plt.ylabel("Probability assigned to correct vertex in G2")
plt.title("Probability correct vs. vertex of interest")

print("Fraction of VOIs where the correct vertex in G2 had max prob = {}/{}".format(sum(is_max), len(is_max)))
```

As seen above, every nomination list produced by the algorithm assigns the largest probability to the correct vertex in G2. Note that some VOIs are skipped because they do not lie in the subgraph induced by the seeds; in that case the model returns None.
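One way to anticipate which VOIs will be skipped is to check whether any seed lies within a few hops of the VOI; graspologic's VNviaSGM has an `order_voi_subgraph` parameter (default 1) controlling the size of the neighborhood taken around the VOI. The sketch below, a hypothetical helper `seeds_within_hops`, tests that condition with breadth-first search. It illustrates the idea only and is not graspologic's code, so the exact rule VNviaSGM applies may differ.

```python
from collections import deque

import numpy as np


def seeds_within_hops(A, voi, seed_vertices, order=1):
    """Hypothetical helper: return True if any seed vertex is reachable from
    voi in at most `order` hops. A is treated as an undirected adjacency
    matrix (for a directed graph, pass A + A.T instead)."""
    seed_set = {int(s) for s in seed_vertices}
    dist = {voi: 0}
    queue = deque([voi])
    while queue:
        u = queue.popleft()
        if u in seed_set:
            return True
        if dist[u] == order:
            continue  # do not expand past the hop limit
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False


# Path 0-1-2 plus a separate edge 3-4
A = np.zeros((5, 5), dtype=int)
for a, b in [(0, 1), (1, 2), (3, 4)]:
    A[a, b] = A[b, a] = 1
print(seeds_within_hops(A, 2, [0], order=1))  # False
print(seeds_within_hops(A, 2, [0], order=2))  # True
```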