# Nomination Tutorial

The `VNviaSGM` class implements Vertex Nomination via Seeded Graph Matching
(VNviaSGM), the algorithm described in [1].

VNviaSGM is a nomination algorithm, so instead of completely matching two graphs `A` and `B`, it proposes a nomination list of potential matches in graph `B` to a vertex of interest (voi) in graph `A` with associated probabilities. 

Let $A_L(a)$ be the subgraph of `A` induced by the vertices that lie within distance `L` of vertex `a`.
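
To make the definition concrete, here is a minimal sketch, independent of graspologic, that builds $A_L(a)$ from an adjacency matrix stored as nested lists. The helper names `vertices_within` and `induced_subgraph` are hypothetical and not part of the library, and the sketch assumes that distance means unweighted hop count.

```python
from collections import deque


def vertices_within(adj, center, max_dist):
    """Return the sorted vertices whose hop distance from center is <= max_dist."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand past the distance limit
        for v, edge in enumerate(adj[u]):
            if edge and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)


def induced_subgraph(adj, vertices):
    """Keep only the rows and columns of adj that belong to vertices."""
    return [[adj[i][j] for j in vertices] for i in vertices]


# Toy path graph 0 - 1 - 2 - 3 - 4 (assumed example, not from the tutorial data).
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
nearby = vertices_within(path, center=2, max_dist=1)
sub = induced_subgraph(path, nearby)
print(nearby)  # [1, 2, 3]
print(sub)     # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

With `max_dist=1` around vertex 2, only vertices 1, 2, and 3 survive, and the subgraph is again a path.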

VNviaSGM first finds the seeds that are contained in a subgraph around the voi, $A_{order\_voi\_subgraph}(voi)$. Call these seeds $close\_seeds\_A$. If no seeds are in this subgraph, the algorithm stops early and returns a nomination list of None.
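
The seed-filtering step can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `neighborhood` and `find_close_seeds` are hypothetical helpers, the graph is a nested-list adjacency matrix, and distance is hop count.

```python
def neighborhood(adj, center, max_dist):
    """Return the set of vertices within max_dist hops of center."""
    seen = {center}
    frontier = {center}
    for _ in range(max_dist):
        # Expand one hop: unseen vertices adjacent to the current frontier.
        frontier = {
            v
            for u in frontier
            for v, edge in enumerate(adj[u])
            if edge and v not in seen
        }
        seen |= frontier
    return seen


def find_close_seeds(adj, voi, seeds, order_voi_subgraph):
    """Keep only the seeds that fall inside the neighborhood of the voi."""
    nearby = neighborhood(adj, voi, order_voi_subgraph)
    return [s for s in seeds if s in nearby]


# Toy path graph 0 - 1 - 2 - 3 - 4 (assumed example).
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
close_1 = find_close_seeds(path, voi=2, seeds=[0, 3], order_voi_subgraph=1)
close_2 = find_close_seeds(path, voi=2, seeds=[0, 3], order_voi_subgraph=2)
no_seeds = find_close_seeds(path, voi=4, seeds=[0], order_voi_subgraph=1)

print(close_1)   # [3]
print(close_2)   # [0, 3]
print(no_seeds)  # []  (the case in which the algorithm would stop early)
```

Increasing `order_voi_subgraph` widens the search, so more seeds qualify; an empty result corresponds to the early stop described above.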

Two subgraphs are then generated around $close\_seeds\_A$ for graph A and the associated seeds for graph B ($close\_seeds\_B$). 

$SG1 = A_{order\_seeds\_subgraph}(close\_seeds\_A)$ and $SG2 = B_{order\_seeds\_subgraph}(close\_seeds\_B)$

These subgraphs ($SG1$ and $SG2$) are matched using graspologic's graph matching algorithm. The probabilities returned by the algorithm are used to generate a nomination list sorted by probability.
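
The final sorting step is simple enough to sketch in plain Python. The function name `build_nomination_list` and the probability values below are made up for illustration; how the library computes the probabilities themselves is not shown here.

```python
def build_nomination_list(match_probabilities):
    """Return (candidate index, probability) pairs, most likely first."""
    pairs = list(enumerate(match_probabilities))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical probabilities that candidates 0, 1, 2 in graph B match the voi.
toy_probabilities = [0.1, 0.7, 0.2]
nominations = build_nomination_list(toy_probabilities)
print(nominations)  # [(1, 0.7), (2, 0.2), (0, 0.1)]
```

The first entry of the list is the top nomination, here candidate 1.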



[1] Patsolic, HG, Park, Y, Lyzinski, V, Priebe, CE. Vertex nomination via seeded graph matching. Stat Anal Data Min: The ASA Data Sci Journal. 2020; 13: 229–244. https://doi.org/10.1002/sam.11454


In [None]:
from graspologic.nominate import VNviaSGM
from graspologic.simulations import er_np
from graspologic.plot import heatmap
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(suppress=True)

In [None]:
# Define parameters
n = 50
p = 0.3
num_seeds = 4

In [None]:
np.random.seed(2)
G1 = er_np(n=n, p=p)
node_shuffle_input = np.random.permutation(n)

G2 = G1[np.ix_(node_shuffle_input, node_shuffle_input)]

heatmap(G1, title = "Original ER Graph (unshuffled)")
heatmap(G2, title = "Shuffled ER graph")

In [None]:
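# Node j of G2 is node node_shuffle_input[j] of G1, so pair them as (G1 node, G2 node)
kklst = [(xx, yy) for xx, yy in zip(node_shuffle_input, np.arange(len(node_shuffle_input)))]
kklst.sort(key=lambda x: x[0])  # order the pairs by G1 node index
print("Association from (node G1, node G2): ", kklst)
kklst = np.array(kklst)

# Lookup table: G1 node -> corresponding G2 node
kklst_dict = {}
for kk in kklst:
    kklst_dict[kk[0]] = kk[1]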

In [None]:
voi = 5  # choose a vertex of interest

# Use the first num_seeds known (G1, G2) pairs as seeds
seeds = [kklst[0:num_seeds, 0], kklst[0:num_seeds, 1]]

VNalg = VNviaSGM()
print(VNalg.fit_predict(G1, G2, voi, seeds))

The algorithm produces a nomination list in the following format: (index j in G2, probability that j matches the voi). The output is sorted by decreasing probability, so the most likely match comes first in the list. As seen, the actual correspondence is 5--37, and the model predicts that vertex 5 (in graph G1) matches vertex 37 (in graph G2) with >90% confidence.