This notebook is a worked example of computing the node representations in step 2 of the xNetMF algorithm, given on pages 4 and 5 of the paper.

To run this notebook, ensure that config.py and xnetmf.py are in the same folder as this file. Both files come from the paper authors' source code, so we probably shouldn't put them on GitHub, since we don't want our repository to contain any of their code.

In [1]:
import numpy as np
from scipy import sparse

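from config import Graph, RepMethod  # Graph and RepMethod classes used in the step 1 placeholder
from xnetmf import get_features  # Computes the feature matrix in the step 1 placeholder

# Custom step 2 code
from representation import (get_number_of_landmarks, get_random_landmarks,
                            compute_C_matrix, compute_representation)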

# Step 1: Node identity extraction (placeholder)

We compute the feature matrix using the paper authors' source code.

Run only one of the following two blocks. The first builds two random $1000 \times 1000$ adjacency matrices. The second builds the specific small example found in figure 2 of the paper.

In [2]:
# Uncomment this block to use two random 1000 x 1000 adjacency matrices instead.
# np.random.seed(1)
#
# A = sparse.csr_matrix(np.random.randint(2, size=(1000, 1000)))
# B = sparse.csr_matrix(np.random.randint(2, size=(1000, 1000)))
# comb = sparse.block_diag([A, B])
#
# graph = Graph(adj=comb.tocsr())
# rep_method = RepMethod(max_layer=2)

In [3]:
np.random.seed(1)

A = sparse.csr_matrix(np.array([[0., 1., 1., 1., 0.],
                                [1., 0., 0., 0., 0.],
                                [1., 0., 0., 0., 0.],
                                [1., 0., 0., 0., 1.],
                                [0., 0., 0., 1., 0.]]))

B = sparse.csr_matrix(np.array([[0., 1., 0., 0., 0., 0.],
                                [1., 0., 0., 1., 0., 0.],
                                [0., 0., 0., 1., 1., 0.],
                                [0., 1., 1., 0., 1., 1.],
                                [0., 0., 1., 1., 0., 0.],
                                [0., 0., 0., 1., 0., 0.]]))
comb = sparse.block_diag([A, B])

graph = Graph(adj = comb.tocsr())
rep_method = RepMethod(max_layer=2)
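
To make the combined adjacency matrix concrete, here is a minimal sketch showing what `sparse.block_diag` does with two graphs. It uses small stand-in matrices rather than the ones above, and the names `A_small`, `B_small`, and `comb_small` are hypothetical. The point is that the two graphs sit on the diagonal and the off-diagonal blocks stay empty, so no edge connects a node of the first graph to a node of the second.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in graphs: a single edge (2 nodes) and a path (3 nodes).
A_small = sparse.csr_matrix(np.array([[0., 1.],
                                      [1., 0.]]))
B_small = sparse.csr_matrix(np.array([[0., 1., 0.],
                                      [1., 0., 1.],
                                      [0., 1., 0.]]))

# Stack the two adjacency matrices along the diagonal.
comb_small = sparse.block_diag([A_small, B_small]).tocsr()
n1 = A_small.shape[0]

print(comb_small.shape)          # (5, 5)
print(comb_small[:n1, n1:].nnz)  # 0: no edges between the two graphs
print(comb_small.toarray())
```

Nodes `0 .. n1-1` of the combined graph belong to the first graph and the remaining nodes to the second, which is why later steps can split results by row index.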

Get the feature matrix, as computed by the authors' source code.

In [4]:
feature_matrix = get_features(graph, rep_method, True)

max degree:  4
got k hop neighbors in time:  0.0022678375244140625
got degree sequences in time:  0.00013875961303710938


In [5]:
print(feature_matrix)

[[0.   0.21 0.1  1.   0.  ]
 [0.   1.01 0.01 0.1  0.  ]
 [0.   1.01 0.01 0.1  0.  ]
 [0.   0.12 1.   0.1  0.  ]
 [0.   1.   0.1  0.01 0.  ]
 [0.   1.   0.1  0.   0.01]
 [0.   0.11 1.02 0.   0.1 ]
 [0.   0.01 1.11 0.   0.1 ]
 [0.   0.11 0.3  0.   1.  ]
 [0.   0.01 1.11 0.   0.1 ]
 [0.   1.   0.03 0.   0.1 ]]
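
The first five rows above (the nodes of graph `A`) can be reproduced with a short sketch. This is not the authors' `get_features`; it is a hypothetical helper, `khop_degree_features`, written under these assumptions: each node's feature vector is a histogram of the degrees of its neighbors at 0, 1, and 2 hops, the contribution of hop $k$ is discounted by $0.1^k$, and no logarithmic binning is applied. The discount value 0.1 is inferred from the printed numbers, not taken from the source.

```python
import numpy as np

def khop_degree_features(adj, max_layer=2, delta=0.1, num_buckets=None):
    """Hypothetical helper: discounted degree histograms of k-hop neighborhoods."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    degrees = adj.sum(axis=1).astype(int)
    if num_buckets is None:
        num_buckets = int(degrees.max()) + 1
    features = np.zeros((n, num_buckets))
    for node in range(n):
        visited = {node}
        frontier = [node]  # nodes exactly `layer` hops away from `node`
        for layer in range(max_layer + 1):
            # Each node in the frontier adds delta**layer to its degree bucket.
            for v in frontier:
                features[node, degrees[v]] += delta ** layer
            # Breadth-first expansion to the next hop.
            next_frontier = []
            for v in frontier:
                for w in np.flatnonzero(adj[v]):
                    w = int(w)
                    if w not in visited:
                        visited.add(w)
                        next_frontier.append(w)
            frontier = next_frontier
    return features

# Graph A from the small example.
A_dense = np.array([[0., 1., 1., 1., 0.],
                    [1., 0., 0., 0., 0.],
                    [1., 0., 0., 0., 0.],
                    [1., 0., 0., 0., 1.],
                    [0., 0., 0., 1., 0.]])

# 5 buckets, because the maximum degree over both graphs is 4.
features_A = khop_degree_features(A_dense, num_buckets=5)
print(np.round(features_A, 2))
```

For example, node 0 has degree 3 (weight 1 in bucket 3), three 1-hop neighbors with degrees 1, 1, and 2 (weight 0.1 each), and one 2-hop neighbor with degree 1 (weight 0.01), giving `[0, 0.21, 0.1, 1, 0]`.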


# Step 2: Efficient similarity-based representation

See representation.py for the custom code (based on the authors' source code).

First, compute the number of landmark nodes $p$. Recall: by default this is the minimum of $n$ and $10\log_{2}(n)$, where $n$ is the total number of nodes of the two graphs. Here $n = 11$ and $10\log_{2}(11) \approx 34.6$, so $p = 11$.

In [6]:
print('Number of landmark nodes:', rep_method.p)

Number of landmark nodes: None


In [7]:
get_number_of_landmarks(graph, rep_method)
print('Number of landmark nodes:', rep_method.p)

Number of landmark nodes: 11
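
The default can be written out as a small sketch. The helper name `default_number_of_landmarks` and the parameter `k` are hypothetical, and truncating $10\log_{2}(n)$ to an integer is an assumption; the result is consistent with the value 11 printed above.

```python
import math

def default_number_of_landmarks(n, k=10):
    """Hypothetical helper: min(n, floor(k * log2(n)))."""
    return min(n, int(k * math.log2(n)))

print(default_number_of_landmarks(11))    # 11, since 10 * log2(11) is about 34.6
print(default_number_of_landmarks(2000))  # 109, since 10 * log2(2000) is about 109.7
```

For the small example the cap at $n$ is what matters, so every node ends up being a landmark. For the two $1000 \times 1000$ graphs the logarithmic term is the smaller one.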


Get the landmark nodes, which are chosen randomly.

In [8]:
landmarks = get_random_landmarks(graph, rep_method)
print(landmarks)

[ 2  3  4  9  1  6  0  7 10  8  5]
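
One simple way to choose landmarks uniformly at random is to take the first $p$ entries of a random permutation of the node indices. The sketch below is an assumption about how this could be done, not the code in representation.py; `sample_landmarks` is a hypothetical name, and it uses NumPy's `default_rng` rather than the global seed set earlier in this notebook.

```python
import numpy as np

def sample_landmarks(n, p, seed=None):
    """Hypothetical helper: p distinct node indices chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n)[:p]

landmarks_demo = sample_landmarks(n=11, p=4, seed=0)
print(landmarks_demo)
```

When $p = n$, as in the output above, every node is selected and only the order is random.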


Compute the similarity matrix between all $n$ nodes and all $p$ landmark nodes.

In [9]:
C = compute_C_matrix(feature_matrix, landmarks)
print(f'Shape of C: {len(C)} x {len(C[0])}')
print('C:')
print(C)

Shape of C: 11 x 11
C:
[[0.23267794 0.19630219 0.20105033 0.12617316 0.23267794 0.15467951
  1.         0.12617316 0.19417412 0.1287349  0.19706927]
 [1.         0.16995867 0.98383213 0.10752843 1.         0.15722144
  0.23267794 0.10752843 0.9797087  0.14895664 0.98186643]
 [1.         0.16995867 0.98383213 0.10752843 1.         0.15722144
  0.23267794 0.10752843 0.9797087  0.14895664 0.98186643]
 [0.16995867 1.         0.20341643 0.95676259 0.16995867 0.9797087
  0.19630219 0.95676259 0.17634729 0.22310785 0.20301001]
 [0.98383213 0.20341643 1.         0.13394848 0.98383213 0.19231897
  0.20105033 0.13394848 0.98511194 0.16006105 0.99980002]
 [0.98186643 0.20301001 0.99980002 0.13421665 0.98186643 0.19270399
  0.19706927 0.13421665 0.98708414 0.16329449 1.        ]
 [0.15722144 0.9797087  0.19231897 0.98206282 0.15722144 1.
  0.15467951 0.98206282 0.16995867 0.26490076 0.19270399]
 [0.10752843 0.95676259 0.13394848 1.         0.10752843 0.98206282
  0.12617316 1.         0.11689257 0

Lastly, compute representations.

In [10]:
representations_1, representations_2 = compute_representation(C, landmarks, A.shape[0])
print('Representations of nodes from first graph:')
print(representations_1)

Representations of nodes from first graph:
[[-1.24779431e-07  3.88762298e-04 -6.29061414e-04  1.11262294e-02
   8.63518936e-03 -5.90462261e-01  7.48798035e-01  3.64296534e-02
   2.98546137e-01 -4.44869391e-18 -4.12424518e-21]
 [-7.65052822e-05  8.13762154e-03 -3.12519934e-03 -9.11184468e-02
   4.05700058e-02 -1.72651094e-02 -4.14625549e-02 -4.16916661e-01
   9.02296392e-01 -1.64922503e-17 -1.49554466e-20]
 [-7.65052822e-05  8.13762154e-03 -3.12519934e-03 -9.11184468e-02
   4.05700058e-02 -1.72651094e-02 -4.14625549e-02 -4.16916661e-01
   9.02296392e-01 -1.64922503e-17 -1.49554466e-20]
 [-7.26524877e-06 -3.79739523e-02  2.47522713e-02 -5.24130100e-02
  -1.65423199e-01 -6.95455462e-02 -5.20185929e-02  8.04807226e-01
   5.59085107e-01 -5.28404289e-18 -5.22763976e-21]
 [ 1.66769023e-03 -1.20918602e-02 -5.49945850e-02  5.20945059e-02
  -3.86513962e-02  1.07889981e-02 -6.89839481e-02 -3.89413032e-01
   9.14385660e-01 -1.65686446e-17 -1.50359114e-20]]
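
For intuition, here is a minimal sketch of a landmark-based (Nyström-style) embedding in the spirit of step 2: take the landmark-to-landmark block $W$ of $C$, compute its pseudoinverse, factor it with an SVD as $W^{\dagger} = U \Sigma V^{\top}$, and set $Y = C U \Sigma^{1/2}$, followed by row normalization. This is an assumption about the general technique, not a copy of `compute_representation`; the function name and the choice of landmarks `[0, 1, 3]` are hypothetical.

```python
import numpy as np

def embeddings_from_similarities(C, landmarks, normalize=True):
    """Hypothetical helper: Y = C U sqrt(Sigma), with pinv(W) = U Sigma V^T."""
    C = np.asarray(C, dtype=float)
    W = C[landmarks]                    # p x p landmark-to-landmark similarities
    W_pinv = np.linalg.pinv(W)
    U, sigma, _ = np.linalg.svd(W_pinv)
    Y = C @ U @ np.diag(np.sqrt(sigma))
    if normalize:
        norms = np.linalg.norm(Y, axis=1, keepdims=True)
        Y = Y / np.where(norms == 0, 1.0, norms)  # guard against all-zero rows
    return Y

# Feature rows of graph A, copied from the printed feature matrix.
features_A = np.array([[0., 0.21, 0.1, 1., 0.],
                       [0., 1.01, 0.01, 0.1, 0.],
                       [0., 1.01, 0.01, 0.1, 0.],
                       [0., 0.12, 1., 0.1, 0.],
                       [0., 1., 0.1, 0.01, 0.]])
landmarks_demo = [0, 1, 3]

# Similarity of every node to every landmark: exp(-squared distance), an assumed form.
diffs = features_A[:, None, :] - features_A[landmarks_demo][None, :, :]
C_demo = np.exp(-np.sum(diffs ** 2, axis=2))

Y_raw = embeddings_from_similarities(C_demo, landmarks_demo, normalize=False)
Y = embeddings_from_similarities(C_demo, landmarks_demo)
print(Y.shape)  # (5, 3)
```

Before normalization, the dot products between landmark embeddings recover the landmark similarities $W$, which is the property that makes the factorization useful. Nodes 1 and 2 have identical features, so they receive identical embeddings, just as rows 2 and 3 of the output above coincide. Splitting the rows at `A.shape[0]` then gives one block of embeddings per graph.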


# Step 3: Fast node representation alignment

Now we can plug these representations into step 3, as found in aligning.py and Example Alignments.ipynb.

In [11]:
from aligning import get_similarity_matrix

In [12]:
# The last argument (4) is the number of candidate matches kept per node of the first graph
similarity_matrix = get_similarity_matrix(representations_1, representations_2, 4)
print(similarity_matrix)

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 20 stored elements and shape (5, 6)>
  Coords	Values
  (0, 0)	0.2816117602320688
  (0, 5)	0.2809696922236644
  (0, 1)	0.27246461956466755
  (0, 3)	0.26712212863530455
  (1, 0)	0.8265958946975658
  (1, 5)	0.8175434299532052
  (1, 1)	0.27299820119679885
  (1, 3)	0.27127005165809465
  (2, 0)	0.8265958946975658
  (2, 5)	0.8175434299532052
  (2, 1)	0.27299820119679885
  (2, 3)	0.27127005165809465
  (3, 1)	0.8175434299551341
  (3, 2)	0.7452272845376627
  (3, 4)	0.7452272845376627
  (3, 3)	0.28750677720374995
  (4, 0)	0.9801996534648412
  (4, 5)	0.8415098274793559
  (4, 1)	0.28055964236754904
  (4, 3)	0.27359645753121126
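
A sparse top-$k$ similarity matrix like the one above can be built with a k-d tree: for each embedding of the first graph, look up its $k$ nearest embeddings in the second graph and store a similarity for just those pairs. The sketch below is an assumption about the approach, not the code in aligning.py. The name `top_k_similarities` is hypothetical, the random embeddings are stand-ins, and converting a distance $d$ to a similarity with $\exp(-d)$ is an assumed choice.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def top_k_similarities(emb1, emb2, k):
    """Hypothetical helper: sparse matrix holding exp(-distance) for the k
    nearest rows of emb2 to each row of emb1. Requires k <= len(emb2)."""
    tree = cKDTree(emb2)
    dist, idx = tree.query(emb1, k=k)
    # query returns 1-D arrays when k == 1, so force a 2-D layout.
    dist = np.asarray(dist).reshape(len(emb1), -1)
    idx = np.asarray(idx).reshape(len(emb1), -1)
    rows = np.repeat(np.arange(len(emb1)), dist.shape[1])
    values = np.exp(-dist).ravel()
    return sparse.csr_matrix((values, (rows, idx.ravel())),
                             shape=(len(emb1), len(emb2)))

# Stand-in embeddings with the same node counts as the small example (5 and 6 nodes).
rng = np.random.default_rng(0)
emb1_demo = rng.normal(size=(5, 3))
emb2_demo = rng.normal(size=(6, 3))

sim_demo = top_k_similarities(emb1_demo, emb2_demo, k=4)
print(sim_demo.shape, sim_demo.nnz)  # (5, 6) 20
```

Keeping only $k$ candidates per node is what makes this step fast on large graphs: the matrix has $k \cdot n_1$ stored entries instead of $n_1 \cdot n_2$. A greedy alignment can then pick the highest-scoring column in each row.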
