# node2vec2rank
This notebook demonstrates node2vec2rank. We compare two graphs with 1000 nodes each (1000x1000 adjacency matrices) that differ in a degree-naive manner, i.e., the connections change while the node degrees stay roughly the same. We then compute the recall in retrieving the nodes that change the most. Make sure to set up the environment with all required packages. We recommend creating a conda environment as specified in the [README](https://github.com/pmandros/n2v2r/blob/main/README.md).

## Import Required Libraries


In [1]:
import warnings
warnings.filterwarnings('ignore')

import sys
import os

sys.path.insert(1, os.path.realpath(os.path.pardir))
sys.path.append("../node2vec2rank/")


import pandas as pd

## Read the configuration file 
Use the configuration file to define the parameters needed for data loading, model fitting, and ranking. For example, is_edge_list should be True if the graphs are represented as edge lists.

In [2]:
import json

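# Read the config file (the context manager closes the file handle afterwards)
with open('../configs/config_demo_edge.json', 'r') as f:
    config = json.load(f)
# Flatten the nested sections into a single parameter -> value dictionary
config = {param: value for _, params in config.items()
          for param, value in params.items()}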
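
As a small illustration of the flattening step, the sketch below applies the same dictionary comprehension to a made-up two-level config. The section names and every parameter except is_edge_list are hypothetical and are not taken from the actual config file.

```python
import json

# Hypothetical config with a two-level shape: section -> parameter -> value.
# Only "is_edge_list" is a name mentioned in this notebook; the rest is made up.
nested_text = """
{
  "section_a": {"is_edge_list": true, "example_dir": "../output"},
  "section_b": {"example_dimensions": [4, 8]}
}
"""
nested = json.loads(nested_text)

# Drop the section level and keep one flat parameter -> value mapping.
flat = {param: value
        for params in nested.values()
        for param, value in params.items()}

print(flat)
# {'is_edge_list': True, 'example_dir': '../output', 'example_dimensions': [4, 8]}
```

Note that if two sections used the same parameter name, the later one would silently overwrite the earlier one in the flattened dictionary.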

## Run node2vec2rank
Create the dataloader, load the graphs into memory, and get some of their properties. The dataloader also extracts the node labels and their order, which are used as the index. It can handle symmetric graphs in adjacency or weighted edge list (through networkx) format, and bipartite graphs in adjacency format (represented as rectangular matrices). If the input is in adjacency format, ensure that both the index and the header exist. If the input graphs are bipartite, they are symmetrized by projecting them into the row-node or column-node space (through matrix multiplication).

In [3]:
from node2vec2rank.dataloader import DataLoader

dataloader = DataLoader(config=config)
graphs = dataloader.get_graphs()
nodes = dataloader.get_nodes()

There are 1000 row nodes and 1000 column nodes in graph 1
There are 1000 row nodes and 1000 column nodes in graph 2
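
The bipartite projection mentioned above can be sketched with a plain matrix product. This is only an assumption about the general idea (multiplying the rectangular matrix by its transpose); the dataloader may additionally normalize or post-process the result, and the toy matrix below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 bipartite adjacency: rows are one node set, columns the other.
B = pd.DataFrame(
    [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]],
    index=["r1", "r2", "r3"],
    columns=["c1", "c2"],
)

# Row-space projection: entry (i, j) sums, over all column nodes, the products
# of the weights of rows i and j. The result is a symmetric 3x3 matrix.
row_proj = B @ B.T

# Column-space projection: the same idea over row nodes, giving a 2x2 matrix.
col_proj = B.T @ B

print(row_proj)
print(col_proj)
```

Both projections are square and symmetric, so they can be treated like ordinary symmetric graphs in adjacency format.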



### Create the node2vec2rank model

In [4]:
from node2vec2rank.model import N2V2R
model = N2V2R(graphs=graphs, config=config, nodes=nodes)

### Fit node2vec2rank
fit_transform_rank() generates one ranking for each parameter combination. <br>
aggregate_transform() aggregates the rankings with Borda into one ranking. <br>
degree_difference_ranking() generates the degree difference ranking. <br>
The output and the config file are written to disk in the folder specified in the config file, with a timestamp attached.

In [5]:
# train Node2Vec2Rank and generate rankings
rankings = model.fit_transform_rank()

# aggregate the individual rankings into one with Borda
borda_ranking = model.aggregate_transform()

# get DeDi ranking
DeDi_rankings = model.degree_difference_ranking()

../output/12_29_2023_02_41_55

Running n2v2r with dimensions [4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24] and distance metrics ['euclidean', 'cosine'] ...
	Multi-layer embedding in 0.14 seconds
n2v2r computed 22 rankings for 1 comparison(s) in 0.63 seconds

Rank aggregation with Borda ...
	Finished aggregation in 0.85 seconds
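
For intuition, Borda aggregation can be sketched as summing per-ranking ranks. The snippet below is a generic Borda count on made-up scores, not the implementation behind aggregate_transform(); the node names, column names, and the convention that higher scores mean more change are assumptions for illustration.

```python
import pandas as pd

# Hypothetical scores for four nodes from three rankings (higher = more change).
scores = pd.DataFrame(
    {"run_a": [0.9, 0.1, 0.5, 0.3],
     "run_b": [0.8, 0.2, 0.1, 0.6],
     "run_c": [0.7, 0.3, 0.6, 0.1]},
    index=["n1", "n2", "n3", "n4"],
)

# Convert each column to ranks, where the largest score gets the largest rank.
ranks = scores.rank(axis=0, ascending=True, method="average")

# Borda count: sum the ranks across rankings, so nodes that rank consistently
# high end up with the highest aggregate value.
borda = ranks.sum(axis=1).sort_values(ascending=False)

print(borda)
```

Here n1 is ranked first by all three runs and therefore comes out on top, while the remaining nodes are ordered by how well they do on average.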


## Recall for most changing nodes
node2vec2rank generates one ranking per pairwise comparison. For two graphs, there is only one comparison. With three graphs and the one-versus-rest comparison strategy, three comparisons are generated.
In this example with two graphs, node2vec2rank performs a single comparison, which we access with the key '1'.
We sort the ranks, take the ground truth of which nodes changed the most (we know these are the nodes of the first community), and compute the recall. In this case we have simulated a degree-naive difference between the graphs, i.e., the connections change but the degree stays roughly the same. Note that the output of node2vec2rank is not sorted by default, but rather respects the original index.
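
To see why a purely degree-based ranking can miss this kind of change, consider the toy sketch below. It assumes that the degree difference is simply the difference of row sums of the two adjacency matrices, which may not match degree_difference_ranking() exactly; the two four-node graphs and the adjacency helper are hypothetical.

```python
import numpy as np

def adjacency(n, edges):
    """Build a symmetric, unweighted adjacency matrix from an edge list."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

# Two hypothetical graphs on four nodes: every node has degree 1 in both,
# but each node is connected to a different partner.
A1 = adjacency(4, [(0, 1), (2, 3)])
A2 = adjacency(4, [(0, 2), (1, 3)])

# Absolute degree difference per node (assumed definition: row-sum difference).
abs_degree_diff = np.abs(A2.sum(axis=1) - A1.sum(axis=1))

# Number of adjacency entries that differ between the two graphs.
changed_entries = int((A1 != A2).sum())

print(abs_degree_diff)   # all zeros: the degrees did not change
print(changed_entries)   # 8: every edge was rewired
```

Every edge is rewired, yet the degree difference is zero for all nodes, so a degree-based score carries no signal about which nodes changed.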



In [6]:
comparison = '1'

# Sort into new DataFrames (highest score first) instead of sorting in place,
# which avoids modifying the model output or a slice of it
borda_ranking_pd = borda_ranking[comparison].sort_values(by='borda_ranks', ascending=False)
absDeDi_ranking_pd = DeDi_rankings[comparison][['absDeDi']].sort_values(by='absDeDi', ascending=False)

comm_assignments = pd.read_csv("../data/networks/demo/comm_asiggnments.csv", index_col=0, header=0)
comm_assignments.index = comm_assignments.index.astype(str, copy = False)
most_changing_nodes = set(comm_assignments.index[comm_assignments['0'] == 0].tolist())
num_rel = len(most_changing_nodes)

print(f'n2v2r recall: {round(len(set(borda_ranking_pd.index.to_list()[:num_rel]).intersection(most_changing_nodes))/num_rel,2)}')
print(f'DeDi recall: {round(len(set(absDeDi_ranking_pd.index.to_list()[:num_rel]).intersection(most_changing_nodes))/num_rel,2)}')


n2v2r recall: 0.68
DeDi recall: 0.0
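
The recall computed above is the fraction of truly changing nodes that appear among the top k ranked nodes, with k set to the number of truly changing nodes. A minimal, self-contained sketch of that computation follows; the helper name recall_at_k and the toy data are hypothetical and not part of node2vec2rank.

```python
def recall_at_k(ranked_nodes, relevant_nodes, k=None):
    """Fraction of relevant nodes found among the first k ranked nodes.

    If k is None, it defaults to the number of relevant nodes, which matches
    the computation in the cell above.
    """
    relevant = set(relevant_nodes)
    if not relevant:
        raise ValueError("relevant_nodes must not be empty")
    if k is None:
        k = len(relevant)
    hits = relevant.intersection(ranked_nodes[:k])
    return len(hits) / len(relevant)

# Hypothetical ranking (most changing first) and ground truth.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "c", "f"}

recall = recall_at_k(ranked, relevant)
print(round(recall, 2))  # 0.67: "a" and "c" are in the top 3, "f" is not
```

With k equal to the number of relevant nodes, a perfect ranking yields a recall of 1.0, which makes the two printed values directly comparable.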
