This repository contains methods an experiments from the paper: A Simple and Powerful Framework for Stable Dynamic Network Embedding.
This repository implements a collection of spectral and skip-gram dynamic network embedding methods. This repo compares how well different methods can encode planted structure in simulated dynamic networks through the use of a hypothesis test. The methods are also applied to a dynamic flight network to see whether they can encode both the periodic nature of the network, as well as represent the disruption caused by the COVID-19 pandemic on air traffic.
This code has been tested using Python 3.8.10. To install the required packages for this repo use the following command.
pip install -r requirements.txt
We also include the GloDyNE embedding, for which the METIS package is required. If you do not want to install the METIS package, you can use the flag --no-glodyne
when running experiments to avoid using GloDyNE.
We consider the following datasets in this analysis.
- The OpenSky network, which is a 17,388 node dynamic network of flights which occurred from the start of 2019 to the end of 2021.
airports.csv
from OurAirports, which contains information about airports around the world.- The C5 regulations data, from the Oxford COVID-19 Government Response Tracker. This data appears as
c5m_close_public_transport.csv
. This lists the travel restrictions in place per country in response to the COVID-19 pandemic.
embedding_functions.py
: Contains functions for several spectral and skip-gram dynamic embedding methods (listed below). Also contains functions to perform hypothesis testing on a dynamic embedding.
>>> from embedding_functions import *
>>> from experiment_setup import *
>>>
>>> # Generate random dynamic network where community one moves half way through
>>> n = 100
>>> T = 10
>>> As, tau, _ = make_temporal_simple(n=n, T=T, move_prob=0.9)
>>>
>>> # Compute a dynamic embedding
>>> ya = embed(As, d=2, method="URLSE")
>>>
>>> ya.shape
(1000, 2)
experiment_setup.py
: Contains functions to generate various dynamic networks with planted structure.experiments.py
: Checks if various dynamic embedding methods can encode planted structure through the use of a hypothesis test. For each, a cumulative p-value distribution is plot and coloured based on if the dynamic embedding correctly encoded the planted structure or not. The plots are coloured green if correct, red if incorrect and blue if correct but the test was conservative.
python experiments.py --methods "URLSE" --experiments "moving-static-community" --check-types "community"
> URLSE had power 0.035
python experiments.py --methods "URLSE" --experiments "moving-community" --check-types "graph"
> URLSE had power 0.57
optional arguments:
-h, --help show this help message and exit
--methods METHODS [METHODS ...]
List of dynamic embedding methods to run. Select from:
URLSE, UASE, OMNI, ISE, ISE Procrustes, GloDyNE,
Unfolded Node2Vec, Independent Node2Vec
--experiments EXPERIMENTS [EXPERIMENTS ...]
Select which dynamic network systems to test on.
Select from: static, moving-static-community, moving-
community, merge, static-spatial, power-moving, power-
static
--check-types CHECK_TYPES [CHECK_TYPES ...]
Run experiments at the community, graph, or node level
--n-runs N_RUNS Number of p-values to compute for each test
--n N Number of nodes in each (non-power) experiment
--n-power N_POWER Number of nodes in each power experiment (reccomend at
least 2000)
--all Runs all experiments with all methods at all check
types.
--no-save Use to bypass saving an experiment.
--no-plots Use to bypass plotting an experiment.
--plot-only Plot the result of a previously saved experiment.
Don't compute another test.
--no-glodyne In case you don't have the METIS package installed,
you can run the code without the GloDyNE method
(overrides the --all flag).
prep_flight_data.py
: Generates a sparse adjacency series and labelled node set from the OpenSky network. Note that we have not included the rawflightlist_.csv
files due to their size, but the sparse adjacency series that are generated by this script are small enough to include.flight_network_analysis.py
: Computes a dynamic embedding of the OpenSky network. Follow-up analysis on this embedding includes the generation of dissimilarity and p-value matrices to study the encoded temporal structure of the network and hierarchical clustering.flight_and_c5_regulations.py
: Produces a neat plot of the dissimilarity and p-value matrices to compare the encoded temporal structure of the network to a plot of travel restrictions in Europe in response to the COVID-19 pandemic. There are two waves of dissimilarity in the dissimilarity matrix which line up with the two main European waves of the pandemic.
This repository implements a collection of spectral and skip-gram dynamic embedding methods. Here, we consider the problem of embedding discrete-time dynamic networks, i.e. those that can be represented as a series of adjacency matrix ``snapshots" over time,
-
Unfolded adjacency spectral embedding (UASE): Constructs a column-concatenated adjacency matrix,
$\mathcal{A} = \left[\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(T)} \right]$ , called the rectangular unfolded adjacency matrix. The dynamic embedding is then achieved through the right SVD embedding [1,2]. - Unfolded Regularised Laplacian Spectral Embedding (URLSE): computes a spectral embedding the matrix,
where
are the left and right degree matrices and
-
Omnibus embedding (OMNI): Constructs a single block matrix containing pairwise-averaged adjacency snapshots,
$\mathbf{M}_{s,t} = (\mathbf{A}^{(s)} + \mathbf{A}^{(t)})/2$ , and then computes a spectral embedding on this matrix to achieve a dynamic embedding [3]. - Independent spectral embedding (ISE): Each
$\hat{\mathbf{Y}}^{(t)}$ is the independent spectral embedding of each$\mathbf{A}^{(t)}$ . Here, we add the option to ensure that the eigenvector orientation is consistent to remove random flipping between embedding time points. We additionally add the option of aligning subsequent embedding time points via a Procrustes rotation.
-
Unfolded node2vec: Computes a node2vec embedding on the
$(n + nT) \times (n + n T)$ dilated unfolded adjacency matrix,
This computation gives both a time-invariant anchor embedding (first
-
GloDyNE: The method aims to preserve global topology by only updating a subset of nodes which accumulate the largest topological changes over subsequent network snapshots. Then, to compute an embedding at
$t$ , its training is initialised using the pre-trained weights of the SGNS at$t-1$ in order to keep the time points similar [4]. -
Independent node2vec: Computes each
$\hat{\mathbf{Y}}^{(t)}$ as an independent static node2vec embedding [5].
System | Description | Ideal Testing Power at 5% Level |
---|---|---|
static |
Two static communities. | 0.05 |
moving-community |
Community 2 changes, community 1 is static. The moving community is selected for testing. | 1 |
moving-static-community |
Community 2 changes, community 1 is static. The static community is selected for testing. | 0.05 |
merge |
Community 1 and 2 merge over time. | 0.05 |
power-static |
Static sparse network with a single community. | 0.05 |
power-moving |
Changing sparse network with a single community. | 1 |
[1] Ian Gallagher, Andrew Jones, and Patrick Rubin-Delanchy. Spectral embedding for dynamic networks with stability guarantees. Advances in Neural Information Processing Systems, 34:10158–10170, 2021.
@article{gallagher2021spectral,
title={Spectral embedding for dynamic networks with stability guarantees},
author={Gallagher, Ian and Jones, Andrew and Rubin-Delanchy, Patrick},
journal={Advances in Neural Information Processing Systems},
volume={34},
pages={10158--10170},
year={2021}
}
[2] Andrew Jones and Patrick Rubin-Delanchy. The multilayer random dot product graph. arXiv preprint arXiv:2007.10455, 2020.
@article{jones2020multilayer,
title={The multilayer random dot product graph},
author={Jones, Andrew and Rubin-Delanchy, Patrick},
journal={arXiv preprint arXiv:2007.10455},
year={2020}
}
[3] Keith Levin, Avanti Athreya, Minh Tang, Vince Lyzinski, Youngser Park, and Carey E Priebe. A central limit theorem for an omnibus embedding of multiple random graphs and implications for multiscale network inference. arXiv preprint arXiv:1705.09355, 2017.
@article{levin2017central,
title={A central limit theorem for an omnibus embedding of multiple random graphs and implications for multiscale network inference},
author={Levin, Keith and Athreya, Avanti and Tang, Minh and Lyzinski, Vince and Park, Youngser and Priebe, Carey E},
journal={arXiv preprint arXiv:1705.09355},
year={2017}
}
[4] Chengbin Hou, Han Zhang, Shan He, and Ke Tang. Glodyne: Global topology preserving dynamic network embedding. IEEE Transactions on Knowledge and Data Engineering, 2020.
@article{hou2020glodyne,
title={Glodyne: Global topology preserving dynamic network embedding},
author={Hou, Chengbin and Zhang, Han and He, Shan and Tang, Ke},
journal={IEEE Transactions on Knowledge and Data Engineering},
volume={34},
number={10},
pages={4826--4837},
year={2020},
publisher={IEEE}
}
[5] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pages 855–864, 2016.
@inproceedings{grover2016node2vec,
title={node2vec: Scalable feature learning for networks},
author={Grover, Aditya and Leskovec, Jure},
booktitle={Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining},
pages={855--864},
year={2016}
}