This repository contains a Python library supporting the paper Sanna Passino, F. and Heard, N. A. (2023) "Mutually exciting point process graphs for modelling dynamic networks", Journal of Computational and Graphical Statistics, 32:1, 116-130 (link, arXiv preprint).
The library meg can be installed in editable mode as follows:
pip install -e lib/
The library can then be imported in any Python session:
import meg
The repository contains multiple directories:
- lib contains the Python library;
- notebooks contains Jupyter notebooks with examples of how to use the library;
- scripts contains Python scripts to reproduce the results in the paper;
- results contains some of the results described in the paper;
- plots contains Python scripts for reproducing the plots in the paper;
- tikz_process contains .tex files for reproducing Figure 1;
- fox contains additional scripts for implementing the methodology of Fox et al. (2016).
Update [December 2023] - If Python 3.12 is used, numba should first be installed following the instructions at this link; this avoids errors when installing sparse, which is required by meg.
The model and datasets are described in Sanna Passino, F. and Heard, N. A. (2023).
The core of the code is in the file meg.py, which implements a Python class for the MEG model, with inference via gradient ascent and expectation-maximisation.
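The contents of meg.py are not reproduced here. As a generic illustration of the kind of objective that gradient ascent or EM maximises for self-exciting point processes, the following minimal sketch computes the log-likelihood of a single univariate Hawkes process with an exponential kernel. The function name and the parametrisation (mu, alpha, beta) are assumptions chosen for illustration; this is not the meg API and not the full MEG likelihood, which couples many node pairs.

```python
import math


def hawkes_exp_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process observed on [0, T].

    Illustrative parametrisation (an assumption, not meg's):
        lambda(t) = mu + alpha * sum_{t_k < t} exp(-beta * (t - t_k))
    `times` must be sorted in increasing order and lie in [0, T].
    """
    loglik = 0.0
    # A holds sum_{k < i} exp(-beta * (t_i - t_k)); it is updated
    # recursively so the whole sum costs O(n) instead of O(n^2).
    A = 0.0
    for i, t in enumerate(times):
        if i > 0:
            A = math.exp(-beta * (t - times[i - 1])) * (1.0 + A)
        loglik += math.log(mu + alpha * A)
    # Compensator: integral of lambda(t) over [0, T], in closed form.
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in times
    )
    return loglik - compensator
```

Setting alpha = 0 recovers the homogeneous Poisson log-likelihood n log(mu) - mu T, which is a quick sanity check for any implementation of this kind.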
The simulation in Section 5.2 uses the file simulation_erdos.py in scripts, with the arguments listed in simulation_erdos.sh. The models for the Enron and ICL data are fitted using the files enron.py and icl.py. The available options for each file are described by its help function; for example, running python3 scripts/simulation_erdos.py --help returns:
- -f: name of the destination folder for the output files;
- -m: Boolean variable for the main effects (default: FALSE);
- -i: Boolean variable for the interactions (default: FALSE);
- -pm: Boolean variable (used only if -m is TRUE); if TRUE, a Poisson process is fitted for the main effects (default: FALSE);
- -pi: Boolean variable (used only if -i is TRUE); if TRUE, a Poisson process is fitted for the interactions (default: FALSE);
- -hm: Boolean variable (used only if -m is TRUE); if TRUE, a Hawkes process is fitted for the main effects, otherwise a Markov process is used (default: FALSE);
- -hi: Boolean variable (used only if -i is TRUE); if TRUE, a Hawkes process is fitted for the interactions, otherwise a Markov process is used (default: FALSE);
- -d: number of latent features for the interaction term (default: 1);
- -n: number of nodes of the graph in the simulation (default: 10);
- -T: maximum time of observation for each simulated graph (default: 1000000);
- -M: number of simulated events for each graph (default: 10000);
- -p: probability of a link in the Erdős–Rényi graph (default: 0.5).
For example, the first simulation is obtained by running the following command:
python3 scripts/simulation_erdos.py -f simulation_1 -M 5000 -p 0.25 -n 10 -d 1 -m -i -hm -hi &
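The -n and -p options control the Erdős–Rényi graph on which events are simulated. The following minimal sketch shows what sampling such a graph involves; it assumes a directed graph without self-loops, and the function erdos_renyi_edges is hypothetical, not taken from simulation_erdos.py.

```python
import random


def erdos_renyi_edges(n, p, seed=None):
    """Sample a directed Erdős–Rényi graph on nodes 0..n-1.

    Each ordered pair (i, j) with i != j is included independently with
    probability p. Directedness and the absence of self-loops are
    assumptions made for this sketch.
    """
    rng = random.Random(seed)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and rng.random() < p
    ]
```

With the values used in the command above (-n 10 -p 0.25), there are 90 candidate ordered pairs, so about 22.5 links are expected per simulated graph.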
Similar commands are used for the applications to the Enron and ICL data. Running python3 scripts/enron.py --help lists two additional options:
- -z: Boolean variable; if TRUE, […] for […], and […] if […] (default: […] set to its MLE);
- -fl: Boolean variable; if TRUE, […] for all links (default: FALSE).
For example, the best-performing model on the Enron data is obtained by running:
python3 scripts/enron.py -m -hm -i -d 5 -z -f 'enron_results/tau_Aij/mi_hm_wi_5' &
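The -pm, -hm, -pi and -hi options described above choose between Poisson, Hawkes and Markov dynamics. The following sketch contrasts the three conditional intensities for a single event stream. It assumes exponential excitation kernels and interprets "Markov" as only the most recent event exciting the intensity; both are simplifying assumptions for illustration, and the function intensity is hypothetical rather than part of meg.

```python
import math


def intensity(t, history, mu, alpha=0.0, beta=1.0, kind="hawkes"):
    """Conditional intensity at time t given past event times `history`.

    Assumed forms (illustrative, exponential kernels):
      poisson: mu
      hawkes:  mu + alpha * sum_{s < t} exp(-beta * (t - s))
      markov:  mu + alpha * exp(-beta * (t - s_last)), where s_last is the
               most recent event before t (assumed meaning of "Markov")
    """
    if kind not in ("poisson", "hawkes", "markov"):
        raise ValueError(f"unknown kind: {kind!r}")
    past = [s for s in history if s < t]
    # No excitation for a Poisson process, or before the first event.
    if kind == "poisson" or not past:
        return mu
    if kind == "hawkes":
        return mu + alpha * sum(math.exp(-beta * (t - s)) for s in past)
    return mu + alpha * math.exp(-beta * (t - max(past)))
```

For events at times 1 and 2, the Hawkes intensity at time 3 includes contributions from both events, whereas the Markov intensity only includes the event at time 2; the Poisson intensity ignores the history entirely.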
Since many of the simulations are computationally expensive, their outputs are stored in the directories simulation_main, simulation_inter, simulation_1 and simulation_2 within results. Details on how to obtain these outputs are given below.
The results, tables and figures in the paper can be reproduced using the following files:
- Figure 1 - The .tex source files for Figure 1 are in the directory tikz_process.
- Figure 2 - It can be reproduced by running the following three files in succession:
  - simulation_main_effects.sh (WARNING: computationally demanding), which uses simulation_main_effects.py with argument -s SEED and stores the simulated graphs as .npy files in simulation_main, named meg_simulate_SEED.npy;
  - after simulating the graphs, the parameter estimation procedure is run using estimate_simulation_main_effects.py, which takes as argument -n NUMBEREVENTS the number of events to use for estimation. The output is saved in the directory simulation_main/estimate_NUMBEREVENTS. For reproducing Figure 2, the argument -n 3000 should be used;
  - plots are obtained from plots_simulation_main.py, run with -M NUMBEREVENTS corresponding to the number of events used for inference (for Figure 2, -M 3000).
- Figure 3 - The procedure is similar to Figure 2:
  - simulation_interaction.sh (WARNING: computationally demanding), which uses simulation_interaction.py with argument -s SEED and stores the simulated graphs as .npy files in simulation_inter, named meg_simulate_SEED.npy;
  - after simulating the graphs, the parameter estimation procedure is run using estimate_simulation_interactions.py, which takes as argument -n NUMBEREVENTS the number of events to use for estimation. The output is saved in the directory simulation_inter/estimate_NUMBEREVENTS. For reproducing Figure 3, the argument -n 3000 should be used;
  - plots are obtained from plots_simulation_inter.py, run with -M NUMBEREVENTS corresponding to the number of events used for inference (for Figure 3, -M 3000).
- Figure 4 - The plots can be obtained by running estimate_simulation_main_effects.py and plots_simulation_main.py multiple times, with the arguments -n and -M set to 250, 500, 1000 and 2000.
- Figure 5 - The boxplots can be reproduced by running ./simulation_erdos.sh, followed by estimate_simulation_erdos.sh (both computationally expensive). The plot is then obtained by running […], followed by boxplots.py.
- Table 1 - The results can be reproduced by running ./enron_calls.sh, which uses the file enron.py (running the entire file is not recommended, since it contains command lines for all 117 combinations of models in Table 1). Comparisons with the model of Fox et al. (2016) can be run using the files fox_model.py and fox_enron.py.
- The Enron data can be downloaded by running scripts/enron_filter.sh;
- For security reasons, the ICL network data have not been made available, but the code to run the model on these networks (scripts/icl.py) is available.
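The Figure 2 and Figure 4 steps listed above can be chained from a small driver. The following is a minimal sketch under stated assumptions: the helper names main_effects_pipeline and run_commands are hypothetical, the seed values are placeholders (the actual seeds are defined in simulation_main_effects.sh), and each script is assumed to be run from the directory that contains it. Commands are only printed unless dry_run=False is passed.

```python
import shlex
import subprocess


def main_effects_pipeline(event_counts, seeds=()):
    """Build the command lines for the main-effects figures.

    Simulation commands are emitted only when seeds are given, so that
    already simulated graphs (e.g. in simulation_main) can be reused.
    Each event count n yields one estimation run (-n n) and one plotting
    run (-M n).
    """
    cmds = [["python3", "simulation_main_effects.py", "-s", str(s)] for s in seeds]
    for n in event_counts:
        cmds.append(["python3", "estimate_simulation_main_effects.py", "-n", str(n)])
        cmds.append(["python3", "plots_simulation_main.py", "-M", str(n)])
    return cmds


def run_commands(cmds, dry_run=True):
    """Print each command; execute it only if dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

For instance, run_commands(main_effects_pipeline([3000], seeds=[1, 2])) prints the commands for Figure 2 with two placeholder seeds, while run_commands(main_effects_pipeline([250, 500, 1000, 2000])) prints the estimation and plotting runs for Figure 4 on graphs that have already been simulated.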