This project contains the code to produce, analyze and plot the networks created by PATCH, a network model of [P]referential [A]ttachment, [T]riadic [C]losure and [H]omophily.
This is an interface that simplifies the interaction with the NetIn software package.
Check the linked Zenodo page to download the intermediate results data.
To make this code independent of the underlying OS and software dependencies, we make use of docker compose (>= v2.12.2, see the official docs for installation steps).
Docker takes care of installing the required software packages and setting up the local code.
Configuration of the docker builds is handled by .env-files.
An example of such a file can be found in /config/sample.env.
This file is sufficient to reproduce the findings.
It defines environment variables, such as folders for in- and output.
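As an illustrative sketch of such a file (the variable names below are hypothetical; the authoritative list is in config/sample.env):

```shell
# Hypothetical .env sketch -- consult config/sample.env for the actual variable names.
DATA_DIR=./data        # input graphs and empirical datasets
OUTPUT_DIR=./output    # plots and aggregated statistics
```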
The high-level execution scripts in /scripts/ additionally read configuration parameters from the command line.
If no argument is given, a default value is typically taken from patch/constants.py.
These scripts also print a description of what they do when called with the -h flag, e.g.,

```shell
scripts/create_graphs.py -h
```

After adjusting the configuration, build the docker image by running
```shell
docker compose --env-file config/sample.env build
```

replacing config/sample.env with the path to your custom .env-file in case you created a new one (from here on, we will continue to use config/sample.env).
Start the container by executing
```shell
docker compose --env-file config/sample.env up -d
```

The -d flag signals docker to run the containers in a background process, allowing you to continue using the same terminal.
All executable, high-level scripts are located in the scripts/ folder.
To execute a script at location <script.py> and its arguments <arguments>, simply run (see below for examples):
```shell
docker exec -t patch python <script.py> <arguments>
```

The -t flag forwards the output of the container immediately to your local terminal.
Running the executable scripts in the presented order will reproduce the figures of the paper. For instance, to compute a subset of graphs, run

```shell
docker exec -t patch python scripts/create_graphs.py -H .2 .5 -tau 0.0 0.2 -r 5 -lfm-g UNIFORM PAH -lfm-t UNIFORM
```

We explain the code structure by the folder tree:
```
├── README.md                                      # this file
├── data                                           # folder containing data
│   ├── empirical                                  # empirical datasets
│   └── graphs                                     # raw graphs, if present
├── notebooks                                      # jupyter notebooks
│   ├── plot_elfi_inference_results.ipynb          # empirical results plots
│   ├── plot_fig1_subplots.ipynb                   # toy networks and connection probabilities
│   ├── plot_simulation_results.ipynb              # simulation-based results
│   └── plot_synthetic_inference_validation.ipynb  # inference validation SI plots
├── patch                                          # local project containing helper functions
│   ├── constants.py                               # constant strings and values
│   ├── elfi.py                                    # package code to handle ELFI inference
│   ├── empirical.py                               # processing of empirical datasets
│   ├── io.py                                      # functions for input/output
│   ├── model_config.py                            # defines and validates the configuration of a model
│   └── statistics.py                              # aggregate network inequality statistics
├── output                                         # folder that contains output plots and stats
├── requirements.txt                               # required software packages
├── scripts                                        # interface to run inference and simulation studies
│   ├── inference_study                            # ELFI inference experiments
│   │   ├── 01_elfi_validation.py                  # validate approach on synthetic data
│   │   ├── 02_elfi_inference.py                   # run inference on empirical data
│   │   └── 03_elfi_predictive.py                  # run predictive analysis with inferred params
│   └── simulation_study                           # simulation experiments
│       ├── 01_create_graphs.py                    # creates graphs and stores them as JSON files
│       └── 02_compute_aggregate_stats.py          # computes aggregate statistics from simulated networks
└── setup.py
```
The output folder contains aggregated network statistics (stats/), figures (plots/), and inference results (inference/).
Aggregated statistics are provided as a stats/*.csv-file with the following columns
- Parameters
  - `N`: number of nodes.
  - `m`: number of new links per node.
  - `lfm_global`: the global link formation mechanism. Values can be found in `patch/constants.py`.
  - `lfm_tc`: the local link formation mechanism (for triadic closure links). Values can be found in `patch/constants.py`.
  - `homophily`: the homophily parameter. Higher `h` means that nodes prefer to connect to nodes of the same group.
  - `tau`: the triadic closure probability. Higher values mean that more links will be formed according to the *local* link formation mechanism.
  - `realization`: the realization index (typically multiple simulation runs are executed).
- Network metrics
  - `gini`: the global degree inequality. Higher values indicate that degrees are distributed unequally: few nodes have most of the links.
  - `gini_min` and `gini_maj`: the degree inequality within each group.
  - `ei`: network segregation. Values towards `-1` indicate a segregated network in which nodes connect mainly to their own group; values close to `+1` indicate the opposite.
  - `mann_whitney`: the inequity in terms of the minority. Values close to `+1` indicate that the minority generally has higher degrees than the majority group, values towards `0.5` suggest equity, and values towards `0.0` indicate majority advantage.
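As a sketch of how these columns can be consumed, the following stdlib-only snippet averages the degree inequality over realizations per `tau` value (the rows below are mock values for illustration, not real results):

```python
import csv
import io
from statistics import mean

# Mock rows mimicking the documented stats/*.csv columns (values are illustrative only).
csv_text = """N,m,lfm_global,lfm_tc,homophily,tau,realization,gini,gini_min,gini_maj,ei,mann_whitney
5000,3,UNIFORM,UNIFORM,0.25,0.0,0,0.42,0.40,0.43,-0.1,0.48
5000,3,UNIFORM,UNIFORM,0.25,0.0,1,0.44,0.41,0.45,-0.2,0.47
5000,3,UNIFORM,UNIFORM,0.25,0.2,0,0.50,0.47,0.51,-0.3,0.45
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Group the gini values by tau and average over realizations.
by_tau = {}
for row in rows:
    by_tau.setdefault(row["tau"], []).append(float(row["gini"]))
mean_gini = {tau: mean(vals) for tau, vals in by_tau.items()}
print(mean_gini)
```

For a real run you would replace the in-memory string with `open("output/stats/<file>.csv")`.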
The data folder contains the simulated and empirical networks (in sub-folders graphs/ and empirical/).
The simulated graphs are stored as a zipped file (typically as data/graphs/N-<N>_m-<m>_f-<f>.tar.gz) with <N>, <m>, and <f> being placeholders for the respective variable values.
Run `tar -xf data/graphs/N-<N>_m-<m>_f-<f>.tar.gz` to extract the graph files.
Each graph is contained in a JSON generated by NetworkX (see docs for details and this link for an example using D3).
The parameters used to create each graph can be found in the file name or as values in the respective files.
The structure of each JSON file is explained by the example of data/graphs/N-5000_m-3_f-0.2/N-5000_m-3_f-0.2_h-0.25_tau-0.0_lfm-g-HOMOPHILY_lfm-t-HOMOPHILY_r-0.json:
```json
{
    "N": 5000,
    "m": 3,
    "f_m": 0.2,
    "tau": 0.0,
    "lfm_global": "HOMOPHILY",
    "lfm_tc": "HOMOPHILY",
    "realization": 0,
    "h_M": 0.25,
    "h_m": 0.25,
    "minority": [...],
    "edge_list": [...]
}
```

with `"minority"` containing the minority node IDs and `"edge_list"` containing tuples `(u, v)` of connected nodes.
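For illustration, here is a minimal, dependency-free way to consume this layout, comparing average degrees of minority and majority nodes (the tiny graph below is a mock, not a real output file; see the notebooks for the project's own loading routines):

```python
import json
from collections import Counter

# Tiny mock graph in the JSON layout shown above (real files sit in data/graphs/).
graph_json = """{
  "N": 5, "m": 1, "f_m": 0.4, "tau": 0.0,
  "lfm_global": "HOMOPHILY", "lfm_tc": "HOMOPHILY",
  "realization": 0, "h_M": 0.25, "h_m": 0.25,
  "minority": [3, 4],
  "edge_list": [[0, 1], [1, 2], [2, 3], [3, 4], [4, 0]]
}"""

g = json.loads(graph_json)
minority = set(g["minority"])

# Count node degrees from the edge list.
deg = Counter()
for u, v in g["edge_list"]:
    deg[u] += 1
    deg[v] += 1

avg_min = sum(deg[n] for n in minority) / len(minority)
avg_maj = sum(d for n, d in deg.items() if n not in minority) / (g["N"] - len(minority))
print(avg_min, avg_maj)  # both 2.0 in this ring-shaped toy graph
```

For a real file, replace `json.loads(graph_json)` with `json.load(open(path))`.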
See the notebooks for examples of how to load these datasets and the scripts to create your own data.
Contains notebooks that create the plots of the paper.
You can run a local jupyter server by executing `jupyter notebook` in your terminal and then opening the presented weblink in your browser to run the notebooks (see the documentation for details).
This folder has general functions that simplify the interaction with the created data and the NetIn package. They can be imported in an arbitrary Python script as (see notebooks or scripts for examples):

```python
from patch.model_config import ModelConfig
```

Contains Python scripts to simulate networks, compute the aggregate statistics required to produce the presented plots, and run model inference using the elfi software package.
The aggregated statistics will also be provided as a .csv-file on Zenodo.
In case of questions or bugs, open an issue on GitHub or contact me directly at bachmann \at csh.ac.at.