Research code for "Choosing to grow a graph" project. Contains code for network generation and model estimation.

Choosing to Grow a Graph

This code and data repository accompanies the paper:

For questions, please email Jan at

The code for fitting logit models, as well as the code to generate the synthetic graphs for Section 4.1, is written in Python 3. The code for the plots is written in R.

We used the following versions of external Python libraries:

  • networkx=2.1
  • numpy=1.14.3
  • pandas=0.23.0
  • scipy=1.1.0
  • plfit - install from here, but remove before building, for Python 3 compatibility.
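
To check an environment against the versions listed above, a small script along the following lines may help. It is only a sketch and not part of the repository: the names `PINNED`, `installed_version`, and `check_versions` are hypothetical, and it assumes each package exposes a `__version__` attribute. plfit is left out because no version is given for it.

```python
import importlib

# Versions copied from the list above; plfit is omitted (no version is stated).
PINNED = {
    "networkx": "2.1",
    "numpy": "1.14.3",
    "pandas": "0.23.0",
    "scipy": "1.1.0",
}


def installed_version(name):
    """Return a package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(pinned=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions().items():
        print("{}: expected {}, found {}".format(name, expected, found or "not installed"))
```

Newer library versions may well work; a mismatch only means the environment differs from the one the results were produced with.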

Reproducing results and figures

To reproduce the results from Sections 4.1 and 4.2, follow these steps (from the /src folder):

  1. Generate synthetic graphs with python This generates 10 graphs for each (r, p) combination, and writes them to data_path/graphs, as defined in
  2. Extract, for each edge, the relevant choice data with python The choice set data is written to data_path/choices.
  3. Run the analysis code with python
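
To give a rough idea of what "choice data" for an edge can look like, here is a minimal sketch. It is not the repository's extraction code: the function name `extract_choice_sets`, the use of degree as the only feature, and the uniform sampling of non-chosen alternatives are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def extract_choice_sets(edges, n_alternatives=5, seed=0):
    """Turn a time-ordered edge list into one choice set per edge.

    Each row is (choice_id, candidate, degree, chosen), where `degree` is the
    candidate's degree just before the edge arrives (an assumed feature).
    """
    rng = random.Random(seed)
    degree = defaultdict(int)
    rows = []
    for choice_id, (source, target) in enumerate(edges):
        # Alternatives: nodes seen so far, excluding the chooser and the chosen node.
        others = sorted(n for n in degree if n not in (source, target))
        if target in degree and others:
            sampled = rng.sample(others, min(n_alternatives, len(others)))
            for candidate in [target] + sampled:
                rows.append(
                    (choice_id, candidate, degree[candidate], int(candidate == target))
                )
        # Update degrees only after the choice set has been recorded.
        degree[source] += 1
        degree[target] += 1
    return rows


if __name__ == "__main__":
    for row in extract_choice_sets([(0, 1), (2, 1), (3, 1), (4, 0)]):
        print(row)
```

Edges whose target has not appeared before, or for which no alternatives exist yet, are skipped in this sketch, since there is no meaningful choice set to record for them.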

For the analysis in Section 4.3, follow these steps:

  1. Download the Flickr data with curl -O -4 data/. This file is about 141 MB.
  2. Process the Flickr data with python This code takes a while to run.
  3. Build the RMarkdown report with R -e "rmarkdown::render('../reports/flicrk_data.Rmd', output_file='../reports/flicrk_data.pdf')".

For the analysis in Section 4.4, follow these steps:

  1. Download the Microsoft Academic Graph. Warning: the uncompressed size of this data set is over 165 GB. Download it with the following Bash code:
    mkdir ~/mag_raw
    cd ~/mag_raw
    for i in {0..8}; do
       curl -O -4$
       unzip mag_papers_$
    done
  2. Process the data with python Note that you can change the field of study to process. This code takes a while to run.
  3. Build the RMarkdown report with R -e "rmarkdown::render('../reports/mag_climatology.Rmd', output_file='../reports/mag_climatology.pdf')".

Finally, to produce the figures in the paper, run Rscript make_plots.R.

Other software libraries

Because discrete choice models are widely studied in other fields, there are many other software libraries available for the major statistical programming languages. For Python, there is an implementation in statsmodels, as well as the larch, pylogit, and choicemodels packages. For R, there are the mlogit and mnlogit libraries. Stata has the clogit and xtmelogit routines built in, and there are a number of user-written routines as well. We haven't tested these libraries, but they might be useful.
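
For readers new to discrete choice models, the following sketch shows the kind of quantity such libraries estimate: a conditional logit with a single coefficient. It is a generic textbook illustration, not the estimation code in this repository. The names `choice_probabilities`, `log_likelihood`, and `fit_theta` are hypothetical, and the single-feature setup is an assumption chosen to keep the example short.

```python
import math


def choice_probabilities(features, theta):
    """Softmax over one choice set: P(j) is proportional to exp(theta * x_j)."""
    utilities = [theta * x for x in features]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(choice_sets, theta):
    """Sum of log P(chosen) over choice sets given as (features, chosen_index)."""
    return sum(
        math.log(choice_probabilities(features, theta)[chosen])
        for features, chosen in choice_sets
    )


def fit_theta(choice_sets, steps=50, tol=1e-10):
    """Maximize the log-likelihood over one coefficient with Newton's method."""
    theta = 0.0
    for _ in range(steps):
        gradient, curvature = 0.0, 0.0
        for features, chosen in choice_sets:
            probs = choice_probabilities(features, theta)
            mean = sum(p * x for p, x in zip(probs, features))
            variance = sum(p * (x - mean) ** 2 for p, x in zip(probs, features))
            gradient += features[chosen] - mean  # observed minus expected feature
            curvature += variance  # negative second derivative
        if curvature == 0.0:
            break  # all alternatives look identical; theta is not identified
        step = gradient / curvature
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    # Three toy choice sets; the second alternative is chosen two times out of three.
    sets = [([0.0, 1.0], 1), ([0.0, 1.0], 1), ([0.0, 1.0], 0)]
    theta = fit_theta(sets)
    print(theta, log_likelihood(sets, theta))
```

The sketch does not guard against data where the chosen alternative always has the largest feature value; in that case no finite maximum exists, which is one reason to prefer an established library for real analyses.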