In [1]:
from IPython.display import Markdown, display
display(Markdown(open("./SM_header.md", "r").read()))

Copyright © 2025 Université Paris Cité

Author: [Guillaume Rousseau](https://www.linkedin.com/in/grouss/), Laboratoire Matières et Systèmes Complexes, UMR 7057, CNRS and Université Paris Cité, CNRS, UMR7057, 10 rue Alice Domon et Léonie Duquet, F-75013, Paris cedex 13, France (email: guillaume.rousseau@u-paris.fr)

This archive contains the supplemental materials and replication package associated with the preprint, "*Growth Regime Shifts in Empirical Networks: Evidence and Challenges from the Software Heritage and APS Citation Case Studies*", available on [arXiv](https://arxiv.org/abs/2501.10145) and [ssrn](http://ssrn.com/abstract=5191689).

**The latest version of the preprint (timestamped arXiv:2501.10145v4) is downloadable here https://arxiv.org/pdf/2501.10145**

The current version of the Python scripts and associated resources is available on the [author's GitHub page](https://github.com/grouss/growing-network-study).

This work is currently licensed under the [Creative Commons CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0).

To give appropriate credit and cite this work ([BibTeX entry](./rousseau2025temporal)):
Rousseau, G. (2025). *Growth Regime Shifts in Empirical Networks: Evidence and Challenges from the Software Heritage and APS Citation Case Studies* [Preprint]. arXiv:2501.10145. https://arxiv.org/abs/2501.10145; also available on SSRN: http://ssrn.com/abstract=5191689

 
# A) Replication Packages

[Open the Replication Package notebook related to the datasets.](./Replication_Package_Datasets.ipynb)

[Open the Replication Package notebook related to the figures.](./Replication_Package_Figures.ipynb)

# B) QuickStart Guide

[Open the QuickStart Guide notebook](./SM00_QuickStart.ipynb)

# C) Table of Contents

- 1. [Function Definitions](./SM01_Functions.ipynb)
- 2. [Dataset Import](./SM02_DatasetImport.ipynb)
- 3. [Building the Transposed Graph](./SM03_BuildingTransposedGraph.ipynb)
- 4. [Temporal Information Quality and Summary Statistics](./SM04_TemporalInformationMainStats.ipynb)
- 5. [Growth Relationship Between Nodes and Edges](./SM05_GrowingRules.ipynb)
- 6. [Topological Partitioning ($RV$ Nodes)](./SM06_TopologicalPartitioning.ipynb)
- 7. [In-Degree and Out-Degree Distributions Over Time](./SM07_DegreeDistributionOverTime.ipynb)
- 8. [Distribution Tail Analysis](./SM08_DistributionTailAnalysis.ipynb)
- 9. [Temporal Partitioning](./SM09_TemporalPartitioning.ipynb)
- 10. [Derived $O-(RV/RL)-O$ Graph Construction](./SM10_DerivedGrowingNetwork.ipynb)
- 11. [Building the $TSL$ Partitioning](./SM11_TSLPartitioning.ipynb)
- 12. [Price / Directed Barabási–Albert Model Use Case](./SM12_BarabasiAlbertUseCase.ipynb)
- 13. [APS Citation Network](./SM13_APSCitationNetwork.ipynb)
- 14. [Generic Report Template](./SM14_GenericReport.ipynb)


**NB:** As of 2025/09/15, the QuickStart guide, the replication packages, and SM01 to SM14 are available. The Python scripts are also provided in the `local_utils` directory, but they are not in their final form and should be considered an alpha release.

The main graph datasets used in the study are available in a separate Zenodo deposit (DOI: 10.5281/zenodo.15260640, $\sim50$ GB), including the main dataset $O/RV/RL-O/RV/RL$ (more than 2 billion nodes, $\sim4$ billion edges) and two derived $O-(RV/RL)-O$ graphs ($\sim150$ million nodes and edges).

The APS Citation Dataset (2022 export) has been included in the study.

More release notes are available in the [dedicated notebook](./SM_ReleaseNote.ipynb).

# 1. Functions

## a) Generic import

In [2]:
%load_ext autoreload
%autoreload 2

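import importlib
import sys

import local_utils
# The star import exposes `config`, `DisplayCopyrightInfo`, and the other helpers used below.
from local_utils import *

print("___ Import data from graphpath=", config.graphpath)
print("___ Export data to exportpath=", config.exportpath)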

DisplayCopyrightInfo()


___ Import data from graphpath= ./ImportData/
___ Export data to exportpath= ./ExportData/
--------------------------------------------------------------------------------
Copyright 2025 Université Paris Cité, France 
Author: Guillaume Rousseau, MSC Lab, Physics Department, Paris, France 

(https://www.linkedin.com/in/grouss/)

This archive contains the supplemental materials and replication package associated with the preprint available on :
- arXiv (https://arxiv.org/abs/2501.10145)
- SSRN  (http://ssrn.com/abstract=5191689)

Current version of python scripts and associated resources are available on author's github page
(https://github.com/grouss/growing-network-study)

This work is currently licensed under CC BY-NC-SA 4.0
(https://creativecommons.org/licenses/by-nc-sa/4.0)
--------------------------------------------------------------------------------



**Comment:** In the current implementation, NumPy arrays are held entirely in memory. If necessary, the scripts can be adapted to use NumPy's native `memmap` functionality without significant effort.
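As an illustration of the `memmap` route mentioned above, the following minimal sketch stores a small array in `.npy` format and re-opens it memory-mapped. The file name, dtype, and array contents are hypothetical and are not taken from `local_utils`.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and dtype, for illustration only.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "edges_example.npy")

# Write a small array to disk in .npy format.
np.save(path, np.arange(10, dtype=np.int64))

# Re-open it memory-mapped: data is read from disk on access
# instead of being loaded into RAM up front.
edges = np.load(path, mmap_mode="r")

# Slicing gives a view on the mapped file; np.array copies it into memory.
chunk = np.array(edges[2:5])
print(type(edges).__name__, chunk)
```

With `mmap_mode="r"` the array is read-only, which is usually sufficient for analysis steps that only scan the graph.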


## b) Display Info of the `config` module

This module centralizes all configuration parameters used throughout the analysis pipeline, such as the input path (`graphpath`) and the output path (`exportpath`).

In [3]:
DisplayAllDoc(local_utils.config)

# path where output files will be saved
print("exportpath",config.exportpath)
# path where input files will be loaded
print("graphpath",config.graphpath)


Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/config.py :
--------------------------------------------------------------------------------
exportpath ./ExportData/
graphpath ./ImportData/


## c) Display Info of the `generic` module

This module centralizes generic utility functions.

In [4]:
DisplayAllDoc(generic)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/generic.py :
--------------------------------------------------------------------------------
Function: DisplayAllDoc()
Display the name and documentation of all user-defined functions within a given module.
This utility filters and lists all functions defined in the provided module, 
and prints their associated docstring, if available.
    Parameters :
        locallib : module
                   The module object to inspect.
--------------------------------------------------------------------------------
Function: DisplayCopyrightInfo()
This function displays copyright and license information.
    Parameters :
        Verbose : Default=True. 
                  Print info
    Returns :
        outputstring :string
--------------------------------------------------------------------------------
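A helper of this kind can be written with the standard `inspect` module. The sketch below is not the implementation of `DisplayAllDoc`; the function name `list_function_docs` and the throwaway `demo` module are hypothetical, and the output layout only mimics the one shown above.

```python
import inspect
import types


def list_function_docs(module):
    """Return (name, docstring) pairs for functions defined in `module` itself."""
    docs = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip functions that were merely imported into the module.
        if obj.__module__ != module.__name__:
            continue
        docs.append((name, inspect.getdoc(obj) or "No documentation available"))
    return docs


# Build a tiny throwaway module to demonstrate the filter.
demo = types.ModuleType("demo")
exec(
    "def documented():\n"
    "    '''Return 1.'''\n"
    "    return 1\n"
    "\n"
    "def undocumented():\n"
    "    return 2\n",
    demo.__dict__,
)

for name, doc in list_function_docs(demo):
    print("Function: {}()".format(name))
    print(doc)
    print("-" * 80)
```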


## d) Display Info of the `dataset` module

This module contains all functions used to load the various datasets.

In [5]:
DisplayAllDoc(dataset)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/dataset.py :
--------------------------------------------------------------------------------
Function: DisplayDatasetInfo()
No documentation available
--------------------------------------------------------------------------------
Function: LoadAllArray()
Return nodes, edges and nodesad array using Compressed Sparse Row format. 
edges[Nodes[index]:Nodes[index+1]] returns the list of source node indexes for a given target node index.
    Parameters:
        transpose : bool, optional (default=False)
              if equals True will load the transpose.
    Returns:
        nodes : array (1D, len=Nnodes+1)
        edges : array (1D, len=Nedges)
        nodesad : array (1D, len=Nodes)
        Nnodes : int 
                 Number of nodes.
        Nedges : int
                 Number of edges.
--------------------------------------------------------------------------------
Function: LoadAllArray
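The Compressed Sparse Row layout described in the `LoadAllArray` docstring can be illustrated on a toy graph. The arrays below are made-up data and `sources_of` is a hypothetical helper; only the slicing rule `edges[nodes[index]:nodes[index+1]]` comes from the docstring.

```python
import numpy as np

# Toy graph with 4 nodes; edges are stored per target node (made-up data).
# Node 0 has no incoming edge, node 1 <- {0}, node 2 <- {0, 1}, node 3 <- {2}.
nodes = np.array([0, 0, 1, 3, 4], dtype=np.int64)  # offsets, len = Nnodes + 1
edges = np.array([0, 0, 1, 2], dtype=np.int64)     # source indexes, len = Nedges

Nnodes = len(nodes) - 1
Nedges = len(edges)


def sources_of(target):
    """Return the source node indexes stored for `target`."""
    return edges[nodes[target]:nodes[target + 1]]


# Per-node edge counts follow directly from the offsets.
counts = np.diff(nodes)

print(sources_of(2))
print(counts)
```

The last offset, `nodes[-1]`, always equals the number of edges, which gives a cheap consistency check after loading.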

## e) Display Info of the `graph` module

In [6]:
DisplayAllDoc(graph)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/graph.py :
--------------------------------------------------------------------------------
Function: ArgSort()
wrapping of the numpy argsort function
--------------------------------------------------------------------------------
Function: EPOCH2Epoch()
No documentation available
--------------------------------------------------------------------------------
Function: GetEdgeTs()
Build timestamp array of all source nodes and target nodes
--------------------------------------------------------------------------------
Function: GetEdgesTranspose()
Build edge graph transpose using numpy lib
--------------------------------------------------------------------------------
Function: GetEdgesTypeException()
Check if edgetype exists and return 
     False (no exception,nodetype exists)
     True (exception, nodetype does not exist)
-------------------------------------------------------------------
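For readers unfamiliar with transposing a graph stored in CSR format, the following sketch shows one way to do it with NumPy only. It is an assumption-laden illustration, not the code of `GetEdgesTranspose`; the name `csr_transpose` and the toy arrays are hypothetical.

```python
import numpy as np


def csr_transpose(nodes, edges):
    """Return (nodes_t, edges_t), the CSR arrays of the reversed graph."""
    n = len(nodes) - 1
    # Row index of every stored edge, recovered from the offsets.
    rows = np.repeat(np.arange(n, dtype=edges.dtype), np.diff(nodes))
    # A stable sort by column groups edges by their new row and
    # keeps the old row indexes in ascending order inside each group.
    order = np.argsort(edges, kind="stable")
    edges_t = rows[order]
    # New offsets are the cumulative counts of each column index.
    nodes_t = np.zeros(n + 1, dtype=nodes.dtype)
    nodes_t[1:] = np.cumsum(np.bincount(edges, minlength=n))
    return nodes_t, edges_t


# Toy graph: row 1 stores {0}, row 2 stores {0, 1}, row 3 stores {2}.
nodes = np.array([0, 0, 1, 3, 4], dtype=np.int64)
edges = np.array([0, 0, 1, 2], dtype=np.int64)

nodes_t, edges_t = csr_transpose(nodes, edges)
print(nodes_t, edges_t)
```

Applying the function twice returns the original arrays, which is a convenient sanity check on small inputs.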

## f) Display Info of the `stat` module

In [7]:
DisplayAllDoc(stat)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/stat.py :
--------------------------------------------------------------------------------
Function: BinArray2dict()
Converts an array into a dictionary by mapping each non-zero value to its index.
    Parameters:
        array: Input array of values.
    Returns:
        dict: Dictionary where keys are indices and values are the corresponding non-zero elements of the array.
--------------------------------------------------------------------------------
Function: BuildEdgesTimeStampHisto()
Builds a histogram of edge types based on the timestamps of their source and target nodes.
    Parameters:
        nodes: Nodes in CSR format.
        edges: Edges in CSR format.
        nodesTS: Timestamps associated with each node.
        d (dic): Parameter used for edge type classification.
        stat (dict, optional): Dictionary to accumulate time-based type statistics. Default is empty.
        depth
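The behaviour documented for `BinArray2dict` (keys are indices, values are the corresponding non-zero elements) can be sketched as follows. The function name `bin_array_to_dict` and the example data are hypothetical, and the actual implementation in `local_utils` may differ.

```python
import numpy as np


def bin_array_to_dict(array):
    """Map each index holding a non-zero value to that value."""
    array = np.asarray(array)
    idx = np.flatnonzero(array)
    return {int(i): array[i].item() for i in idx}


# Example: a degree histogram where counts[k] is the number of nodes of degree k.
degrees = np.array([0, 1, 1, 3, 1, 0, 3])
counts = np.bincount(degrees)
print(bin_array_to_dict(counts))
```

A sparse dictionary of this kind is convenient for heavy-tailed histograms, where most bins are empty.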

## g) Display Info of the `mle` module

This module contains all functions implementing the first two steps of the method proposed by Clauset et al. (2009) for estimating the exponent associated with the tail of a distribution.

In [8]:
DisplayAllDoc(mle)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/mle.py :
--------------------------------------------------------------------------------
Function: GetXmin()
This function returns the list of threshold values x_c corresponding to 
the beginning of the distribution's tail, over which the power-law exponent 
alpha[x, x_c] will be estimated.
    Parameters:
        x (array) : Observed values from the distribution.
        AllValues : bool, optional (default=False).
                    If True, return all values of x for which the empirical distribution y[x] != 0.
    Returns:
        x_min (array) : Threshold values x_c to be used for estimating the power-law exponent.
--------------------------------------------------------------------------------
Function: Get_yfit_yc_y_x()
This function call f_exposant_hat, determine the best value for the cut-off value 
associated with the min of the max of the KS distance, and return a fitted function, 
p
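The general idea behind tail fitting in the spirit of Clauset et al. (2009) can be sketched as follows: for each candidate threshold, estimate the exponent by maximum likelihood, then keep the threshold that minimizes the Kolmogorov-Smirnov distance between the empirical and fitted tails. This sketch assumes the continuous form of the estimator and uses hypothetical function names (`alpha_hat`, `ks_distance`, `fit_tail`); it is not the code of the `mle` module and ignores the discrete corrections that degree data may require.

```python
import numpy as np


def alpha_hat(x, xmin):
    """Continuous maximum-likelihood estimate of the exponent for x >= xmin."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))


def ks_distance(x, xmin, alpha):
    """Largest gap between the empirical and fitted CDFs of the tail."""
    tail = np.sort(x[x >= xmin])
    n = len(tail)
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    # Compare against the empirical CDF just after and just before each jump.
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    return max(np.max(np.abs(upper - model_cdf)), np.max(np.abs(lower - model_cdf)))


def fit_tail(x, candidates):
    """Return (xmin, alpha, ks) for the candidate minimizing the KS distance."""
    best = None
    for xmin in candidates:
        a = alpha_hat(x, xmin)
        d = ks_distance(x, xmin, a)
        if best is None or d < best[2]:
            best = (xmin, a, d)
    return best


# Synthetic check: exact power law with exponent 2.5 above x = 1
# (inverse-CDF sampling from a uniform variate in (0, 1]).
rng = np.random.default_rng(0)
true_alpha = 2.5
sample = (1.0 - rng.random(50_000)) ** (-1.0 / (true_alpha - 1.0))

candidates = np.quantile(sample, [0.0, 0.25, 0.5, 0.75])
xmin, alpha, ks = fit_tail(sample, candidates)
print(xmin, alpha, ks)
```

On synthetic data drawn from a pure power law, the estimated exponent should land close to the value used for sampling, whichever candidate threshold is selected.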

## h) Display Info of the `model` module

This module implements minimal generative models of growing networks to produce synthetic datasets for comparative analysis.

In [9]:
DisplayAllDoc(model)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/model.py :
--------------------------------------------------------------------------------
Function: BarabasiAlbertGraph()
Implementation of the Barabasi-Albert Model, 
using Compressed Sparse Row format to store nodes and edges. 
Initial graph is a complete graph with m nodes.
    Parameters:
        n : int. 
            Number of nodes in the graph.
        m : int.
            Number of new edges per new nodes.
        seed : int. (Optional, Default=None)
               Seed of the random generator
        Verbose : bool. (Optional, Default=False)
                  Display elapsed time information 
        SelfLoop : bool. (Optional, Default=False)
                   if True, nodes link to themself.
    Returns:
        nodes : array of int. 
        edges : array of int.
        Nnodes : int.
            Number of nodes
        Nedges : int.
            Number of edges
        SelfLoop : 
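To make the growth rule concrete, here is a minimal sketch of degree-proportional attachment that stores its result in CSR arrays. It follows the conventions stated in the docstring above (complete seed graph on `m` nodes, `m` new edges per new node) but is not the code of `BarabasiAlbertGraph`: the name `preferential_attachment_csr` is hypothetical, the `SelfLoop` and `Verbose` options are not modelled, and the sampling strategy (a list with one entry per edge endpoint) is an assumption.

```python
import numpy as np


def preferential_attachment_csr(n, m, seed=None):
    """Grow a graph of n nodes; each new node links to m distinct earlier nodes."""
    rng = np.random.default_rng(seed)
    nodes = np.zeros(n + 1, dtype=np.int64)
    edges = []
    endpoints = []  # every node appears once per incident edge

    for v in range(n):
        if v < m:
            # Seed phase: complete graph on the first m nodes.
            targets = list(range(v))
        else:
            # Growth phase: degree-proportional choice without duplicates.
            # Fall back to a uniform choice if no edge exists yet (m = 1).
            pool = endpoints if endpoints else list(range(v))
            chosen = set()
            while len(chosen) < m:
                chosen.add(pool[rng.integers(len(pool))])
            targets = sorted(chosen)
        edges.extend(targets)
        for t in targets:
            endpoints.extend((v, t))
        nodes[v + 1] = len(edges)

    return nodes, np.array(edges, dtype=np.int64)


nodes, edges = preferential_attachment_csr(n=1000, m=3, seed=42)
in_degree = np.bincount(edges, minlength=len(nodes) - 1)
print(len(edges), in_degree.max())
```

Here `edges[nodes[v]:nodes[v+1]]` lists the earlier nodes that node `v` links to, so every stored index is strictly smaller than `v`.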

## i) Display Info of the `plot` module

This module provides specific yet reusable plotting functions developed for this study. These functions are currently undocumented, but will be documented in the near future.

In [10]:
DisplayAllDoc(plot)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/plot.py :
--------------------------------------------------------------------------------
Function: DisplayDegreeMetricsPerOrigin()
No documentation available
--------------------------------------------------------------------------------
Function: DisplayTSLGraph_Delta_1()
No documentation available
--------------------------------------------------------------------------------
Function: DisplayTSLGraph_Delta_1_2()
No documentation available
--------------------------------------------------------------------------------
Function: DisplayTSstat()
No documentation available
--------------------------------------------------------------------------------
Function: PlotTSHisto()
No documentation available
--------------------------------------------------------------------------------
Function: PlotTSoverTimeEdges()
No documentation available
---------------------------------------------------

## j) Display Info of the `report` module

This module provides specific yet reusable plotting functions developed to generate reports (see SM14). These functions are currently undocumented, but will be documented in the near future.

In [12]:
DisplayAllDoc(report)

Imported functions defined in /Volumes/X9_Pro_B/github.com/growing-network-study/local_utils/report.py :
--------------------------------------------------------------------------------
Function: GetArrayFromSandBox()
No documentation available
--------------------------------------------------------------------------------
Function: GetTSMFromArrayTS()
No documentation available
--------------------------------------------------------------------------------
Function: GetTS_Exception()
No documentation available
--------------------------------------------------------------------------------
Function: ReportMain_A()
No documentation available
--------------------------------------------------------------------------------
Function: ReportMain_B()
No documentation available
--------------------------------------------------------------------------------
Function: ReportMain_C()
No documentation available
--------------------------------------------------------------------------------
F