Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Ambiguity in Barabási–Albert possibly worth investigating #1534

Open
mlhetland opened this issue Feb 18, 2021 · 1 comment
Open

Ambiguity in Barabási–Albert possibly worth investigating #1534

mlhetland opened this issue Feb 18, 2021 · 1 comment
Labels
discussion requires in-depth discussion - participation encouraged doc related to documentation

Comments

@mlhetland
Copy link
Contributor

This is not an issue with LightGraphs in itself, but possibly worth looking at, nonetheless. A recent preprint by Stamatelatos and Efraimidis (About Weighted Random Sampling in Preferential Attachment Models) discusses an ambiguity that seems pervasive in descriptions of the Barabási–Albert model. For example, they say:

In particular, concerning the common definitions of the BA model, the expression “select node i with probability pᵢ” may be interpreted in at least 3 different ways:

  1. Each individual node i gets a link with an independent probability pᵢ.
  2. Each node i has a selection probability pᵢ in a scheme where the m nodes are selected one by one.
  3. The inclusion probability of i appearing in the collection of m nodes is pᵢ.

These interpretations are just 3 examples that are not equivalent to each other but all abide by the “probability proportional to degree” requirement imposed by the BA model.

[…]

[T]he majority of the BA algorithms do not follow the BA model strictly as imprinted in the analytical formulation of the model; most of the theoretical algorithms in the literature as well as the implementations in major open source frameworks consider the draw-by-draw scheme.

They argue that one cannot say that there is a single correct sampling scheme. Rather, in their conclusion, they say:

In this paper we have identified a subtle ambiguity in the definition of the Barabasi-Albert model by taking into consideration the field of unequal probability random sampling. We showed that the definition of the model, in particular the part where m nodes are selected based on their degrees, is open to interpretation and can lead to graphs with unexpected properties. For this reason, we proposed an extension to the definition in the form of a new parameter that dictates the exact weighted random sampling scheme.

They refer to previous publications that describe “[s]everal unequal probability sampling designs”, but they limit themselves to three (conditional Poisson, draw-by-draw and strπps).

It may not be a high priority to implement multiple sampling schemes, but to begin with it might be useful to at least document the choice being made (possibly even referring to Stamatelatos and Efraimidis for a discussion about the ambiguity in the definition), describing explicitly how the random sampling is done.

@sbromberger
Copy link
Owner

This sounds like a good approach (clarifying in the documentation which choice was made).

@sbromberger sbromberger added discussion requires in-depth discussion - participation encouraged doc related to documentation labels Feb 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
discussion requires in-depth discussion - participation encouraged doc related to documentation
Projects
None yet
Development

No branches or pull requests

2 participants