
Calibration of kernel parameter inference for large number of nodes #6

tillahoffmann opened this issue Jun 29, 2023 · 2 comments


@tillahoffmann

I've experimented with inferring the power exponent $\gamma$ of a preferential attachment kernel (attachment probability is $\propto k^\gamma$ for degree $k$), as discussed in the PRL paper (preprint). Everything works as expected when the number of nodes is small (tens of nodes), but I've struggled to get consistent inference when the number of nodes is large (more than a few hundred).

If we simulate from the joint distribution of parameters $\gamma$ and data $G$ and infer the posterior, then the posterior CDF evaluated at the parameter value used to generate the data should be uniformly distributed (Cook, Gelman and Rubin, 2006). However, in experiments, I don't observe such a uniform distribution.
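As a concrete illustration of this check, here is a minimal sketch using a conjugate normal-normal model (my own stand-in, not the attachment-kernel model) so the posterior, and hence the posterior CDF at the true parameter, is available in closed form:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_reps = 1000

# Normal-normal conjugate model: theta ~ N(0, 1), y | theta ~ N(theta, 1).
# The posterior is N(y / 2, 1 / 2) in closed form.
theta = rng.normal(0.0, 1.0, size=n_reps)  # draw parameters from the prior
y = rng.normal(theta, 1.0)                 # simulate data given each parameter
c = stats.norm.cdf(theta, loc=y / 2, scale=np.sqrt(0.5))

# If the inference is calibrated, the c values are uniform on [0, 1].
print(c.mean())                            # should be near 0.5
print(stats.kstest(c, "uniform").pvalue)   # uniformity should not be rejected
```

The same logic applies here, except that the posterior CDF has to be computed numerically on a grid.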

Would love to get your input; this may just be a coding error on my part.

Here's the setup:

  1. Sample $\gamma$ from a uniform distribution on $[0,2)$.
  2. Generate a directed tree with $n$ nodes and attachment kernel $k^\gamma$ using `nx.gn_graph`, and convert it to an undirected graph.
  3. Set up a `HistorySampler` and evaluate the posterior on a dense grid of $\gamma$ on the prior support by calling `set_kernel` and `get_log_posterior` repeatedly; average the posteriors over all histories as in eq. (13) of the preprint.
  4. Numerically normalize the log posterior on the support of the prior, integrate to get the CDF on the grid, interpolate the CDF, and evaluate it at the parameter value $\gamma$ used to generate the graph; call that value $c$.
  5. Repeat steps 1–4 to obtain a large number of $c$ values.
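Step 4 looks like the following sketch (the Gaussian log posterior here is a stand-in for the `HistorySampler` output, just to make the snippet self-contained):

```python
import numpy as np

def posterior_cdf_at(gamma_true, log_post, grid):
    """Normalize a log posterior on a grid, integrate to a CDF, and
    evaluate the interpolated CDF at the parameter used to simulate."""
    log_post = log_post - log_post.max()  # stabilize before exponentiating
    post = np.exp(log_post)
    # Trapezoidal cumulative integral of the (unnormalized) posterior.
    increments = (post[1:] + post[:-1]) / 2 * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]  # the normalization constant cancels here
    return np.interp(gamma_true, grid, cdf)

# Stand-in posterior: Gaussian centred at 1.0 on the prior support [0, 2).
grid = np.linspace(0.0, 2.0, 501)
log_post = -0.5 * ((grid - 1.0) / 0.1) ** 2
print(posterior_cdf_at(1.0, log_post, grid))  # symmetric posterior: CDF at its mode is 0.5
```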

Here's a figure of the results for 50 nodes, 200 history samples, 501 grid points for evaluating the log posterior, and 101 repeats. Panel (a) shows a randomly chosen posterior with the true value as a vertical line as a sanity check; they usually look reasonable. Panel (b) shows the sorted $c$ values plotted against the uniformly increasing values we'd expect. There's a small bump around 0.3, but things look reasonable. Panel (c) shows posterior means with 1.96-posterior-standard-deviation error bars plotted against the parameters used to generate the data; again, things look good. Panel (d) shows the data-averaged posteriors, which should recover the prior (dotted horizontal line; see eq. (1) of Talts et al. (2018) for details); maybe some slight deviation, but no obvious problems.

Here's the same figure for 500 nodes, 1000 history samples, and 101 repeats. I increased the number of history samples because I thought there might be more uncertainty regarding histories for large graphs; not sure whether that's actually the case. But more is more, right? The $c$ values in panel (b) diverge significantly from what we'd expect: the posterior CDF evaluated at the "true" parameter value is too large, which means the posterior tends to underestimate the "true" parameter values. This is not particularly obvious in panel (c), but for $\gamma\gtrsim 1.5$ one can see the departure from the diagonal (colors indicate $c$ values).

I've also tried increasing the number of grid points, because more precise posteriors for larger graphs may suffer from discretization artifacts. This didn't seem to have an effect, though (see below; the number of history samples was reduced to 200 because the run would've taken a few hours otherwise).

Do you know what might be going on here or where I've misunderstood? Code to reproduce is here. Thanks for your time!

@tillahoffmann

tillahoffmann commented Aug 15, 2023

I've done some more digging, and it appears that history inference degrades as the kernel exponent grows. I've averaged the output of `sampler.get_histories` and evaluated the Spearman rank correlation with the known history for simulated graphs (shown in panels (a) and (b) for 100 and 748 nodes, respectively). The rank correlation is very high but degrades for $\gamma \gtrsim 1.2$, especially for the larger graphs. The bottom row shows the $z$-scores of posterior samples, i.e., standardized residuals between the known kernel exponent and the posterior samples. Maybe as expected, the kernel-exponent inference degrades as the history inference degrades. This is particularly obvious for the larger graphs, but one might see a hint of it for the smaller graphs if one squints.

[figure: Spearman rank correlations between inferred and true histories (top) and posterior $z$-scores (bottom), for 100 and 748 nodes]
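A crude, self-contained way to see that histories carry signal at all (ranking nodes by degree rather than calling `sampler.get_histories`, since in preferential attachment early nodes tend to accumulate high degree):

```python
import networkx as nx
import numpy as np
from scipy import stats

gamma = 1.0
n = 500
# nx.gn_graph adds nodes 0, 1, 2, ... in order, so the node label is the true arrival time.
G = nx.gn_graph(n, kernel=lambda k: k**gamma, seed=42).to_undirected()

true_order = np.arange(n)
degrees = np.array([G.degree(v) for v in range(n)])
# Crude history estimate: earlier arrival <-> higher degree.
rho = stats.spearmanr(true_order, -degrees).correlation
print(rho)  # positive: degree carries some information about arrival order
```

The actual sampler should do much better than this degree heuristic, of course; this just fixes the conventions (arrival order, rank correlation) used in the plots above.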

Hypothesis: Inferring the histories becomes very hard when the exponent is high because the graph looks more like a star than a tree.

Edit: Yes, it looks like there's a winner-takes-all situation for $\gamma>1$ (cf. Section IV of https://arxiv.org/abs/cond-mat/0011094), such that it is challenging to constrain histories.
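The winner-takes-all effect is easy to see in simulation, e.g. by comparing the largest hub's share of the degree for sub- and superlinear kernels (a quick sketch, nothing rigorous):

```python
import networkx as nx

n = 500
seed = 0

def max_degree_fraction(gamma):
    # Superlinear kernels (gamma > 1) produce a hub that absorbs most attachments,
    # so the graph looks increasingly like a star.
    G = nx.gn_graph(n, kernel=lambda k: k**gamma, seed=seed).to_undirected()
    return max(dict(G.degree()).values()) / n

sublinear = max_degree_fraction(0.5)
superlinear = max_degree_fraction(1.8)
print(sublinear, superlinear)  # the superlinear hub dominates
```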

@jg-you

jg-you commented Aug 18, 2023

Thanks for the write-up Till.

Yes, indeed, there is a phase transition in the "reconstructability" of the history: the rank correlation is bounded above by a curve that goes to zero as $\gamma$ grows; see https://journals.aps.org/prx/abstract/10.1103/PhysRevX.9.041056 for an in-depth discussion. This can be traced back to increasingly large stars, as you suggest. The bound can be derived from graph orbits.
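To see where the orbit bound comes from, a tiny sketch: in a star, every leaf is equivalent under some automorphism, so the leaves form a single orbit and no likelihood can distinguish their arrival order.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

# A star with 5 leaves: any permutation of the leaves is an automorphism
# (the hub is fixed by its unique degree), so the arrival order of the
# leaves is unidentifiable from the final graph.
G = nx.star_graph(5)  # node 0 is the hub, nodes 1..5 are leaves
automorphisms = list(GraphMatcher(G, G).isomorphisms_iter())
print(len(automorphisms))  # 5! = 120 leaf permutations
```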

And, corroborating your findings, a scaling analysis of the correlation between an (approximate) MAP history and the truth for different values of the exponent shows the correlation going to zero with growing network size for large enough $\gamma$:

[figure: correlation between the MAP history and the truth vs. network size, for several exponents]

This seems not to matter that much for kernel learning; see, for example, Table 1 of https://www.sciencedirect.com/science/article/pii/S0378437112004128, where the exponent is inferred reasonably well even with completely random orderings of the nodes.
This is because, even if the true history is difficult to infer, most histories give roughly the same progression of summary statistics, so getting the history wrong doesn't change the inferred kernel much.
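For intuition, here is a sketch of the attachment log-likelihood given a full history (my own stand-in, not the package's implementation): replay the arrival order, score each attachment under $k^\gamma$, and maximize over a grid. With the true history this recovers the exponent well.

```python
import networkx as nx
import numpy as np

gamma_true = 1.0
n = 500
G = nx.gn_graph(n, kernel=lambda k: k**gamma_true, seed=1)  # node labels = arrival order

def log_likelihood(gamma):
    # Replay the history: node t attaches to its target with probability
    # proportional to (degree of target)**gamma among existing nodes.
    degrees = np.zeros(n)
    degrees[[0, 1]] = 1  # after the initial edge 1 -> 0
    ll = 0.0
    for t in range(2, n):
        target = next(iter(G.successors(t)))  # each new node has one out-edge
        weights = degrees[:t] ** gamma
        ll += np.log(weights[target] / weights.sum())
        degrees[target] += 1
        degrees[t] = 1
    return ll

grid = np.linspace(0.0, 2.0, 41)
gamma_hat = grid[np.argmax([log_likelihood(g) for g in grid])]
print(gamma_hat)  # close to gamma_true when the full history is known
```

Swapping in a plausible-but-wrong ordering changes this estimate much less than one might fear, which is the point above.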

I'm not familiar with these CDF plots (panel b); can you spell out what this implies for the sampler? Is it over-dispersed in the large-$\gamma$ regime?
