Calibration of kernel parameter inference for large number of nodes #6
Comments
I've done some more digging, and it appears the problem is that inferring the history degrades as the kernel exponent grows. I've averaged the output of …

Hypothesis: inferring the histories becomes very hard when the exponent is high because the graph looks more like a star than a tree.

Edit: yes, it looks like there's a winner-takes-all situation for …
Thanks for the write-up, Till. Yes, indeed, there is a phase transition in the "reconstructability" of the history: the rank correlation is bounded above by a curve that goes to zero with growing gamma; see https://journals.aps.org/prx/abstract/10.1103/PhysRevX.9.041056 for an in-depth discussion. This can be traced back to increasingly large stars, as you suggest. The bound can be derived from graph orbits. And corroborating your findings, a scaling analysis of the correlation between an (approximate) MAP history and the truth for different values of the exponent shows the correlation going to zero with growing network size for large enough gamma.

This seems not to matter that much for kernel learning; see, for example, Table 1 of https://www.sciencedirect.com/science/article/pii/S0378437112004128, where the exponent is inferred not too badly even with completely random orderings of nodes.

I'm not familiar with these CDF plots (panel b); can you spell out what this implies for the sampler? Is it over-dispersed in the large-gamma regime?
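The star-dominance effect described above is easy to check with a short, self-contained simulation (a sketch using a hand-rolled `simulate_pa_tree` helper with NumPy, not the repo's code): for superlinear kernels, a single hub absorbs almost every attachment, whereas for the linear kernel the maximum degree stays a small fraction of the network.

```python
import numpy as np

def simulate_pa_tree(n, gamma, rng):
    """Grow a tree where a new node attaches to an existing node
    with probability proportional to degree**gamma."""
    deg = np.zeros(n)
    deg[0] = deg[1] = 1.0  # seed edge between nodes 0 and 1
    for t in range(2, n):
        w = deg[:t] ** gamma
        j = rng.choice(t, p=w / w.sum())
        deg[j] += 1.0
        deg[t] = 1.0
    return deg

rng = np.random.default_rng(0)
n = 500
frac_linear = simulate_pa_tree(n, 1.0, rng).max() / n  # linear kernel
frac_super = simulate_pa_tree(n, 3.0, rng).max() / n   # strongly superlinear
print(frac_linear, frac_super)
```

With `gamma = 3` the hub's degree fraction approaches one, which is exactly the regime where the attachment order of the leaves becomes nearly exchangeable and the history is unrecoverable.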
I've experimented with inferring the power exponent $\gamma$ of a preferential attachment kernel (the attachment probability is $\propto k^\gamma$ for degree $k$), as discussed in the PRL paper (preprint). Everything works as expected when the number of nodes is small (tens of nodes), but I've struggled to get consistent inference when the number of nodes is large (more than a few hundred).
If we simulate from the joint distribution of the parameter $\gamma$ and data $G$ and infer the posterior, then the posterior CDF evaluated at the parameter value used to generate the data should be uniformly distributed (Cook, Gelman and Rubin, 2006). However, in experiments, I don't observe such a uniform distribution.
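For reference, this calibration property can be demonstrated on a toy conjugate model (a normal mean with known unit variance, chosen here purely for illustration, not the model from the paper): when prior, simulator, and posterior are consistent, the posterior CDF evaluated at the generating parameter is uniform on $[0, 1]$.

```python
import math
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_reps = 20, 2000
cs = np.empty(n_reps)
for r in range(n_reps):
    mu = rng.normal(0.0, 1.0)            # draw the parameter from its N(0, 1) prior
    x = rng.normal(mu, 1.0, size=n_obs)  # simulate data given that parameter
    # conjugate normal-normal posterior: precision adds, mean is precision-weighted
    post_var = 1.0 / (1.0 + n_obs)
    post_mean = post_var * x.sum()
    z = (mu - post_mean) / math.sqrt(post_var)
    cs[r] = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # posterior CDF at mu
print(cs.mean(), cs.var())  # a Uniform(0, 1) sample has mean 1/2 and variance 1/12
```

Any systematic departure of the `cs` histogram from uniformity (like the one in panel (b) below) indicates a mismatch somewhere in the prior/simulator/posterior triple.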
Would love to get your input; this may just be a coding error on my part.
Here's the setup:

- Simulate a graph with `nx.gn_graph` and convert it to an undirected graph.
- Sample histories with `HistorySampler`, and evaluate the posterior on a dense grid of exponent values by calling `set_kernel` and `get_log_posterior` repeatedly; average the posteriors over all histories as in eq. (13) of the preprint.

Here's a figure of the results for 50 nodes, 200 history samples, 501 grid points for evaluating the log posterior, and 101 repeats. Panel (a) shows a randomly chosen posterior with the true value as a vertical line as a sanity check; they usually look reasonable. Panel (b) shows the sorted $c$ values (the posterior CDF evaluated at the true parameters) plotted against the uniformly increasing values we'd expect. There's a small bump around 0.3, but things look reasonable. Panel (c) shows posterior means with 1.96 posterior standard deviation error bars plotted against the parameters used to generate the data; again, things look good. Panel (d) shows the data-averaged posteriors, which should recover the prior (dotted horizontal line; see eq. (1) of Talts et al. (2018) for details); maybe some slight deviation, but no obvious problems.
Here's the same figure for 500 nodes, 1000 history samples, and 101 repeats. I increased the number of history samples because I thought there might be more uncertainty regarding histories for large graphs; not sure whether that's actually the case. But more is more, right? The $c$ values in panel (b) diverge significantly from what we'd expect: the posterior CDF evaluated at the "true" parameter value is too large. This means that the posterior tends to underestimate the "true" parameter values. This is not particularly obvious in panel (c), but, for $\gamma \gtrsim 1.5$, one can see the departure from the diagonal (colors indicate the $c$ values).
I've also tried increasing the number of grid points because more precise posteriors for larger graphs may suffer from discretization artifacts. This didn't seem to have an effect, though (see below; albeit with the number of history samples reduced to 200 because it would've taken a few hours otherwise).
Do you know what might be going on here or where I've misunderstood? Code to reproduce is here. Thanks for your time!