add modified greedy topK centrality heuristic to autopilot #4384
2026f0f
to
382e3c5
Compare
Just to confirm, you say "approximates" since it'll heavily weight towards the node that gives the largest improvement, but may fail to select it, correct?
Hmm, yeah implementing this may require some tweaks to the way the scoring system works atm. I think we still do want the stochastic aspect to avoid all nodes opening a channel to the exact same set of nodes (say they start with zero channels, or one channel to the same starting node). One alternative here would maybe be attempting to use the pubkey of the node driving the agent to add some jitter (by removing a subset of nodes?).
I think we'd also want to have a proper incremental calculation algo before we did ghost edge simulation since we'd need to re-calculate the entire graph with each iteration with the current algo.
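To make the stochastic selection and the pubkey-based jitter discussed above more concrete, here is a minimal sketch. It is not lnd's scoring code: `newRNG` and `weightedPick` are hypothetical helpers, node IDs are plain strings, and the seed derivation (first 8 bytes of a SHA-256 hash of the pubkey) is only one possible choice.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"math/rand"
	"sort"
)

// newRNG returns a PRNG seeded from the node's public key bytes, so each node
// gets its own reproducible jitter. (Hypothetical helper, not part of lnd.)
func newRNG(pubKey []byte) *rand.Rand {
	h := sha256.Sum256(pubKey)
	seed := int64(binary.BigEndian.Uint64(h[:8]))
	return rand.New(rand.NewSource(seed))
}

// weightedPick draws up to k distinct node IDs. Each draw selects a remaining
// node with probability proportional to its score; zero scores are skipped.
func weightedPick(scores map[string]float64, k int, rng *rand.Rand) []string {
	ids := make([]string, 0, len(scores))
	for id, s := range scores {
		if s > 0 {
			ids = append(ids, id)
		}
	}
	// Sort so the outcome depends only on the seed, not on map iteration order.
	sort.Strings(ids)

	total := 0.0
	for _, id := range ids {
		total += scores[id]
	}

	var picked []string
	for len(picked) < k && len(ids) > 0 {
		r := rng.Float64() * total
		idx := len(ids) - 1 // fallback guards against float rounding
		acc := 0.0
		for i, id := range ids {
			acc += scores[id]
			if r < acc {
				idx = i
				break
			}
		}
		picked = append(picked, ids[idx])
		total -= scores[ids[idx]]
		ids = append(ids[:idx], ids[idx+1:]...)
	}
	return picked
}
```

Two nodes with different pubkeys would usually end up with different picks from the same score map, while a single node gets the same picks on every run.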
I think this is ready to leave the draft stage? Will start to use it to bootstrap some new nodes on testnet.
This type of greedy algo is only an approximation: for a single new connection, simply selecting the node with the largest centrality may not result in the best MBI. In practice this means the algo may need more connections to reach the best possible centrality improvement than if we selected the node that actually yields the max improvement for our node (which requires recalculation for each/most nodes). The stochastic selection just adds another layer of distortion.
First pass looks good!
autopilot/top_centrality.go
Outdated
continue
}

// Skip passed nodes not in the graph.
When would a peer not be in the centrality graph but be passed into this function? Happy with the check, perhaps just a comment explaining when this happens.
Good point, comment added, ptal.
autopilot/top_centrality_test.go
Outdated
// TestTopCentralityNonEmptyGraph tests that we return the correct normalized
// centrality values given a non empty graph, correctly filtered down to the
// passed nodes and omitting nodes which we have channels with.
func TestTopCentralityNonEmptyGraph(t *testing.T) {
Could these tests be flattened into one? Since we're getting node scores and asserting length for both? The tests could just have a buildGraph function which provides the graph we want, rather than having two similar tests except for this one input.
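A table-driven layout along these lines is one way to fold both tests into one. This is only a sketch under assumptions: `graph`, `countScored`, `buildTriangle` and `scoreCases` are hypothetical stand-ins and not the types or helpers used in `top_centrality_test.go`.

```go
package main

import "testing"

// graph is a hypothetical stand-in: node ID -> neighbor IDs.
type graph map[string][]string

// countScored stands in for the code under test: it counts the graph nodes
// that would receive a score, i.e. those we have no channel with.
func countScored(g graph, channels map[string]bool) int {
	n := 0
	for id := range g {
		if !channels[id] {
			n++
		}
	}
	return n
}

// buildTriangle returns a fully connected three-node graph.
func buildTriangle() graph {
	return graph{"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
}

// scoreCases covers the empty graph and increasing connectivity in one table.
var scoreCases = []struct {
	name       string
	buildGraph func() graph
	channels   map[string]bool
	wantLen    int
}{
	{"empty graph", func() graph { return graph{} }, nil, 0},
	{"no channels", buildTriangle, nil, 3},
	{"one channel", buildTriangle, map[string]bool{"a": true}, 2},
	{"all channels", buildTriangle,
		map[string]bool{"a": true, "b": true, "c": true}, 0},
}

// TestScoreCount runs every case as a subtest, so the empty and non-empty
// graphs share one test body.
func TestScoreCount(t *testing.T) {
	for _, tc := range scoreCases {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			got := countScored(tc.buildGraph(), tc.channels)
			if got != tc.wantLen {
				t.Fatalf("got %d scores, want %d", got, tc.wantLen)
			}
		})
	}
}
```

Each case supplies only its graph builder and channel set, so the empty graph becomes just another row rather than a separate test.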
You raised a great question. Actually they were already kind of flattened: by adding the empty node set and empty channel set, we can include the other test in this one. Made some structural changes to (hopefully) make it more readable too, and extended the tests to cover connectivity from none to full.
Nice changes to the tests 🥇 Just two nitty-nits from me, change looks good!
Big ACK on this approach. I think the "non-exact" top K algorithm is good enough for now, as we do want some jitter in channel selection anyway.
We can explore a more exact greedy variant later, but I think for now the simplicity and speed of this approach is a big pro 👍
autopilot/top_centrality.go
Outdated
// As we don't currently support incremental graph updates, we
// don't need to cache anything.
bc, err := NewBetweennessCentralityMetric(
Should this be moved to NewTopCentrality, such that only a refresh is needed when this method is called?
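The shape being suggested, construct the metric once and only refresh it per call, could look roughly like the sketch below. It is deliberately simplified and makes several assumptions: `graph` and `degreeMetric` are hypothetical stand-ins (the metric counts neighbors instead of computing betweenness), and the `NewTopCentrality` and `NodeScores` signatures here are not the ones in the PR.

```go
package main

import "errors"

// graph is a hypothetical stand-in: node ID -> neighbor IDs.
type graph map[string][]string

// degreeMetric is a placeholder for the betweenness metric. It scores nodes
// by neighbor count purely to keep the sketch short.
type degreeMetric struct {
	workers int
	scores  map[string]float64
}

// newDegreeMetric validates configuration once, at construction time.
func newDegreeMetric(workers int) (*degreeMetric, error) {
	if workers < 1 {
		return nil, errors.New("workers must be positive")
	}
	return &degreeMetric{workers: workers}, nil
}

// Refresh recomputes all scores from scratch for the given graph.
func (m *degreeMetric) Refresh(g graph) error {
	m.scores = make(map[string]float64, len(g))
	for id, nbrs := range g {
		m.scores[id] = float64(len(nbrs))
	}
	return nil
}

// TopCentrality owns its metric, so scoring only needs a Refresh.
type TopCentrality struct {
	centralityMetric *degreeMetric
}

// NewTopCentrality builds the metric once, up front.
func NewTopCentrality(workers int) (*TopCentrality, error) {
	metric, err := newDegreeMetric(workers)
	if err != nil {
		return nil, err
	}
	return &TopCentrality{centralityMetric: metric}, nil
}

// NodeScores refreshes the existing metric and returns a copy of its scores.
func (g *TopCentrality) NodeScores(gr graph) (map[string]float64, error) {
	if err := g.centralityMetric.Refresh(gr); err != nil {
		return nil, err
	}
	out := make(map[string]float64, len(g.centralityMetric.scores))
	for id, s := range g.centralityMetric.scores {
		out[id] = s
	}
	return out, nil
}
```

With this split, construction errors surface once in the constructor, and the scoring path can only fail on the refresh itself.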
Changed construction of the metric a bit to be able to do this cleanly. PTAL
autopilot/top_centrality.go
Outdated
result[nodeID] = &NodeScore{
	NodeID: nodeID,
	Score:  centrality[nodeID],
nit: do score, ok := centrality[nodeID] above for readability and one less lookup
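For illustration, the comma-ok form suggested here might be used as in the sketch below. `NodeID`, `NodeScore` and `scoreNodes` are simplified, hypothetical stand-ins (string IDs, plain maps) rather than the definitions in the autopilot package.

```go
package main

// NodeID and NodeScore are simplified stand-ins for the autopilot types.
type NodeID string

type NodeScore struct {
	NodeID NodeID
	Score  float64
}

// scoreNodes returns a score for every requested node that is present in the
// centrality map and that we do not already have a channel with.
func scoreNodes(centrality map[NodeID]float64, nodes []NodeID,
	haveChannel map[NodeID]bool) map[NodeID]*NodeScore {

	result := make(map[NodeID]*NodeScore)
	for _, nodeID := range nodes {
		// Skip nodes we already have channels with.
		if haveChannel[nodeID] {
			continue
		}

		// Single lookup: ok is false for nodes not in the graph, which
		// keeps them distinct from nodes whose score is exactly zero.
		score, ok := centrality[nodeID]
		if !ok {
			continue
		}

		result[nodeID] = &NodeScore{NodeID: nodeID, Score: score}
	}
	return result
}
```

The `ok` flag doubles as the "is this node in the graph" check, so a node with a genuine score of zero is still returned.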
done
autopilot/top_centrality.go
Outdated
// Name returns the name of the heuristic.
func (g *TopCentrality) Name() string {
	return "topk_centrality"
should we rather name the heuristic simply "centrality"? Feels more right to not have the user have to care about the underlying algorithm, and we can change to the greedy algorithm later without changing the name.
Yeah, good point. Done
b7dd44c
to
2172de6
Compare
665ac18
to
1d574c1
Compare
// Calculate betweenness centrality for the whole graph.
if err := bc.Refresh(graph); err != nil {
if err := g.centralityMetric.Refresh(graph); err != nil {
Squash this with the initial TopCentrality commit, or move it before that commit
done
This commit removes an extra filter on address availability, which is not needed because the scored nodes are an already prefiltered subset of the whole graph where address availability has already been checked.
This commit creates a new autopilot heuristic which simply returns normalized betweenness centrality values for the current graph. This new heuristic will make it possible to prefer nodes with large centrality when we're trying to open channels. The heuristic is also somewhat dumb as it doesn't try to figure out the best nodes, as that would require adding ghost edges to the graph and recalculating the centrality once per node (excluding the ones we already have channels with).
The commit also reindents the source to conform to the ts=8 guideline.
LGTM ✅
This PR adds a very simple autopilot heuristic to the existing ones. It calculates the betweenness centrality of the current graph and returns normalized node scores, excluding the nodes we already have channels with.
This method successfully approximates Maximum Betweenness Improvement (MBI) given a sufficient number of new edges, but is considerably worse than estimating MBI with a real greedy algo, which would add ghost edges to all nodes and select the one that results in the largest centrality value for our node.
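For comparison, the "real greedy" variant described above, which adds a ghost edge to each candidate and recomputes centrality, can be sketched as follows. This is a minimal illustration and not lnd's implementation: `betweenness`, `withEdge`, `bestGhostEdge` and `exampleGraph` are hypothetical names, the graph is unweighted and undirected with integer node IDs, and centrality is recomputed from scratch (Brandes' algorithm) for every candidate.

```go
package main

// betweenness computes betweenness centrality for an unweighted, undirected
// graph given as adjacency lists, using Brandes' algorithm. Each unordered
// pair of endpoints is counted once.
func betweenness(adj [][]int) []float64 {
	n := len(adj)
	cb := make([]float64, n)
	for s := 0; s < n; s++ {
		sigma := make([]float64, n) // number of shortest paths from s
		delta := make([]float64, n) // accumulated dependency
		dist := make([]int, n)
		preds := make([][]int, n)
		for i := range dist {
			dist[i] = -1
		}
		sigma[s], dist[s] = 1, 0

		// Breadth-first search, recording visit order and predecessors.
		var order []int
		queue := []int{s}
		for len(queue) > 0 {
			v := queue[0]
			queue = queue[1:]
			order = append(order, v)
			for _, w := range adj[v] {
				if dist[w] < 0 {
					dist[w] = dist[v] + 1
					queue = append(queue, w)
				}
				if dist[w] == dist[v]+1 {
					sigma[w] += sigma[v]
					preds[w] = append(preds[w], v)
				}
			}
		}

		// Back-propagate dependencies in reverse visit order.
		for i := len(order) - 1; i >= 0; i-- {
			w := order[i]
			for _, v := range preds[w] {
				delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
			}
			if w != s {
				cb[w] += delta[w]
			}
		}
	}
	for i := range cb {
		cb[i] /= 2 // each unordered pair was visited from both endpoints
	}
	return cb
}

// withEdge returns a copy of adj with the undirected edge a-b added.
func withEdge(adj [][]int, a, b int) [][]int {
	out := make([][]int, len(adj))
	for i, nbrs := range adj {
		out[i] = append([]int(nil), nbrs...)
	}
	out[a] = append(out[a], b)
	out[b] = append(out[b], a)
	return out
}

// bestGhostEdge tries a ghost edge from self to every candidate, recomputes
// centrality for the whole graph each time, and returns the candidate that
// maximizes self's own centrality together with that centrality.
func bestGhostEdge(adj [][]int, self int, candidates []int) (int, float64) {
	best, bestScore := -1, -1.0
	for _, c := range candidates {
		score := betweenness(withEdge(adj, self, c))[self]
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	return best, bestScore
}

// exampleGraph is the tree 0-1-2, 2-3, 2-4-5-6, where node 0 is "our" node.
func exampleGraph() [][]int {
	return [][]int{{1}, {0, 2}, {1, 3, 4}, {2}, {2, 5}, {4, 6}, {5}}
}
```

On the example tree, node 2 has the highest centrality (11), yet a ghost edge from node 0 to node 2 leaves node 0 with centrality 0, while an edge to node 6 gives 2.5. That gap is what the top-K heuristic trades away for needing only one centrality calculation instead of one per candidate.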