Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add modified greedy topK centrality heuristic to autopilot #4384

Merged
merged 5 commits into from Jul 17, 2020

Conversation

bhandras
Copy link
Collaborator

This PR adds a very simple autopilot heuristic to the already existing heuristics, that simply calculates the betweenness centrality of the current graph and returns normalized node scores with the exception of the ones we already have channels with.

This method successfully approximates Maximum Betweenness Improvement (MBI) given sufficient number of new edges added but is considerably worse than estimating MBI with a real greedy algo which would add ghost edges to all nodes and select the one which would result in the largest centrality value for our node.

@bhandras bhandras force-pushed the atpl_bc_topk branch 3 times, most recently from 2026f0f to 382e3c5 Compare June 23, 2020 15:58
@Roasbeef
Copy link
Member

This method successfully approximates Maximum Betweenness Improvement (MBI) given sufficient number of new edges added but is considerably worse than estimating

Just to confirm, you say "approximates" since it'll heavily weight towards the node that gives the largest improvement, but may fail to select them, correct?

but is considerably worse than estimating MBI with a real greedy algo which would add ghost edges to all nodes and select the one which would result in the largest centrality value for our node.

Hmm, yeah implementing this may require some tweaks to the way the scoring system works atm. I think we still do want the stochastic aspect to avoid all nodes opening a channel to the exact same set of nodes (say they start with zero channels, or one channel to the same starting node). One alternative here would maybe be attempting to use the pubkey of the node driving the agent to add some jitter (by removing a sub-set of nodes?).

@Roasbeef
Copy link
Member

I think we'd also want to have a proper incremental calculation algo before we did ghost edge simulation since we'd need to re-calculate the entire graph with each iteration with the current algo.

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to leave the draft stage? Will start to use it to bootstrap some new nodes on testnet.

autopilot/agent.go Show resolved Hide resolved
@Roasbeef Roasbeef added the v0.11 label Jun 30, 2020
@Roasbeef Roasbeef added this to In progress in v0.11.0-beta via automation Jun 30, 2020
@Roasbeef Roasbeef added this to the 0.11.0 milestone Jun 30, 2020
@cfromknecht cfromknecht added v0.12 and removed v0.11 labels Jul 7, 2020
@cfromknecht cfromknecht modified the milestones: 0.11.0, 0.12.0 Jul 7, 2020
@cfromknecht cfromknecht removed this from In progress in v0.11.0-beta Jul 7, 2020
@bhandras
Copy link
Collaborator Author

bhandras commented Jul 7, 2020

This method successfully approximates Maximum Betweenness Improvement (MBI) given sufficient number of new edges added but is considerably worse than estimating

Just to confirm, you say "approximates" since it'll heavily weight towards the node that gives the largest improvement, but may fail to select them, correct?

but is considerably worse than estimating MBI with a real greedy algo which would add ghost edges to all nodes and select the one which would result in the largest centrality value for our node.

Hmm, yeah implementing this may require some tweaks to the way the scoring system works atm. I think we still do want the stochastic aspect to avoid all nodes opening a channel to the exact same set of nodes (say they start with zero channels, or one channel to the same starting node). One alternative here would maybe be attempting to use the pubkey of the node driving the agent to add some jitter (by removing a sub-set of nodes?).

This type of greedy algo approximates as for a single new connection simply selecting the node with the largest centrality may not result in the best MBI. In practice this means that this algo may need more connections to reach the best possible centrality improvement than if we were to select the actual node which would result in the max improvement for our node (which requires recalculation for each/most nodes).

The stochastic selection just adds another layer of distortion.

@bhandras bhandras marked this pull request as ready for review July 7, 2020 14:07
@bhandras bhandras requested a review from halseth as a code owner July 7, 2020 14:07
@bhandras bhandras requested review from Roasbeef and carlaKC July 7, 2020 14:07
Copy link
Collaborator

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass looks good!

autopilot/top_centrality.go Outdated Show resolved Hide resolved
autopilot/interface.go Show resolved Hide resolved
continue
}

// Skip passed nodes not in the graph.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would a peer not be in the centrality graph but be passed into this function? Happy with the check, perhaps just a comment explaining when this happens.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, comment added, ptal.

// TestTopCentralityNonEmptyGraph tests that we return the correct normalized
// centralitiy values given a non empty graph, correctly filtered down to the
// passed nodes and omitting nodes which we have channels with.
func TestTopCentralityNonEmptyGraph(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these tests be flattened into one? Since we're getting node scores and asserting length for both? The tests could just have a buildGraph function which provides the graph we want, rather than having two similar tests except for this one input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You raised a great question. Actually they were already kind of flattened as by adding the empty node set and empty channel set we can include the other test in in this one. Made some structural changes to (hopefully) make it more readable too and extended the tests to cover connectivity from none, to full.

Copy link
Collaborator

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes to the tests 🥇 Just two nitty-nits from me, change looks good!

autopilot/centrality_testdata_test.go Outdated Show resolved Hide resolved
autopilot/top_centrality_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big ACK on this approach. I think the "non-exact" top K algorithm is good enough for now, as we do want some jitter in channel selection anyway.

We can explore a more exact greedy variant later, but I think for now the simplicity and speed of this approach is a big pro 👍


// As we don't currently support incremental graph updates, we
// don't need to cache anything.
bc, err := NewBetweennessCentralityMetric(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved to NewTopCentrality, such that only a refresh is needed when this method is called?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed construction of the metric a bit to be able to do this cleanly. PTAL


result[nodeID] = &NodeScore{
NodeID: nodeID,
Score: centrality[nodeID],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do score, ok := centrality[nodeID] above for readability and one less lookup

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Name returns the name of the heuristic.
func (g *TopCentrality) Name() string {
return "topk_centrality"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we rather name the heuristic simply "centrality"? Feels more right to not have the user have to care about the underlying algorithm, and we can change to the greedy algorithm later without changing the name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. Done

@bhandras bhandras force-pushed the atpl_bc_topk branch 2 times, most recently from b7dd44c to 2172de6 Compare July 17, 2020 09:52
@bhandras bhandras requested a review from halseth July 17, 2020 09:52
@bhandras bhandras requested a review from halseth July 17, 2020 11:46
@bhandras bhandras force-pushed the atpl_bc_topk branch 3 times, most recently from 665ac18 to 1d574c1 Compare July 17, 2020 14:00
// Calculate betweenness centrality for the whole graph.
if err := bc.Refresh(graph); err != nil {
if err := g.centralityMetric.Refresh(graph); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squash with or move this change before the initial TopCentrality commit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

This commit removes an extra filter on address availability which is not
needed as the scored nodes are a already prefiltered subset of the whole
graph where address availability has already been checked.
This commit creates a new autopilot heuristic which simply returns
normalized betweenness centrality values for the current graph. This
new heuristic will make it possible to prefer nodes with large
centrality when we're trying to open channels. The heuristic is also
somewhat dumb as it doesn't try to figure out the best nodes, as that'd
require adding ghost edges to the graph recalculating the centrality as
many times as many nodes there are (minus the one we already have
channels with).
The commit also reindents the source to conform with ts=8 guideline.
Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅

@bhandras bhandras merged commit 77549f1 into lightningnetwork:master Jul 17, 2020
@Roasbeef Roasbeef modified the milestones: 0.12.0, 0.11.0 Jul 17, 2020
@bhandras bhandras deleted the atpl_bc_topk branch September 12, 2023 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants