-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
If a node is offline connect() no longers opens a channel with it #5633
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #5633 +/- ##
===========================================
+ Coverage 77.2% 80.64% +3.43%
===========================================
Files 132 132
Lines 15245 15274 +29
Branches 2333 2338 +5
===========================================
+ Hits 11770 12317 +547
+ Misses 2805 2303 -502
+ Partials 670 654 -16
Continue to review full report at Codecov.
|
raiden/connection_manager.py
Outdated
available_and_online_addresses = [ | ||
address | ||
for address in available_addresses | ||
if self.raiden.transport.force_check_address_reachability(address) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will effectively add all addresses into the UserAddressManager
and do 2 REST API calls for each, currently we have a few hundred of addresses, which means the connection manager will be a bottle neck.
Please, change this to track the presence of fewer addresses and to do less HTTP requests.
@ulope do we need to limit the addresses which are being tracked to begin with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, change this to track the presence of fewer addresses and to do less HTTP requests.
Thank you for your review!
The way to do this would be to move the forced presence to right before opening the channels, so where the number of channel opens is limited by the user's n_to_join
setting?
Somewhere here:
raiden/raiden/connection_manager.py
Lines 417 to 427 in f9bbbc0
join_partners = (nonfunded_partners + possible_new_partners)[:n_to_join] | |
log.debug( | |
"Spawning greenlets to join partners", | |
node=to_checksum_address(self.raiden.address), | |
num_greenlets=len(join_partners), | |
) | |
greenlets = set(gevent.spawn(self._join_partner, partner) for partner in join_partners) | |
gevent.joinall(greenlets, raise_error=True) | |
return True |
It should be before spawning the greenlets but after having limited the list of potential partners. This way if someone is offline we can still try to see if others from the list are online.
Do you think it should be done this way? If not what would your suggestion be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any particular suggestion on how to implement it, what you wrote sound correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Want to make sure we are on the same page before implementing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to limit the addresses which are being tracked to begin with?
Yeah that's a valid question.
A quick test shows that per 1k addresses with 2 users each the size seems to grow pretty linearly (sampled at 100, 1k, 5k and 20k) by:
UserAddressManager
(including theDisplayNameCache
): 7MB- Just the internal state objects: 1MB
d89da46
to
e28b97c
Compare
This PR is ready for review again. |
The failed test passes locally and seems unrelated to the PR. Anyone else seen it before? Is it a flaky test? |
raiden/connection_manager.py
Outdated
) | ||
|
||
if qty_network_channels == 0: | ||
if not self._have_online_channels_to_connect_to(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that would mean that multiple nodes can/might open channels with BOOTSTRAP_ADDR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dominik1999 Is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a node want to be connected to that Address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot find other nodes if they are not part of a channel. That's why the first node used connects to this address.
Now with the filtering this problem might appear at any time, because the nodes get filtered out when offline.
I don't have a better idea a think this is fine for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then any other alternative than BOOTSTRAP_ADDR
is better. Since the probability of making use of that channel in any time is with BOOTSTRAP_ADDR
0 and with the other addresses >0.
But if you cannot find online nodes to connect with, why don't we tell the user? If there are no online nodes to connect with, there might also not be a lot of fun using Raiden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the connection manager is designed to work. I would say that as far as this PR is concerned it should be merged as it solves a real problem.
If you want to change the behaviour of the connection manager it would make sense to make another issue and work on it further there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then let's do that. I think we should not use the connection manager if there is no online node to connect to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how the connection manager is designed to work. I would say that as far as this PR is concerned it should be merged as it solves a real problem.
@LefterisJP Can we change that so that the second node would not connect to the bootstrap address again, but to any offline node instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@palango sure that should be easy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LefterisJP Can we change that so that the second node would not connect to the bootstrap address again, but to any offline node instead?
Done. Check the last 2 commits to this PR.
Unfortunately this logic change made the test a bit more complicated as the test was supposed to test that nobody connects to an offline node. But I think the adjusted test now works. 😉
Is there anything missing here? Do you need any help? |
Handles the connection manager case where the node that bootstrapped the network goes itself offline. This way the nodes that want to subsequently connect to this token network need to also create a bootstrap connection.
When the connect endpoint is called a second time and there are potential offline channels to connect to, utilize them instead of making a useless channel with the bootstrap address
The test for not connecting to an offline node via the connection manager needed some adjustments. The second node to ever use the connection manager after the first node went offline will now also connect to an offline node. So the test needed to adjust for that.
fa9579e
to
9aa0d6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for pushing this through.
@czepluch Will this need adjustments in the scenarios? |
No, it should be fine since we don't try to use join while nodes are offline. We could add a test for it in the scenarios, but this shouldn't break anything at least. |
Description
Fixes: #5583
PR review check list
Quality check list that cannot be automatically verified.