If a node is offline connect() no longer opens a channel with it #5633

Merged: 9 commits merged into raiden-network:develop from workon_5583 on Jan 27, 2020

Conversation

LefterisJP (Contributor):

Description

Fixes: #5583

PR review check list

Quality checklist that cannot be automatically verified.

  • Safety
  • Code quality
    • Error conditions are handled
    • Exceptions are propagated to the correct parent greenlet
    • Exceptions are correctly classified as recoverable or unrecoverable
  • Compatibility
    • State changes are forward compatible
    • Transport messages are backwards and forward compatible
  • Commits
    • Have good messages
    • Squashed unnecessary commits
  • If it's a bug fix:
    • Regression test for the bug
      • Properly covers the bug
      • If an integration test is used, it could not be written as a unit test
  • Documentation
    • If a new CLI flag was introduced, is there documentation that explains its usage?
  • Specs
    • If this is a protocol change, are the specs updated accordingly? If so, please link PR from the specs repo.
  • Is it a user facing feature/bug fix?
    • Changelog entry has been added

@codecov bot commented Jan 10, 2020:

Codecov Report

Merging #5633 into develop will increase coverage by 3.43%.
The diff coverage is 94.73%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5633      +/-   ##
===========================================
+ Coverage     77.2%   80.64%   +3.43%     
===========================================
  Files          132      132              
  Lines        15245    15274      +29     
  Branches      2333     2338       +5     
===========================================
+ Hits         11770    12317     +547     
+ Misses        2805     2303     -502     
+ Partials       670      654      -16
Flag           Coverage Δ
#integration   74.78% <94.73%> (+6.25%) ⬆️
#unit          55.58% <15.78%> (-0.08%) ⬇️

Impacted Files                                  Coverage Δ
raiden/network/transport/matrix/transport.py    85.93% <100%> (+0.47%) ⬆️
raiden/connection_manager.py                    76.88% <92.59%> (+37.24%) ⬆️
raiden/network/proxies/secret_registry.py       74.03% <0%> (-1.93%) ⬇️
raiden/routing.py                               83.89% <0%> (-1.7%) ⬇️
raiden/network/rpc/client.py                    73.91% <0%> (-0.32%) ⬇️
raiden/waiting.py                               86.55% <0%> (ø) ⬆️
raiden/transfer/channel.py                      89.42% <0%> (+0.75%) ⬆️
raiden/transfer/state_change.py                 97.9% <0%> (+0.83%) ⬆️
raiden/network/proxies/proxy_manager.py         78.57% <0%> (+0.89%) ⬆️
... and 29 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea3d8a5...9aa0d6d. Read the comment docs.

@Dominik1999 added this to Backlog in Raiden Berlin Sprint on Jan 13, 2020
available_and_online_addresses = [
    address
    for address in available_addresses
    if self.raiden.transport.force_check_address_reachability(address)
]
Contributor:

This will effectively add all addresses to the UserAddressManager and do 2 REST API calls for each. We currently have a few hundred addresses, which means the connection manager will become a bottleneck.

Please change this to track the presence of fewer addresses and to make fewer HTTP requests.

@ulope do we need to limit the addresses which are being tracked to begin with?
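
For scale, a back-of-the-envelope estimate of the concern above; the 300 below is only an assumed stand-in for "a few hundred addresses", not a number from this thread:

tracked_addresses = 300  # assumption: "a few hundred" addresses currently in the token network
calls_per_address = 2    # from the comment above: 2 REST API calls per address
requests_per_connect = tracked_addresses * calls_per_address  # ~600 HTTP requests during a single connect()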

Contributor Author:

> Please change this to track the presence of fewer addresses and to make fewer HTTP requests.

Thank you for your review!

The way to do this would be to move the forced presence check to right before opening the channels, i.e. to the place where the number of channel opens is limited by the user's n_to_join setting?

Somewhere here:

join_partners = (nonfunded_partners + possible_new_partners)[:n_to_join]
log.debug(
    "Spawning greenlets to join partners",
    node=to_checksum_address(self.raiden.address),
    num_greenlets=len(join_partners),
)
greenlets = set(gevent.spawn(self._join_partner, partner) for partner in join_partners)
gevent.joinall(greenlets, raise_error=True)
return True

It should be before spawning the greenlets but after having limited the list of potential partners. This way, if someone is offline, we can still check whether others from the list are online.
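
For concreteness, a minimal sketch of that idea as a fragment of the same method, reusing the names from the snippet above (nonfunded_partners, possible_new_partners, n_to_join, _join_partner, force_check_address_reachability); the filtering step is only an illustration of the proposal, not the final diff:

join_partners = (nonfunded_partners + possible_new_partners)[:n_to_join]

# Sketch: check reachability only for the already-limited candidate list, so the
# number of presence lookups is bounded by n_to_join rather than by the size of
# the whole token network.
online_join_partners = [
    partner
    for partner in join_partners
    if self.raiden.transport.force_check_address_reachability(partner)
]

log.debug(
    "Spawning greenlets to join online partners",
    node=to_checksum_address(self.raiden.address),
    num_greenlets=len(online_join_partners),
)
greenlets = set(
    gevent.spawn(self._join_partner, partner) for partner in online_join_partners
)
gevent.joinall(greenlets, raise_error=True)
return True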

Do you think it should be done this way? If not what would your suggestion be?

Contributor:

I don't have any particular suggestion on how to implement it; what you wrote sounds correct.

Contributor Author:

Thanks! I want to make sure we are on the same page before implementing anything.

Contributor Author:

Done.

Collaborator:

> do we need to limit the addresses which are being tracked to begin with?

Yeah that's a valid question.

A quick test shows that, per 1k addresses with 2 users each, the size seems to grow pretty linearly (sampled at 100, 1k, 5k and 20k) by:

  • UserAddressManager (including the DisplayNameCache): 7MB
  • Just the internal state objects: 1MB
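
For scale, extrapolating that linear rate to the largest sampled point (20k addresses) gives roughly:

  • 20 × 7 MB ≈ 140 MB for the UserAddressManager (including the DisplayNameCache)
  • 20 × 1 MB ≈ 20 MB for just the internal state objects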

raiden/network/transport/matrix/transport.py (resolved review thread)
Raiden Berlin Sprint automation moved this from Backlog to Review in progress Jan 13, 2020
@LefterisJP (Contributor Author):

This PR is ready for review again.

@LefterisJP (Contributor Author):

The failed test passes locally and seems unrelated to this PR. Has anyone else seen it before? Is it a flaky test?

)

if qty_network_channels == 0:
if not self._have_online_channels_to_connect_to():
Contributor:

So that would mean that multiple nodes can/might open channels with BOOTSTRAP_ADDR.

Contributor:

@Dominik1999 Is that OK?

Contributor:

Why would a node want to be connected to that address?

Contributor:

We cannot find other nodes if they are not part of a channel. That's why the first node to use the connection manager connects to this address.

Now, with the filtering, this problem might appear at any time, because nodes get filtered out when they are offline.

I don't have a better idea and I think this is fine for the time being.

Contributor:

Then any alternative other than BOOTSTRAP_ADDR is better, since the probability of ever making use of that channel is 0 with BOOTSTRAP_ADDR and >0 with the other addresses.

But if we cannot find online nodes to connect with, why don't we tell the user? If there are no online nodes to connect with, there might also not be a lot of fun in using Raiden.

Contributor Author:

This is how the connection manager is designed to work. I would say that as far as this PR is concerned it should be merged as it solves a real problem.

If you want to change the behaviour of the connection manager, it would make sense to open another issue and continue the work there.

Contributor:

OK, then let's do that. I think we should not use the connection manager if there is no online node to connect to.

Contributor:

> This is how the connection manager is designed to work. I would say that as far as this PR is concerned it should be merged as it solves a real problem.

@LefterisJP Can we change that so that the second node would not connect to the bootstrap address again, but to any offline node instead?

Contributor Author:

@palango Sure, that should be easy.

Contributor Author:

> @LefterisJP Can we change that so that the second node would not connect to the bootstrap address again, but to any offline node instead?

Done. Check the last 2 commits to this PR.

Unfortunately this logic change made the test a bit more complicated, as the test was supposed to verify that nobody connects to an offline node. But I think the adjusted test now works. 😉

@Dominik1999 (Contributor):

Is there anything missing here? Do you need any help?

Handles the connection manager case where the node that bootstrapped the network goes offline itself. This way the nodes that subsequently want to connect to this token network need to also create a bootstrap connection.

When the connect endpoint is called a second time and there are potential offline channels to connect to, utilize them instead of making a useless channel with the bootstrap address.

The test for not connecting to an offline node via the connection manager needed some adjustments. The second node to ever use the connection manager after the first node went offline will now also connect to an offline node, so the test needed to account for that.
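
One plausible reading of that decision order, as a sketch only (choose_join_targets, online_partners and offline_partners are illustrative names, not the code merged in raiden/connection_manager.py; BOOTSTRAP_ADDR is the constant discussed in the review thread above):

def choose_join_targets(online_partners, offline_partners):
    # Preferred case: partners that are currently reachable.
    if online_partners:
        return online_partners
    # Fallback described by the last two commits: known-but-offline partners are
    # still better than yet another channel with the bootstrap address.
    if offline_partners:
        return offline_partners
    # Nothing known at all: open the bootstrap channel so that later nodes can
    # discover this one through the token network.
    return [BOOTSTRAP_ADDR]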
@palango (Contributor) left a comment:

Looks good, thanks for pushing this through.

@palango merged commit 871c21d into raiden-network:develop on Jan 27, 2020
Raiden Berlin Sprint automation moved this from Review in progress to Done Jan 27, 2020
@palango (Contributor) commented Jan 27, 2020:

@czepluch Will this need adjustments in the scenarios?

@czepluch (Contributor):

No, it should be fine since we don't try to use join while nodes are offline. We could add a test for it in the scenarios, but this shouldn't break anything at least.

@LefterisJP deleted the workon_5583 branch on January 27, 2020 12:57
Labels: None yet
Projects: Raiden Berlin Sprint
Development: Successfully merging this pull request may close these issues: Avoid unnecessary fees due to channel partners being offline when using the connection manager (#5583)
6 participants