If a node is offline connect() no longer opens a channel with it #5633

Merged: 9 commits merged into raiden-network:develop from workon_5583 on Jan 27, 2020

Conversation

LefterisJP (Contributor):

Description

Fixes: #5583

PR review check list

Quality checklist that cannot be automatically verified.

  • Safety
  • Code quality
    • Error conditions are handled
    • Exceptions are propagated to the correct parent greenlet
    • Exceptions are correctly classified as recoverable or unrecoverable
  • Compatibility
    • State changes are forward compatible
    • Transport messages are backwards and forward compatible
  • Commits
    • Have good messages
    • Squashed unnecessary commits
  • If it's a bug fix:
    • Regression test for the bug
      • Properly covers the bug
      • If an integration test is used, it could not be written as a unit test
  • Documentation
    • If a new CLI flag was introduced, is there documentation that explains its usage?
  • Specs
    • If this is a protocol change, are the specs updated accordingly? If so, please link PR from the specs repo.
  • Is it a user facing feature/bug fix?
    • Changelog entry has been added

@codecov bot commented Jan 10, 2020:

Codecov Report

Merging #5633 into develop will increase coverage by 3.43%.
The diff coverage is 94.73%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5633      +/-   ##
===========================================
+ Coverage     77.2%   80.64%   +3.43%     
===========================================
  Files          132      132              
  Lines        15245    15274      +29     
  Branches      2333     2338       +5     
===========================================
+ Hits         11770    12317     +547     
+ Misses        2805     2303     -502     
+ Partials       670      654      -16
Flag           Coverage Δ
#integration   74.78% <94.73%> (+6.25%) ⬆️
#unit          55.58% <15.78%> (-0.08%) ⬇️

Impacted Files                                  Coverage Δ
raiden/network/transport/matrix/transport.py    85.93% <100%> (+0.47%) ⬆️
raiden/connection_manager.py                    76.88% <92.59%> (+37.24%) ⬆️
raiden/network/proxies/secret_registry.py       74.03% <0%> (-1.93%) ⬇️
raiden/routing.py                               83.89% <0%> (-1.7%) ⬇️
raiden/network/rpc/client.py                    73.91% <0%> (-0.32%) ⬇️
raiden/waiting.py                               86.55% <0%> (ø) ⬆️
raiden/transfer/channel.py                      89.42% <0%> (+0.75%) ⬆️
raiden/transfer/state_change.py                 97.9% <0%> (+0.83%) ⬆️
raiden/network/proxies/proxy_manager.py         78.57% <0%> (+0.89%) ⬆️
... and 29 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea3d8a5...9aa0d6d. Read the comment docs.

@Dominik1999 added this to Backlog in Raiden Berlin Sprint on Jan 13, 2020
available_and_online_addresses = [
    address
    for address in available_addresses
    if self.raiden.transport.force_check_address_reachability(address)
]
Contributor:

This will effectively add all addresses to the UserAddressManager and do 2 REST API calls for each. We currently have a few hundred addresses, which means the connection manager will become a bottleneck.

Please change this to track the presence of fewer addresses and to make fewer HTTP requests.

@ulope do we need to limit the addresses which are being tracked to begin with?
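
For scale, a back-of-the-envelope estimate of the concern above; the 300 below is only an assumed stand-in for "a few hundred addresses", not a number from this thread:

tracked_addresses = 300  # assumption: "a few hundred" addresses currently in the token network
calls_per_address = 2    # from the comment above: 2 REST API calls per address
requests_per_connect = tracked_addresses * calls_per_address  # ~600 HTTP requests during a single connect()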

Contributor Author:

> Please change this to track the presence of fewer addresses and to make fewer HTTP requests.

Thank you for your review!

The way to do this would be to move the forced presence check to right before opening the channels, i.e. to the place where the number of channel opens is limited by the user's n_to_join setting?

Somewhere here:

join_partners = (nonfunded_partners + possible_new_partners)[:n_to_join]
log.debug(
    "Spawning greenlets to join partners",
    node=to_checksum_address(self.raiden.address),
    num_greenlets=len(join_partners),
)
greenlets = set(gevent.spawn(self._join_partner, partner) for partner in join_partners)
gevent.joinall(greenlets, raise_error=True)
return True

It should be before spawning the greenlets but after having limited the list of potential partners. This way, if someone is offline, we can still check whether others from the list are online.
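
For concreteness, a minimal sketch of that idea as a fragment of the same method, reusing the names from the snippet above (nonfunded_partners, possible_new_partners, n_to_join, _join_partner, force_check_address_reachability); the filtering step is only an illustration of the proposal, not the final diff:

join_partners = (nonfunded_partners + possible_new_partners)[:n_to_join]

# Sketch: check reachability only for the already-limited candidate list, so the
# number of presence lookups is bounded by n_to_join rather than by the size of
# the whole token network.
online_join_partners = [
    partner
    for partner in join_partners
    if self.raiden.transport.force_check_address_reachability(partner)
]

log.debug(
    "Spawning greenlets to join online partners",
    node=to_checksum_address(self.raiden.address),
    num_greenlets=len(online_join_partners),
)
greenlets = set(
    gevent.spawn(self._join_partner, partner) for partner in online_join_partners
)
gevent.joinall(greenlets, raise_error=True)
return True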

Do you think it should be done this way? If not what would your suggestion be?

Contributor:

I don't have any particular suggestion on how to implement it; what you wrote sounds correct.

Contributor Author:

Thanks! I want to make sure we are on the same page before implementing anything.

Contributor Author:

Done.

Collaborator:

> do we need to limit the addresses which are being tracked to begin with?

Yeah that's a valid question.

A quick test shows that, per 1k addresses with 2 users each, the size seems to grow pretty linearly (sampled at 100, 1k, 5k and 20k) by:

  • UserAddressManager (including the DisplayNameCache): 7MB
  • Just the internal state objects: 1MB
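
For scale, extrapolating that linear rate to the largest sampled point (20k addresses) gives roughly:

  • 20 × 7 MB ≈ 140 MB for the UserAddressManager (including the DisplayNameCache)
  • 20 × 1 MB ≈ 20 MB for just the internal state objects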

raiden/network/transport/matrix/transport.py (resolved review thread)
Raiden Berlin Sprint automation moved this from Backlog to Review in progress Jan 13, 2020
@LefterisJP (Contributor Author):

This PR is ready for review again.

@LefterisJP (Contributor Author):

The failed test passes locally and seems unrelated to this PR. Has anyone else seen it before? Is it a flaky test?

)

if qty_network_channels == 0:
if not self._have_online_channels_to_connect_to():
Contributor:

So that would mean that multiple nodes can/might open channels with BOOTSTRAP_ADDR.

Contributor:

@Dominik1999 Is that OK?

Contributor:

Why would a node want to be connected to that address?

Contributor:

We cannot find other nodes if they are not part of a channel. That's why the first node to use the connection manager connects to this address.

Now, with the filtering, this problem might appear at any time, because nodes get filtered out when they are offline.

I don't have a better idea and I think this is fine for the time being.

Contributor:

Then any alternative other than BOOTSTRAP_ADDR is better, since the probability of ever making use of that channel is 0 with BOOTSTRAP_ADDR and >0 with the other addresses.

But if we cannot find online nodes to connect with, why don't we tell the user? If there are no online nodes to connect with, there might also not be a lot of fun in using Raiden.

Contributor Author:

This is how the connection manager is designed to work. I would say that as far as this PR is concerned it should be merged as it solves a real problem.

If you want to change the behaviour of the connection manager, it would make sense to open another issue and continue the work there.

Contributor:

OK, then let's do that. I think we should not use the connection manager if there is no online node to connect to.

Contributor:

> This is how the connection manager is designed to work. I would say that as far as this PR is concerned it should be merged as it solves a real problem.

@LefterisJP Can we change that so that the second node would not connect to the bootstrap address again, but to any offline node instead?

Contributor Author:

@palango Sure, that should be easy.

Contributor Author:

> @LefterisJP Can we change that so that the second node would not connect to the bootstrap address again, but to any offline node instead?

Done. Check the last 2 commits to this PR.

Unfortunately this logic change made the test a bit more complicated, as the test was supposed to verify that nobody connects to an offline node. But I think the adjusted test now works. 😉

@Dominik1999 (Contributor):

Is there anything missing here? Do you need any help?

Handles the connection manager case where the node that bootstrapped the network goes offline itself. This way the nodes that subsequently want to connect to this token network need to also create a bootstrap connection.

When the connect endpoint is called a second time and there are potential offline channels to connect to, utilize them instead of making a useless channel with the bootstrap address.

The test for not connecting to an offline node via the connection manager needed some adjustments. The second node to ever use the connection manager after the first node went offline will now also connect to an offline node, so the test needed to account for that.
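
One plausible reading of that decision order, as a sketch only (choose_join_targets, online_partners and offline_partners are illustrative names, not the code merged in raiden/connection_manager.py; BOOTSTRAP_ADDR is the constant discussed in the review thread above):

def choose_join_targets(online_partners, offline_partners):
    # Preferred case: partners that are currently reachable.
    if online_partners:
        return online_partners
    # Fallback described by the last two commits: known-but-offline partners are
    # still better than yet another channel with the bootstrap address.
    if offline_partners:
        return offline_partners
    # Nothing known at all: open the bootstrap channel so that later nodes can
    # discover this one through the token network.
    return [BOOTSTRAP_ADDR]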
@palango (Contributor) left a comment:

Looks good, thanks for pushing this through.

@palango merged commit 871c21d into raiden-network:develop on Jan 27, 2020
Raiden Berlin Sprint automation moved this from Review in progress to Done Jan 27, 2020
@palango (Contributor) commented Jan 27, 2020:

@czepluch Will this need adjustments in the scenarios?

@czepluch (Contributor):

No, it should be fine since we don't try to use join while nodes are offline. We could add a test for it in the scenarios, but this shouldn't break anything at least.

@LefterisJP deleted the workon_5583 branch on January 27, 2020 12:57
Labels: None yet
Projects: Raiden Berlin Sprint
Development: Successfully merging this pull request may close these issues: Avoid unnecessary fees due to channel partners being offline when using the connection manager (#5583)
6 participants