Dial all configured known relay and direct node addresses on schedule #622

Merged: 11 commits, Jun 18, 2024

@sandreae sandreae commented Jun 14, 2024

When a node is first launched, it attempts to connect to any configured relay or direct node addresses. The steps are as follows:

  1. register at all relay/rendezvous nodes for peer discovery and connections
  2. connect to and initiate a replication session with all relay/rendezvous nodes
  3. connect to and initiate a replication session with all direct nodes

If the device where a node is running is offline when the app starts, all of these steps will fail and no further connection attempts are made. Similarly, if the node loses connectivity after it started, connections will be dropped for good.

This PR implements a modest solution for points 2) and 3) above by adding a polling service to the EventLoop which attempts to (re)connect to all known peer addresses (relay or otherwise) every x seconds. A connection is only established if none already exists to the address in question. Replication sessions will be initiated with any peers that we successfully (re)connect to.

Finding a solution for point 1) in the same situation is a little more involved. These changes seemed like easy, low-hanging fruit that greatly improves UX for node users, especially in situations where nodes run for long periods and connections may drop.
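To illustrate the polling behaviour described above, here is a minimal sketch of a single scheduler tick using only the standard library. It is not the code from this PR: `Connections`, `addresses_to_dial` and `redial_tick` are hypothetical names, addresses are plain strings rather than `PeerAddress` values, and the `dial` closure stands in for whatever the real EventLoop does to open a connection.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the connections a node currently holds,
/// keyed by the configured address string.
#[derive(Default)]
struct Connections {
    connected: HashSet<String>,
}

/// Returns the known addresses that currently have no connection and
/// should therefore be dialed on this tick.
fn addresses_to_dial<'a>(known: &'a [String], connections: &Connections) -> Vec<&'a str> {
    known
        .iter()
        .filter(|addr| !connections.connected.contains(addr.as_str()))
        .map(|addr| addr.as_str())
        .collect()
}

/// One tick of the schedule: dial every known address without a connection
/// and record the ones that succeed. `dial` is an assumption standing in
/// for the real dialing logic; it returns `true` on success.
fn redial_tick<F>(known: &[String], connections: &mut Connections, mut dial: F) -> Vec<String>
where
    F: FnMut(&str) -> bool,
{
    let mut newly_connected = Vec::new();
    for addr in addresses_to_dial(known, connections) {
        if dial(addr) {
            connections.connected.insert(addr.to_string());
            newly_connected.push(addr.to_string());
        }
    }
    // The caller would initiate replication sessions with these peers.
    newly_connected
}
```

In a real node this function would be driven by a timer firing every x seconds, and the returned addresses are the peers with which a replication session would then be started. Addresses that already have a connection are never passed to `dial`, which mirrors the "only if none already exists" rule.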

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@sandreae sandreae changed the title Poll known peer addresses Dial all configured known relay and direct node addresses on schedule Jun 14, 2024
@sandreae sandreae requested a review from adzialocha June 14, 2024 20:36
@sandreae sandreae marked this pull request as draft June 14, 2024 20:56

@adzialocha adzialocha left a comment

Looking good!

On mobile phones we have ways to find out about the connection status of a device. It could be nice to have manual methods to disconnect and connect on the Node API instead of frequent checks, as that could be a more efficient approach. Pragmatically, this PR is totally fine though and solves the issue.
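As a rough sketch of what such a manual approach could look like: the host application reports connectivity changes and the node reacts only when the status actually flips. Everything here is hypothetical (`NetworkCommand`, `ConnectivityControl` and `report` are not part of the aquadoggo Node API); it only shows the idea of event-driven reconnects as an alternative to polling.

```rust
/// Hypothetical commands the node's network service could act on.
#[derive(Debug, PartialEq)]
enum NetworkCommand {
    /// Dial all configured relay and direct node addresses.
    DialAll,
    /// Drop all current connections.
    DisconnectAll,
}

/// Tracks the last connectivity status reported by the host application
/// (for example from a mobile OS callback) and emits a command only when
/// the status changes.
struct ConnectivityControl {
    online: bool,
}

impl ConnectivityControl {
    fn new(online: bool) -> Self {
        Self { online }
    }

    /// Report the current status; returns a command only on a transition.
    fn report(&mut self, online: bool) -> Option<NetworkCommand> {
        if online == self.online {
            return None;
        }
        self.online = online;
        Some(if online {
            NetworkCommand::DialAll
        } else {
            NetworkCommand::DisconnectAll
        })
    }
}
```

Compared with a fixed polling interval, this does no work while the status is unchanged, but it depends on the platform reliably reporting connectivity changes.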

@sandreae sandreae marked this pull request as ready for review June 18, 2024 18:06
@sandreae sandreae changed the base branch from improve-peer-addr-resolution to main June 18, 2024 18:11
@sandreae sandreae requested a review from adzialocha June 18, 2024 18:45
@sandreae sandreae merged commit 2ed3c4e into main Jun 18, 2024
8 checks passed
@sandreae sandreae deleted the poll-known-peer-addresses branch June 25, 2024 19:06
jmanm added a commit to jmanm/aquadoggo that referenced this pull request Jul 13, 2024
* Make clippy happy

* Revert "Make clippy happy"

This reverts commit e250ccd.

* Try fmt and clippy again

* Add clippy suggestions

* Allow setting path to config file via env args (p2panda#611)

* Enable passing path to config file via env args

* Remove println

* Update comment

* Remove unwanted file

* Update CHANGELOG

* Accept domain name and ip addresses for peers (p2panda#612)

* Accept String for relay and direct peer addresses in config

* Use ToSocketAddress to handle ip and domain name addresses

* Clippy

* fmt

* Update CHANGELOG

* Update example config.toml

* Prepare CHANGELOG for release

* 0.7.2

* Fix: query for child relations fails when relation list empty (p2panda#614)

* Add get_child_document_ids test case for document with empty relation list

* Account for null values when relation lists are empty

* Update test comment

* Update CHANGELOG

* 0.7.3

* Re-run tasks for partially materialized blobs (p2panda#618)

* Check materialized blob file is complete before aborting task

* Add test

* fmt

* Update CHANGELOG

* Clippy

* Correct cmp logic

* Remove double comment

---------

Co-authored-by: adz <x12@adz.garden>

* Fix: include all logs from target schema id during replication (p2panda#620)

* Include tombstoned documents when calculating local log heights

* Clippy

* Update CHANGELOG

* Make clippy happy

* Bump rust gh action to v1 and define toolchain version

* Introduce `PeerAddress` struct for improved address resolution patterns (p2panda#621)

* Introduce PeerAddress struct with socket and multiaddr resolution methods

* Don't pop off p2p protocol from relay address as it isn't there

* fmt

* Update CHANGELOG

* Cache socket addresses

* Remove Multiaddr from PeerAddress

* Remove serde traits from PeerAddress

* Add doc string to PeerAddress

* Rename methods

* Re-apply unhandled operations during startup of materializer service (p2panda#623)

* Store method to get all un-indexed operation ids

* Pick up un-indexed operations when starting materializer service, add a test

* Add entry to CHANGELOG.md

* Increase `max_pending_connections_*` (p2panda#628)

* Increase max pending connections

* Update CHANGELOG

* Dial all configured known relay and direct node addresses on schedule (p2panda#622)

* Poll all known peer addresses

* Update PeerAddress method name

* Update CHANGELOG

* WIP: poll known peers

* Check if a direct node was identified (and add comments)

* Don't dial direct node address on startup, rely on scheduler

* More comments

* Remove unused import

* fmt

* Doc strings for EventLoop struct

* Clippy

* 0.7.4

* Minor CHANGELOG.md formatting change

* Fix: handle connection ids greater than 9 in `Peer` impl of `Human` trait (p2panda#634)

* Handle connection ids greater than 9 in peer Human impl

* Clippy

* Update CHANGELOG

* Bump `libp2p` to version `0.53.2` (p2panda#631)

* Bump libp2p to version 0.53.2

* We don't need to listen on tcp port when in relay mode

* Listening on relay circuit no longer sometimes fails

* Remove tcp feature requirement from libp2p

* Refactor connection_keep_alive method

* Clippy

* Remove unnecessary connection_keep_alive method from peers behaviour

* Add CHANGELOG.md entry

---------

Co-authored-by: adz <x12@adz.garden>

* Move relay connection logic into main event loop (p2panda#632)

* Move network service relay initialization into main event loop

* Clippy

* Add DCUTR event debug logging to swarm

* Change log message

* Adjust connection limits

* Even nicer log messages

* Helper to print or info log depending on log level

* Listening on relay circuit no longer sometimes fails

---------

Co-authored-by: adz <x12@adz.garden>

* Support private net with pre-shared key (p2panda#635)

* Swarm listens on both TCP and QUIC addresses

* Support both QUIC and TCP protocols

* TCP port_reuse should be false

* Establish a private net over TCP when psk provided in NetworkConfig

* Initiate swarm with private net when psk provided in config

* Update CHANGELOG

* Doc string fix

* Don't need to differentiate between transports when detecting port

* Update README

* Fix README formatting

* Update example config file

* Check if blob file exists before deleting it from fs (p2panda#636)

* Check if blob file exists before deleting it from fs

* Add entry to CHANGELOG.md

* Inconsistent blob storage warning was wrongly shown (p2panda#638)

* Inconsistent blob storage warning was wrongly shown

* Add entry to CHANGELOG.md

* Minor config.toml cleanup

* Safely handle missing document when retrieving document view from store (p2panda#637)

* Return None when document was deleted

* Add entry to CHANGELOG.md

* Introduce API to subscribe to peer connection events (p2panda#625)

* Introduce API to subscribe to peer connection events

* Add entry to CHANGELOG.md

* 0.8.0

* Also bump version in aquadoggo_cli, add note about that in RELEASE.md

* Adjust level of replication session and document materialization logs (p2panda#639)

* Remove relay and direct peer poll attempt logging

* Change document creation/update/delete logging to info level

* Lower level of replication session logs to debug

* Update CHANGELOG

* Remove incorrectly committed file

* Lower logging level for replication finished message

* Fix logging logic error in reducer

* Improve GraphQL re-build error

* Update README.md

* Expose NodeEvent to public API (p2panda#643)

* Expose NodeEvent to public API

* Add entry to CHANGELOG.md

---------

Co-authored-by: adz <x12@adz.garden>
Co-authored-by: Sam Andreae <contact@samandreae.com>
Co-authored-by: adz <adzialocha@users.noreply.github.com>