
Integrate replication manager with networking stack #387

Merged · 126 commits · Jun 19, 2023

Conversation

@adzialocha (Member) commented May 24, 2023

This PR introduces a new replication service which represents the replication protocol layer on top of the networking stack. It uses the service bus to communicate with the networking service, and is informed about established and closed connections with peers as well as new incoming replication messages.

The replication service keeps track of all currently available peers, can initiate new replication sessions via simple scheduler logic, and maintains the sync manager, the state machine that handles the actual replication protocol logic.
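The peer tracking and session scheduling described above can be sketched roughly as follows. This is a minimal illustrative model, not aquadoggo's actual API: `ReplicationService`, `PeerId` and `schedule_sessions` are hypothetical names standing in for the real types.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for libp2p's peer identifier.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PeerId(u64);

// Illustrative model of the replication service's peer book.
#[derive(Default)]
struct ReplicationService {
    // All peers with an established connection; the value records
    // whether a replication session with that peer is in flight.
    peers: HashMap<PeerId, bool>,
}

impl ReplicationService {
    // Called when the networking service announces a new connection
    // over the service bus.
    fn on_connection_established(&mut self, peer: PeerId) {
        self.peers.insert(peer, false);
    }

    // Called when the networking service announces a closed connection.
    fn on_connection_closed(&mut self, peer: PeerId) {
        self.peers.remove(&peer);
    }

    // Simple scheduler: start a session with every known peer that
    // does not have one in flight yet.
    fn schedule_sessions(&mut self) -> Vec<PeerId> {
        let mut scheduled = Vec::new();
        for (peer, busy) in self.peers.iter_mut() {
            if !*busy {
                *busy = true;
                scheduled.push(*peer);
            }
        }
        scheduled
    }
}
```

In this sketch, a session slot is freed again only when the sync manager reports the session finished; the real service additionally drives the sync manager's protocol state machine.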

Closes: #373 and #396

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@adzialocha adzialocha changed the base branch from main to development May 24, 2023 10:31
@adzialocha adzialocha changed the title Integrate replication manager to behaviour Integrate replication manager with networking stack May 24, 2023
@sandreae sandreae linked an issue May 25, 2023 that may be closed by this pull request
@adzialocha adzialocha requested a review from sandreae June 16, 2023 19:18
@sandreae (Member) left a comment

We finally made it 🤣 !!! Thanks for these final changes, it's really clear what's going on now and I'm observing very consistent behaviour when testing locally. Much better experience observing node behaviour now that the reaction to disconnects is immediate.

This final approach to peers/connections should make it easy to transition to scheduling dialing peers from a behaviour, rather than in the sync manager.

I believe I'm still experiencing one issue, which occurs when new schemas are being materialized. It looks like a materialization task crashes, so the schema provider is never updated; if I restart the node, the provider is updated and replication can continue. Logs are here: https://laub.liebechaos.org/WQ3SzzmtSPyfESfLfNEgFA I don't know if this is a concern of the current PR though; I can open an issue once I understand the problem better.

I renamed the Naive strategy to LogHeight. It's not perfect, but it seems mean to call the current setup naive 😄

Would be good to figure out the cause of the issue (above) but apart from that I'm very happy with this PR now.

Resolved review threads:

  • CHANGELOG.md
  • aquadoggo/src/network/peers/behaviour.rs (2 threads)
  • aquadoggo/src/network/peers/handler.rs
```rust
fn connection_keep_alive(&self) -> KeepAlive {
    if self.critical_error {
        return KeepAlive::No;
    }
    self.keep_alive
}
```
@sandreae (Member) commented:

Just to check my understanding: right now we still expect this to trigger the connection being gracefully closed, since no other behaviours will keep it open. Right?

@adzialocha (Member Author) replied:

When all behaviours flip `KeepAlive` to `No`, the connection gets removed (it times out).
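A simplified model of that rule: libp2p keeps a connection open while at least one behaviour's handler still votes to keep it alive. The `KeepAlive` enum and combining function below are illustrative only, not the actual swarm implementation.

```rust
// Mirrors the shape of libp2p's keep-alive vote; illustrative only.
#[derive(Clone, Copy, PartialEq, Debug)]
enum KeepAlive {
    Yes,
    No,
}

// A connection stays open while at least one handler votes `Yes`;
// once every behaviour (including ours, after a critical error)
// returns `No`, the connection is closed.
fn connection_kept_alive(votes: &[KeepAlive]) -> bool {
    votes.iter().any(|v| *v == KeepAlive::Yes)
}
```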

Resolved review threads:

  • aquadoggo/src/network/mod.rs
  • aquadoggo/src/network/peers/peer.rs
  • aquadoggo/src/network/service.rs
@adzialocha (Member Author) commented

> I believe I'm still experiencing one issue which occurs when new schema are being materialized. It looks like a materialization task crashes and so the schema provider is never updated, if i restart the node, the provider is updated and replication can continue. Logs are here: https://laub.liebechaos.org/WQ3SzzmtSPyfESfLfNEgFA don't know if this is a concern of the current PR though, I can open an issue once I understand the problem better.

Yes, I observed the same! It's caused by entries arriving in an unexpected order (or at least some race condition). I also think it's unrelated to the networking stack.

@adzialocha adzialocha merged commit c45b96a into development Jun 19, 2023
7 of 8 checks passed
@sandreae (Member) commented

Wahoo!! MERGED!

@adzialocha (Member Author) commented

> Wahoo!! MERGED!

Juhuuu!!!

@adzialocha adzialocha deleted the integrate-replication-manager-to-behaviour branch June 19, 2023 07:56
adzialocha added a commit that referenced this pull request Jul 14, 2023
* development: (23 commits)
  Implement `dialer` behaviour (#444)
  Sort expected results in strategy tests
  Update CHANGELOG
  Replicate operations in topo order (#442)
  Maintain sorted operation indexes (#438)
  Use fork of `asynchronous-codec`  (#440)
  Ingest check for duplicate entries (#439)
  Reverse lookup for pinned relations in dependency task (#434)
  Remove unnecessary exact version pinning in Cargo.toml
  Make `TaskInput` an enum and other minor clean ups in materialiser (#429)
  Use `libp2p` `v0.52.0` (#425)
  Fix race condition when check for existing view ids was too early (#420)
  Reduce logging verbosity
  CI: Temporary workaround for Rust compiler bug (#417)
  Fix early document view insertion (#413)
  Handle duplicate document view insertions (#410)
  Decouple p2panda's authentication data types from libp2p's (#408)
  Remove dead_code attribute in lib
  Integrate replication manager with networking stack (#387)
  Implement naive replication protocol (#380)
  ...