@rosalogia
Contributor

This PR integrates the failure detection component introduced in #5 by @gsebil08 into the client using systems introduced in #8. It adds a failure detector as a field of the client and runs message handlers and auto-probing functions periodically as part of the client's server. Unfortunately, due to a cyclic dependency issue, it required that the failure detection code all be moved into the client module. I would love to explore solutions to this, as it makes the client module feel quite messy. The failure detector integration has not been tested yet, but the tests that do exist still pass.

* Moves failure detector module into client module to deal with cyclic dependency issue
* Adds failure detection inbox to default initialization
* Having 0 active peers no longer causes the program to crash
* Updates routers to allow non request/response messages to pass
* Adds optional argument to client init for initial peer list
* Calls failure detector functions asynchronously in server
@rosalogia rosalogia added the enhancement New feature or request label Mar 30, 2022
@rosalogia rosalogia added this to the Working Gossip Protocol milestone Mar 30, 2022
@rosalogia rosalogia requested a review from Gau-thier March 30, 2022 21:05
@Gau-thier Gau-thier force-pushed the @rosalogia/integrate-failure-detector branch 2 times, most recently from 0cecb08 to 97640ab Compare March 31, 2022 07:48
@Gau-thier Gau-thier force-pushed the @rosalogia/integrate-failure-detector branch from 97640ab to 8e54d89 Compare March 31, 2022 07:51
Collaborator

@Gau-thier Gau-thier left a comment

This was not easy to review (reorganization plus rework), but...
Nice job! We are really close to getting something great!
I think it also solves (or at least is linked to) the sequence_number issue #18.
I took the liberty of running `esy b dune build @fmt --auto-promote` to fix some needless trouble on the CI side.

let new_seq_no = next_seq_no t in
let _ = send_ping_to client peer_to_update in
match%lwt wait_ack_timeout t new_seq_no t.config.round_trip_time with
| Ok _ -> Lwt.return ()
Collaborator

Are we missing an action here?
I think we should update the status of the Ack sender (recipient of Ping message) to Alive if it replies. Maybe this peer was Suspicious or Faulty before, but since it now replies, it is not the case anymore.
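
A minimal sketch of the suggested behaviour, not the PR's actual code: the `status` and `probe_result` types and the `status_after_probe` function are hypothetical names introduced for illustration, and the real peer representation in the client module may differ. It only shows that an ack restores `Alive` regardless of the previous status; the timeout branch is deliberately left unchanged here.

```ocaml
(* Hypothetical types; the PR's real status and peer definitions may differ. *)
type status = Alive | Suspicious | Faulty

(* Outcome of one probe round, as seen by the failure detector. *)
type probe_result = Ack_received | Timed_out

(* Status a peer should have after one probe round.
   An ack always restores Alive, whatever the previous status was.
   The timeout case keeps the previous status in this sketch. *)
let status_after_probe (previous : status) (result : probe_result) : status =
  match result with
  | Ack_received -> Alive
  | Timed_out -> previous
```

In the `Ok _` branch of the diff above, this would correspond to writing `Alive` back for `peer_to_update` before returning, instead of returning `()` directly.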

Contributor Author

Right

This is correct in the basic SWIM protocol, but it is a very heavy penalty.
When there is no ACK (direct or indirect) the peer must be set to `Suspicious`.
See section 4.2 from https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf *)
let _ = update_neighbor_status peer_of_client peer_to_update Faulty in
Contributor Author

This is wrong now that I think about it. Operating on the peer_of_client has no effect whatsoever.
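
One way to picture the fix, as a sketch under stated assumptions rather than the project's implementation: the `statuses` table and `on_probe_timeout` function below are hypothetical, and the assumption is that the update has to land in the state the failure detector itself reads. Following the code comment above, a missed ack first marks the peer `Suspicious`. Escalating to `Faulty` on a second missed ack is a simplification chosen for this example; SWIM's suspicion mechanism uses a timeout for that step.

```ocaml
type status = Alive | Suspicious | Faulty

(* Hypothetical table from peer address to status, owned by the failure
   detector. The real client may store this differently. *)
let statuses : (string, status) Hashtbl.t = Hashtbl.create 16

(* Called when neither a direct nor an indirect ack arrived in time.
   Unknown or Alive peers become Suspicious; peers that were already
   Suspicious (or Faulty) are recorded as Faulty. *)
let on_probe_timeout (addr : string) : unit =
  let next =
    match Hashtbl.find_opt statuses addr with
    | Some Suspicious | Some Faulty -> Faulty
    | Some Alive | None -> Suspicious
  in
  (* Write the result back into the shared table so later probes see it. *)
  Hashtbl.replace statuses addr next
```

The point of the sketch is the last line: the new status is stored where subsequent lookups happen, rather than being computed on a separate value and discarded.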

let new_seq_no = next_seq_no t in
let _ = send_ping_to client peer_to_update in
match%lwt wait_ack_timeout t new_seq_no t.config.round_trip_time with
| Ok _ -> Lwt.return ()
Contributor Author

Right

@Gau-thier Gau-thier merged commit 6a92251 into @rosalogia/routing Apr 4, 2022
rosalogia pushed a commit that referenced this pull request Apr 4, 2022
* feat(failureDetector): Integrate failure detector into client (#21)
@rosalogia rosalogia deleted the @rosalogia/integrate-failure-detector branch April 19, 2022 18:16
3 participants