-
Notifications
You must be signed in to change notification settings - Fork 3
Integrate failure detector into client #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate failure detector into client #21
Conversation
* Moves failure detector module into client module to deal with cyclic dependency issue * Adds failure detection inbox to default initialization * Having 0 active peers no longer causes the program to crash * Updates routers to allow non request/response messages to pass * Adds optional argument to client init for initial peer list * Calls failure detector functions asynchronously in server
0cecb08 to
97640ab
Compare
97640ab to
8e54d89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was not easy to review (re-organization + rework), but....
Nice job! We are really close to get something great!
I think it also solves (or at least it is linked to) the sequence_number issue #18
I took the liberty to run a esy b dune build @fmt --auto-promote to fix some useless troubles on the CI side.
| let new_seq_no = next_seq_no t in | ||
| let _ = send_ping_to client peer_to_update in | ||
| match%lwt wait_ack_timeout t new_seq_no t.config.round_trip_time with | ||
| | Ok _ -> Lwt.return () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing an action here?
I think we should update the status of the Ack sender (recipient of Ping message) to Alive if it replies. Maybe this peer was Suspicious or Faulty before, but since it now replies, it is not the case anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
| This is correct in the basic SWIM protocol, but it is a very heavy penalty. | ||
| When there is no ACK (direct or indirect) the peer must be set to `Suspicious`. | ||
| See section 4.2 from https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf *) | ||
| let _ = update_neighbor_status peer_of_client peer_to_update Faulty in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong now that I think about it. Operating on the peer_of_client has no effect whatsoever.
| let new_seq_no = next_seq_no t in | ||
| let _ = send_ping_to client peer_to_update in | ||
| match%lwt wait_ack_timeout t new_seq_no t.config.round_trip_time with | ||
| | Ok _ -> Lwt.return () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
* feat(failureDetector): Integrate failure detector into client (#21)
This PR integrates the failure detection component introduced in #5 by @gsebil08 into the client using systems introduced in #8. It adds a failure detector as a field of the client and runs message handlers and auto-probing functions periodically as part of the client's server. Unfortunately, due to a cyclic dependency issue, it required that the failure detection code all be moved into the client module. I would love to explore solutions to this, as it makes the client module feel quite messy. The failure detector integration has not been tested yet, but the tests that do exist still pass.