need heartbeat RFC #129
Comments
Very good idea!
Just thinking through the clock synchronization idea: assuming the heartbeat period is fixed and well known, the tick value (let's not call it epoch, to avoid confusion with the UNIX epoch) is a very low resolution, synchronized, monotonic clock. To get a high resolution clock value that is synchronized, each rank should record a CLOCK_MONOTONIC timestamp for the last heartbeat event received. The high res clock value is then just (tick * period) + (now - timestamp), where now is the current CLOCK_MONOTONIC value.

If tick, period, and timestamp were made available via RPC, a module or command running on the same rank could obtain an accurate high res clock value unaffected by the RPC round-trip time (but still affected by system call latency), provided it reads CLOCK_MONOTONIC locally rather than asking for it in the RPC. To avoid awkward situations when a late-joining broker first starts up and no heartbeat event has been received yet, the current heartbeat state should be obtained as part of the "hello" bootstrap protocol.

In cases where the RPC latency is undesirable, one could establish a "heartbeat follower" in a module or command that subscribes to the heartbeat event and allows the clock or high res time info to be obtained without an RPC (at least after the first heartbeat event is received). Finally, all this could be wrapped in an API that can be used with either a local heartbeat follower or a remote one.
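As a rough illustration of the "heartbeat follower" idea, here is a minimal Python sketch. The class name `HeartbeatFollower` and its methods are hypothetical and not part of any existing API. It assumes a fixed period in seconds and uses `time.monotonic()` as a stand-in for reading CLOCK_MONOTONIC; the clock source is injectable so the arithmetic can be checked with fixed values.

```python
import time


class HeartbeatFollower:
    """Hypothetical helper: derives a high res clock from heartbeat ticks."""

    def __init__(self, period, clock=time.monotonic):
        self.period = period    # heartbeat period in seconds (assumed fixed)
        self.clock = clock      # local monotonic clock source
        self.tick = None        # last tick value received
        self.timestamp = None   # local monotonic time when that tick arrived

    def on_heartbeat(self, tick):
        """Record the tick and the local monotonic time of its arrival."""
        self.tick = tick
        self.timestamp = self.clock()

    def now(self):
        """Return (tick * period) + (now - timestamp), in seconds."""
        if self.tick is None:
            # Late joiner: no heartbeat state has been seen yet.
            raise RuntimeError("no heartbeat received yet")
        return self.tick * self.period + (self.clock() - self.timestamp)
```

A subscriber would call `on_heartbeat(tick)` from its event handler and `now()` whenever it needs a timestamp. The `RuntimeError` case is the gap that obtaining the heartbeat state during the "hello" bootstrap would close.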
Couple of other random thoughts:
Of course, if the clock is "corrected" at each heartbeat, one has to keep the result of the last query in order to maintain monotonicity. For example, if a heartbeat is "late", the value computed just before it arrives, (old_tick * period) + (now - old_timestamp), might be greater than the value computed just after it arrives, (new_tick * period) + (now - new_timestamp).
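One way to keep the result of the last query is to clamp each new reading so it never falls below the previous one. The sketch below is only an illustration under that assumption; `MonotonicSyncClock` and `read` are hypothetical names, and the caller is assumed to supply the latest tick, its arrival timestamp, and the current local monotonic time.

```python
class MonotonicSyncClock:
    """Hypothetical wrapper that never lets the derived clock go backwards."""

    def __init__(self, period):
        self.period = period  # heartbeat period in seconds (assumed fixed)
        self.last = None      # result of the previous query, if any

    def read(self, tick, timestamp, now):
        """Return the derived clock value, clamped to the last value returned."""
        raw = tick * self.period + (now - timestamp)
        if self.last is not None and raw < self.last:
            # A late heartbeat "corrected" the clock backwards; hold the
            # previous value until the raw clock catches up.
            raw = self.last
        self.last = raw
        return raw
```

With a period of 2.0, a reading taken 2.5 s after tick 5 gives 12.5. If tick 6 then arrives late and the next reading is taken 0.25 s after it, the raw value would be 12.25, so the wrapper returns 12.5 again. Holding the clock still is only one possible policy; slewing it gradually would be another.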
As noted in #128, we should add an RFC for the instance heartbeat, and consider expanding the API to include time since last heartbeat, heartbeat period, and heartbeat start time.
Since FLUIDs may make use of a synchronized clock, we should consider how one can be derived from the heartbeat, and explore its properties and constraints.