A partition isolating Chronos from the ZK leader can *not* cause a crash #522

Open
aphyr opened this Issue · 4 comments

3 participants

@aphyr

Per #513, Chronos is expected to crash when a leader loses its Zookeeper connection. In this test case, Chronos detects the loss of its Zookeeper connection and, instead of crashing, sleeps quietly and reconnects when the partition heals. #513 argues that to keep running would violate unspecified correctness constraints. To preserve safety, should Chronos also crash here?

@air
Collaborator

Hi - you're referring to a statement that doesn't represent the design (it wasn't expressed carefully enough). Please disregard it and refer to the clarification in the thread. Make sense?

@air
Collaborator

To help us get a better statement on the behaviour we'll check out the logs and see if Chronos is taking both approaches here (self-terminating in some cases and retrying in others). Thanks!

@aphyr

> Please disregard it and refer to the clarification in the thread. Make sense?

No, not really. If you're trying to "take a highly conservative approach, make the fewest assumptions and exit," and "avoid a class of faults by dropping all possibly-outdated state," then you should, you know, actually exit reliably, instead of crashing some but not all of the time. Choosing both failure modes is silly.
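
To make the "pick one failure mode" point concrete, here is a minimal sketch in Python. It is not Chronos code: the event names, the `Policy` and `Action` enums, and `decide` are all hypothetical, and it assumes the process is told about every connection interruption. The only thing it shows is that the reaction depends on the configured policy and never on which kind of partition caused the interruption.

```python
from enum import Enum


class ConnectionEvent(Enum):
    # Hypothetical event names, for illustration only.
    CONNECTED = "connected"
    SUSPENDED = "suspended"      # connection interrupted
    LOST = "lost"                # session considered gone
    RECONNECTED = "reconnected"


class Policy(Enum):
    EXIT = "exit"                # conservative: drop state and terminate
    RETRY = "retry"              # optimistic: wait and reconnect


class Action(Enum):
    CONTINUE = "continue"
    DROP_STATE_AND_EXIT = "drop_state_and_exit"
    WAIT_FOR_RECONNECT = "wait_for_reconnect"


def decide(policy, event):
    """Map a connection event to an action under one fixed policy."""
    if event in (ConnectionEvent.CONNECTED, ConnectionEvent.RECONNECTED):
        return Action.CONTINUE
    # Every interruption gets the same treatment, whatever caused it.
    if policy is Policy.EXIT:
        return Action.DROP_STATE_AND_EXIT
    return Action.WAIT_FOR_RECONNECT
```

Under `Policy.EXIT`, both `SUSPENDED` and `LOST` map to `DROP_STATE_AND_EXIT`, so there is no partition shape that quietly takes the retry path instead.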

@air
Collaborator

Yep I get you - we'll check it out as I described.

@gkleiman was assigned by air