
A partition isolating Chronos from the ZK leader can *not* cause a crash #522
Hi - you're referring to a statement that doesn't represent the design (it wasn't expressed carefully enough). Please disregard it and refer to the clarification in the thread. Make sense?
To get a clearer statement of the behaviour, we'll check the logs and see whether Chronos is taking both approaches here (self-terminating in some cases and retrying in others). Thanks!
> Please disregard it and refer to the clarification in the thread. Make sense?
No, not really. If you're trying to "take a highly conservative approach, make the fewest assumptions and exit," and "avoid a class of faults by dropping all possibly-outdated state," then you should, you know, actually exit reliably, instead of crashing some but not all of the time. Choosing both failure modes is silly.
Yep I get you - we'll check it out as I described.
Per #513, Chronos is expected to crash when a leader loses its ZooKeeper connection. In this test case, Chronos detects the loss of its ZooKeeper connection and, instead of crashing, sleeps quietly and reconnects when the partition heals. #513 argues that continuing to run would violate unspecified correctness constraints. To preserve safety, should Chronos also crash here?
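To make the two behaviours concrete, here is a minimal sketch of the policies being contrasted. It is not Chronos code: the state names, `fail_fast_policy`, `retry_policy`, and `run` are all hypothetical, and the connection states are only loosely modelled on what ZooKeeper client libraries tend to report.

```python
from enum import Enum


class ConnState(Enum):
    # Hypothetical connection states, for illustration only.
    CONNECTED = "connected"
    SUSPENDED = "suspended"      # connection interrupted, session may still be alive
    LOST = "lost"                # session presumed expired
    RECONNECTED = "reconnected"


class Action(Enum):
    CONTINUE = "continue"
    WAIT_AND_RETRY = "wait_and_retry"
    EXIT = "exit"


def fail_fast_policy(state):
    """Exit on any interruption, so no possibly-outdated state is kept."""
    if state in (ConnState.SUSPENDED, ConnState.LOST):
        return Action.EXIT
    return Action.CONTINUE


def retry_policy(state):
    """Keep the process alive and wait for the partition to heal."""
    if state in (ConnState.SUSPENDED, ConnState.LOST):
        return Action.WAIT_AND_RETRY
    return Action.CONTINUE


def run(policy, events):
    """Apply a policy to a sequence of connection events; stop at the first EXIT."""
    actions = []
    for state in events:
        action = policy(state)
        actions.append(action)
        if action is Action.EXIT:
            break
    return actions
```

Feeding the same partition (connected, interrupted, healed) to both policies gives different outcomes: `fail_fast_policy` stops at the interruption, while `retry_policy` rides it out. The question in this issue is which single policy Chronos intends to follow, since following one on some code paths and the other elsewhere gives neither guarantee.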