Motivation

A replica that is declared bad after previously being declared temporarily unavailable may be reported as `RECOVERED` even though no recovery request was created. This is more likely when the Necromancer has a backlog of bad replicas to process and the temporary unavailability expires in the meantime.
It’s also possible to reproduce this artificially:
1. Stop all Necromancer instances.
2. Declare a replica as temporarily unavailable. Wait until Minos processes it: the `replicas` state transitions from `AVAILABLE` to `TEMPORARY_UNAVAILABLE`, and a row with the same state appears in `bad_replicas`.
3. Declare the replica as lost. Wait until Minos processes it: the `replicas` state transitions from `TEMPORARY_UNAVAILABLE` to `BAD`, and a new row with the same state appears in `bad_replicas`.
4. Let the temporary unavailability expire naturally, or manually update the `expires_at` column.
5. Wait until the Minos temporary expiration daemon processes the bad replica: the first `bad_replicas` row is removed and the `replicas` state transitions from `BAD` to `AVAILABLE`.
6. Restart the Necromancers. Their main loop works on the `replicas` table, so the replica is never picked up.
7. Wait one hour, or manually change the value of `update_history_threshold`, so that `list_bad_replicas_history()` and `update_bad_replicas_history()` are called. There, the Necromancer sees a `bad_replicas` row with state `BAD` while the `replicas` state is `AVAILABLE`. Consequently, that `bad_replicas` row transitions from `BAD` to `RECOVERED` without any request being created. Checkmate.
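The sequence above can be modelled as a small in-memory simulation, which makes it easier to see where the request is lost. This is a minimal sketch, not Rucio code: the dict-based stand-ins for the `replicas` and `bad_replicas` tables, the `simulate()` function, and the assumption that the Necromancer main loop only selects replicas whose state is `BAD` are all hypothetical simplifications.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "AVAILABLE"
    TEMPORARY_UNAVAILABLE = "TEMPORARY_UNAVAILABLE"
    BAD = "BAD"
    RECOVERED = "RECOVERED"


def simulate():
    # Hypothetical stand-ins for the replicas and bad_replicas tables.
    replica_state = State.AVAILABLE
    bad_rows = []   # each row: {"state": State, "expired": bool}
    requests = []   # recovery requests the Necromancer would create

    # Step 1: Necromancers are stopped, so nothing below is picked up yet.

    # Step 2: declared temporarily unavailable and processed by Minos.
    replica_state = State.TEMPORARY_UNAVAILABLE
    bad_rows.append({"state": State.TEMPORARY_UNAVAILABLE, "expired": False})

    # Step 3: declared lost and processed by Minos.
    replica_state = State.BAD
    bad_rows.append({"state": State.BAD, "expired": False})

    # Step 4: the temporary unavailability expires.
    bad_rows[0]["expired"] = True

    # Step 5: the expiration daemon removes the expired row and
    # resets the replica to AVAILABLE, even though it is really BAD.
    bad_rows = [row for row in bad_rows if not row["expired"]]
    replica_state = State.AVAILABLE

    # Step 6: assumed main-loop behaviour: only replicas currently in
    # state BAD get a recovery request, so this replica is skipped.
    if replica_state is State.BAD:
        requests.append("recovery request")

    # Step 7: history update. A BAD row whose replica is AVAILABLE is
    # taken as "already recovered", with no check that a request exists.
    for row in bad_rows:
        if row["state"] is State.BAD and replica_state is State.AVAILABLE:
            row["state"] = State.RECOVERED

    return replica_state, [row["state"] for row in bad_rows], requests


if __name__ == "__main__":
    print(simulate())
```

Running it ends with the replica `AVAILABLE`, its only `bad_replicas` row `RECOVERED`, and an empty request list, which mirrors the reported outcome.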
Modification
Some discussion on how to handle such cases might be necessary.