nsqd: "unhealthy mode" when diskqueue writes fail #422
Hailo reported an issue where `nsqd` could not write to a failed ephemeral disk on EC2 but the node continued to accept `PUB`.
In this scenario, `nsqd` should enter an "unhealthy mode" where `PUB`s are rejected until the issue can be investigated.
This feedback mechanism to publishers is crucial for them to be able to attempt delivery to another (healthy) node in a cluster or react to this condition in some other way.
Initially, I don't think it's necessary for it to attempt to "heal" itself. I imagine that in most cases you would want to restart the node anyway, which would clear this state.
cc @boyand