Fail all replicas if the node is down or the engine is dead during the restore #566

shuo-wu · 2020-05-14T11:47:45Z

longhorn/longhorn#1270
longhorn/longhorn#1328
longhorn/longhorn#1336

Longhorn longhorn#1270, longhorn#1336, longhorn#1328

Signed-off-by: Shuo Wu <shuo@rancher.com>

yasker · 2020-05-14T17:25:35Z

I am not convinced this is the right fix.

Kubernetes node status can be delayed.
Even if auto-salvage or auto-attach is triggered, we shouldn't be able to present a usable volume to the user, which is what happened in longhorn#1336 ([BUG] Backup restore succeeds when an instance manager engine crashes and data is NOT available in the restore volume). I don't know why it happened.
According to longhorn#1328 (comment), longhorn#1328 ([BUG] DRV stuck in attaching state when restoring is interrupted by rebooting attached node) is not related to the engine failure.

controller: Fail all replicas if the node is down or the engine is dead during the restore

b0f0c37

Longhorn longhorn#1270, longhorn#1336, longhorn#1328

Signed-off-by: Shuo Wu <shuo@rancher.com>

shuo-wu closed this May 22, 2020
