Fix bug: if the snapshot is no longer in engine CR, don't block the removal process #2074

PhanLe1010 · 2023-07-11T19:39:34Z

removal process Longhorn-6298 Signed-off-by: Phan Le <phan.le@suse.com>

ejweber

I didn't realize you were already working on a PR for this, so I drafted #2075. It is similar in concept (allow us to proceed to remove the finalizer if the snapshot is not in the engine). However, I was hoping to just reorder things without adding more checks.

I'll defer to you since you know this section of code best. LGTM, but need to test it.

PhanLe1010 · 2023-07-11T20:35:54Z

@ejweber I think we still need to check with the engine process before removing the snapshot CR's finalizer. This PR will do that. What do you think?
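For readers following along, the decision being discussed can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `engineSnapshots`, `finalizerAction`, and `decide` are hypothetical names, and the real controller works with Longhorn's own engine and snapshot CR types.

```go
package main

import "fmt"

// engineSnapshots is a hypothetical, simplified view of the snapshot names
// currently recorded in an engine CR. It is not the real Longhorn type.
type engineSnapshots map[string]struct{}

// finalizerAction is the decision this sketch returns for a snapshot CR
// that is being deleted.
type finalizerAction string

const (
	// waitForEngine: the engine state could not be checked, so retry later.
	waitForEngine finalizerAction = "wait-for-engine"
	// requestDeletion: the snapshot still exists in the engine.
	requestDeletion finalizerAction = "request-deletion"
	// removeFinalizer: the engine no longer knows the snapshot.
	removeFinalizer finalizerAction = "remove-finalizer"
)

// decide checks with the engine first and only allows finalizer removal
// once the snapshot is confirmed to be absent from the engine.
func decide(snaps engineSnapshots, engineKnown bool, name string) finalizerAction {
	if !engineKnown {
		return waitForEngine
	}
	if _, ok := snaps[name]; ok {
		return requestDeletion
	}
	return removeFinalizer
}

func main() {
	snaps := engineSnapshots{"snap-a": {}}
	fmt.Println(decide(snaps, true, "snap-a"))
	fmt.Println(decide(snaps, true, "snap-b"))
	fmt.Println(decide(nil, false, "snap-b"))
}
```

The point of the third branch is that a snapshot missing from the engine no longer blocks removal, while an unreachable engine still does.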

ejweber · 2023-07-11T21:02:35Z

We discussed my "competing" PR offline and are in agreement that this is the right approach.

We don't have a simple way to reproduce this, but I can use the same iterative test I opened the issue with to check whether my cluster continues to accrue snapshots with this fix.

james-munson

After substantial discussion and explanation, this makes sense to me.

innobead

LGTM.

nit: is it possible for an orphaned snapshot to be permanently undeletable because it somehow can't be deleted on the replica side, so that the volume stays in an auto-attaching state? (Surely a snapshot-type ticket should be interruptible, so ideally it should not impact other operations.)

cc @shuo-wu @derekbit

innobead · 2023-07-12T05:21:34Z

@mergify backport v1.5.x

mergify · 2023-07-12T05:22:00Z

backport v1.5.x

✅ Backports have been created

#2077 Fix bug: if the snapshot is no longer in engine CR, don't block the removal process (backport #2074) has been created for branch v1.5.x

innobead · 2023-07-12T05:24:37Z

Also, auto-cleanup-system-generated-snapshot was introduced in 1.4 as well, so we don't need to tackle this issue even though the implementation is different from 1.5, which uses the Longhorn VA.

ejweber

Tested as suggested in #2074 (comment). After 15 iterations my cluster has only 50 snapshots, and each snapshot is less than five minutes old. This is as expected.

PhanLe1010 · 2023-07-12T21:37:13Z

> LGTM.
>
> nit: is it possible for an orphaned snapshot to be permanently undeletable because it somehow can't be deleted on the replica side, so that the volume stays in an auto-attaching state? (Surely a snapshot-type ticket should be interruptible, so ideally it should not impact other operations.)
>
> cc @shuo-wu @derekbit

If the snapshot is stuck in the removed state in the replica, yes, the volume will remain attached due to the snapshot-controller attachment ticket.

If a workload starts on a different node, it will interrupt the snapshot AD ticket. Other operations that require attachment will request the same node as the snapshot AD ticket, so I think it is fine.

PhanLe1010 · 2023-07-12T21:39:58Z

> Also, auto-cleanup-system-generated-snapshot was introduced in 1.4 as well, so we don't need to tackle this issue even though the implementation is different from 1.5, which uses the Longhorn VA.

Sorry, could you elaborate more on this?

I think this issue should only happen in 1.5.x, because of the new AD mechanism.

innobead · 2023-07-12T21:40:53Z

Sounds good.

innobead · 2023-07-12T22:40:47Z

> > Also, auto-cleanup-system-generated-snapshot was introduced in 1.4 as well, so we don't need to tackle this issue even though the implementation is different from 1.5, which uses the Longhorn VA.
>
> Sorry, could you elaborate more on this?
>
> I think this issue should only happen in 1.5.x, because of the new AD mechanism.

It has been clarified. All good.

Fix bug: if the snapshot is no longer in engine CR, don't block the removal process

429d735

Longhorn-6298 Signed-off-by: Phan Le <phan.le@suse.com>

ejweber reviewed Jul 11, 2023

james-munson approved these changes Jul 11, 2023

PhanLe1010 marked this pull request as ready for review July 11, 2023 21:04

PhanLe1010 requested a review from a team as a code owner July 11, 2023 21:04

shuo-wu approved these changes Jul 12, 2023

innobead self-requested a review July 12, 2023 05:16

innobead approved these changes Jul 12, 2023

Merge branch 'master' into 6298

844edfa

ejweber approved these changes Jul 12, 2023

ejweber assigned PhanLe1010 and unassigned PhanLe1010 Jul 12, 2023

ejweber mentioned this pull request Jul 12, 2023

[BUG] Race leaves snapshot CRs that cannot be deleted longhorn/longhorn#6298

Closed

PhanLe1010 merged commit a5b67e5 into longhorn:master Jul 12, 2023
6 checks passed

mergify bot mentioned this pull request Jul 12, 2023

Fix bug: if the snapshot is no longer in engine CR, don't block the removal process (backport #2074) #2077

Merged
