[BUG] Extra snapshot generated when clone from a detached volume #5986

Closed
chriscchien opened this issue May 23, 2023 · 10 comments
Labels
area/snapshot Volume snapshot (in-cluster snapshot or external backup) backport/1.4.3 kind/bug kind/regression Regression which has worked before priority/0 Must be fixed in this release (managed by PO) reproduce/always 100% reproducible severity/3 Function working but has a major issue w/ workaround

@chriscchien (Contributor)

Describe the bug (🐛 if you encounter this issue)

From test case test_cloning_with_detached_source_volume

Cloning a volume from a detached source volume creates one extra snapshot in the source volume, which makes the test case fail.

In v1.4.2

  • Clone from attached volume : 2 snapshots created in the source volume
  • Clone from detached volume : 1 snapshot created in the source volume

master-head

  • Clone from attached volume : 2 snapshots created in the source volume
  • Clone from detached volume : 2 snapshots created in the source volume

To Reproduce

Steps to reproduce the behavior:

  1. Dynamically provision volume1 with the following manifest:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: source-pvc
spec:
  storageClassName: longhorn
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  2. Clone volume1 with the manifest below:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: longhorn
  dataSource:
    name: source-pvc
    kind: PersistentVolumeClaim
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  3. After the clone completes, volume1 has 2 snapshots (not including volume-head); in previous versions only 1 new snapshot was generated.

Expected behavior

Do not generate the extra snapshot (as in previous versions), or update the test code to match the new behavior.

Log or Support bundle

N/A

Environment

  • Longhorn version: master-head
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.24.7+k3s1

Additional context

test_inc_restoration_with_multiple_rebuild_and_expansion

@chriscchien added kind/bug reproduce/always 100% reproducible severity/3 Function working but has a major issue w/ workaround labels May 23, 2023
@innobead added this to the v1.5.0 milestone May 23, 2023
@innobead added kind/regression Regression which has worked before area/snapshot Volume snapshot (in-cluster snapshot or external backup) labels May 29, 2023
@innobead added the priority/0 Must be fixed in this release (managed by PO) label May 29, 2023
@ejweber (Contributor) commented May 31, 2023

When the reproduce steps are followed, the volume controller responsible for initiating the volume clone first creates a snapshot, then updates the volume's status.cloneStatus field(s) with information about the snapshot. On the first attempt to do this, the snapshot is created but the status update does not succeed. On the second attempt to do this, another snapshot is created and the status update succeeds.

https://github.com/longhorn/longhorn-manager/blob/2a83384208c88904bba67202c5d55b1d36dc6ea3/controller/volume_controller.go#L2978-L2992

time="2023-05-31T14:55:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-c2e5f072-1002-4f7c-b300-55b7417a0839\", UID:\"3bb6feee-3fa6-4fae-ade3-9e96b4e9edf3\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"34673568\", FieldPath:\"\"}): type: 'Normal' reason: 'Degraded' volume pvc-c2e5f072-1002-4f7c-b300-55b7417a0839 became degraded"
time="2023-05-31T14:55:30Z" level=debug msg="Created snapshot fe34b167-0d67-41f0-a435-8ca081ba09e6 with labels map[longhorn.io/for-cloning-volume:pvc-c2e5f072-1002-4f7c-b300-55b7417a0839] for volume pvc-e63d887a-0bc8-45d2-b49f-fddf843aaf56"
time="2023-05-31T14:55:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-c2e5f072-1002-4f7c-b300-55b7417a0839\", UID:\"3bb6feee-3fa6-4fae-ade3-9e96b4e9edf3\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"34673568\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-e63d887a-0bc8-45d2-b49f-fddf843aaf56, snapshot fe34b167-0d67-41f0-a435-8ca081ba09e6"
time="2023-05-31T14:55:30Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on engines.longhorn.io \"pvc-c2e5f072-1002-4f7c-b300-55b7417a0839-e-12c64c7c\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-4pngn owner=eweber-v124-worker-1ae51dbb-4pngn state=attached volume=pvc-c2e5f072-1002-4f7c-b300-55b7417a0839
time="2023-05-31T14:55:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-c2e5f072-1002-4f7c-b300-55b7417a0839\", UID:\"3bb6feee-3fa6-4fae-ade3-9e96b4e9edf3\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"34673568\", FieldPath:\"\"}): type: 'Normal' reason: 'Degraded' volume pvc-c2e5f072-1002-4f7c-b300-55b7417a0839 became degraded"
time="2023-05-31T14:55:30Z" level=debug msg="Created snapshot 208cf85c-86b4-4c91-af0f-e5fba1561379 with labels map[longhorn.io/for-cloning-volume:pvc-c2e5f072-1002-4f7c-b300-55b7417a0839] for volume pvc-e63d887a-0bc8-45d2-b49f-fddf843aaf56"
time="2023-05-31T14:55:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-c2e5f072-1002-4f7c-b300-55b7417a0839\", UID:\"3bb6feee-3fa6-4fae-ade3-9e96b4e9edf3\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"34673568\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-e63d887a-0bc8-45d2-b49f-fddf843aaf56, snapshot 208cf85c-86b4-4c91-af0f-e5fba1561379"

The first attempt to update the volume status fails because the volume controller refuses to continue when it fails to update an engine first. Something must have changed since v1.4.x that either causes the volume controller to update an engine (when it previously wouldn't have) or causes the engine controller to update the engine status simultaneously (when it previously wouldn't have).

https://github.com/longhorn/longhorn-manager/blob/2a83384208c88904bba67202c5d55b1d36dc6ea3/controller/volume_controller.go#L419-L438

This regression first appeared on May 14.
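
For illustration, here is a minimal, self-contained Python simulation of the pattern described above. It is only a sketch: the real controller is Go code in volume_controller.go, and all names below are made up for the example.

# Toy simulation (illustrative only; the real controller is Go code in
# longhorn-manager's volume_controller.go).
import uuid

source_snapshots = []            # snapshots that exist on the source volume
clone_status = {"snapshot": ""}  # persisted cloneStatus of the cloned volume


def reconcile(status_update_succeeds: bool) -> None:
    """One pass of a simplified clone reconcile loop (old behavior)."""
    if clone_status["snapshot"]:
        return  # a snapshot is already recorded; nothing to do
    # A brand-new random name is generated on every attempt.
    name = str(uuid.uuid4())
    source_snapshots.append(name)  # the snapshot is created immediately...
    if status_update_succeeds:
        clone_status["snapshot"] = name  # ...but recording it can fail
    # If recording fails (for example, a conflict while updating the engine
    # or volume object), the item is requeued and the next pass starts over.


reconcile(status_update_succeeds=False)  # first attempt: conflict, requeued
reconcile(status_update_succeeds=True)   # second attempt: succeeds
print(len(source_snapshots))             # -> 2: one extra snapshot remains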

@ejweber (Contributor) commented Jun 1, 2023

I was able to reproduce this issue in v1.4.2 as well, though it appears to happen only rarely there. It is the result of a race between the volume controller and the engine controller (as detailed above), and v1.5.0 changes seem to have made it more likely for the volume controller to lose it (though I'm not 100% sure why).

I am submitting a PR that calls SnapshotCreate with a deterministic UUID as the name (instead of letting longhorn-engine generate one itself). I used a deterministic UUID so we can use the same name in a subsequent reconciliation loop if we failed to update the volume or engine with the name of the snapshot we already created.
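
As a rough illustration of the deterministic-name idea, Python's uuid.uuid5 always produces the same UUID for the same inputs, so every retry can compute the same snapshot name and simply check whether that snapshot already exists. This is only a sketch under the assumption that the name is derived from stable identifiers of the cloning volume; the actual derivation is in the linked longhorn-manager PR.

import uuid

def clone_snapshot_name(clone_volume_name: str, clone_volume_uid: str) -> str:
    # Hypothetical derivation: any stable, per-clone seed works, as long as
    # every reconcile attempt computes the same value.
    seed = f"{clone_volume_name}-{clone_volume_uid}"
    return str(uuid.uuid5(uuid.NAMESPACE_OID, seed))

# The same inputs always yield the same name, so a requeued reconcile can
# reuse it instead of creating a second snapshot on the source volume.
a = clone_snapshot_name("cloned-pvc", "3bb6feee-3fa6-4fae-ade3-9e96b4e9edf3")
b = clone_snapshot_name("cloned-pvc", "3bb6feee-3fa6-4fae-ade3-9e96b4e9edf3")
assert a == b

This is consistent with the "Snapshot ... already exists" lines in the master-head log later in this thread, where every requeued attempt reuses one snapshot name.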

@longhorn-io-github-bot commented Jun 1, 2023

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: [BUG] Extra snapshot generated when clone from a detached volume #5986 (comment).

  • Is there a workaround for the issue? If so, where is it documented?

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at Use a deterministic snapshot name when cloning from a volume longhorn-manager#1942.

  • Which areas/issues this PR might have potential impacts on?

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?

  • If labeled: require/manual-test-plan Has the manual test plan been documented?

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?

@weizhe0422 (Contributor) commented Jun 6, 2023

Environment

  • Longhorn version: v1.5.0-rc1
  • Cluster info
    • K8s version: v1.25.9+rke2r1
    • 1 master + 3 workers

Test result

Waiting to confirm the expected behavior with the engineer.

Test steps (refer to #5986 (comment))

Scenario 1: The source PVC is in detached state

  1. Create a new PVC named src-pvc
  2. Create another new PVC whose data source is the src-pvc created in step 1 (AccessMode=RWX, Size=5Gi)
  3. [PASS, 20230606] Verify that the source PVC has only one snapshot

Scenario 2: The source PVC is in attached state

  1. Create a new PVC named src-pvc-2
  2. Attach the volume
  3. Create another new PVC whose data source is the src-pvc-2 created in step 1 (AccessMode=RWX, Size=5Gi)
  4. [WAIT CONFIRM, 20230606] Verify that the source PVC has two snapshots
    • In v1.5.0-rc1, only one snapshot is generated.
    • In v1.4.2, two snapshots are generated.

@weizhe0422 (Contributor) commented Jun 6, 2023

Hi @ejweber, I'm verifying this issue but would like to consult with you on one question, please.

In Scenario 2 (source PVC in attached state), I found that the source PVC has only one snapshot in v1.5.0-rc1 but two snapshots in v1.4.2. I'm guessing this has something to do with using deterministic UUIDs to prevent duplicate snapshots, but since the behavior has changed from previous versions, I wanted to check with you whether the results are as expected.

cc @chriscchien

@ejweber (Contributor) commented Jun 6, 2023

I only focused on the detached scenario (since that was how the bug was filed). Let me check. If we HAVE decreased the snapshot count legitimately, I think we will want to update the assertion in the attached automated test case.

@ejweber (Contributor) commented Jun 6, 2023

In my v1.4.2 reproduction, I actually saw three snapshots created! All were created for the same reason the fix targets: a new snapshot is created after each failed reconcile (this time the failures were due to replica churn).

time="2023-06-06T13:59:21Z" level=debug msg="Created snapshot 7c9ca020-6cb4-4ff6-87e3-81ed261ec7ad with labels map[longhorn.io/for-cloning-volume:pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c] for volume pvc-8693d82b-948e-4b24-860c-d5484ae7991c"
time="2023-06-06T13:59:21Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c\", UID:\"89ba9de9-b491-4e03-9b4d-3c318eddbaac\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38626023\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-8693d82b-948e-4b24-860c-d5484ae7991c, snapshot 7c9ca020-6cb4-4ff6-87e3-81ed261ec7ad"
time="2023-06-06T13:59:22Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c-r-6ac20495\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-ppvzp owner= state= volume=pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c
time="2023-06-06T13:59:22Z" level=debug msg="Created snapshot 5269fbd0-4ed2-4685-9779-31c82a9b764a with labels map[longhorn.io/for-cloning-volume:pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c] for volume pvc-8693d82b-948e-4b24-860c-d5484ae7991c"
time="2023-06-06T13:59:22Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c\", UID:\"89ba9de9-b491-4e03-9b4d-3c318eddbaac\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38626023\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-8693d82b-948e-4b24-860c-d5484ae7991c, snapshot 5269fbd0-4ed2-4685-9779-31c82a9b764a"
time="2023-06-06T13:59:22Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c-r-6ac20495\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-ppvzp owner=eweber-v124-worker-1ae51dbb-ppvzp state= volume=pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c
time="2023-06-06T13:59:22Z" level=debug msg="Created snapshot 454e0b14-8daf-4309-aae8-05d1e848f6e7 with labels map[longhorn.io/for-cloning-volume:pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c] for volume pvc-8693d82b-948e-4b24-860c-d5484ae7991c"
time="2023-06-06T13:59:22Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-a0fc50c5-6819-49da-84c9-a2766b61e88c\", UID:\"89ba9de9-b491-4e03-9b4d-3c318eddbaac\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38626023\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-8693d82b-948e-4b24-860c-d5484ae7991c, snapshot 454e0b14-8daf-4309-aae8-05d1e848f6e7"

After updating to master-head, the logs show that the fix is working.

time="2023-06-06T14:24:15Z" level=debug msg="Created snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f with labels map[longhorn.io/for-cloning-volume:pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75] for volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659"
time="2023-06-06T14:24:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75\", UID:\"c18328bd-b8a8-44a2-aa44-2053c767ddad\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38639992\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659, snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f"
time="2023-06-06T14:24:15Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75-r-9ee4c5a7\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-pbxr9 owner= state= volume=pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75
time="2023-06-06T14:24:15Z" level=debug msg="Snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f with labels map[longhorn.io/for-cloning-volume:pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75] for volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659 already exists"
time="2023-06-06T14:24:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75\", UID:\"c18328bd-b8a8-44a2-aa44-2053c767ddad\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38639992\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659, snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f"
time="2023-06-06T14:24:15Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75-r-9ee4c5a7\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-pbxr9 owner=eweber-v124-worker-1ae51dbb-pbxr9 state= volume=pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75
time="2023-06-06T14:24:15Z" level=debug msg="Snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f with labels map[longhorn.io/for-cloning-volume:pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75] for volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659 already exists"
time="2023-06-06T14:24:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75\", UID:\"c18328bd-b8a8-44a2-aa44-2053c767ddad\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38639992\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659, snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f"
time="2023-06-06T14:24:15Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on engines.longhorn.io \"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75-e-5cb20675\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-pbxr9 owner=eweber-v124-worker-1ae51dbb-pbxr9 state= volume=pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75
time="2023-06-06T14:24:15Z" level=debug msg="Snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f with labels map[longhorn.io/for-cloning-volume:pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75] for volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659 already exists"
time="2023-06-06T14:24:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75\", UID:\"c18328bd-b8a8-44a2-aa44-2053c767ddad\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38639992\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659, snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f"
time="2023-06-06T14:24:15Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on engines.longhorn.io \"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75-e-5cb20675\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=eweber-v124-worker-1ae51dbb-pbxr9 owner=eweber-v124-worker-1ae51dbb-pbxr9 state= volume=pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75
time="2023-06-06T14:24:15Z" level=debug msg="Snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f with labels map[longhorn.io/for-cloning-volume:pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75] for volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659 already exists"
time="2023-06-06T14:24:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-5ed370c5-8821-4cd3-9ef8-f8a05f628d75\", UID:\"c18328bd-b8a8-44a2-aa44-2053c767ddad\", APIVersion:\"longhorn.io/v1beta2\", ResourceVersion:\"38639992\", FieldPath:\"\"}): type: 'Normal' reason: 'VolumeCloneInitiated' source volume pvc-3a6ec3dc-b639-4b7d-af37-7ba0ccfba659, snapshot 38e2b852-80b9-59e3-bf54-2e2688d7886f"

This looks to be working as I would expect now, so no worries. I was curious why this fix didn't cause some test case to fail with a wrong snapshot count (like the failure that opened this bug), but it looks like we only call wait_for_snapshot_count() in test_cloning_with_detached_source_volume. Do you think we should do so in test_cloning_basic as well?
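
If such a check were added to test_cloning_basic, it could presumably mirror the assertion in test_cloning_with_detached_source_volume (quoted further down in this thread). This is only a sketch using the same helpers; the expected count for the attached-source case would need to be agreed on first.

# Sketch only: the expected_count value is hypothetical and depends on the
# agreed snapshot behavior for the attached-source case.
source_volume = client.by_id_volume(source_volume_name)
wait_for_snapshot_count(source_volume, expected_count)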

@ejweber (Contributor) commented Jun 7, 2023

I was able to reproduce this issue in v1.4.2 as well, though it appears to happen only rarely there. It is the result of a race between the volume controller and the engine controller (as detailed above), and v1.5.0 changes seem to have made it more likely for the volume controller to lose it (though I'm not 100% sure why).

This fix is possibly the reason: #3692.

@weizhe0422 (Contributor)

Hi @ejweber, I think I know why the e2e test test_cloning_with_detached_source_volume doesn't fail when checking the number of source volume snapshots (source code).

    # Step-8
    clone_volume = client.by_id_volume(clone_volume_name)
    clone_volume.attach(hostId=lht_host_id)
    wait_for_volume_attached(client, clone_volume_name)
    clone_volume = wait_for_volume_endpoint(client, clone_volume_name)

    # Step-9
    check_volume_data(clone_volume, data)

    # Step-10
    wait_for_volume_healthy(client, clone_volume_name)

    # Step-11
    source_volume = client.by_id_volume(source_volume_name)
    source_volume.attach(hostId=lht_host_id)
    source_volume = wait_for_volume_attached(client, source_volume_name)
    wait_for_snapshot_count(source_volume, 2)

It is because volume.snapshotList() in wait_for_snapshot_count() lists all snapshots plus the volume head, so the e2e test does not fail.

{
    "data": [
      {
        "checksum": "",
        "children": {
          
        },
        "created": "2023-06-07T07:30:25Z",
        "labels": {
          
        },
        "name": "volume-head",
        "parent": "2227e197-7a86-5e05-89fb-41b614b0b98a",
        "removed": False,
        "size": "0",
        "usercreated": False
      },
      {
        "checksum": "",
        "children": {
          "volume-head": True
        },
        "created": "2023-06-07T07:30:25Z",
        "labels": {
          "longhorn.io/for-cloning-volume": "pvc-b33eeb51-495d-42a0-8594-4a238372d0eb"
        },
        "name": "2227e197-7a86-5e05-89fb-41b614b0b98a",
        "parent": "",
        "removed": False,
        "size": "8192",
        "usercreated": True
      }
    ],
    "resourceType": "snapshot"
  }
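
For clarity, a count that excludes the volume head could be expressed roughly as follows. This is an illustrative helper, not part of the test suite, and it assumes entries shaped like the listing above (dict-like objects with a "name" field).

# Illustrative only: count real snapshots, ignoring the implicit volume head.
def count_real_snapshots(snapshot_entries):
    return sum(1 for s in snapshot_entries if s["name"] != "volume-head")

# With the two entries shown above, this returns 1 even though
# volume.snapshotList() returned two items (snapshot + volume-head).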

@weizhe0422 (Contributor)

Based on the test results, this issue has been fixed. Thanks @ejweber for the explanation again. I will close this issue.
