[Flaking Test] [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] subPath should unmount if pod is gracefully deleted while kubelet is down [Disruptive][Slow] #75196

Open
kacole2 opened this Issue Mar 8, 2019 · 11 comments

Comments

@kacole2
Member

kacole2 commented Mar 8, 2019

Which jobs are failing:
ci-kubernetes-e2e-gce-master-new-downgrade-cluster sig-release-master-upgrade#gce-master-new-downgrade-cluster

Which test(s) are failing:
[sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] subPath should unmount if pod is gracefully deleted while kubelet is down [Disruptive][Slow]

link to prow

Since when has it been failing:
2019-02-22 @ 20:43

Testgrid link:
sig-release-master-upgrade#gce-master-new-downgrade-cluster

Reason for failure:

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/testsuites/subpath.go:313
Expected grep stdout to be empty (i.e. no mount found).
Expected
    <string>: overlay on /var/lib/kubelet/pods/6f95bde9-415c-11e9-b025-42010a8a0009/volumes/kubernetes.io~csi/pvc-6d0cf06d-415c-11e9-b025-42010a8a0009/mount type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/TR7RSLOTG32K74YLSFHOGEF7BR:/var/lib/docker/overlay2/l/4UZHXW7X6IT7U6HHVNMMW6IPU4:/var/lib/docker/overlay2/l/3OJLNS5IZVWKQQJFPKON64H6EI,upperdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/diff,workdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/work)
    overlay on /var/lib/kubelet/pods/6f95bde9-415c-11e9-b025-42010a8a0009/volumes/kubernetes.io~csi/pvc-6d0cf06d-415c-11e9-b025-42010a8a0009/mount type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/TR7RSLOTG32K74YLSFHOGEF7BR:/var/lib/docker/overlay2/l/4UZHXW7X6IT7U6HHVNMMW6IPU4:/var/lib/docker/overlay2/l/3OJLNS5IZVWKQQJFPKON64H6EI,upperdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/diff,workdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/work)
    overlay on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/6f95bde9-415c-11e9-b025-42010a8a0009/volumes/kubernetes.io~csi/pvc-6d0cf06d-415c-11e9-b025-42010a8a0009/mount type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/TR7RSLOTG32K74YLSFHOGEF7BR:/var/lib/docker/overlay2/l/4UZHXW7X6IT7U6HHVNMMW6IPU4:/var/lib/docker/overlay2/l/3OJLNS5IZVWKQQJFPKON64H6EI,upperdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/diff,workdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/work)
    overlay on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/6f95bde9-415c-11e9-b025-42010a8a0009/volumes/kubernetes.io~csi/pvc-6d0cf06d-415c-11e9-b025-42010a8a0009/mount type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/TR7RSLOTG32K74YLSFHOGEF7BR:/var/lib/docker/overlay2/l/4UZHXW7X6IT7U6HHVNMMW6IPU4:/var/lib/docker/overlay2/l/3OJLNS5IZVWKQQJFPKON64H6EI,upperdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/diff,workdir=/var/lib/docker/overlay2/23e122e3c0538f7c7f7fd17baa280334cffbd86f6198538887778f1d712cb313/work)
    
to be empty
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/utils/utils.go:255
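
The assertion above fails because a grep over the node's mount output still returns lines for the deleted pod. As a rough illustration of that kind of check (a sketch only, not the actual e2e code in utils.go), the Python below filters mount-table text for a pod UID. The function name `find_pod_mounts` is hypothetical, matching on the pod UID is an assumption about what the grep looks for, and the sample mount text is abbreviated from the output above with the mount options shortened.

```python
from typing import List

# Pod UID taken from the failure output in this issue.
POD_UID = "6f95bde9-415c-11e9-b025-42010a8a0009"

# Abbreviated, illustrative mount-table text (mount options shortened).
SAMPLE_MOUNTS = "\n".join([
    "tmpfs on /run type tmpfs (rw,nosuid)",
    "overlay on /var/lib/kubelet/pods/" + POD_UID
    + "/volumes/kubernetes.io~csi/pvc-6d0cf06d-415c-11e9-b025-42010a8a0009/mount"
    + " type overlay (rw,relatime)",
])


def find_pod_mounts(mount_output: str, pod_uid: str) -> List[str]:
    """Return the mount-table lines that mention the given pod UID (hypothetical helper)."""
    return [line for line in mount_output.splitlines() if pod_uid in line]


# A non-empty result means a mount for the pod is still present.
leftover = find_pod_mounts(SAMPLE_MOUNTS, POD_UID)
print(len(leftover))
```

An empty list corresponds to the passing case ("no mount found"); any remaining line means the volume was not unmounted.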

Anything else we need to know:
/milestone v1.14
/sig storage
/kind flake
/priority important-soon

@soggiest

Contributor

soggiest commented Mar 8, 2019

Hello! We are in code freeze for 1.14. If this issue is release blocking, can we get a review on it? If it is not release blocking, can we move it to 1.15?

@kubernetes/sig-storage-misc

@msau42

Member

msau42 commented Mar 9, 2019

We have not triaged the "gracefully deleted" test case, only the "force deleted" test case. cc @cofyc @jingxu97: can you tell whether this is the same issue?

@msau42

Member

msau42 commented Mar 9, 2019

@mariantalla

Contributor

mariantalla commented Mar 11, 2019

Hey @msau42 / @kubernetes/sig-storage-test-failures, any updates on this?

For more info: it's also had consecutive failed runs in https://testgrid.k8s.io/sig-release-1.14-all#gce-1.14-1.13-downgrade-cluster.

@msau42

Member

msau42 commented Mar 11, 2019

Will investigate today.

@cofyc


cofyc commented Mar 12, 2019

/assign

@cofyc


cofyc commented Mar 13, 2019

The test [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] subPath should unmount if pod is gracefully deleted while kubelet is down [Disruptive][Slow] has not failed in recent runs. It was probably fixed by PR #74652, which fixed a reconstruction issue with CSI volumes.
I will keep watching in case it starts failing again.

@msau42

Member

msau42 commented Mar 13, 2019

The downgrade test is running against a 1.13 cluster. We will need to backport the fix to 1.13 if we want to fix these flakes.

@mariantalla mariantalla added this to Under investigation (prioritized) in 1.15 CI Signal Mar 14, 2019

@mariantalla mariantalla moved this from Under investigation (prioritized) to Open PR-wait for >5 successes before "Resolved" in 1.15 CI Signal Mar 14, 2019

@mariantalla

Contributor

mariantalla commented Mar 14, 2019

/remove-kind failing-test

@spiffxp

Member

spiffxp commented Mar 16, 2019

Kicking this back to under investigation. Flakes appear to have reduced slightly, maybe, but it's difficult to tell:
https://storage.googleapis.com/k8s-gubernator/triage/index.html?text=grep%20stdout&job=gce&test=subPath
[Image: Screen Shot 2019-03-16 at 12 33 06 PM]

@spiffxp spiffxp moved this from Open PR-wait for >5 successes before "Resolved" to Under investigation (prioritized) in 1.15 CI Signal Mar 16, 2019

@msau42

Member

msau42 commented Mar 16, 2019

The fix is waiting on a cherry-pick to 1.13 in order to fix the downgrade job flakes.

There are 3 different bugs for this test case, all still open only because they are waiting on the 1.13 cherry-pick. Can we consolidate them?

@mariantalla mariantalla moved this from Under investigation (prioritized) to Resolved flakes (observe closed for a week before "Resolved") in 1.15 CI Signal Mar 19, 2019
