Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI E2E: retry csi-pod creation #68822

Merged
merged 1 commit into from
Oct 11, 2018
Merged

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 19, 2018

Normally the pod would get created via a DaemonSet controller, but
during testing it is easier to create it directly. We just need to
ignore errors (like 'No API token found for service account
"csi-service-account"') and retry for a while. If the error persists,
the error will still abort and report it eventually.

What this PR does / why we need it:

The CSI E2E can fail randomly due to a race condition.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #68776

Release note:

NONE

/sig storage

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2018
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Sep 19, 2018
var err error
ret, err = podClient.Create(pod)
return err
}).ShouldNot(HaveOccurred(), "Failed to create %q pod: %v", pod.GetName())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra %v, needs err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to remove the "%v". Gomega itself will dump the error. But well-spotted, thanks.

Further testing also showed that the default Expect() timeouts were too short. I've bumped that to 1m in the revised commit.

@nikhita
Copy link
Member

nikhita commented Sep 19, 2018

/kind bug
/area test
/ok-to-test

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 19, 2018
@pohly pohly force-pushed the csi-pod-creation branch 2 times, most recently from 0991e4f to b30f556 Compare September 19, 2018 14:58
@pohly
Copy link
Contributor Author

pohly commented Sep 19, 2018

/retest

// We could use a DaemonSet, but then the name of the csi-pod changes
// during each test run. It's simpler to just try for a while here.
var ret *v1.Pod
Eventually(func() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth adding a CreatePodWithRetry utility function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that may be useful. I'll do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I ended up adding CreateEventually to pod.go's PodClient, which looked like the right place because it already had a Create (without retry).

@pohly pohly changed the title CSI E2E: retry csi-pod creation WIP: CSI E2E: retry csi-pod creation Sep 20, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 20, 2018
Normally the pod would get created via a DaemonSet controller, but
during testing it is easier to create it directly. We just need to
ignore errors (like 'No API token found for service account
"csi-service-account"') and retry for a while. If the error persists,
the error will still abort and report it eventually.

This problem also occurs elsewhere, so an utility function in the
framework for it seems justified.

Fixes: kubernetes#68776
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 21, 2018
@pohly pohly changed the title WIP: CSI E2E: retry csi-pod creation CSI E2E: retry csi-pod creation Sep 21, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 21, 2018
@pohly
Copy link
Contributor Author

pohly commented Sep 21, 2018

/retest

@msau42
Copy link
Member

msau42 commented Sep 24, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2018
@pohly
Copy link
Contributor Author

pohly commented Sep 27, 2018

/assign saad-ali

@saad-ali
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2018
@k8s-ci-robot k8s-ci-robot merged commit 632df63 into kubernetes:master Oct 11, 2018
avalluri pushed a commit to intel/oim that referenced this pull request Jan 30, 2019
As discussed
upstream (kubernetes/kubernetes#67882),
the 'No API token found for service account "csi-service-account"'
error is normal and must be handled by trying to create the pod
multiple times, either manually or via a DaemonSet. Here we simply
loop with Gomega (same fix that is also proposed upstream in
kubernetes/kubernetes#68822).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

E2E CSI storage: deploy CSI pod via DaemonSet
5 participants