Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: support long CSI driver names #86000

Merged

Conversation

timoreimann
Copy link
Contributor

@timoreimann timoreimann commented Dec 6, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

The storage e2e test suite uses given CSI driver names as pod names. For pod names that also get enriched by a prefix and suffix, it is very easy to exceed the 63 character limit that pod names are subject to, thereby causing tests to fail. This is already the case for moderately sized CSI driver names like dobs.csi.digitalocean.com that we employ for DigitalOcean's CSI driver: about ~15 tests were failing on us due to this problem.

This change fixes the described problem by omitting the driver name from the pod name suffix.

It also allows us to drop VolumeResource.VolType.

This change runs CSI driver names through a shortening function that reduces each domain name label of the driver name to its first letter. For all practical purposes, this should prevent pod name character length violations. (Original approach that was changed along the review.)

Special notes for your reviewer:

I discussed the matter with @msau42 a couple of days ago on Slack. We concluded that truncating or short name usage would make sense.

I did not implement either of the two directly because of the following reasons:

  • Truncating means removing/shortening either the prefix or the (randomly generated) suffix of a pod name, with the implications that pods are harder to associate to tests or could cause collisions with other pod names, respectively. Neither seemed ideal.
  • The short name is currently not part of the interface that is passed to individual tests. Unless we were willing to do some rather nasty type assertions, we would have to extend the interface used to retrieve test driver details from. This seemed like a rather invasive change.
  • Finally, the proposed solution of shortening by domain name labels is effective while still allowing to identify involved CSI drivers (with a bit of a squint).

Does this PR introduce a user-facing change?:

-->

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 6, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @timoreimann. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@timoreimann
Copy link
Contributor Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 6, 2019
@timoreimann
Copy link
Contributor Author

/assign @msau42

@msau42
Copy link
Member

msau42 commented Dec 6, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 6, 2019
@timoreimann
Copy link
Contributor Author

/retest

@@ -199,7 +199,7 @@ func CreateVolumeResource(driver TestDriver, config *PerTestConfig, pattern test
framework.Logf("Creating resource for inline volume")
if iDriver, ok := driver.(InlineVolumeTestDriver); ok {
r.VolSource = iDriver.GetVolumeSource(false, pattern.FsType, r.Volume)
r.VolType = dInfo.Name
r.VolType = shortenCSIDriverName(dInfo.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that dInfo.Name comes from the user-provided external config file, and this doesn't actually have to equal the full csi driver name, it just needs to be a descriptive name. Are you seeing otherwise?

If it doesn't actually have to match the driver name, can we just document the name length limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it has to be the full CSI driver name: The name is first declared as the provisioner and then fed into the StorageClass value we create.

I also just ran a quick test and selected a different name, but that makes tests hang/fail with

waiting for a volume to be created, either by external provisioner "foobar-driver" or manually created by system administrator

I think the tests need some way to reference the provisioner, and at least with the name-based approach the name has to match that.

Please let me know if I misunderstood.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for pointing that out. I'm actually trying to search for which tests are using r.VolType, and I only found the one here:

suffix := generateSuffixForPodName(volumeType)

I'm actually wondering if we can remove this r.VolType altogether, and use a different naming scheme for that test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another occurrence in

suffix = generateSuffixForPodName(volumeType)

However, it's not just one or two tests that are affected due to the dynamic nature of how storage tests are generated. Below is an example output of our CSI tests running without the patch this PR proposes, showing 16 failures seemingly all of them failing due to the length problem:

Summarizing 16 Failures:

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should fail if subpath file is outside the volume [Slow][LinuxOnly]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support creating multiple subpath from same volumes [Slow]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support restarting containers using directory as subpath [Slow]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should fail if subpath directory is outside the volume [Slow]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support file as subpath [LinuxOnly]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support existing single file [LinuxOnly]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support existing directories when readOnly specified in the volumeSource

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support readOnly file specified in the volumeMount [LinuxOnly]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support restarting containers using file as subpath [Slow][LinuxOnly]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should be able to unmount after the subpath directory is deleted

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support non-existent path

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support existing directory

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should fail if subpath with backstepping is outside the volume [Slow]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should support readOnly directory specified in the volumeMount

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should fail if non-existent subpath is outside the volume [Slow][LinuxOnly]

[Fail] External Storage [Driver: dobs.csi.digitalocean.com-dev] [Testpattern: Dynamic PV (default fs)] subPath [It] should verify container cannot write to subpath readonly volumes [Slow]

Ran 27 of 5062 Specs in 394.036 seconds
FAIL! -- 11 Passed | 16 Failed | 0 Pending | 5035 Skipped

This is only to show the extent of the tests that are affected; it's not to say we possibly can't address this problem by fixing the two occurrences of the generateSuffixForPodName function call (or its implementation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tracked git history down where the volume type was first introduced for the pod name suffix: it looks like the addition of the e2e subpath tests brought it in.

@jsafrane I see you were the author back then. Maybe you have context around whether the volume type is strictly necessary to be part of the suffix name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ease the discussion, I just pushed a commit that omits the volume type from the pod name suffix. The tests still ran successfully for my CSI driver with that change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you have context around whether the volume type is strictly necessary to be part of the suffix name.

No, I just wanted something that's easy to read in e2e logs. The pod names can be virtually anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like the 2nd commit better :-) Let's also just remove r.VolType altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed r.VolType and made some necessary adjustments. (Parts of subpath.go were still referencing it.)

I tried to reconcile both desires to have pod names readable in logs and long driver names not break tests by only including the actual volume type (i.e., excluding the driver name) in pod name suffices. I feel that's a good compromise between the two goals.

Please let me know what you think.

@timoreimann
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 12, 2019
Copy link
Member

@msau42 msau42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one minor comment. Thanks for looking into this!
/approve

@@ -348,9 +350,9 @@ func (s *subPathTestSuite) DefineTests(driver TestDriver, pattern testpatterns.T
init()
defer cleanup()

if strings.HasPrefix(l.resource.VolType, "hostPath") || strings.HasPrefix(l.resource.VolType, "csi-hostpath") {
if strings.HasPrefix(driverName, "hostPath") || strings.HasPrefix(driverName, "csi-hostpath") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "csi-hostpath" check can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@msau42
Copy link
Member

msau42 commented Dec 13, 2019

Also, can you squash your commits?

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 13, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, timoreimann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2019
The storage e2e test suite uses given CSI driver names as pod names. For
pod names that also get enriched by a prefix and suffix, it is very easy
to exceed the 63 character limit that pod names are subject to, thereby
causing tests to fail.

This change fixes the described problem by omitting the driver name from
the pod name suffix.

It also allows us to drop VolumeResource.VolType.
@timoreimann timoreimann force-pushed the e2e-support-long-csi-driver-names branch from 5d30ab4 to a70bec4 Compare December 13, 2019 21:54
@timoreimann
Copy link
Contributor Author

Commits squashed.

Thanks for the review, @msau42.

@msau42
Copy link
Member

msau42 commented Dec 13, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2019
@timoreimann
Copy link
Contributor Author

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@timoreimann
Copy link
Contributor Author

/retest

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 5ead497 into kubernetes:master Dec 15, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 15, 2019
@timoreimann timoreimann deleted the e2e-support-long-csi-driver-names branch December 15, 2019 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants