E2E storage: more tests for different pod/node combinations #72002

Merged
8 commits merged into kubernetes:master from pohly:storage-volume-testsuites-concurrency on Feb 12, 2019

Conversation

@pohly
Contributor

pohly commented Dec 12, 2018

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

There was no test for running multiple pods at the same time on the same node, sharing the same volume. Such a test is relevant for CSI drivers, which need to implement NodeStageVolume for this kind of usage to work.

Also not covered was accessing data on one node after writing it on another.
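To illustrate the scenario, here is a minimal sketch of the kind of pod such a test creates twice (the image, mount path, and name prefix are placeholders, not necessarily what the test code in this PR uses):

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podUsingClaim builds a pod that mounts the given PVC on the given node.
// Creating two such pods with the same claimName and nodeName exercises
// "multiple pods, same volume, same node".
func podUsingClaim(nodeName, claimName string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "pvc-volume-tester-"},
		Spec: v1.PodSpec{
			NodeName: nodeName, // pin the pod onto one specific node
			Containers: []v1.Container{{
				Name:    "volume-tester",
				Image:   "busybox",
				Command: []string{"sleep", "3600"},
				VolumeMounts: []v1.VolumeMount{{
					Name:      "data",
					MountPath: "/mnt/test",
				}},
			}},
			Volumes: []v1.Volume{{
				Name: "data",
				VolumeSource: v1.VolumeSource{
					PersistentVolumeClaim: &v1.PersistentVolumeClaimVolumeSource{
						ClaimName: claimName,
					},
				},
			}},
		},
	}
}
```

Creating two of these pods with the same claimName and nodeName means the CSI driver gets a single NodeStageVolume call for the node but a separate NodePublishVolume call per pod, which is exactly the code path that was previously untested.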

Special notes for your reviewer:

In order to implement the new test cleanly, the existing test infrastructure and other tests get refactored first.

Does this PR introduce a user-facing change?:

NONE

/sig storage
/cc @msau42

@pohly

Contributor Author

pohly commented Dec 12, 2018

I am not sure whether it makes sense to enable this test for all test patterns. A full matrix (all tests for all configurations) of course reduces the chance of missing some situation where the test fails, but it also increases the overall test runtime.

For now I am following the "testsuites" approach where everything is enabled for all test patterns.

@pohly

Contributor Author

pohly commented Dec 13, 2018

/hold

Waiting for other PRs to get merged first.

@pohly pohly force-pushed the pohly:storage-volume-testsuites-concurrency branch 2 times, most recently from 0f8ce56 to e719947 Dec 17, 2018

@pohly

Contributor Author

pohly commented Dec 19, 2018

/retest

@pohly pohly force-pushed the pohly:storage-volume-testsuites-concurrency branch 4 times, most recently from 68f9671 to 376a633 Dec 20, 2018

@pohly

Contributor Author

pohly commented Dec 21, 2018

@msau42 I ended up using the approach where the provisioning test determines which nodes it has first, and then picks nodes by name. I think it is simpler this way and it automatically handles the case where the node selector only matches a single node.
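In code, the idea looks roughly like this sketch (the helper name and skip message are invented; the real test also restricts itself to ready, schedulable nodes, and the client-go calls use the non-context signatures of this release):

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/kubernetes/test/e2e/framework"
)

// pickTwoNodes lists the cluster's nodes and returns two distinct node names
// so that a writer pod and a reader pod can be pinned by name. When the node
// selector matches only a single node, the list has one entry and the
// cross-node variant can simply be skipped.
func pickTwoNodes(client clientset.Interface) (writer, reader string) {
	nodes, err := client.CoreV1().Nodes().List(metav1.ListOptions{})
	framework.ExpectNoError(err, "list nodes")
	if len(nodes.Items) < 2 {
		framework.Skipf("need at least 2 nodes, have %d", len(nodes.Items))
	}
	return nodes.Items[0].Name, nodes.Items[1].Name
}
```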

@pohly pohly changed the title E2E storage: concurrent pods E2E storage: more tests for different pod/node combinations Dec 21, 2018

@pohly

Contributor Author

pohly commented Dec 21, 2018

/test pull-kubernetes-e2e-kops-aws

@pohly pohly force-pushed the pohly:storage-volume-testsuites-concurrency branch from 376a633 to efd2dff Jan 4, 2019

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/XXL labels Jan 4, 2019

@pohly pohly force-pushed the pohly:storage-volume-testsuites-concurrency branch from efd2dff to 3033bf6 Jan 4, 2019

@pohly

Contributor Author

pohly commented Jan 4, 2019

/test pull-kubernetes-e2e-kops-aws

1 similar comment
@pohly

Contributor Author

pohly commented Jan 4, 2019

/test pull-kubernetes-e2e-kops-aws

@pohly

Contributor Author

pohly commented Feb 9, 2019

/test pull-kubernetes-e2e-gce-alpha-features

@pohly

Contributor Author

pohly commented Feb 9, 2019

/test pull-kubernetes-e2e-gce-csi-serial

@pohly

Contributor Author

pohly commented Feb 9, 2019

/hold

I need to check why [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic Snapshot] snapshottable should create snapshot with defaults [Feature:VolumeSnapshotDataSource] isn't getting skipped.

pohly added some commits Feb 9, 2019

e2e: fix snapshot skip test
Even if snapshots are supported by the driver interface, the driver or
suite might still want to skip a particular test, so those checks
still need to be executed.
e2e/storage: remove unnecessary empty string checks
There is no need to check for empty strings; we can directly initialize
structs with the value. The end result is the same when the value is
empty (an empty string in the struct). See the first sketch after this
list of commits.
e2e/storage: use different names for test pods
When the provisioning test gets stuck, the log fills up with messages
about waiting for a certain pod to run. Now the pod names are
pvc-[volume-tester|snapshot]-[writer|reader] plus the random
number appended by Kubernetes. This makes it easier to see where the
test is stuck.
e2e/storage: improve PV checking
TestDynamicProvisioning had multiple ways of choosing additional
checks:
- the PvCheck callback
- the builtin write/read check controlled by a boolean
- the snapshot testing

Complicating matters further, that builtin write/read test had been made
more customizable with the new fields `NodeSelector` and
`ExpectUnschedulable`, which were only set by one particular test (see
#70941).

That is confusing and will only get more confusing when adding more
checks in the future. Therefore the write/read check is now a separate
function that must be enabled explicitly by tests that want to run it.
The snapshot checking is also defined only for the snapshot test.

The test that expects unschedulable pods now also checks for that
particular situation itself. Instead of testing it with two pods (the
behavior from the write/read check) that both fail to start, only a
single unschedulable pod is created.

Because node name, node selector and the `ExpectUnschedulable` were
only used for checking, it is possible to simplify `StorageClassTest`
by removing all of these fields.

Expect(err).NotTo(HaveOccurred()) is an anti-pattern in Ginkgo testing
because a test failure doesn't explain what failed (see
#34059). We avoid it now by making the check function itself responsible
for checking errors and by including more information in those checks;
see the second sketch after this list of commits.
e2e: refine snapshot test
This addresses the two remaining change requests from
#69036:
- replace "csi-hostpath-v0" name check with capability
  check (cleaner that way)
- add feature tag to "should create snapshot with defaults" because
  that is an alpha feature

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
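Two of the commits above lend themselves to short illustrations. For the "remove unnecessary empty string checks" commit, a generic before/after sketch of the simplification (the Config type and its field are placeholders, not the actual e2e types):

```go
package example

// Config stands in for a test configuration struct; the field name is made up.
type Config struct {
	NodeName string
}

// before: guard against the empty string before assigning.
func before(nodeName string) Config {
	cfg := Config{}
	if nodeName != "" {
		cfg.NodeName = nodeName
	}
	return cfg
}

// after: initialize the struct directly. When nodeName is "", the field ends
// up as the empty string either way, so the behavior is unchanged.
func after(nodeName string) Config {
	return Config{NodeName: nodeName}
}
```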
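For the error-checking point in the "improve PV checking" commit, a sketch of the difference (framework.ExpectNoError is the real e2e helper; the surrounding function and message are illustrative):

```go
package example

import (
	. "github.com/onsi/gomega"

	"k8s.io/kubernetes/test/e2e/framework"
)

func checkClaim(err error, claimName string) {
	// Anti-pattern: on failure Ginkgo only reports that an unexpected error
	// occurred, with no hint about what the test was doing at the time.
	Expect(err).NotTo(HaveOccurred())

	// Preferred: the check itself supplies context, so the failure message
	// says which claim and which step went wrong.
	framework.ExpectNoError(err, "checking provisioned claim %q", claimName)
}
```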

@pohly pohly force-pushed the pohly:storage-volume-testsuites-concurrency branch from ffc483a to f9a4124 Feb 9, 2019

@pohly

Contributor Author

pohly commented Feb 9, 2019

Found it. Introducing the snapshot short-circuited some checks in skipUnsupportedTests. See f8536e8.

/hold cancel

@pohly

Contributor Author

pohly commented Feb 9, 2019

/test pull-kubernetes-e2e-gce-alpha-features

@pohly

Contributor Author

pohly commented Feb 10, 2019

@msau42 all tests are clean now in this PR and the right tests ran:

Passed:
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource]
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic Snapshot] snapshottable should create snapshot with defaults [Feature:VolumeSnapshotDataSource]

Skipped:
...
[sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource]
...

@dims

Member

dims commented Feb 10, 2019

/assign @msau42

@msau42
Member

msau42 left a comment

/approve

Just one nit from me.

I was originally concerned that we would be losing test coverage by changing the RWCheck to only launch pods on a single node, but all the volume drivers tested in volume_provisioning.go should be covered in drivers/in_tree.go.

I think it's also ok for the regional_pd.go tests that use RWCheck to only check a single node because we will get multi-node coverage through the failover test cases.

// persistent across pods.
//
// This is a common test that can be called from a StorageClassTest.PvCheck.
func PVWriteReadCheck(client clientset.Interface, claim *v1.PersistentVolumeClaim, volume *v1.PersistentVolume, node NodeSelection) {


@msau42

msau42 Feb 12, 2019

Member

Can you rename this to PVWriteReadSingleNodeCheck to be more obvious about what the function is doing?


@pohly

pohly Feb 12, 2019

Author Contributor

Done.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 12, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly added some commits Dec 12, 2018

e2e/storage: test usage of volume in multiple pods at once
This is a special case that both kubelet and the volume driver should
support, because users might expect it. One Kubernetes mechanism to
deploy pods like this is via pod affinity (see the first sketch after
these commits).

However, strictly speaking the CSI spec does not allow this usage
mode (see container-storage-interface/spec#150) and
there is an on-going debate to enable it (see
container-storage-interface/spec#178). Therefore
this test gets skipped unless explicitly enabled for a driver.

CSI drivers which create a block device for a remote volume in
NodePublishVolume fail this test. They have to make the volume
available in NodeStageVolume and then in NodePublishVolume merely do a
bind mount (as for example in
https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L150);
see the second sketch after these commits.
e2e/storage: test provisioned volume on multiple nodes
Whether the read test after writing was done on the same node was
random for drivers that weren't locked onto a single node. Now it is
deterministic: it always happens on the same node.

The case with reading on another node is covered separately for test
configurations that support it (not locked onto a single node, more
than one node in the test cluster).

As before, the TestConfig.ClientNodeSelector is ignored by the
provisioning testsuite.
e2e/storage: enable concurrent writes for gcepd
The driver should support multiple pods using the same volume on the
same node.
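As with the earlier commits, two sketches may help. First, the pod affinity mechanism mentioned above (label key and value are placeholders; a real deployment would carry this in its pod template):

```go
package example

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sameNodeAffinity forces a pod onto the same node as pods carrying the given
// label. Users deploying pods this way can end up with several pods sharing
// one volume on one node, which is exactly the case the new test covers.
func sameNodeAffinity(labelKey, labelValue string) *v1.Affinity {
	return &v1.Affinity{
		PodAffinity: &v1.PodAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{labelKey: labelValue},
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}
```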
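Second, the staging pattern the commit expects from CSI drivers, sketched with the CSI Go bindings (the driver type, the mountDevice helper, and the omitted error handling are placeholders; a real driver does considerably more work):

```go
package example

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"golang.org/x/sys/unix"
)

// driver stands in for a real CSI node plugin.
type driver struct{}

// mountDevice is a placeholder for attaching, formatting, and mounting the
// backing device at the staging path.
func (d *driver) mountDevice(volumeID, stagingPath string) error { return nil }

// NodeStageVolume makes the volume available once per node at the staging path.
func (d *driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
	if err := d.mountDevice(req.GetVolumeId(), req.GetStagingTargetPath()); err != nil {
		return nil, err
	}
	return &csi.NodeStageVolumeResponse{}, nil
}

// NodePublishVolume then only bind mounts the staged path into each pod's
// target directory, so any number of pods on the node can share the volume.
func (d *driver) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	if err := unix.Mount(req.GetStagingTargetPath(), req.GetTargetPath(), "", unix.MS_BIND, ""); err != nil {
		return nil, err
	}
	return &csi.NodePublishVolumeResponse{}, nil
}
```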

@pohly pohly force-pushed the pohly:storage-volume-testsuites-concurrency branch from f9a4124 to ecc0c4e Feb 12, 2019

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels Feb 12, 2019

@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 12, 2019

@pohly: The following test failed, say /retest to rerun them all:

Failed test: pull-kubernetes-e2e-gce-csi-serial (commit ffc483a). Rerun command: /test pull-kubernetes-e2e-gce-csi-serial

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@pohly

Contributor Author

pohly commented Feb 12, 2019

/retest

@msau42

Member

msau42 commented Feb 12, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 12, 2019

@k8s-ci-robot k8s-ci-robot merged commit f968499 into kubernetes:master Feb 12, 2019

17 checks passed

- cla/linuxfoundation: pohly authorized
- pull-kubernetes-bazel-build: Job succeeded.
- pull-kubernetes-bazel-test: Job succeeded.
- pull-kubernetes-cross: Skipped
- pull-kubernetes-e2e-gce: Job succeeded.
- pull-kubernetes-e2e-gce-100-performance: Job succeeded.
- pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
- pull-kubernetes-godeps: Skipped
- pull-kubernetes-integration: Job succeeded.
- pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
- pull-kubernetes-local-e2e: Skipped
- pull-kubernetes-local-e2e-containerized: Skipped
- pull-kubernetes-node-e2e: Job succeeded.
- pull-kubernetes-typecheck: Job succeeded.
- pull-kubernetes-verify: Job succeeded.
- pull-publishing-bot-validate: Skipped
- tide: In merge pool.