Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e/storage: delay CreateDriver, simplify APIs and test definition #72434

Merged

Conversation

@pohly
Copy link
Contributor

commented Dec 29, 2018

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

CreateDriver is a potentially expensive operation, depending on the
driver. Creating and tearing down a framework instance also takes
time (measured at 6 seconds on a fast machine) and produces quite a
bit of log output.

Both can be avoided for tests that skipped based on static
information (like for instance the current OS, vendor, driver and test
pattern) by making the test suite responsible for creating framework
and driver.

The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.

To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
though two additional structs (TestResource and TestInput).

Because we are changing the API already, also other changes are made:

  • some function prototypes get simplified
  • the naming of functions is changed to match their purpose
    (tests aren't executed by the test suite, they only get defined
    for later execution)
  • unused methods get removed (TestSuite.skipUnsupportedTest is redundant)

Which issue(s) this PR fixes:
Fixes #72288

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

e2e storage tests run faster and are easier to read
@msau42

This comment has been minimized.

Copy link
Member

commented Jan 2, 2019

/assign @verult
cc @mkimuram

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

/cc @mkimuram

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2019

@pohly: GitHub didn't allow me to request PR reviews from the following users: mkimuram.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @mkimuram

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

This WIP because only the provisioning testsuite has been updated. To get the code compiled, the rest of the testsuites were removed.

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

I'd like to get feedback on this approach and a decision on #70992 before continuing with this PR.

@pohly pohly force-pushed the pohly:storage-volume-testsuites-config branch 2 times, most recently from 1d427cf to c764180 Jan 4, 2019

@mkimuram

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2019

@pohly

I understand that 12th commit is the core part of this PR, 1st-5th commits are from #70992, and 6th-11th commit are doing other minor fixes or improvements than separating config from driver.

I think that it would be easier to be reviewed by separating 6th-11th from this PR if they aren't prerequisites for this PR.
(10th commit might be better to be merged as part of #70992. 6th,7th, 9th, and 11th commits won't have dependency to #70992 and this PR, so it could be a separate PR, which applies change to master branch. 8th commit will have a dependency to #70992, but it would be easy to resolve merge conflict, so it would be better to make it a change to master branch. In addition, 8th commit seems adding new test, so this should be a separate PR. )

As for 12th commit, I agree to separate config from driver.

Can this refactoring be easily applied to other testsuites?
For provisioning testsuite, it only needs to handle dynamic provisioning. However, for most of other tests, they needs to also handle pre-provisioned and inline volume. I wonder how common logics for creating such resources could be implemented to each test suites by this way.
It would be helpful if you could commit some of the sample implementations for other testusites to see how it looks like, first. (Like two tests from subpath and one test from volumes.)

If there are easy way to achieve it, I agree to change this way, for it looks easier to read and understand.

@davidz627

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2019

/cc

@k8s-ci-robot k8s-ci-robot requested a review from davidz627 Jan 4, 2019

@davidz627
Copy link
Contributor

left a comment

just a quick first pass. Thanks for working on this!

Show resolved Hide resolved test/e2e/framework/specs.go Outdated
Show resolved Hide resolved test/e2e/storage/csi_volumes.go Outdated
Show resolved Hide resolved test/e2e/storage/drivers/csi.go Outdated
Show resolved Hide resolved test/e2e/storage/testsuites/provisioning.go Outdated
Show resolved Hide resolved test/e2e/storage/testsuites/provisioning.go Outdated
Show resolved Hide resolved test/e2e/storage/testsuites/provisioning.go Outdated
Show resolved Hide resolved test/e2e/storage/testsuites/provisioning.go Outdated
Show resolved Hide resolved test/e2e/storage/testsuites/testdriver.go Outdated
Show resolved Hide resolved test/e2e/storage/testsuites/testdriver.go Outdated
@@ -59,7 +59,7 @@ type TestVolume interface {
type PreprovisionedVolumeTestDriver interface {
TestDriver
// CreateVolume creates a pre-provisioned volume of the desired volume type.
CreateVolume(volumeType testpatterns.TestVolType) TestVolume
CreateVolume(config *TestConfig, volumeType testpatterns.TestVolType) TestVolume

This comment has been minimized.

Copy link
@wongma7

wongma7 Jan 7, 2019

Contributor

same as with CreateDriver, I think it would be clearer to return a TestConfig if CreateVolume may modify the config

This comment has been minimized.

Copy link
@pohly

pohly Jan 8, 2019

Author Contributor

So you meant it should take a TestConfig and return a potentially different one?

This comment has been minimized.

Copy link
@wongma7

wongma7 Jan 8, 2019

Contributor

yes

This comment has been minimized.

Copy link
@pohly

pohly Jan 10, 2019

Author Contributor

Not done yet.

@pohly pohly force-pushed the pohly:storage-volume-testsuites-config branch from c764180 to 4410815 Jan 10, 2019

@pohly pohly changed the title WIP: e2e/storage: delay CreateDriver, simplify APIs and test definition e2e/storage: delay CreateDriver, simplify APIs and test definition Feb 14, 2019

@wongma7
Copy link
Contributor

left a comment

I just had some inconsequential suggestions.
driver & driver interface changes lgtm.
testsuite changes largely lgtm, may need to do another pass. Agree with removing skipUnsupportedTest from testsuites, only snapshots implemented it

// Now do the more expensive test initialization.
config, testCleanup = driver.PrepareTest(f)
resource = createGenericVolumeTestResource(driver, config, pattern)
if resource.volSource == nil {

This comment has been minimized.

Copy link
@wongma7

wongma7 Feb 14, 2019

Contributor

nit, check not needed

This comment has been minimized.

Copy link
@pohly

pohly Feb 14, 2019

Author Contributor

Probably not, but it was there in the original code and I wanted to avoid changing test logic.

// Now do the more expensive test initialization.
config, testCleanup = driver.PrepareTest(f)
resource = createGenericVolumeTestResource(driver, config, pattern)
if resource.volSource == nil {

This comment has been minimized.

Copy link
@wongma7

wongma7 Feb 14, 2019

Contributor

same

This comment has been minimized.

Copy link
@pohly

pohly Feb 14, 2019

Author Contributor

Same 🤷‍♂. I wanted to avoid changing test logic.

claimSize := dDriver.GetClaimSize()
sc = dDriver.GetDynamicProvisionStorageClass(config, "")
if sc == nil {
framework.Skipf("Driver %q does not define Dynamic Provision StorageClass - skipping", dInfo.Name)

This comment has been minimized.

Copy link
@wongma7

wongma7 Feb 14, 2019

Contributor

these Skip messages should specify that the sc is not defined because the fstype %s is not supported. there is no other reason (currently?) for a driver to implement DynamicPVTestDriver but return nil

This comment has been minimized.

Copy link
@pohly

pohly Feb 14, 2019

Author Contributor

Can this code here know why the driver didn't return a storage class? I don't think it should make assumptions based on the current implementation.

This comment has been minimized.

Copy link
@wongma7

wongma7 Feb 14, 2019

Contributor

yeah I guess it's fine as-is. I guess it is the driver author's responsibility to make sense of why their GetDynamicProvisionStorageClass would return nil. Maybe it could return an error in the future, seems pointless now though

// TestDynamicProvisioning tests dynamic provisioning with specified StorageClassTest
func (t StorageClassTest) TestDynamicProvisioning() *v1.PersistentVolume {
client := t.Client
claim := t.Claim

This comment has been minimized.

Copy link
@wongma7

wongma7 Feb 14, 2019

Contributor

could fail w/ nice error if this is nil

This comment has been minimized.

Copy link
@pohly

pohly Feb 14, 2019

Author Contributor

Not sure if Github is showing me the right context. What can be nil here? t.Client and/or t.Claim?

Are you worried that they might be accidentally passed as nil because they are now struct members instead of separate function parameters? I agree, this isn't caught explicitly. If this is the only requested change, then I'd prefer to add assertions in a separate PR.

This comment has been minimized.

Copy link
@wongma7

wongma7 Feb 14, 2019

Contributor

I meant t.Claim, though client could be nil as well. either way, the nil error will be caught a few lines below, so yeah these assertions can be added in a separate PR as "nice-to-haves" 👍

This comment has been minimized.

Copy link
@pohly

pohly Feb 15, 2019

Author Contributor

As I was rebasing anyway, I added the asserts.

@jarrpa

jarrpa approved these changes Feb 14, 2019

Copy link
Contributor

left a comment

I can't speak to the full PR, but the parts I'm familiar with LGTM.

@davidz627
Copy link
Contributor

left a comment

mostly LGTM. Just a couple small comments. Thanks for doing this!

defer destroyCSIDriver(csics, driver)
defer driver.CleanupDriver()
defer destroyCSIDriver(csics, config.GetUniqueDriverName())
defer testCleanup()

This comment has been minimized.

Copy link
@davidz627

davidz627 Feb 14, 2019

Contributor

shouldn't we want to cleanup the test whether the driver exists or not

This comment has been minimized.

Copy link
@pohly

pohly Feb 15, 2019

Author Contributor

Yes. This was introduced in #74079 and I noticed it too when rebasing on top of that, then notified @jarrpa. There's now a fix pending in #74079... which has been merged. I'll need to rebase once more.

By("Deleting pod")
err := framework.DeletePodWithWait(f, cs, pod)
Expect(err).ToNot(HaveOccurred(), "while deleting pod")
pod = nil

This comment has been minimized.

Copy link
@davidz627

davidz627 Feb 14, 2019

Contributor

this is suspicious to me. I want to avoid a pattern where we are reusing structs and just clearing field. Is there a way we can start from scratch instead?

This comment has been minimized.

Copy link
@pohly

pohly Feb 15, 2019

Author Contributor

The test does start from scratch, with a fresh copy of all local variables. Setting the pointers to nil is meant to ensure that cleanup can be called more than once, should the need ever arise. Currently this isn't done, so I could remove the pod = nil and leave the pointers to already deleted items, but personally would prefer to keep this extra precaution.

This comment has been minimized.

Copy link
@pohly

pohly Feb 15, 2019

Author Contributor

Hmm, I need to be more precise: each test starts with a fresh copy of the the local variables if init initializes all of them. I see how that might be something that can be missed for a complex test suite where some variables are only set under certain conditions.

This can be avoided by storing all these local variables in a struct and re-initializing the struct before setting individual members. I've done that in commit ec3655a.

I'm undecided whether it makes the code easier to read ("shared" variables clearly marked with l. prefix) or harder to read (extra prefix). @davidz627 Please let me know whether we should do it this way or by carefully setting all variables in init.

pohly added some commits Dec 28, 2018

e2e/storage: introduce TestVolume interface
This increases type safety and makes the code easier to read because
it becomes obvious that the "test resource" passed to some functions
must be the result of a previous CreateVolume.

This makes it possible to remove:
- functions that never did anything (the DeleteVolume methods in
  drivers that never create a volume)
- type casts (in the DeleteVolume implementation)
- the unused DeleteVolume parameters
- the stand-alone DeleteVolume function (which would be just a non-nil
  check)

GetPersistentVolumeSource and GetVolumeSource could also become
methods on more specific interfaces - they don't actually use anything
from TestDriver instance which provides them.

The main motivation however is to reduce the number of methods which
might need an explicit test config parameter.
e2e/storage: speed up skipping, simplify APIs and test definition
CreateDriver (now called SetupTest) is a potentially expensive
operation, depending on the driver. Creating and tearing down a
framework instance also takes time (measured at 6 seconds on a fast
machine) and produces quite a bit of log output.

Both can be avoided for tests that skip based on static
information (like for instance the current OS, vendor, driver and test
pattern) by making the test suite responsible for creating framework
and driver.

The lifecycle of the TestConfig instance was confusing because it was
stored inside the DriverInfo, a struct which conceptually is static,
while the TestConfig is dynamic. It is cleaner to separate the two,
even if that means that an additional pointer must be passed into some
functions. Now CreateDriver is responsible for initializing the
PerTestConfig that is to be used by the test.

To make this approach simpler to implement (= less functions which
need the pointer) and the tests easier to read, the entire setup and
test definition is now contained in a single function. This is how it
is normally done in Ginkgo. This is easier to read because one can see
at a glance where variables are set, instead of having to trace values
though two additional structs (TestResource and TestInput).

Because we are changing the API already, also other changes are made:
- some function prototypes get simplified
- the naming of functions is changed to match their purpose
  (tests aren't executed by the test suite, they only get defined
  for later execution)
- unused methods get removed (TestSuite.skipUnsupportedTest is redundant)
e2e/storage: remove test pattern filtering
The recommended approach for not running unsuitable tests is to skip
them at runtime with an explanation. Filtering out unsuitable test
patters and thus not even defining unsuitable tests was done earlier
because it was faster than skipping tests at runtime.

But now these tests can be skipped efficiently, so this special case
can be removed.
e2e: update bazel BUILD files
Generated via hack/update-bazel.sh.
e2e/storage: re-initialize all local variables
There is a risk that the init function does not reset one of the local
variables that was set by a previous test. To avoid this, all
variables set by init are now in a struct which gets cleaned
completely first.

@pohly pohly force-pushed the pohly:storage-volume-testsuites-config branch from 1692255 to ec3655a Feb 15, 2019

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Feb 15, 2019

/test pull-kubernetes-e2e-gce-alpha-features
/test pull-kubernetes-e2e-gce-csi-serial

@pohly

This comment has been minimized.

Copy link
Contributor Author

commented Feb 15, 2019

/priority important-soon

@childsb childsb added this to the v1.13 milestone Feb 15, 2019

@msau42

This comment has been minimized.

Copy link
Member

commented Feb 19, 2019

@childsb this should be 1.14 milestone right?

/approve

will let @davidz627 and/or @wongma7 lgtm

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Feb 19, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@davidz627

This comment has been minimized.

Copy link
Contributor

commented Feb 20, 2019

/lgtm

@fejta-bot

This comment has been minimized.

Copy link

commented Feb 20, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot

This comment has been minimized.

Copy link

commented Feb 20, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 07cf5fa into kubernetes:master Feb 20, 2019

19 checks passed

cla/linuxfoundation pohly authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-alpha-features Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.