Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e/storage: avoid definining unusable tests #70992

Closed

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Nov 13, 2018

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

After the recent test/e2e/storage refactoring (merged for 1.13), a lot of tests get defined that can only run with specific cluster providers. Those tests then get skipped at runtime. Even tests that get skipped take time (test setup, creating client set or worse, deploying the driver if the driver needs that) and the list of skipped tests in the CI grew a lot without adding any information (skip reason not captured). Therefore this PR:

  • enables the definition of tests depending on the test context
  • changes the "skip test" API so that test definition gets skipped, while still allowing runtime skipping
  • changes the CSI and in-tree drivers to skip definition as much as possible

Special notes for your reviewer:

This contains PR #70862 because of code conflicts, so that PR must be merged first.

Does this PR introduce a user-facing change?:

NONE

/sig storage
/cc @mkimuram
/assign @msau42

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Nov 13, 2018
@k8s-ci-robot
Copy link
Contributor

@pohly: GitHub didn't allow me to request PR reviews from the following users: mkimuram.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

After the recent test/e2e/storage refactoring (merged for 1.13), a lot of tests get defined that can only run with specific cluster providers. Those tests then get skipped at runtime. Even tests that get skipped take time (test setup, creating client set or worse, deploying the driver if the driver needs that) and the list of skipped tests in the CI grew a lot without adding any information (skip reason not captured). Therefore this PR:

  • enables the definition of tests depending on the test context
  • changes the "skip test" API so that test definition gets skipped, while still allowing runtime skipping
  • changes the CSI and in-tree drivers to skip definition as much as possible

Special notes for your reviewer:

This contains PR #70862 because of code conflicts, so that PR must be merged first.

Does this PR introduce a user-facing change?:

NONE

/sig storage
/cc @mkimuram
/assign @msau42

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Nov 13, 2018
@pohly
Copy link
Contributor Author

pohly commented Nov 13, 2018

/hold

PR #70862 must be merged first.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2018
@pohly
Copy link
Contributor Author

pohly commented Nov 13, 2018

There's one change that I didn't make to avoid unnecessary churn: testsuites.RunTestSuite isn't actually running any tests, it merely defines them. This is an important distinction for people reading the code or trying to understand how to use the API, because they have to understand that defining tests and executing them happens at entirely different points in time.

"test suite" is also ambiguous, because in the larger context of the final test binary, the entire set of specs is the "test suite". What the method adds is merely a certain set of tests (or even more precisely, some "specs" in Ginkgo terms).

Therefore it would be useful to rename the method to "AddTests" or "DefineTests". I'd prefer to not use the term "specs" because it is Ginkgo-specific and doesn't match at all with the naming of the package ("testsuites").

Anyway, let me know what you think about this. I can still add the renaming to this PR. But if we do it, we should do it now, to avoid breaking external tests suites which might start using the package.

@neolit123
Copy link
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 13, 2018
@pohly pohly force-pushed the storage-volume-testsuites-avoid-tests branch 4 times, most recently from 57c3b16 to 91bba41 Compare November 15, 2018 14:18
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2018
@pohly pohly force-pushed the storage-volume-testsuites-avoid-tests branch from 91bba41 to 831e473 Compare November 16, 2018 18:57
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2018
@pohly pohly force-pushed the storage-volume-testsuites-avoid-tests branch 2 times, most recently from d049d9c to 007ebd8 Compare November 17, 2018 18:40
pohly added a commit to intel/pmem-csi that referenced this pull request Nov 22, 2018
To run the tests, do:
  make start
  export KUBECONFIG=<value as shown by make start>
  REPO_ROOT=$(pwd) go test -v ./test/e2e -ginkgo.v

This is intentionally pretty verbose. "go test" (without -v) will only
print something on failure, while "-ginkgo.v" can be removed to make
the test execution itself less verbose.

At the moment, a single provisioning test will be executed, with two
being skipped (mount options not declared, no block support).

The out-of-tree version of the "testsuites" package has to be used
because two PRs are still under review:
kubernetes/kubernetes#70862
kubernetes/kubernetes#70992

Kubernetes 1.13 could be used instead, but then more code would get
pulled in (PR 70862) or the API would be different (PR 70992). Either
way, the implementation of k8s.io/kubernetes/pkg/util/mount
changes, which affects the pmem-csi-driver.

That code itself hasn't changed much. The real change is that
Kubernetes now uses a fork of the more-or-less-unmaintained glog. That
is a good thing also for csi-pmem, so all remaining references to glog
get replaced.
pohly added a commit to intel/pmem-csi that referenced this pull request Nov 22, 2018
To run the tests, do:
  make start
  export KUBECONFIG=<value as shown by make start>
  REPO_ROOT=$(pwd) go test -v ./test/e2e -ginkgo.v

This is intentionally pretty verbose. "go test" (without -v) will only
print something on failure, while "-ginkgo.v" can be removed to make
the test execution itself less verbose.

At the moment, a single provisioning test will be executed, with two
being skipped (mount options not declared, no block support).

The out-of-tree version of the "testsuites" package has to be used
because two PRs are still under review:
kubernetes/kubernetes#70862
kubernetes/kubernetes#70992

Kubernetes 1.13 could be used instead, but then more code would get
pulled in (PR 70862) or the API would be different (PR 70992). Either
way, the implementation of k8s.io/kubernetes/pkg/util/mount
changes, which affects the pmem-csi-driver.

That code itself hasn't changed much. The real change is that
Kubernetes now uses a fork of the more-or-less-unmaintained glog. That
is a good thing also for csi-pmem, so all remaining references to glog
get replaced.

The YAML files get updated to make them a bit more similar to how it
is done elsewhere and how the test code expects it:
- separate storage class, because conceptually there can be more
  than one class per driver, each with different settings
- StatefulSet instead of ReplicaSet, resulting in stable pod names
@pohly
Copy link
Contributor Author

pohly commented Dec 20, 2018

/test pull-kubernetes-e2e-gce-100-performance

Copy link
Contributor

@verult verult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the first 2 of the commits. Thanks for improving the skip process!

@@ -1020,8 +1006,8 @@ func (c *cinderDriver) GetDriverInfo() *testsuites.DriverInfo {
return &c.driverInfo
}

func (c *cinderDriver) SkipUnsupportedTest(pattern testpatterns.TestPattern) {
framework.SkipUnlessProviderIs("openstack")
func (c *cinderDriver) SkipUnsupportedTest(pattern testpatterns.TestPattern) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename func to "IsTestSupported". The current form seems like a double negative to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I just didn't want to mess with the original code more than strictly necessary.
But as this is now an API change, I might was well change the method name (not just here, but also the other "skip unsupported test" methods).

}

if bTestDriver, ok := driver.(BeforeEachTestDriver); ok {
pattern := pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: something like p := pattern to make the variable less ambiguous

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://golang.org/doc/faq#closures_and_goroutines also uses the same variable name:

for _, v := range values {
        v := v // create a new 'v'.

But I can change it, no problem.

// CreateDriver cleanup all the resources that is created in CreateDriver
// CreateDriver cleanup all the resources that is created in CreateDriver. There is
// no guarantee that CreateDriver succeeded or even was called at all, so the test driver
// has to track resources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a well-defined expected behavior if CleanupDriver() was called but CreateDriver() either failed or wasn't called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanupDriver() will always get called, under all circumstances. This has been true also before, it just wasn't explicitly mentioned, therefore I added this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the new comment seems to imply the behavior of CleanupDriver() is different depending on what happens with CreateDriver(). Could you improve the comment to make what you said here more explicit? (Maybe ".... and the test driver has to clean up resources regardless")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

}

// StaticSkipTestDriver is an optional interface that drivers can
// implement to filter out unsuitable tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add: "...unsuitable tests while tests are defined."

to clarify why the struct is "Static".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed the interface to FilterTestDriver (seemed to make more sense, now that the method is called IsTestSupported) and added the comment.

}
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

!framework.NodeOSDistroIs("gci", "ubuntu", "custom") || 
(pattern.FsType == "xfs" && 
!framework.NodeOSDistroIs("ubuntu", "custom"))

(negated if you are changing the function name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

func RunTestSuite(f *framework.Framework, driver TestDriver, tsInits []func() TestSuite, tunePatternFunc func([]testpatterns.TestPattern) []testpatterns.TestPattern) {
for _, testSuiteInit := range tsInits {
suite := testSuiteInit()
patterns := tunePatternFunc(suite.getTestSuiteInfo().testPatterns)

for _, pattern := range patterns {
if skipUnsupportedTest(suite, driver, pattern) {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log then skip. The user should know a particular test is detected but skipped due to driver incompatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make each test suite invocation very verbose, because that information would get printed before running any test, whether the user intends to run the filtered out test or not.

I agree that this is useful information, but we should make it depend on some non-default command line flag. Perhaps we can (ab)use -ginkgo.dryRun for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To further reduce the verbosity, we could print a single log message per test suite and accumulate incompatible drivers in a list. This is still a huge improvement over the verbosity we have today.

I'm not a fan of relying on a non-default flag that could potentially be hard to discover, or in the case of using -ginkgo.dryRun, hard to correlate with controlling the log level here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what the output should look like. How many lines should that log message have? What information do you expect?

Ideally, it would be something like:

The following tests were not added because they are unsupported:
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Dynamic PV (default fs) block mode] - block mode not supported
...
[sig-storage] CSI Volumes [Driver: gcepd] [Testpattern: Dynamic PV (default fs)] - not running on GCE
...

But that will be many lines of output, basically one line for each suite/pattern combination unless we implement some intelligence in RunTestSuite that determines whether all patterns or even all combinations were removed. The other problem is that the reason for skipping is not transported back to RunTestSuite. We either just print the test name prefix and no explanation, or we have to change the return type.

Copy link
Contributor Author

@pohly pohly Dec 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement something like this, but hit a major snag: there's no way to retrieve the current set of components (i.e. the text from previous Describe and Context calls) from ginkgo during test registration. The effect is that at the place where an It call gets skipped, not enough information is available to record the entire test name. In my current prototype, the inner test name is available and (with some extra effort) the testsuite/pattern part. The driver name could be added, but that's still no guarantee that the resulting string is complete.

Here's what it looks like:

$ go test -v ./test/e2e 
W1223 17:21:55.928475   18945 test_context.go:392] Unable to find in-cluster config, using default host : http://127.0.0.1:8080
Dec 23 17:21:55.928: INFO: The --provider flag is not set.  Treating as a conformance test.  Some tests may not be run.
I1223 17:21:56.041325   18945 test_context.go:427] The following tests were not added to the test suite because they are not supported in the current configuration:
??? [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options
??? [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options
??? [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options
??? [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options
??? should create and delete block persistent volumes
??? should create and delete block persistent volumes
??? should create and delete block persistent volumes
??? should create and delete block persistent volumes
??? should create and delete block persistent volumes

The prototype commit is here: pohly@20e270a

I see two options:

  1. take this PR after addressing only the other feedback, without logging tests skipped during test registration
  2. work with Ginkgo upstream such that it better supports skipped test registration

The latter will take time. Ginkgo is not entirely frozen, but maintenance isn't particularly active either. One of my PRs has been waiting to get merged for a while now.

@verult @msau42: I'll let you guys decide.

}

func skipPersistenceTest(driver TestDriver) {
func skipPersistenceTest(driver TestDriver) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also seems like a double negative

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inlined the code from these functions and removed them. They were trivial and now testVolumes is easier to understand.

}

func skipExecTest(driver TestDriver) {
func skipExecTest(driver TestDriver) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

func SIGDescribe(text string, body func()) bool {
return ginkgo.Describe("[sig-storage] "+text, body)
return framework.DescribeWithTestContext("[sig-storage] "+text, body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @rmmh

// time when TestContext has been fully initialized. A callback
// invoked via framework.Describe therefore can add or parameterize
// specs depending on the test configuration.
func DescribeWithTestContext(text string, body func()) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @rmmh

@pohly pohly force-pushed the storage-volume-testsuites-avoid-tests branch from 772e3ef to d6a43cc Compare December 21, 2018 12:48
@pohly
Copy link
Contributor Author

pohly commented Dec 21, 2018

@verult pushed, please have another look.

Copy link
Contributor

@verult verult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting comments for first 2 commits for now; I'm working on the last 2

}
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Something like this is arguably easier to read:

if framework.NodeOSDistroIs("gci", "ubuntu", "custom") {
    if pattern.FsType != "xfs" || framework.NodeOSDistroIs("ubuntu", "custom") {
        return true
    }
}
return false

To improve it further, I would personally use this (but up to you):

return framework.NodeOSDistroIs("gci", "ubuntu", "custom") &&
       (pattern.FsType != "xfs" ||
        framework.NodeOSDistroIs("ubuntu", "custom"))

The double negatives in your code made it a bit hard for me to parse

Copy link
Contributor Author

@pohly pohly Dec 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the latter better, but again didn't want to change too much to keep the review simpler: when a ginkgo.Skip gets replaced by a return true or now return true, then it is obvious that the overall logic is unchanged.

I'll change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -37,8 +37,8 @@ import (
type TestSuite interface {
// getTestSuiteInfo returns the TestSuiteInfo for this TestSuite
getTestSuiteInfo() TestSuiteInfo
// skipUnsupportedTest skips the test if this TestSuite is not suitable to be tested with the combination of TestPattern and TestDriver
skipUnsupportedTest(testpatterns.TestPattern, TestDriver)
// isTestSupported returns false if this TestSuite cannot be tested with the combination of TestPattern and TestDriver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "returns true if this TestSuite can be..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// is not suitable to be tested.
// isTestSupported will determine if the combination of driver, testsuite, and testpattern
// can be tested.
//
// Whether it needs to be skipped is checked by following steps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Whether the test combination is supported is checked..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// FilterTestDriver is an optional interface that drivers can
// implement to filter out unsuitable tests while tests get defined.
type FilterTestDriver interface {
// IsTestSupported returns false if the Testpattern is not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testpattern -> TestPattern

nit: "...returns true if the TestPattern is suitable..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@verult verult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd half look good. Added some minor comments

cs: driver.GetDriverInfo().Config.Framework.ClientSet,
pvc: resource.pvc,
sc: resource.sc,
dInfo: driver.GetDriverInfo(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the field entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which field? I'm sorry, but it's not obvious in the Github UI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Np. dInfo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But dInfo is gone already, I'm removing it in this PR already because it was clearly redundant.

Is there anything else you want to get removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#72434 removes the entire struct holding these fields.

@@ -120,13 +119,11 @@ func (p *provisioningTestSuite) execTest(driver TestDriver, pattern testpatterns
// Ginkgo's "Global Shared Behaviors" require arguments for a shared function
// to be a single struct and to be passed as a pointer.
// Please see https://onsi.github.io/ginkgo/#global-shared-behaviors for details.
testProvisioning(&input)
testProvisioning(driver, &input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to make sure I understand: BeforeEach() runs before It()s, but not necessarily before testProvisioning()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First testProvisioning runs. It's the function which registers tests (aka It callbacks). BeforeEach is called while tests get registered, and it's callback then gets called before the callback given to It.

That's the background behind this comment before testProvisioning: the input variable gets instantiated before calling testProvisioning, but it's content only gets initialized later when the corresponding BeforeEach callback executes.

FWIW, I find this code pattern of setting up per-test input data unnecessarily complex. When reading tests, it is not obvious where the input data really comes from. But this is unrelated to this PR, and I don't want to change it. It's part of the overall testsuites design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #72434 addresses this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly

- testProvisioning(&input)
+ testProvisioning(driver, &input)

I didn't know that the restriction of single variable and pointer only applies to the argument for execution phase and we can pass additional arguments for preparation phase, here. Good improvement!

A common pattern is to define a test and then skip it at runtime if
the current provider isn't the right one. The downside of this
approach is the high number of useless tests in many test runs.

With the new framework.Describe API it becomes possible to delay spec
definition until TestContext is initialized. Then the spec can be
created dynamically so that only tests that actually can run get
defined.
@pohly pohly force-pushed the storage-volume-testsuites-avoid-tests branch from d6a43cc to 44f677f Compare January 4, 2019 08:37
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 4, 2019
@pohly
Copy link
Contributor Author

pohly commented Jan 4, 2019

I think I addressed all review feedback except the open question around logging of tests that don't get registered (#70992 (comment)) and rebased onto current master.

I put the logging topic onto the agenda for the next SIG-storage meeting on January 8th.

@pohly
Copy link
Contributor Author

pohly commented Jan 4, 2019

/test pull-kubernetes-integration

Copy link
Contributor

@mkimuram mkimuram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly

Sorry for late response.

I still don't completely understand the logic behind DescribeWithTestContext, the other logics look good to me if it works correctly.
Other than above, let me add minor comments for naming and comment.

SkipUnsupportedTest(testpatterns.TestPattern)
}

// FilterTestDriver is an optional interface that drivers can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this PR aims to separate skip to two types, runtime skip and preparation-phase skip. (There will be better wording, though.) This concept sounds good and the comments for FilterTestDriver and BeforeEachTestDriver explain this concept well, but it's hard to guess it from their names and their method's names.

So, it would be better that interfaces and methods are named to imply this concept.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a better name for FilterTestDriver? I'm completely open for suggestions here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly

Now that BeforeEachTestDriver is deleted, so there is no need to compare and distinguish these two for whether they are runtime or not. In this situation, I think that current naming is good enough.

Thank you for your consideration and handling on this.

}

// BeforeEachTestDriver is an optional interface that drivers can
// implement to hook into test execution. It can be used to initialize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to allow this method to initialize additional resources?
Are there any existing use cases for it?

(I'm worrying about that it was due to my previous mistake in putting resource initialization code for GCE driver here, which has already been moved to right place in another commit.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really needed and in fact later (#72002) I ended up removing it again because it didn't turn out to be that useful. CreateDriver gets called at the same time and can replace the extra BeforeEachTestDriver.

I'll remove it directly in this PR instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Most of the checks whether a certain test can run can already be made
while defining tests because it only depends on static
information (driver, testsuite, test pattern, test run
configuration).

Not defining tests that can't run leads to cleaner test suites (no
useless tests) and faster execution (in particular when a driver or
server need to be brought up before starting a test that then doesn't
do anything).

While implementing this, the TestDriver.SkipUnsupportedTest method was
moved into a separate interface because many drivers do not need to
implement it and the the semantic of the functions was inverted ("is
supported" instead of "skip) to make the boolean logic simpler.

Runtime skipping is still supported in CreateDriver.
Generated via hack/update-bazel.sh.
Previously, running the tests against a driver without dynamic
provisioning would have resulted in a test failure - hasn't happened
in practice because all drivers support it.
When passing the driver to the test setup function directly, it
becomes possible to skip test definitions at init time instead of
skipping test execution at runtime. The structs also can be
simplified.
@pohly pohly force-pushed the storage-volume-testsuites-avoid-tests branch from 44f677f to e34e7bb Compare January 4, 2019 19:49

var specs []spec

// DescribeWithTestContext is like ginkgo.Describe, but in contrast to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like the comment to mention the motivation for why someone might want to use this function to "invoke spec bod[ies]" only after "TestContext has been fully initialized". i.e. this exists so that we can skip tests in spec bodies based on TestContext. Otherwise the file is too generic looking and the technical language imposing :]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand "can add or parameterize specs depending on the test configuration" basically explains it but I would word it differently is all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another use case in mind: testing CSI drivers with the upstream e2e test binary.

With this change, the e2e test binary could have a command line parameter(s) that enable testing of a driver by providing the necessary information. The "TestDriver" instance that is given to the storage test suite then gets defined based on these parameters.

Without this change, the same can still be done, but would have to be based on parsing environment variables.

@pohly
Copy link
Contributor Author

pohly commented Jan 8, 2019

In the SIG-storage meeting today, the first question was why skipping tests at runtime is considered bad. I pointed out that drivers get deployed for each test, for which the reply was that a) the driver installation should be triggered when a test really runs (which I don't think can be done nicely) or b) driver installation should happen before the test suite runs.

However, it's not clear how to get the driver installation hooked into the cluster setup. This will require further discussion.

But even when driver installation is not part of the test setup, skipped tests still take time. I don't have hard numbers about that, though. Let's take the gcepd production driver as example, which is pre-installed. @msau42 what's your opinion, do you prefer skipping unusable tests for that driver at runtime or at testsuite definition time?

@msau42
Copy link
Member

msau42 commented Jan 8, 2019

But even when driver installation is not part of the test setup, skipped tests still take time. I don't have hard numbers about that, though.

Could you try running locally some non-supported environment tests, with only the change that adds skip to CreateDriver, to measure how much just skipping the tests take?

@pohly
Copy link
Contributor Author

pohly commented Jan 9, 2019

That can already be tested without any changes. On a 1.13 cluster
brought up with hack/local-up-cluster.sh, running all tests involving
the external gcepd driver takes:

$ go test -v ./test/e2e -args -ginkgo.v -ginkgo.focus=Feature..gcePD-external
...
Will run 31 of 2161 specs
...

S [SKIPPING] in Spec Setup (BeforeEach) [6.157 seconds]
[sig-storage] CSI Volumes
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/utils/framework.go:22
  [Driver: pd.csi.storage.gke.io][Feature: gcePD-external]
  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/csi_volumes.go:129
    [Testpattern: Dynamic PV (ext3)] volumes
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/volumes.go:128
      should be mountable [BeforeEach]
      /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/volumes.go:169

      Jan  9 06:01:46.774: Only supported for providers [gce gke] (not )

      /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/util.go:292
...
Ran 0 of 2161 Specs in 190.920 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 2161 Skipped --- PASS: TestE2E (191.01s)

That's 6 seconds per test. Not too bad and 3 minutes overall don't matter in a CI setup. It could be more annoying when running the tests interactively, though.

But as soon as CreateDriver is expensive, it gets worse of course. We might be able to solve this (eventually) by moving the driver installation into the test cluster setup phase, but what about drivers like NFS which bring up a NFS server for each test? Should and can that also be part of the cluster setup?

It might be easier to delay the CreateDriver call. Right now it gets called outside of the test suites. If we make the test suites responsible for calling it right before running a specific test, then runtime skipping of tests that aren't supported by the driver + test pattern combination becomes cheap.

@pohly
Copy link
Contributor Author

pohly commented Jan 9, 2019

Even better, if the framework instantiation also gets moved after checking the driver + test pattern combination, then we can also save those 6 seconds.

@pohly
Copy link
Contributor Author

pohly commented Jan 10, 2019

There is value in skipping tests at runtime (consistency, result reporting via Ginkgo). By reordering the initialization steps it is possible to address the main pain point that this PR was meant for (avoiding useless driver deployment in skipped tests), see PR #72434. Therefore I am closing this PR.

@pohly pohly closed this Jan 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants