
RFE: Remove Provider Specific details from e2e tests #70194

Open
timothysc opened this issue Oct 24, 2018 · 24 comments


@timothysc (Member) commented Oct 24, 2018:

Is this a FEATURE REQUEST?:
The e2e tests have provider details baked in since the early days. Recent refactoring has exposed brittleness in how we invoke the tests, and frankly, provider-specific behavioral tests that depend on non-standard APIs should be moved out of tree.

/kind feature
/sig testing
/cc @neolit123 @pohly @BenTheElder @kubernetes/cncf-conformance-wg @kubernetes/sig-testing @kubernetes/k8s-infra-team

@timothysc (Member, Author) commented Oct 24, 2018 (comment minimized)

@hogepodge (Member) commented Oct 24, 2018:

sig-cloud-provider can notify individual providers and help assign work on this enhancement

@andrewsykim (Member) commented Oct 24, 2018:

/assign

@andrewsykim (Member) commented Oct 24, 2018:

@timothysc do we have any details on what tests specifically need changing or are there too many to list?

@andrewsykim (Member) commented Oct 24, 2018:

nvm, I did a quick scan and it's everywhere 😬

@BenTheElder (Member) commented Oct 24, 2018:

I suspect this will make more sense as we get the provider drivers out of tree, which can be followed by ripping that provider out of the in-tree e2e code? Doing it the other way around seems problematic.
cc @dims @cheftako

@pohly (Contributor) commented Oct 25, 2018:

Is the goal to not have any dependency (whether it is direct or indirect) on provider-specific functionality in Kubernetes? The idea behind the vendor interface was that tests can still be developed in-tree and different cloud providers can share the same test logic. That doesn't have to be in Kubernetes itself, though; "shared tests" could also be its own repo.

Another concern was raised in the CSI working group, independent of this issue: when removing tests that depend on vendor-specific functionality, Kubernetes CI by itself might exercise fewer code paths in Kubernetes core and thus miss regressions that are currently caught.

@neolit123 (Member) commented Oct 25, 2018:

my vote would go for no provider implementations in k/k, only interfaces or a framework that can be implemented in third party repos.
in terms of CI, test jobs can always pull code from third party repos and build + run tests so that k8s has good coverage.
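The split described here could look roughly like the sketch below: a small interface in k/k, a no-op default, and a registry that third-party repos populate with real implementations. All names (`ProviderInterface`, `NullProvider`, `setupProvider`) are hypothetical illustrations, not the actual e2e framework API:

```go
package main

import "fmt"

// ProviderInterface is a hypothetical seam the e2e framework could
// expose in k/k; real provider logic would live in third-party repos.
type ProviderInterface interface {
	Name() string
	// GroupSize returns the expected node-group size, or an error
	// if the provider has no such concept.
	GroupSize(group string) (int, error)
}

// NullProvider is the in-tree default: every provider-specific call
// reports "unsupported" so generic tests can skip gracefully.
type NullProvider struct{}

func (NullProvider) Name() string { return "skeleton" }

func (NullProvider) GroupSize(group string) (int, error) {
	return 0, fmt.Errorf("provider skeleton does not support node groups")
}

// registry maps --provider flag values to factories; out-of-tree
// repos would register their own implementations here.
var registry = map[string]func() ProviderInterface{
	"skeleton": func() ProviderInterface { return NullProvider{} },
}

func setupProvider(name string) (ProviderInterface, error) {
	factory, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return factory(), nil
}

func main() {
	p, err := setupProvider("skeleton")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name())
}
```

With this shape, k/k presubmit can run every generic test against the null provider, while CI jobs for a given cloud build a binary that registers that provider's implementation from its own repo.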

@andrewsykim (Member) commented Oct 25, 2018:

There's a muddled boundary between what is "provider specific" and which tests are "generic" but only pass on certain providers (like this test). The latter becomes harder to reason about because we need to decouple the provider-specific parts in Kubernetes itself (an effort we are working on now) before we can decouple them in the tests. There may be intermediate steps we have to take to get closer to that point, but I agree with @neolit123 that the ideal end result is that there are no provider tests in k/k, and each provider can run the core test suites in k/k plus their own provider-specific test suites (which live in their own provider repos).

@andrewsykim (Member) commented Oct 25, 2018:

@pohly to answer your question directly: yes, the goal is to get rid of all cloud provider dependencies in k8s.io/kubernetes. There will be a transitional period where we'll vendor k8s.io/cloud-provider-<provider> instead of k8s.io/kubernetes/pkg/cloudprovider/providers/<provider>, so perhaps a "shared tests" repo could import those providers if they want.

@BenTheElder (Member) commented Oct 25, 2018:

+1 to what @neolit123 said, though I will add that presumably these tests are necessarily coupled to the provider implementations, and imo should be versioned with them, so I think probably they should move out together, provider and provider tests.

Another concern was raised in the CSI working group, independent of this PR here: when removing tests that depend on vendor-specific functionality, Kubernetes CI by itself might exercise less code paths in Kubernetes core and thus miss regressions that currently get caught.

I don't think this is acceptable. The CSI interface should be fully testable without vendor-specific details in the tests, and vendor-specific testing should happen in the driver repos. That's sort of the point of having an interface, imo: things should be able to target the interface, including the tests. We should be able to use various "local" drivers or generic ones like, say, NFS to test within the repo, and test other drivers elsewhere.

To the same point, we don't test all CRI in presubmit, or all cloud providers or... we'd never merge anything with that matrix.
Edit: and in fact ideally we're looking at options to test presubmit provider-less-ly for example

We can always put vendor specific CSI tests in release blocking CI without being in the core repo anyhow if we really need to.

More generally we should not catch literally all regressions via pre-submit-test-all-the-things in kubernetes/kubernetes. We already cannot. If the tests aren't in k/k presubmit, it's not particularly important from a testing POV where the code lives, probably it should live close to the vendor specific code it tests.
We have post-submit and continuous testing for this which can use any number of repos.

cc @davidz627

@pohly (Contributor) commented Oct 25, 2018 (comment minimized)

@BenTheElder (Member) commented Oct 25, 2018:

If those kind of tests aren't welcome in Kubernetes itself
anymore, then we'll need this extra "shared tests repo", because they
can't (or rather, shouldn't) be moved into the repos of each vendor
either.

That seems unfortunate; ideally that should still be behind a k8s-style API/interface. I'm not sure that those are necessarily unwelcome so much as something we should avoid. I'd hope we aim for tests to generally be of high enough quality to move towards conformance unless they are:

  • testing provider details (which belongs to the provider to manage in the long term, not k/k...)
  • testing performance
  • intentionally disruptive

EG that private registry test, surely we could just run a local private registry in the test cluster instead of depending on GCE / GKE?

Something based on NFS is indeed being discussed at the moment, but it
is not ready yet. There are ways to mitigate this issue, but mocking is
not always close enough to the real thing.

Well of course, but we should be able to cover the interface surface / paths without any particular vendor ...?
We'd of course not be covering the vendor implementations, but we should cover those where they are implemented, which is presumably out of tree ...?

@neolit123 (Member) commented Oct 25, 2018:

Some tests aren't specific to one provider, but depend on
functionality provided only by some providers via provider-specific
APIs. If those kind of tests aren't welcome in Kubernetes itself
anymore, then we'll need this extra "shared tests repo", because they
can't (or rather, shouldn't) be moved into the repos of each vendor
either.

could you provide an example of such a test?
if a single test imports bits from different cloud-provider SDKs/APIs, it feels as if such a test should be split into multiple tests to avoid that scenario.

such tests should not reside in k/k.

@davidz627 (Contributor) commented Oct 25, 2018:

/cc @saad-ali @msau42 @mkimuram

For storage:
This is already on our radar. That being said, this would be a huge change and I believe we have a multi-stage plan to address this.

  • In the near term @mkimuram is doing some refactoring of the tests which helps clarify the distinction between providers (drivers) and test suites (run for each provider).
  • In the slightly longer term I believe @pohly and @msau42 have expressed interest in refactoring the tests even further to decouple test suites from the driver so that the test suites can be easily consumed from an outside repository (providers)
  • In the much longer term we plan on migrating all in-tree drivers to CSI (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-migration.md), and at the end of that process we should be able to delete all in-tree storage provider code. This is necessarily coupled with the maturity of the respective CSI drivers, and therefore we can also remove those drivers from the e2e tests. Each provider can then target the k8s/k8s test suites with their own drivers in their third-party repositories.
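The driver/test-suite decoupling in the plan above could be sketched roughly as follows, using hypothetical names (`TestDriver`, `runProvisioningSuite`) rather than the actual refactored API: a generic suite depends only on a driver interface and skips capabilities a driver does not claim to support.

```go
package main

import "fmt"

// TestDriver is a hypothetical abstraction for a storage driver under
// test; the real refactoring may use different names and methods.
type TestDriver interface {
	Name() string
	SupportsSnapshots() bool
	CreateVolume() (string, error)
}

// hostPathDriver stands in for a driver that can run entirely in-tree
// (NFS/hostpath style), with no cloud-provider dependency.
type hostPathDriver struct{ volumes int }

func (d *hostPathDriver) Name() string            { return "hostpath" }
func (d *hostPathDriver) SupportsSnapshots() bool { return false }

func (d *hostPathDriver) CreateVolume() (string, error) {
	d.volumes++
	return fmt.Sprintf("hostpath-vol-%d", d.volumes), nil
}

// runProvisioningSuite is a generic suite: it knows nothing about any
// particular provider, only the TestDriver interface, and records
// which checks ran and which were skipped.
func runProvisioningSuite(d TestDriver) ([]string, error) {
	var results []string
	vol, err := d.CreateVolume()
	if err != nil {
		return nil, err
	}
	results = append(results, "provision:"+vol)
	if d.SupportsSnapshots() {
		results = append(results, "snapshot:"+vol)
	} else {
		results = append(results, "skip:snapshots:"+d.Name())
	}
	return results, nil
}

func main() {
	steps, err := runProvisioningSuite(&hostPathDriver{})
	if err != nil {
		panic(err)
	}
	fmt.Println(steps)
}
```

Under this shape, an out-of-tree cloud driver could implement the same interface in its own repo and run the identical suite, which is what lets the suites be "easily consumed from an outside repository."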

Below is from @msau42:
yes, we're taking baby steps towards reaching that end goal, but until we have the appropriate cloud-provider specific testing infrastructure in place, we can't just remove the tests now.

Also, to the other point about why we can't just test mock drivers in-tree: mock drivers don't give us the same level of confidence as a real production driver. Mocks are great for unit or integration testing, but we want to test real end-user behavior as much as possible. If there's a way to make an out-of-tree job still release blocking for Kubernetes (and it sounds like there is), then I agree that this is a good way forward.

@andrewsykim (Member) commented Oct 25, 2018 (comment minimized)

@BenTheElder (Member) commented Oct 25, 2018:

Also, to the other point about why we can't just test mock drivers in-tree: mock drivers don't give us the same level of confidence as a real production driver. Mocks are great for unit or integration testing, but we want to test real end-user behavior as much as possible. If there's a way to make an out-of-tree job still release blocking for Kubernetes (and it sounds like there is), then I agree that this is a good way forward.

Not suggesting just "mock" drivers, though those would be great too, but "actual" drivers for say, NFS, which we can run using only what is in tree, or even using other "actual" drivers but making the test logic (IE e2e.test) only depend on EG the CSI, w/o vendor details in the tests.

We can also have other tests block releases (EG some CI job integrating k/k and a vendor driver..), but we really ought to get vendor details out of tree so we can patch those independently of k/k and vice versa, ... that includes the tests currently.

Obviously this is a long-term thing, and there are already some efforts working on the non-test code, but we probably should do something similar for the tests eventually.

More concretely: The providers moving out of k/k tend to need to depend on / vendor k/k, so the test code cannot in turn depend on the provider code or we have a cycle...
Edit: which more or less is already something we're having to fix (the cycles).

@msau42 (Member) commented Oct 25, 2018:

No single driver, except maybe a mock driver, can test the full functionality of the CSI interface. For example, NFS can test mount/unmount, but in order to test attach/detach, you need a cloud-provider volume plugin. Same goes for other storage features like snapshots and topology. For storage, at least, we'll need a combination of "common" drivers like nfs along with cloud provider specific plugins to be able to get complete coverage. I do agree though, that we should work towards moving all provider-specific testing out of tree, but we may still need to aggregate results from in-tree and out-of-tree in order to get the full picture.

@BenTheElder (Member) commented Oct 25, 2018:

👍 Also discussed this offline with @davidz627, I think mostly we had some terminology mismatch. All of the work coming out of storage for this looks great 💯
I think we'll be in a much better place within a few releases. We (David and I) plan to follow up around the beginning of the next cycle about getting more testing specifically targeting CSI into release-blocking.

Edit: and thanks for all the work on this!! :-)

@pohly (Contributor) commented Oct 26, 2018 (comment minimized)

@neolit123 (Member) commented Oct 26, 2018:

@pohly
thanks for the examples!

That would lead to copy-and-paste of the test logic and thus make it
harder to update the tests later on. I think it is better to define a
test-specific interface and let cloud providers hook into that if they
want. Just to be clear, that interface doesn't have to be
ProviderInterface above.

yes, this makes sense. if the test is a common ground for the CPs, an interface can be hooked into instead.

on a side note, if "cloud provider A" related tests reside in a "cloud provider A" related repo, it would be up to the maintainers if they want to import the "cloud provider B" packages. it feels like they probably shouldn't, but the important part is that at least we won't have that in k/k.

@spiffxp added this to "To Triage" in cncf-k8s-conformance-wg on Jan 11, 2019

@fejta-bot commented Jan 24, 2019:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@neolit123 (Member) commented Jan 28, 2019:

/lifecycle frozen

@andrewsykim (Member) commented Jan 28, 2019:

I might be missing some context here, but my take on this is that as long as we still have cloud providers in k8s.io/kubernetes, we will necessarily have to couple the E2E tests here. E2E tests that are cloud provider specific should be moved out in lockstep with the cloud providers in-tree. It's not ideal but I think it's the sanest way to ensure test coverage for those behaviors in the meantime. We (SIG Cloud Provider) will take ownership of doing this as we chip away at getting cloud providers removed.
