Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e refactor #68483

Merged
merged 2 commits into from
Oct 19, 2018
Merged

e2e refactor #68483

merged 2 commits into from
Oct 19, 2018

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Sep 10, 2018

What this PR does / why we need it:

There is interest in using and extending the Kubernetes test/e2e/framework also outside of Kubernetes. More and more people are developing extensions like CSI drivers that get developed and published independently from Kubernetes. Those components need a way to write and run E2E tests on a Kubernetes cluster. Trying to do that with the framework as it stands now is problematic because of the huge dependency chain.

After refactoring, other projects can create and run a custom Ginkgo suite where they:

  • import test/e2e/framework as a vendor dependency without pulling in support for cloud providers and any tests (because of the huge dependency chains that a complete import causes)
  • import support for cloud providers, tests and certain features separately if desired
  • add custom tests without modifying the framework

Which issue(s) this PR fixes:
Fixes #66649

Release note:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 10, 2018
@spiffxp
Copy link
Member

spiffxp commented Sep 10, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 10, 2018
@timothysc
Copy link
Member

/ok-to-test

@timothysc
Copy link
Member

/assign @timothysc

@neolit123
Copy link
Member

/assign

@pohly
Copy link
Contributor Author

pohly commented Sep 11, 2018

/retest

@pohly pohly force-pushed the e2e-refactor-pr branch 3 times, most recently from 761f71e to 91a90b6 Compare September 11, 2018 10:46
@pohly
Copy link
Contributor Author

pohly commented Sep 11, 2018

/retest

@pohly pohly force-pushed the e2e-refactor-pr branch 7 times, most recently from 1cd7501 to d994c26 Compare September 12, 2018 08:50
@timothysc timothysc added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Oct 12, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Oct 12, 2018
Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need to chat about this. This is not really the direction where I think we should be going. I will try to write down my notes. Could you please attend the next sig-testing meeting.

@@ -46,86 +44,18 @@ import (

// ensure auth plugins are loaded
_ "k8s.io/client-go/plugin/pkg/client/auth"

// ensure that cloud providers are loaded
_ "k8s.io/kubernetes/test/e2e/framework/providers/aws"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm pretty against the idea of bin smashing in provider specifics into the main framework. In the long run, this is antithetical to where we want to go with provider code.

If anything we want to excise it and wrap any kube specific are with an extension label, such as [storage]

CreatePD(zone string) (string, error)
DeletePD(pdName string) error
CreatePVSource(zone, diskName string) (*v1.PersistentVolumeSource, error)
DeletePVSource(pvSource *v1.PersistentVolumeSource) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So umm yeah, we need to discuss this.

Copy link
Member

@msau42 msau42 Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIG storage has already created a sub-framework that abstracts out volume plugin setup/teardown. I don't think we need to add an additional interface here. For storage, it is possible to test multiple volume plugins against a single cloud provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a pointer to that sub-framework? Have you looked at the original code and whether that can be rewritten to use that sub-framework instead?

The original code was in test/e2e/framework/pv_util.go:

func createPD(zone string) (string, error) {
        if zone == "" {
                zone = TestContext.CloudConfig.Zone
        }

        if TestContext.Provider == "gce" || TestContext.Provider == "gke" {
                pdName := fmt.Sprintf("%s-%s", TestContext.Prefix, string(uuid.NewUUID()))

                gceCloud, err := GetGCECloud()
                if err != nil {
                        return "", err
                }

                if zone == "" && TestContext.CloudConfig.MultiZone {
                        zones, err := gceCloud.GetAllZonesFromCloudProvider()
                        if err != nil {
                                return "", err
                        }
                        zone, _ = zones.PopAny()
                }

                tags := map[string]string{}
                err = gceCloud.CreateDisk(pdName, gcecloud.DiskTypeStandard, zone, 2 /* sizeGb */, tags)
                if err != nil {
                        return "", err
                }
                return pdName, nil
        } else if TestContext.Provider == "aws" {
                client := newAWSClient(zone)
                request := &ec2.CreateVolumeInput{}
                request.AvailabilityZone = aws.String(zone)
                request.Size = aws.Int64(10)
                request.VolumeType = aws.String(awscloud.DefaultVolumeType)
                response, err := client.CreateVolume(request)
                if err != nil {
                        return "", err
                }

                az := aws.StringValue(response.AvailabilityZone)
                awsID := aws.StringValue(response.VolumeId)

                volumeName := "aws://" + az + "/" + awsID
                return volumeName, nil
        } else if TestContext.Provider == "azure" {
                pdName := fmt.Sprintf("%s-%s", TestContext.Prefix, string(uuid.NewUUID()))
                azureCloud, err := GetAzureCloud()

                if err != nil {
                        return "", err
                }

                _, diskURI, _, err := azureCloud.CreateVolume(pdName, "" /* account */, "" /* sku */, "" /* location */, 1 /* sizeGb */)
                if err != nil {
                        return "", err
                }
                return diskURI, nil
        } else {
                return "", fmt.Errorf("provider does not support volume creation")
        }
}

It doesn't seem to be used much (storage/nfs_persistent_volume-disruptive.go, upgrades/storage/persistent_volumes.go, scheduling/ubernetes_lite_volumes.go depend on it indirectly). So perhaps this is simply code that can be deleted together with the tests that uses it.

It's just not something that I wanted to propose myself, because I don't know which tests are still relevant. I also didn't want to rewrite test logic. IMHO that would be better if that was done by people who actually know the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subframework defines an interface for different volume plugins to implement at test/e2e/storage/drivers/base.go. I think many of these tests can be converted to use the new framework, it just needs an owner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base.go has the interface, in_tree.go the implementation. Looking at the implementation for GCE, I find:

func (g *gcePdDriver) CreateVolume(volType testpatterns.TestVolType) interface{} {
...
        By("creating a test gce pd volume")
        vname, err := framework.CreatePDWithRetry()
...
}

framework.CreatePDWithRetry is from pv_util.go. In other words, this new API just wraps the existing code. It cannot replace it. Even if the tests were rewritten to not call CreatePDWithRetry directly, the problem would still remain.

Either the new API must be implemented differently (without direct calls to specific implementations) or we keep it as-is and connect it with the actual implementation via the API in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could reimplement each driver's method to call the cloud provider's methods directly, and then put each driver implementation into its own package. My main point was I don't think the Provider interface is the right place to put PD/Volume methods because a cloud provider can support multiple volume types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the "vendor" interface in this PR isn't ideal and something better could be designed and implemented - over time. Right now the interface closely resembles how the framework previously called vendor-specific code (createPD function became the createPD interface method). There's still time and opportunities to improve that interface while all affected code is in the same repo, but I'd like to keep that out of the current PR, otherwise it'll never get done.

@timothysc
Copy link
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2018
@pohly
Copy link
Contributor Author

pohly commented Oct 12, 2018 via email

@spiffxp
Copy link
Member

spiffxp commented Oct 16, 2018

@timothysc We discussed this during sig-testing, it looks reasonable upon first glance as a step towards abstracting out the provider code. We didn't really know what your concerns were specifically and so found ourselves speculating. Do we need to schedule a breakout?

@timothysc
Copy link
Member

@spiffxp @pohly - I'm sorry I could not make it I have overlapping meetings. Could we have a breakout discussion on Friday this week. Feel free to send out an invite.

@pohly
Copy link
Contributor Author

pohly commented Oct 17, 2018 via email

@timothysc timothysc removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2018
Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2018
@spiffxp
Copy link
Member

spiffxp commented Oct 19, 2018

Meeting was scheduled, words were said. We agreed to unblock this to move forward, and @timothysc will file followup issues for further untangling of provider-specific code. There was also discussion of moving this to staging as a followup, but recognition that it's a slightly trickier task than what has been done thus far.

/lgtm
/approve
for /hack

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, spiffxp, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2018
@pohly
Copy link
Contributor Author

pohly commented Oct 19, 2018

/test pull-kubernetes-e2e-kops-aws

@k8s-ci-robot k8s-ci-robot merged commit 8b36038 into kubernetes:master Oct 19, 2018
@neolit123
Copy link
Member

Meeting was scheduled, words were said. We agreed to unblock this to move forward

great to hear!

@spiffxp
Copy link
Member

spiffxp commented Oct 19, 2018

🎉 for landing this @pohly!

Keeping an eye on https://k8s-testgrid.appspot.com/sig-release-master-blocking#Summary to see if any of the less frequently run jobs / tests were tripped up by this

@Katharine
Copy link
Member

Unfortunately, this PR has broken at least running conformance tests using --provider=skeleton and KUBERNETES_CONFORMANCE_TEST=y.

@spiffxp
Copy link
Member

spiffxp commented Oct 20, 2018

ref: #70042 which appears to fix the above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor test/e2e/framework
8 participants