
e2e refactor #68483

Merged: 2 commits merged into kubernetes:master from pohly:e2e-refactor-pr on Oct 19, 2018

Conversation

@pohly (Contributor) commented Sep 10, 2018:

What this PR does / why we need it:

There is interest in using and extending the Kubernetes test/e2e/framework also outside of Kubernetes. More and more people are developing extensions like CSI drivers that get developed and published independently from Kubernetes. Those components need a way to write and run E2E tests on a Kubernetes cluster. Trying to do that with the framework as it stands now is problematic because of the huge dependency chain.

After refactoring, other projects can create and run a custom Ginkgo suite where they:

  • import test/e2e/framework as a vendor dependency without pulling in support for cloud providers and any tests (because of the huge dependency chains that a complete import causes)
  • import support for cloud providers, tests and certain features separately if desired
  • add custom tests without modifying the framework

Which issue(s) this PR fixes:
Fixes #66649

Release note:

@spiffxp (Member) commented Sep 10, 2018:

/ok-to-test

@timothysc (Member) commented Sep 10, 2018:

/ok-to-test

@timothysc (Member) commented Sep 10, 2018:

/assign @timothysc

@neolit123 (Member) commented Sep 10, 2018:

/assign

@pohly (Contributor, Author) commented Sep 11, 2018:

/retest

@pohly pohly force-pushed the pohly:e2e-refactor-pr branch 3 times, most recently from 761f71e to 91a90b6 Sep 11, 2018

@pohly (Contributor, Author) commented Sep 11, 2018:

/retest

@pohly pohly force-pushed the pohly:e2e-refactor-pr branch 7 times, most recently from 1cd7501 to d994c26 Sep 11, 2018

@timothysc (Member) left a review comment:

So we need to chat about this; it is not really the direction I think we should be going. I will try to write down my notes. Could you please attend the next sig-testing meeting?

@@ -46,86 +44,18 @@ import (

// ensure auth plugins are loaded
_ "k8s.io/client-go/plugin/pkg/client/auth"

// ensure that cloud providers are loaded
_ "k8s.io/kubernetes/test/e2e/framework/providers/aws"

@timothysc (Member) commented Oct 12, 2018:

So I'm pretty against the idea of bin-smashing provider specifics into the main framework. In the long run, this is antithetical to where we want to go with provider code.

If anything, we want to excise it and wrap any kube-specific area with an extension label, such as [storage].

CreatePD(zone string) (string, error)
DeletePD(pdName string) error
CreatePVSource(zone, diskName string) (*v1.PersistentVolumeSource, error)
DeletePVSource(pvSource *v1.PersistentVolumeSource) error

@timothysc (Member) commented Oct 12, 2018:

So umm yeah, we need to discuss this.

@msau42 (Member) commented Oct 15, 2018:

SIG storage has already created a sub-framework that abstracts out volume plugin setup/teardown. I don't think we need to add an additional interface here. For storage, it is possible to test multiple volume plugins against a single cloud provider.

@pohly (Contributor, Author) commented Oct 16, 2018:

Do you have a pointer to that sub-framework? Have you looked at the original code and whether that can be rewritten to use that sub-framework instead?

The original code was in test/e2e/framework/pv_util.go:

func createPD(zone string) (string, error) {
        if zone == "" {
                zone = TestContext.CloudConfig.Zone
        }

        if TestContext.Provider == "gce" || TestContext.Provider == "gke" {
                pdName := fmt.Sprintf("%s-%s", TestContext.Prefix, string(uuid.NewUUID()))

                gceCloud, err := GetGCECloud()
                if err != nil {
                        return "", err
                }

                if zone == "" && TestContext.CloudConfig.MultiZone {
                        zones, err := gceCloud.GetAllZonesFromCloudProvider()
                        if err != nil {
                                return "", err
                        }
                        zone, _ = zones.PopAny()
                }

                tags := map[string]string{}
                err = gceCloud.CreateDisk(pdName, gcecloud.DiskTypeStandard, zone, 2 /* sizeGb */, tags)
                if err != nil {
                        return "", err
                }
                return pdName, nil
        } else if TestContext.Provider == "aws" {
                client := newAWSClient(zone)
                request := &ec2.CreateVolumeInput{}
                request.AvailabilityZone = aws.String(zone)
                request.Size = aws.Int64(10)
                request.VolumeType = aws.String(awscloud.DefaultVolumeType)
                response, err := client.CreateVolume(request)
                if err != nil {
                        return "", err
                }

                az := aws.StringValue(response.AvailabilityZone)
                awsID := aws.StringValue(response.VolumeId)

                volumeName := "aws://" + az + "/" + awsID
                return volumeName, nil
        } else if TestContext.Provider == "azure" {
                pdName := fmt.Sprintf("%s-%s", TestContext.Prefix, string(uuid.NewUUID()))
                azureCloud, err := GetAzureCloud()

                if err != nil {
                        return "", err
                }

                _, diskURI, _, err := azureCloud.CreateVolume(pdName, "" /* account */, "" /* sku */, "" /* location */, 1 /* sizeGb */)
                if err != nil {
                        return "", err
                }
                return diskURI, nil
        } else {
                return "", fmt.Errorf("provider does not support volume creation")
        }
}

It doesn't seem to be used much (storage/nfs_persistent_volume-disruptive.go, upgrades/storage/persistent_volumes.go, and scheduling/ubernetes_lite_volumes.go depend on it indirectly). So perhaps this is simply code that can be deleted together with the tests that use it.

It's just not something that I wanted to propose myself, because I don't know which tests are still relevant. I also didn't want to rewrite test logic; IMHO that would be better done by people who actually know the tests.

@msau42 (Member) commented Oct 16, 2018:

The subframework defines an interface for different volume plugins to implement at test/e2e/storage/drivers/base.go. I think many of these tests can be converted to use the new framework, it just needs an owner.

@pohly (Contributor, Author) commented Oct 16, 2018:

base.go has the interface, in_tree.go the implementation. Looking at the implementation for GCE, I find:

func (g *gcePdDriver) CreateVolume(volType testpatterns.TestVolType) interface{} {
...
        By("creating a test gce pd volume")
        vname, err := framework.CreatePDWithRetry()
...
}

framework.CreatePDWithRetry is from pv_util.go. In other words, this new API just wraps the existing code. It cannot replace it. Even if the tests were rewritten to not call CreatePDWithRetry directly, the problem would still remain.

Either the new API must be implemented differently (without direct calls to specific implementations) or we keep it as-is and connect it with the actual implementation via the API in this PR.

@msau42 (Member) commented Oct 16, 2018:

We could reimplement each driver's method to call the cloud provider's methods directly, and then put each driver implementation into its own package. My main point was I don't think the Provider interface is the right place to put PD/Volume methods because a cloud provider can support multiple volume types.

@pohly (Contributor, Author) commented Oct 19, 2018:

I agree that the "vendor" interface in this PR isn't ideal and that something better could be designed and implemented over time. Right now the interface closely resembles how the framework previously called vendor-specific code (the createPD function became the createPD interface method). There's still time and opportunity to improve that interface while all affected code is in the same repo, but I'd like to keep that out of the current PR, otherwise it'll never get done.

@timothysc (Member) commented Oct 12, 2018:

/hold

@pohly (Contributor, Author) commented Oct 12, 2018:

@spiffxp (Member) commented Oct 16, 2018:

@timothysc We discussed this during sig-testing; it looks reasonable at first glance as a step towards abstracting out the provider code. We didn't really know what your concerns were specifically and so found ourselves speculating. Do we need to schedule a breakout?

@timothysc (Member) commented Oct 16, 2018:

@spiffxp @pohly - I'm sorry I could not make it; I have overlapping meetings. Could we have a breakout discussion on Friday this week? Feel free to send out an invite.

@pohly (Contributor, Author) commented Oct 17, 2018:

@timothysc (Member) left a review comment:

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Oct 19, 2018

@spiffxp (Member) commented Oct 19, 2018:

Meeting was scheduled, words were said. We agreed to unblock this to move forward, and @timothysc will file followup issues for further untangling of provider-specific code. There was also discussion of moving this to staging as a followup, but recognition that it's a slightly trickier task than what has been done thus far.

/lgtm
/approve
for /hack

@k8s-ci-robot (Contributor) commented Oct 19, 2018:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, spiffxp, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pohly (Contributor, Author) commented Oct 19, 2018:

/test pull-kubernetes-e2e-kops-aws

@k8s-ci-robot k8s-ci-robot merged commit 8b36038 into kubernetes:master Oct 19, 2018

18 checks passed:

  • cla/linuxfoundation: pohly authorized
  • pull-kubernetes-bazel-build: Job succeeded.
  • pull-kubernetes-bazel-test: Job succeeded.
  • pull-kubernetes-cross: Skipped
  • pull-kubernetes-e2e-gce: Job succeeded.
  • pull-kubernetes-e2e-gce-100-performance: Job succeeded.
  • pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
  • pull-kubernetes-e2e-gke: Skipped
  • pull-kubernetes-e2e-kops-aws: Job succeeded.
  • pull-kubernetes-e2e-kubeadm-gce: Skipped
  • pull-kubernetes-integration: Job succeeded.
  • pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
  • pull-kubernetes-local-e2e: Skipped
  • pull-kubernetes-local-e2e-containerized: Skipped
  • pull-kubernetes-node-e2e: Job succeeded.
  • pull-kubernetes-typecheck: Job succeeded.
  • pull-kubernetes-verify: Job succeeded.
  • tide: In merge pool.
@neolit123 (Member) commented Oct 19, 2018:

> Meeting was scheduled, words were said. We agreed to unblock this to move forward

great to hear!

@spiffxp (Member) commented Oct 19, 2018:

🎉 for landing this @pohly!

Keeping an eye on https://k8s-testgrid.appspot.com/sig-release-master-blocking#Summary to see if any of the less frequently run jobs or tests were tripped up by this.

@Katharine (Member) commented Oct 20, 2018:

Unfortunately, this PR has broken at least the ability to run conformance tests using --provider=skeleton and KUBERNETES_CONFORMANCE_TEST=y.

@spiffxp (Member) commented Oct 20, 2018:

ref: #70042 which appears to fix the above
