Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix storage e2e clean up #83653

Open
wants to merge 1 commit into
base: master
from

Conversation

@cwdsuzhou
Copy link
Member

commented Oct 9, 2019

What type of PR is this?

/kind bug
/kind cleanup

What this PR does / why we need it:

Fix potential resource leaking

Which issue(s) this PR fixes:

Fixes #74317

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @gnufied @pohly

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 9, 2019

/sig storage

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 9, 2019

/assign @msau42

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Octo/e2efix branch from f2f3394 to d340936 Oct 9, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Oct 9, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cwdsuzhou
To complete the pull request process, please assign msau42
You can assign the PR to them by writing /assign @msau42 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -286,10 +286,14 @@ func (r *genericVolumeTestResource) cleanupResource() {
}
if r.pvc != nil {
err := e2epv.DeletePersistentVolumeClaim(f.ClientSet, r.pvc.Name, f.Namespace.Name)
framework.ExpectNoError(err, "Failed to delete PVC %v", r.pvc.Name)
if err != nil {
return err

This comment has been minimized.

Copy link
@pohly

pohly Oct 9, 2019

Contributor

PR looks good overall, I just have one suggestion: instead of return err, can you use github.com/pkg/errors.Wrap (https://github.com/pkg/errors#adding-context-to-an-error) to add more information about what went wrong and where?

The error itself might just be "timeout" - without additional information, it will be impossible to tell which operation timed out.

Can you check what error reporting via framework.ExpectNoError(errors.NewAggregate(errs), "while cleaning up resource") will then print?

github.com/pkg/errors captures the full stack trace for the call chain, but it might not be visible in the output.

This comment has been minimized.

Copy link
@cwdsuzhou

cwdsuzhou Oct 10, 2019

Author Member

Due to errors.NewAggregate(errs), full stack trace can not be visible in the output. But using errors.wrap sill can help a lot, it can take a shot message when it happens.

I have updated the PR, do you have other suggestions?

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Octo/e2efix branch from d340936 to 0da4098 Oct 10, 2019
@cwdsuzhou cwdsuzhou requested a review from pohly Oct 10, 2019
@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2019

/retest

Copy link
Contributor

left a comment

Due to errors.NewAggregate(errs), full stack trace can not be visible in the output. But using errors.wrap sill can help a lot, it can take a shot message when it happens.

I wanted to see how this is reported, so I added some fake errors:

diff --git a/test/e2e/storage/testsuites/base.go b/test/e2e/storage/testsuites/base.go
index e18bcf274c..5dc63abb44 100644
--- a/test/e2e/storage/testsuites/base.go
+++ b/test/e2e/storage/testsuites/base.go
@@ -313,7 +313,7 @@ func (r *genericVolumeTestResource) cleanupResource() error {
 	if r.volume != nil {
 		r.volume.DeleteVolume()
 	}
-	return nil
+	return errors.New("fake error")
 }
 
 func createPVCPV(
diff --git a/test/e2e/storage/testsuites/multivolume.go b/test/e2e/storage/testsuites/multivolume.go
index 40ef154d94..de99ef8969 100644
--- a/test/e2e/storage/testsuites/multivolume.go
+++ b/test/e2e/storage/testsuites/multivolume.go
@@ -21,10 +21,11 @@ import (
 	"time"
 
 	"github.com/onsi/ginkgo"
+	"github.com/pkg/errors"
 
 	v1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
-	"k8s.io/apimachinery/pkg/util/errors"
+	apierrors "k8s.io/apimachinery/pkg/util/errors"
 	clientset "k8s.io/client-go/kubernetes"
 	"k8s.io/kubernetes/test/e2e/framework"
 	e2epod "k8s.io/kubernetes/test/e2e/framework/pod"
@@ -112,13 +113,15 @@ func (t *multiVolumeTestSuite) defineTests(driver TestDriver, pattern testpatter
 		var errs []error
 		for _, resource := range l.resources {
 			errs = append(errs, resource.cleanupResource())
+			errs = append(errs, errors.New("fake error for cleanupResource"))
 		}
 
 		if l.driverCleanup != nil {
 			l.driverCleanup()
 			l.driverCleanup = nil
+			errs = append(errs, errors.New("fake error for driverCleanup"))
 		}
-		framework.ExpectNoError(errors.NewAggregate(errs), "while cleanup resource")
+		framework.ExpectNoError(apierrors.NewAggregate(errs), "while cleanup resource")
 		validateMigrationVolumeOpCounts(f.ClientSet, dInfo.InTreePluginName, l.intreeOps, l.migratedOps)
 	}

This leads to:

• Failure [44.021 seconds]
[sig-storage] CSI Volumes
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/utils/framework.go:23
  [Driver: csi-hostpath]
  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/csi_volumes.go:56
    [Testpattern: Dynamic PV (default fs)] volumes
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/base.go:98
      should store data [It]
      /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/volumes.go:150

      Oct 10 18:08:53.985: while cleaning up resource
      Unexpected error:
          <*errors.fundamental | 0xc002625780>: {
              msg: "fake error",
              stack: [0x38313b3, 0x38735b6, 0x38736ce, 0x3873c3d, 0x7b19dc, 0x7b164f, 0x7b0ae4, 0x7b7ab6, 0x7b7364, 0x7bd2af, 0x7bcdc4, 0x7bc607, 0x7bec4e, 0x7c1987, 0x7c16cd, 0x395b747, 0x395e0fb, 0x509010, 0x460791],
          }
          fake error
      occurred

      /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/volumes.go:146

• Failure [62.128 seconds]
[sig-storage] CSI Volumes
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/utils/framework.go:23
  [Driver: csi-hostpath]
  /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/csi_volumes.go:56
    [Testpattern: Dynamic PV (block volmode)] multiVolume [Slow]
    /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/base.go:98
      should access to two volumes with the same volume mode and retain data across pod recreation on the same node [It]
      /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/multivolume.go:133

      Oct 10 18:21:02.942: while cleanup resource
      Unexpected error:
          <errors.aggregate | len:5, cap:8>: [
              {
                  msg: "fake error",
                  stack: [0x38313b3, 0x3857675, 0x3857f52, 0x7b19dc, 0x7b164f, 0x7b0ae4, 0x7b7ab6, 0x7b7364, 0x7bd2af, 0x7bcdc4, 0x7bc607, 0x7bec4e, 0x7c1987, 0x7c16cd, 0x395b9a7, 0x395e35b, 0x509010, 0x460791],
              },
              {
                  msg: "fake error for cleanupResource",
                  stack: [0x38576d0, 0x3857f52, 0x7b19dc, 0x7b164f, 0x7b0ae4, 0x7b7ab6, 0x7b7364, 0x7bd2af, 0x7bcdc4, 0x7bc607, 0x7bec4e, 0x7c1987, 0x7c16cd, 0x395b9a7, 0x395e35b, 0x509010, 0x460791],
              },
              {
                  msg: "fake error",
                  stack: [0x38313b3, 0x3857675, 0x3857f52, 0x7b19dc, 0x7b164f, 0x7b0ae4, 0x7b7ab6, 0x7b7364, 0x7bd2af, 0x7bcdc4, 0x7bc607, 0x7bec4e, 0x7c1987, 0x7c16cd, 0x395b9a7, 0x395e35b, 0x509010, 0x460791],
              },
              {
                  msg: "fake error for cleanupResource",
                  stack: [0x38576d0, 0x3857f52, 0x7b19dc, 0x7b164f, 0x7b0ae4, 0x7b7ab6, 0x7b7364, 0x7bd2af, 0x7bcdc4, 0x7bc607, 0x7bec4e, 0x7c1987, 0x7c16cd, 0x395b9a7, 0x395e35b, 0x509010, 0x460791],
              },
              {
                  msg: "fake error for driverCleanup",
                  stack: [0x38579d2, 0x3857f52, 0x7b19dc, 0x7b164f, 0x7b0ae4, 0x7b7ab6, 0x7b7364, 0x7bd2af, 0x7bcdc4, 0x7bc607, 0x7bec4e, 0x7c1987, 0x7c16cd, 0x395b9a7, 0x395e35b, 0x509010, 0x460791],
              },
          ]
          [fake error, fake error for cleanupResource, fake error for driverCleanup]
      occurred

      /nvme/gopath/src/k8s.io/kubernetes/test/e2e/storage/testsuites/multivolume.go:124

So indeed, we loose the stack information. But I think with informative error texts and usage of errors.Wrap that may be good enough.

defer deleteStorageClass(t.Client, class.Name)
defer func() {
err = deleteStorageClass(t.Client, class.Name)
framework.ExpectNoError(err)

This comment has been minimized.

Copy link
@pohly

pohly Oct 10, 2019

Contributor

Please add an explanation to ExpectNoError, see #34059.

While at it, perhaps also do it in the existing call above it.

l.pod = nil
}

if l.resource != nil {
l.resource.cleanupResource()
err := l.resource.cleanupResource()
errs = append(errs, err)
l.resource = nil
}

if l.driverCleanup != nil {
l.driverCleanup()

This comment has been minimized.

Copy link
@pohly

pohly Oct 10, 2019

Contributor

You would also need to change all driverCleanup calls and their implementations to return an error. Right now, those internally do the same "fail on error" as the other functions instead of returning an error. That would be the clean solution.

The quicker solution would be a tryDriverCleanup wrapper function which recovers an error in a defer and then returns that error.

deleteStorageClass(f.ClientSet, r.sc.Name)
if err := deleteStorageClass(f.ClientSet, r.sc.Name); err != nil {
return errors.Wrapf(err, "Failed to delete StorageClass %v", r.sc.Name)
}
}

// Cleanup volume for pre-provisioned volume tests
if r.volume != nil {
r.volume.DeleteVolume()

This comment has been minimized.

Copy link
@pohly

pohly Oct 10, 2019

Contributor

Same situation as with driverCleanup: DeleteVolume must either return errors or get wrapped.

This comment has been minimized.

Copy link
@cwdsuzhou

cwdsuzhou Oct 12, 2019

Author Member

done
PTAL, thanks

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Octo/e2efix branch from 0da4098 to 64fc40f Oct 12, 2019
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Oct 12, 2019
@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Octo/e2efix branch from 64fc40f to 7468b5e Oct 13, 2019
@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Octo/e2efix branch from 7468b5e to c0e346f Oct 13, 2019
@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 13, 2019

/retest

Copy link
Contributor

left a comment

Almost there... 👍

func tryFunc(f func()) error {
var err error
if f == nil {
return fmt.Errorf("nil func received")

This comment has been minimized.

Copy link
@pohly

pohly Oct 14, 2019

Contributor

If you turn this into return nil (i.e. allow tryFunc to be called with nil) then you can simplify all call locations by removing the if check.

This comment has been minimized.

Copy link
@cwdsuzhou

cwdsuzhou Oct 18, 2019

Author Member

I have removed if check

l.resource = nil
}

if l.driverCleanup != nil {
l.driverCleanup()
errs = append(errs, tryFunc(l.driverCleanup))

This comment has been minimized.

Copy link
@pohly

pohly Oct 14, 2019

Contributor

For example, this if check can then be removed.

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Octo/e2efix branch from c0e346f to 63efc2d Oct 18, 2019
@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 18, 2019

Thanks for your suggestions @pohly
Done, PTAL

@cwdsuzhou cwdsuzhou requested a review from pohly Oct 18, 2019
@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 18, 2019

/retest

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Oct 22, 2019

ping @pohly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.