Stop timer and correctly drain it #101475

Merged
merged 1 commit into from
Aug 9, 2021

Conversation

ash2k
Member

@ash2k ash2k commented Apr 26, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

Correctly drains timers acquired from the backoff manager. This API is easy to misuse; I wonder if a better one is possible, something like a function that takes a function and, after it returns, cleans things up properly. That would probably cater for only a subset of use cases, though.

Does this PR introduce a user-facing change?

NONE

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Apr 26, 2021
@@ -475,14 +475,14 @@ func (c *csiAttacher) waitForVolumeAttachDetachStatusWithLister(volumeHandle, at
		clock = &clock.RealClock{}
	)
	backoffMgr := wait.NewExponentialBackoffManager(initBackoff, maxBackoff, resetDuration, backoffFactor, jitter, clock)
	defer backoffMgr.Backoff().Stop()
Member Author

This is incorrect because it may acquire another timer before the existing one has been drained, which can happen if ctx signals done. See the BackoffManager docs:

// BackoffManager manages backoff with a particular scheme based on its underlying implementation. It provides
// an interface to return a timer for backoff, and caller shall backoff until Timer.C() drains. If the second Backoff()
// is called before the timer from the first Backoff() call finishes, the first timer will NOT be drained and result in
// undetermined behavior.
// The BackoffManager is supposed to be called in a single-threaded environment.
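
A related Go detail may help when reading the removed `defer` line: the receiver expression of a deferred method call is evaluated when the `defer` statement executes, and only the final call is postponed. The sketch below shows this with hypothetical stand-in types (`fakeManager` and `fakeTimer` are not the real `wait.BackoffManager` or `clock.Timer`), so a line of that shape acquires a timer up front and any later `Backoff()` call in the loop is a second acquisition.

```go
package main

import "fmt"

// fakeTimer and fakeManager are hypothetical stand-ins used only to count
// calls; they are not the real wait.BackoffManager or clock.Timer.
type fakeTimer struct{}

// Stop does nothing here; it exists so the defer below has something to call.
func (fakeTimer) Stop() bool { return true }

type fakeManager struct{ calls int }

// Backoff records that a timer was handed out.
func (m *fakeManager) Backoff() fakeTimer {
	m.calls++
	return fakeTimer{}
}

// callsBeforeBody reports how many timers had been acquired by the time the
// code after the defer statement runs.
func callsBeforeBody() int {
	m := &fakeManager{}
	// m.Backoff() is evaluated right here; only Stop() is postponed.
	defer m.Backoff().Stop()
	return m.calls
}

func main() {
	fmt.Println(callsBeforeBody()) // prints 1
}
```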

Contributor

Good catch. Yes, after reviewing the API this defer clearly doesn't work.

@@ -491,6 +491,7 @@ func (c *csiAttacher) waitForVolumeAttachDetachStatusWithLister(volumeHandle, at
		return nil
	}
case <-ctx.Done():
	t.Stop()
Member Author

We don't have to drain the timer's channel because the backoff manager is private to this function.

Contributor

The <-t.C() case can return; do we need to call t.Stop() if it does?

Member Author

No. It's a Timer (not a Ticker): it fires once, so there is no need to stop it after that.
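
The one-shot behaviour can be checked with a minimal sketch. It uses the standard library's `time.Timer` as a stand-in, on the assumption that the `clock.Timer` discussed here behaves the same way in this respect; `fireOnce` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// fireOnce waits for a one-shot timer and reports what happens afterwards.
func fireOnce() (stopped, firedAgain bool) {
	t := time.NewTimer(5 * time.Millisecond)
	<-t.C // the single expiry

	// The timer has already expired, so Stop has nothing left to cancel.
	stopped = t.Stop()

	// A Timer, unlike a Ticker, does not deliver a second value.
	select {
	case <-t.C:
		firedAgain = true
	case <-time.After(30 * time.Millisecond):
	}
	return stopped, firedAgain
}

func main() {
	stopped, firedAgain := fireOnce()
	fmt.Println(stopped, firedAgain) // prints: false false
}
```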

@@ -166,6 +166,9 @@ func BackoffUntil(f func(), backoff BackoffManager, sliding bool, stopCh <-chan
	// of every loop to prevent extra executions of f().
	select {
	case <-stopCh:
		if !t.Stop() {
			<-t.C()
		}
Member Author

The backoff manager was provided to this function, so the function must stop the timer and drain its channel here; otherwise the owner of the manager cannot reuse it. Even if the owner didn't care, stopping the timer earlier releases some resources, so it is beneficial.
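
A minimal sketch of stop-and-drain before reuse, written against the standard library's `time.Timer` rather than the `clock.Timer` in the diff. The helper names are hypothetical. Unlike the blocking `<-t.C()` in the hunk above, this variant uses a non-blocking receive, a deliberate substitution so the sketch does not depend on which Go release's `Stop` semantics apply.

```go
package main

import (
	"fmt"
	"time"
)

// stopAndDrain stops t and discards a pending expiry, if any, so that the
// timer can be safely Reset by whoever uses it next.
func stopAndDrain(t *time.Timer) {
	if !t.Stop() {
		// Non-blocking receive: safe whether or not a value is waiting.
		select {
		case <-t.C:
		default:
		}
	}
}

// reuseWaitsFullDelay lets a timer expire unobserved, cleans it up, reuses
// it, and reports whether the reused timer really waited for its new delay
// instead of returning a stale expiry immediately.
func reuseWaitsFullDelay() bool {
	t := time.NewTimer(time.Millisecond)
	time.Sleep(10 * time.Millisecond) // the timer expires; nobody receives

	stopAndDrain(t)

	const delay = 40 * time.Millisecond
	start := time.Now()
	t.Reset(delay)
	<-t.C
	return time.Since(start) >= delay/2
}

func main() {
	fmt.Println(reuseWaitsFullDelay()) // prints: true
}
```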

@ash2k
Member Author

ash2k commented Apr 26, 2021

/retest

@jpbetz
Contributor

jpbetz commented Apr 27, 2021

/assign @jpbetz
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 27, 2021
@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@ash2k
Member Author

ash2k commented Jul 27, 2021

/remove-lifecycle stale
/retest

ping @jpbetz

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
Contributor

@jpbetz jpbetz left a comment

One question then LGTM.

Contributor

@jpbetz jpbetz left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 27, 2021
@jpbetz
Contributor

jpbetz commented Jul 27, 2021

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 27, 2021
@jpbetz
Contributor

jpbetz commented Jul 27, 2021

@caesarxuchao, @saad-ali would either of you be willing to approve?

@caesarxuchao
Member

/lgtm
/approve

@ash2k
Member Author

ash2k commented Jul 28, 2021

/assign davidz627

@davidz627 mind taking a look?

@ash2k
Member Author

ash2k commented Aug 8, 2021

/unassign davidz627
/assign jsafrane

@jsafrane mind taking a look?

@k8s-ci-robot k8s-ci-robot assigned jsafrane and unassigned davidz627 Aug 8, 2021
@jsafrane
Member

jsafrane commented Aug 9, 2021

/approve
should be fine for CSI

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ash2k, caesarxuchao, jpbetz, jsafrane


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit a7af9f6 into kubernetes:master Aug 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 9, 2021
@ash2k ash2k deleted the ash2k/stop-timer branch August 9, 2021 23:19
@smarterclayton
Contributor

> This API is easy to misuse, I wonder if a better one is possible. Something like a function that takes a function and, after it returns, cleans things up properly. That would probably cater for only a subset of use cases though.

I am in the process of unifying and aligning some of the methods here, and I concur that clock.Timer is very dangerous and has no clear semantics for a consumer to use, because it differs significantly enough from Go's time.Timer that usage can be complicated. At a minimum, the contract of C() would have to be clarified to describe whether users can cache C(), or whether Stop() must be called only after Reset is called (as users would naively assume). The presence of BackoffManager complicates this even more - since we expose Stop() on Timer any consumer of a backoff manager can "break" the entire backoff manager.

From a design perspective the right option is probably to have the backoff manager return the duration to wait (i.e., be just a variant of backoff), and let the loop manage the timer reuse itself. I may try that along with #107826 and consider the input of #114531.
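
To make the suggested shape concrete, here is a rough sketch in which the backoff source hands out durations only and the retry loop owns its single timer. Everything in it (`doublingDelays`, `retryUntil`, `demo`) is a hypothetical illustration built on the standard library, not the API that was later adopted in the linked PRs.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// doublingDelays is a hypothetical delay source: each call returns the next
// wait, doubling up to max. It hands out durations, never timers.
func doublingDelays(initial, max time.Duration) func() time.Duration {
	next := initial
	return func() time.Duration {
		d := next
		next *= 2
		if next > max {
			next = max
		}
		return d
	}
}

// retryUntil calls attempt until it returns true or ctx is done. The loop
// owns the single timer, so no caller can stop or reset it behind its back.
func retryUntil(ctx context.Context, attempt func() bool, nextDelay func() time.Duration) error {
	var t *time.Timer
	for {
		if attempt() {
			return nil
		}
		if t == nil {
			t = time.NewTimer(nextDelay())
			defer t.Stop() // runs once, when retryUntil returns
		} else {
			// Safe to reuse: the previous expiry was received below.
			t.Reset(nextDelay())
		}
		select {
		case <-t.C:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo runs retryUntil with an attempt that succeeds on its third call.
func demo() (attempts int, err error) {
	err = retryUntil(context.Background(), func() bool {
		attempts++
		return attempts == 3
	}, doublingDelays(time.Millisecond, 4*time.Millisecond))
	return attempts, err
}

func main() {
	attempts, err := demo()
	fmt.Println(attempts, err) // prints: 3 <nil>
}
```

Because the timer never leaves `retryUntil`, the "second `Backoff()` before the first timer drains" hazard described in the BackoffManager docs has no equivalent in this shape.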

@smarterclayton
Contributor

I took some of the comments in this and the related issues and applied them in #115064 to reduce the potential for accidental misuse.
