Skip to content
This repository has been archived by the owner on Feb 12, 2019. It is now read-only.

fix a bug in cancellation delayer #343

Merged
merged 2 commits into from
Sep 14, 2016
Merged

fix a bug in cancellation delayer #343

merged 2 commits into from
Sep 14, 2016

Conversation

songgao
Copy link
Contributor

@songgao songgao commented Sep 13, 2016

When calling EnableDelayedCancellationWithGracePeriod the second time, if it's already past grace period since last time EnableDelayedCancellationWithGracePeriod was called, a context.Canceled would be returned even if parent context has not been canceled yet.

This is an issue for e.g. Flush() since we enable the context delayer in libfuse, and the enable function is called again in finalizeMDWriteLocked. For a real server over network, this can cause a lot of cancellations for slow writes.

Resolves keybase/client#4254 (hopefully)

…it was past grace period, even if parent ctx was not canceled yet
if err != nil {
t.Fatalf("1st EnableDelayedCancellationWithGracePeriod failed: %v", err)
}
time.Sleep(5 * time.Millisecond) // make sure async ops sorts out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what I'm going to say about this. Is there a way we can get rid of the Sleep? <-ctx.Done()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. I can't think of a better way to do this here. So the goal of sleep is let it fail if grace period incorrectly starts in EnableDelayedCancellationWithGracePeriod, like before this PR. In this (incorrect) case, after the "async stuff" sorts out, the next call to EnableDelayedCancellationWithGracePeriod should fail. If we don't do this sleep, there's a chance that next line (2nd call to EnableDelayedCancellationWithGracePeriod) would happen the cancellation delayer updates its internal states, which make the 2nd call succeeds. Does this make sense?

Copy link
Contributor Author

@songgao songgao Sep 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's unlikely this bug will ever happen again. So I think we can also just remove this entirely. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with removing this test, because it don't test anything. The sleep is meaningless because there is nothing asynchronous that matters for the test.

But I think if you changed the test to one that canceled the context, and then made sure the second call got a proper error, that would be a worthwhile test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the async delay comes from sync.Cond and time.Timer. I tried removing the time.Sleep call and the test did not fail for the previous commit (which started grace period in the Enable function). A 1 millisecond sleep managed to fail it though. But I'm worried it's gonna take longer on CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that in this PR, with the time.Timer gone, there is no need for the time.Sleep, it is just wasteful. So I think you're better off with a different test that actually checks the cancellation behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see! Sounds good will do!

@strib
Copy link
Contributor

strib commented Sep 13, 2016

Looks good to me mod comment, thanks!

t.Fatalf("1st EnableDelayedCancellationWithGracePeriod failed: %v", err)
}
time.Sleep(5 * time.Millisecond) // make sure async ops sorts out
// parent context is not canceled; second "enable" should succeed even it's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't really understand this test. The grace period never started because cancel() hasn't been called yet, right? And if it was after the grace period, you would get back a context.Canceled error, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. That's what this PR fixes. Before this PR, grace period started in EnableDelayedCancellationWithGracePeriod. So the second time would fail since it's after the grace period. This PR makes sure the grace period only happens after parent context is canceled. So if it's never canceled, the grace period never starts, so 2nd call to EnableDelayedCancellationWithGracePeriod should still succeed. This added test makes sure after grace period and calling EnableDelayedCancellationWithGracePeriod for the 2nd time, it still succeeds as long as the parent context is not canceled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I tested to make sure the previous delayed_cancellation.go fails this test)

@songgao songgao merged commit 33dae26 into master Sep 14, 2016
@songgao songgao deleted the songgao/KBFS-1519 branch September 14, 2016 03:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants