fix a bug in cancellation delayer #343

songgao · 2016-09-13T21:55:28Z

When EnableDelayedCancellationWithGracePeriod is called a second time after the grace period from the previous call has already elapsed, it returns context.Canceled even if the parent context has not been canceled yet.

This is an issue for e.g. Flush(), since we enable the context delayer in libfuse and the enable function is called again in finalizeMDWriteLocked. Against a real server over the network, this can cause a lot of cancellations for slow writes.

Resolves keybase/client#4254 (hopefully)
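To make the difference concrete, here is a minimal, deterministic sketch of the two behaviors described above. It is a toy model, not the libkbfs code: `toyState`, `buggyEnable`, `fixedEnable`, and `slowWriteOutcome` are hypothetical names introduced only for illustration, and the 2s/5s timings are made up.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// toyState is a hypothetical, simplified model of what a cancellation
// delayer might track. It is NOT the libkbfs data structure.
type toyState struct {
	grace            time.Duration
	lastEnabledAt    time.Time
	parentCanceledAt time.Time // zero value means "parent not canceled"
}

// buggyEnable models the pre-fix behavior: the grace period is measured
// from the previous Enable call, regardless of the parent context.
func buggyEnable(s toyState, now time.Time) error {
	if now.After(s.lastEnabledAt.Add(s.grace)) {
		return context.Canceled
	}
	return nil
}

// fixedEnable models the intended behavior: the grace period starts only
// once the parent context has been canceled.
func fixedEnable(s toyState, now time.Time) error {
	if s.parentCanceledAt.IsZero() {
		return nil // parent still alive, so nothing has started counting
	}
	if now.After(s.parentCanceledAt.Add(s.grace)) {
		return context.Canceled
	}
	return nil
}

// slowWriteOutcome evaluates both models for a second Enable call made 5s
// after the first, with a 2s grace period and a parent that is still alive.
func slowWriteOutcome() (buggyFails, fixedFails bool) {
	start := time.Date(2016, 9, 13, 0, 0, 0, 0, time.UTC)
	s := toyState{grace: 2 * time.Second, lastEnabledAt: start}
	now := start.Add(5 * time.Second)
	return buggyEnable(s, now) != nil, fixedEnable(s, now) != nil
}

func main() {
	buggy, fixed := slowWriteOutcome()
	fmt.Println("buggy model rejects second Enable:", buggy)
	fmt.Println("fixed model rejects second Enable:", fixed)
}
```

In this model a slow write that calls Enable again after the grace period is rejected by the buggy variant and accepted by the fixed one, which matches the Flush() scenario described above.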


strib · 2016-09-13T23:57:35Z

libkbfs/delayed_cancellation_test.go

+	if err != nil {
+		t.Fatalf("1st EnableDelayedCancellationWithGracePeriod failed: %v", err)
+	}
+	time.Sleep(5 * time.Millisecond) // make sure async ops sorts out


You know what I'm going to say about this. Is there a way we can get rid of the Sleep? <-ctx.Done()?

Hmm, I can't think of a better way to do this here. The goal of the sleep is to make the test fail if the grace period incorrectly starts in EnableDelayedCancellationWithGracePeriod, as it did before this PR. In that (incorrect) case, once the "async stuff" sorts itself out, the next call to EnableDelayedCancellationWithGracePeriod should fail. Without the sleep, there's a chance that the next line (the 2nd call to EnableDelayedCancellationWithGracePeriod) would run before the cancellation delayer updates its internal state, which would make the 2nd call succeed. Does this make sense?

That said, it's unlikely this bug will ever happen again, so I think we could also just remove this test entirely. What do you think?

I'm ok with removing this test, because it doesn't test anything. The sleep is meaningless because there is nothing asynchronous that matters for the test.

But I think if you changed the test to one that canceled the context, and then made sure the second call got a proper error, that would be a worthwhile test.

I think the async delay comes from sync.Cond and time.Timer. I tried removing the time.Sleep call and the test did not fail for the previous commit (which started grace period in the Enable function). A 1 millisecond sleep managed to fail it though. But I'm worried it's gonna take longer on CI.

My point is that in this PR, with the time.Timer gone, there is no need for the time.Sleep, it is just wasteful. So I think you're better off with a different test that actually checks the cancellation behavior.

Ah I see! Sounds good will do!
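A test along the lines suggested in this thread (cancel the parent, then check that a later Enable call gets a proper error) could be sketched as follows. This is a self-contained toy, not the test that was eventually committed: `toyDelayer`, `newToyDelayer`, `Enable`, `Done`, and `runScenario` are hypothetical names, and the real delayer's internals may differ. It waits on a channel rather than calling time.Sleep, so the check does not depend on timing.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// toyDelayer is a hypothetical stand-in for a cancellation delayer.
type toyDelayer struct {
	mu      sync.Mutex
	grace   time.Duration
	expired bool
	done    chan struct{} // closed once the grace period has run out
}

func newToyDelayer(parent context.Context) *toyDelayer {
	d := &toyDelayer{done: make(chan struct{})}
	go func() {
		<-parent.Done() // the grace period starts only here
		d.mu.Lock()
		grace := d.grace
		d.mu.Unlock()
		<-time.After(grace)
		d.mu.Lock()
		d.expired = true
		d.mu.Unlock()
		close(d.done)
	}()
	return d
}

// Enable succeeds until the parent is canceled AND the grace period is over.
func (d *toyDelayer) Enable(grace time.Duration) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.expired {
		return context.Canceled
	}
	d.grace = grace
	return nil
}

// Done is closed when the delayed cancellation finally takes effect.
func (d *toyDelayer) Done() <-chan struct{} { return d.done }

// runScenario enables twice while the parent is alive, cancels the parent,
// waits for the grace period to end, and then tries to enable once more.
func runScenario() (firstOK, secondOK, thirdCanceled bool) {
	parent, cancel := context.WithCancel(context.Background())
	d := newToyDelayer(parent)
	firstOK = d.Enable(time.Millisecond) == nil
	secondOK = d.Enable(time.Millisecond) == nil
	cancel()
	<-d.Done() // no Sleep: block until the grace period has expired
	thirdCanceled = d.Enable(time.Millisecond) == context.Canceled
	return
}

func main() {
	first, second, third := runScenario()
	fmt.Println(first, second, third)
}
```

Because `expired` is set before `done` is closed, the final Enable call is guaranteed to observe the expired state once `<-d.Done()` returns.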

strib · 2016-09-13T23:58:12Z

Looks good to me mod comment, thanks!

strib · 2016-09-14T00:09:59Z

libkbfs/delayed_cancellation_test.go

+		t.Fatalf("1st EnableDelayedCancellationWithGracePeriod failed: %v", err)
+	}
+	time.Sleep(5 * time.Millisecond) // make sure async ops sorts out
+	// parent context is not canceled; second "enable" should succeed even it's


Actually I don't really understand this test. The grace period never started because cancel() hasn't been called yet, right? And if it was after the grace period, you would get back a context.Canceled error, right?

You are right. That's what this PR fixes. Before this PR, the grace period started in EnableDelayedCancellationWithGracePeriod, so a second call made after the grace period would fail. This PR makes sure the grace period only starts after the parent context is canceled. If the parent is never canceled, the grace period never starts, so the 2nd call to EnableDelayedCancellationWithGracePeriod should still succeed. The added test makes sure that, once a grace period's worth of time has passed, calling EnableDelayedCancellationWithGracePeriod a 2nd time still succeeds as long as the parent context is not canceled.

(I tested to make sure the previous delayed_cancellation.go fails this test)

fix a bug in cancellation delayer where second Enable call failed if …

1381d9a

…it was past grace period, even if parent ctx was not canceled yet

songgao assigned strib Sep 13, 2016

strib reviewed Sep 13, 2016

strib reviewed Sep 14, 2016

songgao force-pushed the songgao/KBFS-1519 branch from f296dad to 185a4be Compare September 14, 2016 03:28

make test useful ...

185a4be

songgao merged commit 33dae26 into master Sep 14, 2016

songgao deleted the songgao/KBFS-1519 branch September 14, 2016 03:49
