Fix additional panic #110040
Conversation
/sig api-machinery
Hrm, unit tests seem to deadlock, investigating now.
9a780c6 to 1b60d7b (Compare)
go func() {
	<-stopCh
	t.Log("Stopping Watchers")
	m.stopWatching(int64(i))
See https://github.com/golang/go/wiki/CommonMistakes#using-reference-to-loop-iterator-variable: you need to capture the i variable with i := i.
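For anyone unfamiliar with the pitfall, here is a minimal standalone sketch (illustrative only, not code from this PR). Before Go 1.22, all goroutines spawned in a loop closed over the same loop variable, so without the re-declaration they would typically all observe its final value:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		i := i // capture a per-iteration copy; drop this line and every goroutine likely prints 3
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println(i) // with the capture: prints 0, 1, 2 in some order
		}()
	}
	wg.Wait()
}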
diff --git a/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go b/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go
index 17e5779cfcd..35fe9258bfd 100644
--- a/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go
+++ b/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go
@@ -263,11 +263,17 @@ func TestBroadcasterShutdownRace(t *testing.T) {
// eventbroadcaster, see real usage pattern in startRecordingEvents()
go func() {
<-stopCh
- t.Log("Stopping Watchers")
+ t.Logf("Stopping Watchers %d %d", int64(i), i)
m.stopWatching(int64(i))
}()
}
See the output below: with testWatchers = 2, both goroutines log the final value of i (2), confirming they share the loop variable:
=== RUN TestBroadcasterShutdownRace
mux_test.go:266: Stopping Watchers 2 2
mux_test.go:266: Stopping Watchers 2 2
--- PASS: TestBroadcasterShutdownRace (0.00s)
Ahh right, I forgot about the var capture, that always gets me :)
It deadlocks if you send an event in the meantime:
diff --git a/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go b/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go
index 17e5779cfcd..a5549abbdf8 100644
--- a/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go
+++ b/staging/src/k8s.io/apimachinery/pkg/watch/mux_test.go
@@ -255,6 +255,7 @@ func TestBroadcasterShutdownRace(t *testing.T) {
// Add a bunch of watchers
const testWatchers = 2
for i := 0; i < testWatchers; i++ {
+ i := i
_, err := m.Watch()
if err != nil {
t.Fatalf("Unable start event watcher: '%v' (will not retry!)", err)
@@ -268,6 +269,12 @@ func TestBroadcasterShutdownRace(t *testing.T) {
}()
}
+ event := Event{Type: Added, Object: &myType{"foo", "hello world"}}
+ err := m.Action(event.Type, event.Object)
+ if err != nil {
+ t.Fatalf("error sending event: %v", err)
+ }
+
// Manually simulate m.Shutdown() but change it to force a race scenario
// 1. Close watcher stopchannel, so watchers are closed independently of the
// eventBroadcaster
@@ -115,6 +115,8 @@ func (obj functionFakeRuntimeObject) DeepCopyObject() runtime.Object {
// won't ever see that event, and will always see any event after they are
// added.
func (m *Broadcaster) blockQueue(f func()) {
I think that this is the problem; we should rethink this to avoid the blockQueue "hack" mentioned in the comment.
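To make the mechanism concrete, here is a self-contained toy model of the pattern under discussion (names and sizes are illustrative, not the real mux.go types): blockQueue wraps a function, pushes it onto the incoming queue, and blocks until the single consumer goroutine has executed it, which is how watcher registration stays ordered relative to events already in the queue:

package main

import (
	"fmt"
	"sync"
)

type broadcaster struct {
	incomingBlock sync.Mutex // guards sends on incoming (the lock this PR adds in the real code)
	incoming      chan func()
}

func (b *broadcaster) blockQueue(f func()) {
	b.incomingBlock.Lock()
	defer b.incomingBlock.Unlock()
	var wg sync.WaitGroup
	wg.Add(1)
	b.incoming <- func() {
		defer wg.Done()
		f()
	}
	wg.Wait() // return only after the consumer has run f
}

func main() {
	b := &broadcaster{incoming: make(chan func(), 25)}
	go func() { // stand-in for the distribution loop
		for fn := range b.incoming {
			fn()
		}
	}()
	b.blockQueue(func() { fmt.Println("watcher added") })
	fmt.Println("everything queued before blockQueue returned has been processed")
}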
Agreed, I'm looking at how we can get rid of this, thanks for the article!
I think the thing I need to work through is how we can ensure that watchers don't accidentally receive events that were sent on m.incoming before they were added, and why exactly we care about that.
I made this diagram to try and help myself figure out this whole flow.
@aojea - I think it deadlocks because: (a) watchers have a channel size of 0 (so sends are synchronous), and (b) nothing is reading from the watchers. I actually think this change is correct (though I agree we should simplify it, I would prefer to first fix it and then simplify). Unless you disagree that it's correct.
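A standalone illustration of the failure mode described above (toy code, not the mux types): a send on an unbuffered channel can never complete without a reader, so a goroutine that sends to an abandoned watcher blocks forever:

package main

import (
	"fmt"
	"time"
)

func main() {
	watcherCh := make(chan string) // capacity 0, like the watcher channels in the test
	sent := make(chan struct{})
	go func() {
		watcherCh <- "event" // blocks forever: no reader ever arrives
		close(sent)
	}()
	select {
	case <-sent:
		fmt.Println("event delivered")
	case <-time.After(100 * time.Millisecond):
		fmt.Println("send still blocked: this is the shape of the deadlock")
	}
}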
There was a race causing a panic when shutting down an eventbroadcaster and its associated watchers. This test exposes it. Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
Ensure we take the incomingBlock lock in blockQueue so there is no possibility of sending on a closed incoming channel. Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
I verified this claim (bumped the channel size) and updated the unit test.
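For reference, a trivial sketch of that verification (sizes are illustrative): giving the channel any buffer capacity lets a send complete even when no reader is active, which removes the blocking behavior:

package main

import "fmt"

func main() {
	ch := make(chan int, 1) // capacity 1 instead of 0
	ch <- 42                // completes immediately; no reader required
	fmt.Println(<-ch)
}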
Made one more pass trying to find a potential problem or deadlock, but it looks fine. The functions passed to blockQueue never take the lock themselves, and the closing of channels looks fine.
/lgtm
/priority important-soon
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: astoycos, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/triage accepted
What type of PR is this?
/kind bug
What this PR does / why we need it:
Take the incomingBlock lock in blockQueue to ensure there is no possibility of sending on a closed incoming channel if Shutdown() closes it unexpectedly.
In #108080 we missed locking the new incomingBlock in the blockQueue function, resulting in a race, as shown in #109975 (comment).
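To make the hazard and the fix concrete, here is a minimal self-contained reproduction (names like queue, send, and shutdown are illustrative, not the real mux.go API): an unguarded ch <- v races with close(ch) and can panic with "send on closed channel", while serializing both sides behind one mutex plus a stopped flag, analogous to what incomingBlock does here, removes the race:

package main

import (
	"fmt"
	"sync"
)

type queue struct {
	mu      sync.Mutex
	stopped bool
	ch      chan int
}

func (q *queue) send(v int) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.stopped {
		return fmt.Errorf("already shut down")
	}
	q.ch <- v // safe: shutdown cannot close q.ch while we hold the lock
	return nil
}

func (q *queue) shutdown() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.stopped = true
	close(q.ch)
}

func main() {
	q := &queue{ch: make(chan int, 25)}
	go func() {
		for range q.ch { // drain until closed
		}
	}()
	_ = q.send(1)
	q.shutdown()
	fmt.Println(q.send(2)) // cleanly rejected instead of panicking
}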
I would still like to reproduce this in a unit test but have been unable to do so yet; I'm posting this now to start the conversation on how best to cover this failure mode in testing.
cc @aojea @wojtek-t
Which issue(s) this PR fixes:
Fixes #108518
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: