proxy_store: improve error handling #107
Signed-off-by: Silvio Moioli <silvio@moioli.net>
LGTM for debugging purposes
```go
if status, ok := item.Object.(*metav1.Status); ok {
	logrus.Debugf("WatchNames received error: %s", status.Message)
} else {
	logrus.Debugf("WatchNames received error: %s", item)
}
```
I don't think `watch.Event` has a `String()` method, so this likely would have failed `go vet` and will result in an improperly formatted log message. Did you want `%v` instead of `%s`?
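As a hedged illustration of the reviewer's point, the sketch below formats a hypothetical `fakeEvent` struct (a plain struct with no `String()` method standing in for `watch.Event`; it is not the real type) with `%s`, `%v` and `%+v`. With `%s`, `fmt` applies the verb to each field, so a non-string field is rendered as a `%!s(...)` error marker, while `%v` prints every field in its default format. What the real `watch.Event` prints depends on its actual field types and is not shown here.

```go
package main

import "fmt"

// fakeEvent is a hypothetical stand-in for watch.Event: a plain struct with
// no String() method, so fmt formats it field by field.
type fakeEvent struct {
	Type string
	Code int
}

// render formats e with the given verb. The format string is assembled at
// run time only so that this demo itself is not rejected by go vet.
func render(verb string, e fakeEvent) string {
	return fmt.Sprintf("WatchNames received error: "+verb, e)
}

func main() {
	e := fakeEvent{Type: "ERROR", Code: 42}
	fmt.Println(render("%s", e))  // WatchNames received error: {ERROR %!s(int=42)}
	fmt.Println(render("%v", e))  // WatchNames received error: {ERROR 42}
	fmt.Println(render("%+v", e)) // WatchNames received error: {Type:ERROR Code:42}
}
```

`%+v` is often the more useful choice in debug logs, since it also prints field names.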
It looks like the CI validate step is broken at the moment?
This prevents a goroutine leak when `item.Object` is a `runtime.Object` but not a `metav1.Object`: in that case, `WatchNames`’s `for` loop quits early and subsequent calls to `returnErr` remain parked forever. This helps with rancher/rancher#41225. Fuller explanation in rancher#107.
This draft PR aims to prove a theory about the cause of a goroutine leak observed by one Rancher user after the 2.7.1 to 2.7.2 update.
This is one of the potential causes of SURE-6256, aka rancher/rancher#41225, which is currently under investigation.
The main symptom is many goroutines parked at `returnErr`.
Theory: `returnErr` waits on `c <-` because nobody ever reads from `c`:
`steve/pkg/stores/proxy/proxy_store.go`, lines 258 to 259 in 3d3cc77
In most cases, `c` is read by a tight `for` loop, which guarantees that `returnErr` will eventually return. But in one case, an error condition can make that loop return before the channel `c` is closed:

`steve/pkg/stores/proxy/proxy_store.go`, lines 363 to 367 in 3d3cc77
(Note: `WatchNames` gets `c` from `watch`, which creates it and passes it to `listAndWatch`, which in turn calls `returnErr`. Hence, `WatchNames`’s `for` loop reads from the channel `returnErr` writes to.)

If an `item.Object` coming out of `c` is a `runtime.Object` but not a `metav1.Object`, then `WatchNames`’s `for` loop will quit early, nobody reads from `c` any longer, and the next call to `returnErr` will block forever on its send to `c`. Nor is `c` ever closed: `c` is closed by `watch`, which waits for `listAndWatch` to terminate, which in turn waits for `returnErr` to terminate, which never happens because nobody reads from `c`. This deadlock causes a goroutine leak, and possibly an associated memory leak.

Note that the suspect `if` statement that terminates the loop was indeed introduced in 2.7.2, specifically: 60d234d#diff-82b79befb09b4c7baa9a34e0412389dc2be544a3e56086726419bbf939a9e9b9R347-R350
CC @rmweir