-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock in keepSafe #369
Conversation
destination/keepsafe.go
Outdated
@@ -64,8 +65,6 @@ func (k *keepSafe) GetAll() [][]byte { | |||
} | |||
|
|||
func (k *keepSafe) Stop() { | |||
k.Lock() | |||
close(k.closed) | |||
k.closed = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it safe to do un-synchronized writes and reads on the same variable? i believe with go's memory model, the answer is no.
i would keep the lock around the write (or if you want to be really fancy, do an atomic operation on an int, but that seems overkill)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's not safe heh. I was thinking atomic. We can't keep the write locks around it, that is what is causing the deadlock. The other goroutine will never acquire the lock, because goroutine 1 won't release it, because it is waiting for gourtine 2 to return and decrement the waitgroup.
I guess we could just restrict it to the write though, good point.
071cd38
to
eaa8ccc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. but you can get rid of the select and just for range over the ticker channel directly
couldnt the race here be more easily solved by just removing the locks from Stop()? |
It should not be called twice, but I'm not as familiar with this code. The current change is safer though, otherwise if our assumption of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really happy with this approach, i think we need to just use a simple channel that is closed when stop is called.
The main problems i have are:
-
using a "shutdown" chan is the most common way of dealing with "plugins" like this, and is how we do this in just about every other piece of code we write. It is completely acceptable for a panic to arise if "Stop()" is called multiple times (the panic will be due to calling close() on an already closed channel). There is also an expectation that "plugins" wont be used after stop() has been called.
-
This approach results in an additional delay of up to
k.periodKeep
between when "Stop()" is called and when the "keepClean()" goroutine exits. A quick look over the code suggests that this would be a pretty bad thing.
https://github.com/graphite-ng/carbon-relay-ng/blob/master/destination/destination.go#L295
https://github.com/graphite-ng/carbon-relay-ng/blob/master/destination/destination.go#L304
the call to conn.clearRedo()
calls c.keepSafe.Stop()
.
With the approach used in this PR, a delay of up to 10seconds will be added when connections drop and and reconnect.
As long as we are fine with panics I can change it back to using a channel
Yes, I did think about that. I wasn't sure if it would be a problem or not.
Fair enough |
250225e
to
df70463
Compare
df70463
to
3e3fc6b
Compare
No description provided.