Fix deadlock in keepSafe #369

robert-milan · 2019-06-17T14:32:40Z

No description provided.

Dieterbe · 2019-06-17T16:06:21Z

destination/keepsafe.go

@@ -64,8 +65,6 @@ func (k *keepSafe) GetAll() [][]byte {
 }

 func (k *keepSafe) Stop() {
-	k.Lock()
-	close(k.closed)
+	k.closed = true


Is it safe to do unsynchronized writes and reads on the same variable? I believe that under Go's memory model the answer is no.
I would keep the lock around the write (or, if you want to be really fancy, do an atomic operation on an int, but that seems like overkill).
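
For reference, a minimal sketch of the "atomic operation on an int" option. The `stopFlag` type and its method names are hypothetical and not part of keepsafe.go; the point is only that every read and write of the flag goes through `sync/atomic`, so no mutex is needed.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// stopFlag is a hypothetical stand-in for the k.closed field.
// All reads and writes of v go through sync/atomic.
type stopFlag struct {
	v int32
}

// set marks the flag as closed.
func (f *stopFlag) set() { atomic.StoreInt32(&f.v, 1) }

// isSet reports whether set has been called.
func (f *stopFlag) isSet() bool { return atomic.LoadInt32(&f.v) == 1 }

func main() {
	var f stopFlag
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		// Poll the flag, yielding to other goroutines between checks.
		for !f.isSet() {
			runtime.Gosched()
		}
	}()

	f.set()
	wg.Wait()
	fmt.Println("flag observed:", f.isSet())
}
```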

Yeah, it's not safe, heh. I was thinking atomic. We can't keep the write lock around all of Stop(); that is what is causing the deadlock. The other goroutine will never acquire the lock, because goroutine 1 won't release it, because it is waiting for goroutine 2 to return and decrement the waitgroup.

I guess we could just restrict it to the write though, good point.

Dieterbe

Looks good, but you can get rid of the select and just for range over the ticker channel directly.

woodsaj · 2019-06-17T20:13:37Z

Couldn't the race here be more easily solved by just removing the locks from Stop()?
I don't see any reason why the lock was originally being acquired before closing the k.closed chan and calling wg.Wait(). Stop() should only ever be called once.

robert-milan · 2019-06-18T04:39:20Z

> Couldn't the race here be more easily solved by just removing the locks from Stop()?
> I don't see any reason why the lock was originally being acquired before closing the k.closed chan and calling wg.Wait(). Stop() should only ever be called once.

It should not be called twice, but I'm not as familiar with this code. The current change is safer though; otherwise, if our assumption that Stop() is never called twice is wrong, we will just get a panic.

woodsaj

I am not really happy with this approach. I think we need to just use a simple channel that is closed when Stop() is called.
The main problems I have are:

1. Using a "shutdown" chan is the most common way of dealing with "plugins" like this, and is how we do this in just about every other piece of code we write. It is completely acceptable for a panic to arise if Stop() is called multiple times (the panic will be due to calling close() on an already closed channel). There is also an expectation that "plugins" won't be used after Stop() has been called.
2. This approach results in an additional delay of up to k.periodKeep between when Stop() is called and when the keepClean() goroutine exits. A quick look over the code suggests that this would be a pretty bad thing:

   https://github.com/graphite-ng/carbon-relay-ng/blob/master/destination/destination.go#L295

   https://github.com/graphite-ng/carbon-relay-ng/blob/master/destination/destination.go#L304

   The call to conn.clearRedo() calls c.keepSafe.Stop().

With the approach used in this PR, a delay of up to 10 seconds will be added when connections drop and reconnect.

robert-milan · 2019-06-19T11:03:44Z

> I am not really happy with this approach. I think we need to just use a simple channel that is closed when Stop() is called.
> The main problems I have are:
>
> Using a "shutdown" chan is the most common way of dealing with "plugins" like this, and is how we do this in just about every other piece of code we write. It is completely acceptable for a panic to arise if Stop() is called multiple times (the panic will be due to calling close() on an already closed channel). There is also an expectation that "plugins" won't be used after Stop() has been called.

As long as we are fine with panics, I can change it back to using a channel.

> This approach results in an additional delay of up to k.periodKeep between when Stop() is called and when the keepClean() goroutine exits. A quick look over the code suggests that this would be a pretty bad thing.

Yes, I did think about that. I wasn't sure if it would be a problem or not.

> https://github.com/graphite-ng/carbon-relay-ng/blob/master/destination/destination.go#L295
>
> https://github.com/graphite-ng/carbon-relay-ng/blob/master/destination/destination.go#L304
>
> The call to conn.clearRedo() calls c.keepSafe.Stop().
>
> With the approach used in this PR, a delay of up to 10 seconds will be added when connections drop and reconnect.

Fair enough.

robert-milan changed the title ~~Fix deadlock in keepSafe~~ [WIP] Fix deadlock in keepSafe Jun 17, 2019

robert-milan changed the title ~~[WIP] Fix deadlock in keepSafe~~ Fix deadlock in keepSafe Jun 17, 2019

robert-milan requested review from Dieterbe, DanCech and replay June 17, 2019 14:45

Dieterbe reviewed Jun 17, 2019

robert-milan force-pushed the fix-deadlock branch from 071cd38 to eaa8ccc Compare June 17, 2019 16:18

Dieterbe approved these changes Jun 17, 2019

robert-milan requested a review from woodsaj June 19, 2019 06:29

woodsaj requested changes Jun 19, 2019

robert-milan force-pushed the fix-deadlock branch from 250225e to df70463 Compare June 19, 2019 11:25

Fix deadlock in keepSafe

3e3fc6b

robert-milan force-pushed the fix-deadlock branch from df70463 to 3e3fc6b Compare June 19, 2019 11:27

robert-milan requested review from woodsaj and Dieterbe June 19, 2019 17:01

Dieterbe approved these changes Jun 19, 2019

robert-milan merged commit 44034df into master Jun 19, 2019

Dieterbe deleted the fix-deadlock branch June 19, 2019 18:03
