-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: shutdown only once #1912
fix: shutdown only once #1912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @acud and @esadakar)
@@ -698,6 +701,15 @@ func NewBee(addr string, swarmAddress swarm.Address, publicKey ecdsa.PublicKey, | |||
func (b *Bee) Shutdown(ctx context.Context) error { | |||
var mErr error | |||
|
|||
// if a shutdown is already in process, return here | |||
b.shutdownMutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use sync.Once
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is the handling in pidKiller. We need to know if the shutdown is already in progress so we don't kill the process. So if the Shutdown call triggered this from the postage listener we don't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could still use sync.Once
:)
solves #1909 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @aloknerurkar and @mrekucci)
pkg/node/node.go, line 715 at r1 (raw file):
// tryClose is a convenient closure which decrease // repetitive io.Closer tryClose procedure. tryClose := func(c io.Closer, errMsg string) {
This can be factored out at relocated before the Do
call.
pkg/node/node.go, line 106 at r2 (raw file):
listenerCloser io.Closer postageServiceCloser io.Closer shutdownInProgress bool
You don't need this one anymore.
pkg/node/node.go, line 107 at r2 (raw file):
postageServiceCloser io.Closer shutdownInProgress bool shutdowner sync.Once
By convention, should called shutdownOnce
or so.
pkg/node/node.go, line 106 at r2 (raw file): Previously, mrekucci (Peter Mrekaj) wrote…
We do actually. This got deleted during the rebase but we need to check for this flag in Shutdown and return an error to prevent force kill. |
pkg/node/node.go, line 705 at r1 (raw file): Previously, acud (acud) wrote…
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @acud)
pkg/node/node.go, line 106 at r2 (raw file):
Previously, aloknerurkar (Alok Nerurkar) wrote…
We do actually. This got deleted during the rebase but we need to check for this flag in Shutdown and return an error to prevent force kill.
Ok, because I don't see it used the way you've described. This variable is just set in the Do
method call but never checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aloknerurkar)
pkg/node/node.go, line 710 at r3 (raw file):
var mErr error if b.shutdownInProgress {
I don't see why this is bool check is needed, this is why we use sync.Once
here. Plus, you have no assurances against two goroutines calling shutdown and both passing this stage of the code. Since there's no mutex to protect the bool, both goroutines might end up calling the sync.Once
. Obviously it will execute only once due to its atomicity, but it does not eliminate the data race.
pkg/node/node.go, line 869 at r3 (raw file):
// we need to see if a shutdown is already in progress, so we dont do a force // kill here if errors.Is(err, ErrShutdownInProgress) {
if the error is swallowed, I'm not sure it is really useful. I think this can be simplified a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aloknerurkar)
pkg/node/node.go, line 869 at r3 (raw file): Previously, acud (acud) wrote…
We need to know if a Shutdown was already triggered to prevent doing a pid Kill. I think sync.Once is not the right thing to use here. The simpler mutex check was better. How are you thinking of simplifying this? |
pkg/node/node.go, line 710 at r3 (raw file): Previously, acud (acud) wrote…
Please check the other comment. I think it's better to revert to before I used sync.Once. We need to explicitly know if Shutdown was triggered or not. sync.Once is not of use here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @acud, @mrekucci, and @zelig)
pkg/node/node.go, line 710 at r3 (raw file):
Previously, aloknerurkar (Alok Nerurkar) wrote…
Please check the other comment. I think it's better to revert to before I used sync.Once. We need to explicitly know if Shutdown was triggered or not. sync.Once is not of use here.
If you want to avoid mutexes and you don't find sync.Once
to be a good fit for this case, then consider using atomic.Bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @acud, @aloknerurkar, @mrekucci, and @zelig)
pkg/node/node.go, line 869 at r3 (raw file):
Previously, aloknerurkar (Alok Nerurkar) wrote…
We need to know if a Shutdown was already triggered to prevent doing a pid Kill. I think sync.Once is not the right thing to use here. The simpler mutex check was better.
How are you thinking of simplifying this?
You're using an error to swallow an error here. A few thoughts:
- on the first call to
sync.Once
you get the actual error to be returned (mErr
), the second call returns an error which is swallowed here, so maybe an error is not the right thing to return. perhaps abool
indicating shutdown in progress? in case you choose to return just an error fromShutdown
, why not propagate it to the caller? - this implementation still does not guard the possibility that two goroutines execute shutdown at the same time. both will end up calling the shutdown sequence and the bool would still be racy. the second goroutine will then return error is
nil
, and will send aSIGKILL
to the process which will then result in a hard exit of the process, potentially resulting in a corrupt state of the storage
pkg/node/node.go, line 710 at r3 (raw file): Previously, mrekucci (Peter Mrekaj) wrote…
Seems like atomic bool would be an overkill here. Simple mutex should be good enough. There is no performance required in this path. I have reverted the changes to before I used sync.Once. |
pkg/node/node.go, line 869 at r3 (raw file): Previously, acud (acud) wrote…
Done. Reverted use of sync.Once. Using the previous approach now. This seems way easier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @acud from 2 discussions.
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @acud, @mrekucci, and @zelig)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aloknerurkar)
pkg/node/node.go, line 870 at r4 (raw file):
// kill here if errors.Is(err, ErrShutdownInProgress) { return nil
I think you should still return the error here, why swallow it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aloknerurkar)
pkg/node/node.go, line 874 at r4 (raw file):
return err } ps, err := os.FindProcess(syscall.Getpid())
I wonder, if this goroutine asked for the shutdown, but then an error occurred during shutdown (not ErrShutdownInProgress
), does this mean that the process will hang forever? (since this line is never reached)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @acud from a discussion.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @aloknerurkar)
pkg/node/node.go, line 870 at r4 (raw file):
Previously, acud (acud) wrote…
I think you should still return the error here, why swallow it?
Done.
I added this to handle the specific case of duplicate shutdown from the postage listener. I did not want to unnecessarily log this as an error. But now I think it will be better to return this and let the caller handle it.
pkg/node/node.go, line 874 at r4 (raw file):
Previously, acud (acud) wrote…
I wonder, if this goroutine asked for the shutdown, but then an error occurred during shutdown (not
ErrShutdownInProgress
), does this mean that the process will hang forever? (since this line is never reached)
This is trying to kill the start process. If we fail for some reason, a force kill (Ctrl-C twice) will still kill the original process which is waiting for OS signals.
Reverted sync.Once changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @aloknerurkar)
Why not just use SIGINT, and We cannot send SIGKILL to init process (pid 1) (when we run in docker, bee is init process) @aloknerurkar
|
@firesWu This doesnt work on windows. Need to see if this is that big of an issue that we include platform-based handling. |
This change is