Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: shutdown only once #1912

Merged
merged 5 commits into from
Jun 4, 2021
Merged

fix: shutdown only once #1912

merged 5 commits into from
Jun 4, 2021

Conversation

aloknerurkar
Copy link
Contributor

@aloknerurkar aloknerurkar commented May 28, 2021

This change is Reviewable

Copy link
Contributor

@mrekucci mrekucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @acud and @esadakar)

@@ -698,6 +701,15 @@ func NewBee(addr string, swarmAddress swarm.Address, publicKey ecdsa.PublicKey,
func (b *Bee) Shutdown(ctx context.Context) error {
var mErr error

// if a shutdown is already in process, return here
b.shutdownMutex.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use sync.Once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is the handling in pidKiller. We need to know if the shutdown is already in progress so we don't kill the process. So if the Shutdown call triggered this from the postage listener we don't do anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could still use sync.Once :)

@Eknir
Copy link
Contributor

Eknir commented May 28, 2021

solves #1909

@aloknerurkar aloknerurkar added the ready for review The PR is ready to be reviewed label May 31, 2021
Copy link
Contributor

@mrekucci mrekucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @aloknerurkar and @mrekucci)


pkg/node/node.go, line 715 at r1 (raw file):

	// tryClose is a convenient closure which decrease
	// repetitive io.Closer tryClose procedure.
	tryClose := func(c io.Closer, errMsg string) {

This can be factored out at relocated before the Do call.


pkg/node/node.go, line 106 at r2 (raw file):

	listenerCloser           io.Closer
	postageServiceCloser     io.Closer
	shutdownInProgress       bool

You don't need this one anymore.


pkg/node/node.go, line 107 at r2 (raw file):

	postageServiceCloser     io.Closer
	shutdownInProgress       bool
	shutdowner               sync.Once

By convention, should called shutdownOnce or so.

@mrekucci mrekucci self-requested a review June 1, 2021 13:22
@aloknerurkar
Copy link
Contributor Author


pkg/node/node.go, line 106 at r2 (raw file):

Previously, mrekucci (Peter Mrekaj) wrote…

You don't need this one anymore.

We do actually. This got deleted during the rebase but we need to check for this flag in Shutdown and return an error to prevent force kill.

@aloknerurkar
Copy link
Contributor Author


pkg/node/node.go, line 705 at r1 (raw file):

Previously, acud (acud) wrote…

you could still use sync.Once :)

Done.

@aloknerurkar aloknerurkar requested a review from acud June 1, 2021 14:58
Copy link
Contributor

@mrekucci mrekucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @acud)


pkg/node/node.go, line 106 at r2 (raw file):

Previously, aloknerurkar (Alok Nerurkar) wrote…

We do actually. This got deleted during the rebase but we need to check for this flag in Shutdown and return an error to prevent force kill.

Ok, because I don't see it used the way you've described. This variable is just set in the Do method call but never checked.

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aloknerurkar)


pkg/node/node.go, line 710 at r3 (raw file):

	var mErr error

	if b.shutdownInProgress {

I don't see why this is bool check is needed, this is why we use sync.Once here. Plus, you have no assurances against two goroutines calling shutdown and both passing this stage of the code. Since there's no mutex to protect the bool, both goroutines might end up calling the sync.Once. Obviously it will execute only once due to its atomicity, but it does not eliminate the data race.


pkg/node/node.go, line 869 at r3 (raw file):

		// we need to see if a shutdown is already in progress, so we dont do a force
		// kill here
		if errors.Is(err, ErrShutdownInProgress) {

if the error is swallowed, I'm not sure it is really useful. I think this can be simplified a bit

Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aloknerurkar)

@aloknerurkar
Copy link
Contributor Author


pkg/node/node.go, line 869 at r3 (raw file):

Previously, acud (acud) wrote…

if the error is swallowed, I'm not sure it is really useful. I think this can be simplified a bit

We need to know if a Shutdown was already triggered to prevent doing a pid Kill. I think sync.Once is not the right thing to use here. The simpler mutex check was better.

How are you thinking of simplifying this?

@aloknerurkar
Copy link
Contributor Author


pkg/node/node.go, line 710 at r3 (raw file):

Previously, acud (acud) wrote…

I don't see why this is bool check is needed, this is why we use sync.Once here. Plus, you have no assurances against two goroutines calling shutdown and both passing this stage of the code. Since there's no mutex to protect the bool, both goroutines might end up calling the sync.Once. Obviously it will execute only once due to its atomicity, but it does not eliminate the data race.

Please check the other comment. I think it's better to revert to before I used sync.Once. We need to explicitly know if Shutdown was triggered or not. sync.Once is not of use here.

@mrekucci mrekucci requested review from acud and zelig June 2, 2021 10:57
Copy link
Contributor

@mrekucci mrekucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @acud, @mrekucci, and @zelig)


pkg/node/node.go, line 710 at r3 (raw file):

Previously, aloknerurkar (Alok Nerurkar) wrote…

Please check the other comment. I think it's better to revert to before I used sync.Once. We need to explicitly know if Shutdown was triggered or not. sync.Once is not of use here.

If you want to avoid mutexes and you don't find sync.Once to be a good fit for this case, then consider using atomic.Bool.

@mrekucci mrekucci self-requested a review June 2, 2021 10:57
Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @acud, @aloknerurkar, @mrekucci, and @zelig)


pkg/node/node.go, line 869 at r3 (raw file):

Previously, aloknerurkar (Alok Nerurkar) wrote…

We need to know if a Shutdown was already triggered to prevent doing a pid Kill. I think sync.Once is not the right thing to use here. The simpler mutex check was better.

How are you thinking of simplifying this?

You're using an error to swallow an error here. A few thoughts:

  • on the first call to sync.Once you get the actual error to be returned (mErr), the second call returns an error which is swallowed here, so maybe an error is not the right thing to return. perhaps a bool indicating shutdown in progress? in case you choose to return just an error from Shutdown, why not propagate it to the caller?
  • this implementation still does not guard the possibility that two goroutines execute shutdown at the same time. both will end up calling the shutdown sequence and the bool would still be racy. the second goroutine will then return error is nil, and will send a SIGKILL to the process which will then result in a hard exit of the process, potentially resulting in a corrupt state of the storage

@aloknerurkar
Copy link
Contributor Author


pkg/node/node.go, line 710 at r3 (raw file):

Previously, mrekucci (Peter Mrekaj) wrote…

If you want to avoid mutexes and you don't find sync.Once to be a good fit for this case, then consider using atomic.Bool.

Seems like atomic bool would be an overkill here. Simple mutex should be good enough. There is no performance required in this path. I have reverted the changes to before I used sync.Once.

@aloknerurkar
Copy link
Contributor Author


pkg/node/node.go, line 869 at r3 (raw file):

Previously, acud (acud) wrote…

You're using an error to swallow an error here. A few thoughts:

  • on the first call to sync.Once you get the actual error to be returned (mErr), the second call returns an error which is swallowed here, so maybe an error is not the right thing to return. perhaps a bool indicating shutdown in progress? in case you choose to return just an error from Shutdown, why not propagate it to the caller?
  • this implementation still does not guard the possibility that two goroutines execute shutdown at the same time. both will end up calling the shutdown sequence and the bool would still be racy. the second goroutine will then return error is nil, and will send a SIGKILL to the process which will then result in a hard exit of the process, potentially resulting in a corrupt state of the storage

Done.

Reverted use of sync.Once. Using the previous approach now. This seems way easier.

Copy link
Contributor Author

@aloknerurkar aloknerurkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @acud from 2 discussions.
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @acud, @mrekucci, and @zelig)

acud
acud previously requested changes Jun 4, 2021
Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aloknerurkar)


pkg/node/node.go, line 870 at r4 (raw file):

		// kill here
		if errors.Is(err, ErrShutdownInProgress) {
			return nil

I think you should still return the error here, why swallow it?

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aloknerurkar)


pkg/node/node.go, line 874 at r4 (raw file):

		return err
	}
	ps, err := os.FindProcess(syscall.Getpid())

I wonder, if this goroutine asked for the shutdown, but then an error occurred during shutdown (not ErrShutdownInProgress), does this mean that the process will hang forever? (since this line is never reached)

Copy link
Contributor Author

@aloknerurkar aloknerurkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @acud from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @aloknerurkar)


pkg/node/node.go, line 870 at r4 (raw file):

Previously, acud (acud) wrote…

I think you should still return the error here, why swallow it?

Done.

I added this to handle the specific case of duplicate shutdown from the postage listener. I did not want to unnecessarily log this as an error. But now I think it will be better to return this and let the caller handle it.


pkg/node/node.go, line 874 at r4 (raw file):

Previously, acud (acud) wrote…

I wonder, if this goroutine asked for the shutdown, but then an error occurred during shutdown (not ErrShutdownInProgress), does this mean that the process will hang forever? (since this line is never reached)

This is trying to kill the start process. If we fail for some reason, a force kill (Ctrl-C twice) will still kill the original process which is waiting for OS signals.

@aloknerurkar aloknerurkar requested a review from acud June 4, 2021 12:29
@aloknerurkar aloknerurkar dismissed stale reviews from mrekucci and acud June 4, 2021 12:30

Reverted sync.Once changes

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @aloknerurkar)

@aloknerurkar aloknerurkar merged commit ad201cd into master Jun 4, 2021
@aloknerurkar aloknerurkar deleted the shutdown.0 branch June 4, 2021 14:08
@firesWu
Copy link

firesWu commented Jun 4, 2021

Why not just use SIGINT, and We cannot send SIGKILL to init process (pid 1) (when we run in docker, bee is init process) @aloknerurkar
example:

func (p *pidKiller) Shutdown(ctx context.Context) error {
	ps, err := os.FindProcess(syscall.Getpid())
	if err != nil {
		return err
	}
	return ps.Signal(os.Interrupt)
}

@aloknerurkar
Copy link
Contributor Author

@firesWu
From the godoc https://pkg.go.dev/os#Process.Signal

This doesnt work on windows. Need to see if this is that big of an issue that we include platform-based handling.
@acud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull-request ready for review The PR is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants