Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto retire with shutdown on Windows #505

Merged
merged 2 commits into from
Jul 25, 2018
Merged

Conversation

mattn
Copy link
Contributor

@mattn mattn commented Jul 13, 2018

On UNIX, mackerel-agent make the host retired when AUTO_RETIREMENT=1. This change do the same on Windows with shutdown if MACKEREL_AUTO_RETIREMENT Is set 1. You need to set MACKEREL_AUTO_RETIREMENT
, you need it on system property of my computer. This is useful to avandon the VM host on EC2.

Do retire with shutdown if $MACKEREL_AUTORETIREMENT is set.
if req.Cmd == svc.Shutdown && autoRetire() {
if err := h.retire(); err != nil {
h.elog.Error(stopEid, err.Error())
s <- svc.Status{State: svc.Running, Accepts: svc.AcceptShutdown | svc.AcceptShutdown}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

svc.AcceptShutdown | svc.AcceptShutdown can be replaced with single svc.AcceptShutdown ? (Or typo for svc.AcceptStop | svc.AcceptShutdown ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks your reviewing this. The duplicates must be removed. Will do

if req.Cmd == svc.Shutdown && autoRetire() {
if err := h.retire(); err != nil {
h.elog.Error(stopEid, err.Error())
s <- svc.Status{State: svc.Running, Accepts: svc.AcceptShutdown | svc.AcceptShutdown}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in this case mackerel-agent itself has been stopped successfully, status reported here should not be svc.Running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If mackerel-agent did not accept the shutdown command unintentionally, how should behave?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, This part report status of current condition. If this return Running, service manager will send svc.Shutdown again in later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If mackerel-agent did not accept the shutdown command unintentionally, how should behave?

It's better to retry retiring, but should not block the shutdown process of Windows too long.

If this return Running, service manager will send svc.Shutdown again in later.

I didn't know that. Thank you to inform.

Overall it looks good to set svc.Running to tell service manager to send svc.Shutdown again, but I'm wondering what happens when Windows shutdown is cancelled during this retry.
If Windows shutdown might be cancelled, mackerel-agent service will be left as "Running", but actually mackerel-agent is not running anymore. If that may occurs we should avoid, but if not it looks safe to set svc.Running. Do you know whether such situation may happen or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows service manager has a fixed timeout value (20 seconds) to wait for shutdown. When it goes over 20 seconds, the OS forcibly terminates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it goes over 20 seconds, the OS forcibly terminates.

Understand. Thank you for explanation!

From my comment #505 (comment):

If Windows shutdown might be cancelled, mackerel-agent service will be left as "Running", but actually mackerel-agent is not running anymore. If that may occurs we should avoid, but if not it looks safe to set svc.Running. Do you know whether such situation may happen or not?

What do you think about this? I imagine following process may happen:

  1. Shutdown started
  2. mackerel-agent service tries to stop, and…
    • mackerel-agent itself stopped
    • mackerel-agent retire failed
  3. Shutdown cancelled by some reason
    • shutdown -a, from Task Manager, or something else

In this case, will mackerel-agent service be "running" but agent is stopped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there seems to be more falls possibles.

  1. timeout while stopping agent
  2. failed to spawn mackerel-agent retire -force
  3. shutdown was canceled (as you mentioned)
  4. failed to send request of retire

I would like to make these be same as UNIX's one as possible. h.stop wait 10 seconds until terminate of mackerel-agent. If timed out, mackerel-agent will be terminated forcibly. So 1 should not be considered. 2 also not.

When failed to spawn mackerel-agent retire -force or the command exited with non-zero value, systemctl return error. I'm thinking this is same behavior as UNIX's one.

The case that shutdown is canceled (3), the session can only be used during the time-out period. So not a really cancelation. So I'm thinking terminating mackerel-agent is right thinkg.

4 possibly be occured on some environment. But we don't have idea for recover.

Either way, mackerel-agent will be detected errors by server caused by regularly posts stopped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, understand.

@daiksy daiksy merged commit c667dc2 into mackerelio:master Jul 25, 2018
@mattn mattn deleted the auto-retire branch July 25, 2018 03:09
@susisu susisu mentioned this pull request Jul 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants