Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto retire with shutdown on Windows #505

Merged
merged 2 commits into from
Jul 25, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions wix/wrapper/wrapper_windows.go
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,13 @@ type handler struct {
// normal log: 2017/01/24 14:14:27 INFO <main> Starting mackerel-agent version:0.36.0
var logRe = regexp.MustCompile(`^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?:\S+\.go:\d+: )?([A-Z]+) `)

func (h *handler) retire() error {
dir := execdir()
cmd := exec.Command(filepath.Join(dir, "mackerel-agent.exe"), "retire", "--force")
cmd.Dir = dir
return cmd.Run()
}

func (h *handler) start() error {
procAllocConsole.Call()
dir := execdir()
Expand Down Expand Up @@ -214,6 +221,11 @@ func (h *handler) stop() error {
return nil
}

func autoRetire() bool {
env := os.Getenv("MACKEREL_AUTO_RETIREMENT")
return env != "" && env != "0"
}

// implement https://godoc.org/golang.org/x/sys/windows/svc#Handler
func (h *handler) Execute(args []string, r <-chan svc.ChangeRequest, s chan<- svc.Status) (svcSpecificEC bool, exitCode uint32) {
s <- svc.Status{State: svc.StartPending}
Expand Down Expand Up @@ -251,6 +263,13 @@ L:
if err := h.stop(); err != nil {
h.elog.Error(stopEid, err.Error())
s <- svc.Status{State: svc.Running, Accepts: svc.AcceptStop | svc.AcceptShutdown}
} else {
if req.Cmd == svc.Shutdown && autoRetire() {
if err := h.retire(); err != nil {
h.elog.Error(stopEid, err.Error())
s <- svc.Status{State: svc.Running, Accepts: svc.AcceptShutdown | svc.AcceptShutdown}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

svc.AcceptShutdown | svc.AcceptShutdown can be replaced with single svc.AcceptShutdown ? (Or typo for svc.AcceptStop | svc.AcceptShutdown ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks your reviewing this. The duplicates must be removed. Will do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in this case mackerel-agent itself has been stopped successfully, status reported here should not be svc.Running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If mackerel-agent did not accept the shutdown command unintentionally, how should behave?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, This part report status of current condition. If this return Running, service manager will send svc.Shutdown again in later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If mackerel-agent did not accept the shutdown command unintentionally, how should behave?

It's better to retry retiring, but should not block the shutdown process of Windows too long.

If this return Running, service manager will send svc.Shutdown again in later.

I didn't know that. Thank you to inform.

Overall it looks good to set svc.Running to tell service manager to send svc.Shutdown again, but I'm wondering what happens when Windows shutdown is cancelled during this retry.
If Windows shutdown might be cancelled, mackerel-agent service will be left as "Running", but actually mackerel-agent is not running anymore. If that may occurs we should avoid, but if not it looks safe to set svc.Running. Do you know whether such situation may happen or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows service manager has a fixed timeout value (20 seconds) to wait for shutdown. When it goes over 20 seconds, the OS forcibly terminates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it goes over 20 seconds, the OS forcibly terminates.

Understand. Thank you for explanation!

From my comment #505 (comment):

If Windows shutdown might be cancelled, mackerel-agent service will be left as "Running", but actually mackerel-agent is not running anymore. If that may occurs we should avoid, but if not it looks safe to set svc.Running. Do you know whether such situation may happen or not?

What do you think about this? I imagine following process may happen:

  1. Shutdown started
  2. mackerel-agent service tries to stop, and…
    • mackerel-agent itself stopped
    • mackerel-agent retire failed
  3. Shutdown cancelled by some reason
    • shutdown -a, from Task Manager, or something else

In this case, will mackerel-agent service be "running" but agent is stopped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there seems to be more falls possibles.

  1. timeout while stopping agent
  2. failed to spawn mackerel-agent retire -force
  3. shutdown was canceled (as you mentioned)
  4. failed to send request of retire

I would like to make these be same as UNIX's one as possible. h.stop wait 10 seconds until terminate of mackerel-agent. If timed out, mackerel-agent will be terminated forcibly. So 1 should not be considered. 2 also not.

When failed to spawn mackerel-agent retire -force or the command exited with non-zero value, systemctl return error. I'm thinking this is same behavior as UNIX's one.

The case that shutdown is canceled (3), the session can only be used during the time-out period. So not a really cancelation. So I'm thinking terminating mackerel-agent is right thinkg.

4 possibly be occured on some environment. But we don't have idea for recover.

Either way, mackerel-agent will be detected errors by server caused by regularly posts stopped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, understand.

}
}
}
}
case <-exit:
Expand Down