-
Notifications
You must be signed in to change notification settings - Fork 204
Refactor to optimise for Windows service manager checkin #9672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
05457bc to
a93cab0
Compare
|
This pull request is now in conflicts. Could you fix it? 🙏 |
Co-authored-by: Ben McNichols <mcnichols@gmail.com>
…c-agent into play/init-svc-start
|
This pull request is now in conflicts. Could you fix it? 🙏 |
| // couldNotConnect is the errno for ERROR_FAILED_SERVICE_CONTROLLER_CONNECT. | ||
| const couldNotConnect syscall.Errno = 1063 | ||
|
|
||
| type beatService struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code change: bringin in dependency
| // After this is run, the service is considered by the OS to be stopped. | ||
| // This must be the first deferred cleanup task (last to execute). | ||
| defer func() { | ||
| agentservice.NotifyTermination() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code change: calling internal implementation
| agentservice.WaitExecutionDone() | ||
| }() | ||
|
|
||
| service.BeforeRun() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are original
| // or more contributor license agreements. Licensed under the Elastic License 2.0; | ||
| // you may not use this file except in compliance with the Elastic License 2.0. | ||
|
|
||
| package agentrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code change: run -> agentrun to help with alphabetical ordering during init
| "github.com/elastic/elastic-agent/testing/integration" | ||
| ) | ||
|
|
||
| func TestInitOrderNotDegraded(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code change: added test
not entirely happy with testing criteriabut at least it checks it is called during init and not at the end of it
|
This pull request is now in conflicts. Could you fix it? 🙏 |
|
💛 Build succeeded, but was flaky
Failed CI StepsHistory
|
|
This pull request is now in conflicts. Could you fix it? 🙏 |
|
@michalpristas do we still need to pursue the changes here? Or can we close? |
|
My preference would be to close and go with Blakes proposal as it is less fragile, more deterministic |


This PR is huge but it does just few things.
It does not use
elastic-agent-libsfor callingProcessWindowsControlEvents.ProcessWindowsControlEventscreates an objects that serves as an service instance and communicates with service manager. But having this in libs has an effect of runninginitsection of wholeelastic-agent-libsdependency tree before registering with a service.It splits
agent/cmdpackage into sub packages isolating each command. Service Manager communication is then moved outside tointernal/pkg/agent/agentserviceas described in first step. Having this split and service in a package deliberately named makes init section ofagentservicebeing called sooner during initialization.In
agentserviceinit service we addWaitGroupwhen spinning upProcessWindowsControlEventsgoroutine. This blocks loading of subsequent packages and avoids possibility of starving ourselves of resources (when new goroutines are started later, subprocesses...). We cannot guarantee ordering of goroutines and that this one will be up in time. This is best effort of achieving that.It's essential to have a package named like this because of the way how init section loading is implemented (alphabetical sorting plays a significant role)
Result of this is that we not just moved communication a way sooner to init section (normally end of init due to big dependency tree)
But also moved it before proceeding with initialization of
This makes separation of otel into custom binary not critical for issues related to windows service manager communication timeouts.
To make it easier I added comment with a
code changeas these are not that frequentRelated #4971