Elastic Agent service fails to start on windows with default installation #26213

Closed
alvarolobato opened this issue Jun 8, 2021 · 22 comments · Fixed by #26665
Labels: Agent, Team:Elastic-Agent

@alvarolobato

For confirmed bugs, please report:

  • Version: 7.13
  • Operating System: Windows Server 2008 R2 Standard and Windows 10
  • Steps to Reproduce:

These are the steps I followed in Windows 10:

  • Download the agent
  • Run the enroll command
  • The agent enrolls successfully but stays in an "updating" state and never leaves it
C:\Users\XXXX\Downloads\elastic-agent-7.13.0-windows-x86_64>.\elastic-agent.exe install -f --url=https://6b744b5715d84795bd1e89dc17bbe068.fleet.eu-west-1.aws.found.io:443 --enrollment-token=TUVqNXRIa0J4TENCejVWcXdqcG86VTZaYUVTdWpSX3ExbUFXeTlPT2IzZw==
The Elastic Agent is currently in BETA and should not be used in production

2021-05-29T05:04:41.693+0200    INFO    cmd/enroll_cmd.go:201   Elastic Agent might not be running; unable to trigger restart
2021-05-29T05:04:41.701+0200    INFO    cmd/enroll_cmd.go:203   Successfully triggered restart on running Elastic Agent.
Successfully enrolled the Elastic Agent.
Elastic Agent has been successfully installed.
  • If I go to services, the service has been installed but it's not running.
  • If I try to start it, it terminates unexpectedly with error 1067

There are no logs in the agent's folder, except for a new elastic-agent- file that is created every time I try to start the service. It contains only this line (the Spanish system message reads "The system cannot find the file specified."):

 2021-05-29T05:10:45.146+0200	ERROR	cmd/watch.go:61	failed to load markeropen C:\Program Files\Elastic\Agent\data\.update-marker: El sistema no puede encontrar el archivo especificado.

If I run the agent from the command line it works correctly and the status in Kibana changes to healthy

2021-05-29T05:14:47.263+0200	INFO	warn/warn.go:18	The Elastic Agent is currently in BETA and should not be used in production
2021-05-29T05:14:47.281+0200	INFO	application/application.go:68	Detecting execution mode
2021-05-29T05:14:47.287+0200	INFO	application/application.go:93	Agent is managed by Fleet
2021-05-29T05:14:47.289+0200	INFO	capabilities/capabilities.go:59	capabilities file not found in C:\Program Files\Elastic\Agent\capabilities.yml
2021-05-29T05:14:48.085+0200	INFO	[composable]	composable/controller.go:46	EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
2021-05-29T05:14:48.225+0200	INFO	[composable.providers.docker]	docker/docker.go:43	Docker provider skipped, unable to connect: protocol not available
2021-05-29T05:14:48.233+0200	INFO	[api]	api/server.go:62	Starting stats endpoint
2021-05-29T05:14:48.233+0200	INFO	application/managed_mode.go:291	Agent is starting
2021-05-29T05:14:48.233+0200	INFO	[api]	api/server.go:64	Metrics endpoint listening on: \\.\pipe\elastic-agent (configured: npipe:///elastic-agent)
2021-05-29T05:14:48.345+0200	WARN	application/managed_mode.go:304	failed to ack update open C:\Program Files\Elastic\Agent\data\.update-marker: El sistema no puede encontrar el archivo especificado.
2021-05-29T05:14:50.299+0200	INFO	stateresolver/stateresolver.go:48	New State ID is VoHXoJOY
2021-05-29T05:14:50.299+0200	INFO	stateresolver/stateresolver.go:49	Converging state requires execution of 3 step(s)
2021-05-29T05:14:55.086+0200	INFO	cmd/run.go:189	Shutting down Elastic Agent and sending last events...
2021-05-29T05:14:55.086+0200	INFO	operation/operator.go:191	waiting for installer of pipeline 'default' to finish
2021-05-29T05:15:03.393+0200	ERROR	status/reporter.go:236	Elastic Agent status changed to: 'error'
2021-05-29T05:15:03.393+0200	ERROR	fleet/fleet_gateway.go:180	failed to dispatch actions, error: operator: failed to execute step sc-run, error: context canceled: context canceled
2021-05-29T05:15:03.393+0200	INFO	process/app.go:176	Signaling application to stop because of shutdown: filebeat--7.13.0
2021-05-29T05:15:03.405+0200	INFO	application/managed_mode.go:320	Agent is stopped
2021-05-29T05:15:03.406+0200	INFO	cmd/run.go:197	Shutting down completed.
2021-05-29T05:15:03.415+0200	INFO	[api]	api/server.go:66	Stats endpoint (\\.\pipe\elastic-agent) finished: use of closed network connection

I found a workaround: open the service's properties and, instead of using the local account, enter the credentials for the Administrator user. The service then starts and works as expected.

@alvarolobato added the Agent and Team:Elastic-Agent labels Jun 8, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@blakerouse
Contributor

Error 1067 is reported when the process terminates unexpectedly.

How are you performing the run of the Elastic Agent that did work? From inside the Program Files directory?

@alvarolobato
Author

@blakerouse If I run it from inside the Program Files directory it works. It also works if I go to the service configuration and change the account to Administrator in session data.

@blakerouse
Contributor

@alvarolobato "Change the account to Administrator in session data"? I'm confused by that comment. Are you saying it works when you change the user that starts the service?

@alvarolobato
Author

Yes, change the user that starts the service.

@blakerouse
Contributor

@alvarolobato You changed it to "Administrator" and it worked, but running as "SYSTEM" did not? Just want to confirm?

@alvarolobato
Author

alvarolobato commented Jun 14, 2021

@blakerouse Ah, sorry, I missed this. Correct, it didn't work when running with the default settings the service is installed with.

@alvarolobato
Author

alvarolobato commented Jun 14, 2021

I've tried putting it back to local system account and restarting and it fails to start again, as originally.

@blakerouse
Contributor

I just installed Elastic Agent 7.13.2 on Windows Server 2008 R2 Enterprise (the only 2008 Windows server I could find). After updating the root CAs (they are so old they are out of date), I was able to download Elastic Agent and install it.

Installation was successful and the service started as expected.

@blakerouse
Contributor

@alvarolobato Can you attach the following output (the csv file) so I can review the permissions:

From PowerShell (as Administrator):

Get-ChildItem 'C:\Program Files\Elastic\Agent' | get-acl | export-csv C:\agent-perms.csv
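
A minimal sketch of how the CSV exported by the command above could be scanned for entries that the SYSTEM account cannot access. It assumes the export contains `PSChildName` and `AccessToString` columns and possibly a leading `#TYPE` line (as Windows PowerShell writes by default); the function name `entries_missing_system` is hypothetical and not part of any Elastic tooling.

```python
import csv
import io

# Account the service runs as by default, per the discussion in this thread.
SYSTEM_ACCOUNT = r"NT AUTHORITY\SYSTEM"


def entries_missing_system(csv_text):
    """Return the names of entries whose ACL text never mentions SYSTEM.

    Assumption: the CSV has 'PSChildName' and 'AccessToString' columns.
    """
    # Windows PowerShell's Export-Csv may emit a "#TYPE ..." first line; drop it.
    if csv_text.startswith("#TYPE"):
        _, _, csv_text = csv_text.partition("\n")

    # newline="" keeps multi-line quoted fields (one ACE per line) intact.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    missing = []
    for row in reader:
        access = row.get("AccessToString") or ""
        if SYSTEM_ACCOUNT.lower() not in access.lower():
            missing.append(row.get("PSChildName") or row.get("Path") or "?")
    return missing
```

To use it on the exported file, read `C:\agent-perms.csv` as text (for example with `open(path, newline="", encoding="utf-8-sig").read()`) and pass the result to the function. The check is only a string match on the account name, so it shows whether SYSTEM appears in an ACL, not which rights it was granted.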

@cehaletx

cehaletx commented Jun 16, 2021

Output from the following:
Get login

  .\Administrator

Get acl

Directory: C:\Program Files\Elastic

Path   Owner                   Access
Agent  BUILTIN\Administrators  NT SERVICE\TrustedInstaller Allow FullControl...

PS C:\AccessChk> ./accesschk.exe "NT AUTHORITY\SYSTEM" -q -d "C:\Program Files\Elastic\Agent"

Accesschk v6.13 - Reports effective permissions for securable objects
Copyright © 2006-2020 Mark Russinovich
Sysinternals - www.sysinternals.com

RW C:\Program Files\Elastic\Agent

PS C:\AccessChk> .\accesschk.exe -ucqv "Elastic Agent"

Accesschk v6.13 - Reports effective permissions for securable objects
Copyright © 2006-2020 Mark Russinovich
Sysinternals - www.sysinternals.com

Elastic Agent
  Medium Mandatory Level (Default) [No-Write-Up]
  RW NT AUTHORITY\SYSTEM
        SERVICE_ALL_ACCESS
  RW BUILTIN\Administrators
        SERVICE_ALL_ACCESS
  R  NT AUTHORITY\INTERACTIVE
        SERVICE_QUERY_STATUS
        SERVICE_QUERY_CONFIG
        SERVICE_INTERROGATE
        SERVICE_ENUMERATE_DEPENDENTS
        SERVICE_USER_DEFINED_CONTROL
        READ_CONTROL
  R  NT AUTHORITY\SERVICE
        SERVICE_QUERY_STATUS
        SERVICE_QUERY_CONFIG
        SERVICE_INTERROGATE
        SERVICE_ENUMERATE_DEPENDENTS
        SERVICE_USER_DEFINED_CONTROL
        READ_CONTROL

@cehaletx

cehaletx commented Jun 16, 2021

agent-perms.csv
@blakerouse Here's the output from the last request. Also note that the Elastic subdirectory has all the same permissions as C:\Program Files.

@alvarolobato
Author

@cehaletx

@michalpristas I added some of the data you asked for. I also tried running the agent in debug mode from the CLI; here is the output log. Nothing special: Fleet shows the agent as healthy, but there is still no data:
debug.txt

@elastic elastic deleted a comment from cehaletx Jun 16, 2021
@blakerouse
Contributor

blakerouse commented Jun 17, 2021

In both cases it looks like the permissions are wrong for the following files:

  • elastic-agent.yml
  • fleet.yml
  • fleet.yml.old

My thought here is that maybe you started Elastic Agent from the extracted directory and then proceeded with the installation, and that is what caused the issue?

Can you try to perform the following to see if it fixes it:

rm elastic-agent.yml fleet.yml fleet.yml.old

Rename the backup elastic-agent.yml.$timestamp.yml to elastic-agent.yml.

Then re-run the enroll command (elastic-agent.exe enroll ...) from Kibana.

Do this all inside of the C:\Program Files\Elastic\Agent directory.

Then start the service from services.msc and see if it works now.
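
The file-handling part of the steps above could be sketched roughly as follows. This is only an illustration: the function name `restore_backup_config` is hypothetical, the glob pattern assumes the backup is named `elastic-agent.yml.<timestamp>.yml`, and picking the most recently modified backup when several exist is an assumption. Re-enrolling and starting the service are left as manual steps.

```python
from pathlib import Path

# Files the comment above asks to remove before restoring the backup.
STALE_FILES = ("elastic-agent.yml", "fleet.yml", "fleet.yml.old")


def restore_backup_config(agent_dir):
    """Remove the stale config files and restore elastic-agent.yml from a backup."""
    agent_dir = Path(agent_dir)

    # Look for backups first so nothing is deleted if there is nothing to restore.
    backups = sorted(
        agent_dir.glob("elastic-agent.yml.*.yml"),
        key=lambda p: p.stat().st_mtime,
    )
    if not backups:
        raise FileNotFoundError(f"no elastic-agent.yml backup found in {agent_dir}")

    for name in STALE_FILES:
        # missing_ok requires Python 3.8+; a file that is already gone is fine.
        (agent_dir / name).unlink(missing_ok=True)

    target = agent_dir / "elastic-agent.yml"
    backups[-1].rename(target)  # assumption: the newest backup is the one to restore
    return target
```

It would be run from an elevated session against C:\Program Files\Elastic\Agent, followed by the enroll command and a service start from services.msc.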

@alvarolobato
Author

Thanks @blakerouse, adding the system permissions to those files fixed it for me, without having to enroll again.

@alvarolobato
Author

It stopped again. I'll do the full test you are asking for and come back here later.

@alvarolobato
Author

@blakerouse I followed the steps above and got the same result. The two YAML files lost the system permissions and the service fails to start.

Find the ACLs attached.

If I add the permissions to the files it still fails. It never actually ran before; it just took a while to fail.
I deleted all the log files to capture new ones, but no log files are generated when the agent runs from the service as SYSTEM. The folders are still there and have the right permissions.
I've sent you a Process Monitor log matching processes named elastic and beat in case it helps.

agent-perms.csv

@alvarolobato
Author

@blakerouse let me know if you need anything else from me

@blakerouse
Contributor

@alvarolobato When you say it lost the permissions again, do you mean a re-install removed them, or that the service is running and it's resetting the permissions?

@alvarolobato
Author

alvarolobato commented Jun 28, 2021

@blakerouse the re-install removed the permissions.

@blakerouse
Contributor

Okay, I think Elastic Agent needs to improve how it sets ACLs to ensure that the SYSTEM user is granted access to those files.

Even better, I think the Elastic Agent install process should recursively reset the permissions on all files under C:\Program Files\Elastic\Agent so that only SYSTEM and Administrators have access. That would solve this issue and other issues where permissions are not correct on some files.
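
As a rough illustration of the recursive reset proposed above (not the change that was eventually merged, and not Elastic Agent code), the sketch below builds two `icacls` invocations: one that stops inheritance on the agent directory and grants full control to SYSTEM and Administrators by their well-known SIDs, and one that resets everything beneath it to inherit from that directory. The function names are hypothetical. Explicit entries for other accounts on the root directory, if any, would still have to be removed separately.

```python
import subprocess

AGENT_DIR = r"C:\Program Files\Elastic\Agent"
SYSTEM_SID = "*S-1-5-18"      # NT AUTHORITY\SYSTEM (icacls expects a leading '*' for SIDs)
ADMINS_SID = "*S-1-5-32-544"  # BUILTIN\Administrators


def build_reset_commands(agent_dir=AGENT_DIR):
    """Return the icacls argument lists for the reset, without running them."""
    full_control = ":(OI)(CI)F"  # full control, inherited by files and subfolders
    return [
        # Root: remove inherited entries, replace the grants for both principals.
        ["icacls", agent_dir, "/inheritance:r",
         "/grant:r", SYSTEM_SID + full_control, ADMINS_SID + full_control],
        # Children: replace their ACLs with the ones inherited from the root.
        ["icacls", agent_dir + r"\*", "/reset", "/T", "/C"],
    ]


def apply_reset(agent_dir=AGENT_DIR):
    """Run the commands; only meaningful on Windows from an elevated session."""
    for cmd in build_reset_commands(agent_dir):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to print and review the exact `icacls` calls before anything is changed on disk.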
