[Elastic Agent] Agent fails to start when installed through proxy #27114

Closed
lthach2 opened this issue Jul 28, 2021 · 15 comments
Labels
Agent, bug, Team:Elastic-Agent

Comments

@lthach2

lthach2 commented Jul 28, 2021

Attempting to install Elastic Agent via a proxy results in the agent service being installed, but it fails to start. The host then appears in the Fleet web UI but stays stuck on "Updating." When installing without a proxy, the agent service starts successfully and the host shows as "Healthy" in the Fleet web UI.

  • Elastic Version: 7.14.0-SNAPSHOT
  • Agent Version:
    Binary: 7.14.0 (build: 331b419 at 2021-07-22 02:24:06 +0000 UTC)
    Daemon: 7.14.0 (build: 331b419 at 2021-07-22 02:24:06 +0000 UTC)
  • Operating System: Windows 10
  • Steps to Reproduce:
  1. Set up a proxy server if one is not available (I set up a simple Squid proxy)

  2. Install the agent using the proxy-url flag and specify the proxy URL

  3. Agent should successfully enroll and appear in the Fleet UI but show as "Updating"

  4. Check whether Elastic Agent service is running. It will show as Stopped.

  5. Attempting to manually start the agent results in this error:
    PS C:\Program Files\Elastic\Agent> .\elastic-agent.exe restart
    Error: Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing open \\\\.\\pipe\\elastic-agent-system: The system cannot find the file specified."

The elastic-agent-json.log located in C:\Program Files\Elastic\Agent\data\elastic-agent-331b41\logs shows:

{"log.level":"info","@timestamp":"2021-07-28T14:18:52.631Z","log.origin":{"file.name":"application/application.go","file.line":66},"message":"Detecting execution mode","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:52.632Z","log.origin":{"file.name":"application/application.go","file.line":75},"message":"Agent is managed locally","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:52.633Z","log.origin":{"file.name":"capabilities/capabilities.go","file.line":59},"message":"capabilities file not found in C:\\Program Files\\Elastic\\Agent\\capabilities.yml","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.116Z","log.logger":"composable.providers.docker","log.origin":{"file.name":"docker/docker.go","file.line":43},"message":"Docker provider skipped, unable to connect: protocol not available","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.117Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":62},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.117Z","log.origin":{"file.name":"application/local_mode.go","file.line":168},"message":"Agent is starting","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.117Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":64},"message":"Metrics endpoint listening on: \\\\.\\pipe\\elastic-agent (configured: npipe:///elastic-agent)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.123Z","log.origin":{"file.name":"application/local_mode.go","file.line":178},"message":"Agent is stopped","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.124Z","log.origin":{"file.name":"application/periodic.go","file.line":77},"message":"Configuration changes detected","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.133Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is hHGK2Zze","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.133Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 2 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:54.274Z","log.origin":{"file.name":"operation/operator.go","file.line":191},"message":"waiting for installer of pipeline 'default' to finish","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-28T14:18:56.778Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:56.778Z","log.origin":{"file.name":"process/app.go","file.line":181},"message":"Signaling application to stop because of shutdown: metricbeat--7.14.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:56.781Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is hHGK2Zze","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:56.781Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 2 step(s)","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-28T14:18:56.782Z","log.origin":{"file.name":"emitter/controller.go","file.line":120},"message":"Failed to render configuration with latest context from composable controller: operator: failed to execute step sc-run, error: context canceled: context canceled","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:56.851Z","log.logger":"reexec","log.origin":{"file.name":"reexec/reexec_windows.go","file.line":35},"message":"Running as Windows service Elastic Agent; triggering service restart","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-07-28T14:18:56.858Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":66},"message":"Stats endpoint (\\\\.\\pipe\\elastic-agent) finished: use of closed network connection","ecs.version":"1.6.0"}
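
Because the log is one JSON object per line, the error entries can be pulled out with a few lines of scripting. The following is only a minimal sketch: the helper name `error_entries` is made up for illustration, and the two sample lines are copied from the log above.

```python
import json


def error_entries(lines):
    """Return (timestamp, message) for each JSON log line with log.level 'error'."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not valid JSON
        if isinstance(entry, dict) and entry.get("log.level") == "error":
            found.append((entry.get("@timestamp"), entry.get("message")))
    return found


# Two entries copied verbatim from the log in this comment.
SAMPLE = """
{"log.level":"info","@timestamp":"2021-07-28T14:18:53.123Z","log.origin":{"file.name":"application/local_mode.go","file.line":178},"message":"Agent is stopped","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-28T14:18:56.778Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
"""

for timestamp, message in error_entries(SAMPLE.splitlines()):
    print(timestamp, message)
```

To run it against the real file, pass an open file handle for elastic-agent-json.log instead of `SAMPLE.splitlines()`.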
@lthach2 added the Agent and Team:Elastic-Agent labels Jul 28, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@EricDavisX added the bug label Jul 28, 2021
@EricDavisX
Contributor

@lthach2 I was going to ask: I believe you had cited a case where it WAS working. Can you post the details and the differences between the two scenarios here, please?

@lthach2
Author

lthach2 commented Jul 28, 2021

@EricDavisX Turns out I spoke too soon when I initially reported it was working. I assumed that, since the install showed the agent as successfully enrolled, everything was functioning, but it turns out the agent service fails to start.

@lthach2
Author

lthach2 commented Jul 28, 2021

Details about my test environment as it may require more testing in a fully on-prem environment:

  • Stack is cloud hosted so Fleet server is cloud hosted as well
  • Proxy is hosted in Endgame VMware lab with Internet access
  • Endpoint is also hosted in Endgame VMware lab

@EricDavisX
Contributor

@urso @andresrc @ruflin any thoughts on first triage of this?

@lthach2
Author

lthach2 commented Jul 29, 2021

Quick update after some additional testing on my end. I did another test in an on-prem environment and encountered the same issue: the agent fails to start.

On-prem environment:

  • ES: 7.14.0
  • Agent: 7.14.0 (build: 331b419 at 2021-07-22 02:24:06 +0000 UTC)
  • Fleet server installed on separate endpoint
  • Proxy, fleet server, and endpoint are on the same network but I also tested with the endpoint on a different network

@urso

urso commented Jul 29, 2021

In the logs you shared I don't see any error, only that Agent has been stopped. Can you archive and share the complete Agent folder?

It looks like enrollment did succeed. Agent wrote a fleet.yml file with the Fleet Server connection details. We might want to check whether that file is correct.

@lthach2
Author

lthach2 commented Jul 29, 2021

Hi @urso, sure thing. Below is the download link for the Agent folder from my test box.

https://upload.elastic.co/d/b70c1816893b40f1ed19dc6a300e0ec7b6956f6cfb948872eeb617787a39ef2f
Token: 909fcc333f10e707

@urso

urso commented Jul 30, 2021

The authorization token seems to be wrong. I can't download your files.

@EricDavisX
Contributor

It worked for me yesterday. I've posted the files to a shared Google Drive for you; I'll Slack you the details.

@urso

urso commented Aug 2, 2021

I think I found the bug. After enrolling, the Agent serializes its configuration to fleet.yml. Agent expects proxy_url to be a string, but unfortunately the type url.URL gets serialized into an object:

  proxy_url:
    scheme: http
    opaque: ""
    user: null
    host: 10.1.4.118:3128
    path: ""
    rawpath: ""
    forcequery: false
    rawquery: ""
    fragment: ""
    rawfragment: ""

With this serialization bug, proxy support is currently broken. I created an issue: #27187
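
To check a given fleet.yml for this problem, the shape of `proxy_url` is the thing to look at: a string is the expected form, a mapping is the broken one. The following is only a minimal sketch, not the Agent's code and not the fix tracked in #27187. The helper name `proxy_url_as_string` is hypothetical, and it assumes the YAML has already been parsed into a Python dict by whatever parser is at hand. It rebuilds the string from the fields shown above and ignores `user`, which is null in this example.

```python
from urllib.parse import urlunsplit


def proxy_url_as_string(value):
    """Return proxy_url as a plain URL string.

    Accepts a string (the expected form) or a mapping shaped like the
    serialized url.URL object shown in this comment.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        # Credentials in "user" are deliberately not handled in this sketch.
        return urlunsplit((
            value.get("scheme") or "",
            value.get("host") or "",
            value.get("path") or "",
            value.get("rawquery") or "",
            value.get("fragment") or "",
        ))
    raise TypeError(f"unexpected proxy_url type: {type(value).__name__}")


# The object form from the fleet.yml excerpt above, as a parsed mapping.
BROKEN = {
    "scheme": "http",
    "opaque": "",
    "user": None,
    "host": "10.1.4.118:3128",
    "path": "",
    "rawpath": "",
    "forcequery": False,
    "rawquery": "",
    "fragment": "",
    "rawfragment": "",
}

print(proxy_url_as_string(BROKEN))
```

A value that is already a string passes through unchanged, so the same helper can be used on files written by both broken and fixed builds.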

@EricDavisX
Contributor

The fix was backported to 7.14.1 yesterday, so we can wait a day to be sure the build has picked it up and then re-test. Thanks @lthach2. If you have questions about the next 'depth' we can test here, please do raise them with the team.

@lthach2
Author

lthach2 commented Aug 11, 2021

Tested with the 7.14.1 agent and I'm now able to successfully install via proxy.

Looking at other parts of the test: where or how do we configure the Agent to connect to artifacts.elastic.co and the Package Registry?

@EricDavisX
Contributor

@lthach2 thanks so much. I pinged the team via Slack, and we can follow up in the test issue if this doesn't get more comments in the next day. In that case we can just close it, as the one known issue is fixed. We'll continue testing and log separate issues as we find them.

@ruflin
Member

ruflin commented Aug 17, 2021

Going to close this issue as the initial proxy problem seems to be resolved in the upcoming 7.14.1.


5 participants