Fleet Server should not crash on startup if connection to Elasticsearch timeouts #2683

Closed
jsoriano opened this issue Jun 9, 2023 · 0 comments · Fixed by #2693

jsoriano commented Jun 9, 2023

Fleet Server should not crash on startup if the connection to Elasticsearch times out, at least when running in standalone mode.
In Kubernetes deployments this leads to CrashLoopBackOff, which is unexpected.

When this happens, Fleet Server should report an unhealthy state and keep retrying, probably with some kind of exponential backoff in case Elasticsearch is overloaded.
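To illustrate the retry behaviour proposed above, here is a minimal sketch in Go using only the standard library. It is not Fleet Server's actual code: `retryWithBackoff`, `nextDelay`, and the `checkElasticsearch` stand-in are hypothetical names, and the delays are arbitrary values chosen so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// nextDelay doubles the current delay and caps it at max.
func nextDelay(current, max time.Duration) time.Duration {
	next := current * 2
	if next > max {
		return max
	}
	return next
}

// retryWithBackoff (hypothetical helper) calls op until it succeeds or ctx
// is cancelled, waiting between attempts with exponential backoff.
func retryWithBackoff(ctx context.Context, initial, max time.Duration, op func(context.Context) error) error {
	delay := initial
	for {
		err := op(ctx)
		if err == nil {
			return nil
		}
		// A real server would report an unhealthy state here instead of exiting.
		fmt.Printf("attempt failed: %v; retrying in %s\n", err, delay)
		select {
		case <-ctx.Done():
			return fmt.Errorf("stopped retrying: %w (last error: %v)", ctx.Err(), err)
		case <-time.After(delay):
		}
		delay = nextDelay(delay, max)
	}
}

func main() {
	attempts := 0
	// Hypothetical stand-in for the Elasticsearch info request made at startup.
	checkElasticsearch := func(ctx context.Context) error {
		attempts++
		if attempts < 3 {
			return errors.New("dial tcp: i/o timeout")
		}
		return nil
	}
	err := retryWithBackoff(context.Background(), 10*time.Millisecond, 100*time.Millisecond, checkElasticsearch)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Capping the delay keeps the process responsive once Elasticsearch recovers, and tying the loop to a context lets a normal shutdown interrupt the wait.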

Other issues, such as unknown hosts, should still make Fleet Server fail, because they usually indicate misconfigurations.
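One possible way to tell the two cases apart is sketched below with Go's standard `errors` and `net` packages. The function `shouldRetry` and the two example errors are hypothetical, and the errors Fleet Server actually receives from its Elasticsearch client may be wrapped differently.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// shouldRetry (hypothetical helper) reports whether a startup connection
// error looks transient (a timeout) rather than a likely misconfiguration.
func shouldRetry(err error) bool {
	if err == nil {
		return false
	}
	// An unknown host usually points at a misconfiguration: fail fast.
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) && dnsErr.IsNotFound {
		return false
	}
	// Any error in the chain that reports itself as a timeout is retried.
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// Example errors built by hand for illustration only.
var (
	exampleTimeout     error = &net.OpError{Op: "dial", Net: "tcp", Err: os.ErrDeadlineExceeded}
	exampleUnknownHost error = &net.DNSError{Err: "no such host", Name: "es.invalid", IsNotFound: true}
)

func main() {
	fmt.Println("timeout retried:", shouldRetry(exampleTimeout))
	fmt.Println("unknown host retried:", shouldRetry(exampleUnknownHost))
}
```

Using `errors.As` rather than matching on the message text keeps the check working when the error is wrapped with extra context.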

This is what gets logged when this happens. It shouldn't print the usage information, because the error is not related to usage.

{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","cluster.addr":["https://xxxxxxxx:9200"],"cluster.maxConnsPersHost":128,"error.message":"dial tcp X.X.X.X:9200: i/o timeout","@timestamp":"2023-06-09T09:47:30.286Z","message":"fail elasticsearch info"}
{"ecs.version":"1.6.0","service.name":"fleet-server","log.level":"info","log.logger":"fleet-metrics.api","message":"Stats endpoint (127.0.0.1:5066) finished: accept tcp 127.0.0.1:5066: use of closed network connection","@timestamp":"2023-06-09T09:47:30.286Z"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"FAILED","@timestamp":"2023-06-09T09:47:30.286Z","message":"Error - dial tcp X.X.X.X:9200: i/o timeout"}
{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","error.message":"dial tcp X.X.X.X:9200: i/o timeout","@timestamp":"2023-06-09T09:47:30.286Z","message":"Fleet Server failed"}
{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","error.message":"dial tcp X.X.X.X:9200: i/o timeout","@timestamp":"2023-06-09T09:47:30.286Z","message":"Exiting"}
Error: dial tcp X.X.X.X:9200: i/o timeout
Usage:
  fleet-server [flags]

Flags:
  -E, --E setting=value   Overwrite configuration value
      --agent-mode        Running under execution of the Elastic Agent
  -c, --config string     Configuration for Fleet Server (default "fleet-server.yml")
  -h, --help              help for fleet-server

dial tcp 10.253.233.41:9200: i/o timeout