Fleet Server should not crash on startup if connection to Elasticsearch timeouts #2683

jsoriano · 2023-06-09T10:20:21Z

Fleet Server should not crash on startup if the connection to Elasticsearch times out, at least when running in standalone mode.
In Kubernetes deployments this leads to CrashLoopBackOffs, which is unexpected.

When this happens, Fleet Server should report an unhealthy state and keep retrying, probably with some kind of exponential backoff in case Elasticsearch is overloaded.

Other issues, such as unknown hosts, should still cause Fleet Server to fail, because they usually indicate misconfigurations.

The following is logged when this happens. It shouldn't print the usage information, because the error is not related to usage.

{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","cluster.addr":["https://xxxxxxxx:9200"],"cluster.maxConnsPersHost":128,"error.message":"dial tcp X.X.X.X:9200: i/o timeout","@timestamp":"2023-06-09T09:47:30.286Z","message":"fail elasticsearch info"}
{"ecs.version":"1.6.0","service.name":"fleet-server","log.level":"info","log.logger":"fleet-metrics.api","message":"Stats endpoint (127.0.0.1:5066) finished: accept tcp 127.0.0.1:5066: use of closed network connection","@timestamp":"2023-06-09T09:47:30.286Z"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"FAILED","@timestamp":"2023-06-09T09:47:30.286Z","message":"Error - dial tcp X.X.X.X:9200: i/o timeout"}
{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","error.message":"dial tcp X.X.X.X:9200: i/o timeout","@timestamp":"2023-06-09T09:47:30.286Z","message":"Fleet Server failed"}
{"log.level":"error","ecs.version":"1.6.0","service.name":"fleet-server","error.message":"dial tcp X.X.X.X:9200: i/o timeout","@timestamp":"2023-06-09T09:47:30.286Z","message":"Exiting"}
Error: dial tcp X.X.X.X:9200: i/o timeout
Usage:
  fleet-server [flags]

Flags:
  -E, --E setting=value   Overwrite configuration value
      --agent-mode        Running under execution of the Elastic Agent
  -c, --config string     Configuration for Fleet Server (default "fleet-server.yml")
  -h, --help              help for fleet-server

dial tcp 10.253.233.41:9200: i/o timeout

jlind23 assigned jsoriano Jun 12, 2023

michel-laterman mentioned this issue Jun 12, 2023

[SLES15]: Fleet-server Agent gets into offline state on machine reboot. #2431

Closed

jsoriano mentioned this issue Jun 13, 2023

Avoid crashing on startup if Elasticsearch is not available #2693

Merged

8 tasks

jsoriano closed this as completed in #2693 Jul 6, 2023
