Fleet Server returns 500 on serverless causing elastic-agent to go offline #2852

Closed
olegsu opened this issue Jul 30, 2023 · 3 comments · Fixed by #3235
Labels: bug (Something isn't working), Team:Fleet (Label for the Fleet team)

Comments

@olegsu
Contributor

olegsu commented Jul 30, 2023

Steps to reproduce

  • Deploy a serverless project
  • Enroll an Elastic Agent
  • Follow the Fleet Server logs

I noticed that Fleet Server sporadically returns a 500, causing the Elastic Agent to go offline.
More logs can be found in the monitoring cluster.

image

@cmacknz cmacknz added the Team:Fleet Label for the Fleet team label Jul 31, 2023
@olegsu olegsu added the bug Something isn't working label Aug 2, 2023
@michel-laterman
Contributor

I'm not sure what we can do about this.

It looks like fleet-server had an issue querying ES and got a 500, and fleet-server returned a 500 as a result. We can try returning a 503 instead to indicate that it's a temporary failure.
If ES is failing, there is not a lot we can do: if ES rejects queries, fleet-server will not be able to process checkins, so after a little while the UI will show agents as offline.

@kpollich what do you think?

@kpollich
Member

Yeah, there's not really anything we can do when Elasticsearch returns a 500. Responding with a 503 in this instance is probably the best we can do, but from Fleet Server's perspective a core dependency (Elasticsearch) is unreachable at the time of the request. A better path forward might be determining why Elasticsearch is throwing a shard error and a 500 in this particular instance.
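
For illustration only, the 500-to-503 idea discussed above could look roughly like the sketch below. It is written in Python rather than Fleet Server's own Go, and `status_for_upstream_error` is a hypothetical helper, not anything in the Fleet Server codebase. It assumes that any 5xx from Elasticsearch should be treated as a temporary outage of a dependency, and that any other upstream failure remains a plain 500.

```python
from http import HTTPStatus


def status_for_upstream_error(upstream_status):
    """Pick the status to send to the agent after a failed Elasticsearch call.

    Hypothetical helper. Assumption: a 5xx from Elasticsearch means the
    dependency is temporarily unavailable, so the agent should see a 503
    and retry later. Anything else is reported as an internal error.
    """
    if upstream_status >= 500:
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

This only changes what the agent is told. It does not keep agents online while Elasticsearch keeps rejecting queries.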

@olegsu
Contributor Author

olegsu commented Jan 25, 2024

Thank you for the updates @michel-laterman, @kpollich
We worked around this by adding an init container that waits for Fleet Server to be ready.
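
The thread does not show the init container itself. A minimal sketch of the kind of wait loop it might run is below, under these assumptions: readiness is checked with a plain HTTP GET, a 200 response means "ready", and the names `http_probe` and `wait_until_ready`, the retry count, and the delay are all hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if a GET to `url` answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and 4xx/5xx responses all count as "not ready".
        return False


def wait_until_ready(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call `probe` until it returns True or `attempts` calls have been made."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)  # pause between attempts, but not after the last one
    return False
```

An init container could call `wait_until_ready(lambda: http_probe(url))` with `url` pointing at the Fleet Server status endpoint of the deployment, and exit non-zero when the result is `False`, so that the agent container starts only after Fleet Server responds. The exact URL and TLS settings depend on the deployment and are not given in the thread.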
