Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send status code 500 if something is unhealthy #19

Merged
merged 3 commits into from
Jan 2, 2017
Merged

Send status code 500 if something is unhealthy #19

merged 3 commits into from
Jan 2, 2017

Conversation

schneidexe
Copy link
Contributor

@schneidexe schneidexe commented Oct 7, 2016

Simple http health check usually consider response status codes. 200 is interpreted as healthy. So we should send a non-200 status in case any of nixy's health check report unhealthy. With this PR it will set the response status code of /v1/health to 500 in such a case.

@martensson
Copy link
Owner

Sorry for the delay to answer @schneidexe, had a bit busy week.
But I am a bit divided on how this is best solved. On one part if a Marathon backend is sick its not Nixy that is having a server error. That is partly why Nixy responds with a json object for external monitoring systems such as Nagios to parse and handle the errors. HTTP Status codes can also not show a difference between a "warning" (one backend is sick) and a "critical error" (all of them are down).

See https://github.com/martensson/nixy/blob/master/check_nixy as a reference.

A compromise could maybe be to respond 500 if all backends are unhealthy and/or template/config is broken. In our case we often take down single backends for patching/upgrading, but the service is still running without problems.

Tell me what you think :)

@schneidexe
Copy link
Contributor Author

schneidexe commented Oct 14, 2016

Good point. I did not think that far since I was originally running into this issue when it was sending a 200 for config or reload error which is definitely a nixy issue.

Your arguments re. WARN/CRIT are absolutely valid and reasonable. I would not start to distinguish between WARN and CRIT via different response codes now, since this might become confusing and intransparent. To do this checking the JSON is the better option.

So your proposed compromise would be best solution IMHO: send 500 for config or reload errors or if ALL marathon backends are down. If you're fine with that, I'm happy to implement it that way and update the PR.

@martensson
Copy link
Owner

That sounds good to me, feel free to send the PR! 👍

@schneidexe
Copy link
Contributor Author

schneidexe commented Dec 22, 2016

Hi @martensson. Had some busy weeks and forgot somehow about this one. I implemented the health-check now as we discussed (see last commit in this PR). So it will only send a 500 if all marathon endpoints are unhealthy

@schneidexe
Copy link
Contributor Author

Still had some mistake in the code (when first backend in the list was unhealthy), fixed it now ;)

@martensson martensson merged commit b6a490d into martensson:master Jan 2, 2017
@martensson
Copy link
Owner

Thanks a lot @schneidexe! Was on holidays last week, but will merge it now 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants