status code definition #4

pmhsfelix · 2018-01-16T22:56:04Z

I don't fully agree with the status codes definition:

For “pass” and “warn” statuses HTTP response code in the 2xx - 3xx range MUST be used. for “fail” status HTTP response code in the 4xx - 5xx range MUST be used. In case of the “warn” status, additional information SHOULD be provided, utilizing optional fields of the response.

It seems strange to use a 4xx when the request is correct (e.g. well-formed and properly authenticated) and the health resource does exist.
A 5xx should be reserved for when the health resource itself is not operating correctly.
My initial reaction would be to always use a 200 when the status response correctly represents the state of the system, even if that state is fail. I know that it is common practice to use a 5xx status to represent a failure system status, however that information should be on the resource representation, via this media type, and not on the response status.
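To make the proposal concrete, here is a minimal sketch of a health resource that always answers 200 and carries the state only in the representation. The function name `build_health_response` is hypothetical, and the media type `application/health+json` is an assumption, since it is not named in this thread.

```python
import json

# Assumed media type for the health format; it is not stated in this thread.
HEALTH_MEDIA_TYPE = "application/health+json"


def build_health_response(status):
    """Return (http_code, headers, body) for a health resource that always answers 200."""
    if status not in ("pass", "warn", "fail"):
        raise ValueError(f"unknown health status: {status!r}")
    headers = {"Content-Type": HEALTH_MEDIA_TYPE}
    body = json.dumps({"status": status})
    # The request was well-formed and the resource exists, so the HTTP code
    # stays 200 even when the reported state is "fail".
    return 200, headers, body


code, headers, body = build_health_response("fail")
print(code, headers["Content-Type"], body)
```

Under this reading, a client has to parse the body to learn that the service is failing; the response status only says that the health resource itself answered.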


dret · 2018-01-17T00:33:23Z

On 2018-01-16 14:56, Pedro Felix wrote: I don't fully agree with the status codes definition: For “pass” and “warn” statuses HTTP response code in the 2xx - 3xx range MUST be used. for “fail” status HTTP response code in the 4xx - 5xx range MUST be used. In case of the “warn” status, additional information SHOULD be provided, utilizing optional fields of the response.

* It seems strange to use a 4xx when the request is correct (e.g. well-formed and properly authenticated) and the health resource does exist.

yup. 4xx is for client errors specifically.

* A 5xx should be reserved for when the health resource itself is not operating correctly.

yup. if the status resource is served properly and indicates that the API is failing, that's a 2xx. the only appropriate use for 5xx is when there is an internal problem *processing the request*. https://tools.ietf.org/html/rfc7231#section-6.6

* My initial reaction would be to always use a 200 when the status response correctly represents the state of the system, even if that state is fail. I know that it is common practice to use a 5xx status to represent a failure system status, however that information should be on the resource representation, via this media type, and not on the response status.

+1

inadarei · 2018-01-17T11:33:20Z

Great point and good catch!

What if we added a "code" property to the spec object, and it would be the HTTP status code of the service? This way you could add more detail to the status than just "pass", "fail", "warn", but do it in a standard way.

Does that feel useful?
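As a rough illustration of this idea, the sketch below builds a health document with a "code" member next to "status". Both the helper name `health_body` and the exact shape of the "code" member are hypothetical; the thread only proposes the property, it does not define it.

```python
import json


def health_body(status, code):
    """Build a health document carrying a hypothetical "code" member."""
    if status not in ("pass", "warn", "fail"):
        raise ValueError(f"unknown health status: {status!r}")
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code!r}")
    # "code" describes the service, not the response that carries this body.
    return {"status": status, "code": code}


print(json.dumps(health_body("fail", 503), indent=2))
```

The response carrying this document could still be a 200, with 503 describing the service rather than the request.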

dret · 2018-01-18T01:53:46Z

On 2018-01-17 03:33, Irakli Nadareishvili wrote: What if we added "code" to the spec and it would be HTTP status code of the service. This way you could add more detail to the status than just "pass", "fail", "warn" but do it in a standard way?

#8 claims that it needs to be an HTTP-level status code. i cannot comment on that requirement, but if it is one, then instead of having "code" we'd need a "watchdog" link to a resource that returns the status at the HTTP level.

inadarei · 2018-01-20T07:29:57Z

@dret wouldn't that watchdog link have the same issue, however? If it responds with an HTTP code, that would have to be the HTTP code of the watchdog URI endpoint and cannot be the code of the service health itself.

Or am I totally confused here?

dret · 2018-01-20T07:43:22Z

On 2018-01-19 23:29, Irakli Nadareishvili wrote: @dret wouldn't that watchdog link have the same issue, however? If it responds with the HTTP code it would have to be http code of the watchdog URI endpoint and cannot be the code of the service health itself. Or am I totally confused here?

you're technically right, but at least the "watchdog" resource wouldn't have the same general problem of "but i did get that service status just fine, so why would it report a 5xx?" the watchdog resource could be defined to be one that "mirrors/represents" the status of the service, instead of describing it. i think then it would make more sense to serve 5xx if the service has internal problems. i am still not sure that this is a requirement that actually exists, but if it is, then the "watchdog resource" may be the better compromise.
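a minimal sketch of that split, assuming two resources: one that describes the service status in its body, and a "watchdog" one that mirrors it at the HTTP level. the path `/health/watchdog`, the "links" member, and both function names are hypothetical, and 503 is just one possible 5xx choice.

```python
import json

WATCHDOG_PATH = "/health/watchdog"  # hypothetical path, not defined in this thread


def describe_health(status):
    """Descriptive resource: always 200, the state lives in the body."""
    body = json.dumps({"status": status, "links": {"watchdog": WATCHDOG_PATH}})
    return 200, body


def watchdog(status):
    """Mirroring resource: the HTTP code itself carries the state, the body is empty."""
    http_code = 503 if status == "fail" else 200
    return http_code, ""


for status in ("pass", "warn", "fail"):
    print(status, describe_health(status)[0], watchdog(status)[0])
```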

inadarei · 2018-01-29T04:55:28Z

So, I have thought a lot about this (since it is closely related to #8).

The original design was correct. /health is not a "separate resource" and this is not a regular API design exercise. There's very significant, established precedent here. As @danielbcorreia noted in #8, a lot of infrastructural tooling (load-balancers, Consul, etcd) expects the health check endpoint to represent the health of the component.

Basically, if http://api.example.com/authz is a security microservice and it exposes an /authz/health endpoint, the expectation is clearly that when /authz/health returns 500, the entire microservice is down. This is how things already work, and ignoring it would be ignoring reality.

The health endpoint doesn't have its own "health". It is only a conduit of the component, simply because nobody cares about the health endpoint other than it indicating the health of what it is a conduit for.
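A minimal sketch of that expectation, assuming a checker that ignores the response body and looks only at the status code. The function names are hypothetical, and the concrete codes (200 and 500) are just one choice inside the 2xx-3xx and 4xx-5xx ranges from the original wording; this is not a description of how any particular tool is implemented.

```python
def http_code_for(status):
    """Map a component status to the HTTP code served by its health endpoint."""
    # Illustrative choices within the ranges quoted at the top of this issue.
    codes = {"pass": 200, "warn": 200, "fail": 500}
    if status not in codes:
        raise ValueError(f"unknown health status: {status!r}")
    return codes[status]


def probe_considers_healthy(http_code):
    """What a body-ignoring checker sees: only the status code."""
    return 200 <= http_code < 400


for status in ("pass", "warn", "fail"):
    code = http_code_for(status)
    print(status, code, probe_considers_healthy(code))
```

With this mapping, a checker that never parses the body still takes a failing component out of rotation, which is the behavior the existing tooling relies on.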

I understand this is a bit odd from a URI/HTTP perspective, but if we started inventing an alternate reality with watchdogs etc., we would be documenting a world that we wish existed instead of the world that actually exists, which would make this spec impractical.

inadarei · 2018-01-29T05:10:05Z

PR with the proposed wording: 68cc850

dret · 2018-03-25T09:23:36Z

i guess if this is what people have been doing that's kind of ok-ish. still not really great design, but agreed that ignoring deployed reality also has merit.

danielbcorreia mentioned this issue Jan 17, 2018

resource for clients that ignore the response body #8

Closed

inadarei closed this as completed Mar 24, 2018
