Define common server error codes in ErrorType.Code #70

jonsyu1 · 2018-02-16T18:21:51Z

The ErrorType.Code enum is overly restrictive. We only have access to 500 for server errors:

        FAILED_PRECONDITION(500),
        INTERNAL(500),
        TIMEOUT(500),
        CUSTOM_SERVER(500);

At a bare minimum, it would be useful to define 501-504.

The text was updated successfully, but these errors were encountered:

uschi2000 · 2018-02-16T22:07:53Z

503 is available via QosException. In what circumstances would you want to use 502 and 504?

jonsyu1 · 2018-02-19T18:44:43Z

Let's say a proxy server is responsible for translating queries and dispatching requests to the appropriate backend. The proxy determines that a selected dataset is stored on a specific backend server and queries that backend.

502 Bad Gateway

The backend responds with 404. The proxy could respond with 500 here, but 502 indicates the backend's response is inconsistent with the proxy's understanding of where the datasets are stored.

504 Gateway Timeout

The backend fails to respond in a timely manner. The proxy could respond with 500 here, but 504 indicates the proxy server did not receive a timely response from the backend.

uschi2000 · 2018-02-19T23:36:25Z

I generally prefer we dispatch error handling on ErrorType.Code together with the error name, rather than on the numeric http error code itself. In what way would a standards-compliant http client (that doesn't know about error Code and Name) respond to 502/504 as opposed to 500?

jonsyu1 · 2018-02-20T06:24:32Z

That's fair, I was thinking more about how to debug a failure from a server's HTTP response. I can't imagine a generalized action a standards-compliant http client would take when given a 502/504.

jonsyu1 · 2019-06-05T01:07:12Z

@uschi2000 this came up again for health checks that define an acceptable error rate. Let A, B, C be conjure services. Requests flow A -> B -> C. C responds with a 500. In conjure, there's only 500s for service errors so B responds with 500 and A responds with 500. If C continues to respond with 500s, A, B and C will all report unhealthy. In order to maximize signal, I prefer only C to report unhealthy so I know which service to investigate. This is pretty easy to indicate by having A and B respond with 502 or 504, whichever is more appropriate.

uschi2000 · 2019-06-05T08:41:32Z

Probably a topic that better suited for in person discussion, but in short: Being a cross-service, cross-language concern, I have strong reservations against making the RPC layer more complicated that absolutely necessary. I suspect that we'll get better results when we define service health in terms of the error domain of a given service, rather than trying to derive everything from RPC. In this particular case, what is it about the B->C functionality that broke down, and how can B indicate that (1) B itself cannot currently perform it's function and (2) this is due to a problem with C.

In pseudo-code:

class B {
  void collectDataFromC() {
    data, err = c.getData()
    if err {
      // "collectDataFromC": indicates what functionality broke in B
      // "C" indicates to the deployment system that "C" may have a problem. Note that this fact can likely be derived from B's client metrics.
      healthChecks.hint("collectDataFromC", "C")
    }
  }
}

Reading this again, I guess the above still doesn't answer what B should tell A, and how A can possibly differentiate between "B's broken" and "C's broken". Dunno.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define common server error codes in ErrorType.Code #70

Define common server error codes in ErrorType.Code #70

jonsyu1 commented Feb 16, 2018

uschi2000 commented Feb 16, 2018

jonsyu1 commented Feb 19, 2018 •

edited

uschi2000 commented Feb 19, 2018

jonsyu1 commented Feb 20, 2018

jonsyu1 commented Jun 5, 2019

uschi2000 commented Jun 5, 2019 •

edited

Define common server error codes in ErrorType.Code #70

Define common server error codes in ErrorType.Code #70

Comments

jonsyu1 commented Feb 16, 2018

uschi2000 commented Feb 16, 2018

jonsyu1 commented Feb 19, 2018 • edited

uschi2000 commented Feb 19, 2018

jonsyu1 commented Feb 20, 2018

jonsyu1 commented Jun 5, 2019

uschi2000 commented Jun 5, 2019 • edited

jonsyu1 commented Feb 19, 2018 •

edited

uschi2000 commented Jun 5, 2019 •

edited