Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define common server error codes in ErrorType.Code #70

Open
jonsyu1 opened this issue Feb 16, 2018 · 6 comments
Open

Define common server error codes in ErrorType.Code #70

jonsyu1 opened this issue Feb 16, 2018 · 6 comments

Comments

@jonsyu1
Copy link

jonsyu1 commented Feb 16, 2018

The ErrorType.Code enum is overly restrictive. We only have access to 500 for server errors:

        FAILED_PRECONDITION(500),
        INTERNAL(500),
        TIMEOUT(500),
        CUSTOM_SERVER(500);

At a bare minimum, it would be useful to define 501-504.

@uschi2000
Copy link
Contributor

503 is available via QosException. In what circumstances would you want to use 502 and 504?

@jonsyu1
Copy link
Author

jonsyu1 commented Feb 19, 2018

Let's say a proxy server is responsible for translating queries and dispatching requests to the appropriate backend. The proxy determines that a selected dataset is stored on a specific backend server and queries that backend.

502 Bad Gateway

The backend responds with 404. The proxy could respond with 500 here, but 502 indicates the backend's response is inconsistent with the proxy's understanding of where the datasets are stored.

504 Gateway Timeout

The backend fails to respond in a timely manner. The proxy could respond with 500 here, but 504 indicates the proxy server did not receive a timely response from the backend.

@uschi2000
Copy link
Contributor

I generally prefer we dispatch error handling on ErrorType.Code together with the error name, rather than on the numeric http error code itself. In what way would a standards-compliant http client (that doesn't know about error Code and Name) respond to 502/504 as opposed to 500?

@jonsyu1
Copy link
Author

jonsyu1 commented Feb 20, 2018

That's fair, I was thinking more about how to debug a failure from a server's HTTP response. I can't imagine a generalized action a standards-compliant http client would take when given a 502/504.

@jonsyu1
Copy link
Author

jonsyu1 commented Jun 5, 2019

@uschi2000 this came up again for health checks that define an acceptable error rate. Let A, B, C be conjure services. Requests flow A -> B -> C. C responds with a 500. In conjure, there's only 500s for service errors so B responds with 500 and A responds with 500. If C continues to respond with 500s, A, B and C will all report unhealthy. In order to maximize signal, I prefer only C to report unhealthy so I know which service to investigate. This is pretty easy to indicate by having A and B respond with 502 or 504, whichever is more appropriate.

@uschi2000
Copy link
Contributor

uschi2000 commented Jun 5, 2019

Probably a topic that better suited for in person discussion, but in short: Being a cross-service, cross-language concern, I have strong reservations against making the RPC layer more complicated that absolutely necessary. I suspect that we'll get better results when we define service health in terms of the error domain of a given service, rather than trying to derive everything from RPC. In this particular case, what is it about the B->C functionality that broke down, and how can B indicate that (1) B itself cannot currently perform it's function and (2) this is due to a problem with C.

In pseudo-code:

class B {
  void collectDataFromC() {
    data, err = c.getData()
    if err {
      // "collectDataFromC": indicates what functionality broke in B
      // "C" indicates to the deployment system that "C" may have a problem. Note that this fact can likely be derived from B's client metrics.
      healthChecks.hint("collectDataFromC", "C")
    }
  }
}

Reading this again, I guess the above still doesn't answer what B should tell A, and how A can possibly differentiate between "B's broken" and "C's broken". Dunno.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants