Fix error code handling #1231
Looks good. It would definitely be worth running this against the Ciao tests, since they cover a lot of different HTTP error cases.
Regarding the Ciao tests, I would like to run them, but haven't been able to get them to work for at least a year. If you have more success, do let me know. They would be super valuable for testing this PR.
Pelias has for a long time returned 400 as a default status whenever anything goes wrong, as well as when a user has passed invalid parameters. By using a new exception class, it is now possible to differentiate between known parameter errors and unexpected errors that truly represent an HTTP 500.
Ciao tests have been run and all still pass!
This code, which checks all existing errors and classifies them as a certain error type, was running within a loop that probably wasn't intended. It looks like this was a mistake made in #1231.
Pelias has always had a bit of trouble selecting the right HTTP response code in the face of various error states. Up until #1231 in 2018, we reported almost all timeouts from slow Elasticsearch queries as HTTP 400 errors, not something in the more appropriate 5XX range. This suggests to consumers of the Pelias API that they made a mistake in calling Pelias, instead of the reality that Pelias was just being slow. Even after that change, it turns out we were _still_ classifying timeouts to other Pelias services (like Placeholder or Interpolation) as 400 errors instead of 5XX. All the Pelias services are generally very fast, so this was not nearly as much of an issue, but timeouts do happen. This PR adds additional handling to detect timeout errors and give them their own subclass of `Error` that can be treated appropriately everywhere. Timeouts waiting for any Pelias service will now return HTTP 502 errors just like a timeout waiting for Elasticsearch.
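As a rough illustration of the idea described above, here is a minimal sketch of a timeout-specific `Error` subclass and how it could map to a 502. The names `ServiceTimeoutError`, `wrapIfTimeout`, and `statusForError` are hypothetical and not taken from the Pelias codebase, and detecting a timeout via the Node.js `ETIMEDOUT` error code is an assumption about how the underlying HTTP client reports it.

```javascript
// Hypothetical names throughout. Only the idea of a dedicated Error subclass
// for timeouts and the 502 status come from the description above.
class ServiceTimeoutError extends Error {
  constructor(serviceName, cause) {
    super(`timeout waiting for ${serviceName}`);
    this.name = 'ServiceTimeoutError';
    this.serviceName = serviceName;
    this.cause = cause; // keep the raw error around for logging
  }
}

// Assumption: the raw error carries the Node.js 'ETIMEDOUT' code.
// Anything else is passed through untouched.
function wrapIfTimeout(err, serviceName) {
  if (err && err.code === 'ETIMEDOUT') {
    return new ServiceTimeoutError(serviceName, err);
  }
  return err;
}

// Timeouts become 502; other unexpected errors fall back to 500.
function statusForError(err) {
  return err instanceof ServiceTimeoutError ? 502 : 500;
}

// Usage
const raw = Object.assign(new Error('socket timed out'), { code: 'ETIMEDOUT' });
console.log(statusForError(wrapIfTimeout(raw, 'placeholder'))); // 502
console.log(statusForError(wrapIfTimeout(new Error('boom'), 'placeholder'))); // 500
```

Because the check is a plain `instanceof`, any code path that sees the error can treat it the same way without inspecting message strings.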
Background
For a long time, Pelias has used 400 as the default HTTP error code, and only a select few Elasticsearch exceptions would result in an HTTP 500 response.
This has the effect of hiding many cases where something was in fact wrong.
Since HTTP 400 generally signals that the request from the user has been incorrectly crafted, sending 400 error codes instead of 500 is a big problem. Besides sending a misleading signal that it's user error, many user agents will not retry after a 400 response.
Additionally, it makes it harder to monitor the health of a Pelias install. There's no way to tell if users are sending lots of genuinely invalid requests, or if the service is unhealthy.
Changes
This PR adds a new exception class, `PeliasParameterError`. All sanitizers now return errors that are instances of this class, and `middleware/sendJSON` checks if any errors it sees are instances of the class.
Requests that result in known sanitizer errors and one or two known Elasticsearch exceptions result in specific error codes. Everything else is now considered an unknown error and results in a 500.
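The classification described above could look roughly like the following sketch. Only the class name `PeliasParameterError` comes from this PR. The helper `statusForErrors` is hypothetical, the handling of specific Elasticsearch exceptions is left out, and picking the highest status when errors are mixed is an assumption rather than a statement of what `middleware/sendJSON` actually does.

```javascript
// Sketch only: the class name is from the PR description, the rest is illustrative.
class PeliasParameterError extends Error {
  constructor(message) {
    super(message);
    this.name = 'PeliasParameterError';
  }
}

// Hypothetical helper: choose one HTTP status for a list of collected errors.
// Known parameter errors map to 400; anything unrecognized maps to 500.
function statusForErrors(errors) {
  if (!Array.isArray(errors) || errors.length === 0) {
    return 200;
  }
  const statuses = errors.map((err) =>
    err instanceof PeliasParameterError ? 400 : 500
  );
  // Assumption: report the most severe status, so a server fault
  // is never presented as a client mistake.
  return Math.max(...statuses);
}

// Usage
console.log(statusForErrors([new PeliasParameterError("invalid param 'size'")])); // 400
console.log(statusForErrors([new Error('unexpected failure')])); // 500
```

The point of the `instanceof` check is that the default flips: an error has to be explicitly marked as a parameter problem to produce a 400, and everything else surfaces as a server error.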
The complexity of the error handling code is greatly reduced. As a bonus, we can now finally get rid of the 4-year-old, massively out-of-date `elasticsearch-exceptions` NPM module dependency.
Fixes #1108