Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error code handling #1231

Merged
merged 1 commit into from
Nov 2, 2018
Merged

Fix error code handling #1231

merged 1 commit into from
Nov 2, 2018

Conversation

orangejulius
Copy link
Member

@orangejulius orangejulius commented Nov 1, 2018

Background

For a long time, Pelias has used 400 as the default HTTP error code, and only a select few Elasticsearch exceptions would result in an HTTP 500 response.

This has the effect of hiding a lot of times when something was in fact wrong.

Since HTTP 400 generally signals that the request from the user has been incorrectly crafted, sending 400 error codes instead of 500 is a big problem. Besides sending a misleading signal that it's user error, many user agents will not retry after a 400 response.

Additionally, it makes it harder to monitor the health of a Pelias install. There's no way to tell if users are sending lots of genuinely invalid requests, or if the service is unhealthy.

Changes

This PR adds a new exception class, PeliasParameterError. All sanitizers now return errors that are instances of this class, and middleware/sendJSON checks if any errors it sees are instances of the class.

Requests that result in known sanitizer errors and one or two known Elasticsearch exceptions result in specific error codes. Everything else is now considered an unknown error and results in a 500.

The complexity of the error handling code is greatly reduced. As a bonus, we can now finally get rid of the 4 year old, massively out of date elasticsearch-exceptions NPM module dependency.

Fixes #1108

Copy link
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, it would definitely be worth running this against the ciao tests since they cover a lot of different HTTP error cases.

middleware/sendJSON.js Outdated Show resolved Hide resolved
middleware/sendJSON.js Outdated Show resolved Hide resolved
sanitizer/PeliasParameterError.js Show resolved Hide resolved
sanitizer/PeliasParameterError.js Show resolved Hide resolved
@orangejulius
Copy link
Member Author

Regarding the Ciao tests, I would like to run them, but haven't been able to get them to work for at least a year. If you have more success, do let me know. They would be super valuable for testing this PR.

Pelias has for a long time returned 400 as a default status whenever
anything goes wrong, as well as when a user has passed invalid
parameters.

By using a new exception class, it is now possible to differetiate
between known parameter errors, and unexpected errors that truly
represent an HTTP 500.
@orangejulius
Copy link
Member Author

Ciao tests have been run and all still pass!

@orangejulius orangejulius merged commit 70cbcb3 into master Nov 2, 2018
@orangejulius orangejulius deleted the fix-error-codes branch November 2, 2018 15:15
orangejulius added a commit that referenced this pull request Oct 28, 2021
This code, which checks all existing errors and classifies them as a
certain error type, was running within a loop that probably wasn't
intended.

It looks like this was a mistake made in
#1231
orangejulius added a commit that referenced this pull request Oct 28, 2021
Pelias has always had a bit of trouble selecting the right HTTP response
code in the face of various error states.

Up until #1231 in 2018, we reported
almost all timeouts from slow Elasticsearch queries as HTTP 400 errors,
not something in the more appropriate 5XX range. This suggests to
consumers of the Pelias API that they made a mistake in calling Pelias,
instead of the reality that Pelias was just being slow.

Even after that change, it turns out we were _still_ classifying
timeouts to other Pelias services (like Placeholder or Interpolation) as
400 errors instead of 5XX. All the Pelias services are generally very
fast, so this was not nearly as much of an issue, but timeouts do
happen.

This PR adds additional handling to detect timeout errors and give them
their own subclass of `Error` that can be treated appropriately
everywhere. Timeouts waiting for any Pelias service will now return HTTP
502 errors just like a timeout waiting for Elasticsearch.
orangejulius added a commit that referenced this pull request Oct 28, 2021
Pelias has always had a bit of trouble selecting the right HTTP response
code in the face of various error states.

Up until #1231 in 2018, we reported
almost all timeouts from slow Elasticsearch queries as HTTP 400 errors,
not something in the more appropriate 5XX range. This suggests to
consumers of the Pelias API that they made a mistake in calling Pelias,
instead of the reality that Pelias was just being slow.

Even after that change, it turns out we were _still_ classifying
timeouts to other Pelias services (like Placeholder or Interpolation) as
400 errors instead of 5XX. All the Pelias services are generally very
fast, so this was not nearly as much of an issue, but timeouts do
happen.

This PR adds additional handling to detect timeout errors and give them
their own subclass of `Error` that can be treated appropriately
everywhere. Timeouts waiting for any Pelias service will now return HTTP
502 errors just like a timeout waiting for Elasticsearch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants