Fix error code handling #1231

orangejulius · 2018-11-01T04:13:01Z

Background

For a long time, Pelias has used 400 as the default HTTP error code, and only a select few Elasticsearch exceptions would result in an HTTP 500 response.

This hides many cases where something was in fact wrong.

Since HTTP 400 generally signals that the user's request was incorrectly crafted, sending 400 error codes instead of 500 is a big problem. Besides misleadingly signaling user error, a 400 response discourages retries: many user agents will not retry after one.

Additionally, it makes it harder to monitor the health of a Pelias install. There's no way to tell if users are sending lots of genuinely invalid requests, or if the service is unhealthy.

Changes

This PR adds a new exception class, PeliasParameterError. All sanitizers now return errors that are instances of this class, and middleware/sendJSON checks if any errors it sees are instances of the class.

Requests that result in known sanitizer errors and one or two known Elasticsearch exceptions result in specific error codes. Everything else is now considered an unknown error and results in a 500.
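The classification described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the class name `PeliasParameterError` comes from the description, while the class body, the helper names `statusCodeForError` and `statusCodeForErrors`, and the rule of reporting the highest status code among several errors are assumptions. The handling of specific Elasticsearch exceptions is left out.

```javascript
// Sketch only: the class name is from the PR description, the body is assumed.
class PeliasParameterError extends Error {
  constructor(message) {
    super(message);
    this.name = 'PeliasParameterError';
  }
}

// Hypothetical helper: known parameter errors map to 400,
// anything else is treated as an unknown error and maps to 500.
function statusCodeForError(err) {
  return err instanceof PeliasParameterError ? 400 : 500;
}

// Hypothetical helper: with no errors respond 200, otherwise
// report the most severe status found (an assumption of this sketch).
function statusCodeForErrors(errors) {
  if (!Array.isArray(errors) || errors.length === 0) {
    return 200;
  }
  return Math.max(...errors.map(statusCodeForError));
}
```

Using `instanceof` rather than matching on message strings means a sanitizer only has to construct its errors with the shared class for them to be recognized, and anything unexpected falls through to 500 by default.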

The complexity of the error handling code is greatly reduced. As a bonus, we can now finally get rid of the four-year-old, massively out-of-date elasticsearch-exceptions NPM module dependency.

Fixes #1108

missinglink

Looks good. It would definitely be worth running this against the ciao tests, since they cover a lot of different HTTP error cases.

middleware/sendJSON.js

sanitizer/PeliasParameterError.js

orangejulius · 2018-11-01T15:15:55Z

Regarding the Ciao tests, I would like to run them, but haven't been able to get them to work for at least a year. If you have more success, do let me know. They would be super valuable for testing this PR.

Pelias has for a long time returned 400 as a default status whenever anything goes wrong, as well as when a user has passed invalid parameters. By using a new exception class, it is now possible to differentiate between known parameter errors and unexpected errors that truly represent an HTTP 500.

orangejulius · 2018-11-02T01:23:28Z

Ciao tests have been run and all still pass!

This code, which checks all existing errors and classifies them as a certain error type, was running within a loop that probably wasn't intended. It looks like this was a mistake made in #1231.

Pelias has always had a bit of trouble selecting the right HTTP response code in the face of various error states. Up until #1231 in 2018, we reported almost all timeouts from slow Elasticsearch queries as HTTP 400 errors, not something in the more appropriate 5XX range. This suggests to consumers of the Pelias API that they made a mistake in calling Pelias, instead of the reality that Pelias was just being slow. Even after that change, it turns out we were _still_ classifying timeouts to other Pelias services (like Placeholder or Interpolation) as 400 errors instead of 5XX. All the Pelias services are generally very fast, so this was not nearly as much of an issue, but timeouts do happen. This PR adds additional handling to detect timeout errors and give them their own subclass of `Error` that can be treated appropriately everywhere. Timeouts waiting for any Pelias service will now return HTTP 502 errors just like a timeout waiting for Elasticsearch.
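As a rough illustration of the timeout handling described above, the sketch below wraps timeout errors in their own `Error` subclass and maps that subclass to HTTP 502. The names `ExampleTimeoutError`, `wrapServiceError`, and `statusCodeForServiceError` are hypothetical, and detecting a timeout via an `ETIMEDOUT` error code is an assumption; the actual detection logic in Pelias may differ.

```javascript
// Hypothetical subclass standing in for the timeout error type.
class ExampleTimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ExampleTimeoutError';
  }
}

// Hypothetical helper: convert a raw service error into the timeout
// subclass when it looks like a timeout (assumed: code === 'ETIMEDOUT').
function wrapServiceError(err) {
  if (err && err.code === 'ETIMEDOUT') {
    return new ExampleTimeoutError(err.message);
  }
  return err;
}

// Hypothetical helper: timeouts become 502, other unknown errors 500.
function statusCodeForServiceError(err) {
  return err instanceof ExampleTimeoutError ? 502 : 500;
}
```

Wrapping at the point where the service call fails keeps the response-code logic to a single `instanceof` check, regardless of which downstream service timed out.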

orangejulius force-pushed the fix-error-codes branch from 8d91abd to ca599c9 Compare November 1, 2018 04:23

orangejulius requested a review from missinglink November 1, 2018 05:38

missinglink reviewed Nov 1, 2018

orangejulius force-pushed the fix-error-codes branch from ca599c9 to 00c795e Compare November 1, 2018 14:13

orangejulius force-pushed the fix-error-codes branch from 00c795e to f2dd729 Compare November 2, 2018 01:18

orangejulius force-pushed the fix-error-codes branch from f2dd729 to a1c4829 Compare November 2, 2018 01:23

orangejulius merged commit 70cbcb3 into master Nov 2, 2018

orangejulius deleted the fix-error-codes branch November 2, 2018 15:15

orangejulius mentioned this pull request Oct 28, 2021

Return 502 response code for service timeouts instead of 400 #1573

Merged

orangejulius mentioned this pull request Jul 3, 2023

Improve configurability #1654

Open
