DM-43121: Set up server to client error propagation #973

dhirving · 2024-03-06T20:10:51Z

Set up a framework for transferring exceptions from the server to the client and raising them in RemoteButler. Exceptions intended as user-facing error messages should now be raised as exceptions that subclass a new class ButlerUserError. This allows us to distinguish between "intentional"/"expected" error messages which should be sent to the client and should not raise alarms, vs server-internal breakage that our team will have to investigate and fix.

Any ButlerUserError raised on the server will be caught and transferred to the client as JSON with a 422 status code, then raised as the original exception type on the client.

The top-level _exceptions.py will eventually become the single file where Butler-specific exceptions for the public API are defined. Added a new exception type DatasetNotFoundError to replace some LookupErrors raised when datasets were not found. Hoisted MissingDatasetTypeError from registry._exceptions to the top-level _exceptions, and put its base classes in a separate file _exceptions_legacy.py to indicate they should not be used from new code.

Updated enough exceptions to pass the tests in test_butler.py using this framework. The registry tests in test_remote_butler.py will require more work in a later PR.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2024-03-06T20:22:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.90%. Comparing base (9d79d6e) to head (f6eb3c8).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #973      +/-   ##
==========================================
+ Coverage   88.89%   88.90%   +0.01%     
==========================================
  Files         329      329              
  Lines       42280    42330      +50     
  Branches     8699     8703       +4     
==========================================
+ Hits        37583    37634      +51     
  Misses       3444     3444              
+ Partials     1253     1252       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

timj

Looks good.

timj · 2024-03-07T20:05:57Z

python/lsst/daf/butler/_exceptions.py

+), "Subclasses of ButlerUserError must have unique 'error_type' property"
+
+
+def create_butler_user_error(error_type: str, message: str) -> ButlerUserError:


Can you please add a docstring for this since it is used elsewhere and would help people reading the code and getting previews from VSCode.

timj · 2024-03-07T20:49:59Z

python/lsst/daf/butler/remote_butler/_remote_butler.py

+        try:
+            model = self._get_file_info(datasetRefOrType, dataId, collections, kwargs)
+        except DatasetNotFoundError as e:
+            raise FileNotFoundError(str(e)) from e


Maybe we need to talk about this oddity for the future.

This is a good call out... I'm pretty sure this is just incorrect even though it passes tests.

I thought this was needed because of the datastore vs registry thing. Datastore pretty consistently throws FileNotFoundError when stuff can't be found, and we're mostly acting in place of datastore here.

But I took a closer look at DirectButler and it does actually throw DatasetNotFoundError (formerly LookupError) for most common cases of not being able to lookup a ref in get(). Apparently that just isn't covered in test_butler.py.

The FileNotFoundErrors being expected in the tests are actually for a couple of weird edge cases:

changing the UUID of a ref to something invalid (line 500 in test_butler.py)

holding on to a ref after calling pruneDatasets(purge=True), then trying to call get() on it (line 417 in test_butler.py)

Arguably in both these cases it would make just as much sense to throw the DatasetNotFoundError, but because these checks are inside Datastore they happen to throw FileNotFoundError because that's what Datastore does.

Add an exception class that can be used to tag an exception as being a user-facing error message. Set up a FastAPI exception handler that can forward these error messages to the client side. Have RemoteButler raise the corresponding exception when errors are received from the server.

Hoist MissingDatasetTypeError up to the top-level exceptions file, and mark it as as ButlerUserError so that it can be sent from the server to the client. This allows removing some code specific to this exception type from RemoteButler.

In unit tests, we now convert unhandled server errors to a special exception class 'UnhandledServerError' that is not part of Butler's usual exception hierarchy. The FastAPI TestClient by default passes unhandled exceptions up from the server to the client. This is useful behavior for unit testing because it gives you traceability from the test to the problem in the server code. However, because RemoteButler is in some ways just a proxy for the server-side Butler, we raise similar exceptions on the client and server side. Thus the default TestClient behavior can mask missing error-handling logic. The registry test suite in test_remote_butler.py does not yet have all error-handling logic in place, so it opts out of the new behavior.

Remove one-off code handling FileNotFoundError and 404s in RemoteButler, replacing it with the new error propagation framework.

LookupError is currently thrown from a variety of places for mostly-unrelated reasons, most of which don't need to be propagated to the server. Give it a more specific name instead.

It turns out that DirectButler typically throws two different exceptions from get(): 1. DatasetNotFoundError if the dataset can't be found in registry 2. FileNotFoundError if the file itself cannot be found or under some other error conditions (some of which involve not being able to find the dataset in registry.) Added test coverage for case #1, and fixed RemoteButler to comply with it. Loosened the tests for some unusual error conditions to allow either of the two exception types, because RemoteButler doesn't have enough information available to replicate the DirectButler behavior.

dhirving force-pushed the tickets/DM-43121 branch from 15b404c to 78d6fbc Compare March 6, 2024 22:42

dhirving marked this pull request as ready for review March 7, 2024 17:50

timj approved these changes Mar 7, 2024

View reviewed changes

dhirving force-pushed the tickets/DM-43121 branch from 8d09b13 to 8a89a98 Compare March 8, 2024 22:35

dhirving added 6 commits March 8, 2024 15:35

Propagate MissingDatasetTypeError to client

9d2fd1d

Hoist MissingDatasetTypeError up to the top-level exceptions file, and mark it as as ButlerUserError so that it can be sent from the server to the client. This allows removing some code specific to this exception type from RemoteButler.

Remove bespoke error-handling code in RemoteButler

0ca6baa

Remove one-off code handling FileNotFoundError and 404s in RemoteButler, replacing it with the new error propagation framework.

Rename ButlerLookupError to DatasetNotFoundError

f720a80

LookupError is currently thrown from a variety of places for mostly-unrelated reasons, most of which don't need to be propagated to the server. Give it a more specific name instead.

Add test coverage for error handling edge cases

5f502db

dhirving force-pushed the tickets/DM-43121 branch from 8a89a98 to 98b0e44 Compare March 8, 2024 22:35

dhirving force-pushed the tickets/DM-43121 branch 2 times, most recently from 2a535c5 to 9bb59b2 Compare March 11, 2024 17:49

Work around mypy regression

f6eb3c8

dhirving force-pushed the tickets/DM-43121 branch from 9bb59b2 to f6eb3c8 Compare March 11, 2024 17:53

dhirving merged commit ccf44ff into main Mar 11, 2024
18 checks passed

dhirving deleted the tickets/DM-43121 branch March 11, 2024 18:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-43121: Set up server to client error propagation #973

DM-43121: Set up server to client error propagation #973

dhirving commented Mar 6, 2024 •

edited

codecov bot commented Mar 6, 2024 •

edited

timj left a comment

timj Mar 7, 2024

timj Mar 7, 2024

dhirving Mar 7, 2024

		), "Subclasses of ButlerUserError must have unique 'error_type' property"


		def create_butler_user_error(error_type: str, message: str) -> ButlerUserError:

DM-43121: Set up server to client error propagation #973

DM-43121: Set up server to client error propagation #973

Conversation

dhirving commented Mar 6, 2024 • edited

Checklist

codecov bot commented Mar 6, 2024 • edited

Codecov Report

timj left a comment

Choose a reason for hiding this comment

timj Mar 7, 2024

Choose a reason for hiding this comment

timj Mar 7, 2024

Choose a reason for hiding this comment

dhirving Mar 7, 2024

Choose a reason for hiding this comment

dhirving commented Mar 6, 2024 •

edited

codecov bot commented Mar 6, 2024 •

edited