Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-43121: Set up server to client error propagation #973

Merged
merged 8 commits into from Mar 11, 2024
Merged

Conversation

dhirving
Copy link
Contributor

@dhirving dhirving commented Mar 6, 2024

Set up a framework for transferring exceptions from the server to the client and raising them in RemoteButler. Exceptions intended as user-facing error messages should now be raised as exceptions that subclass a new class ButlerUserError. This allows us to distinguish between "intentional"/"expected" error messages which should be sent to the client and should not raise alarms, vs server-internal breakage that our team will have to investigate and fix.

Any ButlerUserError raised on the server will be caught and transferred to the client as JSON with a 422 status code, then raised as the original exception type on the client.

The top-level _exceptions.py will eventually become the single file where Butler-specific exceptions for the public API are defined. Added a new exception type DatasetNotFoundError to replace some LookupErrors raised when datasets were not found. Hoisted MissingDatasetTypeError from registry._exceptions to the top-level _exceptions, and put its base classes in a separate file _exceptions_legacy.py to indicate they should not be used from new code.

Updated enough exceptions to pass the tests in test_butler.py using this framework. The registry tests in test_remote_butler.py will require more work in a later PR.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link

codecov bot commented Mar 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.90%. Comparing base (9d79d6e) to head (f6eb3c8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #973      +/-   ##
==========================================
+ Coverage   88.89%   88.90%   +0.01%     
==========================================
  Files         329      329              
  Lines       42280    42330      +50     
  Branches     8699     8703       +4     
==========================================
+ Hits        37583    37634      +51     
  Misses       3444     3444              
+ Partials     1253     1252       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dhirving dhirving marked this pull request as ready for review March 7, 2024 17:50
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

), "Subclasses of ButlerUserError must have unique 'error_type' property"


def create_butler_user_error(error_type: str, message: str) -> ButlerUserError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a docstring for this since it is used elsewhere and would help people reading the code and getting previews from VSCode.

try:
model = self._get_file_info(datasetRefOrType, dataId, collections, kwargs)
except DatasetNotFoundError as e:
raise FileNotFoundError(str(e)) from e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to talk about this oddity for the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call out... I'm pretty sure this is just incorrect even though it passes tests.

I thought this was needed because of the datastore vs registry thing. Datastore pretty consistently throws FileNotFoundError when stuff can't be found, and we're mostly acting in place of datastore here.

But I took a closer look at DirectButler and it does actually throw DatasetNotFoundError (formerly LookupError) for most common cases of not being able to lookup a ref in get(). Apparently that just isn't covered in test_butler.py.

The FileNotFoundErrors being expected in the tests are actually for a couple of weird edge cases:

  • changing the UUID of a ref to something invalid (line 500 in test_butler.py)
  • holding on to a ref after calling pruneDatasets(purge=True), then trying to call get() on it (line 417 in test_butler.py)

Arguably in both these cases it would make just as much sense to throw the DatasetNotFoundError, but because these checks are inside Datastore they happen to throw FileNotFoundError because that's what Datastore does.

Add an exception class that can be used to tag an  exception as being a user-facing error message.  Set up a FastAPI exception handler that can forward these error messages to the client side.  Have RemoteButler  raise the corresponding exception when errors are received from the server.
Hoist MissingDatasetTypeError up to the top-level exceptions file, and mark it as as ButlerUserError so that it can be sent from the server to the client.  This allows removing some code specific to this exception type from RemoteButler.
In unit tests, we now convert unhandled server errors to a special exception class 'UnhandledServerError' that is not part of Butler's usual exception hierarchy.

The FastAPI TestClient by default passes unhandled exceptions up from the server to the client.  This is useful behavior for unit testing because it gives you traceability from the test to the problem in the server code. However, because RemoteButler is in some ways just a proxy for the server-side Butler, we raise similar exceptions on the client and server side. Thus the default TestClient behavior can mask missing error-handling logic.

The registry test suite in test_remote_butler.py does not yet have all error-handling logic in place, so it opts out of the new behavior.
Remove one-off code handling FileNotFoundError and 404s in RemoteButler, replacing it with the new error propagation framework.
LookupError is currently thrown from a variety of places for mostly-unrelated reasons, most of which don't need to be propagated to the server.  Give it a more specific name instead.
It turns out that DirectButler typically throws two different exceptions from get():
1. DatasetNotFoundError if the dataset can't be found in registry
2.  FileNotFoundError if the file itself cannot be found or under some other error conditions (some of which involve not being able to find the dataset in registry.)

Added test coverage for case #1, and fixed RemoteButler to comply with it.  Loosened the tests for some unusual error conditions to allow either of the two exception types, because RemoteButler doesn't have enough information available to replicate the DirectButler behavior.
@dhirving dhirving force-pushed the tickets/DM-43121 branch 2 times, most recently from 2a535c5 to 9bb59b2 Compare March 11, 2024 17:49
@dhirving dhirving merged commit ccf44ff into main Mar 11, 2024
18 checks passed
@dhirving dhirving deleted the tickets/DM-43121 branch March 11, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants