
How to do (next gen, SC oriented) remote error serialization for cross host/process propagation? #5

Open · goodboy opened this issue Jul 6, 2018 · 5 comments
Labels: discussion, enhancement, help wanted, testing
Milestone: 0.1.0.a0

Comments


goodboy commented Jul 6, 2018

It's like the sloppiest and laziest thing atm..

Doesn't rpyc have some fancy way of doing this? Seems like there's a homegrown traceback serializer. Here's their theory of operation.

I specifically don't want to go down the proxy route (one of tractor's tenets) but I think for exceptions it's a special case.

goodboy changed the title from "How to do proper remote error propogation" to "How to do proper remote error propagation" Jul 6, 2018

goodboy commented Jul 8, 2018

More hints from celery on exception pickling?


goodboy commented Oct 17, 2019

Thanks to @njsmith for pointing out the traceback serializers in jinja2 and also @dhirschfeld for pointing out tblib, which seems to be derived from it.
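For reference, a minimal sketch (not tractor code) of what the tblib route could look like; the `pack_error`/`unpack_and_raise` names and the flat-dict payload shape are made up for illustration:

```python
from tblib import Traceback  # pip install tblib

# -- raising side --
def pack_error(exc: BaseException) -> dict:
    # tblib round-trips a traceback through a plain dict so the
    # payload stays msgpack/JSON friendly (no pickle required)
    return {
        'type': type(exc).__name__,
        'args': list(exc.args),  # assumes the args are serializable
        'tb': Traceback(exc.__traceback__).to_dict(),
    }

# -- receiving side --
def unpack_and_raise(payload: dict) -> None:
    tb = Traceback.from_dict(payload['tb']).as_traceback()
    # naive: re-raises as a generic Exception; reconstructing the
    # original type needs a fallback strategy (see further down)
    raise Exception(
        f"{payload['type']}: {payload['args']}"
    ).with_traceback(tb)

try:
    1 / 0
except ZeroDivisionError as exc:
    wire_msg = pack_error(exc)  # this dict would cross the IPC boundary

unpack_and_raise(wire_msg)  # remote frames show up in the local traceback
```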

goodboy added this to the 0.1.0.a0 milestone Oct 17, 2019
ryanhiebert (Collaborator) commented

I think that we want to have error propagation that somehow includes explicit mention of host boundaries in a readable way. That's probably the relatively easy part of the issue, but I don't want it to be overlooked. The traceback should show when the host/process changes, which means that the code itself may be different. If there were some way to give a good representation of which version of the code it found on the other side as well, that seems really excellent.

Of course, if everything really is on the same version of the same code, then it'll be redundant. And we could, when things are happy, potentially deserialize the exceptions and raise them as more than just a wrapper exception with the traceback from the other server, which would be cool. Not something that we can rely on in all cases though, so we need a good fallback for when things don't match and the exception and traceback don't propagate cleanly.
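A rough sketch of that rebuild-when-you-can, wrap-when-you-can't idea, assuming errors cross the wire as a flat dict; `RemoteActorError` borrows the name goodboy mentions further down, everything else is hypothetical:

```python
import builtins

class RemoteActorError(Exception):
    '''Fallback wrapper carrying the far end's formatted traceback.'''
    def __init__(self, message: str, remote_tb: str) -> None:
        super().__init__(message)
        self.remote_tb = remote_tb

def rebuild_or_wrap(payload: dict) -> BaseException:
    # payload: {'type': str, 'args': list, 'tb_str': str}
    exc_type = getattr(builtins, payload['type'], None)
    if isinstance(exc_type, type) and issubclass(exc_type, BaseException):
        try:
            # versions/types match: hand back the "real" exception
            return exc_type(*payload['args'])
        except Exception:
            pass  # signature mismatch -> fall through to the wrapper
    return RemoteActorError(
        f"remote side raised {payload['type']}: {payload['args']}",
        remote_tb=payload['tb_str'],
    )

err = rebuild_or_wrap(
    {'type': 'ValueError', 'args': ['bad input'], 'tb_str': '...'}
)
assert isinstance(err, ValueError)
```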

This is all really interesting, and I don't know what I'm talking about, but it looks very neat.


goodboy commented Oct 18, 2019

> error propagation that somehow includes explicit mention of host boundaries in a readable way

I think this should mostly be included in a mailbox / address in every message (the actor model way). Right now tractor is kind of doing this by having each portal aware of the far end address on either side. I'm trying to think of whether it matters if an actor is local to the host or remote - maybe just for certain network-comms related error handling? A lot of this will be delegated to lower layers in tractor (depending on IPC transport - TCP versus NNG etc.) so I guess relying on specific error types that might change across versions might pose a problem? I feel like a decently designed exception inheritance tree should mostly cover this?

> If there were some way to give a good representation of which version of the code it found on the other side as well, that seems really excellent.

In my mind the primary code that cares about remote errors is an actor's supervisor, and I wonder, should a super care about what version of the code is being run? Is this maybe the concern of something else? For example, if the application required that info, couldn't the parent just ask its child for a version right after spawning? When will it be useful to a super in the general case to know about its child's code version? Maybe in a system where there is hot code swapping like in erlang? I'm still not even sure if such a feature should be built into the core of tractor - might be better oriented as a small "native app" on top?

I think the main question is how much a remote super needs to know about a child's error types / internal code. To me, too much coupling here would mean the super is more part of the app than part of the distributed computing system - which maybe is fine in some cases, but then won't the super need special consideration for the child's details anyway? At face value it would seem a super needs to know as much about a child's remote errors as a try/except block needs to know about the code it calls (which may change in future revisions). The except: blocks here can be many, specific, and as nested as desired.

If a super is supposed to fulfill its conventional role then I think some set of error "classes" might be necessary to help (custom) supervisor authors determine what types of failure recovery (or cancellation) logic are available. Having a set of contracts for which errors should be raised in which situations is something that can be iterated on over time if designed right - but there will still be a foreseeable compatibility problem between super handlers and error types across multiple versions running in the same cluster(s).
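To make the error-"classes" idea concrete, a hedged sketch of the sort of inheritance tree a (custom) supervisor might dispatch on - every name here is illustrative except `RemoteActorError`, which the thread already mentions:

```python
class TractorError(Exception):
    '''Hypothetical root so a super can catch "anything of ours".'''

class TransportError(TractorError):
    '''IPC-layer failure (TCP reset, NNG timeout, ...): respawn or
    re-dial territory, needing no knowledge of the child's app code.'''

class RemoteActorError(TractorError):
    '''A failure inside a remote actor's task, packed with its uid
    and traceback for cross-host unwinding.'''
    def __init__(self, message: str, actor_uid: tuple) -> None:
        super().__init__(message)
        self.actor_uid = actor_uid

# a supervisor picks whichever layer of the tree it cares about:
def restart_policy(err: TractorError) -> str:
    if isinstance(err, TransportError):
        return 'respawn'         # comms died; child state unknown
    if isinstance(err, RemoteActorError):
        return 'log-and-cancel'  # app-level failure; policy decision
    return 'crash'

print(restart_policy(TransportError('connection reset by peer')))  # respawn
```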

Anyway, too many new questions 😼!

The short assertion is that we already do pack task info in the exception msg and announce / pack the actor uid in the RemoteActorError on the receiving side.

I don't at the moment see any problem with requiring all such remote errors to include the address / actor uid / task uid info in every error. It's probably just going to make logging system integration that much easier and more useful. I also don't see a problem with reconstructing remote errors into local objects, other than performance.
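As one possible (purely illustrative, not tractor's actual wire format) shape for such an error msg:

```python
import traceback

def pack_error_msg(
    exc: BaseException,
    actor_uid: tuple[str, str],  # (name, uuid) of the erroring actor
    task_uid: str,
    src_addr: tuple[str, int],   # (host, port) where it happened
) -> dict:
    # one flat, logging-friendly msg carrying everything a remote
    # super (or a log aggregator) needs to place the failure
    return {
        'error': {
            'type': type(exc).__name__,
            'tb_str': ''.join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
            'actor_uid': actor_uid,
            'task_uid': task_uid,
            'src_addr': src_addr,
        },
    }

msg = pack_error_msg(
    ValueError('bad input'),
    actor_uid=('worker', 'deadbeef-uuid'),
    task_uid='crunch_numbers',
    src_addr=('10.0.0.2', 1616),
)
print(msg['error']['actor_uid'], msg['error']['type'])
```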


goodboy commented Oct 18, 2023

Heh, so we're already kinda requiring the whole uid-in-error-as-msg bit as part of the soon-to-land #357, and we might as well use the new multi-address support we're experimenting with in #367.

Addresses in every error seems like a handy thing for unwinding complex inter-actor-tree service failures especially if we ever get to multi-host supervision APIs down the road ..

goodboy changed the title from "How to do proper remote error propagation" to "How to do proper remote error msg serialization and cross host/process propagation?" Oct 18, 2023
goodboy changed the title from "How to do proper remote error msg serialization and cross host/process propagation?" to "How to do (next gen, SC oriented) remote error serialization for cross host/process propagation?" Oct 18, 2023