
How to do (next gen, SC oriented) remote error serialization for cross host/process propagation? #5

Open · goodboy opened this issue Jul 6, 2018 · 5 comments
Labels: discussion, enhancement, help wanted, testing
Milestone: 0.1.0.a0

Comments


goodboy commented Jul 6, 2018

It's like the sloppiest and laziest thing atm..

Doesn't rpyc have some fancy way of doing this? Seems like there's a homegrown traceback serializer. Here's their theory of operation.

I specifically don't want to go down the proxy route (one of tractor's tenets) but I think for exceptions it's a special case.

goodboy changed the title from "How to do proper remote error propogation" to "How to do proper remote error propagation" Jul 6, 2018

goodboy commented Jul 8, 2018

More hints from celery on exception pickling?


goodboy commented Oct 17, 2019

Thanks to @njsmith for pointing out the traceback serializers in jinja2 and also @dhirschfeld for pointing out tblib, which seems to be derived from it.
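For reference, a minimal sketch (not tractor code) of what the tblib route could look like; the `pack_error`/`unpack_and_raise` names and the flat-dict payload shape are made up for illustration:

```python
from tblib import Traceback  # pip install tblib

# -- raising side --
def pack_error(exc: BaseException) -> dict:
    # tblib round-trips a traceback through a plain dict so the
    # payload stays msgpack/JSON friendly (no pickle required)
    return {
        'type': type(exc).__name__,
        'args': list(exc.args),  # assumes the args are serializable
        'tb': Traceback(exc.__traceback__).to_dict(),
    }

# -- receiving side --
def unpack_and_raise(payload: dict) -> None:
    tb = Traceback.from_dict(payload['tb']).as_traceback()
    # naive: re-raises as a generic Exception; reconstructing the
    # original type needs a fallback strategy (see further down)
    raise Exception(
        f"{payload['type']}: {payload['args']}"
    ).with_traceback(tb)

try:
    1 / 0
except ZeroDivisionError as exc:
    wire_msg = pack_error(exc)  # this dict would cross the IPC boundary

unpack_and_raise(wire_msg)  # remote frames show up in the local traceback
```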

goodboy added this to the 0.1.0.a0 milestone Oct 17, 2019
ryanhiebert (Collaborator) commented

I think that we want to have error propagation that somehow includes explicit mention of host boundaries in a readable way. That's probably the relatively easy part of the issue, but I don't want it to be overlooked. The traceback should show when the host/process changes, which means that the code itself may be different. If there were some way to give a good representation of which version of the code it found on the other side as well, that seems really excellent.

Of course, if everything really is on the same version of the same code, then it'll be redundant. And we could, when things are happy, potentially deserialize the exceptions and raise them as more than just a wrapper exception with the traceback from the other server, which would be cool. Not something that we can rely on in all cases though, so we need a good fallback for when things don't match and the exception and traceback don't propagate cleanly.
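A rough sketch of that rebuild-when-you-can, wrap-when-you-can't idea, assuming errors cross the wire as a flat dict; `RemoteActorError` borrows the name goodboy mentions further down, everything else is hypothetical:

```python
import builtins

class RemoteActorError(Exception):
    '''Fallback wrapper carrying the far end's formatted traceback.'''
    def __init__(self, message: str, remote_tb: str) -> None:
        super().__init__(message)
        self.remote_tb = remote_tb

def rebuild_or_wrap(payload: dict) -> BaseException:
    # payload: {'type': str, 'args': list, 'tb_str': str}
    exc_type = getattr(builtins, payload['type'], None)
    if isinstance(exc_type, type) and issubclass(exc_type, BaseException):
        try:
            # versions/types match: hand back the "real" exception
            return exc_type(*payload['args'])
        except Exception:
            pass  # signature mismatch -> fall through to the wrapper
    return RemoteActorError(
        f"remote side raised {payload['type']}: {payload['args']}",
        remote_tb=payload['tb_str'],
    )

err = rebuild_or_wrap(
    {'type': 'ValueError', 'args': ['bad input'], 'tb_str': '...'}
)
assert isinstance(err, ValueError)
```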

This is all really interesting, and I don't know what I'm talking about, but it looks very neat.


goodboy commented Oct 18, 2019

> error propagation that somehow includes explicit mention of host boundaries in a readable way

I think this should mostly be included in a mailbox / address in every message (the actor model way). Right now tractor is kind of doing this by having each portal aware of the far end address on either side. I'm trying to think of whether it matters if an actor is local to the host or remote - maybe just for certain network-comms related error handling? A lot of this will be delegated to lower layers in tractor (depending on IPC transport - TCP versus NNG etc.) so I guess relying on specific error types that might change across versions might pose a problem? I feel like a decently designed exception inheritance tree should mostly cover this?

> If there were some way to give a good representation of which version of the code it found on the other side as well, that seems really excellent.

In my mind the primary code that cares about remote errors is an actor's supervisor, and I wonder, should a super care about what version of the code is being run? Is this maybe the concern of something else? For example, if the application required that info, couldn't the parent just ask its child for a version right after spawning? When will it be useful to a super in the general case to know about its child's code version? Maybe in a system where there is hot code swapping like in erlang? I'm still not even sure if such a feature should be built into the core of tractor - might be better oriented as a small "native app" on top?

I think the main question is how much a remote super needs to know about a child's error types / internal code. To me, too much coupling here would mean the super is more part of the app than part of the distributed computing system - which maybe is fine in some cases, but then won't the super need special consideration for the child's details anyway? At face value it would seem a super needs to know as much about a child's remote errors as a try/except block needs to know about the code it calls (which may change in future revisions). The except: blocks here can be many, specific, and as nested as desired.

If a super is supposed to fulfill its conventional role then I think some set of error "classes" might be necessary to help (custom) supervisor authors determine what types of failure recovery (or cancellation) logic are available. Having a set of contracts for which errors should be raised in which situations is something that can be iterated on over time if designed right - but there will still be a foreseeable compatibility problem between super handlers and error types across multiple versions running in the same cluster(s).
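To make the error-"classes" idea concrete, a hedged sketch of the sort of inheritance tree a (custom) supervisor might dispatch on - every name here is illustrative except `RemoteActorError`, which the thread already mentions:

```python
class TractorError(Exception):
    '''Hypothetical root so a super can catch "anything of ours".'''

class TransportError(TractorError):
    '''IPC-layer failure (TCP reset, NNG timeout, ...): respawn or
    re-dial territory, needing no knowledge of the child's app code.'''

class RemoteActorError(TractorError):
    '''A failure inside a remote actor's task, packed with its uid
    and traceback for cross-host unwinding.'''
    def __init__(self, message: str, actor_uid: tuple) -> None:
        super().__init__(message)
        self.actor_uid = actor_uid

# a supervisor picks whichever layer of the tree it cares about:
def restart_policy(err: TractorError) -> str:
    if isinstance(err, TransportError):
        return 'respawn'         # comms died; child state unknown
    if isinstance(err, RemoteActorError):
        return 'log-and-cancel'  # app-level failure; policy decision
    return 'crash'

print(restart_policy(TransportError('connection reset by peer')))  # respawn
```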

Anyway, too many new questions 😼!

The short assertion is that we already do pack task info in the exception msg and announce / pack the actor uid in the RemoteActorError on the receiving side.

I don't at the moment see any problem with requiring all such remote errors to include the address / actor uid / task uid info in every error. It's probably just going to make logging system integration that much easier and more useful. I also don't see a problem with reconstructing remote errors into local objects, other than performance.
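As one possible (purely illustrative, not tractor's actual wire format) shape for such an error msg:

```python
import traceback

def pack_error_msg(
    exc: BaseException,
    actor_uid: tuple[str, str],  # (name, uuid) of the erroring actor
    task_uid: str,
    src_addr: tuple[str, int],   # (host, port) where it happened
) -> dict:
    # one flat, logging-friendly msg carrying everything a remote
    # super (or a log aggregator) needs to place the failure
    return {
        'error': {
            'type': type(exc).__name__,
            'tb_str': ''.join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
            'actor_uid': actor_uid,
            'task_uid': task_uid,
            'src_addr': src_addr,
        },
    }

msg = pack_error_msg(
    ValueError('bad input'),
    actor_uid=('worker', 'deadbeef-uuid'),
    task_uid='crunch_numbers',
    src_addr=('10.0.0.2', 1616),
)
print(msg['error']['actor_uid'], msg['error']['type'])
```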


goodboy commented Oct 18, 2023

Heh, so we're already kinda requiring the whole uid-in-error-as-msg bit as part of the soon-to-land #357, and we might as well use the new multi-address support we're experimenting with in #367.

Addresses in every error seems like a handy thing for unwinding complex inter-actor-tree service failures especially if we ever get to multi-host supervision APIs down the road ..

goodboy changed the title from "How to do proper remote error propagation" to "How to do proper remote error msg serialization and cross host/process propagation?" Oct 18, 2023
goodboy changed the title from "How to do proper remote error msg serialization and cross host/process propagation?" to "How to do (next gen, SC oriented) remote error serialization for cross host/process propagation?" Oct 18, 2023