Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better ergonomics for exceptions #56

Closed
3 of 11 tasks
njsmith opened this issue Feb 20, 2017 · 7 comments
Closed
3 of 11 tasks

Better ergonomics for exceptions #56

njsmith opened this issue Feb 20, 2017 · 7 comments

Comments

@njsmith
Copy link
Member

njsmith commented Feb 20, 2017

Our error-reporting story is tolerable at this point (see #49 for some history), but could definitely still be improved. This collects some misc ideas:

  • Our tracebacks are very cluttered and hard to read (lots of internal helper functions etc. all over the place) – can we cut this down?
  • Tracebacks should include task context, like: as an exception propagates in (Error.unwrap) and out (_task_finished) of tasks, update a record like exc.__trio_task_context__ with entries like (len(tb), "exit", task), and use them to add context to printed tracebacks. (The need to compute len is a bit unfortunate though... I guess ideally these should be attributes on the tb objects themselves? But tb objects don't have __dict__s. Or we could just stash the tb object.) MultiError.filter needs to be careful to propagate these correctly when propagating tracebacks.
  • MultiError.catch is OK, but could be better
    • We don't propagate __context__ on MultiErrors
    • We should really have an async version... but bpo29600 makes this difficult!
    • The filter interface is maybe a bit awkward and error prone?
    • If people try to work around the awkwardness by doing some sort of try: raise exc except: ... thing, then they get corrupt tracebacks (we assume that the filter callback doesn't touch the traceback). We could potentially detect new tb frames appearing and fix them up.
  • Better IPython integration: right now we override the printing of MultiErrors and use our regular MultiError printing code. This is good b/c it gives full information, but bad b/c it loses out on IPython's enhanced exception formatting. Better ergonomics for working with MultiErrors #49 has some notes on how this might be possible to get both. (Note that this becomes more of an issue if we do as suggested above and start customizing all exception output and not just MultiError!) Possibly should chat with the IPython devs about this.
@bluetech
Copy link
Member

Are you familiar with C#'s AggregateException?

Some notes compared to MultiError:

  • (Subjective) The name is a bit more memorable, to me at least.

  • Trio follows the rule "any unhandled exceptions are re-raised inside the parent task. If there are multiple exceptions, then they’re collected up into a single MultiError exception".

    C# however follows the rule that an AggregateException is thrown even in the single-exception case. That is, if the operation is such that it needs to aggregate exceptions, then AggregateException is what it throws always (example).

    I think the C# way is better:

    • In order to write correct code, the caller must handle the MultiError case. If using MultiError.catch, this handles the single-error case already. If doing own try-except (the way I prefer), must handle the two cases separately, which is redundant and annoying.

    • It is impossible to be lazy/ignorant of the multi-error case and only handle the single-error case.

    • Simpler documentation, simpler static typing.

  • AggregateException has a flatten method which recursively merges nested AggregateExceptions. I think MutliError.filter does some of that, however I personally find it simpler to just catch MultiError and iterate over the exceptions, so a flatten(self: MultiError) -> MultiError can be useful.

@belm0
Copy link
Member

belm0 commented Aug 11, 2018

+1 for an option to elide stack frames. Here I'll consider how this might behave and show a concrete example.

Proposed behavior:

  • by default we'll elide trio module stack frames, along with any frames from a list of "trio dependency" modules which descend from trio frames
  • only runs of at least MIN_FRAMES_ELIDED will be skipped (hint: probably 2)

Program specimen is (roughly) https://gist.github.com/nonsleepr/43420a0ce659f1b870544fffb9e5cda4

Original trace (with paths shortened):

Traceback (most recent call last):
  File "demo.py", line 58, in <module>
    test_multiplexer_with_error()
  File "demo.py", line 55, in test_multiplexer_with_error
    trio.run(runner2)
  File "/.../site-packages/trio/_core/_run.py", line 1277, in run
    return result.unwrap()
  File "/.../site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
  File "/.../site-packages/trio/_core/_run.py", line 1387, in run_impl
    msg = task.context.run(task.coro.send, next_send)
  File "/.../site-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "/.../site-packages/trio/_core/_run.py", line 970, in init
    self.entry_queue.spawn()
  File "/.../site-packages/async_generator/_util.py", line 42, in __aexit__
    await self._agen.asend(None)
  File "/.../site-packages/async_generator/_impl.py", line 366, in step
    return await ANextIter(self._it, start_fn, *args)
  File "/.../site-packages/async_generator/_impl.py", line 202, in send
    return self._invoke(self._it.send, value)
  File "/.../site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/.../site-packages/trio/_core/_run.py", line 317, in open_nursery
    await nursery._nested_child_finished(nested_child_exc)
  File "/.../contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/.../site-packages/trio/_core/_run.py", line 202, in open_cancel_scope
    yield scope
  File "/.../site-packages/trio/_core/_multierror.py", line 144, in __exit__
    raise filtered_exc
  File "/.../site-packages/trio/_core/_run.py", line 202, in open_cancel_scope
    yield scope
  File "/.../site-packages/trio/_core/_run.py", line 317, in open_nursery
    await nursery._nested_child_finished(nested_child_exc)
  File "/.../site-packages/trio/_core/_run.py", line 428, in _nested_child_finished
    raise MultiError(self._pending_excs)
  File "/.../site-packages/trio/_core/_run.py", line 1387, in run_impl
    msg = task.context.run(task.coro.send, next_send)
  File "/.../site-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "demo.py", line 52, in runner2
    nursery.start_soon(writer2, mx, (7,9))
  File "/.../site-packages/async_generator/_util.py", line 42, in __aexit__
    await self._agen.asend(None)
  File "/.../site-packages/async_generator/_impl.py", line 366, in step
    return await ANextIter(self._it, start_fn, *args)
  File "/.../site-packages/async_generator/_impl.py", line 202, in send
    return self._invoke(self._it.send, value)
  File "/.../site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/.../site-packages/trio/_core/_run.py", line 317, in open_nursery
    await nursery._nested_child_finished(nested_child_exc)
  File "/.../contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/.../site-packages/trio/_core/_run.py", line 202, in open_cancel_scope
    yield scope
  File "/.../site-packages/trio/_core/_multierror.py", line 144, in __exit__
    raise filtered_exc
  File "/.../site-packages/trio/_core/_run.py", line 202, in open_cancel_scope
    yield scope
  File "/.../site-packages/trio/_core/_run.py", line 317, in open_nursery
    await nursery._nested_child_finished(nested_child_exc)
  File "/.../site-packages/trio/_core/_run.py", line 428, in _nested_child_finished
    raise MultiError(self._pending_excs)
  File "/.../site-packages/trio/_core/_run.py", line 1387, in run_impl
    msg = task.context.run(task.coro.send, next_send)
  File "/.../site-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "demo.py", line 15, in reader
    raise e
  File "demo.py", line 9, in reader
    value = await mx[key]
  File "multiplexer.py", line 36, in __getitem__
    value = await trio.hazmat.wait_task_rescheduled(abort_fn)
  File "/.../site-packages/trio/_core/_traps.py", line 159, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/.../site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
Exception: Ka-Boom!

Trace with trio frames elided (including async_generator, contextvars, and contextlib dependencies) using MIN_FRAMES_ELIDED=2:

Traceback (most recent call last):
  File "demo.py", line 58, in <module>
    test_multiplexer_with_error()
  File "demo.py", line 55, in test_multiplexer_with_error
    trio.run(runner2)
  File "/.../site-packages/trio/_core/_run.py", line 1277, in run
    return result.unwrap()
  File "/.../site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
  [... skipping 16 stack frames ...]
  File "demo.py", line 52, in runner2
    nursery.start_soon(writer2, mx, (7,9))
  [... skipping 13 stack frames ...]
  File "demo.py", line 15, in reader
    raise e
  File "demo.py", line 9, in reader
    value = await mx[key]
  File "multiplexer.py", line 36, in __getitem__
    value = await trio.hazmat.wait_task_rescheduled(abort_fn)
  File "/.../site-packages/trio/_core/_traps.py", line 159, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/.../site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
Exception: Ka-Boom!

@belm0
Copy link
Member

belm0 commented Aug 18, 2018

notes from @njsmith:

I think probably the single biggest win will be rewriting open_nursery to be an explicit context manager (with aexit and aenter methods) instead of using @asynccontextmanager

[source of extraneous stack frames is] really the combination of @asynccontextmanager and @async_generator (which is needed to define async generators on py35, so that @asynccontextmanager has something to wrap)

what's nice about this is that not only will it get rid of like 6 extra stack frames that show up each time an exception goes through a nursery, it'll also fix a bug in how exceptions and nurseries interact (#393 )

there are probably also some smaller wins to be had, but they're smaller and won't fix other bugs. Or there are big things we could do like coming up with a general strategy for suppressing display of certain frames in tracebacks, but a goal for py38 is to move our special traceback logic into the interpreter, and while it's probably possible to convince python upstream to add code to suppress frames in tracebacks, it's a lot of work

@belm0
Copy link
Member

belm0 commented Aug 18, 2018

From #393 :

I think the only option is to give up on using @acontextmanager to implement open_nursery, and switch to implementing the async context manager protocol directly. This is a bit tricky, but I've actually done most of the work already (see also #340). I guess this will be the thing that pushes us over into actually switching to that! (And this will also fix #229, so that's nice.)

@belm0
Copy link
Member

belm0 commented Aug 22, 2018

I've seen some recent developer articles complaining about unintelligible exception traces from RxJava and Kotlin. Granted it presents future obstacles, but traces with elided frames would be a big differentiator for Trio.

https://proandroiddev.com/forget-rxjava-kotlin-coroutines-are-all-you-need-part-1-2-4f62ecc4f99b

@njsmith
Copy link
Member Author

njsmith commented Aug 22, 2018

@belm0 Oh yeah, it's absolutely a real problem, and we need to find a solution one way or another.

The thing about traceback elision that makes me hesitate is, in the long run, we have to find a way to get out of the business of monkeypatching every traceback printer in the world. It's just not maintainable. (See the discussion at the beginning of #611.) And that means that if we want traceback elision to work long-term, we need to come up with a way to move the whole Python ecosystem in that direction. This is doable, and should probably be done: the pytest folks have their own hack for it (__traceback_hide__, and since Python switched implementing most of the core import machinery in Python those frames have been causing various problems too (see bpo-24305 and linked bugs)... but pushing something like this through python-dev is a ton of work. Like "plan to spend multiple full-time person-months on it" level of work.

OTOH, keeping the tracebacks simple in the first place is something we can do with relatively small, straightforward changes today. In many cases it's probably simpler than elision (e.g., it's not totally obvious how to write/tune/test the heuristics, and you avoid needing to create/test/document tuning knobs like MIN_FRAMES_ELIDED). And, if we run out of low-hanging fruit and the tracebacks are still bad, then when we propose adding elision features to Python we can say we've done our due diligence and proved that the problem really does require interpreter changes.

Let's take a look at a concrete case. I'll use your example from #612 (comment):

Traceback (most recent call last):
  File "demo.py", line 58, in <module>
    test_multiplexer_with_error()
  File "demo.py", line 55, in test_multiplexer_with_error
    trio.run(runner2)

User code, this is what we want to see.

  File "/.../site_packages/trio/_core/_run.py", line 1264, in run
    return result.unwrap()
  File "/.../site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
  File "/.../site_packages/trio/_core/_run.py", line 1374, in run_impl
    msg = task.context.run(task.coro.send, next_send)
  File "/.../site-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "/.../site_packages/trio/_core/_run.py", line 957, in init
    self.entry_queue.spawn()
  File "/.../site_packages/trio/_core/_run.py", line 313, in __aexit__
    type(new_exc), new_exc, new_exc.__traceback__
  File "/.../contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/.../site_packages/trio/_core/_run.py", line 200, in open_cancel_scope
    yield scope
  File "/.../site_packages/trio/_core/_multierror.py", line 144, in __exit__
    raise filtered_exc
  File "/.../site_packages/trio/_core/_run.py", line 200, in open_cancel_scope
    yield scope
  File "/.../site_packages/trio/_core/_run.py", line 309, in __aexit__
    await self._nursery._nested_child_finished(exc)
  File "/.../site_packages/trio/_core/_run.py", line 415, in _nested_child_finished
    raise MultiError(self._pending_excs)
  File "/.../site_packages/trio/_core/_run.py", line 1374, in run_impl
    msg = task.context.run(task.coro.send, next_send)
  File "/.../site-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)

This is... a lot of junk, mostly specific to entering trio. I'm not going to squint at it too closely right now, because it's the bottom of the traceback where it's most important to make things readable, plus I think a lot of this clutter will go away if we fix the things below. Maybe not all of it, but it'll be easier to revisit after dealing with the higher priority stuff.

  File "demo.py", line 52, in runner2
    nursery.start_soon(writer2, mx, (7,9))

User code; this is what we want to see.

  File "/.../site_packages/trio/_core/_run.py", line 313, in __aexit__
    type(new_exc), new_exc, new_exc.__traceback__
  File "/.../contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/.../site_packages/trio/_core/_run.py", line 200, in open_cancel_scope
    yield scope
  File "/.../site_packages/trio/_core/_multierror.py", line 144, in __exit__
    raise filtered_exc
  File "/.../site_packages/trio/_core/_run.py", line 200, in open_cancel_scope
    yield scope
  File "/.../site_packages/trio/_core/_run.py", line 309, in __aexit__
    await self._nursery._nested_child_finished(exc)
  File "/.../site_packages/trio/_core/_run.py", line 415, in _nested_child_finished
    raise MultiError(self._pending_excs)

Ick. This is what I was referring to here when I mentioned making the whole nursery __aexit__ path raise-free. I think that's pretty straightforward actually, and will reduce this to a single frame.

Also, when we stop compressing MultiErrors (see #611), this will be the point where we switch indentation levels and say "switched to a new task".

  File "/.../site_packages/trio/_core/_run.py", line 1374, in run_impl
    msg = task.context.run(task.coro.send, next_send)

This is noise... when we catch a user exception in run_impl, we should unconditionally discard the top frame in the traceback (which will be this one).

  File "/.../site-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)

This line will disappear on 3.7, where contextvars.Context.run is implemented in C, so maybe not worth worrying about. Or we could discard it at the same time we fix the run_impl frame above.

  File "demo.py", line 15, in reader
    raise e
  File "demo.py", line 9, in reader
    value = await mx[key]
  File "multiplexer.py", line 41, in __getitem__
    value = await trio.hazmat.wait_task_rescheduled(abort_fn)

User code, this is what we want to see :-).

  File "/.../site_packages/trio/_core/_traps.py", line 159, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/.../site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
Exception: Ka-Boom!

We could remove a line here by inlining unwrap into wait_task_rescheduled. Not a huge win, but unwrap is trivial and these lines probably show up in a lot of tracebacks? Specifically they show up whenever someone uses reschedule(task, Error(...)) to throw an exception into a task. That's not the most common way for an exception to be created, but it certainly happens.

So to-do list for now, ordered by bang-for-buck:

  • Discard 1 or 2 frames from exceptions when we catch them in the main loop in run_impl
  • Remove raise from the nursery __aexit__ path
  • Consider inlining unwrap into wait_task_rescheduled
  • Revisit the run setup code and see what's left and what can be done about it

@njsmith
Copy link
Member Author

njsmith commented Sep 2, 2018

The main cluttered tracebacks problem was fixed by #612, #631, #640, and for all the other complaints at the top of this thread the current plan for fixing them is in #611, so I'm going to close this.

There are probably still a few places where we can fine-tune exception tracebacks, but we can open a new issue for that. Or possibly even claim one of the reserved issues if we want to have a single place for people to report little nits as they notice them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants