Strict egs everywhere: drop use of strict_exception_groups=False throughout! #390

Merged
goodboy merged 24 commits into main from strict_egs_everywhere on Aug 18, 2025

Conversation

@goodboy
Owner

@goodboy goodboy commented Aug 16, 2025

Making us future-proofed for any trio release which finally removes
the flag and thus all "loose support" throughout the runtime core.

Flag removal is made possible by a new
tractor.trionics.collapse_eg() stackable @acm which can be layered
onto trio.Nursery blocks to get effectively the same behaviour
(minus warnings and some tb mangling) as
strict_exception_groups=False by delegating verbatim to trio's
internal trio._core._run.collapse_exception_group() except we,

  • remove the NONSTRICT_EXCEPTIONGROUP_NOTE deprecation-note
    guard-check; we know we want an explicit collapse when used.

  • mask out the traceback rewriting in collapse case, i don't think it
    really matters for us?

Note that the .collapse_eg() soln was already introduced prior (in
https://pikers.dev/goodboy/tractor/pulls/18) but wasn't yet used nor
was it implemented correctly until this patch..


Change-set lower level deats,

  • re-implementing collapse_eg() to not use except* (which imho
    is a massive footgun of syntax they prolly should have never
    added, see my original rant-comment in 048b154) and instead catch
    BaseExceptionGroup and pass it directly to the trio
    collapser.

  • wrap our collapse_exception_group() delegation call in
    a maybe_collapse_eg() which reports whether a single non-eg
    exception was found.

  • add a '( ^^^ this exc was collapsed from a group ^^^ )' note to
    any collapsed and re-raised exceptions.

  • since we do ^, also strip out any "during the handling of <beg>
    the following.." beg traceback content by always raising
    a collapsed exc from its orig .__cause__ if possible or from None.

  • adding a bit of tooling to our .collapse_eg() including,

    • frame hiding via __tracebackhide__ = hid_tb
    • breakpointing support both in and outside the actor runtime with
      a new bp: bool flag which only breaks when a beg with >1
      sub-exceptions is detected, which is obviously handy to
      introspect (the source of) unexpected egs.

Updates to various test suites to match,

  • cluster API suites needed some tweaks that ended up having nothing
    to do with actual strict-eg-tns but are included here.

  • d2ac9ec a cancellation test needed a longer timeout in order to
    avoid the expected assert-error getting eg-embedded?

    • i added a case i still don't really grok as a pytest.xfail()
    • seems to have to do with the interplay of trio.fail_after()'s
      cancelling of the parent scope
  • 0ffcea1 adjusts a test_root_infect_asyncio.py suite to expect
    non-eg/collapsed-exc raises.


Misc core tweaks,

  • e3a542f ensures we never shield-wait
    ipc_server.wait_for_no_more_peers() which was causing hangs in
    some trionics related tests; the shield doesn't seem to be
    necessary in any case..

In follow up..

Base automatically changed from better_reprs to main August 16, 2025 21:20
@goodboy goodboy force-pushed the strict_egs_everywhere branch from b32121f to 0cbf02b Compare August 16, 2025 21:21
goodboy added 21 commits August 18, 2025 10:46
That is just throughout the core library, not the tests yet. Again, we
simply change over to using our (nearly equivalent?)
`.trionics.collapse_eg()` in place of the already deprecated
`strict_exception_groups=False` flag in the following internals,
- the conc-fan-out tn use in `._discovery.find_actor()`.
- `._portal.open_portal()`'s internal tn used to spawn a bg rpc-msg-loop
  task.
- the daemon and "run-in-actor" layered tn pair allocated in
  `._supervise._open_and_supervise_one_cancels_all_nursery()`.

The remaining loose-eg usage in `._root` and `._runtime` seem to be
necessary to keep the test suite green?? For the moment these are left
out.
Since the `bdb` module was added to the namespace lookup set in
`._exceptions.get_err_type()` we can now relay a RAE-boxed
`bdb.BdbQuit`.
Seems to cause the following test suites to fail however..

- 'test_advanced_faults.py::test_ipc_channel_break_during_stream'
- 'test_clustering.py::test_empty_mngrs_input_raises'

Also tweak some ctxc request logging content.
Seems to add one more cancellation suite failure as well as now cause
the discovery test to error instead of fail?
I dunno what exactly I was thinking but we definitely don't want to
**ever** raise from the original exc-group, instead always raise from
any original `.__cause__` to be consistent with the embedded src-error's
context.

Also, adjust `maybe_collapse_eg()` to return `False` in the non-single
`.exceptions` case, again don't know what I was trying to do but this
simplifies caller logic and the prior return-semantic had no real
value..

This fixes some final usage in the runtime (namely top level nursery
usage in `._root`/`._runtime`) which was previously causing test suite
failures prior to this fix.
Replacing yet another loose-eg-flag. Also toss in a todo to maybe use
the unmasker around the `open_root_actor()` body.
Since it turns out the semantics are basically inverse of normal
`except` (particularly for re-raising) which is hard to get right, and
bc it's a lot easier to just delegate to what `trio` already has behind
the `strict_exception_groups=False` setting, Bp

I added a rant here which will get removed shortly likely, but i think
going forward recommending against use of `except*` is prudent for
anything low level enough in the runtime (like trying to filter begs).

Dirty deats,
- copy `trio._core._run.collapse_exception_group()` to here with only
  a slight mod to remove the notes check and tb concatting for the
  collapse case.
- rename `maybe_collapse_eg()` -> `get_collapsed_eg()` and delegate it
  directly to the former `trio` fn; return `None` when it returns the
  same beg without collapse.
- simplify our own `collapse_eg()` to either raise the collapsed `exc`
  or original `beg`.
It was originally this way; I forgot to flip it back when discarding the
`except*` handler impl..

Specially handle the `exc.__cause__` case where we raise from any
detected underlying cause and otherwise `from None` to suppress the
eg's tb.
Seems that the way the actor-nursery interacts with the
`.trionics.gather_contexts()` API on cancellation makes our
`.trionics.collapse_eg()` not work as intended?

I need to dig into how `ActorNursery.cancel()` and `.__aexit__()` might
be causing this discrepancy..

Consider this a commit-of-my-index type save for rn.
Namely `test_empty_mngrs_input_raises()` was failing due to
lazy-iterator use as input to `mngrs` which i guess i added support for
a while back (by it doing a `list(mngrs)` internally)? So just change it
to `gather_contexts(mngrs=())` and also bump the timeout to
`trio.fail_after(3)` since it appears that the prior 1sec was causing
too-fast-of-a-cancellation (before the cluster fully spawned) and thus
the expected `ValueError` never showed..
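
For context on why a `list(mngrs)` step matters for lazy inputs, here is a generic sketch. It is not the `gather_contexts()` source, and `materialize_mngrs()` is a hypothetical helper. An empty generator is still truthy, so emptiness can only be checked after materializing it.

```python
from collections.abc import Iterable


def materialize_mngrs(mngrs: Iterable) -> list:
    """Turn any (possibly lazy) iterable into a list and reject empty input."""
    mngrs = list(mngrs)
    if not mngrs:
        raise ValueError("`mngrs` must contain at least one context manager")
    return mngrs


empty_lazy = (m for m in ())
# a generator object is truthy even when it will yield nothing
print(bool(empty_lazy))
```

Passing an empty tuple, as the updated test does, sidesteps the lazy-iterator question entirely while still exercising the empty-input error path.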

Also, mask the `tractor.trionics.collapse_eg()` usage (again?) in
`open_actor_cluster()` since it seems unnecessary.
Was failing due to the `.fail_after()` timeout being *too short* and
somehow the new interplay of that with strict-exception groups resulting
in the `TooSlowError` never raising but instead an eg with the embedded
`AssertionError`?? I still don't really get it honestly..

I've written up lengthy notes around the different `delay` settings that
can be used to see the diff outcomes, the failing case being the one
i still don't really grok and think is justification for `trio` to
bubble inner `Cancelled`s differently possibly?

For now i've included the original failing case as an `xfail`
parametrization which will hopefully drive a follow-up low-level
`trio` test in `test_trioisms`!
As mentioned in prior testing commit, it can cause the worst kind of
hangs, the SIGINT ignoring kind.. Pretty sure there was never any reason
outside some esoteric multi-actor debugging case, and pretty sure that
already was solved?
@goodboy goodboy force-pushed the strict_egs_everywhere branch from 048eb4a to e3a542f Compare August 18, 2025 14:51
@goodboy goodboy requested a review from guilledk August 18, 2025 15:37
await n.start(spawn_and_sleep_forever)
async with (

# XXX ?TODO? why no work!?
Owner Author

This one needs more investigation but likely interplays with stuff we'll resolve in #391?

@goodboy goodboy force-pushed the strict_egs_everywhere branch from 5188d2a to 88c1c08 Compare August 18, 2025 17:31
@goodboy goodboy merged commit b05abea into main Aug 18, 2025
2 checks passed
@goodboy goodboy deleted the strict_egs_everywhere branch August 18, 2025 18:15