8.3.4 changed atexit behavior #13021

Open
hynek opened this issue Dec 2, 2024 · 8 comments

Comments

@hynek
Contributor

hynek commented Dec 2, 2024

So this is rather bizarre and was also wild to debug/bisect and I don't have an MRE, but maybe someone has an idea.

In a nutshell, due to 40741c4 (#12867), pytest crashes with a SEGFAULT after the tests.

Of course, it's not pytest that's crashing but the bane of my existence: https://github.com/sqlanywhere/sqlanydb

When running on commit 3d3ec57, I get an output like this after my tests already passed:

Exception during reset or similar
Traceback (most recent call last):
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 986, in _finalize_fairy
    fairy._reset(
    ~~~~~~~~~~~~^
        pool,
        ^^^^^
    ...<2 lines>...
        asyncio_safe=can_manipulate_connection,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 1432, in _reset
    pool._dialect.do_rollback(self)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/engine/default.py", line 699, in do_rollback
    dbapi_connection.rollback()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 657, in rollback
    return self.api.sqlany_rollback(self.con())
                                    ~~~~~~~~^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 648, in con
    self.handleerror(InterfaceError, "not connected", -101)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 644, in handleerror
    eh(self, None, errorclass, errorvalue, sqlcode)
    ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 383, in standardErrorHandler
    raise errorclass(errorvalue,sqlcode)
sqlanydb.InterfaceError: ('not connected', -101)
Exception terminating connection <sqlanydb.Connection object at 0x10e423d40>
Traceback (most recent call last):
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 986, in _finalize_fairy
    fairy._reset(
    ~~~~~~~~~~~~^
        pool,
        ^^^^^
    ...<2 lines>...
        asyncio_safe=can_manipulate_connection,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 1432, in _reset
    pool._dialect.do_rollback(self)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/engine/default.py", line 699, in do_rollback
    dbapi_connection.rollback()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 657, in rollback
    return self.api.sqlany_rollback(self.con())
                                    ~~~~~~~~^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 648, in con
    self.handleerror(InterfaceError, "not connected", -101)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 644, in handleerror
    eh(self, None, errorclass, errorvalue, sqlcode)
    ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 383, in standardErrorHandler
    raise errorclass(errorvalue,sqlcode)
sqlanydb.InterfaceError: ('not connected', -101)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 374, in _close_connection
    self._dialect.do_terminate(connection)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/engine/default.py", line 705, in do_terminate
    self.do_close(dbapi_connection)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/engine/default.py", line 708, in do_close
    dbapi_connection.close()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 688, in close
    c = self.con()
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 648, in con
    self.handleerror(InterfaceError, "not connected", -101)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 644, in handleerror
    eh(self, None, errorclass, errorvalue, sqlcode)
    ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 383, in standardErrorHandler
    raise errorclass(errorvalue,sqlcode)
sqlanydb.InterfaceError: ('not connected', -101)

When running on 40741c4 or later (with -q -X faulthandler), I get:

Fatal Python error: Segmentation fault

Current thread 0x0000000200a9e200 (most recent call first):
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlanydb.py", line 657 in rollback
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/engine/default.py", line 699 in do_rollback
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 1432 in _reset
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 986 in _finalize_fairy
  File "/Users/hynek/Work/customer-api/.venv/lib/python3.13/site-packages/sqlalchemy/pool/base.py", line 729 in <lambda>
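
For readers who cannot pass -X faulthandler on the command line, the same kind of dump can be requested from Python code. This is a minimal sketch, not part of the original setup: the log path is a placeholder I made up, and writing to a file rather than stderr is my own choice so the dump survives a crash at interpreter exit.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location (an assumption for this sketch); any writable path works.
LOG_PATH = os.path.join(tempfile.gettempdir(), "faulthandler-demo.log")

# faulthandler writes through a raw file descriptor, so the file must stay
# open for as long as the handler is enabled. Keep a module-level reference.
_log_file = open(LOG_PATH, "w")

# Similar in effect to `python -X faulthandler`, except that the traceback
# goes to the log file instead of stderr.
faulthandler.enable(file=_log_file, all_threads=True)
```

Placing this as early as possible (for example at the top of a conftest.py) means a fatal signal during shutdown should still leave a traceback in the log.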

Stepping through with the debugger reveals that it's part of atexit handling, which points at this unholy gem: https://github.com/sqlanywhere/sqlanydb/blob/338845cc18256d8b8c9a9b2383c58a196213b151/sqlanydb.py#L541

Anyone any idea? 😳

@hynek changed the title from "8.3.4 changes behavior in atexit behavior" to "8.3.4 changed atexit behavior" on Dec 2, 2024
@RonnyPfannschmidt
Member

Astounding terrifying

I fear that the segfault is from a memory layout change and was latent to begin with

@RonnyPfannschmidt
Member

How's the behavior with assert rewrite disabled?

@pfmoore

pfmoore commented Dec 2, 2024

Are threads involved? There was a discussion here on weird behaviours of atexit handlers in the presence of threads.

I'm pretty sure the real answer is that what sqlanydb is doing is black magic of the darkest type, and they simply shouldn't do it - but that may not be much help...

@hynek
Contributor Author

hynek commented Dec 2, 2024

Astounding terrifying

I fear that the segfault is from a memory layout change and was latent to begin with

Exactly my reaction and suspicion. 😰

(I’ll respond to all the qs when I’m back home)

@hynek
Contributor Author

hynek commented Dec 2, 2024

Before I forget: I tried running with Python 3.12 and it behaves the same way between 3d3ec57 and 40741c4, so it's not a 3.13 thing.


How's the behavior with assert rewrite disabled?

I can confirm that --assert=plain fixes the segfault.

(for posterity, uv is pretty cool for this: uv run --with "git+https://github.com/pytest-dev/pytest@40741c4aca50582cc9701ff01504b9e6dcd3396f" python -q -X faulthandler -Im pytest -vvs --assert=plain)


Are threads involved? There was a discussion here on weird behaviours of atexit handlers in the presence of threads.

Not knowingly, but who's to say who started some weird background threads?

I'm pretty sure the real answer is that what sqlanydb is doing is black magic of the darkest type, and they simply shouldn't do it - but that may not be much help...

yes to both :|

@hynek
Contributor Author

hynek commented Dec 2, 2024

Are threads involved? There was a discussion here on weird behaviours of atexit handlers in the presence of threads.

So I've done more pdb-stepping, and I found that there is, in fact, another thread and it's Sentry's <function SessionFlusher._ensure_running.<locals>._thread at 0x111bbd4e0> – but skipping Sentry's init doesn't change anything.
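
A quick way to check for unexpected background threads without stepping through pdb is to list them. This is a minimal sketch using only the standard library. The helper name `describe_threads` is mine, and `_target` is a private attribute of `threading.Thread`, so treat that part as a debugging convenience rather than a stable API.

```python
import threading


def describe_threads():
    """Return one (name, daemon, target) tuple per live thread."""
    rows = []
    for thread in threading.enumerate():
        # `_target` is private and is removed once a thread's run() has
        # finished, hence the getattr with a default.
        target = getattr(thread, "_target", None)
        rows.append((thread.name, thread.daemon, target))
    return rows


if __name__ == "__main__":
    for name, daemon, target in describe_threads():
        print(f"{name!r} daemon={daemon} target={target!r}")
```

Calling this from a debugger prompt, or from an atexit callback registered for the occasion, shows which library started which thread.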


From what I can see in the weird segfault traceback and from stepping through the cleanups, it looks like sqlanydb cleans itself up completely in an atexit handler, and then SQLAlchemy's cleanup fairies somehow wake up and try to clean up too, including a rollback, but the driver is unusable at this point.

All this is of no interest to y'all tho – the q is why assert rewriting should cause this to go from an exception to a segfault. 😳

@nicoddemus
Member

the q is why assert rewriting should cause this to go from an exception to a segfault.

This is really bizarre... we can see from the diff that all 40741c4 did was copy some AST objects to preserve their original location. How that relates to the problem at hand is a mystery. 😬

@hynek
Contributor Author

hynek commented Dec 3, 2024

How that relates to the problem at hand is a mystery. 😬

Yeah, as @RonnyPfannschmidt noted, 40741c4 probably has nothing to do with my problems directly.

But generally speaking, it looks like assert rewriting has some influence over cleanup such that either:

  1. SQLAlchemy's _ConnectionFairys run later (after atexit, where the sqlanydb driver has already annihilated itself), or
  2. things somehow segfault because somethingsomething memory alignment somethingsomething???

I somewhat suspect it's 1, mostly because I hope so ;) since it seems more straightforward.
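
Option 1 is at least consistent with how CPython orders shutdown: atexit callbacks run first, and objects that are still referenced from module globals are only finalized afterwards. The sketch below shows that ordering in a child interpreter. The `Connection` class and `shut_down_driver` function are hypothetical stand-ins, not code from sqlanydb or SQLAlchemy, and `__del__` timing at exit is a CPython implementation detail rather than a language guarantee.

```python
import subprocess
import sys
import textwrap

# Script run in a fresh interpreter so that real shutdown ordering is observed.
CHILD = textwrap.dedent(
    """
    import atexit

    class Connection:
        def __del__(self):
            print("finalizer: trying to roll back")

    def shut_down_driver():
        print("atexit: driver shut down")

    atexit.register(shut_down_driver)

    # Still referenced from a module global at exit, so it is only
    # finalized during module teardown, after atexit callbacks have run.
    leaked = Connection()
    """
)


def run_child():
    """Run the child script and return its stdout lines in order."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_child():
        print(line)
```

If the finalizer line comes after the atexit line, any cleanup that still needs the driver at that point is running against something that has already been torn down.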

My gut feeling says that the interpreter is in a bit of a precarious state at that point in time anyway, and I wouldn't bother y'all if I didn't suspect that this might be a more generalizable problem. 🤔


If the other person in the world who uses sqlanydb and SQLAlchemy finds this: I've worked around the problem by passing sqlalchemy.create_engine(..., pool_reset_on_return=False), because I do my own explicit rollbacks anyway.
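
Why the workaround helps can be sketched with a toy model. Everything below is hypothetical: `FakeDriverConnection` and `ToyPool` are stand-ins I made up, not SQLAlchemy's or sqlanydb's implementation. The only thing modelled is the decision to roll back when a connection is handed back to the pool.

```python
class InterfaceError(Exception):
    """Stand-in for the driver's 'not connected' error."""


class FakeDriverConnection:
    """Hypothetical driver connection that becomes unusable after shutdown."""

    def __init__(self):
        self.connected = True

    def shutdown(self):
        # Models the driver tearing itself down in its atexit handler.
        self.connected = False

    def rollback(self):
        if not self.connected:
            raise InterfaceError("not connected", -101)


class ToyPool:
    """Hypothetical pool; only the reset-on-return decision is modelled."""

    def __init__(self, reset_on_return=True):
        self.reset_on_return = reset_on_return
        self.idle = []

    def give_back(self, connection):
        # With reset enabled, the pool rolls back before reuse. This is the
        # step that reaches the dead driver in the tracebacks above.
        if self.reset_on_return:
            connection.rollback()
        self.idle.append(connection)


def return_after_shutdown(reset_on_return):
    """Hand a connection back to the pool after the driver has shut down."""
    pool = ToyPool(reset_on_return=reset_on_return)
    connection = FakeDriverConnection()
    connection.shutdown()
    try:
        pool.give_back(connection)
    except InterfaceError as exc:
        return f"failed: {exc.args}"
    return "returned without touching the driver"
```

In this model, `return_after_shutdown(True)` hits the dead driver while `return_after_shutdown(False)` never calls into it. The trade-off is the same as with the real setting: without a reset on return, the application has to roll back explicitly.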


4 participants