
Transaction context fails to relay the original exception if rollback raises. #302

Closed
skandalfo opened this issue Aug 10, 2018 · 14 comments
Labels
bug Describes a bug in the system.


skandalfo commented Aug 10, 2018

  • GINO version: 0.7.5
  • Python version: 3.5.3
  • asyncpg version: 0.17.0
  • aiocontextvars version: 0.1.2
  • PostgreSQL version: 9.6.9

Description

I'm using PostgreSQL with serializable isolation level transactions. I've tried to write a decorator to make Python object functions retry the whole transaction whenever a transaction is cancelled by a serialization conflict (or a transient database disconnect).

What I Did

I wrote this decorator (it depends on a specific class hierarchy for the decorated object, but that's a detail):

from asyncpg.exceptions import SerializationError, PostgresConnectionError

def serializable_transaction(num_retries=100, connection_error_delay=10):
    def decorate(fn):
        async def wrapper(self, *args, **kwargs):
            retries = num_retries
            while True:
                try:
                    async with self.server.db.transaction(isolation='serializable'):
                        return await fn(self, *args, **kwargs)
                except SerializationError:
                    self.logger.warning(
                        F('Transaction serialization failure, {#E:{}#} retries left.', retries),
                        exc_info=True)
                    retries -= 1
                    if retries <= 0:
                        raise
                except PostgresConnectionError:
                    self.logger.warning(
                        F('Database connection failure, {#E:{}#} retries left.', retries),
                        exc_info=True)
                    retries -= 1
                    if retries <= 0:
                        raise
                    await self.sleep(connection_error_delay)

        return wrapper

    return decorate

The original coroutine is awaited within an async with context for the transaction. If a SerializationError is raised, the whole thing is retried.
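The retry control flow above can be illustrated without a database. This is a hypothetical, synchronous sketch: TransientError stands in for asyncpg's SerializationError, and the flaky function stands in for the wrapped coroutine.

```python
# Database-free sketch of the retry pattern: retry the wrapped function
# until it succeeds or the retry budget is exhausted.

class TransientError(Exception):
    """Stands in for asyncpg's SerializationError."""

def retry_transient(num_retries=3):
    def decorate(fn):
        def wrapper(*args, **kwargs):
            retries = num_retries
            while True:
                try:
                    return fn(*args, **kwargs)
                except TransientError:
                    retries -= 1
                    if retries <= 0:
                        # Budget exhausted: let the last failure propagate.
                        raise
        return wrapper
    return decorate

attempts = []

@retry_transient(num_retries=3)
def flaky():
    # Fails twice, then succeeds, mimicking a serialization conflict
    # that resolves on retry.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError
    return "ok"

print(flaky())        # "ok" on the third attempt
print(len(attempts))  # 3
```

The key point is that the decorator only works if the exception that escapes the transaction context is the original SerializationError, which is exactly what this issue is about.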

But this is not working as intended. The reason lies here:

https://github.com/fantix/gino/blob/d074dc4304cb050ba5b49a599b59af1d0d6c8e64/gino/transaction.py#L178

When a SerializationError bubbles up, the context tries to roll the transaction back. But the transaction has already been aborted, and rolling back fails with a different exception, which is the one that comes out of the context manager code.

At that point the caller has lost the information about what the original exception was, and thus my decorator doesn't work.

A possible approach would be to just log any exception raised by the statement in the link and re-raise the original exception.
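That proposal can be sketched with stand-in objects (this is hypothetical illustration, not GINO's actual __aexit__; FakeTx and TxContext are made up, and a synchronous context manager is used for simplicity): if rollback itself raises while an earlier exception is being handled, log the rollback failure and let the original exception propagate.

```python
import logging

logger = logging.getLogger("tx-demo")

class FakeTx:
    """Stand-in transaction whose rollback always fails,
    mimicking an already-aborted transaction."""
    def rollback(self):
        raise RuntimeError("cannot rollback; the transaction is in error state")

class TxContext:
    def __init__(self, tx):
        self._tx = tx

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            try:
                self._tx.rollback()
            except Exception:
                # Swallow and log the rollback failure so the
                # original exception is the one the caller sees.
                logger.warning("rollback failed", exc_info=True)
        return False  # never suppress the original exception

try:
    with TxContext(FakeTx()):
        raise ValueError("original error")
except ValueError as e:
    print(e)  # the caller still sees the original exception
```

With this shape, a retry decorator wrapped around the context would still see the SerializationError rather than the secondary rollback error.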


wwwjfy commented Aug 11, 2018

I suspect it's something else causing your situation. Because:

  1. If an exception happens while another exception is being handled, Python reports both tracebacks, joined by During handling of the above exception, another exception occurred:

  2. When the error happens, the transaction status is failed, NOT rolled back, so the rollback command shouldn't cause exceptions. I tried the following in psql:

gino=# begin transaction isolation level serializable ;
BEGIN
Time: 0.635 ms
gino=# update prices set value = 3;
ERROR:  could not serialize access due to concurrent update
Time: 1803.009 ms (00:01.803)
gino=# rollback;
ROLLBACK
Time: 0.216 ms
gino=#
  3. In asyncpg, the only place that sets the transaction status to rolled back is the rollback method. So I suspect rollback is already being called somewhere in your code. https://github.com/MagicStack/asyncpg/blob/master/asyncpg/transaction.py#L203

An easy fix would be to check the transaction status before calling self._tx.rollback(), but I'd like to see where your issue comes from.

@skandalfo (Author)

I didn't capture the exception message at the time, but the reason I filed this bug is that the message read something like "can't rollback an already-aborted transaction".

All transaction management in my code uses the engine's async with mode, and all SQL to the database goes through GINO (albeit always with an explicit bind, rather than the shorthand mechanism).

At some point, calls to several functions decorated like the above might have been nested, but (given the support for nested transactions) that shouldn't be an issue.

I'll undo my local patch and see if I can reproduce the situation, then report back.

@skandalfo (Author)

I found the traces in syslog. 😄

This is (excluding other server logging) what I get with just embedding that rollback call in a try: except: pass exception-ignoring block:

Aug 10 15:49:01 BanaNAS telescreen[29415]: Traceback (most recent call last):
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/db_utils.py", line 13, in wrapper
Aug 10 15:49:01 BanaNAS telescreen[29415]:     return await fn(self, *args, **kwargs)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/signals.py", line 53, in _check_expiration
Aug 10 15:49:01 BanaNAS telescreen[29415]:     await active_signal.delete(bind=self.server.db)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/crud.py", line 559, in _delete
Aug 10 15:49:01 BanaNAS telescreen[29415]:     return (await bind.status(clause))[0]
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/engine.py", line 697, in status
Aug 10 15:49:01 BanaNAS telescreen[29415]:     return await conn.status(clause, *multiparams, **params)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/engine.py", line 364, in status
Aug 10 15:49:01 BanaNAS telescreen[29415]:     return await result.execute(status=True)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/dialects/base.py", line 207, in execute
Aug 10 15:49:01 BanaNAS telescreen[29415]:     context.statement, context.timeout, args, 1 if one else 0)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/dialects/asyncpg.py", line 175, in async_execute
Aug 10 15:49:01 BanaNAS telescreen[29415]:     query, executor, timeout)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/connection.py", line 1410, in _do_execute
Aug 10 15:49:01 BanaNAS telescreen[29415]:     result = await executor(stmt, None)
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "asyncpg/protocol/protocol.pyx", line 212, in bind_execute
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
Aug 10 15:49:01 BanaNAS telescreen[29415]:     yield self  # This tells Task to wait for completion.
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
Aug 10 15:49:01 BanaNAS telescreen[29415]:     future.result()
Aug 10 15:49:01 BanaNAS telescreen[29415]:   File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
Aug 10 15:49:01 BanaNAS telescreen[29415]:     raise self._exception
Aug 10 15:49:01 BanaNAS telescreen[29415]: asyncpg.exceptions.SerializationError: could not serialize access due to read/write dependencies among transactions

And this was before the exception-ignoring "fix":

Aug  9 21:51:01 BanaNAS telescreen[25030]: Traceback (most recent call last):
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/transaction.py", line 174, in __aexit__
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._tx.commit()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/dialects/asyncpg.py", line 241, in commit
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._tx.commit()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/transaction.py", line 211, in commit
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self.__commit()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/transaction.py", line 179, in __commit
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._connection.execute(query)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/connection.py", line 273, in execute
Aug  9 21:51:01 BanaNAS telescreen[25030]:     return await self._protocol.query(query, timeout)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "asyncpg/protocol/protocol.pyx", line 332, in query
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
Aug  9 21:51:01 BanaNAS telescreen[25030]:     yield self  # This tells Task to wait for completion.
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
Aug  9 21:51:01 BanaNAS telescreen[25030]:     future.result()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
Aug  9 21:51:01 BanaNAS telescreen[25030]:     raise self._exception
Aug  9 21:51:01 BanaNAS telescreen[25030]: asyncpg.exceptions.SerializationError: could not serialize access due to read/write dependencies among transactions
Aug  9 21:51:01 BanaNAS telescreen[25030]: DETAIL:  Reason code: Canceled on identification as a pivot, during commit attempt.
Aug  9 21:51:01 BanaNAS telescreen[25030]: HINT:  The transaction might succeed if retried.
Aug  9 21:51:01 BanaNAS telescreen[25030]: During handling of the above exception, another exception occurred:
Aug  9 21:51:01 BanaNAS telescreen[25030]: Traceback (most recent call last):
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/__init__.py", line 133, in main
Aug  9 21:51:01 BanaNAS telescreen[25030]:     *[server.run() for server in servers.values()], loop=loop))
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "uvloop/loop.pyx", line 1364, in uvloop.loop.Loop.run_until_complete
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
Aug  9 21:51:01 BanaNAS telescreen[25030]:     raise self._exception
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
Aug  9 21:51:01 BanaNAS telescreen[25030]:     result = coro.throw(exc)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/serverobject.py", line 99, in run
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self.start()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/serverobject.py", line 65, in start
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._start()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/server.py", line 196, in _start
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._start_group(self.signals)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/server.py", line 164, in _start_group
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self.gather(*[component.start() for component in group.values()])
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/serverobject.py", line 124, in gather
Aug  9 21:51:01 BanaNAS telescreen[25030]:     return await asyncio.gather(*coros_or_futures, return_exceptions=return_exceptions, loop=self.server.loop)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
Aug  9 21:51:01 BanaNAS telescreen[25030]:     yield self  # This tells Task to wait for completion.
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
Aug  9 21:51:01 BanaNAS telescreen[25030]:     future.result()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
Aug  9 21:51:01 BanaNAS telescreen[25030]:     raise self._exception
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
Aug  9 21:51:01 BanaNAS telescreen[25030]:     result = coro.send(None)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/serverobject.py", line 65, in start
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._start()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/signals.py", line 23, in _start
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._check_expiration(when)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/TeleScreen-0.1-py3.5.egg/telescreen/db_utils.py", line 13, in wrapper
Aug  9 21:51:01 BanaNAS telescreen[25030]:     return await fn(self, *args, **kwargs)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/engine.py", line 176, in __aexit__
Aug  9 21:51:01 BanaNAS telescreen[25030]:     return await tx.__aexit__(*exc_info)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/transaction.py", line 178, in __aexit__
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._tx.rollback()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/gino-0.7.5-py3.5.egg/gino/dialects/asyncpg.py", line 244, in rollback
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self._tx.rollback()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/transaction.py", line 219, in rollback
Aug  9 21:51:01 BanaNAS telescreen[25030]:     await self.__rollback()
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/transaction.py", line 187, in __rollback
Aug  9 21:51:01 BanaNAS telescreen[25030]:     self.__check_state('rollback')
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/transaction.py", line 165, in __check_state
Aug  9 21:51:01 BanaNAS telescreen[25030]:     self.__check_state_base(opname)
Aug  9 21:51:01 BanaNAS telescreen[25030]:   File "/usr/local/lib/python3.5/dist-packages/asyncpg-0.17.0-py3.5-linux-armv7l.egg/asyncpg/transaction.py", line 157, in __check_state_base
Aug  9 21:51:01 BanaNAS telescreen[25030]:     opname))
Aug  9 21:51:02 BanaNAS telescreen[25030]: asyncpg.exceptions._base.InterfaceError: cannot rollback; the transaction is in error state

So this might be an asyncpg issue?

@skandalfo (Author)

Aaand... apparently asyncpg and PostgreSQL diverge on whether a rollback of a failed transaction should report failure:

https://github.com/MagicStack/asyncpg/blob/a6fdf5c7290cd0dc1ac5ba63c599818649d1625e/asyncpg/transaction.py#L145

wwwjfy added a commit to wwwjfy/gino that referenced this issue Aug 12, 2018
running rollback on failed or already rolled back transactions has no
effect and asyncpg would throw exceptions

wwwjfy commented Aug 12, 2018

After I reproduced your error, I found I had overlooked that the error happens when committing the transaction, not during fn.

I've submitted #304, which should be able to fix your issue.

@skandalfo (Author)

Thanks!

It looks a little rough to inspect asyncpg.Transaction's internal _state attribute, but then asyncpg doesn't really offer an alternative other than attempting the commit/rollback and checking (or ignoring) a generic exception.


wwwjfy commented Aug 12, 2018

Right. I want to avoid committing/rolling back and then just catching any exception, which may bury real bugs.


fantix commented Aug 13, 2018

Thanks for the issue report and Tony's PR!

apparently asyncpg and postgresql diverge about whether a rollback on a failed transaction should report failure or not:

https://github.com/MagicStack/asyncpg/blob/a6fdf5c7290cd0dc1ac5ba63c599818649d1625e/asyncpg/transaction.py#L145

This is not true. A failed transaction in PostgreSQL does not usually have TransactionState.FAILED, which only indicates a failing begin, commit, or rollback command issued by asyncpg.Transaction. Therefore the example below should raise the correct exception, without the cannot rollback; the transaction is in error state error:

async with db.transaction():
    await db.status('select 1/0')

Actually, TransactionState.FAILED is used by asyncpg to guarantee that no further start(), commit(), or rollback() can be called on the same asyncpg.Transaction object after any of them fails.

Therefore the root issue is that GINO shouldn't call rollback() on the same transaction after commit() fails: a failing commit() already resets the transaction status, so it is unnecessary to call rollback() again. And we don't need to worry about _state anymore.
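That control flow can be sketched with a stand-in transaction object (hypothetical names throughout; RecordingTx and run_in_tx are illustration only, not GINO code): roll back when the body fails, commit when it succeeds, and never call rollback() after a failed commit().

```python
class CommitFailed(Exception):
    pass

class RecordingTx:
    """Stand-in transaction that records which methods were called."""
    def __init__(self, commit_fails=False):
        self.calls = []
        self.commit_fails = commit_fails

    def commit(self):
        self.calls.append("commit")
        if self.commit_fails:
            raise CommitFailed

    def rollback(self):
        self.calls.append("rollback")

def run_in_tx(tx, fn):
    try:
        result = fn()
    except BaseException:
        tx.rollback()  # body failed: roll back, then re-raise
        raise
    tx.commit()        # body succeeded: commit; if commit raises,
    return result      # do NOT attempt a rollback afterwards

tx = RecordingTx(commit_fails=True)
try:
    run_in_tx(tx, lambda: "done")
except CommitFailed:
    pass
print(tx.calls)  # ['commit'] -- no rollback after the failed commit
```

Because the commit call sits outside the except block, a failing commit propagates on its own and no second rollback is attempted, which is exactly the invariant described above.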

@fantix fantix added the bug Describes a bug in the system. label Aug 13, 2018
@fantix fantix added this to the v0.7.x milestone Aug 13, 2018

wwwjfy commented Aug 13, 2018

My intention in checking the state was also to prevent exceptions when users call rollback multiple times on the same transaction. It might not be our job though; I'll fix it.


fantix commented Aug 13, 2018

Yeah I got that. Thanks for the fix!

@fantix fantix closed this as completed in f75dfa6 Aug 13, 2018
@skandalfo (Author)

I'd like to reopen this.

If you look at the first backtrace I pasted, the SerializationError was actually raised in a delete operation, not in the final commit call.

In this case I'd expect the transaction to already be in the failed state (at least in Postgres; I'm not sure about asyncpg's view of it). Would the unconditional rollback-on-exception then fail with the same error again?


wwwjfy commented Aug 14, 2018

@skandalfo In my first reply #302 (comment), I was using a demo where the exception happens in an update statement. This should be the same case as you mentioned.

When a normal statement (other than begin, commit, or rollback) fails in asyncpg, the transaction state in asyncpg won't change, and the rollback at the exit of the transaction won't fail; i.e. the code in your example should work as expected. I tested this in my own demo. It would help if there were a way to reproduce your issue.


fantix commented Aug 14, 2018

Yeah, Tony's right: the unconditional rollback-on-exception is supposed to succeed. Please try the latest master in your code if possible, and feel free to reopen if the bug persists.

@skandalfo (Author)

I'll make sure I upgrade to master ASAP, and will keep an eye on the logs, but I don't expect to see anything weird now.

Thanks for clarifying!
