-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlite3 module breaks transactions and potentially corrupts data #54949
Comments
The python sqlite module automatically commits open transactions Attached patch addresses the issue. Patch is against 2.6.1, but Patch also allows pragma statement. |
Here are some tests. |
See also bpo-8145. It would be nice if someone could sort all this out, but I'm not knowledgeable enough to do so. For this patch, it would be a significant change it behaviour. Therefore it would have to be a new feature controlled by a flag of some sort (which is part of the reason for the reference to bpo-8145). |
I find the way that the sqlite3 module handles transactions pretty We will apply this patch to our python build because having DDL Thanks |
I want transactional DDL too. I was tremendously surprised that I could not duplicate the way sqlite3 behaves on the command line from witin pysqlite. Instead of this patch, I would rather be able to instruct pysqlite to always begin a transaction for any kind of statement (I hear this is a requirement for DB-API compliance) and never perform an implicit commit for any reason. For example, someone on the google code project had a complaint that 'SAVEPOINT' (create a subtransaction) automatically commits because pysqlite doesn't know about it. I tried to trick pysqlite into doing what I wanted by prepending /* update */ to every CREATE TABLE statement but it didn't seem to quite work. Instead, I would prefer a pysqlite that does not strcmp(statement) at all. |
The attached patch is my take on this issue. I ran into the problem that during schema upgrades dropping a table was not rolled back. In another instance, renaming a table was not rolled back. This greatly increases the risk of data loss for our application. Because I do not currently foresee which commands might need dropping out of a transaction and because of backwards compatibility woes, I added a new field to the sqlite3.Connection: operation_needs_transaction_callback This function is called to decide if a running transaction should be implicitly committed (I'd consider this dangerous), if a transaction has to be started if not running (should normally always hold) or if the transaction state should be left alone. For example, the "pragma foreign_keys = ..." is a no-op inside a transaction, therefore an implicit begin would be possibly harmful. In our application, we enable foreign keys when getting the connection out of the SQLAlchemy pool via a pool listener, which would be disabled if there is an implicit begin triggered. The patch also adds a bunch of unit tests to cover the new behaviour. |
Same patch for Python 3. In fact, this also adds a missing Py_XDECREF. |
Torsten basically you are suggesting that PRAGMA would never work at all with my 'do not strcmp() the sql at all, always begin a transaction' approach? |
No. Most pragmas should still work and getting the current setting of a pragma should also work. I was only referring to http://www.sqlite.org/pragma.html#pragma_foreign_keys Quote:
I did not see such a restriction for other pragmas though I admit that I did not reread the list in full. |
The attached patch is an updated version which adds a bit of documentation. |
I updated the patch for upstream pysqlite2. Available at http://code.google.com/p/pysqlite/issues/detail?id=24 Patch over there is for Python 2 (tested with our production Python 2.6). |
Opened http://bugs.python.org/issue12997 in case there is a way to solve the foreign_key PRAGMA issue with a less disruptive fix. |
The proposed patches don't fix the problem. They may well allow DDL in transactions, but that's not the real problem. A database library must *NEVER* implicitly commit or rollback. That's completely insane. >>> import this
...
Explicit is better than implicit. If a statement can't be executed in a transaction, then trying to execute such a statement in a transaction is an error. Period. An exception *MUST* be raised. It is wrong to guess that the user really wanted to commit first. >>> import this
...
Errors should never pass silently.
...
In the face of ambiguity, refuse the temptation to guess. If such an exception is raised, you'll run into the problem exactly once before figuring out that you need to commit or rollback first. You can then change your code to safely and explicitly account for that limitation. This issue is not an enhancement, it's a bug. One might argue that fixing it will break existing code, but it won't. Any affected code is already broken. It's depending on a promised level of safety, and this bug is breaking that promise. It's better to fail noisily and consistently than to usually work, but sometimes silently corrupt data. As is, it's pretty much guaranteed that there is existing code that does DDL expecting to be in a transaction when it isn't. Even a well-tested schema upgrade can fail in production if the data is slightly different that in a testing database. Unique constraints can fail, etc. I frequently use the psycopg2 module and psql command, which get this issue right. They've saved me several times. The current behavior is buggy and should be changed to be correct. There should not merely be an option to enable the correct behavior. There should also be no option to enable the old buggy behavior. It's too dangerous. In general, it's a code-smell to inspect the SQL that's being executed before sending it to the database. The only reason I can think of to inspect SQL is to work around brokenness in the underlying database. Suppose, for example, that SQLite implicitly committed when it got DDL. (I don't know what it actually does. Just suppose.) In that case, it would be necessary to check for DDL and raise an exception before passing the statement to the database. If the database correctly errors in such a situation, then the python wrapper should just pass the DDL to the database and deal with the resulting error. That's way easier that trying to correctly detect what the statement type is. Parsing SQL correctly is hard. As evidence of that, note that the existing statement detection code is broken: it doesn't strip comments first! A simple SELECT statement preceeded by a comment will implicitly commit! But commenting is good. >>> import this
...
Readability counts. So it's not even just an issue of DDL or PRAGMAs! Anything that causes the underlying database to implicitly commit must be avoided (yet the python module goes out of it's way to add *MORE* such buggy behavior!) Fixing brokenness in the underlying database is the only reason to inspect SQL as it passes through this module. If there are other ways to avoid such problems, they will likely be better than inspecting SQL. Anything which causes implicit rollbacks in the underlying database is a similar issue, but easier to handle without parsing SQL. It's not safe to implicitly rollback, even if a new transaction is begun. For example, you might be doing foreign key validation outside the database. If one INSERT were implicitly rolled back and a new transaction begun, a second INSERT may then succeed and be committed. Now you have a constraint violation, even though you correctly checked it before. If there is anything that triggers an implicit rollback in the underlying database connection, those rollbacks must be dectected all subsequent statements or actions on the python wrapper *should* raise exceptions until an explicit rollback is performed, except for commit() which *must* raise. For example: con.isolation_level = 'DEFERRED'
try:
# Nothing in here can implicitly commit or rollback safely.
# Doing so risks data corruption.
cur.execute("INSERT INTO foo ...")
If nothing in the database does implicit COMMITs or ROLLBACKs, then it's probably sufficient to leave all the error handling to the database and only raise exceptions when it indicates problems. Here is an example demonstrating that a SELECT can trigger an implicit commit. Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> # Create a connection.
... import sqlite3
>>> con = sqlite3.connect(":memory:", isolation_level='DEFERRED')
>>> cur = con.cursor()
>>>
>>> # Create some state.
... cur.execute("CREATE TABLE foo (i int);")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> cur.execute("INSERT INTO foo VALUES (0);")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> con.commit()
>>>
>>> # Change the state, do a SELECT, then rollback.
... cur.execute("UPDATE foo SET i=1")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> cur.execute("SELECT 1;")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> con.rollback()
>>>
>>> # Verify that the rollback worked.
... cur.execute("SELECT i FROM foo")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> i = cur.fetchone()[0]
>>> print 'i is', i
i is 0
>>> assert i == 0
>>>
>>> # Change some state, do a different SELECT, then attempt to rollback.
... cur.execute("UPDATE foo SET i=2")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> # This will incorrectly commit:
... cur.execute("""
... /*
... * Comments are good for readability, but because of this bug, they can
... * cause incorrect implicit commits.
... */
... SELECT 1;
... """)
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> con.rollback()
>>>
>>>
>>> # The rollback succeeded and rolling back a different transaction than
>>> # the one we expected.
... cur.execute("SELECT i FROM foo")
<sqlite3.Cursor object at 0x7fc1b7d51500>
>>> i = cur.fetchone()[0]
>>> print 'i is', i
i is 2
>>> assert i == 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError |
well Adam, you might also be surprised to see pysqlite not doing very well on the *other* side of the equation either; that is, when it *begins* the transaction. Issue http://bugs.python.org/issue9924 refers to this and like this one, hasn't seen any activity since 2011. You might be interested in the rationale on that one, which is that python apps may wish to have some degree of concurrency against a sqlite file for an application that is doing all SELECTs. I am certainly in favor of a pure pep-249 approach that emits BEGIN on the first execute() call period, and never implicitly rolls back or commits. However, I disagree this should be enabled by default nor that there should not be any option for the old behavior. I also think the "delayed BEGIN" feature should still be available to counteract SQLite's particular difficulty with concurrency. If there is code that relies upon a bug in order to function, which would then fail to function in that way if the bug is fixed, then by definition fixing that bug is a backwards-incompatible change. Python std lib can't afford to roll out a change that blatantly backwards-incompatible. The issue regarding comments not being parsed unfortunately should also be a separate issue that needs to be fixed, as nasty as it is that pysqlite tries to parse SQL. I disagree that most users are expecting SQLite's DDL to be transactional; transactional DDL is a luxury feature to many, including the vast numbers of developers that use MySQL, not to mention Oracle, neither of which have any transactional DDL capabilities. |
On Thu, 09 Jan 2014 20:29:15 +0000, Adam Tomjack <report@bugs.python.org> wrote:
That's not how our backward compatibility policy works. If a program Now, that said, silent data corruption can change that calculus, but It is possible that there are specific things that can be fixed without
There are reasons why the sqlite module was designed the way it was,
By default, SQLite starts a transaction when any command that changes
Like I said, this would need to be a new feature, due to our backward
If you want full control of transactions while using the sqlite module,
Yes, the fact that statement detection (and what statements are But note that in Python code it would be much more natural to
Yep.
Do you have a better way to suggest?
There are no implicit rollbacks at issue anywhere here, that I can see.
Yes, the comment issue should be fixed. That's a different bug, though, I'm sorry about that, but Python has a strong commitment to backward |
could we please get the option to opt-out of that behaviour, as a extra connection option maybe with the normal python sqlite bindings its impossible |
Opt out of what behavior? |
the sqlite binding deciding how to handle transactions |
As noted above you get that by setting isolation_level to None. That feature has always been available. (With isolation_level set to None, the sqlite wrapper module itself never issues any BEGIN statements.) |
http://hg.python.org/releasing/3.4/file/447c758cdc26/Modules/_sqlite/cursor.c#l789 also i dont see the isolation level being taking into account in other parts of the code |
This issue has been open for 4 years, last update was 2 months ago. Lack of transactional DDL is a big deal for Python programs that use SQLite heavily. We have a patch for Python 3 that applies cleanly and as far as I can tell works fine. I've been using it in testing for months, and I'm about to deploy it in production. What's the next step here? What do we need to do to get something merged? |
This bug appears to be fixed upstream: ghaering/pysqlite@f254c53 |
It's not a backward compatible change, so we'll need a migration strategy if we want to apply this (and I'd certainly like to). |
Please note that after the mentioned commit, I restored backwards compatibility with commit ghaering/pysqlite@796b3af Now the only difference is that the implicit commits *before* DDL statements are gone. I consider these a bug anyway. |
Excellent. |
While I agree that the current behaviour is a bug (this bug), and one that renders the package unusable for me (I used apsw or different language instead), I unfortunately have to disagree with Gerhard that the change is not a problem. It can be. The implicit commit before DDL statements that is gone is not a problem, but the implicit commit *after* them that is *also* gone is. Because if somebody only does DDL statements, and upgrade script might do just that, they will now need a commit. Scripts that use dbapi2 in a portable way will have the commit, because other databases that support DDL in transactions require it already. But scripts that are only used with current sqlite and possibly mysql (behaviour of which the current version mimics) may not and will break. |
Oh, I see I misunderstood Gerhard's last commit. So now the problem should be only if there is a DML statement followed by DDL statement and no commit afterwards. Well, that is indeed probably stupid. |
Since a better solution seems to be around the corner, I'm withdrawing my proposal. I'm attaching the current state of my patch. It isn't functional. Mostly it proves that my API proposal is very inconvenient to implement in C. That's why I kept it around for over a year without completing it. |
Implicit transactions can also cause mysterious "database is locked" bugs, even when a large timeout is set on the connection. Many, many people are affected by this bug (search the web for "python sqlite database is locked"). The attached code demonstrates this bug and possible (unintuitive) fixes. If there won't be a fix soon, then at least the documentation should note this quirky behavior. |
Isn't this issue at least partly about the statement parsing code in the sqlite module? I've fixed this upstream a while ago in ghaering/pysqlite@94eae50 Could somebody perhaps bring this to the Python repository. Here's a documentation update that's also required then: |
@rian - implicit transactions are part of the DBAPI spec. Looking at the original description, the purpose of this bug is to not *end* the transaction when DDL is received. So there's no solution for "database is locked" here, other than pysqlite's usual behavior of not emitting BEGIN until DML is encountered (which works well). |
@mike Yes you're right, This bug report is different, my mistake. DB-API may require implicit transactions but pysqlite should open a transaction before *any* non-DDL statement, including "SELECT", not just DML statements. Currently one must issue a dummy DML statement just to open a transaction that otherwise would start with a SELECT statement. I see nothing in DB-API (https://www.python.org/dev/peps/pep-0249/) that says transactions should implicitly open before DML statements and not SELECT statements. |
@rian - your issue for this is http://bugs.python.org/issue9924. The implicit BEGIN in all cases will probably never be the default but we do need an option for this to be the case, in order to support SERIALIZABLE isolation. |
Here is a patch. It contains the following commits:
Changes:
Note: sqlite3_stmt_readonly() has been added in SQLite 3.7.4 (released on 2010-12-07) so this patch will make older SQLite versions unusable with the sqlite3 module. |
Here is an updated patch. I'd like to get this in before 3.6 beta 1. |
The latest patch removes the current statement parsing and unexpected implicit commits. It looks good to me. Unfortunately it also introduces a new kind of statement parsing that detects DDL statements and doesn't open a transaction in that case, while it should. See ghaering/pysqlite#105 for details. As a consequence, this change will allow Django to remove one of the two special cases for transaction in SQLite: but not the other: It's still a step forward so I support merging it. |
I'm in favor of merging Berker's patch, once the string comparison is made a little more robust. Right now there's a risk of matching a prefix rather than a whole word, and potentially overrunning the buffer (unless I've forgotten how stricmp works). |
New changeset 284676cf2ac8 by Berker Peksag in branch 'default': |
I'm using Python 3.6.0 RC2, I don't know if it's a bug. When I try to run VACUUM command, an exception raised: conn.execute('begin') # <- remove this line get the same result
conn.execute('VACUUM') sqlite3.OperationalError: cannot VACUUM from within a transaction ~~~~~~~~~~~~~~~~~~~ |
It is an unexpected change in behavior and is therefore probably a bug. It will work if you set isolation_level=None. Because of that I don't think it is a release critical bug, though one could wish we'd discovered it earlier. Please open a new issue for this. |
bpo-29003 created |
While attempting to build a Python 3.6 RPM for RHEL/CentOS 6, I noticed the following warning. *** WARNING: renaming "_sqlite3" since importing it failed: build/lib.linux-i686-3.6-pydebug/_sqlite3.cpython-36dm-i386-linux-gnu.so: undefined symbol: sqlite3_stmt_readonly The resolution of this issue introduced usage of the sqlite3_stmt_readonly interface. That interface wasn't added to sqlite until 3.7.4 (http://www.sqlite.org/releaselog/3_7_4.html). My RPM build failed because RHEL/CentOS 6 only has sqlite 3.6.20. I understand that Python can't support old libraries forever, but can this minimum sqlite version be noted somewhere in the documentation? |
Please open a new issue for this request. |
With Berker Peksag's patch merged in 2016, in the default non auto-commit mode, sqlite3 implicitly issues a BEGIN statement before any non SELECT statement not already in a transaction, EXCEPT DDL statements, for backwards compatibility reasons: + /* For backwards compatibility reasons, do not start a transaction if a Is there any plan to cover DDL statements as well in a future Python version and deprecate the old behaviour? That would avoid having to insert BEGIN statements manually for getting transactional DDL statements, like in psycopg2 for PostgreSQL databases. |
Please open a new issue for this question. |
@Aymeric Augustin
+1. We could use this new autocommit property to enable the new full transactional mode (that is to say with transactional DDL):
To transition from the old partial transactional mode (without transactional DDL) by default to the new full transactional mode (with transactional DDL) by default, we could use the following migration strategy:
|
@r. David Murray
Done here: https://bugs.python.org/issue39457 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: