Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved (and better documented) support for transactions #121

Open
simonw opened this issue Jul 8, 2020 · 3 comments
Open

Improved (and better documented) support for transactions #121

simonw opened this issue Jul 8, 2020 · 3 comments
Labels
documentation enhancement New feature or request python-library question Further information is requested research

Comments

@simonw
Copy link
Owner

simonw commented Jul 8, 2020

Originally posted by @simonw in #118 (comment)

We should put some thought into how this library supports and encourages smart use of transactions.

@simonw simonw added question Further information is requested research labels Jul 8, 2020
@tsibley
Copy link
Contributor

tsibley commented Jul 8, 2020

Better transaction handling would be really great. Some of my thoughts on implementing better transaction discipline are in #118 (comment).

My preferences:

  • Each CLI command should operate in a single transaction so that either the whole thing succeeds or the whole thing is rolled back. This avoids partially completed operations when an error occurs part way through processing. Partially completed operations are typically much harder to recovery from gracefully and may cause inconsistent data states.

  • The Python API should be transaction-agnostic and rely on the caller to coordinate transactions. Only the caller knows how individual insert, create, update, etc operations/methods should be bundled conceptually into transactions. When the caller is the CLI, for example, that bundling would be at the CLI command-level. Other callers might want to break up operations into multiple transactions. Transactions are usually most useful when controlled at the application-level (like logging configuration) instead of the library level. The library needs to provide an API that's conducive to transaction use, though.

  • The Python API should provide a context manager to provide consistent transactions handling with more useful defaults than Python's sqlite3 module. The latter issues implicit BEGIN statements by default for most DML (INSERT, UPDATE, DELETE, … but not SELECT, I believe), but not DDL (CREATE TABLE, DROP TABLE, CREATE VIEW, …). Notably, the sqlite3 module doesn't issue the implicit BEGIN until the first DML statement. It does not issue it when entering the with conn block, like other DBAPI2-compatible modules do. The with conn block for sqlite3 only arranges to commit or rollback an existing transaction when exiting. Including DDL and SELECTs in transactions is important for operation consistency, though. There are several existing bugs.python.org tickets about this and future changes are in the works, but sqlite-utils can provide its own API sooner. sqlite-utils's Database class could itself be a context manager (built on the sqlite3 connection context manager) which additionally issues an explicit BEGIN when entering. This would then let Python API callers do something like:

db = sqlite_utils.Database(path)

with db: # ← BEGIN issued here by Database.__enter__
    db.insert(…)
    db.create_view(…)
# ← COMMIT/ROLLBACK issue here by sqlite3.connection.__exit__

@simonw
Copy link
Owner Author

simonw commented Jul 8, 2020

I'm with you on most of this. Completely agreed that the CLI should do everything in a transaction.

The one thing I'm not keen on is forcing calling code to explicitly start a transaction, for a couple of reasons:

  1. It will break all of the existing code out there
  2. It doesn't match to how I most commonly use this library - as an interactive tool in a Jupyter notebook, where I'm generally working against a brand new scratch database and any errors don't actually matter

So... how about this: IF you wrap your code in a with db: block then the .insert() and suchlike methods expect you to manage transactions yourself. But if you don't use the context manager they behave like they do at the moment (or maybe a bit more sensibly).

That way existing code works as it does today, lazy people like me can call .insert() without thinking about transactions, but people writing actual production code (as opposed to Jupyter hacks) have a sensible way to take control of the transactions themselves.

@tsibley
Copy link
Contributor

tsibley commented Jul 9, 2020

Yep, I agree that makes more sense for backwards compat and more casual use cases. I think it should be possible for the Database/Queryable methods to DTRT based on seeing if it's within a context-manager-managed transaction.

@simonw simonw changed the title Better documented support for transactions Improved (and better documented) support for transactions Jul 9, 2020
@simonw simonw added documentation enhancement New feature or request labels Jul 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation enhancement New feature or request python-library question Further information is requested research
Projects
None yet
Development

No branches or pull requests

2 participants