Fable review of 4.0rc1#767
Conversation
Consolidated findings from a final review before the stable 4.0 release: five release blockers, semantic decisions to lock in now, polish items, and verified-sound areas. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Assessment of the CREATE TABLE parser as the foundation for 4.1 transform() improvements: fit for 4.1, two tokenizer bugs to fix first (comments, numeric literal defaults), test wiring, and design decisions around round-trip fidelity and API visibility. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Table.delete_where() ran its DELETE via a bare execute() with no commit, unlike Table.delete() which wraps in db.atomic(). The connection was left with an open implicit transaction, so the deletion (and all subsequent writes, including later atomic() blocks which switched to savepoint mode) was silently rolled back when the connection closed. Wrap the DELETE in db.atomic() and remove the now-unnecessary atomic() wrapper from the delete_where documentation example. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Migrations.apply() now wraps each migration function and its _sqlite_migrations tracking row in db.atomic(), so they commit together: a failing migration rolls back cleanly, is not recorded, and stays pending, eliminating the double-apply hazard where a partially-failed migration left committed side effects behind. Migrations that cannot run inside a transaction (VACUUM, journal mode changes, manual transaction management) can opt out by registering with @migrations(transactional=False). Also fixed the test fixtures which used db.query() for INSERT statements - query() is a lazy generator, so those inserts never actually executed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
drop-table on a view name silently dropped the view, and drop-view on a table name silently dropped the table, because both used db[name].drop() which dispatches on the actual object type. They now use db.table() / db.view() and exit with an explanatory error pointing at the correct command. --ignore still exits cleanly but no longer drops anything. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Previously query() was a generator function, so nothing - including the SQL itself - ran until the result was first iterated. A write statement passed to query() silently did nothing, and SQL errors surfaced far from the call site. The SQL now executes as soon as query() is called, while rows are still fetched lazily during iteration. Statements that return no rows now raise a ValueError directing callers to execute() instead. As a side effect, statements like INSERT ... RETURNING now work naturally with query(). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Both ran their INSERT INTO fts(fts) statements via a bare execute() with no commit, leaving the connection inside an open implicit transaction - the FTS operation and all subsequent writes were then silently rolled back when the connection closed. Both are now wrapped in db.atomic(), matching delete_where() and the other write operations. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
A record missing a primary key value - or with None for one - can never match an existing row, because a NULL primary key never satisfies ON CONFLICT. Such records were previously inserted as brand new rows (silently, for multi-record upsert_all calls) or triggered a KeyError after the insert had already happened. Both cases, including upserting an empty record, now raise PrimaryKeyRequired before any SQL is executed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Changing the journal mode assigns conn.isolation_level, which commits any open transaction as a side effect - silently breaking the rollback guarantee of atomic() blocks and of user-managed transactions. Both methods now raise RuntimeError if a transaction is open. Calling them when the database is already in the requested mode remains a no-op. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
_AppliedMigration.applied_at was annotated datetime.datetime but the value is the TEXT timestamp read straight from the _sqlite_migrations table - always a string, matching the format written by both this module and the older sqlite-migrate package. Added a test pinning the runtime type. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #767 +/- ##
==========================================
- Coverage 95.34% 95.16% -0.19%
==========================================
Files 9 9
Lines 3330 3431 +101
==========================================
+ Hits 3175 3265 +90
- Misses 155 166 +11 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
test_query_executes_eagerly failed on the sqlean CI matrix jobs because sqlean.dbapi2.OperationalError is not a subclass of the stdlib sqlite3.OperationalError. Import sqlite3 via sqlite_utils.utils like the other test modules do. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
with Database(...) closes the connection on exit without committing, so uncommitted changes are rolled back. This was the existing behavior but was undocumented and untested - it is now stated in the docs and pinned by a test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
…odes Documents that atomic() opens a deferred BEGIN, that calling it while the connection is already inside a transaction produces a savepoint whose changes only persist when the outer transaction commits, and that Python 3.12+ autocommit=True/False connections are not supported. Adds a test pinning the savepoint-inside-manual- BEGIN behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
An unknown --stop-before value previously matched nothing and every migration was silently applied, including the one the user meant to stop before. The CLI now errors unless each value matches a known migration. The CLI also always passed stop_before as a list, but older duck-typed sqlite-migrate Migrations objects compare it against a single string name - so the flag was silently ignored for exactly the migration files the compatibility shim exists to support. Legacy sets now receive a single string, with an error if multiple values target one legacy set. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
pending() and applied() no longer create the _sqlite_migrations table or perform the one-way legacy schema upgrade - that now only happens in apply(). The CLI --list path no longer creates the database file when it does not exist. applied() results are now explicitly ordered by insertion. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
- insert/upsert into a name that is actually a view now shows a clean error instead of a traceback - db.view() on a name that is a table now says so in its error - Database.__getitem__ docstring documents that views are returned for view names; hash_id docstring corrected (it is a column name, not a boolean) - Registering two migrations with the same name in one set now raises ValueError at registration time instead of executing both and failing with IntegrityError at apply time - build-system requires setuptools>=77, needed for the PEP 639 license expression Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Type detection has been the default for CSV/TSV data since 4.0a1, making this flag a no-op kept only for backwards compatibility. 4.0 is the release where it can be removed - the flag now errors, prompting scripts to drop it. --no-detect-types is unchanged. The dead detect_types parameter has been removed from insert_upsert_implementation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
User-facing argument validation in db.py previously used bare assert statements, which vanish entirely under python -O and raise AssertionError - an exception type callers should not have to catch for input validation. Fifteen validation sites now raise ValueError with the same messages. Internal invariants (an unreachable branch and a post-update rowcount check) remain asserts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
New docs page describing the changes needed to upgrade between major versions, with a detailed 3.x to 4.0 section covering CLI, Python API and packaging changes, plus brief sections for the 2.x to 3.0 and 1.x to 2.0 upgrades. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
|
I'm not 100% confident in the design where changes are rolled back if you didn't commit them at the end of a |
These were working documents for the 4.0 pre-release review - the actionable findings have all been fixed on this branch and the changelog and upgrading guide carry the user-facing record. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
New top-level 'Transactions and saving your changes' section in the Python API documentation, stating the fundamental contract up front: every write method commits its own changes before returning, no commit() call is ever needed. The db.atomic() documentation moves under it, alongside new subsections covering the two cases where the user owns the commit - raw db.execute() writes and manually managed transactions - and the supported connection modes. Cross-referenced from Getting started (the first insert example now says the rows are saved immediately), a warning on db.execute(), the closing-a-database section and the upgrading guide. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
The method existed only to raise NotImplementedError, since full-text search is not supported for views, and it showed up in the generated API reference as a documented View method. Calling enable_fts() on a View now raises AttributeError like any other missing method. The sqlite-utils enable-fts command uses db.table() and shows a clean error when pointed at a view instead of a traceback. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
The manual-transaction warning could be read as covering db.atomic() too. Nested atomic() blocks are safe - they become savepoints inside the migration's transaction, preserving the all-or-nothing rollback guarantee. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
…ting Expands the closing-a-database docs: exit is equivalent to close(), open transactions can only come from manual transaction control, auto-committing could silently persist half-finished work, and the behavior deliberately differs from sqlite3.Connection's own context manager. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
| Managing transactions yourself | ||
| ------------------------------ | ||
|
|
||
| You can take full manual control using ``db.execute("begin")`` (or any raw write, as above) followed by ``db.conn.commit()`` or ``db.conn.rollback()``. |
There was a problem hiding this comment.
That makes me think maybe we should have a db.begin() and db.commit() and db.rollback() method, since it's a bit ugly in the documentation to suddenly start showing raw db.execute("begin") strings and then use methods on db.conn.
|
|
||
| Two related safeguards to be aware of: | ||
|
|
||
| - ``db.enable_wal()`` and ``db.disable_wal()`` raise a ``RuntimeError`` if called while a transaction is open, because changing the journal mode would commit it as a side effect. |
There was a problem hiding this comment.
Let's introduce our own exception for here, it's weird to use RuntimeError.
| Supported connection modes | ||
| -------------------------- | ||
|
|
||
| ``db.atomic()`` and the automatic per-method transactions are designed for connections in Python's default transaction handling mode. Connections created with the Python 3.12+ ``sqlite3.connect(..., autocommit=True)`` or ``autocommit=False`` options are not supported, because ``commit()`` and ``rollback()`` behave differently on those connections. |
There was a problem hiding this comment.
What does "not supported" mean, will it actively break things or just behave in weird ways?
There was a problem hiding this comment.
The empirical answer is worse than "behaves in weird ways" — it breaks silently, in the most dangerous way possible, on the most basic operation. Under Python 3.13, with a connection passed to
Database()in either mode:
autocommit=True:db["t"].insert({"id": 1})appears to succeed — the row is readable in-process — but the connection is left inside an open transaction, a second connection can't even see the table, and when the connection closes, everything vanishes. No error at any point. The mechanism:atomic()runs a realBEGIN, but Python documentsconn.commit()as a no-op onautocommit=Trueconnections, so the commit at the end of every write method silently does nothing.autocommit=False: identical symptoms via a different route — the connection is always inside an implicit transaction, soatomic()permanently takes its savepoint branch and never commits anything. Insert looks fine in-process; after close, the table doesn't exist.So "not supported" currently means: every write the library makes is silently discarded at close. Nothing raises, nothing warns. That's the same silent-data-loss family as the
delete_wherebug — except here it's the entire API, and the docs sentence we wrote ("not supported") badly understates it.
There was a problem hiding this comment.
Given that answer I said:
In that case, do an experiment now (don't commit any code) which sees if there is a minimal change we can make that results in our entire test suite passing for connections that use the new autocommit setting
And Fable did that and said:
The working tree is clean, so I can experiment freely and
git checkoutat the end. Protocol: (1) force all internally-created connections toautocommit=Truevia an env var and bypass the new guard, (2) baseline the damage under Python 3.13, (3) apply the smallest fix I hypothesize —atomic()andcommit()/rollback()issuing literal SQL statements, which behave identically in every mode — and (4) count and categorize what still fails. Then the same forautocommit=False.Baseline: 276 failures under
autocommit=Truewith no mitigation. Now the hypothesized minimal fix — literal transaction statements, which mean the same thing in every connection mode:Experiment complete. Capturing the diff summary and reverting everything:
Experiment done, everything reverted (tree clean, suite back to 1118 green on the committed code). Here's what it found:
Answer: yes — for
autocommit=True, near-total compatibility is a ~20-line changeBaseline: forcing every internally-created connection to
autocommit=Truebreaks the library catastrophically, as expected: 276 of 1134 tests fail (everyatomic()commit is a silent no-op, so transactions pile up open and everything downstream rots).The minimal fix — replace mode-dependent connection method calls with literal SQL statements, which mean the same thing in every mode:
- In
atomic()'s top-level branch:conn.rollback()→execute("ROLLBACK"),conn.commit()→execute("COMMIT")(theBEGINwas already a statement; the savepoint branch was already pure SQL).- In
db.commit()/db.rollback(): literalCOMMIT/ROLLBACK, guarded byin_transaction(a bareCOMMITwith no transaction errors, unlike the no-op method).- One call-site cleanup: the CLI
bulkcommand still usedwith db.conn:— switching it todb.atomic()(which it should arguably use anyway for consistency).Result: 276 → 3 failures, and two of those are the constructor-guard tests themselves, which the experiment deliberately bypasses — so really one genuine incompatibility:
test_database_context_manager, which pins that rawdb.execute()writes roll back at close. Underautocommit=Truethere are no implicit transactions — raw writes commit instantly — so that documented semantic simply doesn't exist in that mode. That's not a bug to fix; it's the honest behavioral delta that would need documenting. Also notable: my new savepoint test only passed after switching it fromdb.conn.rollback()todb.rollback()— a nice confirmation that the methods you asked for this morning are exactly the mode-agnostic idiom.Crucially, the fix is backwards-compatible: with the literal-statement changes in place and no autocommit forcing, the full suite passes untouched (1118/1118) in legacy mode. So this change could ship in 4.x at any time without breaking anyone, and it would even fix the tracer blind spot (transaction statements become visible if routed through
execute()).
autocommit=Falseis a different story: 266 failures even with the fix. As predicted, its always-inside-an-implicit-transaction model breaksatomic()'sin_transactionsniffing at the root — supporting it needs a real redesign (explicit nesting-depth tracking instead of state sniffing), not a patch.Implication for the 5.0 speculation: the door is even more open than I guessed. When demand arrives,
autocommit=Truesupport is a minor-release-sized change (literal statements + relax the guard forTrueonly + document the no-implicit-rollback delta + sweep the remaining fourwith db.conn:sites in cli.py). Onlyautocommit=False— or changing your own default — carries major-release weight. Nothing worth doing before stable 4.0, but the experiment's conclusion is worth remembering when the first "support my autocommit connection" issue gets filed.
There was a problem hiding this comment.
I'm going to implement this and do an RC which works in both modes.
Manual transaction control previously required mixing raw
db.execute("begin") strings with methods on db.conn. These thin
wrappers give the documentation and callers a single consistent
idiom. Docs updated to use them throughout.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
…e_wal A dedicated sqlite_utils.db.TransactionError exception is clearer to catch than a generic RuntimeError, and gives future transaction-state guards a home. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Connections created with sqlite3.connect(autocommit=True) make commit() a documented no-op, and autocommit=False connections are permanently inside a transaction - in both modes every write made by this library appeared to work in-process but was silently discarded when the connection closed. Database() now raises TransactionError for these connections instead of losing data. Tests are skipped on Python versions before 3.12, where the autocommit parameter does not exist. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
Raw write statements previously opened an implicit transaction that stayed open until something committed it - the write was visible on the same connection, making it look saved, but was silently rolled back when the connection closed. execute() now commits any implicit transaction it opens, so raw writes behave like every other write in the library: committed as soon as they run, unless an explicit transaction (db.begin() or db.atomic()) is open, in which case they join it. Row-returning writes such as INSERT ... RETURNING via db.query() commit once their rows have been iterated. atomic(), commit() and rollback() now issue literal COMMIT/ROLLBACK statements, which have identical semantics in every sqlite3 connection mode. The CLI bulk command uses db.atomic() instead of 'with db.conn:'. This simplifies the transaction contract to: everything commits immediately unless you explicitly opened a transaction. Docs updated throughout; changelog and upgrading guide cover the breaking change for code that relied on rolling back uncommitted execute() writes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
| This rarely matters in practice. Everything that writes to the database - | ||
| including raw ``db.execute()`` statements - commits automatically, so a | ||
| transaction can only be open here if you explicitly started one with | ||
| ``db.begin()`` and have not yet committed it. In that case the decision to | ||
| commit stays with you: committing automatically on exit could silently | ||
| persist half-finished work, for example if your code returned early from the | ||
| block. Call ``db.commit()`` when the work is complete. |
There was a problem hiding this comment.
I don't like wrapped text like this in docs.
There was a problem hiding this comment.
Look at the 10 lines of docs/python-api.rst following 151 - these have been text-wrapped, but the style for this project's documentation is NOT to text-wrap RST. Fix that by unwrapping the block, and find any other instances in the RST files that have been wrapped like that but should not have been.
Runs the entire test suite against connections created with the Python 3.12+ sqlite3.connect(autocommit=True) mode, by patching connect() to default autocommit=True for internally-created connections. Errors on older Python versions. The full suite passes in both modes with no test changes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
parametrize received a generator from Path.glob(), which pytest 10 will reject. sorted() materializes the list and makes the parameter order deterministic as a bonus. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
To test that our code works correctly with the new autocommit option introduced in Python 3.13+ #767 (comment)
The documentation style for this project is one line per paragraph, not text-wrapped RST. Unwraps three recently added paragraphs in python-api.rst and two older ones in plugins.rst. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
|
I said to Claude Code:
It came back with:
|
Initial review document: https://github.com/simonw/sqlite-utils/blob/0c369a447eeaf39084f0d14a45b3eeb7eacb631b/fable-review-4.0rc1.md
Then we fixed a lot of stuff.
Claude-Session: https://claude.ai/code/session_01UnLnhsH25Nnv7LHhekUfPd
📚 Documentation preview 📚: https://sqlite-utils--767.org.readthedocs.build/en/767/