Expanded analysis of SQL operations #2749
Codecov Report
Coverage diff, main vs. #2749:
Coverage: 0.00% -> 0.00%
Files: 66 -> 66
Lines: 9812 -> 10002 (+190)
Misses: 9812 -> 10002 (+190)
|
Require view-table permission for reads discovered inside write SQL analysis, including INSERT ... SELECT and CREATE TABLE ... AS SELECT. Record additional SQLite authorizer callbacks as Operation values so unsupported functions, savepoints, virtual table DDL, and unknown callbacks are denied unless explicitly handled.
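A minimal sketch of the idea behind this commit, assuming Python's standard `sqlite3` module: `set_authorizer` reports every operation SQLite sees while compiling a statement, so the read hidden inside an `INSERT ... SELECT` becomes visible. The `record_operations` helper and the `src`/`dst` tables are hypothetical names for illustration, not Datasette's actual code.

```python
import sqlite3


def record_operations(conn, sql):
    """Collect (action, arg1, arg2) tuples reported by the SQLite authorizer.

    Hypothetical helper: the statement is prefixed with EXPLAIN so that it is
    compiled (which fires the authorizer callbacks) but not actually executed.
    """
    seen = []

    def authorizer(action, arg1, arg2, db_name, source):
        seen.append((action, arg1, arg2))
        return sqlite3.SQLITE_OK  # allow everything; this sketch only observes

    conn.set_authorizer(authorizer)
    conn.execute("EXPLAIN " + sql).fetchall()
    return seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (name TEXT)")
conn.execute("CREATE TABLE dst (name TEXT)")

ops = record_operations(conn, "INSERT INTO dst (name) SELECT name FROM src")
for action, arg1, arg2 in ops:
    if action == sqlite3.SQLITE_INSERT:
        print("write to table:", arg1)
    elif action == sqlite3.SQLITE_READ:
        print("read of column:", arg1, arg2)
```

A permission layer could then require view-table on every table that appears in a read callback, in addition to whatever the write itself needs.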
|
Some useful feedback from GPT-5.5 xhigh/Codex:
|
|
I'm confident it is not possible with a SQLite3 analyzer callback to tell the difference between an [...]
Solution: I'm going to enforce that the user has [...] |
Raw SQL insert and update statements can have broader effects than their SQLite authorizer callbacks reveal. INSERT OR REPLACE and UPDATE OR REPLACE can delete conflicting rows while only surfacing insert or update operations. Expand table insert and update operations to require insert-row, update-row, and delete-row together. Keep delete operations mapped to delete-row, and update the analysis UI/API to report and evaluate multiple required permissions for a single operation. Refs #2749 (comment)
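To make the problem concrete, here is a small sketch using Python's `sqlite3` module. It runs an `INSERT OR REPLACE` that removes a conflicting row and records which authorizer action codes were reported. The `REQUIRED_PERMISSIONS` mapping is a hypothetical illustration of the expansion described in this commit message; only the permission names come from the text above.

```python
import sqlite3

# Hypothetical mapping from authorizer action to the permissions required.
# Insert and update are expanded because they may delete conflicting rows.
REQUIRED_PERMISSIONS = {
    sqlite3.SQLITE_INSERT: {"insert-row", "update-row", "delete-row"},
    sqlite3.SQLITE_UPDATE: {"insert-row", "update-row", "delete-row"},
    sqlite3.SQLITE_DELETE: {"delete-row"},
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE, val INTEGER)")
conn.execute("INSERT INTO t (name, val) VALUES ('a', 1)")

actions = []


def authorizer(action, arg1, arg2, db_name, source):
    actions.append(action)
    return sqlite3.SQLITE_OK  # observe only


conn.set_authorizer(authorizer)
conn.execute("INSERT OR REPLACE INTO t (name, val) VALUES ('a', 2)")
write_actions = list(actions)  # snapshot before running any further queries

# The original ('a', 1) row has been replaced, yet no delete was reported.
rows = conn.execute("SELECT name, val FROM t").fetchall()
print(rows)
print("delete reported:", sqlite3.SQLITE_DELETE in write_actions)

# Union of the permissions required by everything that was reported.
required = set()
for action in write_actions:
    required |= REQUIRED_PERMISSIONS.get(action, set())
print(sorted(required))
```

Requiring all three permissions for any insert or update is conservative: a plain `INSERT` is treated the same as `INSERT OR REPLACE`, which trades some precision for not having to distinguish the two.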
|
This issue, marked as P2, is interesting:
Effectively we're saying that, if a user can create a table, they can trigger any function on subsequent inserts to that table. So we don't have a good way of blocking function calls. Does this actually matter? I'm not convinced that it does, for our application. |
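A small sketch of the behaviour in question, assuming only Python's `sqlite3` module and the built-in `abs()` function (the table and column names are made up for illustration). The `INSERT` statement never mentions a function, yet the function in the column's `DEFAULT` expression supplies the stored value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Whoever creates the table chooses the function used in the DEFAULT clause.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, n INTEGER DEFAULT (abs(-5)))"
)

# A later insert that omits column n still gets the function's result.
conn.execute("INSERT INTO t (name) VALUES ('a')")
row = conn.execute("SELECT name, n FROM t").fetchone()
print(row)  # ('a', 5)
```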
|
I think I'm going to need documentation around the new [...]
This does raise the question of whether [...] |
|
The reason I started down the path of analyzing SQL writes to check if the user should be able to write to those tables is actually related to stored queries. If I want Datasette to work as more of an application platform (especially interesting now that agents can help non-SQL-literate people write SQL queries) then it makes sense that we might want a wider range of less sophisticated users to be able to construct new SQL write queries... if we can keep that safe. On that basis, being able to distribute [...]
I'm going to keep pushing on this a little further, but I'd like to get really confident that this is feasible before I ship the next alpha. |
|
There is something very promising about datasette-agent getting the ability to execute write queries within a tightly controlled permission landscape though - that on its own may be worth sticking with this. |
|
(I do worry that when Datasette extends to other databases like PostgreSQL it may prove impossible to recreate the per-table finely grained permissions we've built for SQLite though.) |
|
Oh this is interesting... I had Claude look into "whether the SQLite3 authorizer mechanism can be used to detect an INSERT OR REPLACE based just on running an explain query" and it came back with a solution that uses [...] https://claude.ai/share/c4212606-3fee-4b7c-bc97-505e0348ccac
Here's Python code it wrote:
import sqlite3
# SQLite p5 flag bits relevant to OP_Insert (from sqliteInt.h):
OPFLAG_NCHANGE = 0x01
OPFLAG_EPHEM = 0x02
OPFLAG_ISUPDATE = 0x04 # this OP_Insert is the rewrite half of an UPDATE
OPFLAG_APPEND = 0x08
OPFLAG_USESEEKRESULT = 0x10
OPFLAG_LASTROWID = 0x20
def _main_program(rows):
    """Return only the rows belonging to the top-level VDBE program.

    EXPLAIN dumps the main program first (addresses 0..N), then any trigger
    sub-programs each starting their own address counting from 0. We stop as
    soon as the address column resets or decreases.
    """
    out = []
    prev = -1
    for r in rows:
        addr = r[0]
        if addr <= prev:
            break
        out.append(r)
        prev = addr
    return out
def classify(conn, sql):
    """Classify an INSERT-flavoured statement using only EXPLAIN bytecode.

    Returns one of:
        'insert'         - plain INSERT or INSERT OR ABORT/FAIL/IGNORE/ROLLBACK
        'insert_replace' - INSERT OR REPLACE (or bare REPLACE shorthand)
        'upsert'         - INSERT ... ON CONFLICT DO ...
        'update'         - UPDATE or UPDATE OR REPLACE
        'other'          - SELECT, DELETE, DDL, etc.
    """
    rows = _main_program(conn.execute("EXPLAIN " + sql).fetchall())
    # Only consider Insert/Delete opcodes that count as user-visible row changes
    # (OPFLAG_NCHANGE). This excludes the sqlite_master writes performed during
    # DDL like CREATE TABLE.
    inserts = [r for r in rows if r[1] == "Insert" and (r[6] & OPFLAG_NCHANGE)]
    deletes = [r for r in rows if r[1] == "Delete"]
    fresh_inserts = [r for r in inserts if not (r[6] & OPFLAG_ISUPDATE)]
    update_inserts = [r for r in inserts if r[6] & OPFLAG_ISUPDATE]
    if not inserts:
        return "other"
    if update_inserts and not fresh_inserts:
        return "update"
    if len(fresh_inserts) > 1:
        # Two fresh-insert sites is characteristic of UPSERT (one per branch).
        return "upsert"
    if update_inserts and fresh_inserts:
        # One update-insert + one fresh-insert is also UPSERT.
        return "upsert"
    # Exactly one fresh-insert and no update-inserts.
    if deletes:
        return "insert_replace"
    return "insert"
# ----------------------------- tests -----------------------------
def run(label, setup, sql, expected):
    conn = sqlite3.connect(":memory:")
    for s in setup:
        conn.execute(s)
    got = classify(conn, sql)
    mark = "OK " if got == expected else "FAIL"
    print(f" {mark} {label:50s} expected={expected:16s} got={got}")
    conn.close()
T = ["CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE, val INTEGER)"]
T2 = T + ["CREATE TABLE src (name TEXT, val INTEGER)"]
T3 = ["CREATE TABLE replace_log (id INTEGER PRIMARY KEY, replace TEXT)"]
T4 = T + [
    "CREATE TABLE log (msg TEXT)",
    "CREATE TRIGGER tr AFTER INSERT ON t BEGIN DELETE FROM log WHERE msg='x'; END",
]
print("Baseline INSERT variants:")
run("plain INSERT", T, "INSERT INTO t (name, val) VALUES ('a', 1)", "insert")
run("INSERT OR IGNORE", T, "INSERT OR IGNORE INTO t (name, val) VALUES ('a', 1)", "insert")
run("INSERT OR ABORT", T, "INSERT OR ABORT INTO t (name, val) VALUES ('a', 1)", "insert")
run("INSERT OR FAIL", T, "INSERT OR FAIL INTO t (name, val) VALUES ('a', 1)", "insert")
run("INSERT OR ROLLBACK", T, "INSERT OR ROLLBACK INTO t (name, val) VALUES ('a', 1)", "insert")
run("INSERT OR REPLACE", T, "INSERT OR REPLACE INTO t (name, val) VALUES ('a', 1)", "insert_replace")
run("REPLACE shorthand", T, "REPLACE INTO t (name, val) VALUES ('a', 1)", "insert_replace")
print("\nINSERT ... SELECT:")
run("plain INSERT...SELECT", T2, "INSERT INTO t(name,val) SELECT name,val FROM src", "insert")
run("INSERT OR REPLACE...SELECT", T2, "INSERT OR REPLACE INTO t(name,val) SELECT name,val FROM src", "insert_replace")
print("\nMulti-row VALUES:")
run("plain multi-row", T, "INSERT INTO t(name,val) VALUES ('a',1),('b',2),('c',3)", "insert")
run("REPLACE multi-row", T, "INSERT OR REPLACE INTO t(name,val) VALUES ('a',1),('b',2),('c',3)", "insert_replace")
print("\nDecoys & adversarial naming:")
run("table called 'replace_log'", T3, "INSERT INTO replace_log (replace) VALUES ('hello')", "insert")
print("\nUPSERT variants:")
run("ON CONFLICT DO NOTHING",
    T, "INSERT INTO t(name,val) VALUES ('a',1) ON CONFLICT(name) DO NOTHING", "insert")
run("ON CONFLICT DO UPDATE",
    T, "INSERT INTO t(name,val) VALUES ('a',1) ON CONFLICT(name) DO UPDATE SET val=val+1", "upsert")
print("\nTriggers that contain DELETE elsewhere:")
run("plain INSERT, AFTER trigger does DELETE",
    T4, "INSERT INTO t(name,val) VALUES ('a',1)", "insert")
run("INSERT OR REPLACE, AFTER trigger does DELETE",
    T4, "INSERT OR REPLACE INTO t(name,val) VALUES ('a',1)", "insert_replace")
print("\nCTE prefixes:")
run("WITH ... INSERT",
    T, "WITH x(a,b) AS (VALUES('a',1)) INSERT INTO t(name,val) SELECT a,b FROM x", "insert")
run("WITH ... INSERT OR REPLACE",
    T, "WITH x(a,b) AS (VALUES('a',1)) INSERT OR REPLACE INTO t(name,val) SELECT a,b FROM x", "insert_replace")
print("\nNon-INSERT statements:")
run("SELECT", T, "SELECT * FROM t", "other")
run("UPDATE", T, "UPDATE t SET val=2 WHERE name='a'", "update")
run("UPDATE OR REPLACE", T, "UPDATE OR REPLACE t SET name='b' WHERE id=1", "update")
run("DELETE", T, "DELETE FROM t WHERE name='a'", "other")
run("CREATE TABLE", [], "CREATE TABLE x(a)", "other")
And the output from running it: |
|
I modified that to print the SQL and the explain output; here's one of those: [...]
The fact that detecting this requires doing tricks like [...] |
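For readers who want to look at the bytecode themselves, here is a minimal sketch of printing each statement alongside its `EXPLAIN` rows, using only Python's `sqlite3` module. The `explain_rows` helper is a hypothetical name, and the exact opcodes and flag values printed depend on the SQLite version in use.

```python
import sqlite3


def explain_rows(conn, sql):
    """Return EXPLAIN output as (addr, opcode, p1, p2, p3, p4, p5, comment) rows."""
    return conn.execute("EXPLAIN " + sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE, val INTEGER)")

statements = [
    "INSERT INTO t (name, val) VALUES ('a', 1)",
    "INSERT OR REPLACE INTO t (name, val) VALUES ('a', 1)",
]

for sql in statements:
    print(sql)
    for addr, opcode, p1, p2, p3, p4, p5, comment in explain_rows(conn, sql):
        # p4 may be None or a string, so it is formatted via str().
        print(f"  {addr:>3} {opcode:<14} {p1:>4} {p2:>4} {p3:>4} {p4!s:<24} {p5}")
    print()
```

Comparing the two listings side by side shows which opcodes differ between the plain and the `OR REPLACE` form on a given SQLite build.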
|
That Claude transcript also identified that SQLite has the perfect feature for this called the pre-update hook - https://sqlite.org/c3ref/preupdate_blobwrite.html - but it's only available "if SQLite is compiled using the SQLITE_ENABLE_PREUPDATE_HOOK compile-time option" and is not exposed by [...] |
I'm looking at this one now. Neither the authorizer nor an explain can tell if a [...]
So we could maybe detect create table using default functions by running it in a fresh connection, or a transaction, and checking that and then rolling back? I'm not convinced it's worth solving this though. I don't think blocking default functions in create tables is worthwhile.
Stop marking sqlite_master and sqlite_schema reads as internal as soon as the SQLite authorizer reports them. The later DDL-aware pass still treats schema catalog access as internal when it accompanies semantic CREATE, ALTER, or DROP operations. This makes explicit catalog reads in write SQL fall through to the deny-by-default path as unsupported read schema operations, preventing queries from copying private table definitions into writable tables. Refs #2749 (comment)
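The attack this commit closes can be sketched with Python's `sqlite3` module. Note the assumption: this sketch denies the read at run time by returning `SQLITE_DENY` from the authorizer, which is a simplification and not the approach described above, where the recorded operation falls through to a deny-by-default analysis step. Table names and the `deny_schema_reads` callback are hypothetical.

```python
import sqlite3

# Names under which the schema catalog may be reported to the authorizer.
SCHEMA_TABLES = {
    "sqlite_master",
    "sqlite_schema",
    "sqlite_temp_master",
    "sqlite_temp_schema",
}


def deny_schema_reads(action, arg1, arg2, db_name, source):
    # For SQLITE_READ callbacks, arg1 is the table and arg2 the column.
    if action == sqlite3.SQLITE_READ and arg1 in SCHEMA_TABLES:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE private (secret TEXT)")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.set_authorizer(deny_schema_reads)

try:
    # Would copy the definition of every table, including "private", into notes.
    conn.execute("INSERT INTO notes (body) SELECT sql FROM sqlite_master")
    blocked = False
except sqlite3.DatabaseError as ex:
    blocked = True
    print("denied:", ex)

# Ordinary writes that do not touch the catalog are unaffected.
conn.execute("INSERT INTO notes (body) VALUES ('ok')")
note_count = conn.execute("SELECT count(*) FROM notes").fetchone()[0]
```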
Reject VACUUM explicitly during write-query permission analysis so arbitrary write SQL and untrusted stored write queries cannot run it, even when the actor has execute-write-sql. Refs #2749 (comment) (P3)
668f250 to 11bddc8
|
I had Codex use Showboat to exercise the SQL write API; here's the result: https://gist.github.com/simonw/3ba1ac83ba438b6d6558eb2ceff1adce
Everything worked well, but it did spot one weird almost-bug. If you run this twice:
create index if not exists idx_showboat_dogs_name on showboat_dogs(name)
The first one returns [...]
I've decided not to try to fix this one, since the fix is very non-obvious (it's not easy to detect from either the analyzer operations or the [...] |
📚 Documentation preview 📚: https://datasette--2749.org.readthedocs.build/en/2749/