Skip to content

Expanded analysis of SQL operations#2749

Draft
simonw wants to merge 8 commits into
mainfrom
codex/operation-sql-analysis
Draft

Expanded analysis of SQL operations#2749
simonw wants to merge 8 commits into
mainfrom
codex/operation-sql-analysis

Conversation

@simonw
Copy link
Copy Markdown
Owner

@simonw simonw commented May 27, 2026

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 0% with 227 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (316daf9) to head (0c5053c).

Files with missing lines Patch % Lines
datasette/utils/sql_analysis.py 0.00% 116 Missing ⚠️
datasette/stored_queries.py 0.00% 53 Missing ⚠️
datasette/views/query_helpers.py 0.00% 41 Missing ⚠️
datasette/views/database.py 0.00% 8 Missing ⚠️
datasette/views/execute_write.py 0.00% 5 Missing ⚠️
datasette/permissions.py 0.00% 4 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##            main   #2749    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         66      66            
  Lines       9812   10002   +190     
======================================
- Misses      9812   10002   +190     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Require view-table permission for reads discovered inside write SQL analysis, including INSERT ... SELECT and CREATE TABLE ... AS SELECT.

Record additional SQLite authorizer callbacks as Operation values so unsupported functions, savepoints, virtual table DDL, and unknown callbacks are denied unless explicitly handled.
@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

Some useful feedback from GPT-5.5 xhigh/Codex:

Findings

  • [P1] REPLACE deletes rows but analyzes as plain insert.
    datasette/utils/sql_analysis.py relies on authorizer callbacks, but SQLite does not emit a delete callback for conflict replacement. I verified insert or replace into users(id, email) values (3, 'b@example.com') is allowed with only insert-row, and it deletes the existing b@example.com row. Same issue for update or replace, which can delete a conflicting row while requiring only update-row.

  • [P1] User SQL reads from sqlite_master are always hidden as internal.
    At datasette/utils/sql_analysis.py, any read of sqlite_master/sqlite_schema is immediately marked internal. That lets a user with insert-row on log, but no view-table on secret, run insert into log select sql from sqlite_master where name='secret' and copy the private table schema. The table schema endpoint separately enforces view-table, so this looks like a real permission bypass for schema metadata.

  • [P2] Function blocking can be bypassed through schema expressions.
    Functions are only recorded when SQLite authorizer reports SQLITE_FUNCTION at datasette/utils/sql_analysis.py, but defaults, CHECK/generated/index expressions during DML did not show up in analysis. Example: create table d(x default (hex(randomblob(4)))) and then insert into d default values were both allowed without any function operation. Also, datasette/utils/sql_analysis.py marks allowlisted names like printf, length, and substr internal in CREATE TABLE AS SELECT with no source table, so create table t as select printf(...) bypasses the current “unsupported function” rejection.

  • [P2] Virtual table control inserts can be destructive but analyze as insert-row.
    Existing virtual tables are treated as ordinary "table" targets for DML. With FTS5 contentless tables, insert into ft(ft) values('delete-all') analyzed as only insert but cleared the search index. If insert-row is meant to permit only row insertion semantics, virtual table DML needs special handling.

  • [P3] VACUUM can produce an empty analysis.
    datasette/utils/sql_analysis.py has no fallback when EXPLAIN <sql> produces no authorizer callbacks. analyze_sql_tables(conn, "vacuum") returned operations=(), and ensure_query_write_permissions("data", "vacuum") allowed it because there was nothing to reject.

I used ad-hoc uv run python -c ... probes throughout. Foreign-key cascades, triggers, views, and RETURNING subqueries looked much better: those table reads/writes were reported correctly in the cases I tried. No files were modified.

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

I'm confident it is not possible with a SQLite3 analyzer callback to tell the difference between an INSERT and a INSERT OR REPLACE INTO - as such, it's not possible for us to say "if the user has insert-row on a table then it's OK for them to run arbitrary INSERT against it", because they could always delete or update rows using alternative SQL.

Solution: I'm going to enforce that the user has insert-row AND update-row AND delete-row on any table that they wish to run SQL inserts against.

Raw SQL insert and update statements can have broader effects than their SQLite authorizer callbacks reveal. INSERT OR REPLACE and UPDATE OR REPLACE can delete conflicting rows while only surfacing insert or update operations.

Expand table insert and update operations to require insert-row, update-row, and delete-row together. Keep delete operations mapped to delete-row, and update the analysis UI/API to report and evaluate multiple required permissions for a single operation.

Refs #2749 (comment)
@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

This issue, marked as P2 is interesting:

create table d(x default (hex(randomblob(4)))) - then an insert

Effectively we're saying that, if a user can create a table, they can trigger any function on subsequent inserts to that table. So we don't have a good way of blocking function calls.

Does this actually matter? I'm not convinced that it does, for our application.

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

I think I'm going to need documentation around the new execute-write-sql permission that warns that this permission should only be given to trusted users, because while it attempts to enforce their insert-row etc there may be ways they can workaround this.

This does raise the question of whether execute-write-sql having extra permission checks is worthwhile though - it's a lot of extra complexity, and if we can't guarantee that it's robust should we bother at all, or should we just say execute-write-sql lets users do anything they like?

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

The reason I started down the path of analyzing SQL writes to check if the user should be able to write to those tables actually related to stored queries.

If I want Datasette to work as more of an application platform - especially interesting now that agents can help non-SQL-literate people write SQL queries - then it makes sense that we might want a wider range of less sophisticated users able to construct new SQL write queries... if we can keep that safe.

On that basis, being able to distribute execute-write-sql to more of our users while still being confident that they can't mess up too much stuff is a good idea.

I'm going to keep pushing on this a little further, but I'd like to get really confident that this is feasible before I ship the next alpha.

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

There is something very promising about datasette-agent getting the ability to execute write queries within a tightly controlled permission landscape though - that on its own may be worth sticking with this.

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

(I do worry that when Datasette extends to other databases like PostgreSQL it may prove impossible to recreate the per-table finely grained permissions we've built for SQLite though.)

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

Oh this is interesting... I had Claude look into "whether the SQLite3 authorizer mechanism can be used to detect an INSERT OR REPLACE based just on running an explain query" and it came back with a solution that uses EXPLAIN bytecode output and it seems to have found one!

https://claude.ai/share/c4212606-3fee-4b7c-bc97-505e0348ccac

Here's Python code it wrote:

import sqlite3

# SQLite p5 flag bits relevant to OP_Insert (from sqliteInt.h):
OPFLAG_NCHANGE        = 0x01
OPFLAG_EPHEM          = 0x02
OPFLAG_ISUPDATE       = 0x04   # this OP_Insert is the rewrite half of an UPDATE
OPFLAG_APPEND         = 0x08
OPFLAG_USESEEKRESULT  = 0x10
OPFLAG_LASTROWID      = 0x20


def _main_program(rows):
    """Return only the rows belonging to the top-level VDBE program.

    EXPLAIN dumps the main program first (addresses 0..N), then any trigger
    sub-programs each starting their own address counting from 0. We stop as
    soon as the address column resets or decreases.
    """
    out = []
    prev = -1
    for r in rows:
        addr = r[0]
        if addr <= prev:
            break
        out.append(r)
        prev = addr
    return out


def classify(conn, sql):
    """Classify an INSERT-flavoured statement using only EXPLAIN bytecode.

    Returns one of:
        'insert'           - plain INSERT or INSERT OR ABORT/FAIL/IGNORE/ROLLBACK
        'insert_replace'   - INSERT OR REPLACE  (or bare REPLACE shorthand)
        'upsert'           - INSERT ... ON CONFLICT DO ...
        'update'           - UPDATE or UPDATE OR REPLACE
        'other'            - SELECT, DELETE, DDL, etc.
    """
    rows = _main_program(conn.execute("EXPLAIN " + sql).fetchall())
    # Only consider Insert/Delete opcodes that count as user-visible row changes
    # (OPFLAG_NCHANGE). This excludes the sqlite_master writes performed during
    # DDL like CREATE TABLE.
    inserts = [r for r in rows if r[1] == "Insert" and (r[6] & OPFLAG_NCHANGE)]
    deletes = [r for r in rows if r[1] == "Delete"]

    fresh_inserts  = [r for r in inserts if not (r[6] & OPFLAG_ISUPDATE)]
    update_inserts = [r for r in inserts if      r[6] & OPFLAG_ISUPDATE]

    if not inserts:
        return "other"
    if update_inserts and not fresh_inserts:
        return "update"
    if len(fresh_inserts) > 1:
        # Two fresh-insert sites is characteristic of UPSERT (one per branch).
        return "upsert"
    if update_inserts and fresh_inserts:
        # One update-insert + one fresh-insert is also UPSERT.
        return "upsert"
    # Exactly one fresh-insert and no update-inserts.
    if deletes:
        return "insert_replace"
    return "insert"


# ----------------------------- tests -----------------------------

def run(label, setup, sql, expected):
    conn = sqlite3.connect(":memory:")
    for s in setup:
        conn.execute(s)
    got = classify(conn, sql)
    mark = "OK  " if got == expected else "FAIL"
    print(f"  {mark}  {label:50s}  expected={expected:16s} got={got}")
    conn.close()


T  = ["CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE, val INTEGER)"]
T2 = T + ["CREATE TABLE src (name TEXT, val INTEGER)"]
T3 = ["CREATE TABLE replace_log (id INTEGER PRIMARY KEY, replace TEXT)"]
T4 = T + [
    "CREATE TABLE log (msg TEXT)",
    "CREATE TRIGGER tr AFTER INSERT ON t BEGIN DELETE FROM log WHERE msg='x'; END",
]

print("Baseline INSERT variants:")
run("plain INSERT",       T, "INSERT INTO t (name, val) VALUES ('a', 1)",              "insert")
run("INSERT OR IGNORE",   T, "INSERT OR IGNORE INTO t (name, val) VALUES ('a', 1)",    "insert")
run("INSERT OR ABORT",    T, "INSERT OR ABORT INTO t (name, val) VALUES ('a', 1)",     "insert")
run("INSERT OR FAIL",     T, "INSERT OR FAIL INTO t (name, val) VALUES ('a', 1)",      "insert")
run("INSERT OR ROLLBACK", T, "INSERT OR ROLLBACK INTO t (name, val) VALUES ('a', 1)",  "insert")
run("INSERT OR REPLACE",  T, "INSERT OR REPLACE INTO t (name, val) VALUES ('a', 1)",   "insert_replace")
run("REPLACE shorthand",  T, "REPLACE INTO t (name, val) VALUES ('a', 1)",             "insert_replace")

print("\nINSERT ... SELECT:")
run("plain INSERT...SELECT",       T2, "INSERT INTO t(name,val) SELECT name,val FROM src",            "insert")
run("INSERT OR REPLACE...SELECT",  T2, "INSERT OR REPLACE INTO t(name,val) SELECT name,val FROM src", "insert_replace")

print("\nMulti-row VALUES:")
run("plain multi-row",   T, "INSERT INTO t(name,val) VALUES ('a',1),('b',2),('c',3)",            "insert")
run("REPLACE multi-row", T, "INSERT OR REPLACE INTO t(name,val) VALUES ('a',1),('b',2),('c',3)", "insert_replace")

print("\nDecoys & adversarial naming:")
run("table called 'replace_log'", T3, "INSERT INTO replace_log (replace) VALUES ('hello')", "insert")

print("\nUPSERT variants:")
run("ON CONFLICT DO NOTHING",
    T, "INSERT INTO t(name,val) VALUES ('a',1) ON CONFLICT(name) DO NOTHING", "insert")
run("ON CONFLICT DO UPDATE",
    T, "INSERT INTO t(name,val) VALUES ('a',1) ON CONFLICT(name) DO UPDATE SET val=val+1", "upsert")

print("\nTriggers that contain DELETE elsewhere:")
run("plain INSERT, AFTER trigger does DELETE",
    T4, "INSERT INTO t(name,val) VALUES ('a',1)", "insert")
run("INSERT OR REPLACE, AFTER trigger does DELETE",
    T4, "INSERT OR REPLACE INTO t(name,val) VALUES ('a',1)", "insert_replace")

print("\nCTE prefixes:")
run("WITH ... INSERT",
    T, "WITH x(a,b) AS (VALUES('a',1)) INSERT INTO t(name,val) SELECT a,b FROM x", "insert")
run("WITH ... INSERT OR REPLACE",
    T, "WITH x(a,b) AS (VALUES('a',1)) INSERT OR REPLACE INTO t(name,val) SELECT a,b FROM x", "insert_replace")

print("\nNon-INSERT statements:")
run("SELECT",             T, "SELECT * FROM t",                                "other")
run("UPDATE",             T, "UPDATE t SET val=2 WHERE name='a'",              "update")
run("UPDATE OR REPLACE",  T, "UPDATE OR REPLACE t SET name='b' WHERE id=1",    "update")
run("DELETE",             T, "DELETE FROM t WHERE name='a'",                   "other")
run("CREATE TABLE",       [], "CREATE TABLE x(a)",                             "other")

And the output from running it:

Baseline INSERT variants:
  OK    plain INSERT                                        expected=insert           got=insert
  OK    INSERT OR IGNORE                                    expected=insert           got=insert
  OK    INSERT OR ABORT                                     expected=insert           got=insert
  OK    INSERT OR FAIL                                      expected=insert           got=insert
  OK    INSERT OR ROLLBACK                                  expected=insert           got=insert
  OK    INSERT OR REPLACE                                   expected=insert_replace   got=insert_replace
  OK    REPLACE shorthand                                   expected=insert_replace   got=insert_replace

INSERT ... SELECT:
  OK    plain INSERT...SELECT                               expected=insert           got=insert
  OK    INSERT OR REPLACE...SELECT                          expected=insert_replace   got=insert_replace

Multi-row VALUES:
  OK    plain multi-row                                     expected=insert           got=insert
  OK    REPLACE multi-row                                   expected=insert_replace   got=insert_replace

Decoys & adversarial naming:
  OK    table called 'replace_log'                          expected=insert           got=insert

UPSERT variants:
  OK    ON CONFLICT DO NOTHING                              expected=insert           got=insert
  OK    ON CONFLICT DO UPDATE                               expected=upsert           got=upsert

Triggers that contain DELETE elsewhere:
  OK    plain INSERT, AFTER trigger does DELETE             expected=insert           got=insert
  OK    INSERT OR REPLACE, AFTER trigger does DELETE        expected=insert_replace   got=insert_replace

CTE prefixes:
  OK    WITH ... INSERT                                     expected=insert           got=insert
  OK    WITH ... INSERT OR REPLACE                          expected=insert_replace   got=insert_replace

Non-INSERT statements:
  OK    SELECT                                              expected=other            got=other
  OK    UPDATE                                              expected=update           got=update
  OK    UPDATE OR REPLACE                                   expected=update           got=update
  OK    DELETE                                              expected=other            got=other
  OK    CREATE TABLE                                        expected=other            got=other

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

I modified that to print the SQL and the explain output, here's one of those insert or replace lines;

SQL: INSERT OR REPLACE INTO t (name, val) VALUES ('a', 1)

    addr  opcode       p1  p2  p3  p4      p5  comment
    ----  -----------  --  --  --  ------  --  -------
       0  Init          0  20   0           0  
       1  OpenWrite     0   2   0  3        0  
       2  OpenWrite     1   3   0  k(2,,)   0  
       3  SoftNull      2   0   0           0  
       4  String8       0   3   0  a        0  
       5  Integer       1   4   0           0  
       6  NewRowid      0   1   0           0  
       7  Affinity      2   3   0  DBD      0  
       8  SCopy         3   6   0           0  
       9  IntCopy       1   7   0           0  
      10  MakeRecord    6   2   5           0  
      11  NoConflict    1  16   6  1        0  
      12  IdxRowid      1   9   0           0  
      13  NotExists     0  16   9  1        0  
      14  Delete        0   0   0  t        0  
      15  Delete        1   0   0           0  
      16  MakeRecord    2   3   8           0  
      17  IdxInsert     1   5   6  2       16  
      18  Insert        0   8   1  t       57  
      19  Halt          0   0   0           0  
      20  Transaction   0   1   1  0        1  
      21  Goto          0   1   0           0  

Result: OK    INSERT OR REPLACE  expected=insert_replace  got=insert_replace

The fact that detecting this requires doing tricks like [r for r in inserts if not (r[6] & OPFLAG_ISUPDATE)] is alarming though, especially since the SQLite explain bytecode is not considered a stable interface!

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

That Claude transcript also identified that SQLite has the perfect feature for this called the pre-update hook - https://sqlite.org/c3ref/preupdate_blobwrite.html - but it's only available "if SQLite is compiled using the SQLITE_ENABLE_PREUPDATE_HOOK compile-time option" and is not exposed by sqlite3 in Python (though it is exposed by apsw).

@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 27, 2026

  • [P2] Function blocking can be bypassed through schema expressions.
    Functions are only recorded when SQLite authorizer reports SQLITE_FUNCTION at datasette/utils/sql_analysis.py, but defaults, CHECK/generated/index expressions during DML did not show up in analysis. Example: create table d(x default (hex(randomblob(4)))) and then insert into d default values were both allowed without any function operation. Also, datasette/utils/sql_analysis.py marks allowlisted names like printf, length, and substr internal in CREATE TABLE AS SELECT with no source table, so create table t as select printf(...) bypasses the current “unsupported function” rejection.

I'm looking at this one now. Neither the authorizer nor an explain can tell if a create table statement might include a function, but once you have created the table you can check pragma table_xinfo(d); and look in the dflt_value column for values like hex(randomblob(4)).

So we could maybe detect create table using default functions by running it in a fresh connection, or a transaction, and checking that and then rolling back?

I'm not convinced it's worth solving this though. I don't think blocking default functions in create tables is worthwhile.

simonw added 2 commits May 27, 2026 16:14
Stop marking sqlite_master and sqlite_schema reads as internal as soon as the SQLite authorizer reports them. The later DDL-aware pass still treats schema catalog access as internal when it accompanies semantic CREATE, ALTER, or DROP operations.

This makes explicit catalog reads in write SQL fall through to the deny-by-default path as unsupported read schema operations, preventing queries from copying private table definitions into writable tables.

Refs #2749 (comment)
simonw added a commit that referenced this pull request May 27, 2026
Reject VACUUM explicitly during write-query permission analysis so arbitrary write SQL and untrusted stored write queries cannot run it, even when the actor has execute-write-sql.

Refs #2749 (comment) (P3)
Reject VACUUM explicitly during write-query permission analysis so arbitrary write SQL and untrusted stored write queries cannot run it, even when the actor has execute-write-sql.

Refs #2749 (comment) (P3)
@simonw simonw force-pushed the codex/operation-sql-analysis branch from 668f250 to 11bddc8 Compare May 28, 2026 00:09
@simonw
Copy link
Copy Markdown
Owner Author

simonw commented May 28, 2026

I had Codex use Showboat to exercise the SQL write API, here's the result: https://gist.github.com/simonw/3ba1ac83ba438b6d6558eb2ceff1adce

Everything worked well, but it did spot one weird almost-bug: CREATE INDEX IF NOT EXISTS can be rejected after the index already exists.

If you run this twice:

create index if not exists idx_showboat_dogs_name on showboat_dogs(name)

The first one returns "Query executed", but the second one returns an error of "Unsupported SQL operation: unknown statement".

I've decided not to try to fix this one, since the fix is very non-obvious (it's not easy to detect from either the analyzer operations or the explain output) and I don't think this is a particularly bad bug. It's just a tiny bit weird.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant