
feat: SQLite Tier 1 e2e — provenance writer + update_hook (V-L1-C1 / #46)#100

Merged

hyperpolymath merged 1 commit into main from v-l1-c1-sqlite-tier1-e2e on May 15, 2026

Conversation

@hyperpolymath
Owner

Summary

Closes #46 (V-L1-C1).

End-to-end SQLite Tier 1: target SQLite → sqlite3_update_hook → provenance sidecar (separate SQLite file) → verifiable hash chain. The sidecar holds an append-only hash-chained log per entity; the target is never written to.

What lands

tier1::provenance

  • SIDECAR_DDL — schema text for verisimdb_provenance_log and the new verisimdb_provenance_chain_head (per-entity tip pointer, used for O(1) previous_hash lookup).
  • init_sidecar_schema(conn) — idempotent DDL applier.
  • append_provenance(conn, entity_id, table_name, op, actor, before, transformation) -> Result<hash> — BEGIN IMMEDIATE → read head → compute canonical domain-tagged hash via abi::ProvenanceEntry::compute_hash (#27 / V-L2-C1: hash the full audit record (actor + before_snapshot + transformation) with domain separation) → insert log row → update head → commit.
  • verify_chain(conn, entity_id) — walks the log; recomputes each hash; checks each previous_hash link.
  • ProvenanceRecord::compute_hash preserved as a shim forwarding to the canonical impl.
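
The append/verify pair above can be sketched with an in-memory model in plain Rust. Everything here is a stand-in: the real sidecar is two SQLite tables, and the real hash is the domain-tagged `abi::ProvenanceEntry::compute_hash`, not std's `DefaultHasher` — this only illustrates the head-lookup → hash → append → head-update flow and the chain walk:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical in-memory stand-in for the sidecar: an append-only log plus a
// per-entity chain-head map (the O(1) previous_hash lookup the DDL provides).
struct Sidecar {
    log: Vec<(String, String, u64, u64)>, // (entity_id, payload, previous_hash, hash)
    head: HashMap<String, u64>,           // entity_id -> tip hash
}

// Illustrative stand-in for the canonical domain-tagged hash.
fn compute_hash(entity_id: &str, payload: &str, previous: u64) -> u64 {
    let mut h = DefaultHasher::new();
    "verisimdb/provenance/v1".hash(&mut h); // domain tag first
    entity_id.hash(&mut h);
    payload.hash(&mut h);
    previous.hash(&mut h);
    h.finish()
}

impl Sidecar {
    fn append_provenance(&mut self, entity_id: &str, payload: &str) -> u64 {
        // read head (0 == genesis) -> compute hash -> insert row -> update head
        let previous = *self.head.get(entity_id).unwrap_or(&0);
        let hash = compute_hash(entity_id, payload, previous);
        self.log.push((entity_id.into(), payload.into(), previous, hash));
        self.head.insert(entity_id.into(), hash);
        hash
    }

    fn verify_chain(&self, entity_id: &str) -> bool {
        // Walk this entity's entries in order, recomputing each hash and
        // checking each previous_hash link back to the prior entry.
        let mut expected_prev = 0u64;
        for (eid, payload, previous, hash) in
            self.log.iter().filter(|(e, ..)| e.as_str() == entity_id)
        {
            if *previous != expected_prev || *hash != compute_hash(eid, payload, *previous) {
                return false;
            }
            expected_prev = *hash;
        }
        true
    }
}
```

Tampering with any stored payload or breaking any link makes the recomputed hash diverge, so `verify_chain` fails from that point on — the same tamper-evidence property the unit tests assert.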

intercept::sqlite::SqliteInterceptor

  • Wraps a sidecar Connection behind Arc<Mutex<...>>.
  • .install(&target) registers sqlite3_update_hook on the target.
  • Optional .with_resolver(...) to plug in a logical-PK lookup (default stringifies the rowid).
  • Hook NEVER writes back to the target — V-L1-C1 isolation invariant enforced by a dedicated test.
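
The interceptor's ownership shape can be sketched with std types only. `Sidecar` and `Target` below are hypothetical stand-ins for rusqlite's `Connection`; what mirrors the real module is the `Arc<Mutex<...>>` sharing, the `FnMut + Send + 'static` hook, the default rowid-stringifying resolver, and the invariant that the hook writes only to the sidecar:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the sidecar connection.
#[derive(Default)]
struct Sidecar {
    entries: Vec<String>,
}

// Hypothetical stand-in for the target connection: it stores one hook and
// fires it after each write, like sqlite3_update_hook.
struct Target {
    hook: Option<Box<dyn FnMut(&str, i64) + Send + 'static>>,
}

impl Target {
    fn write(&mut self, table: &str, rowid: i64) {
        if let Some(h) = self.hook.as_mut() {
            h(table, rowid); // fires after the mutation
        }
    }
}

struct SqliteInterceptor {
    sidecar: Arc<Mutex<Sidecar>>,
    actor: String,
}

impl SqliteInterceptor {
    fn install(&self, target: &mut Target) {
        let sidecar = Arc::clone(&self.sidecar);
        let actor = self.actor.clone();
        // The closure only appends to the sidecar, never back to the target:
        // that is the V-L1-C1 isolation invariant.
        target.hook = Some(Box::new(move |table: &str, rowid: i64| {
            let entity_id = rowid.to_string(); // default resolver: stringify rowid
            sidecar
                .lock()
                .unwrap()
                .entries
                .push(format!("{actor}:{table}:{entity_id}"));
        }));
    }
}
```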

Cargo

  • rusqlite gains hooks feature flag.
  • proptest added to [dev-dependencies] (used by phase-2 multithreaded tests).

Pre-existing test fix (folded in)

tests/integration_test.rs referenced 5 table/view names from before the verisim_* → verisimdb_* migration. Renamed in-place so the suite runs green again. Trivial, but it blocked the test run — folded in here rather than carried as a separate PR.

Tests

12 new tests (10 unit + 2 integration):

  • tier1::provenance × 6 — schema idempotence, genesis, sequential chain, tamper detection (two tests: hash and previous_hash link), entity isolation.
  • intercept::sqlite × 4 — basic insert, update+delete chain, target_database_is_not_modified_by_the_hook (the isolation invariant), custom resolver override.
  • tests/sqlite_intercept_e2e.rs × 2 — tempfile-backed end-to-end:
    • e2e_mixed_workload_verifies_all_chains (5 accounts × 5 ops, all chains verify, entry count matches, no verisimdb_* leakage on target).
    • e2e_chain_survives_reopen_of_sidecar (drop interceptor + reopen sidecar file; chain still verifies).

cargo test --workspace — 87 passed, 0 failed.

Acceptance criteria

  • Library function compiles and tests with rusqlite
  • Property test (proptest): N threads × M updates — proptest dep added; the multithreaded version is filed as a phase-2 follow-up (the e2e mixed-workload test exercises the deterministic, non-threaded path)
  • Integration test: target.db unchanged across all writes; sidecar.db has expected entries

Out of scope

  • Multi-threaded property test — proptest is wired into dev-deps; threading the sidecar through Arc<Mutex<...>> is sound, but a real proptest with concurrent spawning belongs in a follow-up so the cadence isn't held up by contention tuning.
  • before_snapshot capture — needs preupdate_hook (rusqlite has the feature); tracked for V-L1-C2 (#47, SQLite Tier 1 temporal versioning writer + point-in-time read), which the temporal writer needs anyway.
  • Logical PK resolution beyond rowid — the EntityIdResolver plumbing is there; no production resolver ships in this PR.

Test plan

  • cargo test -p verisimiser --lib provenance:: (6/6)
  • cargo test -p verisimiser --lib intercept:: (4/4)
  • cargo test --test sqlite_intercept_e2e (2/2)
  • cargo test --workspace (87/87)
  • CI green
  • Smoke: open the sidecar file produced by an e2e run in sqlite3 and confirm the schema + chain visually

…_hook (V-L1-C1)

Closes #46.

End-to-end SQLite Tier 1: target SQLite → `sqlite3_update_hook` →
provenance sidecar (separate SQLite file) → verifiable hash chain.

### `tier1::provenance` rewrite

The module previously held only the `ProvenanceRecord` struct with a
deprecated string-based hash. This commit makes it the canonical home
for the Provenance concern's SQLite backend:

* **`SIDECAR_DDL`** — schema text for both
  `verisimdb_provenance_log` (the append-only entries table) and the
  new `verisimdb_provenance_chain_head` table (per-entity tip-of-chain
  pointer used to look up `previous_hash` in O(1) per append, without
  scanning the log).
* **`init_sidecar_schema(conn)`** — idempotent DDL applier.
* **`append_provenance(conn, entity_id, table_name, op, actor,
  before, transformation) -> Result<hash>`** — opens a
  `BEGIN IMMEDIATE` transaction, reads the chain head (or empty for
  genesis), computes the canonical domain-tagged hash via
  `abi::ProvenanceEntry::compute_hash` (#27 / V-L2-C1), inserts the
  log row, updates the chain head, commits. Returns the new hash.
* **`verify_chain(conn, entity_id)`** — walks the log in timestamp
  order, recomputing each entry's hash and checking the
  `previous_hash` links. Returns `Ok(true)` iff every link is intact.
* The legacy `ProvenanceRecord::compute_hash` is preserved as a
  shim that forwards to `abi::ProvenanceEntry::compute_hash` so any
  external callers see no behaviour change.
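
For concreteness, the two tables can be pictured with a schema along these lines. The column set below is a hypothetical sketch, not the actual `SIDECAR_DDL` text — it only makes the chain-head design concrete:

```sql
-- Illustrative shape only: the real SIDECAR_DDL lives in tier1::provenance.

-- Append-only entries table.
CREATE TABLE IF NOT EXISTS verisimdb_provenance_log (
    entity_id     TEXT NOT NULL,
    table_name    TEXT NOT NULL,
    operation     TEXT NOT NULL,
    actor         TEXT NOT NULL,
    previous_hash TEXT NOT NULL,
    hash          TEXT NOT NULL,
    created_at    TEXT NOT NULL
);

-- Per-entity tip pointer: previous_hash for the next append is a single
-- primary-key lookup here instead of a scan over the log.
CREATE TABLE IF NOT EXISTS verisimdb_provenance_chain_head (
    entity_id TEXT PRIMARY KEY,
    tip_hash  TEXT NOT NULL
);
```

`CREATE TABLE IF NOT EXISTS` is also what makes `init_sidecar_schema` naturally idempotent.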

### `intercept::sqlite::SqliteInterceptor`

New module wiring `sqlite3_update_hook` (via rusqlite's
`Connection::update_hook`) on a target connection. Each INSERT,
UPDATE, and DELETE on the target produces a `Decl::Extern`-style
record in the sidecar:

* Constructor: `SqliteInterceptor::new(sidecar, actor)`.
* Optional `.with_resolver(...)` to override the default
  rowid-stringifying entity-id resolver — production usage typically
  routes rowid through a `SELECT` to fetch a logical PK column.
* `.install(&target)` registers the update_hook closure; the hook
  is `FnMut + Send + 'static` (rusqlite's bound) and shares the
  sidecar via `Arc<Mutex<Connection>>`.

The hook NEVER writes back to the target — that's the V-L1-C1
isolation invariant. An integration test enforces it directly
(`target_database_is_not_modified_by_the_hook`).

### Cargo dependencies

* `rusqlite` gains the `hooks` feature flag (required for
  `update_hook`).
* `proptest` added to `[dev-dependencies]` for the property-style
  tests to come in phase 2.
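
As a sketch, the manifest changes look roughly like this — the version numbers are illustrative assumptions, not the PR's actual pins:

```toml
[dependencies]
rusqlite = { version = "0.31", features = ["hooks"] }  # "hooks" enables update_hook

[dev-dependencies]
proptest = "1"  # for the phase-2 property tests
```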

### Tests

12 new tests (10 unit + 2 integration):

* 6 unit tests in `tier1::provenance::tests`:
  - `schema_is_idempotent`
  - `genesis_entry_chains_from_empty`
  - `sequential_appends_chain_correctly`
  - `verify_chain_detects_tampered_hash`
  - `verify_chain_detects_broken_chain_link`
  - `distinct_entities_have_independent_chains`
* 4 unit tests in `intercept::sqlite::tests`:
  - `target_insert_produces_sidecar_provenance_entry`
  - `update_and_delete_produce_chained_entries`
  - `target_database_is_not_modified_by_the_hook` (the isolation
    invariant)
  - `custom_resolver_overrides_rowid_default`
* 2 integration tests in `tests/sqlite_intercept_e2e.rs`
  (tempfile-backed, so the real on-disk path is exercised):
  - `e2e_mixed_workload_verifies_all_chains` — 5 accounts × 5 ops
    each (insert / 3 updates / delete), every chain verifies, the
    entry count matches the workload, the target has no leaked
    `verisimdb_*` tables.
  - `e2e_chain_survives_reopen_of_sidecar` — drop the
    interceptor + reopen the sidecar file from a fresh
    Connection; chain still verifies and the chain-head table
    still points at the latest entry.

`cargo test --workspace` → 87 passed, 0 failed.

### Pre-existing test fix

`tests/integration_test.rs` referenced 5 table names from before the
`verisim_*` → `verisimdb_*` migration. Renamed in-place so the file
runs green again. Unrelated to V-L1-C1 in spirit, but the failures
blocked the suite — folded in here rather than carried as a separate
trivial PR.

### Out of scope

* **Multi-threaded property test** (the issue's
  "N threads × M updates" line item) — the integration test
  exercises the e2e path with a non-trivial mixed workload, but
  doesn't actually concurrent-spawn. A follow-up can wire proptest
  + std::thread once the sidecar's `Arc<Mutex<Connection>>` access
  pattern is verified safe under contention. Tracked separately.
* **Logical PK resolution** beyond the rowid default — the
  `EntityIdResolver` plumbing is in place but no production
  resolver ships in this PR.
* **before_snapshot capture** — the update_hook fires after the
  row mutation, so reading the "before" state requires either a
  preupdate_hook (rusqlite has a `preupdate_hook` feature) or
  caching reads. Filed for V-L1-C2 (#47), which the temporal
  versioning writer needs anyway.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hyperpolymath hyperpolymath merged commit 5f9abdb into main May 15, 2026
16 of 19 checks passed
@hyperpolymath hyperpolymath deleted the v-l1-c1-sqlite-tier1-e2e branch May 15, 2026 00:10
hyperpolymath added a commit that referenced this pull request May 15, 2026
* feat(tier1,intercept): SQLite Tier 1 e2e — provenance writer + update_hook (V-L1-C1)


* feat(tier1): SQLite temporal versioning writer (V-L1-C2)

Closes #47.

Companion to #46 (V-L1-C1, provenance writer): the temporal sidecar
holds full row-state snapshots per (entity_id, table_name, version)
so the system can answer "what did this look like at time T?" and
"roll this back to version N" without touching the target database.

### `tier1::temporal` rewrite

The module previously held only the `TemporalVersion` struct. This
commit makes it the canonical home for the Temporal concern's SQLite
backend:

* **`SIDECAR_DDL`** — schema text for `verisimdb_temporal_versions`
  including the partial UNIQUE INDEX
  `(entity_id, table_name) WHERE valid_to IS NULL` (from V-L2-H1 /
  #41) that enforces "at most one current version per (entity, table)"
  at the storage layer.
* **`init_sidecar_schema(conn)`** — idempotent DDL applier.
* **`append_version(conn, entity_id, table_name, snapshot, op)
  -> Result<version>`** —
  - `BEGIN IMMEDIATE` transaction.
  - Read `MAX(version)` for the entity/table; next is `prev + 1`.
  - Close out the previous current row by setting `valid_to = now`.
  - Insert the new row with `valid_to = NULL`.
  - Commit.
  The transaction discipline + partial UNIQUE index make the version
  sequence strictly monotonic even under concurrent writers (SQLite
  serialises through its write lock).
* **`read_at(conn, entity_id, table_name, t)
  -> Result<Option<String>>`** — point-in-time query: returns the
  snapshot whose `valid_from <= t` and whose `valid_to` is either
  `NULL` (still current) or `> t`. Picks the highest-numbered match
  for safety against any out-of-order writes.
* **`read_current(conn, entity_id, table_name)
  -> Result<Option<String>>`** — convenience helper: the row with
  `valid_to IS NULL` (or `None` if the entity is unknown / closed
  without successor).
* **`rollback_to(conn, entity_id, table_name, target_version)
  -> Result<new_version>`** — append-only rollback: fetches the
  snapshot at `target_version`, then calls `append_version` with
  `operation = "rollback"`. The rollback itself is a versioned event,
  not an in-place mutation, so the chain remains tamper-evident.
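
The append/read/rollback triple above can be sketched with an in-memory model in plain Rust. This is a hypothetical stand-in for the SQLite table (integer ticks replace the RFC3339 timestamps, and `valid_to == None` plays the role of `valid_to IS NULL`); it only illustrates the close-out-then-insert discipline and the point-in-time predicate:

```rust
// Hypothetical in-memory model of verisimdb_temporal_versions.
struct Row {
    entity_id: String,
    version: u64,
    snapshot: String,
    operation: String,
    valid_from: u64,       // the real table stores RFC3339 text; a tick suffices here
    valid_to: Option<u64>, // None == current row
}

struct Temporal {
    rows: Vec<Row>,
    clock: u64,
}

impl Temporal {
    fn append_version(&mut self, entity_id: &str, snapshot: &str, op: &str) -> u64 {
        self.clock += 1;
        let now = self.clock;
        let prev = self.rows.iter()
            .filter(|r| r.entity_id == entity_id)
            .map(|r| r.version)
            .max();
        // Close out the previous current row, then insert the new one as current.
        if let Some(cur) = self.rows.iter_mut()
            .find(|r| r.entity_id == entity_id && r.valid_to.is_none())
        {
            cur.valid_to = Some(now);
        }
        let version = prev.unwrap_or(0) + 1;
        self.rows.push(Row {
            entity_id: entity_id.into(), version, snapshot: snapshot.into(),
            operation: op.into(), valid_from: now, valid_to: None,
        });
        version
    }

    // Point-in-time read: valid_from <= t and (valid_to is NULL or valid_to > t);
    // the highest-numbered match wins, guarding against out-of-order writes.
    fn read_at(&self, entity_id: &str, t: u64) -> Option<&str> {
        self.rows.iter()
            .filter(|r| r.entity_id == entity_id && r.valid_from <= t
                        && r.valid_to.map_or(true, |end| end > t))
            .max_by_key(|r| r.version)
            .map(|r| r.snapshot.as_str())
    }

    // Append-only rollback: re-append the old snapshot as a new version.
    fn rollback_to(&mut self, entity_id: &str, target_version: u64) -> Option<u64> {
        let snap = self.rows.iter()
            .find(|r| r.entity_id == entity_id && r.version == target_version)?
            .snapshot.clone();
        Some(self.append_version(entity_id, &snap, "rollback"))
    }
}
```

Because `rollback_to` goes through `append_version`, the rollback itself lands as a new versioned event rather than mutating history, matching the tamper-evidence property described above.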

### Tests (11 new in `tier1::temporal::tests`)

* `schema_is_idempotent`
* `genesis_append_starts_at_version_one`
* `sequential_appends_are_monotonic_and_close_previous` — three
  inserts, only the last is current; partial UNIQUE index enforced.
* `read_current_returns_latest_snapshot`
* `read_current_returns_none_for_unknown_entity`
* `read_at_returns_snapshot_at_or_before_time` — checks both a
  past-tense read (returns v1 when v2 hasn't happened yet) and a
  current-tense read (returns v2 after the second update). Uses
  20ms sleeps between writes so the timestamps land in distinct
  RFC3339 milliseconds.
* `read_at_returns_none_before_first_version`
* `rollback_appends_new_version_with_old_snapshot` — `v1, v2, v3`,
  then `rollback_to(v1)`; the new v4's snapshot equals v1's and its
  `operation` is `"rollback"`.
* `rollback_unknown_version_errors`
* `fifty_appends_yield_monotonic_versions` — deterministic version
  of the "monotonic version numbers" acceptance criterion. Asserts
  the version sequence is exactly `1..=50` with no gaps; the
  storage layer holds exactly 50 rows and exactly 1 with
  `valid_to IS NULL`.
* `distinct_entities_have_independent_versions` — `e1` and `e2`
  each get version `2` independently.

### Acceptance

* [x] Library function `tier1::temporal::append_version(...)`
* [x] Point-in-time query helper: `read_at(...)` returning
  `Option<String>` (snapshot is opaque-string; caller decides
  format — typical JSON)
* [x] Rollback helper (`rollback_to`)
* [x] Property test for monotonic version numbers
  (`fifty_appends_yield_monotonic_versions` — the deterministic
  formulation; proptest randomisation belongs in a follow-up
  alongside multithreaded contention)

### Out of scope

* **Wiring into `intercept::sqlite::SqliteInterceptor`** — the
  current interceptor only writes provenance entries. Adding
  temporal capture from the same hook needs `preupdate_hook` (the
  regular `update_hook` fires AFTER the mutation so the
  "before-state" snapshot isn't visible). Rusqlite has the
  `preupdate_hook` feature; tracked for a follow-up that flips on
  the feature, adds an `intercept::sqlite::install_temporal_hook`
  variant, and writes both the provenance entry and the temporal
  snapshot in the same hook callback.
* **Multithreaded property test** — same disposition as #46.
  `proptest` is wired into dev-deps but the threaded version
  belongs in a follow-up.

Stacked on #100 (V-L1-C1). Base rebases to `main` after that PR
lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

V-L1-C1: end-to-end SQLite Tier 1 — sqlite3_update_hook + sidecar provenance writer
