Skip to content

Fix FTS5 corruption on publish#768

Open
ask-bonk[bot] wants to merge 4 commits intomainfrom
opencode/issue649-20260425090851
Open

Fix FTS5 corruption on publish#768
ask-bonk[bot] wants to merge 4 commits intomainfrom
opencode/issue649-20260425090851

Conversation

@ask-bonk
Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk Bot commented Apr 25, 2026

What does this PR do?

Fixes the SQLITE_CORRUPT_VTAB (database disk image is malformed) error reported on every publish/edit on collections with search enabled.

Root cause

The bug lives in packages/core/src/search/fts-manager.ts:87-154. The FTS5 table is created with content='ec_<slug>', which makes it an external content FTS5 table — meaning the row data lives in the content table and FTS5 only stores the inverted index. But the AFTER UPDATE/DELETE triggers used the contentless-table sync pattern:

DELETE FROM "${ftsTable}" WHERE rowid = OLD.rowid;

For external-content tables, DELETE FROM fts WHERE rowid = ? causes FTS5 to read column values from the backing content table to figure out which tokens to remove. By the time an AFTER UPDATE trigger fires, the row in ec_<collection> already holds the NEW values. FTS5 therefore removes NEW tokens while the index still holds OLD tokens, drifting the inverted index out of sync until SQLite refuses further reads with SQLITE_CORRUPT_VTAB. Same problem in the AFTER DELETE trigger — by then the row is gone, so FTS5 can't find any column values to remove. This explains every observation in the issue thread:

  • Only published-content edits trigger it (each edit is an UPDATE that fires the broken trigger).
  • "Disable search" fixes it (drops the broken triggers, stopping further damage).
  • 'rebuild' works after dropping/recreating the table but not always on its own (the corruption can be in the FTS shadow tables themselves).
  • It hits 0.4.0, 0.5.0, and 0.6.0 — the trigger code is identical across those versions.

Fix (packages/core/src/search/fts-manager.ts)

  1. Rewrote the AFTER UPDATE and AFTER DELETE triggers to use the documented external-content-safe pattern from https://www.sqlite.org/fts5.html#external_content_tables:
    INSERT INTO fts(fts, rowid, col1, col2) VALUES('delete', OLD.rowid, OLD.col1, OLD.col2);
    This passes the OLD column values directly so FTS5 removes the right tokens regardless of what's currently in the content table.
  2. Added auto-healing for upgrading sites via hasLegacyTriggers(). Sites running pre-fix versions have the broken triggers persisted in sqlite_master plus an already-corrupted index. verifyAndRepairIndex (called from the existing startup ensureSearchHealthy in emdash-runtime.ts:1589) now scans sqlite_master for the legacy DELETE FROM "<fts>" WHERE rowid = OLD.rowid marker and rebuilds the index, replacing the triggers in the process.
  3. Added an FTS5 'integrity-check' pass to verifyAndRepairIndex so any latent corruption that survives the row-count check (e.g. orphaned tokens) is detected and repaired automatically rather than surfacing on the next publish.

Tests

packages/core/tests/integration/search/fts-corruption.test.ts adds three new tests:

  • The reported reproduction (publish → edit → publish + integrity-check).
  • 25 edit/publish cycles with FTS integrity-check.
  • Auto-repair when legacy unsafe triggers are present in sqlite_master.

All 5 FTS tests pass; full core suite at 2590 passing.

Closes #649

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • User-visible strings in the admin UI are wrapped for translation and pnpm locale:extract has been run (if applicable) — N/A, no admin UI changes
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/...

AI-generated code disclosure

  • This PR includes AI-generated code

This PR was opened by /ultrabonk (Claude Opus 4.7 via opencode + ask-bonk) in response to my comment on #649. The diagnosis, fix, tests, and changeset were all model-generated. Human review still pending. The original PR body lacked the template — that's a known gap in opencode's GitHub Action flow (it uses the model's response verbatim as the PR body without loading .github/PULL_REQUEST_TEMPLATE.md); fix in flight as #769 + #770. This description was retroactively rewritten by hand to match the template.

Screenshots / test output

github run

Co-authored-by: ascorbic <ascorbic@users.noreply.github.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 25, 2026

🦋 Changeset detected

Latest commit: e77a8d7

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 12 packages
Name Type
emdash Patch
@emdash-cms/cloudflare Patch
@emdash-cms/fixture-perf-site Patch
@emdash-cms/perf-demo-site Patch
@emdash-cms/cache-demo-site Patch
@emdash-cms/admin Patch
@emdash-cms/auth Patch
@emdash-cms/blocks Patch
@emdash-cms/gutenberg-to-portable-text Patch
@emdash-cms/x402 Patch
create-emdash Patch
@emdash-cms/plugin-embeds Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
🔵 In progress
View logs
emdash-playground ccf2d57 Apr 25 2026, 09:22 AM

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
🔵 In progress
View logs
emdash-demo-cache ccf2d57 Apr 25 2026, 09:22 AM

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
🔵 In progress
View logs
emdash-perf-coordinator ccf2d57 Apr 25 2026, 09:22 AM

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 25, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
emdash-demo-cache e77a8d7 Apr 26 2026, 05:43 AM

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 25, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Apr 25, 2026

Open in StackBlitz

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@768

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@768

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@768

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@768

emdash

npm i https://pkg.pr.new/emdash@768

create-emdash

npm i https://pkg.pr.new/create-emdash@768

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@768

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@768

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@768

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@768

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@768

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@768

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@768

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@768

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@768

commit: e77a8d7

@ascorbic
Copy link
Copy Markdown
Collaborator

/review

@ask-bonk
Copy link
Copy Markdown
Contributor Author

ask-bonk Bot commented Apr 25, 2026

@ascorbic Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

@ascorbic
Copy link
Copy Markdown
Collaborator

/review

@ask-bonk
Copy link
Copy Markdown
Contributor Author

ask-bonk Bot commented Apr 25, 2026

LGTM!

github run

@ascorbic
Copy link
Copy Markdown
Collaborator

/ultrareview

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes recurring SQLITE_CORRUPT_VTAB on publish/edit for collections with FTS enabled by correcting FTS5 trigger semantics for external-content tables and adding automatic detection/repair of legacy unsafe triggers/corrupted indexes.

Changes:

  • Update FTS5 UPDATE/DELETE triggers to use the external-content-safe VALUES('delete', OLD...) pattern.
  • Add legacy-trigger detection via sqlite_master inspection and rebuild on upgrade when unsafe triggers are found.
  • Add an integrity-check pass during verification and introduce integration regression tests covering corruption and auto-repair.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
packages/core/src/search/fts-manager.ts Fixes trigger behavior for external-content FTS5 tables; adds legacy trigger detection and integrity-check based repair.
packages/core/tests/integration/search/fts-corruption.test.ts Adds regression + stress tests for corruption and verifies auto-repair behavior.
.changeset/fts-corrupt-vtab-fix.md Documents the patch release and explains root cause/fix at a high level.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 135 to +141
const fieldList = searchableFields.join(", ");
const newFieldList = searchableFields.map((f) => `NEW.${f}`).join(", ");
// `'delete'` takes the FTS5 virtual table name as the first column,
// then the rowid being removed, then the OLD value of every column
// declared on the FTS5 table (in declaration order: id, locale,
// then each searchable field).
const oldFieldList = searchableFields.map((f) => `OLD.${f}`).join(", ");
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If searchableFields can be empty, fieldList becomes an empty string and the generated SQL ends up with a trailing comma in the column list (and an extra comma in VALUES), which is a syntax error. Either enforce searchableFields.length > 0 in validateInputs (and/or at the call site), or branch the SQL generation to omit ", ${fieldList}" / ", ${oldFieldList}" when there are no searchable fields.

Copilot uses AI. Check for mistakes.
Comment thread packages/core/src/search/fts-manager.ts Outdated
Comment on lines +165 to +166
INSERT INTO "${ftsTable}"("${ftsTable}", rowid, id, locale, ${fieldList})
VALUES('delete', OLD.rowid, OLD.id, OLD.locale, ${oldFieldList});
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If searchableFields can be empty, fieldList becomes an empty string and the generated SQL ends up with a trailing comma in the column list (and an extra comma in VALUES), which is a syntax error. Either enforce searchableFields.length > 0 in validateInputs (and/or at the call site), or branch the SQL generation to omit ", ${fieldList}" / ", ${oldFieldList}" when there are no searchable fields.

Copilot uses AI. Check for mistakes.
Comment thread packages/core/src/search/fts-manager.ts Outdated
Comment on lines +423 to +428

const result = await sql<{ sql: string | null }>`
SELECT sql FROM sqlite_master
WHERE type = 'trigger'
AND tbl_name = ${this.getContentTableName(collectionSlug)}
AND name LIKE ${`${ftsTable}_%`}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLite LIKE treats "_" as a single-character wildcard, and ftsTable names contain underscores (e.g. "_emdash_fts_pages"). This pattern can unintentionally match non-FTS triggers and cause false-positive rebuilds. Prefer matching exact trigger names (e.g. IN (${ftsTable}_insert, ${ftsTable}_update, ${ftsTable}delete)) or escape "" and "%" and add an ESCAPE clause.

Suggested change
const result = await sql<{ sql: string | null }>`
SELECT sql FROM sqlite_master
WHERE type = 'trigger'
AND tbl_name = ${this.getContentTableName(collectionSlug)}
AND name LIKE ${`${ftsTable}_%`}
const insertTrigger = `${ftsTable}_insert`;
const updateTrigger = `${ftsTable}_update`;
const deleteTrigger = `${ftsTable}_delete`;
const result = await sql<{ sql: string | null }>`
SELECT sql FROM sqlite_master
WHERE type = 'trigger'
AND tbl_name = ${this.getContentTableName(collectionSlug)}
AND name IN (${insertTrigger}, ${updateTrigger}, ${deleteTrigger})

Copilot uses AI. Check for mistakes.
Comment on lines +512 to +520
// Check: FTS5 integrity-check. This catches corruption that the row
// count check misses (e.g. orphaned tokens in segments where the
// docsize entry exists but points to garbage). Throws on a corrupt
// index; treat the throw itself as the signal to rebuild.
try {
await sql
.raw(`INSERT INTO "${ftsTable}"("${ftsTable}") VALUES('integrity-check')`)
.execute(this.db);
} catch (err) {
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running FTS5 'integrity-check' on every verify pass can be expensive for large indexes and will directly impact startup time if ensureSearchHealthy runs at boot. Consider gating the integrity-check behind a cheaper heuristic (e.g. only after legacy-trigger detection, after row-count mismatch, or on a lower-frequency schedule), or making it configurable (e.g. "deep check" mode) to avoid consistently paying the full scan cost.

Copilot uses AI. Check for mistakes.
Comment thread packages/core/src/search/fts-manager.ts Outdated
Comment on lines +431 to +436
const legacyMarker = `DELETE FROM "${ftsTable}" WHERE rowid = OLD.rowid`;
for (const row of result.rows) {
if (row.sql && row.sql.includes(legacyMarker)) {
return true;
}
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy-trigger detection relies on an exact SQL substring match (including quoting and whitespace). If older shipped versions differed in formatting/quoting, hasLegacyTriggers can miss genuinely unsafe triggers and skip the rebuild. To make this more robust, consider normalizing whitespace and matching via a small regex (e.g. "DELETE\s+FROM\s+"?"?\s+WHERE\s+rowid\s*=\s*OLD\.rowid") or matching multiple expected variants.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

@ask-bonk ask-bonk Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL — the fix introduces a new SQLITE_CORRUPT_VTAB regression on restore / permanent-delete / edit-while-trashed.

Empirically verified against better-sqlite3 12.8.0 (vanilla SQLite, same as D1). With the PR's triggers in place, a normal user lifecycle now throws SQLITE_CORRUPT_VTAB:

Lifecycle: insert active → soft-delete → restore
  soft delete OK
  restore FAILED: SQLITE_CORRUPT_VTAB - database disk image is malformed

Lifecycle: insert active → soft-delete → permanent delete
  permanent delete FAILED: SQLITE_CORRUPT_VTAB

Lifecycle: edit a soft-deleted row
  edit-while-deleted FAILED: SQLITE_CORRUPT_VTAB

Root cause: the INSERT trigger has WHEN NEW.deleted_at IS NULL, so soft-deleted rows are never added to the FTS index. The new UPDATE / DELETE triggers issue 'delete' unconditionally, including for rows where OLD.deleted_at IS NOT NULL (i.e., the row is not in the FTS index at all). FTS5's 'delete' command is not a no-op when the rowid is absent — it tries to remove tokens that don't exist and corrupts state, throwing SQLITE_CORRUPT_VTAB and aborting the parent UPDATE/DELETE. The previous broken DELETE FROM fts WHERE rowid = OLD.rowid form silently no-op'd in that case, so this case worked by accident before.

Reachable from production via ContentRepository.restore() / ContentRepository.permanentDelete() (used by /api/.../restore and the trash endpoints, see packages/core/src/api/handlers/content.ts:765,810). Anyone who soft-deletes content on a search-enabled collection then tries to restore or permanently-delete it will hit the same error this PR claims to fix.

Fix: gate the 'delete' on OLD.deleted_at IS NULL (the row was actually indexed). Verified working — a full lifecycle (insert → edit → soft-delete → restore → edit → soft-delete → permanent-delete) survives integrity-check at every step:

CREATE TRIGGER ..._update AFTER UPDATE ON ec_pages BEGIN
  INSERT INTO fts(fts, rowid, id, locale, title, content)
  SELECT 'delete', OLD.rowid, OLD.id, OLD.locale, OLD.title, OLD.content
  WHERE OLD.deleted_at IS NULL;
  INSERT INTO fts(rowid, id, locale, title, content)
  SELECT NEW.rowid, NEW.id, NEW.locale, NEW.title, NEW.content
  WHERE NEW.deleted_at IS NULL;
END;

CREATE TRIGGER ..._delete AFTER DELETE ON ec_pages BEGIN
  INSERT INTO fts(fts, rowid, id, locale, title, content)
  SELECT 'delete', OLD.rowid, OLD.id, OLD.locale, OLD.title, OLD.content
  WHERE OLD.deleted_at IS NULL;
END;

The fts-corruption test file should add coverage for the trash lifecycle (restore + permanent-delete + edit-while-soft-deleted) — none of the three current tests exercise a row that was inserted with deleted_at IS NOT NULL or transitions through the trash, which is exactly why this regression slipped through.

Not blocking diagnosis — the diagnosis and the choice of 'delete' command are correct, and the legacy-trigger detection + integrity-check additions are good. But this needs the OLD-side gate before merging or the bug it ships will be the same SQLITE_CORRUPT_VTAB on a different operation.

Comment thread packages/core/src/search/fts-manager.ts Outdated
BEGIN
DELETE FROM "${ftsTable}" WHERE rowid = OLD.rowid;
INSERT INTO "${ftsTable}"("${ftsTable}", rowid, id, locale, ${fieldList})
VALUES('delete', OLD.rowid, OLD.id, OLD.locale, ${oldFieldList});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL — this 'delete' is unconditional, but the INSERT trigger above only indexes rows where NEW.deleted_at IS NULL. So when an UPDATE fires with OLD.deleted_at IS NOT NULL (restore from trash, edit a row that's currently in the trash, or any update on a row that was inserted soft-deleted), the OLD row is not in the FTS index and FTS5's 'delete' command throws SQLITE_CORRUPT_VTAB — the same error this PR is fixing.

Reproduced against vanilla SQLite (better-sqlite3 12.8.0):

insert active → soft-delete → restore  =>  SQLITE_CORRUPT_VTAB
insert active → soft-delete → edit     =>  SQLITE_CORRUPT_VTAB

Reachable from production via ContentRepository.restore() (packages/core/src/database/repositories/content.ts:625), used by the restore endpoint at packages/core/src/api/handlers/content.ts:765. Any user who soft-deletes content on a search-enabled collection and then tries to restore it will hit corruption.

Fix: gate on OLD.deleted_at IS NULL so we only 'delete' rows that were actually indexed:

INSERT INTO "${ftsTable}"("${ftsTable}", rowid, id, locale, ${fieldList})
SELECT 'delete', OLD.rowid, OLD.id, OLD.locale, ${oldFieldList}
WHERE OLD.deleted_at IS NULL;

Verified that this form survives the full lifecycle (insert → edit → soft-delete → restore → edit → soft-delete → permanent-delete) with 'integrity-check' passing at every step. The previous (buggy) DELETE FROM fts WHERE rowid = OLD.rowid was silently a no-op when the rowid wasn't in the index, which is why this case happened to work before — 'delete' is not a no-op.

Comment thread packages/core/src/search/fts-manager.ts Outdated
BEGIN
DELETE FROM "${ftsTable}" WHERE rowid = OLD.rowid;
INSERT INTO "${ftsTable}"("${ftsTable}", rowid, id, locale, ${fieldList})
VALUES('delete', OLD.rowid, OLD.id, OLD.locale, ${oldFieldList});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL — same issue as the UPDATE trigger. A DELETE FROM ec_<slug> on a row whose deleted_at IS NOT NULL (i.e. the user permanently-deletes a soft-deleted row from trash, the standard flow via ContentRepository.permanentDelete() -> packages/core/src/api/handlers/content.ts:810) will fire this trigger with OLD values that were never inserted into the FTS index, and FTS5 throws SQLITE_CORRUPT_VTAB, aborting the user's DELETE.

Verified end-to-end against vanilla SQLite. Same fix:

INSERT INTO "${ftsTable}"("${ftsTable}", rowid, id, locale, ${fieldList})
SELECT 'delete', OLD.rowid, OLD.id, OLD.locale, ${oldFieldList}
WHERE OLD.deleted_at IS NULL;

.execute(db),
).resolves.toBeDefined();
});
});
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test gap that hid the regression: none of the three new tests exercise a row that was inserted soft-deleted, transitions through the trash, or comes back through restore()/permanentDelete(). The new triggers throw SQLITE_CORRUPT_VTAB in all three of those cases (verified empirically), but the test suite never goes there, so it ships green.

Please add coverage for the trash lifecycle once the trigger gating is fixed — at minimum: insert active row → soft-delete → restore → integrity-check, and insert → soft-delete → permanent-delete → integrity-check. These are the ones that currently break and would have caught this in CI.

@ask-bonk
Copy link
Copy Markdown
Contributor Author

ask-bonk Bot commented Apr 25, 2026

Review posted with three line-anchored comments. Found a critical regression: the new triggers throw SQLITE_CORRUPT_VTAB on restore from trash, permanent-delete from trash, and edits to soft-deleted rows — empirically reproduced against vanilla SQLite. The INSERT trigger has WHEN NEW.deleted_at IS NULL so soft-deleted rows are never indexed, but the new UPDATE/DELETE triggers issue 'delete' unconditionally. The old buggy DELETE FROM fts WHERE rowid = OLD.rowid silently no-op'd on missing rowids; the documented 'delete' command does not. Fix is to gate the 'delete' on OLD.deleted_at IS NULL.

Review URL: #768 (review)

github run

@ascorbic ascorbic closed this Apr 26, 2026
@ascorbic ascorbic reopened this Apr 26, 2026
@ascorbic
Copy link
Copy Markdown
Collaborator

/ultrabonk look at the review comments and fix the issues

Co-authored-by: ascorbic <ascorbic@users.noreply.github.com>
@ask-bonk
Copy link
Copy Markdown
Contributor Author

ask-bonk Bot commented Apr 26, 2026

All search tests pass (10/10). Summary of changes:

  1. Critical fix (fts-manager.ts): Gated the 'delete' command in both UPDATE and DELETE triggers on OLD.deleted_at IS NULL. Without this, restore-from-trash, permanent-delete, and edit-while-trashed all raised SQLITE_CORRUPT_VTAB (the same error the original PR was meant to fix).
  2. Tightened legacy-trigger detection (hasLegacyTriggers): Switched from a LIKE pattern (where underscores are wildcards) to exact IN (...) matching of trigger names, and a whitespace-tolerant regex for the unsafe SQL fragment so older shipped variants are caught.
  3. Defensive guard in createTriggers: Throws if searchableFields is empty rather than producing malformed SQL with trailing commas.
  4. Three new lifecycle tests covering the regression: soft-delete -> restore -> integrity-check, soft-delete -> permanent-delete, and edit-while-soft-deleted.

What does this PR do?

Addresses review feedback on PR #768. The FTS5 trigger fix introduced a new SQLITE_CORRUPT_VTAB regression on restore, permanent-delete, and edit-while-trashed flows. This PR fixes that and tightens a few rough edges flagged by Copilot.

The regression

The INSERT trigger has WHEN NEW.deleted_at IS NULL, so soft-deleted rows are never indexed. The new UPDATE/DELETE triggers issued 'delete' unconditionally. FTS5's 'delete' command is not a no-op for missing rowids (the previous broken DELETE FROM fts WHERE rowid = OLD.rowid form silently was, which is why this case worked by accident before) and raises SQLITE_CORRUPT_VTAB when the row was never in the index. Reachable from production via ContentRepository.restore() and ContentRepository.permanentDelete() -- any user who soft-deletes content on a search-enabled collection then tries to restore or permanently-delete it would hit the same error #768 was meant to fix.

Fixes (packages/core/src/search/fts-manager.ts)

  1. Gate 'delete' on OLD.deleted_at IS NULL in both UPDATE and DELETE triggers using INSERT ... SELECT ... WHERE OLD.deleted_at IS NULL. The OLD-side index state matches the INSERT trigger's NULL-only indexing condition.
  2. Tighten hasLegacyTriggers detection -- switched from a LIKE pattern (underscores in _emdash_fts_<slug> are SQL wildcards) to exact IN (...) matching of trigger names, plus a whitespace-tolerant regex match for the unsafe fragment so older shipped variants with different formatting are still caught.
  3. Defensive guard in createTriggers -- throws if searchableFields is empty rather than producing malformed SQL with trailing commas. The call sites (enableSearch, verifyAndRepairIndex) already filter this out, but the function should fail loud rather than silent.

Tests

Three new lifecycle tests in fts-corruption.test.ts:

  • soft-delete → restore → integrity-check (the main regression case)
  • soft-delete → permanent-delete (DELETE on a row not in the index)
  • soft-delete → direct UPDATE (edit-while-trashed)

All 10 search tests pass; full core suite at 2577 passing (1 pre-existing failure in wordpress-slug-sanitization.test.ts that's a workspace self-resolution issue, present on main and unrelated to this change).

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes (no new diagnostics in changed files; remaining 25 are pre-existing on main)
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • User-visible strings in the admin UI are wrapped for translation and pnpm locale:extract has been run (if applicable) — N/A, no admin UI changes
  • I have added a changeset (if this PR changes a published package) — updated existing changeset
  • New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/...

AI-generated code disclosure

  • This PR includes AI-generated code

github run

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +515 to +524
// Check: legacy/broken triggers from a pre-fix install. These corrupt
// the index on every UPDATE/DELETE, so any site that has them needs a
// rebuild even if the row counts happen to match right now.
if (fields.length > 0 && (await this.hasLegacyTriggers(collectionSlug))) {
console.warn(
`FTS index for "${collectionSlug}" has legacy sync triggers from a pre-fix EmDash version. Rebuilding to install corruption-safe triggers.`,
);
await this.rebuildIndex(collectionSlug, fields, config?.weights);
return true;
}
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyAndRepairAll() (and therefore the legacy-trigger auto-heal) is currently only invoked via EmDashRuntime.ensureSearchHealthy, which the codebase calls from the search endpoints. Content mutations (publish/edit/delete/restore) don’t call this health check, so an upgraded site with legacy triggers/corruption can still hit SQLITE_CORRUPT_VTAB on publish before any search request ever triggers the rebuild.

Consider ensuring the verify/repair path runs before content writes for search-enabled collections (e.g. call ensureSearchHealthy()/verifyAndRepairIndex() from publish/edit/trash routes, or trigger a targeted rebuild when a mutation fails with SQLITE_CORRUPT_VTAB).

Copilot uses AI. Check for mistakes.
Comment on lines +553 to +560
// Check: FTS5 integrity-check. This catches corruption that the row
// count check misses (e.g. orphaned tokens in segments where the
// docsize entry exists but points to garbage). Throws on a corrupt
// index; treat the throw itself as the signal to rebuild.
try {
await sql
.raw(`INSERT INTO "${ftsTable}"("${ftsTable}") VALUES('integrity-check')`)
.execute(this.db);
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyAndRepairIndex now runs an unconditional FTS5 'integrity-check' whenever an index exists and row counts match. This can be significantly more expensive than the row-count check for large indexes and will run on the first search request in each worker lifetime.

If this is intended, it may be worth documenting/justifying the expected cost. Otherwise, consider making integrity-check conditional (e.g. only after detecting legacy triggers / mismatch / a prior crash marker, or rate-limiting it with a persisted “last checked” timestamp) so search latency doesn’t regress for large sites.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQLite Database Corrupt Error on new site

2 participants