Fix migration 016 partial-reapply crash#976
Merged
Conversation
Reproduces the symptom from #954: when migration 016's up() partially completes — for example, the first createTable succeeds but a later statement crashes due to a D1 subrequest limit, isolate cancellation, or transient connection error — the migration record never gets inserted into _emdash_migrations. On the next request, Kysely sees 016 still pending and re-runs up() from the top, which currently crashes with 'table _emdash_api_tokens already exists'. The new tests directly invoke up() twice, and also simulate the partial-apply state by pre-creating just the first table 016 owns. Both currently fail; the next commit makes 016 idempotent so they pass.
) When migration 016's up() crashes mid-way -- D1 subrequest limit, isolate cancellation, transient connection error -- the migration record never gets inserted into _emdash_migrations. Kysely's next attempt re-runs up() from the top and crashes with 'table _emdash_api_tokens already exists', blocking every subsequent boot of the Worker. This is the failure path the user hit on a fresh D1 database: setup wizard runs, the first table creates, something later in 016 fails, and the deploy is wedged. Adds .ifNotExists() to every CREATE TABLE and CREATE INDEX in 016's up() and .ifExists() to the symmetric drops in down(). The retry now treats already-applied steps as no-ops and proceeds to create the remaining tables, matching the recovery pattern from #803 (migrations 034 and 035).
Addresses adversarial review feedback: the original partial-apply test hand-rolled an _emdash_api_tokens schema that was meant to mimic the state left by a crashed up(). Because CREATE TABLE IF NOT EXISTS over an existing table is a no-op even when the new definition differs, a schema drift in the migration would leave the test silently passing while the recovery path no longer matched what we shipped. Reuses the migration's own up() to set up the partial state, then drops only the trailing tables before retrying. Adds explicit assertions on the recovered table's columns and 016's indexes, so a future change to the migration that drops a column or index gets caught here rather than slipping through.
🦋 Changeset detectedLatest commit: 91fb5c0 The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-perf-coordinator | 91fb5c0 | May 10 2026, 11:11 AM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
docs | 91fb5c0 | May 10 2026, 11:11 AM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-i18n | 91fb5c0 | May 10 2026, 11:11 AM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-playground | 91fb5c0 | May 10 2026, 11:12 AM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-demo-cache | 91fb5c0 | May 10 2026, 11:12 AM |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/blocks
@emdash-cms/cloudflare
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Now I'll craft the final response as the PR body. The workflow will use this as the PR body when opening the PR.
What does this PR do?
Closes #954.
Migration
016_api_tokenswas not safe to re-run against a partially-applied schema. The user-reported failure on a fresh Cloudflare Workers + D1 deploy is the canonical instance: ifup()crashes mid-way -- D1 subrequest limit, isolate cancellation, transient connection error -- the firstcreateTable("_emdash_api_tokens")succeeds but the migration record never gets inserted into_emdash_migrations. On the next request, Kysely sees 016 still pending and re-runsup()from the top, which crashes withtable "_emdash_api_tokens" already existsand wedges every subsequent boot.The fix is the same shape as PR #803 (which made 034 and 035 idempotent for the same bug class): every
CREATE TABLEandCREATE INDEXin 016'sup()is guarded with.ifNotExists(), and the symmetricdropTablecalls indown()use.ifExists(). A retry now treats already-applied steps as no-ops and creates the remaining tables.Residual risk worth flagging: migrations 017+ have the same structural vulnerability (no
.ifNotExists()on theirCREATE TABLE/CREATE INDEXcalls; 017 also has a bareALTER TABLE ... ADD COLUMNthat will hard-fail on retry withduplicate column name). Per AGENTS.md scope discipline and the narrow precedent set by #803, I have not preemptively fixed those here -- only the migration named in the report. They should be addressed when (a) a user hits one of them, or (b) a maintainer decides to do a deliberate sweep.Reproduction & TDD shape
6447fd9-- adds the failing unit test. At that commit the new tests fail with the exact error from Migration 016_api_tokens fails with "table already exists" on fresh D1 database #954:table "_emdash_api_tokens" already exists.839eb31-- adds.ifNotExists()/.ifExists()to 016. Tests pass.eea15a0-- changeset.91fb5c0-- strengthens the partial-reapply test in response to adversarial review. The original version hand-rolled a partial-state schema; becauseCREATE TABLE IF NOT EXISTSis a no-op against a pre-existing table even when the new definition differs, a future schema drift in 016 would have left the test silently passing. The strengthened version reusesup()itself to set up the partial state, drops only the trailing tables, retries, and asserts the recovered table's columns and 016's owned indexes are intact.I verified the test catches the regression by reverting just the migration source while keeping the strengthened test: both nested re-run-safety tests fail with
table "_emdash_api_tokens" already exists-- the exact symptom from the issue.Type of change
Checklist
pnpm typecheckpasses (verifiedpnpm --filter emdash typecheckclean afterpnpm build; pre-existing failure in@emdash-cms/registry-clientreproduces on136d348and is unrelated to this PR)pnpm lintpasses (no new diagnostics;pnpm --silent lint:json | jq '.diagnostics | length'returns the same 73 pre-existing warnings on this branch as on136d348, none on the changed files)pnpm testpasses (pnpm --filter emdash test-- 204 files, 3214 tests pass)pnpm formathas been runAI-generated code disclosure
/bonkworkflow)Screenshots / test output
Regression check (revert only the migration source, keep the test):
With the migration fix in place: 4/4 pass.
Closes #954
github run