perf(core): dedupe taxonomy-def and posts-list fetches per request#840
perf(core): dedupe taxonomy-def and posts-list fetches per request#840
Conversation
Two intra-request duplicate-fetch patterns showed up on the perf-fixture
post-detail render:
1. `_emdash_taxonomy_defs` was fetched three times per render — once
unfiltered (during entry-term hydration) and twice with `WHERE name=?`
(one per taxonomy widget). Have `getTaxonomyDef(name)` peek the
already-cached full list before falling through to a narrower query.
2. `getEmDashCollection("posts", { limit: N })` was issued three times at
different small limits (4 from the page body, 5 from RecentPosts, no
limit from Archives) — each duplicate dragged byline + term hydration
along with it. Bucket small limits up to a shared minimum (10) so
sibling widgets at slightly different sizes share one fetch + slice
to their requested size in the wrapper. Cursor-paginated calls are
exempt; nextCursor is recomputed from the slice boundary.
Trade-off: bucketed fetches over-fetch up to ~10 entries (and the +1
over-fetch the loader already does) for callers who would otherwise have
asked for fewer. On the perf fixture this nets out to fewer queries
overall because the bucket is shared. For a page with a single small-
limit query and no sibling widgets the bucket fetches more rows than
strictly needed, but stays at the same query count.
Cuts the post-detail snapshot from 33 to 27 queries (sqlite target).
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-perf-coordinator | e08ecdf | Apr 29 2026, 10:32 PM |
🦋 Changeset detectedLatest commit: e08ecdf The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Query-count snapshot changes4 routes changed, total Δ -20 queries. SQLite
D1
Comparing snapshot files between base and head. Updated automatically on each push. |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-i18n | e08ecdf | Apr 29 2026, 10:33 PM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
docs | e08ecdf | Apr 29 2026, 10:33 PM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-demo-cache | e08ecdf | Apr 29 2026, 10:33 PM |
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
emdash-playground | e08ecdf | Apr 29 2026, 10:33 PM |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/blocks
@emdash-cms/cloudflare
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit: |
There was a problem hiding this comment.
Pull request overview
Improves public read-path performance by deduplicating intra-request fetches for taxonomy definitions and small post-list queries, reducing query counts on the perf fixture post-detail route.
Changes:
- Reuses request-cached
taxonomy-defs:allto satisfygetTaxonomyDef(name)without an extra_emdash_taxonomy_defs WHERE name=?query. - Buckets small
getEmDashCollectionlimits (< 10, no cursor) to a shared limit and slices results back down per caller. - Updates SQLite query-count snapshot and adds a changeset documenting the perf improvement.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scripts/query-counts.snapshot.sqlite.json | Updates perf snapshot counts for the post-detail route (33 → 27). |
| packages/core/src/taxonomies/index.ts | Adds request-cache peek to avoid redundant single-taxonomy definition queries. |
| packages/core/src/query.ts | Adds limit bucketing + slicing and recomputes nextCursor for sliced results. |
| .changeset/perf-taxonomy-and-posts-list-dedup.md | Documents the perf change as a patch changeset. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three fixes for the bucket-then-slice path in getEmDashCollection: - encodeEntryCursor now skips invalid orderBy field names via the same FIELD_NAME_PATTERN check the loader's getPrimarySort uses, so a malformed first key falls back to created_at instead of producing a cursor against a phantom column. - For date columns, encodeEntryCursor now reads the raw stored string from a hidden CURSOR_RAW_VALUES symbol the loader stashes on each entry, rather than round-tripping through new Date().toISOString(). The round-trip mints `.000Z` for ISO strings without milliseconds, which would lex-mismatch the stored value and re-include the cursor row on the next page. - sliceCollectionResult docstring updated to match the actual behavior (early-return for entries already within the limit, shallow-copy only on truncation). Adds tests pinning the bucket → slice → re-paginate round-trip, including exact cursor equivalence with a direct loader call at the same limit. Resolves Copilot review comments on PR #840.
|
/ultrareview |
|
Now I'll think about whether this nit is even worth raising. The PR is well-instrumented, the author explicitly noted both adversarial concerns, included a test, and the Honestly, after thorough investigation, I don't find any actionable bug. The PR is well-engineered. My review will be approval-only with no comments — the existing analysis (PR description's adversarial notes) covers everything important, and the Copilot reviewer's three concerns are all either addressed by the implementation or based on misreading. LGTM! |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| */ | ||
| export const CURSOR_RAW_VALUES: unique symbol = Symbol("emdash:cursorRawValues"); |
There was a problem hiding this comment.
CURSOR_RAW_VALUES is created with Symbol(...), so if the loader module is duplicated across Vite/Rollup SSR chunks, the symbol identity can differ between the code that defines the property and the code that reads it. That would make Reflect.get(data, CURSOR_RAW_VALUES) fail and reintroduce cursor-encoding drift. Consider using a global symbol (e.g. Symbol.for("emdash:cursorRawValues")) so all module instances share the same key.
| */ | |
| export const CURSOR_RAW_VALUES: unique symbol = Symbol("emdash:cursorRawValues"); | |
| * | |
| * Use the global symbol registry so duplicated SSR/bundled module instances | |
| * resolve the same property key. | |
| */ | |
| export const CURSOR_RAW_VALUES: symbol = Symbol.for("emdash:cursorRawValues"); |
| const limit = filter?.limit; | ||
| if ( | ||
| limit === undefined || | ||
| limit >= BUCKET_LIMIT_THRESHOLD || | ||
| limit <= 0 || | ||
| filter?.cursor !== undefined | ||
| ) { |
There was a problem hiding this comment.
bucketFilter treats any number < threshold as bucketable, but doesn’t guard against non-finite values (e.g. NaN, Infinity) or non-integers. With limit = NaN, bucketing is applied and requestedLimit becomes NaN, which will cause sliceCollectionResult to return an empty slice. Consider validating limit with Number.isFinite (and likely Number.isInteger) before applying bucketing, and otherwise passing the filter through unchanged.
Overlapping PRsThis PR modifies files that are also changed by other open PRs:
This may cause merge conflicts or duplicated work. A maintainer will coordinate. |
|
/review |
|
APIError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead. |
|
@ascorbic Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
|
/review |
|
APIError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead. |
|
/review |
|
UnknownError: ProviderInitError |
|
/review |
|
APIError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead. |
|
/review gemini |
|
APIError: Bad Request: {"success":false,"result":[],"messages":[],"error":[{"code":2008,"message":"Invalid provider"}]} |
|
/review gemini |
|
APIError: Bad Request: {"success":false,"result":[],"messages":[],"error":[{"code":2008,"message":"Invalid provider"}]} |
|
/review |
There was a problem hiding this comment.
Two intra-request dedups, both well-targeted at hot widget patterns; the CURSOR_RAW_VALUES raw-string stash already addresses the date-precision concern that would otherwise have made the bucketed nextCursor diverge from a direct loader cursor. Tests pin the bucket→slice equivalence end-to-end. One latent concern below.
| * ISO-with-milliseconds (e.g. `2026-01-01T00:00:00Z` becomes | ||
| * `2026-01-01T00:00:00.000Z`). | ||
| */ | ||
| export const CURSOR_RAW_VALUES: unique symbol = Symbol("emdash:cursorRawValues"); |
There was a problem hiding this comment.
LOW — latent. CURSOR_RAW_VALUES is created with Symbol("emdash:cursorRawValues"), but it's exported from loader.ts and read from query.ts (Reflect.get(data, CURSOR_RAW_VALUES) in encodeEntryCursor). AGENTS.md explicitly calls out the Vite SSR chunk-duplication footgun for cross-module shared keys, and request-context.ts already uses Symbol.for("emdash:request-context") for the same reason. If loader.ts ends up inlined into more than one SSR chunk, the symbol identity that mapRowToData writes with can differ from the one encodeEntryCursor reads with — Reflect.get would return undefined, the encoder would silently fall back to the date-key path on data, and the T00:00:00Z ↔ T00:00:00.000Z divergence this PR was designed to prevent comes back. Not currently observed (the in-process tests can't reproduce it), but cheap to harden — one-line change to Symbol.for("emdash:cursorRawValues").
|
Review posted with one line-anchored LOW-severity comment on The PR is well-engineered: the author's adversarial notes pre-empted the date-precision and field-validation concerns Copilot raised, and the Other Copilot concerns (NaN guard in |
Adds the query-dump harness and analysis scripts that produced the catalogue used to drive PRs #838, #839, #840: - scripts/query-counts-dump.mjs — sibling of query-counts.mjs that emits per-route × phase NDJSON dumps under scripts/query-dumps/. - scripts/build-perf-d1.mjs — standalone "build the perf fixture for d1" wrapper, useful when iterating with the dump harness without re-running the full counts harness. - scripts/query-dumps/{classify,cold-only,inspect-other}.mjs — analysis helpers for slicing the dumps. classify.mjs writes a markdown classification report; cold-only.mjs surfaces the d1 cold-isolate tax; inspect-other.mjs prints distinct SQL for a class. - scripts/query-dumps/README.md — workflow doc. The dump JSON itself and the generated classification.{target}.md reports stay gitignored — they're analysis artefacts that regenerate from the harness in seconds. The query-dumps analysis scripts are excluded from oxlint (one-off tooling, not production code).
What does this PR do?
Two intra-request duplicate-fetch collapses on the public read path. Both surfaced by the per-route query catalogue against
fixtures/perf-site/. Combined: post-detail drops from 33 → 27 queries.1. Taxonomy-def peek (
packages/core/src/taxonomies/index.ts)getTaxonomyDef(name)now consults the request-cachedtaxonomy-defs:alllist before falling through to aWHERE name=?query. The catalogue showed three_emdash_taxonomy_defsqueries on post-detail: one unfiltered scan plus twoWHERE name=?lookups (one each forcategoryandtag, fired by theCategories.astroandTags.astrowidgets viagetTaxonomyTerms). The unfiltered scan already contains both rows; the singular path now reads from it.Behaviour unchanged: when the all-list isn't in the cache, the singular query fires as before. When the all-list resolved without containing
name, the peek returnsnull— same result a freshWHERE name=?would produce.2. Posts-list bucketing (
packages/core/src/query.ts)getEmDashCollectionnow buckets calls withlimit < 10and no cursor up to a sharedBUCKET_LIMIT_THRESHOLD = 10, slicing in the helper to return exactlylimitrows. The cache key uses the bucketed limit so the post-detail's three list calls (limit:4, limit:5, unlimited) share one fetch on the small-limit path.nextCursoris recomputed viaencodeEntryCursor(sliced.at(-1))to match the loader's encoding.Each duplicate list call brings byline/taxonomy hydration with it, so collapsing the list query collapses the hydration cost too — that's why a single dedup yields multiple snapshot deltas.
Snapshot diff (sqlite, fixtures/perf-site)
All other routes unchanged. The d1 snapshot will be refreshed by the
query-counts-applyworkflow.Adversarial review notes
A reviewer pass turned up two LOW-severity concerns worth surfacing for follow-up — neither is active on current code paths:
encodeEntryCursorreadsentry.data[field]and round-trips dates vianew Date(...).toISOString(). Production writes always store ISO-8601 withZ, so the round-trip is identity. If a future write path ever used SQLite's(datetime('now'))default ("2026-04-29 21:35:41"), the round-trip would mint"2026-04-29T21:35:41.000Z"— lex-greater than stored, causingWHERE created_at < ?to re-include the cursor row. Latent.cacheHint.lastModifiedshifts later. The bucketed fetch computesmax(updated_at)over 10 rows; a directlimit:4would compute over 4. For non-time orderings,Astro.cache.set(cacheHint)becomes slightly less aggressive. No staleness — cache just expires sooner than it would have.Verified clean: cache key includes locale/status/orderBy/where, hydration not doubled,
limit:10shares the cache with bucketedlimit<10(bonus),getEmDashEntryuntouched.A vitest pinning "bucket(10) sliced to 4 = direct limit:4 entries[] + re-paginatable nextCursor" would lock both concerns down — happy to add as follow-up if reviewers want it before merge.
Type of change
Checklist
pnpm typecheckpassespnpm lintpassespnpm testpasses (or targeted tests for my change)pnpm formathas been runmessages.pochanges except in translation PRs — a workflow extracts catalogs on merge tomain.AI-generated code disclosure