
perf(core): dedupe taxonomy-def and posts-list fetches per request#840

Merged
ascorbic merged 4 commits into main from perf/taxonomy-and-posts-list-dedup
Apr 30, 2026


@ascorbic
Collaborator

What does this PR do?

Two intra-request duplicate-fetch collapses on the public read path. Both surfaced by the per-route query catalogue against fixtures/perf-site/. Combined: post-detail drops from 33 → 27 queries.

1. Taxonomy-def peek (packages/core/src/taxonomies/index.ts)

getTaxonomyDef(name) now consults the request-cached taxonomy-defs:all list before falling through to a WHERE name=? query. The catalogue showed three _emdash_taxonomy_defs queries on post-detail: one unfiltered scan plus two WHERE name=? lookups (one each for category and tag, fired by the Categories.astro and Tags.astro widgets via getTaxonomyTerms). The unfiltered scan already contains both rows; the singular path now reads from it.

Behaviour unchanged: when the all-list isn't in the cache, the singular query fires as before. When the all-list resolved without containing name, the peek returns null — same result a fresh WHERE name=? would produce.
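A minimal sketch of the peek, assuming a per-request Map serves as the request cache. TaxonomyDef, RequestCache, queryAllDefs, queryDefByName and the in-memory table are hypothetical stand-ins, not EmDash's real API. The real helpers are async and may cache promises, so this synchronous version only illustrates the control flow: a resolved all-list answers singular lookups (including null for a missing name) without another query, and a cold cache still falls through to the narrow query.

```typescript
// Hypothetical sketch: TaxonomyDef, RequestCache, queryAllDefs and
// queryDefByName are illustrative stand-ins, not EmDash's real API.
// The real helpers are async; this version is synchronous for brevity.
interface TaxonomyDef {
  name: string;
  label: string;
}

type RequestCache = Map<string, unknown>;

const ALL_KEY = "taxonomy-defs:all";

// In-memory stand-in for _emdash_taxonomy_defs, plus a query counter.
const table: TaxonomyDef[] = [
  { name: "category", label: "Categories" },
  { name: "tag", label: "Tags" },
];
let queryCount = 0;

function queryAllDefs(): TaxonomyDef[] {
  queryCount++; // unfiltered scan
  return [...table];
}

function queryDefByName(name: string): TaxonomyDef | null {
  queryCount++; // WHERE name = ?
  return table.find((d) => d.name === name) ?? null;
}

// Populates the request cache with the unfiltered list (as term hydration does).
function getAllTaxonomyDefs(cache: RequestCache): TaxonomyDef[] {
  let all = cache.get(ALL_KEY) as TaxonomyDef[] | undefined;
  if (all === undefined) {
    all = queryAllDefs();
    cache.set(ALL_KEY, all);
  }
  return all;
}

function getTaxonomyDef(cache: RequestCache, name: string): TaxonomyDef | null {
  // Peek: a resolved all-list answers the lookup, including "not found" as null.
  const all = cache.get(ALL_KEY) as TaxonomyDef[] | undefined;
  if (all !== undefined) return all.find((d) => d.name === name) ?? null;
  // Cold cache: fall through to the narrow query exactly as before.
  return queryDefByName(name);
}
```

On post-detail, term hydration fills the all-list first, so the Categories and Tags widget lookups resolve from it instead of issuing two more queries.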

2. Posts-list bucketing (packages/core/src/query.ts)

getEmDashCollection now buckets calls with limit < 10 and no cursor up to a shared BUCKET_LIMIT_THRESHOLD = 10, then slices in the helper so each caller gets exactly the first limit rows it asked for. The cache key uses the bucketed limit, so of the post-detail's three list calls (limit:4, limit:5, unlimited), the two small-limit calls share one fetch; the unlimited call isn't bucketed and keeps its own. nextCursor is recomputed via encodeEntryCursor(sliced.at(-1)) to match the loader's encoding.

Each duplicate list call brings byline/taxonomy hydration with it, so collapsing the list query collapses the hydration cost too — that's why a single dedup yields multiple snapshot deltas.
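The bucket-then-slice flow can be sketched like this. Only BUCKET_LIMIT_THRESHOLD comes from the PR. Everything else is a simplifying assumption: the Entry shape, the JSON cache key (the real key also covers locale, status, orderBy and where), using createdAt directly as the cursor, and the in-memory loadCollection with its +1 look-ahead row. getCollection is a toy stand-in, not the real getEmDashCollection.

```typescript
// Hypothetical sketch: only BUCKET_LIMIT_THRESHOLD comes from the PR.
const BUCKET_LIMIT_THRESHOLD = 10;

interface Entry {
  id: string;
  createdAt: string;
}
interface CollectionFilter {
  limit?: number;
  cursor?: string;
}
interface CollectionResult {
  entries: Entry[];
  nextCursor?: string;
}

// Twelve fake posts, newest first.
const rows: Entry[] = Array.from({ length: 12 }, (_, i) => ({
  id: `post-${i}`,
  createdAt: `2026-01-${String(31 - i).padStart(2, "0")}T00:00:00.000Z`,
}));

let fetches = 0; // counts simulated list queries

// Stand-in loader (cursor handling omitted): reads limit + 1 rows to detect a next page.
function loadCollection(filter: CollectionFilter): CollectionResult {
  fetches++;
  const limit = filter.limit ?? rows.length;
  const page = rows.slice(0, limit + 1);
  const entries = page.slice(0, limit);
  const last = entries[entries.length - 1];
  return { entries, nextCursor: page.length > limit && last ? last.createdAt : undefined };
}

// Small, cursor-less limits are widened to the shared bucket size.
function bucketFilter(filter: CollectionFilter): CollectionFilter {
  const limit = filter.limit;
  if (limit === undefined || limit >= BUCKET_LIMIT_THRESHOLD || limit <= 0 || filter.cursor !== undefined) {
    return filter; // pass through unchanged
  }
  return { ...filter, limit: BUCKET_LIMIT_THRESHOLD };
}

// Trim a bucketed result back to the caller's limit and re-derive nextCursor.
function sliceCollectionResult(result: CollectionResult, limit: number): CollectionResult {
  if (result.entries.length <= limit) return result; // nothing to trim
  const entries = result.entries.slice(0, limit);
  return { entries, nextCursor: entries[limit - 1].createdAt };
}

const requestCache = new Map<string, CollectionResult>();

// Toy stand-in for getEmDashCollection: the cache is keyed on the *bucketed* filter.
function getCollection(collection: string, filter: CollectionFilter = {}): CollectionResult {
  const bucketed = bucketFilter(filter);
  const key = JSON.stringify([collection, bucketed]);
  let result = requestCache.get(key);
  if (result === undefined) {
    result = loadCollection(bucketed);
    requestCache.set(key, result);
  }
  const wanted = filter.limit;
  return wanted !== undefined && wanted !== bucketed.limit ? sliceCollectionResult(result, wanted) : result;
}
```

With this sketch, limit:4 and limit:5 resolve from one fetch, a later limit:10 call hits the same cache entry, and an unlimited call still issues its own fetch.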

Snapshot diff (sqlite, fixtures/perf-site)

| Route | Before | After |
| --- | --- | --- |
| GET /posts/building-for-the-long-term (cold) | 33 | 27 |
| GET /posts/building-for-the-long-term (warm) | 33 | 27 |

All other routes unchanged. The d1 snapshot will be refreshed by the query-counts-apply workflow.

Adversarial review notes

A reviewer pass turned up two LOW-severity concerns worth surfacing for follow-up — neither is active on current code paths:

  1. Cursor reproduction fragility. encodeEntryCursor reads entry.data[field] and round-trips dates via new Date(...).toISOString(). Production writes always store ISO-8601 with Z, so the round-trip is identity. If a future write path ever used SQLite's (datetime('now')) default ("2026-04-29 21:35:41"), the round-trip would mint "2026-04-29T21:35:41.000Z" — lex-greater than stored, causing WHERE created_at < ? to re-include the cursor row. Latent.
  2. cacheHint.lastModified shifts later. The bucketed fetch computes max(updated_at) over 10 rows; a direct limit:4 would compute over 4. For non-time orderings, Astro.cache.set(cacheHint) becomes slightly less aggressive. No staleness — cache just expires sooner than it would have.
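A small sketch of the cursor fragility in note 1, using only Date and plain string comparison. reincludesCursorRow is a hypothetical helper that models keyset pagination (WHERE created_at < ? for descending order, > ? for ascending). JavaScript's < on ASCII strings orders the same way as SQLite's default BINARY collation. The SQLite-format case compares literal strings rather than parsing them, because how new Date() parses a space-separated timestamp is implementation-defined.

```typescript
// Re-minting a stored timestamp through Date, as the pre-fix encoder did.
function remint(stored: string): string {
  return new Date(stored).toISOString();
}

// Hypothetical helper: keyset pagination keeps rows strictly past the cursor.
// Descending order uses `created_at < ?`, ascending uses `created_at > ?`.
function reincludesCursorRow(stored: string, cursor: string, direction: "asc" | "desc"): boolean {
  return direction === "desc" ? stored < cursor : stored > cursor;
}

const withMillis = "2026-04-29T21:35:41.123Z"; // ISO-8601 with milliseconds and Z
const noMillis = "2026-01-01T00:00:00Z"; // ISO-8601, but without milliseconds
const sqliteNow = "2026-04-29 21:35:41"; // datetime('now') format
const mintedFromSqlite = "2026-04-29T21:35:41.000Z"; // value quoted in the note (UTC runtime)
```

With milliseconds present the round-trip is identity and the cursor row is excluded as intended. Without them, "…00Z" sorts after the minted "…00.000Z" ('Z' > '.'), so an ascending page re-includes the cursor row; for the datetime('now') string, ' ' < 'T', so a descending page does.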

Verified clean: cache key includes locale/status/orderBy/where, hydration not doubled, limit:10 shares the cache with bucketed limit<10 (bonus), getEmDashEntry untouched.

A vitest pinning "bucket(10) sliced to 4 = direct limit:4 entries[] + re-paginatable nextCursor" would lock both concerns down — happy to add as follow-up if reviewers want it before merge.
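The proposed test could look roughly like the framework-free sketch below (the real suite uses vitest). The fixture, load() and the bare-createdAt cursor are assumptions; the real encodeEntryCursor encodes more than a timestamp. For every bucketable limit it checks that bucket(10) sliced to N returns the same entries and nextCursor as a direct limit:N load, and that re-paginating from either cursor yields the same next page.

```typescript
// Framework-free sketch; fixture, cursor format and helpers are hypothetical.
interface Row {
  id: string;
  createdAt: string;
}
interface Page {
  entries: Row[];
  nextCursor?: string;
}

const BUCKET = 10;

// 15 rows ordered newest first (descending createdAt).
const data: Row[] = Array.from({ length: 15 }, (_, i) => ({
  id: `p${i}`,
  createdAt: new Date(Date.UTC(2026, 0, 31 - i)).toISOString(),
}));

// Simplified cursor: just the sort value of the last row on the page.
const encodeCursor = (row: Row): string => row.createdAt;

// Keyset-paginated loader: WHERE createdAt < cursor, reading limit + 1 rows.
function load(limit: number, cursor?: string): Page {
  const rest = data.filter((r) => cursor === undefined || r.createdAt < cursor);
  const page = rest.slice(0, limit + 1);
  const entries = page.slice(0, limit);
  const hasMore = page.length > limit;
  return { entries, nextCursor: hasMore ? encodeCursor(entries[entries.length - 1]) : undefined };
}

// Bucket to at least BUCKET rows, then slice back down and re-encode the cursor.
function bucketThenSlice(limit: number): Page {
  const bucket = load(Math.max(limit, BUCKET));
  if (bucket.entries.length <= limit) return bucket;
  const entries = bucket.entries.slice(0, limit);
  return { entries, nextCursor: encodeCursor(entries[limit - 1]) };
}

const ids = (p: Page): string => p.entries.map((e) => e.id).join(",");
const samePage = (a: Page, b: Page): boolean => ids(a) === ids(b) && a.nextCursor === b.nextCursor;
```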

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • User-visible strings in the admin UI are wrapped for translation (if applicable). Do not include messages.po changes except in translation PRs — a workflow extracts catalogs on merge to main.
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/...

AI-generated code disclosure

  • This PR includes AI-generated code

Two intra-request duplicate-fetch patterns showed up on the perf-fixture
post-detail render:

1. `_emdash_taxonomy_defs` was fetched three times per render — once
   unfiltered (during entry-term hydration) and twice with `WHERE name=?`
   (one per taxonomy widget). Have `getTaxonomyDef(name)` peek the
   already-cached full list before falling through to a narrower query.

2. `getEmDashCollection("posts", { limit: N })` was issued three times at
   different small limits (4 from the page body, 5 from RecentPosts, no
   limit from Archives) — each duplicate dragged byline + term hydration
   along with it. Bucket small limits up to a shared minimum (10) so
   sibling widgets at slightly different sizes share one fetch + slice
   to their requested size in the wrapper. Cursor-paginated calls are
   exempt; nextCursor is recomputed from the slice boundary.

Trade-off: bucketed fetches over-fetch up to ~10 entries (and the +1
over-fetch the loader already does) for callers who would otherwise have
asked for fewer. On the perf fixture this nets out to fewer queries
overall because the bucket is shared. For a page with a single small-
limit query and no sibling widgets the bucket fetches more rows than
strictly needed, but stays at the same query count.

Cuts the post-detail snapshot from 33 to 27 queries (sqlite target).
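A back-of-the-envelope model of the trade-off described in the commit message, assuming one list query per distinct cache key and limit + 1 rows read per query (the loader's look-ahead), with enough rows in the collection to fill each request. It deliberately ignores the byline and taxonomy hydration queries that each list call also triggers, which is where most of the real saving comes from. listCost is a hypothetical helper.

```typescript
// Rough cost model (assumption): one list query per distinct cache key,
// each reading limit + 1 rows because of the loader's look-ahead.
// Hydration queries are ignored.
const BUCKET_SIZE = 10;

interface Cost {
  queries: number;
  rowsRead: number;
}

function listCost(limits: number[], bucketed: boolean): Cost {
  // Small positive limits collapse onto the shared bucket size when bucketing is on.
  const effective = limits.map((l) => (bucketed && l > 0 && l < BUCKET_SIZE ? BUCKET_SIZE : l));
  const distinct = Array.from(new Set(effective));
  return {
    queries: distinct.length,
    rowsRead: distinct.reduce((sum, l) => sum + l + 1, 0),
  };
}
```

For sibling limits 4 and 5 this gives 2 queries and 11 rows unbucketed versus 1 query and 11 rows bucketed. A lone limit:4 call reads 5 rows unbucketed versus 11 bucketed, with one query either way.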
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 29, 2026

Deploying with Cloudflare Workers


| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | emdash-perf-coordinator | e08ecdf | Apr 29 2026, 10:32 PM |

@changeset-bot

changeset-bot Bot commented Apr 29, 2026

🦋 Changeset detected

Latest commit: e08ecdf

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages
| Name | Type |
| --- | --- |
| emdash | Patch |
| @emdash-cms/cloudflare | Patch |
| @emdash-cms/fixture-perf-site | Patch |
| @emdash-cms/perf-demo-site | Patch |
| @emdash-cms/cache-demo-site | Patch |
| @emdash-cms/admin | Patch |
| @emdash-cms/auth | Patch |
| @emdash-cms/blocks | Patch |
| @emdash-cms/gutenberg-to-portable-text | Patch |
| @emdash-cms/x402 | Patch |
| create-emdash | Patch |
| @emdash-cms/auth-atproto | Patch |
| @emdash-cms/plugin-embeds | Patch |


@github-actions github-actions Bot added the "query-count changed" label (PR diff modifies query-count snapshot files) on Apr 29, 2026
@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

Query-count snapshot changes

4 routes changed, total Δ -20 queries.

SQLite

| Route | Before | After | Δ |
| --- | --- | --- | --- |
| GET /posts/building-for-the-long-term (cold) | 28 | 23 | -5 |
| GET /posts/building-for-the-long-term (warm) | 28 | 23 | -5 |

D1

| Route | Before | After | Δ |
| --- | --- | --- | --- |
| GET /posts/building-for-the-long-term (cold) | 36 | 31 | -5 |
| GET /posts/building-for-the-long-term (warm) | 27 | 22 | -5 |

Comparing snapshot files between base and head. Updated automatically on each push.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 29, 2026

Deploying with Cloudflare Workers


| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | emdash-i18n | e08ecdf | Apr 29 2026, 10:33 PM |

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 29, 2026

Deploying with Cloudflare Workers


| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | docs | e08ecdf | Apr 29 2026, 10:33 PM |

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 29, 2026

Deploying with Cloudflare Workers


| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | emdash-demo-cache | e08ecdf | Apr 29 2026, 10:33 PM |

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 29, 2026

Deploying with Cloudflare Workers


| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | emdash-playground | e08ecdf | Apr 29 2026, 10:33 PM |

@pkg-pr-new

pkg-pr-new Bot commented Apr 29, 2026


@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@840

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@840

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@840

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@840

emdash

npm i https://pkg.pr.new/emdash@840

create-emdash

npm i https://pkg.pr.new/create-emdash@840

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@840

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@840

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@840

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@840

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@840

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@840

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@840

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@840

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@840

commit: e08ecdf

Contributor

Copilot AI left a comment


Pull request overview

Improves public read-path performance by deduplicating intra-request fetches for taxonomy definitions and small post-list queries, reducing query counts on the perf fixture post-detail route.

Changes:

  • Reuses request-cached taxonomy-defs:all to satisfy getTaxonomyDef(name) without an extra _emdash_taxonomy_defs WHERE name=? query.
  • Buckets small getEmDashCollection limits (< 10, no cursor) to a shared limit and slices results back down per caller.
  • Updates SQLite query-count snapshot and adds a changeset documenting the perf improvement.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File Description
scripts/query-counts.snapshot.sqlite.json Updates perf snapshot counts for the post-detail route (33 → 27).
packages/core/src/taxonomies/index.ts Adds request-cache peek to avoid redundant single-taxonomy definition queries.
packages/core/src/query.ts Adds limit bucketing + slicing and recomputes nextCursor for sliced results.
.changeset/perf-taxonomy-and-posts-list-dedup.md Documents the perf change as a patch changeset.


Comment thread packages/core/src/query.ts Outdated
Comment thread packages/core/src/query.ts
Comment thread packages/core/src/query.ts Outdated
Three fixes for the bucket-then-slice path in getEmDashCollection:

- encodeEntryCursor now skips invalid orderBy field names via the same
  FIELD_NAME_PATTERN check the loader's getPrimarySort uses, so a malformed
  first key falls back to created_at instead of producing a cursor against
  a phantom column.
- For date columns, encodeEntryCursor now reads the raw stored string from
  a hidden CURSOR_RAW_VALUES symbol the loader stashes on each entry,
  rather than round-tripping through new Date().toISOString(). The
  round-trip mints `.000Z` for ISO strings without milliseconds, which
  would lex-mismatch the stored value and re-include the cursor row on
  the next page.
- sliceCollectionResult docstring updated to match the actual behavior
  (early-return for entries already within the limit, shallow-copy only
  on truncation).

Adds tests pinning the bucket → slice → re-paginate round-trip, including
exact cursor equivalence with a direct loader call at the same limit.

Resolves Copilot review comments on PR #840.
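The two encodeEntryCursor fixes in this commit can be sketched as follows. FIELD_NAME_PATTERN's exact regex, the EntryData shape, the orderBy object form and the JSON cursor format are assumptions, since the PR text doesn't show them. The sketch also registers the hidden key with Symbol.for, which later review comments recommended, rather than the Symbol(...) this commit introduced.

```typescript
// Hypothetical sketch of the two encodeEntryCursor fixes. The regex, the
// EntryData shape, the orderBy form and the JSON cursor format are assumptions.
const FIELD_NAME_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Global-registry symbol, as later review suggested (this commit used Symbol()).
const CURSOR_RAW_VALUES = Symbol.for("emdash:cursorRawValues");

type EntryData = Record<string, unknown>;
type OrderBy = Record<string, "asc" | "desc">;

// Loader side: stash the raw column strings under a hidden, non-enumerable key.
function stashRawValues(data: EntryData, raw: Record<string, string>): EntryData {
  Object.defineProperty(data, CURSOR_RAW_VALUES, { value: raw, enumerable: false });
  return data;
}

function encodeEntryCursor(data: EntryData, id: string, orderBy: OrderBy): string {
  // Fix 1: an invalid first orderBy key falls back to created_at.
  const first = Object.keys(orderBy)[0];
  const field = first !== undefined && FIELD_NAME_PATTERN.test(first) ? first : "created_at";
  // Fix 2: prefer the raw stored string over the hydrated (e.g. Date) value.
  const raw = Reflect.get(data, CURSOR_RAW_VALUES) as Record<string, string> | undefined;
  const value = raw?.[field] ?? data[field];
  return JSON.stringify([field, value, id]);
}
```

Without the stash, a Date-valued created_at serialises through toJSON to 2026-01-01T00:00:00.000Z, which is the drift the commit describes.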
@github-actions github-actions Bot added size/L and removed size/M labels Apr 29, 2026
@ascorbic ascorbic requested a review from Copilot April 29, 2026 22:32
@ascorbic
Collaborator Author

/ultrareview

@ask-bonk
Contributor

ask-bonk Bot commented Apr 29, 2026

Now I'll think about whether this nit is even worth raising. The PR is well-instrumented, the author explicitly noted both adversarial concerns, included a test, and the CURSOR_RAW_VALUES mechanism actually addresses them. The JSON-column edge case is genuinely esoteric.

Honestly, after thorough investigation, I don't find any actionable bug. The PR is well-engineered. My review will be approval-only with no comments — the existing analysis (PR description's adversarial notes) covers everything important, and the Copilot reviewer's three concerns are all either addressed by the implementation or based on misreading.

LGTM!


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.



Comment thread packages/core/src/query.ts
Comment thread scripts/query-counts.snapshot.sqlite.json
Comment on lines +125 to +126
*/
export const CURSOR_RAW_VALUES: unique symbol = Symbol("emdash:cursorRawValues");

Copilot AI Apr 29, 2026


CURSOR_RAW_VALUES is created with Symbol(...), so if the loader module is duplicated across Vite/Rollup SSR chunks, the symbol identity can differ between the code that defines the property and the code that reads it. That would make Reflect.get(data, CURSOR_RAW_VALUES) fail and reintroduce cursor-encoding drift. Consider using a global symbol (e.g. Symbol.for("emdash:cursorRawValues")) so all module instances share the same key.

Suggested change
- */
- export const CURSOR_RAW_VALUES: unique symbol = Symbol("emdash:cursorRawValues");
+ *
+ * Use the global symbol registry so duplicated SSR/bundled module instances
+ * resolve the same property key.
+ */
+ export const CURSOR_RAW_VALUES: symbol = Symbol.for("emdash:cursorRawValues");

Comment on lines +318 to +324
const limit = filter?.limit;
if (
	limit === undefined ||
	limit >= BUCKET_LIMIT_THRESHOLD ||
	limit <= 0 ||
	filter?.cursor !== undefined
) {

Copilot AI Apr 29, 2026


bucketFilter treats any number < threshold as bucketable, but doesn’t guard against non-finite values (e.g. NaN, Infinity) or non-integers. With limit = NaN, bucketing is applied and requestedLimit becomes NaN, which will cause sliceCollectionResult to return an empty slice. Consider validating limit with Number.isFinite (and likely Number.isInteger) before applying bucketing, and otherwise passing the filter through unchanged.
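Working through the quoted guard, NaN is the only non-finite value that actually gets bucketed: Infinity fails limit >= BUCKET_LIMIT_THRESHOLD and -Infinity fails limit <= 0, so both pass through unchanged. Non-integers such as 2.5 are bucketed too. The sketch below contrasts the quoted condition with one possible hardened predicate. bucketsAsQuoted and bucketsHardened are hypothetical names, not functions in the PR.

```typescript
const BUCKET_LIMIT_THRESHOLD = 10;

// The condition as quoted in the review, inverted to answer "is this call bucketed?".
function bucketsAsQuoted(limit: number | undefined, cursor?: string): boolean {
  const passThrough =
    limit === undefined || limit >= BUCKET_LIMIT_THRESHOLD || limit <= 0 || cursor !== undefined;
  return !passThrough;
}

// One possible hardening: bucket only positive integers below the threshold.
function bucketsHardened(limit: number | undefined, cursor?: string): boolean {
  return (
    limit !== undefined &&
    Number.isInteger(limit) &&
    limit > 0 &&
    limit < BUCKET_LIMIT_THRESHOLD &&
    cursor === undefined
  );
}

// Why NaN is harmful once bucketed: slice(0, NaN) treats the end index as 0.
const slicedWithNaN = [1, 2, 3].slice(0, NaN);
```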

@github-actions
Copy link
Copy Markdown
Contributor

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

@ascorbic
Collaborator Author

/review

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

APIError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.


@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

@ascorbic Bonk workflow failed. Check the logs for details.

View workflow run · To retry, trigger Bonk again.

@ascorbic
Copy link
Copy Markdown
Collaborator Author

/review

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

APIError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.


@ascorbic
Collaborator Author

/review

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

UnknownError: ProviderInitError


@ascorbic
Collaborator Author

/review

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

APIError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.


@ascorbic
Collaborator Author

/review gemini

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

APIError: Bad Request: {"success":false,"result":[],"messages":[],"error":[{"code":2008,"message":"Invalid provider"}]}


@ascorbic
Collaborator Author

/review gemini

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

APIError: Bad Request: {"success":false,"result":[],"messages":[],"error":[{"code":2008,"message":"Invalid provider"}]}


@ascorbic
Collaborator Author

/review

Contributor

@ask-bonk ask-bonk Bot left a comment


Two intra-request dedups, both well-targeted at hot widget patterns; the CURSOR_RAW_VALUES raw-string stash already addresses the date-precision concern that would otherwise have made the bucketed nextCursor diverge from a direct loader cursor. Tests pin the bucket→slice equivalence end-to-end. One latent concern below.

* ISO-with-milliseconds (e.g. `2026-01-01T00:00:00Z` becomes
* `2026-01-01T00:00:00.000Z`).
*/
export const CURSOR_RAW_VALUES: unique symbol = Symbol("emdash:cursorRawValues");
Contributor


LOW (latent). CURSOR_RAW_VALUES is created with Symbol("emdash:cursorRawValues"), but it's exported from loader.ts and read from query.ts (Reflect.get(data, CURSOR_RAW_VALUES) in encodeEntryCursor). AGENTS.md explicitly calls out the Vite SSR chunk-duplication footgun for cross-module shared keys, and request-context.ts already uses Symbol.for("emdash:request-context") for the same reason. If loader.ts ends up inlined into more than one SSR chunk, the symbol identity that mapRowToData writes with can differ from the one encodeEntryCursor reads with: Reflect.get would return undefined, the encoder would silently fall back to the date-key path on data, and the T00:00:00Z → T00:00:00.000Z divergence this PR was designed to prevent comes back. Not currently observed (the in-process tests can't reproduce it), but cheap to harden with a one-line change to Symbol.for("emdash:cursorRawValues").
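The failure mode can be reproduced without a bundler. In this sketch, makeLoaderCopy is a hypothetical factory standing in for one bundled copy of loader.ts, so calling it twice simulates two SSR chunks that each evaluate their own CURSOR_RAW_VALUES. With Symbol(...), a reader holding the other copy's key gets undefined from Reflect.get; with Symbol.for(...), both copies resolve the same registry entry.

```typescript
// Hypothetical factory standing in for one bundled copy of loader.ts.
function makeLoaderCopy(useRegistry: boolean) {
  const CURSOR_RAW_VALUES: symbol = useRegistry
    ? Symbol.for("emdash:cursorRawValues") // shared across copies via the global registry
    : Symbol("emdash:cursorRawValues"); // a fresh, unique key per copy
  return {
    CURSOR_RAW_VALUES,
    // Writes the raw stored string under this copy's key (non-enumerable by default).
    mapRowToData(row: { created_at: string }): object {
      const data = { created_at: new Date(row.created_at) };
      Object.defineProperty(data, CURSOR_RAW_VALUES, { value: { created_at: row.created_at } });
      return data;
    },
  };
}

// What a reader in another chunk sees when it looks up its own copy of the key.
function readRaw(data: object, key: symbol): unknown {
  return Reflect.get(data, key);
}
```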

@ask-bonk
Contributor

ask-bonk Bot commented Apr 30, 2026

Review posted with one line-anchored LOW-severity comment on packages/core/src/loader.ts:126.

The PR is well-engineered: the author's adversarial notes pre-empted the date-precision and field-validation concerns Copilot raised, and the CURSOR_RAW_VALUES mechanism + tests address them. The single concern I raised is the Symbol(...) vs Symbol.for(...) issue: it directly contradicts an AGENTS.md performance rule about cross-module shared keys getting duplicated by Vite SSR chunking, and the codebase already uses Symbol.for in exactly this scenario (request-context.ts). It's a one-line fix and latent (not currently observed), so I flagged it as LOW.

Other Copilot concerns (NaN guard in bucketFilter, JSON-parsed string in non-date sort field, docstring shallow-copy nit, snapshot number mismatch with PR description) are either truly esoteric edge cases or cosmetic — not worth surfacing.


@ascorbic ascorbic merged commit 64bf5b9 into main Apr 30, 2026
62 checks passed
@ascorbic ascorbic deleted the perf/taxonomy-and-posts-list-dedup branch April 30, 2026 10:02
@emdashbot emdashbot Bot mentioned this pull request Apr 30, 2026
ascorbic added a commit that referenced this pull request Apr 30, 2026
Adds the query-dump harness and analysis scripts that produced the
catalogue used to drive PRs #838, #839, #840:

- scripts/query-counts-dump.mjs — sibling of query-counts.mjs that
  emits per-route × phase NDJSON dumps under scripts/query-dumps/.
- scripts/build-perf-d1.mjs — standalone "build the perf fixture for
  d1" wrapper, useful when iterating with the dump harness without
  re-running the full counts harness.
- scripts/query-dumps/{classify,cold-only,inspect-other}.mjs — analysis
  helpers for slicing the dumps. classify.mjs writes a markdown
  classification report; cold-only.mjs surfaces the d1 cold-isolate tax;
  inspect-other.mjs prints distinct SQL for a class.
- scripts/query-dumps/README.md — workflow doc.

The dump JSON itself and the generated classification.{target}.md
reports stay gitignored — they're analysis artefacts that regenerate
from the harness in seconds. The query-dumps analysis scripts are
excluded from oxlint (one-off tooling, not production code).
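The PR does not show the dump format, so the sketch below assumes each NDJSON line is an object with route, phase and sql fields, and classifies queries by the first table named after FROM, INTO or UPDATE. It is a TypeScript illustration only; the actual classify.mjs is plain JavaScript and writes a markdown report, and tableOf and classify are hypothetical names.

```typescript
// Assumed dump shape: one JSON object per line with route, phase and sql.
interface DumpLine {
  route: string;
  phase: string;
  sql: string;
}

// Crude classifier: the first table named after FROM, INTO or UPDATE.
function tableOf(sql: string): string {
  const match = /\b(?:from|into|update)\s+["`]?(\w+)/i.exec(sql);
  return match?.[1] ?? "(other)";
}

// Counts queries per table for each "route [phase]" group.
function classify(ndjson: string): Map<string, Map<string, number>> {
  const report = new Map<string, Map<string, number>>();
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const q = JSON.parse(line) as DumpLine;
    const key = `${q.route} [${q.phase}]`;
    const counts = report.get(key) ?? new Map<string, number>();
    const table = tableOf(q.sql);
    counts.set(table, (counts.get(table) ?? 0) + 1);
    report.set(key, counts);
  }
  return report;
}
```

Counting per table this way is enough to spot repeats like the three _emdash_taxonomy_defs queries on post-detail.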
2 participants