Skip to content

Reinstate doc-level preload cache for WP text macros#23221

Open
akabiru wants to merge 1 commit into
feature/text-macro-semantic-id-renderingfrom
feature/text-macro-preload-cache
Open

Reinstate doc-level preload cache for WP text macros#23221
akabiru wants to merge 1 commit into
feature/text-macro-semantic-id-renderingfrom
feature/text-macro-preload-cache

Conversation

@akabiru
Copy link
Copy Markdown
Member

@akabiru akabiru commented May 15, 2026

Ticket

https://community.openproject.org/wp/74948

Stacked on #23203 (Please review that first)

What are you trying to accomplish?

Stored markdown runs through two filters in this order (formatter.rb):

  1. MentionFilter<mention data-id="1234" data-display-id="PROJ-7"> envelopes (autocomplete-picked).
  2. PatternMatcherFilter — bare text like #1234, #PROJ-7, ##1234.

In classic mode the matched id is the label, so link building skips the database entirely. Semantic mode always translates to the canonical formatted_id, regardless of the form stored in the body:

  • #1234 (primary key — the form most legacy content carries, since bulk content is not rewritten when an instance switches modes) → PROJ-7.
  • #OLD-7 (historical alias, left behind by a work package move or project rename) → PROJ-7.
  • #PROJ-7 (current semantic id) → PROJ-7.

All three reach the same renderer and all three need a lookup.

What the preload changes for the user

One thing:

  • Bare #1234 in semantic mode, with preload → <a>PROJ-7</a> (translated label).
  • Bare #1234 in semantic mode, without preload → <a>#1234</a> (label falls back to input).

Both link to the right work package. Only the visible label regresses, which is the bug this fixes.

What approach did you choose and why?

PatternMatcherFilter primes a per-document RequestStore lookup before walking matches. build_lookup issues WHERE id IN (…) OR EXISTS (…aliases…) against work_packages, plus a targeted WorkPackageSemanticAlias query when historical identifiers appear — two round-trips, bounded, regardless of match count. Classic mode short-circuits before any preload.

No truncation cap on the identifier set: silently dropping past the Nth match would render the (N+1)th reference as literal text, which is the regression this fixes. Postgres handles several-thousand-bind IN clauses without complaint.

The save/restore around nested format_text is there because custom-field formatters re-enter the pipeline mid-render; the outer document's lookup must not be clobbered by an inner render's preload.

Merge checklist

  • RSpec — work_packages_spec.rb, in_tool_links_spec.rb, mentions_spec.rb green
  • Rubocop clean
  • Manual round-trip on local dev (semantic + classic mode, legacy #N body, mixed #N / #PROJ-N body)

@akabiru akabiru self-assigned this May 15, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

Deploying openproject with PullPreview

Field Value
Latest commit ce1681e
Job deploy
Status ✅ Deploy successful
Preview URL https://pr-23221-text-macro-prelo-ip-46-225-145-139.my.opf.run:443

View logs

@akabiru akabiru marked this pull request as ready for review May 15, 2026 07:25
@akabiru akabiru requested review from a team, Copilot and thykel May 15, 2026 07:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reinstates a document-level preload cache for Work Package (#N / #PROJ-N) text macros so that, in semantic mode, numeric references can reliably render with the canonical semantic label (e.g., #1234PROJ-7) without falling back to the input label.

Changes:

  • Adds a per-document WorkPackage lookup in ResourceLinksMatcher (stored in RequestStore) and updates the WP link handler to use it instead of per-reference DB lookups.
  • Wraps PatternMatcherFilter processing in matcher-level preload hooks, with save/restore semantics to support nested format_text calls.
  • Introduces a shared PreformattedBlocks helper for consistent <pre>/<code> ancestry skipping, and updates/extends specs to cover query bounds and nested preload restoration.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
spec/lib/open_project/text_formatting/matchers/link_handlers/work_packages_spec.rb Updates query-cost expectations (1 WP SELECT per render) and adds a regression test for nested preload save/restore behavior.
lib/open_project/text_formatting/preformatted_blocks.rb Adds a reusable helper to detect <pre>/<code> ancestry without relying on HTML::Pipeline filter helpers.
lib/open_project/text_formatting/matchers/resource_links_matcher.rb Implements RequestStore-backed doc-level preload for WP identifiers and alias folding to avoid N+1 lookups.
lib/open_project/text_formatting/matchers/link_handlers/work_packages.rb Switches WP resolution to use the preloaded cache and keeps a non-querying fallback label in classic/no-preload paths.
lib/open_project/text_formatting/filters/pattern_matcher_filter.rb Wraps node processing with matcher preload hooks and reuses the shared preformatted-block skipping set.

In semantic mode, every `#N` reference in legacy content needs a WP
record to render the consistent `formatted_id` label users opted into.
The per-match `find_by_display_id` fallback that ships in #23203 scales
linearly with reference count: a wiki page with 50 refs runs 50 indexed
PK lookups per render, and API collection endpoints / mailer fan-out /
journal feeds compound the multiplier across records.

`PatternMatcherFilter` now primes a `RequestStore`-backed lookup once
per document via `ResourceLinksMatcher.with_preloaded_resources`. The
link handler reads from `work_package_for(identifier)` so the same path
serves numeric and semantic input. Cost is one batched `WorkPackage`
SELECT per render, plus an alias-table SELECT only when historical
identifiers are referenced. Classic mode short-circuits before any
preload — `formatted_id` collapses to the numeric form, so the matched
id alone is enough.

The save/restore around nested `format_text` is retained: custom-field
formatters re-enter the pipeline mid-render and must not clobber the
outer document's lookup.

The earlier `MAX_PRELOAD_IDENTIFIERS` cap is intentionally omitted.
Silent truncation past position N would render the (N+1)th reference
as literal text — a regression of the feature itself. Postgres handles
several-thousand-bind `IN` clauses comfortably; the right safety net,
if one is needed later, is log-and-continue, not truncate.

`PreformattedBlocks` is restored so the preload visitor and the
matcher's text-node walk share one `<pre>` / `<code>` skip.

Follow-up to #23203
@akabiru akabiru force-pushed the feature/text-macro-preload-cache branch from c406fc6 to ce1681e Compare May 15, 2026 07:46
@akabiru akabiru removed request for a team and thykel May 15, 2026 10:44
@akabiru akabiru marked this pull request as draft May 15, 2026 12:24
@akabiru akabiru marked this pull request as ready for review May 15, 2026 23:30
@akabiru akabiru requested review from a team and thykel May 15, 2026 23:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants