Accept semantic work-package identifiers in text macros#23203
Open
akabiru wants to merge 20 commits into
Open
Conversation
ec22866 to
9cb7df8
Compare
5 tasks
f83491a to
c365ceb
Compare
76dc2e2 to
5feb9d3
Compare
3 tasks
In semantic mode, plain `#N` references inside formatted text rendered `<a href="/work_packages/N">#N</a>`. The link should carry the human-readable identifier on both the label and the href, matching how the `##N` quickinfo macro already renders via Angular. Two pieces: 1. `PatternMatcherFilter` gains an opt-in pre/cleanup hook around the per-node loop. Matchers that own per-render lookup caches (e.g. a batched WP load) implement `preload_for_doc` and `cleanup_after_doc` to populate and drop them. 2. `ResourceLinksMatcher` implements those hooks: it scans every text node for `#N` matches, runs a single batched `WorkPackage.where(id: ids)`, and exposes the result via a thread-isolated class attribute. The `WorkPackages` link handler reads from it to choose `wp.formatted_id` for the label and `wp.display_id` for the href. Falls back to the legacy `#N` shape when the WP isn't loadable (deleted, out of scope, or no preload ran). Visibility filtering is intentionally not introduced — the matcher links regardless of viewer permissions on the referenced WP, preserving pre-existing behaviour. Out of scope for this ticket. Refs https://community.openproject.org/work_packages/74315
Three changes from the rails-code-reviewer agent + the GitHub Copilot
reviewer:
1. Switch from `thread_mattr_accessor` to `RequestStore.store` for the
per-render WorkPackage lookup. Aligns with the existing `Cache`,
`Setting`, `CustomStyle`, and `WorkPackage#available_custom_field_key`
conventions in the codebase. Cleanup-on-request-end via
`RequestStore::Middleware` is defense in depth — the filter's explicit
save/restore is what actually scopes the cache to a single render.
2. Save/restore around the per-node loop, exposed via a single yielding
API on the matcher: `with_preloaded_resources(doc, context) { … }`.
The previous lookup is captured on entry and restored on exit, so a
nested `format_text` (custom-field formatter, recursive markdown
render, etc.) cannot clobber the outer render's lookup. The flat
`preload_for_doc`/`cleanup_after_doc` pair is replaced; the filter
recursively wraps the loop in each matcher's hook.
3. Wrap the lookup behind two class methods — `work_package_for(id)` for
the link handler, `with_preloaded_resources` for the filter — so the
RequestStore key stays a private constant. Future storage swaps don't
ripple to callers.
Plus the GitHub Copilot review's two findings:
- Skip the preload entirely in classic mode (`semantic_mode_active?`
guard). `display_id` and `formatted_id` collapse to the numeric form,
so the link handler renders the legacy shape from `wp_id` alone — no
DB load required, restoring pre-PR query-free behaviour. New spec
"classic mode is query-free" is the regression guard.
- `extract_work_package_id` now requires `prefix.nil?` so prefixed
matches like `version#3` and `message#12` don't contribute to the
preload set, mirroring `LinkHandlers::WorkPackages#applicable?`.
New specs:
- save/restore semantics: nested `with_preloaded_resources` calls
preserve the outer lookup and clear after the outer block exits.
- classic mode: `format_text("#1 #2 #3")` issues zero `work_packages`
SELECTs.
Refs #22976 (review)
Address follow-up polish on PR 22976 review: - Extract `OpenProject::TextFormatting::PreformattedBlocks` (BLOCKS set + ancestor? helper) so PatternMatcherFilter and ResourceLinksMatcher share a single source of truth for the `<pre>`/`<code>` ancestry skip. - Lift `parse_match(match)` so `process_match` and `extract_work_package_id` consume the same regex group → semantic name mapping. - `with_preloaded_resources` captures `previous` unconditionally so the `ensure` block no longer needs a `defined?(previous)` guard. - Preload `WorkPackage.where(id: ids).select(:id, :identifier)` only — `display_id`/`formatted_id` don't read other columns. - N+1 spec switches from `have_a_query_limit(1)` to a SQL-notification subscriber filtered to `FROM "work_packages"` SELECTs, avoiding false positives from incidental Setting/User/Project queries.
`#1234` text macros render `formatted_id`/`display_id` already, but the *input* side still requires the numeric primary key. Authors typing or pasting `#PROJ-1` (or `##PROJ-1` / `###PROJ-1`) in a comment, WP description, or meeting body would see literal text rather than a resolved link. Three changes that move together: 1. `ResourceLinksMatcher.regexp` — the hash-separator branch now accepts either the numeric shape `\d+` or the semantic shape `[A-Z][A-Z0-9_]*-\d+`, mirroring `WorkPackage::SemanticIdentifier:: ID_ROUTE_CONSTRAINT`. The revision branch (`r\d+`) stays numeric-only via a separate alternation. `parse_match` is the single site that maps the new regex group indices to semantic field names; everything else flows from there. 2. `LinkHandlers::WorkPackages#call` — splits into a numeric path (preserving the leading-zero rejection from before) and a semantic path. Semantic-shape input only links when `semantic_mode_active?` is true; classic instances render literal text. Plain `#PROJ-N` requires a cache hit (literal-text fallback when missing); `##PROJ-N` / `###PROJ-N` quickinfo elements emit unconditionally with `data-id` set to the user-facing identifier — APIv3 already resolves either shape, and the frontend Angular component handles missing WPs. Hover-card URLs now also speak `display_id` so the URL matches the user-facing identifier (the route accepts both shapes — see `HoverCardComponent#initialize`). 3. The preload cache extends to string keys via the `WorkPackage.where_display_id_in` batch finder added in #23016. `with_preloaded_resources` runs one WP SELECT for the common case (numerics + current semantic identifiers); historical alias references add a second targeted alias-table pluck, so an alias-heavy doc costs at most two round-trips per render. Specs cover: `#PROJ-N` resolves with formatted_id / display_id href in semantic mode; classic mode leaves it as literal text and issues zero WP SELECTs; `##/###` quickinfo carries `display_id` in `data-id`; mixed numeric+semantic resolves in 1 SELECT; alias references resolve in 2 round-trips; `#GHOST-99` falls through cleanly; nested `format_text` calls preserve outer save/restore semantics.
`Macros::Links` extends `ResourceLinksMatcher`, so the regex broadening in the previous commit also matches `#PROJ-1` shapes inside markdown that's about to be exported as PDF. Without an explicit guard, the PDF custom handler would silently emit `<mention data-id="0">` (since `"PROJ-1".to_i == 0`) — broken-looking output that's hard to attribute. Tighten `WorkPackagesLinkHandler#applicable?` to reject semantic-shape input. Override `call` so it doesn't fall through the parent's cache-driven path (PDF rendering walks Markly nodes via a separate pipeline that doesn't populate the per-render cache). Cite WP #74366 in the comment as the follow-up that adds semantic-id support to the PDF side via the Markly walk in `app/models/exports/pdf/common/macro.rb`. New specs assert `#PROJ-1` falls through to literal text in PDF output and never produces a `data-id="0"` mention, both alone and mixed with numeric references.
The regex now declares each capture group by name, so `parse_match` reads as a flat name-to-name map rather than "group 7 || 9 || 11" arithmetic. Adding or removing a branch later doesn't ripple through group numbers — historically that was the bug-window in this method. Also bring the class-level docstring up to date with `##N`/`###N` quickinfo macros (which were missing from the docs entirely) and the new `#PROJ-N` semantic-id forms.
Comments shouldn't read as commit-history narration or as reviewer shorthand. Each one now describes the present invariant in plain words: - Drop "Pre-PR-E behaviour" / "matching pre-PR behaviour" / "matches the pre-PR" — the spec and inline comments now state the current rule. - Drop "load-bearing", "already in this codebase", "we fall back" — say what the code does, not how a reviewer would describe it. - Reword the `semantic_id?` rationale to a forward-looking constraint (`don't tighten it`) instead of a comparative description of why this module owns it.
The five SELECT-counting examples each inlined the same `ActiveSupport::Notifications.subscribed` + CACHE/SCHEMA filter that `ActiveRecord::QueryRecorder` already provides via `spec/support/matchers/has_a_query_limit.rb`. Switch to the existing recorder and inline the table filters at the call sites — there are only two filter shapes used a handful of times, so a separate helper isn't justified. The user/role/author boilerplate moves into a `with author signed in` shared context so each describe defines only its own `project` and any work packages it needs. Failure messages on the size assertions are preserved.
The `value == value.to_i.to_s` round-trip check that filters leading-
zero ID forms ("0123") was duplicated across the WP link handler, the
PDF export macro, and the cost-query filter, and the matching guard in
`ResourceLinksMatcher#extract_work_package_identifier` was a no-op
(the regex's `\d+|[A-Z][A-Z0-9_]*-\d+` branch already gates the shape).
A new `WorkPackage::SemanticIdentifier.numeric_id?(value)` predicate
captures the canonical-numeric check at one site. It pairs with
`semantic_id?` as the WP-finder shape gate; the two answer different
questions (shape vs routing) and so are kept independent rather than
expressed as one another's negation.
The redundant `extract_work_package_identifier` guard is dropped along
with its misleading "rejecting leading-zero forms" comment — rejection
actually happens downstream in the link handler, where the new
predicate now reads as the contract it always was.
The matcher regex now accepts both numeric and semantic identifier shapes (`\d+|PROJ-\d+`) on a single alternation branch shared by the bare WP handler and the prefixed-resource handler. Without this guard, `version#PROJ-1` reaches HashSeparator with `oid = "PROJ-1".to_i = 0`, which is `present?`, so it issues `Version.find_by(id: 0)` — a guaranteed miss against the wrong table. The same shape applies to every prefix HashSeparator handles (document, message, meeting, project, user, group, view), turning a single pasted comment into one wasted SELECT per prefix. Gating `applicable?` on `numeric_id?(matcher.identifier)` short-circuits the handler before any DB lookup; the regex still matches and the text falls through to literal output. Regression test asserts zero `FROM "versions"` SELECTs for `version#PROJ-1`.
Cuts comments that restate code mechanics, narrate the journey, or pile on implementation detail. Genuine WHYs (perf rationale of the to_i.to_s round-trip, the alias second-query reason, leading-zero rejection, classic-mode preload skip) all stayed.
Replace duplicated `%w(# ## ###).include?(matcher.sep)` and `["##", "###"].include?(matcher.sep)` checks with named predicates on the WorkPackages link handler: HASH_TRIGGERS for the `#`/`##`/`###` family, hash_trigger? to gate the handler, and quickinfo?/detailed? to distinguish the three macro shapes. The PDF-export subclass picks both up via inheritance. "Trigger" replaces "separator" in the WP context: in `#1234` the `#` is a sigil that triggers mention recognition, not something that separates parts of the reference. The matcher's generic `sep` vocabulary stays — it still fits the `version#3` and `version:1.0.0` cases that other handlers own.
The doc-level preload that backs the macro pipeline reads identifiers straight out of user-pasted CKEditor content. Without a bound, a single multi-megabyte comment could push thousands of values into one `WHERE id IN (...)` and an alias `WHERE identifier IN (...)`, both built in memory and shipped in one query. `MAX_PRELOAD_IDENTIFIERS = 500` caps the per-render Set; references past the cap render via the link handler's cache-miss fallback (numeric → bare `#N` link, semantic → literal text). Spec stubs the constant to 2 and asserts the WP IN-list never exceeds it.
`lookup` is already keyed by `wp.id.to_s`, so the second index built inside `fold_in_alias_keys` was redundant. The alias-lookup branch can read straight from `lookup` using the plucked work_package_id.
S2's splat refactor of where_display_id_in (c24e3cf) accepts a flat list of values via `*values; values.flatten(1)`. Ruby's Array#flatten doesn't unroll non-Array Enumerables — passing a Set as a single arg leaves it boxed inside the values Array and the subsequent map(&:to_s) stringifies the whole Set as one entry. Splat at the call site unrolls the Set into varargs, restoring the batched-lookup behaviour.
The 500-identifier cap on doc-level work-package preload was defensive against a workload that has never been observed: a single rendered body containing 500+ distinct `#N` references. Work package bodies are unbounded `text`, but real comments and descriptions sit nowhere near that scale, and PostgreSQL handles IN-lists with thousands of bind parameters comfortably (the protocol ceiling is ~32k). The cost of the cap was not theoretical: once the limit was reached, references past it silently fell through to the cache-miss path and rendered as literal numeric ids — exactly the semantic-link breakage this PR exists to fix. A feature that ships "render semantic display ids in formatted text" cannot also ship "but only the first 500 per document." The pre-PR baseline had no preload at all (one SELECT per matched reference), so removing the cap is strictly an improvement over the shipped behaviour and a return to the pre-PR contract for pathological inputs. If a real workload ever surfaces, the right answer is render-path caching keyed on the rendered body, not a lossy truncation that quietly hides links from the reader.
`m` was below the three-character minimum. `matchdata` matches the base `RegexMatcher#process_match` signature, keeping the override consistent.
66da22d to
cbeaeca
Compare
The doc-level RequestStore-backed preload in PatternMatcherFilter + ResourceLinksMatcher batched all `#N` references into a single WorkPackage SELECT (with a sidecar WorkPackageSemanticAlias pluck for historical aliases). It is removed; the WP link handler now resolves each reference inline via WorkPackage.find_by_display_id, which already encapsulates "numeric PK or current identifier or historical alias", returning nil on miss so the existing literal-text fallback still kicks in for unresolved refs. Why drop it: the cache only changed user-visible output for one case (bare `#1234` plain links in semantic mode, where the formatted_id label needs the WP row). Quickinfo widgets fetch formatted_id from the API at render time; PDF export and the editor mention path bypass this filter entirely; autocomplete-stored mentions take a different code path. Realistic doc shape is <10 refs per render, so per-match find_by_display_id is fine and the document walk + IN-list batch + RequestStore save/restore + alias-key fold-in is complexity without a matching workload. Tradeoffs: - Each `#N` / `#PROJ-N` reference now issues its own SELECT, vs one batched SELECT before. - Historical-alias resolution loses its sidecar pluck; find_by_display_id performs `identifier = ? OR EXISTS(alias subquery)` in a single SELECT. - Classic mode `#N` rendering goes from zero queries (cache opted out) to one query per reference. Spec adjustments: - Drop the with_preloaded_resources nesting test (mechanism is gone). - Drop the classic-mode "query-free" test (no longer holds — this is the most concrete cost of the spike). - Update N+1 / mixed-refs tests to assert one SELECT per reference. - Update the historical-alias test to assert one WP SELECT, no alias-only SELECT.
The PreformattedBlocks extraction served two consumers (PatternMatcherFilter and ResourceLinksMatcher) before the cache spike; ResourceLinksMatcher's text-node walk is gone, so PatternMatcherFilter is the only caller. Restore the inline PREFORMATTED_BLOCKS constant matching dev's shape and remove the module.
39b0d7d to
b7396da
Compare
Deploying openproject with ⚡ PullPreview
|
3 tasks
akabiru
commented
May 15, 2026
| expect(wp_selects.size).to eq(5), | ||
| "expected one SELECT per reference, got #{wp_selects.size}:\n#{wp_selects.join("\n")}" | ||
| end | ||
| end |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR extends text macro handling so semantic work-package identifiers can be recognized in the markdown link pipeline while keeping PDF export numeric-only until separate semantic PDF support lands.
Changes:
- Broadens resource-link matching to accept semantic work-package identifier shapes on
#references. - Updates work-package link handling to resolve semantic identifiers and render quickinfo/link output consistently.
- Adds specs for semantic/classic behavior and PDF fallback behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
lib/open_project/text_formatting/matchers/resource_links_matcher.rb |
Updates macro regex parsing to support semantic work-package IDs. |
lib/open_project/text_formatting/matchers/link_handlers/work_packages.rb |
Adds semantic/numeric resolution paths for work-package references. |
lib/open_project/text_formatting/matchers/link_handlers/hash_separator.rb |
Adjusts prefixed hash-link applicability to reject non-numeric identifiers. |
app/models/work_package/exports/macros/links.rb |
Keeps PDF work-package macro handling constrained to numeric IDs. |
spec/lib/open_project/text_formatting/matchers/link_handlers/work_packages_spec.rb |
Adds coverage for semantic/classic work-package macro behavior. |
spec/models/exports/pdf/common/macro_spec.rb |
Adds coverage ensuring semantic-shaped PDF references stay literal. |
akabiru
added a commit
that referenced
this pull request
May 15, 2026
In semantic mode, every `#N` reference in legacy content needs a WP record to render the consistent `formatted_id` label users opted into. The per-match `find_by_display_id` fallback that ships in #23203 scales linearly with reference count: a wiki page with 50 refs runs 50 indexed PK lookups per render, and API collection endpoints / mailer fan-out / journal feeds compound the multiplier across records. `PatternMatcherFilter` now primes a `RequestStore`-backed lookup once per document via `ResourceLinksMatcher.with_preloaded_resources`. The link handler reads from `work_package_for(identifier)` so the same path serves numeric and semantic input. Cost is one batched `WorkPackage` SELECT per render, plus an alias-table SELECT only when historical identifiers are referenced. Classic mode short-circuits before any preload — `formatted_id` collapses to the numeric form, so the matched id alone is enough. The save/restore around nested `format_text` is retained: custom-field formatters re-enter the pipeline mid-render and must not clobber the outer document's lookup. The earlier `MAX_PRELOAD_IDENTIFIERS` cap is intentionally omitted. Silent truncation past position N would render the (N+1)th reference as literal text — a regression of the feature itself. Postgres handles several-thousand-bind `IN` clauses comfortably; the right safety net, if one is needed later, is log-and-continue, not truncate. `PreformattedBlocks` is restored so the preload visitor and the matcher's text-node walk share one `<pre>` / `<code>` skip. Follow-up to #23203
Contributor
`version#0123` previously resolved to version 123 via the `to_i` conversion that the link handler still uses. The earlier guard piggybacked on `WorkPackage::SemanticIdentifier.numeric_id?`, which treats canonical-shape strings only as numeric and so rejected leading-zero inputs along with the semantic shapes it was added to filter. The check is now a digit-only regex anchored to the full identifier. It admits any string that `to_i` parses to a positive integer (the existing renderers all dereference through `oid`), while still short-circuiting `version#PROJ-1` so prefixed resources don't issue `find_by(id: 0)`. The WP-side `numeric_id?` predicate stays intentionally stricter — `#123` should not resolve to WP 123 — so the two predicates diverge on purpose.
a70f389 to
c9d3b81
Compare
akabiru
added a commit
that referenced
this pull request
May 15, 2026
In semantic mode, every `#N` reference in legacy content needs a WP record to render the consistent `formatted_id` label users opted into. The per-match `find_by_display_id` fallback that ships in #23203 scales linearly with reference count: a wiki page with 50 refs runs 50 indexed PK lookups per render, and API collection endpoints / mailer fan-out / journal feeds compound the multiplier across records. `PatternMatcherFilter` now primes a `RequestStore`-backed lookup once per document via `ResourceLinksMatcher.with_preloaded_resources`. The link handler reads from `work_package_for(identifier)` so the same path serves numeric and semantic input. Cost is one batched `WorkPackage` SELECT per render, plus an alias-table SELECT only when historical identifiers are referenced. Classic mode short-circuits before any preload — `formatted_id` collapses to the numeric form, so the matched id alone is enough. The save/restore around nested `format_text` is retained: custom-field formatters re-enter the pipeline mid-render and must not clobber the outer document's lookup. The earlier `MAX_PRELOAD_IDENTIFIERS` cap is intentionally omitted. Silent truncation past position N would render the (N+1)th reference as literal text — a regression of the feature itself. Postgres handles several-thousand-bind `IN` clauses comfortably; the right safety net, if one is needed later, is log-and-continue, not truncate. `PreformattedBlocks` is restored so the preload visitor and the matcher's text-node walk share one `<pre>` / `<code>` skip. Follow-up to #23203
Copilot stopped work on behalf of
akabiru due to an error
May 15, 2026 07:48
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Ticket
https://community.openproject.org/wp/74948
What are you trying to accomplish?
Backend support for project-based semantic identifiers in the work-package text-macro pipeline. In semantic mode the markdown macros
#PROJ-1,##PROJ-1, and###PROJ-1resolve to a clickable mention and quickinfo widgets respectively. Numeric#1234keeps working in both modes.Two pieces ship here:
#branch, reusing the pattern that route constraints already use. One source of truth for routing, formatting, and the editor's markdown processor.Adjacent behaviour changes:
version#PROJ-1no longer issue a wastedVersion.find_by(id: 0)— the hash separator rejects semantic-shape input before resolution.Out of scope:
WorkPackagepreload that batches lookups for bodies with many references (stacked in Reinstate doc-level preload cache for WP text macros #23221).What approach did you choose and why?
Resolution is mode-gated. Classic instances render semantic-shaped input as literal text (a
/work_packages/PROJ-1). In semantic mode, plain#PROJ-1requires a resolvable work package — unresolvable references fall through to literal text rather than producing a broken link.##/###quickinfo elements emit unconditionally with the user-facing identifier indata-id, since the frontend already handles missing work packages on the hover-card path. Hover-card URLs use the same identifier so users see consistent paths everywhere.The hash and revision separators sit on independent branches in the regex — the semantic shape applies only to
#references, revisions stay numeric.Merge checklist
work_packages_spec.rb,in_tool_links_spec.rb,macro_spec.rbgreen#PROJ-1renders as literal text (nodata-id="0"mention)