Skip to content

[#73523] Implement WorkPackage semantic ID allocation system#22581

Merged
thykel merged 1 commit intodevfrom
impl/73373-semantic-registry-query-optimized
Apr 1, 2026
Merged

[#73523] Implement WorkPackage semantic ID allocation system#22581
thykel merged 1 commit intodevfrom
impl/73373-semantic-registry-query-optimized

Conversation

@thykel
Copy link
Copy Markdown
Contributor

@thykel thykel commented Mar 30, 2026

Ticket

https://community.openproject.org/projects/communicator-stream/work_packages/73436
https://community.openproject.org/projects/communicator-stream/work_packages/73523

What are you trying to accomplish?

Add support for semantic WP identifiers in the format of {project identifier}-{sequence number} (e.g. "PROJ-42").

The semantic WP identifier gets generated into the WorkPackage.identifier column via an after_create hook. This hook only functions if Setting::WorkPackageIdentifier is set to semantic (which is currently opt-in and hidden behind a feature flag).

The semantic WP identifiers have to be stable across time, i.e. even if the identifier changes multiple times (due to project rename or WP move), all old identifiers must still resolve to the same WP.

Additionally, we need to support "ghost identifiers", which are identifiers that have technically never been generated by the system, but we should still somehow support them to allow flexible aliases. Example:

  1. Project PROJ contains WP PROJ-1
  2. PROJ gets renamed to PROJ_NEW
  3. PROJ_NEW has a new WP added
  4. PROJ-2 needs to resolve to PROJ_NEW-2

What approach did you choose and why?

  • Add Project.wp_sequence_counter to track per-project WP sequences
  • Add WorkPackage.sequence_number to provide numeric representation of the sequence part of the ID
  • Add WorkPackage.identifier to reflect the current semantic ID of the work package
  • Add WorkPackageSemanticAlias to track all possible handles for each WP — both current and retired identifiers

The flow consists of the following steps, firing off a single query for the happy path, and 2 for expired/ghost identifiers.

  1. Resolve valid WP identifiers (fast path): Indexed lookup via work_packages.semantic_id (1 query).
  2. Resolve moved WPs & pre-rename identifiers (fallback): Indexed string lookup in WorkPackageSemanticAlias (1 additional query).
  3. 404.

Notes on initialization: The wp_sequence_counter, sequence_number, semantic_id, and alias records for existing work packages are initialized by the appropriate admin procedure when a user enables the semantic identifier option in admin settings. New work packages created after enabling the feature are handled automatically on insert/move.

Tradeoffs:

  • The alias table attempts to track all possible identifiers in the system, which can sometimes add up — e.g. if a project with 10k WPs gets its identifier changed once, the alias table now contains 10k + 10k records just for that specific project.
    • This is deemed acceptable as we are optimizing for API read performance and each row is quite tiny (just the handle + integer fkey). Additionally, this kind of design could be more future-proof for other identifier schemes.
    • Project rename operations use batched alias insertion and a single SQL REPLACE-based bulk update for semantic_id to keep rename cost bounded.
  • WorkPackage insert & move become a bit heavier as now they also have to write down aliases.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a semantic Work Package identifier system ({PROJECT}-{SEQ} like PROJ-42) that remains resolvable across project renames and work package moves by maintaining an alias registry, while also updating routing and services to use the new identifier forms.

Changes:

  • Adds DB support for per-project sequence allocation (projects.wp_sequence_counter), per-WP sequence/semantic ID (work_packages.sequence_number, work_packages.semantic_id), and a global alias registry (work_package_semantic_aliases).
  • Introduces model concerns to allocate/update semantic IDs on WP create/move and project rename, plus lookup helpers (find_by_identifier).
  • Updates routes/controller to accept semantic IDs in URLs and adds integration/model specs for the new behavior.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
db/migrate/20260326100000_create_work_package_semantic_ids.rb Adds columns, alias table, and indexes for semantic ID storage + lookup.
app/models/work_package/semantic_identifier.rb Implements semantic ID allocation, alias writes, and identifier resolution helpers.
app/models/projects/semantic_identifier.rb Implements project-rename propagation for semantic IDs and aliases.
app/models/work_package_semantic_alias.rb Adds alias model with validations/associations and documentation.
app/services/work_packages/update_service.rb Hooks WP move handling to update semantic IDs after project changes.
app/services/projects/update_service.rb Hooks project identifier changes to update semantic IDs/aliases.
app/controllers/work_packages_controller.rb Updates WP lookup to support semantic IDs.
config/routes.rb Broadens WP show route constraint to include semantic IDs.
config/locales/en.yml Adds i18n labels for the alias model attributes.
.rubocop.yml Allows new find_by_* methods under Rails/DynamicFindBy.
spec/services/work_packages/semantic_ids/integration_spec.rb End-to-end coverage for create/move/rename/multi-step scenarios.
spec/models/work_package/semantic_identifier_spec.rb Unit coverage for lookup and write-side behavior.
spec/models/work_package_semantic_alias_spec.rb Validations/association coverage for alias model.

Comment thread app/controllers/work_packages_controller.rb Outdated
Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread app/models/projects/semantic_identifier.rb Outdated
Comment thread app/models/work_package_semantic_alias.rb Outdated
Comment thread app/models/work_package_semantic_alias.rb Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comment thread db/migrate/20260330100000_create_work_package_semantic_ids.rb
Comment thread app/models/projects/semantic_identifier_operations.rb Outdated
Comment thread app/models/projects/semantic_identifier_operations.rb Outdated
Comment thread app/models/work_package/semantic_identifier.rb Outdated
@thykel thykel marked this pull request as ready for review March 30, 2026 11:11
@thykel thykel requested review from akabiru and judithroth March 30, 2026 11:14
Copy link
Copy Markdown
Contributor

@judithroth judithroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe I did not yet grasp every detail, but here are some first thoughts about this. It's coming along nice!

Comment thread app/controllers/work_packages_controller.rb Outdated
def rewrite_semantic_ids(like_pattern, prefix, new_prefix)
WorkPackage
.where("semantic_id LIKE ?", like_pattern)
.update_all(["semantic_id = REPLACE(semantic_id, ?, ?)", prefix, new_prefix])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the number of WorkPackages and the number of WorkPackageSemanticAliases should be roughly the same (differing only by the ones that were moved to another project) I wonder if we should use in_batches for both? (if we stay with doing it through ActiveRecord - if both could be done in the database directly that would also be nice).

Comment thread app/models/projects/semantic_identifier.rb Outdated
@@ -29,6 +29,7 @@
#++

class WorkPackage < ApplicationRecord
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I was thinking around the "batch-insert-and-change" code (see my last comments.

At first I thought we would do something like (horrible pseudo-code incoming):

transaction do
  wp.identifier = new_identifier
  wp.save!
  WorkPackageSemanticAliases.create!(identifier: new_identifier, work_package_id: id)
end

when we detect changes on the identifier.

But compared to your solution, this will be slower since we're going record by record. So I strongly prefer your solution!
However, I was thinking if we should somehow disallow changes to the identifier of a WorkPackage, to make sure, no one is messing with a WorkPackages identifier manually. On Projects, it's allowed to change the identifier of the record, so someone else could easily somehow not understand that just changing the identifier on a WorkPackage in isolation "breaks" our system.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I was thinking if we should somehow disallow changes to the identifier of a WorkPackage, to make sure, no one is messing with a WorkPackages identifier manually

I gave it some thought yesterday and couldn't figure out a way that didn't seem needlessly messy (e.g. introducing a thread-local lock variable that is momentarily bypassed in semantic helpers).

This would mainly be a warning to our developers, so I guess the closest thing that comes to mind is some extra custom validation on the model. But then, we would probably have to make it check some state, which already feels like an overkill. 🤔

Copy link
Copy Markdown
Contributor Author

@thykel thykel Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: If we decide to keep that sequence number field for pagination purposes, this is where an extra validation actually makes a lot of sense -- just a simple one-liner that verifies that both fields are set at once, as opposed to only identifier or only sequence number. That should address the denormalization concerns from another comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: If we decide to keep that sequence number field for pagination purposes, this is where an extra validation actually makes a lot of sense -- just a simple one-liner that verifies that both fields are set at once, as opposed to only identifier or only sequence number. That should address the denormalization concerns from another comment.

=> Add WP validator to ensure setting identifier + semantic number at the same time

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the follow-up work packages, Tom! I don't have a solution handy for the topic I raised in my original comment, but I still think this is important. So I added an open point for it: https://community.openproject.org/wp/73713

Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread db/migrate/20260326100000_create_work_package_semantic_ids.rb Outdated
Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread config/routes.rb Outdated
Comment thread db/migrate/20260330100000_create_work_package_semantic_ids.rb
Comment thread spec/models/work_package/semantic_identifier_operations_spec.rb Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Comment thread app/models/work_package/semantic_identifier.rb
Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread app/controllers/work_packages_controller.rb Outdated
Comment thread config/routes.rb Outdated
Comment thread config/routes.rb
@thykel thykel requested a review from judithroth March 30, 2026 22:02
Comment thread app/models/work_package/semantic_identifier.rb Outdated
Comment thread app/services/work_packages/update_service.rb
Comment thread config/routes.rb Outdated
Comment thread spec/models/work_package/semantic_identifier_spec.rb
Comment thread app/models/projects/semantic_identifier_operations.rb Outdated
@akabiru
Copy link
Copy Markdown
Member

akabiru commented Mar 31, 2026

🍊 55 commits for this PR (rubocop × 5, "cosmetics", "remove empty line", "rename", "lint"). Should be squashed before merge.

Comment thread app/models/projects/identifier.rb Outdated
@thykel
Copy link
Copy Markdown
Contributor Author

thykel commented Mar 31, 2026

🍊 55 commits for this PR (rubocop × 5, "cosmetics", "remove empty line", "rename", "lint"). Should be squashed before merge.

This is where I would really rather appreciate squash merge. 😢 If we do a cosmetic squash + force push just to have things cleaned up, on GitHub it often completely wipes out the linearity of PR discussion and makes the individual threads actually harder to track down.

If the merge policy is iron-clad, I'm considering closing this PR, re-pushing squashed version to a new PR, then merging that one instead.

(I know, the alternative is to also write "nicer" commits from the very beginning. But I think it's very rare that a long-running PR consists primarily of self-contained, meaningful and testable commits.)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comment thread app/models/work_package/semantic_identifier.rb
Comment thread app/models/projects/semantic_identifier.rb
@akabiru
Copy link
Copy Markdown
Member

akabiru commented Apr 1, 2026

🍊 55 commits for this PR (rubocop × 5, "cosmetics", "remove empty line", "rename", "lint"). Should be squashed before merge.

This is where I would really rather appreciate squash merge. 😢 If we do a cosmetic squash + force push just to have things cleaned up, on GitHub it often completely wipes out the linearity of PR discussion and makes the individual threads actually harder to track down.

If the merge policy is iron-clad, I'm considering closing this PR, re-pushing squashed version to a new PR, then merging that one instead.

(I know, the alternative is to also write "nicer" commits from the very beginning. But I think it's very rare that a long-running PR consists primarily of self-contained, meaningful and testable commits.)

AFAIK, it's not iron clad- we have loads of commits in dev that are less "nicer" :) - 🍏 your call

@thykel thykel changed the title [#73523] Implement WorkPackage semantic ID resolution system (query-optimized) [#73523] Implement WorkPackage semantic ID resolution system Apr 1, 2026
Copy link
Copy Markdown
Member

@akabiru akabiru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in, Tom! 🥇 :shipit:

Comment thread app/models/projects/semantic_identifier.rb
Comment thread app/models/project.rb
Comment thread spec/models/projects/semantic_identifier_spec.rb Outdated
Comment thread spec/models/projects/semantic_identifier_spec.rb
Comment thread spec/models/projects/semantic_identifier_spec.rb
Comment thread spec/models/projects/semantic_identifier_spec.rb
@thykel thykel force-pushed the impl/73373-semantic-registry-query-optimized branch from 144bf65 to 46027a4 Compare April 1, 2026 15:20
@thykel thykel force-pushed the impl/73373-semantic-registry-query-optimized branch from 46027a4 to c4debc8 Compare April 1, 2026 15:25
@thykel
Copy link
Copy Markdown
Contributor Author

thykel commented Apr 1, 2026

🍊 55 commits for this PR (rubocop × 5, "cosmetics", "remove empty line", "rename", "lint"). Should be squashed before merge.

This is where I would really rather appreciate squash merge. 😢 If we do a cosmetic squash + force push just to have things cleaned up, on GitHub it often completely wipes out the linearity of PR discussion and makes the individual threads actually harder to track down.

If the merge policy is iron-clad, I'm considering closing this PR, re-pushing squashed version to a new PR, then merging that one instead.

(I know, the alternative is to also write "nicer" commits from the very beginning. But I think it's very rare that a long-running PR consists primarily of self-contained, meaningful and testable commits.)

OK, after squashing this all into a single commit just now, I have to admit that at least the discussion doesn't get lost in the ether anymore. So, this is where GitHub seems to have improved over the last few years. 👍

@thykel thykel changed the title [#73523] Implement WorkPackage semantic ID resolution system [#73523] Implement WorkPackage semantic ID allocation system Apr 1, 2026
@thykel thykel merged commit e7bfb8f into dev Apr 1, 2026
16 of 17 checks passed
@thykel thykel deleted the impl/73373-semantic-registry-query-optimized branch April 1, 2026 15:53
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 1, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants