[#71645] Convert instance to semantic identifiers#22765
Conversation
8f62ee3 to
e35cdd0
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces the background-job driven “semantic conversion” procedure for switching an instance from classic project/work package identifiers to semantic identifiers, including per-project conversion, a pending-projects finder, and completion handling.
Changes:
- Add
ProjectIdentifiers::ConvertInstanceToSemanticIdsJobbatching that enqueues per-project conversion jobs and a completion callback. - Add
ProjectIdentifiers::ConvertProjectToSemanticServiceplusPendingProjectsFinderto detect and convert remaining projects/WPs. - Add specs covering the jobs, finder, and conversion service scenarios (identifier fixes, sequence backfill, stale identifiers, alias seeding).
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| app/workers/project_identifiers/convert_instance_to_semantic_ids_job.rb | Enqueues per-project conversion jobs in a GoodJob batch with an on_success callback. |
| app/workers/project_identifiers/convert_project_to_semantic_ids_job.rb | Per-project ActiveJob wrapper delegating to the conversion service. |
| app/workers/project_identifiers/finish_semantic_conversion_job.rb | Batch success callback intended to finalize conversion and enable semantic mode. |
| app/services/project_identifiers/pending_projects_finder.rb | Computes the set of project IDs still needing conversion/backfill. |
| app/services/project_identifiers/convert_project_to_semantic_service.rb | Performs project identifier normalization, WP sequence/identifier backfill, and alias seeding. |
| app/services/work_packages/identifier_autofix/problematic_identifiers.rb | Exposes format_error_reason for in-memory format checks used by the conversion flow. |
| app/models/projects/semantic_identifier.rb | Adds previous_semantic_identifier helper to restore a historical semantic identifier when possible. |
| app/models/work_package_semantic_alias.rb | Adds upsert_rows helper for bulk alias insertion with uniqueness. |
| app/models/setting/work_package_identifier.rb | Adds enable_semantic! helper to switch the setting to semantic mode. |
| spec/workers/project_identifiers/convert_instance_to_semantic_ids_job_spec.rb | Tests batch enqueueing and callback wiring. |
| spec/workers/project_identifiers/convert_project_to_semantic_ids_job_spec.rb | Tests job delegates to conversion service. |
| spec/workers/project_identifiers/finish_semantic_conversion_job_spec.rb | Tests completion logic behavior and re-run behavior. |
| spec/services/project_identifiers/pending_projects_finder_spec.rb | Tests pending-project detection across identifier/WP states. |
| spec/services/project_identifiers/convert_project_to_semantic_service_spec.rb | Tests conversion behavior across key edge cases (restore, regenerate, stale/moved WPs, alias seeding). |
akabiru
left a comment
There was a problem hiding this comment.
Nice one 🙌🏾 I've got some feedback for your consideration 🍎 🍏 🍊
6ead97a to
cc475e6
Compare
There was a problem hiding this comment.
Pull request overview
Implements the background job/service flow for converting an OpenProject instance from classic to semantic work package identifiers by identifying “pending” projects, converting them in parallel, then running a final sweep and flipping the global setting.
Changes:
- Add GoodJob batch orchestration (
ConvertInstanceToSemanticIdsJob→ per-project jobs →FinishSemanticConversionJob) for the instance-wide conversion. - Introduce services to detect projects needing conversion (
PendingProjectsFinder) and convert an individual project (ConvertProjectToSemanticService). - Extend identifier format helpers and model scopes, with comprehensive RSpec coverage for the new behavior.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| app/workers/project_identifiers/convert_instance_to_semantic_ids_job.rb | Enqueues a GoodJob batch of per-project conversion jobs with a success callback. |
| app/workers/project_identifiers/convert_project_to_semantic_ids_job.rb | Per-project job delegating to the conversion service. |
| app/workers/project_identifiers/finish_semantic_conversion_job.rb | Callback job to run corrective sweeps and enable semantic mode. |
| app/services/project_identifiers/pending_projects_finder.rb | Computes which projects still require conversion/backfill. |
| app/services/project_identifiers/convert_project_to_semantic_service.rb | Project-level conversion: fix identifier, reset stale WP ids, backfill, seed aliases. |
| app/services/work_packages/identifier_autofix/problematic_identifiers.rb | Refactors identifier format classification into class helpers used by conversion. |
| app/models/work_package/semantic_identifier.rb | Adds scopes used to detect stale semantic identifiers and sequenced WPs. |
| app/models/projects/semantic_identifier.rb | Adds helper to restore a previous semantic identifier from FriendlyId history. |
| spec/workers/project_identifiers/convert_instance_to_semantic_ids_job_spec.rb | Tests batch enqueueing and callback wiring. |
| spec/workers/project_identifiers/convert_project_to_semantic_ids_job_spec.rb | Tests per-project job delegates to service. |
| spec/workers/project_identifiers/finish_semantic_conversion_job_spec.rb | Tests sweeps, error path, and setting flip behavior. |
| spec/services/project_identifiers/pending_projects_finder_spec.rb | Tests detection of pending projects across the three buckets. |
| spec/services/project_identifiers/convert_project_to_semantic_service_spec.rb | Tests identifier fixing, WP backfill/rewrites, and alias seeding. |
| spec/services/work_packages/identifier_autofix/problematic_identifiers_spec.rb | Tests new class-level format helpers and delegation. |
| spec/models/work_package/semantic_identifier_spec.rb | Tests the newly added scopes. |
There was a problem hiding this comment.
Looking good! I'm okay to approve this if we want to move forward- and solve the following issue separately.
I still noticed connection pooling issues when testing again with edge db (Projects- 1049, WorkPackages 17310). I had to re-run 3 times, and in each instance the success message was a false positive; only upon reloading and selecting the "semantic" option did it resurface. I wonder if the UI also needs to run a post health check; if there are projects not yet migrated, present that info to the use to rerun to complete the migration. It would be good to narrow down the root cause though- perhaps reduce the ConvertProjectToSemanticIdsJob perform_limit to 1 same as JiraProjectsMetaDataJob
| end | ||
|
|
||
| def call | ||
| ApplicationRecord.transaction do |
There was a problem hiding this comment.
🍎 I wonder if the pool exhaustion we're seeing empirically is partly driven here — ApplicationRecord.transaction wraps all four steps, and fix_identifier_if_needed already takes a transaction-level advisory lock. So for a project with tens of thousands of WPs we'd be holding both a pool connection and the semantic_identifier_generation advisory lock for the entire backfill + alias seed, serializing every other per-project job behind it.
Could we scope the outer transaction to just step 1 and let steps 2–4 run in their own shorter transactions? They each look idempotent on retry (non_semantic filter, sequence_number: nil filter, insert_all unique_by:), so narrower boundaries should be safe and would free the connection much sooner.
| private | ||
|
|
||
| def identifier_taken_by_other_project?(slug) | ||
| self.class.where.not(id:).exists?(["LOWER(identifier) = ?", slug.downcase]) |
There was a problem hiding this comment.
🍊 Nice helper overall — two things to sanity-check on identifier_taken_by_other_project?:
- It runs inside the
findblock (line 62), so it's oneEXISTSquery per slug in history — N+1 over slug history. Could be oneProject.where.not(id:).where("LOWER(identifier) IN (?)", slugs.map(&:downcase)).pluck(...)and thenfindin Ruby. - The check-then-use isn't atomic with the eventual
project.save!inassign_semantic_identifier. The advisory lock infix_identifier_if_neededblocks peer conversion jobs, but a user creating a project with that identifier through the UI isn't blocked. Belt-and-braces: rescueActiveRecord::RecordNotUniqueon save and fall back to generation.
There was a problem hiding this comment.
Point 1 addressed.
Gonna leave point 2 to a follow-up.
|
I dug into the pool exhaustion I kept hitting with claude-code, and it points to the root cause being in
I wonder if narrowing the outer transaction to just step 1 would unblock it; steps 2–4 look idempotent on retry. The backfill could probably be one To make it concrete, something like: def call
fix_identifier_if_needed # already takes its own advisory lock (transaction: true)
reset_stale_identifiers # idempotent: filters on non_semantic(project)
backfill_missing_ids # idempotent: filters on sequence_number: nil, atomic CTE
seed_alias_table # idempotent: insert_all unique_by: :identifier
end
def backfill_missing_ids
WorkPackage.connection.execute(<<~SQL.squish)
WITH candidates AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS offset
FROM work_packages
WHERE project_id = #{project.id} AND sequence_number IS NULL
),
bumped AS (
UPDATE projects
SET wp_sequence_counter = wp_sequence_counter + (SELECT COUNT(*) FROM candidates)
WHERE id = #{project.id}
RETURNING wp_sequence_counter - (SELECT COUNT(*) FROM candidates) AS base,
identifier
)
UPDATE work_packages wp
SET sequence_number = (SELECT base FROM bumped) + candidates.offset,
identifier = (SELECT identifier FROM bumped) || '-' ||
((SELECT base FROM bumped) + candidates.offset)::text
FROM candidates
WHERE wp.id = candidates.id
SQL
endOne round-trip instead of ~3N, atomic against concurrent user-triggered allocate_wp_semantic_identifier! (they'd see the already-bumped counter), and the per-WP advisory lock drops out during backfill. On concurrency — the CTE is safe today against user-traffic WP creates ( But the safety against concurrent conversion jobs for the same project rests entirely on def backfill_missing_ids
OpenProject::Mutex.with_advisory_lock(WorkPackage, "wp_sequence_#{project.id}") do
WorkPackage.connection.execute(<<~SQL.squish)
# … same CTE as above …
SQL
end
endOne advisory-lock acquisition per project instead of per WP — still roughly 17k× cheaper than today, and conversion + user traffic fully serialize on the same lock. |
This did not address it- I still hit connection pool errors + incomplete state with false positive |
judithroth
left a comment
There was a problem hiding this comment.
Thanks for splitting the reviews a bit! The code is very readable 👍
There's one topic that I see around saving and validating but otherwise it looks very good.
| project.identifier = new_identifier | ||
| # Bypass validation, because we're technically still in classic mode, so the model would be applying | ||
| # validation for classic identifiers. | ||
| project.save!(validate: false) |
There was a problem hiding this comment.
I get why you're doing it that way - the validations are for the "wrong" mode still.
However, there are validations that we should still consider because they are valuable. E.g. Projects::Identifier#identifier_not_reserved, which will do case-insensitive checks for collisions with reserved words, e.g. "new".
I think it would be possible to use the validations if we refactor them a bit. We just need a temporary switch:
module Projects::Identifier
validate :validate_identifier_format,
if: ->(p) { p.identifier_changed? && p.identifier.present?)
attr_accessor :force_identifier_format
# remove the two existing validations validate :identifier_numeric_format and validate :identifier_alphanumeric_format
private
def validate_identifier_format
if identifier_format == Setting::WorkPackageIdentifier::SEMANTIC
identifier_alphanumeric_format # rename to "validate_identifier_semantic_format"?
else
identifier_numeric_format # rename to "validate_identifier_classic_format"?
end
end
def identifier_format
force_identifier_format || Setting[:work_packages_identifier]
endUsage:
[2] pry(main)> Setting[:work_packages_identifier]
=> "semantic"
[3] pry(main)> project.identifier = "looooooooooooooooooooong"
=> "looooooooooooooooooooong"
[4] pry(main)> project.valid?
=> false
[5] pry(main)> project.force_identifier_format = Setting::WorkPackageIdentifier::CLASSIC
=> "classic"
[6] pry(main)> project.valid?
=> true
[7] pry(main)> project.save!
=> true
[8] pry(main)> project.reload.identifier
=> "looooooooooooooooooooong"
In addition to all that we maybe should make ProjectIdentifiers::IdentifierAutofix::ProjectIdentifierSuggestionGenerator aware of the reserved keywords, so none of them are chosen in the first place.
(However, I still don't like omitting validations completely - future developers will add validations for a reason and they might not be aware that they also have to adapt some generator to ensure whatever they want to ensure. We should at least check for errors on the attribute we want to change.)
There was a problem hiding this comment.
@thykel I ran the job locally and created a collision with an reserved identifier.
![]()
Let's treat this as a separate bug so we can get this merged but still have it fixed before it's released.
Maybe we also want to explore other options than the one I already suggested - we could also maybe move all the identifier validations to contracts, which could then be reused on the suggestion generator and in the project - so that people adding / changing it, will catch both usages.
There was a problem hiding this comment.
Here's the bug: https://community.openproject.org/wp/74161
There was a problem hiding this comment.
Thanks! Since you already created the WP, I am also inclined to fix this via a fresh PR.
The connection pool errors can not be solved with this PR. |
Argh 😅 So, the main problem I see here is that the project jobs are missing a retry strategy for the project job. I'm gonna add it to this to ensure that we can brute-force our way to success, and then deal with any other performance drawbacks in a follow-up PR, as we're not yet sure how reproducible this is. |
judithroth
left a comment
There was a problem hiding this comment.
Thanks for taking this on, Tom and addressing all our feedback. It's an interestingly complex topic to build this in a good way!
Since we agreed to do all that's left in separate work packages, I'll approve now as well 🙂
ba392a9 to
56f130d
Compare
Ticket
https://community.openproject.org/projects/communicator-stream/work_packages/71645/activity
What are you trying to accomplish?
Implement a procedure for converting OP instance from classic identifier mode to semantic identifier mode.
Screenshots
What approach did you choose and why?
ConvertInstanceToSemanticIdsJobdispatches a parallel batch of per-product conversion jobs.ConvertProjectToSemanticIdsJobdoes the following:After the batch is done,
FinishSemanticConversionJobdoes final cleanup after any system activity that might have occurred during the batch processing, then it finally flips the system switch.Merge checklist