[#71645] Convert instance to semantic identifiers #22765

Merged
thykel merged 1 commit into dev from feature/71654-semantic-converter-2 on Apr 21, 2026

Contributor

@thykel thykel commented Apr 14, 2026

Ticket

https://community.openproject.org/projects/communicator-stream/work_packages/71645/activity

What are you trying to accomplish?

Implement a procedure for converting an OpenProject (OP) instance from classic identifier mode to semantic identifier mode.

Screenshots

What approach did you choose and why?

ConvertInstanceToSemanticIdsJob dispatches a parallel batch of per-project conversion jobs.

ConvertProjectToSemanticIdsJob does the following:

  • Fix the project identifier if it is not in valid semantic format.
    • If there was an identifier generated in the past (e.g. when the instance was set to semantic some time ago), we reuse it.
  • Rewrite stale WP identifiers whose prefix no longer matches the project.
    • These are usually leftovers from a previous semantic mode: WPs that were moved to another project while the instance was in numeric mode.
  • Assign sequence numbers to WPs that have none yet (idempotent)
  • Seed the alias table for all historical project identifier prefixes

After the batch is done, FinishSemanticConversionJob cleans up after any system activity that occurred during batch processing, then flips the system switch to semantic mode.
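
For illustration, here is a minimal plain-Ruby sketch of the "rewrite stale identifiers" and "assign missing sequence numbers" steps. It is not the code from this PR: the `Project` and `WorkPackage` structs, the `counter` field, and `convert_work_packages` are hypothetical in-memory stand-ins, and resetting the sequence number of a stale WP is an assumption. The point is only to show why a second run is a no-op.

```ruby
# Hypothetical in-memory stand-ins; the real models are ActiveRecord classes.
Project = Struct.new(:identifier, :counter, :work_packages)
WorkPackage = Struct.new(:id, :sequence_number, :identifier)

# Applies the stale-identifier rewrite and the sequence backfill to one project.
def convert_work_packages(project)
  prefix = "#{project.identifier}-"

  project.work_packages.each do |wp|
    # A stale identifier carries another project's prefix, so reset it
    # (assumption: the sequence number is reset along with it).
    if wp.identifier && !wp.identifier.start_with?(prefix)
      wp.identifier = nil
      wp.sequence_number = nil
    end

    # Only WPs without a sequence number are touched, which makes re-runs safe.
    next unless wp.sequence_number.nil?

    project.counter += 1
    wp.sequence_number = project.counter
    wp.identifier = "#{prefix}#{wp.sequence_number}"
  end
  project
end

project = Project.new("PROJ", 0, [
  WorkPackage.new(1, nil, nil),     # never converted
  WorkPackage.new(2, 7, "OLD-7")    # moved in from another project
])

convert_work_packages(project)
first_pass = project.work_packages.map(&:identifier)

convert_work_packages(project)      # second run finds nothing left to do
second_pass = project.work_packages.map(&:identifier)
```

Both passes yield the same identifiers and the counter is bumped only once per WP, which is the property the retry and final-sweep logic relies on.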

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

@thykel thykel force-pushed the feature/71654-semantic-converter-2 branch from 8f62ee3 to e35cdd0 Compare April 14, 2026 18:43
@thykel thykel requested a review from Copilot April 14, 2026 20:09
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces the background-job driven “semantic conversion” procedure for switching an instance from classic project/work package identifiers to semantic identifiers, including per-project conversion, a pending-projects finder, and completion handling.

Changes:

  • Add ProjectIdentifiers::ConvertInstanceToSemanticIdsJob batching that enqueues per-project conversion jobs and a completion callback.
  • Add ProjectIdentifiers::ConvertProjectToSemanticService plus PendingProjectsFinder to detect and convert remaining projects/WPs.
  • Add specs covering the jobs, finder, and conversion service scenarios (identifier fixes, sequence backfill, stale identifiers, alias seeding).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| app/workers/project_identifiers/convert_instance_to_semantic_ids_job.rb | Enqueues per-project conversion jobs in a GoodJob batch with an on_success callback. |
| app/workers/project_identifiers/convert_project_to_semantic_ids_job.rb | Per-project ActiveJob wrapper delegating to the conversion service. |
| app/workers/project_identifiers/finish_semantic_conversion_job.rb | Batch success callback intended to finalize conversion and enable semantic mode. |
| app/services/project_identifiers/pending_projects_finder.rb | Computes the set of project IDs still needing conversion/backfill. |
| app/services/project_identifiers/convert_project_to_semantic_service.rb | Performs project identifier normalization, WP sequence/identifier backfill, and alias seeding. |
| app/services/work_packages/identifier_autofix/problematic_identifiers.rb | Exposes format_error_reason for in-memory format checks used by the conversion flow. |
| app/models/projects/semantic_identifier.rb | Adds previous_semantic_identifier helper to restore a historical semantic identifier when possible. |
| app/models/work_package_semantic_alias.rb | Adds upsert_rows helper for bulk alias insertion with uniqueness. |
| app/models/setting/work_package_identifier.rb | Adds enable_semantic! helper to switch the setting to semantic mode. |
| spec/workers/project_identifiers/convert_instance_to_semantic_ids_job_spec.rb | Tests batch enqueueing and callback wiring. |
| spec/workers/project_identifiers/convert_project_to_semantic_ids_job_spec.rb | Tests job delegates to conversion service. |
| spec/workers/project_identifiers/finish_semantic_conversion_job_spec.rb | Tests completion logic behavior and re-run behavior. |
| spec/services/project_identifiers/pending_projects_finder_spec.rb | Tests pending-project detection across identifier/WP states. |
| spec/services/project_identifiers/convert_project_to_semantic_service_spec.rb | Tests conversion behavior across key edge cases (restore, regenerate, stale/moved WPs, alias seeding). |

Comment thread app/workers/project_identifiers/finish_semantic_conversion_job.rb Outdated
Comment thread app/services/work_packages/identifier_autofix/problematic_identifiers.rb Outdated
Comment thread app/models/projects/semantic_identifier.rb Outdated
Comment thread app/workers/project_identifiers/finish_semantic_conversion_job.rb Outdated
@thykel thykel changed the title from "[#71645] Add the semantic conversion procedure" to "[#71645] Convert instance to semantic identifiers" Apr 15, 2026
@thykel thykel marked this pull request as ready for review April 15, 2026 21:15
@thykel thykel requested review from akabiru and judithroth April 15, 2026 21:33
Member

@akabiru akabiru left a comment

Nice one 🙌🏾 I've got some feedback for your consideration 🍎 🍏 🍊

Comment thread app/models/projects/semantic_identifier.rb Outdated
Comment thread app/models/setting/work_package_identifier.rb Outdated
Comment thread app/services/project_identifiers/convert_project_to_semantic_service.rb Outdated
Comment thread app/services/project_identifiers/pending_projects_finder.rb Outdated
Comment thread app/services/project_identifiers/pending_projects_finder.rb Outdated
Comment thread app/workers/project_identifiers/convert_instance_to_semantic_ids_job.rb Outdated
Comment thread app/workers/project_identifiers/finish_semantic_conversion_job.rb Outdated
Comment thread app/services/project_identifiers/convert_project_to_semantic_service.rb Outdated
Comment thread app/models/work_package_semantic_alias.rb Outdated
@thykel thykel force-pushed the feature/71654-semantic-converter-1 branch from 6ead97a to cc475e6 Compare April 16, 2026 18:35
@thykel thykel requested review from akabiru and Copilot April 16, 2026 19:10
Base automatically changed from feature/71654-semantic-converter-1 to dev April 16, 2026 19:13
Contributor

Copilot AI left a comment

Pull request overview

Implements the background job/service flow for converting an OpenProject instance from classic to semantic work package identifiers by identifying “pending” projects, converting them in parallel, then running a final sweep and flipping the global setting.

Changes:

  • Add GoodJob batch orchestration (ConvertInstanceToSemanticIdsJob → per-project jobs → FinishSemanticConversionJob) for the instance-wide conversion.
  • Introduce services to detect projects needing conversion (PendingProjectsFinder) and convert an individual project (ConvertProjectToSemanticService).
  • Extend identifier format helpers and model scopes, with comprehensive RSpec coverage for the new behavior.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| app/workers/project_identifiers/convert_instance_to_semantic_ids_job.rb | Enqueues a GoodJob batch of per-project conversion jobs with a success callback. |
| app/workers/project_identifiers/convert_project_to_semantic_ids_job.rb | Per-project job delegating to the conversion service. |
| app/workers/project_identifiers/finish_semantic_conversion_job.rb | Callback job to run corrective sweeps and enable semantic mode. |
| app/services/project_identifiers/pending_projects_finder.rb | Computes which projects still require conversion/backfill. |
| app/services/project_identifiers/convert_project_to_semantic_service.rb | Project-level conversion: fix identifier, reset stale WP ids, backfill, seed aliases. |
| app/services/work_packages/identifier_autofix/problematic_identifiers.rb | Refactors identifier format classification into class helpers used by conversion. |
| app/models/work_package/semantic_identifier.rb | Adds scopes used to detect stale semantic identifiers and sequenced WPs. |
| app/models/projects/semantic_identifier.rb | Adds helper to restore a previous semantic identifier from FriendlyId history. |
| spec/workers/project_identifiers/convert_instance_to_semantic_ids_job_spec.rb | Tests batch enqueueing and callback wiring. |
| spec/workers/project_identifiers/convert_project_to_semantic_ids_job_spec.rb | Tests per-project job delegates to service. |
| spec/workers/project_identifiers/finish_semantic_conversion_job_spec.rb | Tests sweeps, error path, and setting flip behavior. |
| spec/services/project_identifiers/pending_projects_finder_spec.rb | Tests detection of pending projects across the three buckets. |
| spec/services/project_identifiers/convert_project_to_semantic_service_spec.rb | Tests identifier fixing, WP backfill/rewrites, and alias seeding. |
| spec/services/work_packages/identifier_autofix/problematic_identifiers_spec.rb | Tests new class-level format helpers and delegation. |
| spec/models/work_package/semantic_identifier_spec.rb | Tests the newly added scopes. |

Comment thread app/models/projects/semantic_identifier.rb Outdated
@thykel thykel requested a review from akabiru April 19, 2026 21:39
Member

@akabiru akabiru left a comment

Looking good! I'm okay to approve this if we want to move forward and solve the following issue separately.

I still noticed connection pooling issues when testing again with edge db (Projects: 1049, WorkPackages: 17310). I had to re-run 3 times, and in each instance the success message was a false positive; only upon reloading and selecting the "semantic" option did it resurface. I wonder if the UI also needs to run a post health check: if there are projects not yet migrated, present that info to the user so they can rerun and complete the migration. It would be good to narrow down the root cause though- perhaps reduce the ConvertProjectToSemanticIdsJob perform_limit to 1 same as JiraProjectsMetaDataJob


[Screenshots: 2026-04-20 at 3:25:06 PM, 3:25:33 PM, and 3:26:08 PM]

Comment thread app/models/projects/semantic_identifier.rb Outdated
Comment thread app/services/project_identifiers/convert_project_to_semantic_service.rb Outdated
Comment thread app/services/project_identifiers/identifier_autofix.rb
Comment thread app/services/project_identifiers/pending_projects_finder.rb
Comment thread app/services/project_identifiers/pending_projects_finder.rb Outdated
Comment thread app/workers/project_identifiers/finish_semantic_conversion_job.rb
end

def call
ApplicationRecord.transaction do
Member

🍎 I wonder if the pool exhaustion we're seeing empirically is partly driven here — ApplicationRecord.transaction wraps all four steps, and fix_identifier_if_needed already takes a transaction-level advisory lock. So for a project with tens of thousands of WPs we'd be holding both a pool connection and the semantic_identifier_generation advisory lock for the entire backfill + alias seed, serializing every other per-project job behind it.

Could we scope the outer transaction to just step 1 and let steps 2–4 run in their own shorter transactions? They each look idempotent on retry (non_semantic filter, sequence_number: nil filter, insert_all unique_by:), so narrower boundaries should be safe and would free the connection much sooner.

Comment thread app/services/project_identifiers/convert_project_to_semantic_service.rb Outdated
private

def identifier_taken_by_other_project?(slug)
self.class.where.not(id:).exists?(["LOWER(identifier) = ?", slug.downcase])
Member

🍊 Nice helper overall — two things to sanity-check on identifier_taken_by_other_project?:

  1. It runs inside the find block (line 62), so it's one EXISTS query per slug in history — N+1 over slug history. Could be one Project.where.not(id:).where("LOWER(identifier) IN (?)", slugs.map(&:downcase)).pluck(...) and then find in Ruby.
  2. The check-then-use isn't atomic with the eventual project.save! in assign_semantic_identifier. The advisory lock in fix_identifier_if_needed blocks peer conversion jobs, but a user creating a project with that identifier through the UI isn't blocked. Belt-and-braces: rescue ActiveRecord::RecordNotUnique on save and fall back to generation.
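
A rough plain-Ruby sketch of point 1, assuming the identifiers used by other projects have already been fetched in one batched query. `first_free_slug` and both arrays are hypothetical names for illustration, not part of the PR.

```ruby
require "set"

# Returns the first historical slug that no other project currently uses,
# comparing case-insensitively. `taken_identifiers` stands in for the result
# of a single batched lookup (hypothetical; the PR would query the database).
def first_free_slug(slug_history, taken_identifiers)
  taken = taken_identifiers.map(&:downcase).to_set
  slug_history.find { |slug| !taken.include?(slug.downcase) }
end

history = %w[Alpha beta GAMMA]
taken_elsewhere = %w[alpha BETA]

free_slug = first_free_slug(history, taken_elsewhere)
```

When every slug in the history is taken, `find` returns `nil`, which is where the caller would fall back to generating a fresh identifier.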

Contributor Author

@thykel thykel Apr 21, 2026

Point 1 addressed.

Gonna leave point 2 to a follow-up.

Member

akabiru commented Apr 20, 2026

I dug into the pool exhaustion I kept hitting with claude-code, and it points to the root cause being in ConvertProjectToSemanticService.

#call wraps all four steps in one transaction, and step 1 takes the semantic_identifier_generation advisory lock, so one pool connection and the global lock end up held for the entire per-project run. On top of that, backfill_missing_ids is a find_each loop doing per-WP advisory lock + UPDATE RETURNING + update_columns — roughly 50k round-trips for a 17k-WP project, all while other jobs wait behind the outer lock.

I wonder if narrowing the outer transaction to just step 1 would unblock it; steps 2–4 look idempotent on retry. The backfill could probably be one UPDATE … FROM (… ROW_NUMBER()) that bumps the counter by the batch size in one go. The per-WP wp_sequence_#{id} lock looks redundant given perform_limit: 1 plus the outer lock.


To make it concrete, something like:

  def call
    fix_identifier_if_needed   # already takes its own advisory lock (transaction: true)
    reset_stale_identifiers    # idempotent: filters on non_semantic(project)
    backfill_missing_ids       # idempotent: filters on sequence_number: nil, atomic CTE
    seed_alias_table           # idempotent: insert_all unique_by: :identifier
  end

  def backfill_missing_ids
    WorkPackage.connection.execute(<<~SQL.squish)
      WITH candidates AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS offset
        FROM work_packages
        WHERE project_id = #{project.id} AND sequence_number IS NULL
      ),
      bumped AS (
        UPDATE projects
           SET wp_sequence_counter = wp_sequence_counter + (SELECT COUNT(*) FROM candidates)
         WHERE id = #{project.id}
         RETURNING wp_sequence_counter - (SELECT COUNT(*) FROM candidates) AS base,
                   identifier
      )
      UPDATE work_packages wp
         SET sequence_number = (SELECT base FROM bumped) + candidates.offset,
             identifier      = (SELECT identifier FROM bumped) || '-' ||
                               ((SELECT base FROM bumped) + candidates.offset)::text
        FROM candidates
       WHERE wp.id = candidates.id
    SQL
  end

One round-trip instead of ~3N, atomic against concurrent user-triggered allocate_wp_semantic_identifier! (they'd see the already-bumped counter), and the per-WP advisory lock drops out during backfill.
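
To make the counter arithmetic in that CTE easier to check, here is a plain-Ruby model of it with no database involved. The `backfill` method and its keyword arguments are hypothetical; it only mirrors the `base + ROW_NUMBER()` calculation and the `identifier || '-' || number` format from the SQL above.

```ruby
# Plain-Ruby model of the CTE arithmetic. Returns the bumped counter and a
# hash of WP id => [sequence_number, identifier].
def backfill(counter:, identifier:, candidate_ids:)
  base = counter                            # counter value before the bump
  bumped = counter + candidate_ids.size     # one counter update for the whole batch

  assigned = candidate_ids.sort.each_with_index.map do |wp_id, index|
    sequence = base + index + 1             # ROW_NUMBER() starts at 1
    [wp_id, [sequence, "#{identifier}-#{sequence}"]]
  end.to_h

  [bumped, assigned]
end

bumped, assigned = backfill(counter: 10, identifier: "PROJ", candidate_ids: [42, 7, 19])
```

With a counter of 10 and three candidates, the counter ends at 13 and the WPs, ordered by id, receive sequence numbers 11, 12, and 13.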


On concurrency — the CTE is safe today against user-traffic WP creates (UPDATE projects SET wp_sequence_counter = wp_sequence_counter + ? serializes via PostgreSQL's row-level lock; MVCC + the after_create-inside-tx semantics mean a concurrent insert is either visible with its seq already assigned or invisible until the user's own allocate handles it).

But the safety against concurrent conversion jobs for the same project rests entirely on perform_limit: 1 rather than a mutex. To not regress that if the limit ever loosens, worth wrapping the CTE in the same per-project advisory lock that allocate_wp_semantic_identifier! already uses:

  def backfill_missing_ids
    OpenProject::Mutex.with_advisory_lock(WorkPackage, "wp_sequence_#{project.id}") do
      WorkPackage.connection.execute(<<~SQL.squish)
        # … same CTE as above …
      SQL
    end
  end

One advisory-lock acquisition per project instead of per WP — still roughly 17k× cheaper than today, and conversion + user traffic fully serialize on the same lock.

Member

akabiru commented Apr 20, 2026

It would be good to narrow down the root cause though- perhaps reduce the ConvertProjectToSemanticIdsJob perform_limit to 1 same as JiraProjectsMetaDataJob

This did not address it- I still hit connection pool errors + incomplete state with false positive

Contributor

@judithroth judithroth left a comment

Thanks for splitting the reviews a bit! The code is very readable 👍
There's one topic that I see around saving and validating but otherwise it looks very good.

Comment thread app/services/project_identifiers/pending_projects_finder.rb Outdated
Comment thread app/workers/project_identifiers/finish_semantic_conversion_job.rb Outdated
project.identifier = new_identifier
# Bypass validation, because we're technically still in classic mode, so the model would be applying
# validation for classic identifiers.
project.save!(validate: false)
Contributor

I get why you're doing it that way - the validations are for the "wrong" mode still.
However, there are validations that we should still consider because they are valuable. E.g. Projects::Identifier#identifier_not_reserved, which will do case-insensitive checks for collisions with reserved words, e.g. "new".

I think it would be possible to use the validations if we refactor them a bit. We just need a temporary switch:

module Projects::Identifier

validate :validate_identifier_format,
  if: ->(p) { p.identifier_changed? && p.identifier.present? }

attr_accessor :force_identifier_format

# remove the two existing validations validate :identifier_numeric_format and validate :identifier_alphanumeric_format

private

  def validate_identifier_format
    if identifier_format == Setting::WorkPackageIdentifier::SEMANTIC
      identifier_alphanumeric_format # rename to "validate_identifier_semantic_format"?
    else
      identifier_numeric_format # rename to "validate_identifier_classic_format"?
    end
  end

  def identifier_format
    force_identifier_format || Setting[:work_packages_identifier]
  end

Usage:

[2] pry(main)> Setting[:work_packages_identifier]
=> "semantic"
[3] pry(main)> project.identifier = "looooooooooooooooooooong"
=> "looooooooooooooooooooong"
[4] pry(main)> project.valid?
=> false
[5] pry(main)> project.force_identifier_format = Setting::WorkPackageIdentifier::CLASSIC
=> "classic"
[6] pry(main)> project.valid?
=> true
[7] pry(main)> project.save!
=> true
[8] pry(main)> project.reload.identifier
=> "looooooooooooooooooooong"

In addition to all that we maybe should make ProjectIdentifiers::IdentifierAutofix::ProjectIdentifierSuggestionGenerator aware of the reserved keywords, so none of them are chosen in the first place.
(However, I still don't like omitting validations completely - future developers will add validations for a reason and they might not be aware that they also have to adapt some generator to ensure whatever they want to ensure. We should at least check for errors on the attribute we want to change.)
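
As a small illustration of making a generator skip reserved words, here is a plain-Ruby sketch. Everything in it is hypothetical: the `RESERVED_IDENTIFIERS` list contains only "new" because that is the one example named in this thread, and `pick_identifier` is not the API of ProjectIdentifierSuggestionGenerator.

```ruby
# Hypothetical reserved list; only "new" is named in this thread.
RESERVED_IDENTIFIERS = %w[new].freeze

# Case-insensitive check, mirroring how the existing validation is described.
def reserved_identifier?(candidate)
  RESERVED_IDENTIFIERS.include?(candidate.downcase)
end

# Picks the first suggestion that does not collide with a reserved word.
def pick_identifier(suggestions)
  suggestions.find { |candidate| !reserved_identifier?(candidate) }
end

chosen = pick_identifier(%w[NEW NEW2])
```

If the reserved list lived in one shared place, both the model validation and the generator could read from it, which is the concern raised about future validations drifting apart.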

Contributor

@thykel I ran the job locally and created a collision with a reserved identifier (screenshot attached).

Let's treat this as a separate bug so we can get this merged but still have it fixed before it's released.

Maybe we also want to explore options other than the one I already suggested. We could move all the identifier validations to contracts, which could then be reused by the suggestion generator and by the project, so that people adding or changing a validation will catch both usages.

Contributor Author

@thykel thykel Apr 21, 2026

Thanks! Since you already created the WP, I am also inclined to fix this via a fresh PR.

Contributor

judithroth commented Apr 21, 2026

It would be good to narrow down the root cause though- perhaps reduce the ConvertProjectToSemanticIdsJob perform_limit to 1 same as JiraProjectsMetaDataJob

This did not address it- I still hit connection pool errors + incomplete state with false positive

The connection pool errors cannot be solved with this PR.
The connection pool is the number of database connections the app allows. It can be configured in config/database.yml. If you use multiple workers, you have to increase its size, because each worker needs its own connections. Usually that is one connection per thread. I don't know how GoodJob handles this; you would need to look it up (I don't have the time for it right now).

Contributor Author

thykel commented Apr 21, 2026

I still noticed connection pooling issues when testing again with edge db

Argh 😅

So, the main problem I see here is that the per-project jobs are missing a retry strategy. I'm gonna add one to this PR to ensure that we can brute-force our way to success, and then deal with any other performance drawbacks in a follow-up PR, as we're not yet sure how reproducible this is.
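
For readers unfamiliar with the idea, a retry strategy can be sketched in plain Ruby as below. This is not the code added to the PR: in an ActiveJob class this would normally be declared with `retry_on`, and the `with_retries` helper, its parameters, and the simulated error are all hypothetical.

```ruby
# Runs the block, retrying on StandardError up to `attempts` times in total.
# `sleeper` is injectable so the example can run without actually waiting.
def with_retries(attempts: 3, base_wait: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError
    raise if tries >= attempts                    # give up after the last attempt
    sleeper.call(base_wait * (2**(tries - 1)))    # exponential backoff
    retry
  end
end

waits = []
result = with_retries(attempts: 3, sleeper: ->(seconds) { waits << seconds }) do |try|
  raise "connection pool timeout" if try < 3      # fail twice, then succeed
  :converted
end
```

Retries only help here because the per-project steps are idempotent, so a job that fails halfway can simply run again.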

Member

@akabiru akabiru left a comment

Looking good to me, thanks for addressing all the feedback!

⛑️ As discussed, connection pool topics will be handled via followup pr. 👍🏾

Contributor

@judithroth judithroth left a comment

Thanks for taking this on, Tom, and for addressing all our feedback. It's an interestingly complex topic to build this in a good way!

Since we agreed to do all that's left in separate work packages, I'll approve now as well 🙂

@thykel thykel force-pushed the feature/71654-semantic-converter-2 branch from ba392a9 to 56f130d Compare April 21, 2026 17:34
@thykel thykel merged commit bcc926a into dev Apr 21, 2026
16 of 17 checks passed
@thykel thykel deleted the feature/71654-semantic-converter-2 branch April 21, 2026 18:02
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 21, 2026

4 participants