Skip to content

fix(reindex): clear stale on-demand job before triggering reindex#28772

Merged
harshach merged 2 commits into
mainfrom
fix/reindex-ondemand-rerun
Jun 6, 2026
Merged

fix(reindex): clear stale on-demand job before triggering reindex#28772
harshach merged 2 commits into
mainfrom
fix/reindex-ondemand-rerun

Conversation

@mohityadav766
Copy link
Copy Markdown
Member

@mohityadav766 mohityadav766 commented Jun 5, 2026

Fixes #28773

The existing on-demand job left behind by a previous run was preventing further reindex runs. This clears it before triggering (and starts the scheduler in the CLI path).

🤖 Generated with Claude Code

A leftover on-demand job from a previous run that did not complete was
preventing further reindex runs. Start the scheduler in the CLI path and
clear any existing on-demand job before triggering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 17:08
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 5, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a failure mode in the CLI reindex flow where a persisted Quartz on-demand job from a previous crashed run prevents subsequent reindex runs. It does so by ensuring the scheduler is started in the CLI path and clearing any stale on-demand job/trigger before triggering the reindex application.

Changes:

  • Start AppScheduler in the CLI reindex entrypoints so on-demand triggers can actually be scheduled.
  • Clear any existing on-demand Quartz job/trigger before triggering Search/DataInsights reindex apps.
  • Add unit tests covering stale on-demand job behavior and best-effort cleanup semantics.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java Starts the scheduler in CLI reindex commands and deletes stale on-demand jobs before triggering reindex apps.
openmetadata-service/src/main/java/org/openmetadata/service/apps/scheduler/AppScheduler.java Adds deleteOnDemandJob(App) helper to clear persisted on-demand job/trigger state.
openmetadata-service/src/test/java/org/openmetadata/service/apps/scheduler/AppSchedulerTest.java Adds tests validating stale on-demand job cleanup and failure behavior without cleanup.

Address review: deleteOnDemandJob now skips the delete when the on-demand
job is in getCurrentlyExecutingJobs(), so a genuinely running reindex is
never cleared. Narrow the Javadoc to the actual CLI usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mohityadav766 mohityadav766 self-assigned this Jun 5, 2026
@mohityadav766 mohityadav766 added the To release Will cherry-pick this PR into the release branch label Jun 5, 2026
@mohityadav766 mohityadav766 removed the To release Will cherry-pick this PR into the release branch label Jun 5, 2026
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 5, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔴 Playwright Results — 1 failure(s), 10 flaky

✅ 4267 passed · ❌ 1 failed · 🟡 10 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 301 0 0 4
🟡 Shard 2 803 0 1 9
🔴 Shard 3 808 1 2 8
🟡 Shard 4 845 0 2 12
✅ Shard 5 721 0 0 47
🟡 Shard 6 789 0 5 8

Genuine Failures (failed on all attempts)

Flow/ExploreAggregationCountsMatching.spec.ts › should verify left panel counts and tab search results for normal search (shard 3)
Error: Tab "table" search total hits should match the aggregation count

�[2mexpect(�[22m�[31mreceived�[39m�[2m).�[22mtoBe�[2m(�[22m�[32mexpected�[39m�[2m) // Object.is equality�[22m

Expected: �[32m24�[39m
Received: �[31m158�[39m
🟡 10 flaky test(s) (passed on retry)
  • Features/Glossary/GlossaryTermRelationsGraph.spec.ts › search in the Relations Graph filters to matching node and its neighbours (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/IngestionBot.spec.ts › Ingestion bot should be able to access domain specific domain (shard 3, 2 retries)
  • Pages/CustomProperties.spec.ts › Sql Query (shard 4, 1 retry)
  • Pages/DomainAdvanced.spec.ts › Admin can edit data product description (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can edit teams from the user profile (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@harshach harshach merged commit e85f029 into main Jun 6, 2026
55 of 94 checks passed
@harshach harshach deleted the fix/reindex-ondemand-rerun branch June 6, 2026 16:09
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jun 6, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Clears stale on-demand reindex jobs before triggering new runs to prevent execution conflicts, resolving issues with unsafe job deletion and incorrect Javadoc documentation.

✅ 2 resolved
Edge Case: deleteOnDemandJob deletes job without checking if it is running

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/scheduler/AppScheduler.java:345-354 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/scheduler/AppScheduler.java:296-310 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/scheduler/AppScheduler.java:71 📄 openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java:1918 📄 openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java:2013
deleteOnDemandJob unconditionally calls scheduler.deleteJob/unscheduleJob for the on-demand JobKey/TriggerKey, then the reindex path immediately calls triggerOnDemandApplication. The Quartz scheduler is configured as clustered (org.quartz.jobStore.isClustered = true, AppScheduler.java:71). The non-concurrent guard inside triggerOnDemandApplication (AppScheduler.java:296-311) protects against re-triggering while a job is actually executing by inspecting getCurrentlyExecutingJobs(). By deleting the persisted job first and not checking getCurrentlyExecutingJobs(), deleteOnDemandJob removes that protection: if a previous on-demand reindex is genuinely still running on another cluster node (not merely a dead/stale leftover), the delete succeeds and a second concurrent reindex is scheduled even though SearchIndexingApplication has allowConcurrentExecution=false. Two simultaneous reindex runs can duplicate work or corrupt the target index. Consider guarding the delete: skip clearing (or fail) when the on-demand job appears in scheduler.getCurrentlyExecutingJobs(), so only truly stale (non-running) leftovers are removed.

Quality: Javadoc lists callers (server start, upgrade) that are not wired

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/scheduler/AppScheduler.java:339-345
The Javadoc on deleteOnDemandJob states it is "Used by maintenance/ops flows (server start, upgrade, CLI reindex)", but the only production callers added are the two CLI reindex paths in OpenMetadataOperations. There is no server-start or upgrade caller in the codebase. This is misleading documentation; either wire those flows or narrow the comment to describe the actual CLI reindex usage.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reindex CLI: stale on-demand job prevents further reindex runs

3 participants