feat(workflows): add onConflict trigger policy with retrigger-as-branch by IceS2 · Pull Request #26865 · open-metadata/OpenMetadata

IceS2 · 2026-03-30T15:41:54Z

Summary

- Adds a configurable onConflict policy to the eventBasedEntityTrigger config with three modes: restart (default, existing behavior), skip (keep the existing instance), and forward (deliver a retrigger status to the active ManualTask).
- forward mode delivers a "retrigger" status through the ManualTask's existing IntermediateCatchEvent path, so no BPMN structural changes are needed.
- ManualTask exposes "retrigger" as a routable branch only when the trigger uses onConflict=forward, controlled via retriggerEnabled on WorkflowConfiguration (set automatically by the MainWorkflow compiler).
- Workflow authors wire the retrigger edge via normal graph edges: a self-loop for incidents (the task continues) or back to the decision tree for approvals (re-evaluation).
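
The three modes can be pictured as a dispatch on the policy value. What follows is a minimal sketch under stated assumptions, not the code in FilterEntityImpl.java: the class, the `Action` vocabulary, and the `parse`/`decide` methods are hypothetical names. What happens when a forward delivery finds no waiting ManualTask is left out, because the summary does not specify it.

```java
import java.util.Locale;

/** Hypothetical sketch of the onConflict dispatch; names are illustrative, not the PR's API. */
final class OnConflictSketch {

  /** Mirrors the three enum values described for eventBasedEntityTrigger.json. */
  enum OnConflict { RESTART, SKIP, FORWARD }

  /** Assumed vocabulary for what the trigger filter does with the incoming event. */
  enum Action { START_NEW, TERMINATE_OLD_AND_START_NEW, IGNORE_EVENT, DELIVER_RETRIGGER }

  /** Parses the configured value; a missing value falls back to the documented default, restart. */
  static OnConflict parse(String configured) {
    if (configured == null || configured.isBlank()) {
      return OnConflict.RESTART;
    }
    return OnConflict.valueOf(configured.trim().toUpperCase(Locale.ROOT));
  }

  /** Chooses an action for a new trigger event on an entity. */
  static Action decide(OnConflict policy, boolean hasRunningInstance) {
    if (!hasRunningInstance) {
      return Action.START_NEW; // no conflict, so the policy does not apply
    }
    return switch (policy) {
      case RESTART -> Action.TERMINATE_OLD_AND_START_NEW; // existing behavior
      case SKIP -> Action.IGNORE_EVENT;                    // keep the running instance
      case FORWARD -> Action.DELIVER_RETRIGGER;            // hand "retrigger" to the ManualTask
    };
  }

  public static void main(String[] args) {
    System.out.println(decide(parse("forward"), true));
    System.out.println(decide(parse(null), true));
    System.out.println(decide(parse("skip"), false));
  }
}
```

Keeping the decision separate from the side effects (terminate, block, deliver) is a choice made for this sketch only; it makes each policy easy to check in isolation.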

Why

When the same entity triggers a workflow again while an instance is already running (e.g., test case fails again during open incident, glossary term edited while in review), the current behavior always kills the old instance and starts a new one. This destroys in-progress human task state for incident workflows. The onConflict policy lets each workflow declare the correct behavior.

Changes

| File | Change |
| --- | --- |
| `eventBasedEntityTrigger.json` | `onConflict` enum: restart/skip/forward |
| `workflowDefinition.json` | `retriggerEnabled` boolean on WorkflowConfiguration |
| `MainWorkflow.java` | Derives `retriggerEnabled` from trigger's onConflict at compile time |
| `ManualTask.java` | Conditionally adds "retrigger" to statuses + branches |
| `EventBasedEntityTrigger.java` | Passes onConflict as FieldExtension to filter |
| `FilterEntityImpl.java` | Switch on policy: restart → terminate, skip → block, forward → deliver retrigger |
| `WorkflowHandler.java` | `hasRunningInstance()` + `forwardRetrigger()` using Flowable's `variableValueEquals` |
| `ManualTaskOutboxIT.java` | E2E tests for all three policies |

Test plan

- onConflict_restart_terminatesOldWorkflow: second trigger terminates old workflow (status → FAILURE)
- onConflict_skip_keepsExistingWorkflow: second trigger ignored, same task + workflow preserved, instance count stays 1
- onConflict_forward_deliversRetriggerToManualTask: retrigger delivered, ManualTask re-entered, same task stays open
- outboxDeliversStatusChangesInOrder_andSyncsTcrs: existing outbox pipeline + TCRS sync unchanged

Adds a generic, configurable-status human task node for governance workflows. The node creates an OM Task, waits for status transitions via IntermediateCatchEvent messages, and routes based on terminal vs non-terminal statuses.

Key components:
- ManualTask.java: BPMN subprocess builder (setup → wait → route → close)
- SetupDelegate/SetupImpl: Task creation, idempotent on cycle re-entry
- CheckTerminalDelegate: Validates status against template
- CloseTaskDelegate/CloseTaskImpl: Closes task on terminal status
- SetResultDelegate: Propagates status to parent for edge routing
- ManualTaskTemplateResolver: Template-based status configuration
- ManualTaskDefinition JSON schema + nodeType/nodeSubType registration

The node is domain-agnostic: incident/approval behavior lives in the workflow graph around the node, not inside it.

Remove inputNamespaceMapExpr and configMapExpr from BaseDelegate. Each delegate now declares its own Expression fields, preventing NullPointerExceptions in delegates that don't use these fields (e.g., SetResultDelegate, CheckTerminalDelegate, CloseTaskDelegate).

- Fix: isAlreadyClosed now only checks task.getResolution() != null. Previously it also checked terminalStatuses.contains(currentStatus), which always returned true when CloseTask runs (the PATCH already set the terminal status), leaving tasks permanently unresolved.
- Remove unused terminalStatuses parameter from closeTask/CloseTaskDelegate
- Rename taskCreated → taskAlreadyExists for clarity: the variable means "should we enter the message-waiting phase" (true on re-entry, false on first creation)

Implements the bridge that connects Task status changes to the ManualTask workflow node via Flowable message delivery.

Bridge chain: TaskRepository.postUpdate() detects status change → TaskWorkflowHandler.transitionManualTaskStatus() (sends updatedBy) → WorkflowHandler.sendManualTaskMessage() with async exponential retry

Key design decisions:
- postUpdate wrapped in try-catch: workflow failures never break PATCH
- Async retry via ScheduledExecutorService + resilience4j IntervalFunction: 500ms → 1s → 2s → 4s → 5s cap (~12.5s total coverage)
- First attempt synchronous (fast path), retries non-blocking
- CloseTaskImpl uses actual user from PATCH, falls back to governance-bot

Also includes:
- WorkflowDefinitionRepository/WorkflowInstanceStateRepository updates
- CollectionDAO, ListFilter, EntityResource supporting changes
- SQL migration (2.0.0) for stageResult generated column
- ManualTaskWorkflowTest: full E2E lifecycle test
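
The retry intervals quoted in this commit (500ms → 1s → 2s → 4s → 5s cap) and the ~12.5s total can be reproduced with a few lines of arithmetic. This is a sketch only: it uses plain Java rather than the resilience4j IntervalFunction the commit mentions, the class and method names are hypothetical, and the doubling multiplier and the count of five retries are inferred from the listed values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reproduces the listed retry delays with plain arithmetic. */
final class BackoffScheduleSketch {
  static final long INITIAL_MS = 500;   // first retry delay from the commit message
  static final long CAP_MS = 5_000;     // "5s cap" from the commit message
  static final double MULTIPLIER = 2.0; // assumed: each listed delay doubles the previous one

  /** Delay before retry number {@code retry} (1-based), doubled each time and capped. */
  static long delayMs(int retry) {
    double uncapped = INITIAL_MS * Math.pow(MULTIPLIER, retry - 1);
    return (long) Math.min(uncapped, CAP_MS);
  }

  /** All delays for the given number of retries, in order. */
  static List<Long> schedule(int retries) {
    List<Long> delays = new ArrayList<>();
    for (int retry = 1; retry <= retries; retry++) {
      delays.add(delayMs(retry));
    }
    return delays;
  }

  /** Total time covered by the retries, in milliseconds. */
  static long totalMs(int retries) {
    return schedule(retries).stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    System.out.println(schedule(5)); // [500, 1000, 2000, 4000, 5000]
    System.out.println(totalMs(5));  // 12500
  }
}
```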

- Fix: catch FlowableOptimisticLockingException in tryDeliverMessage and return false to trigger retry (concurrent modification means another thread may have consumed the subscription)
- Fix: nonTerminalReachable BFS now iterates the full edges list at every step, not just the unfiltered outgoingEdges map. Prevents following terminal-condition edges from intermediate nodes.
- Refactor: hoist IntervalFunction to a static final constant instead of recreating on each retry. Made interval fields final.
- Fix: remove IF NOT EXISTS from PostgreSQL migration for consistency with MySQL pattern (Flyway handles migration idempotency)

…tConsumer

Add Entity.TASK to validEntityTypes, detect workflow-managed task status changes via isWorkflowManagedTaskStatusChange, and enqueue them to the outbox table via enqueueTaskMessage before the existing signal broadcast.

Add 4 reflection-based unit tests for isWorkflowManagedTaskStatusChange covering early-return conditions: non-update events, non-task entity types, missing changeDescription, and non-status field changes.

Remove the TaskRepository.postUpdate override that synchronously called TransitionManualTaskStatus, the TaskWorkflowHandler.transitionManualTaskStatus method it depended on, and the WorkflowHandler async retry infrastructure (sendManualTaskMessage, scheduleMessageRetry, tryDeliverMessage, and their backing constants and ScheduledExecutorService). Task status transitions are now delivered exclusively via the Transactional Outbox pattern.

…andler

Start the drainer after the process engine is built in the constructor. Restart it when initializeNewProcessEngine() rebuilds the engine at runtime. Shut it down gracefully via WorkflowHandler.shutdown(), which is called from ManagedShutdown.stop() in OpenMetadataApplication.

…tency

The E2E test must tolerate up to 10s CE poll + 30s drainer poll plus margin. Raise all Awaitility atMost() values to 90 seconds.

…roadcast disruption

A DB failure during outbox INSERT should not prevent the signal broadcast path from executing. Log the error and continue.

…an older row

SKIP LOCKED skips individual locked rows, not entire task groups. Without this guard, Worker B could grab a newer status while Worker A still holds the oldest, violating per-task ordering. The fix queries the absolute oldest createdAt (no lock) and skips the task if the locked row is newer.

…ycle

Collapse findDistinctPendingTaskIds + per-task findAndLockOldestPending + per-task findOldestPendingCreatedAt into a single findAndLockAllOldestPending query using MIN(createdAt) JOIN with FOR UPDATE SKIP LOCKED. Per-task ordering is preserved naturally: if the oldest row for a task is locked by another worker, the JOIN produces no match for that task.

- C2: Replace MIN(createdAt) JOIN with ROW_NUMBER() PARTITION BY taskId to guarantee exactly one row per task even with identical timestamps.
- C3: Split bulk-lock transaction into bulk-read (no lock) + per-entry transactions. Row locks now held only during single-entry delivery, not the entire batch. Flowable API calls no longer inside a DB transaction holding locks on other rows.
- I2: Add MAX_ATTEMPTS=100. Entries exceeding this are excluded from the drain query and effectively dead-lettered for investigation.
- I3: Call cleanupDelivered() at end of each drain cycle with 7-day retention to prevent unbounded table growth.
- I4: Extract workflowInstanceId from ChangeEvent entity payload instead of fetching from DB. Eliminates extra round trip per task status change event.
- I5: Move signal broadcast before outbox enqueue so it always fires. Wrap enqueue in resilience4j retry (3 attempts) for transient DB errors. Unhandled failure propagates to event publisher for retry.

Add LIMIT 500 with ORDER BY attempts ASC, createdAt ASC to prevent unbounded result sets and prioritize fresh messages over stuck ones. Separate cleanup into its own try-catch for cleaner error diagnostics.
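
The selection rules described in the drain-query commits (one oldest row per task, entries past the attempt limit left out, then ordering and a batch limit) can be modeled in memory. This is a hedged sketch, not the SQL in TaskWorkflowOutboxDAO: the `Entry` record and `selectBatch` method are hypothetical, the `id` tie-breaker for identical timestamps is an assumption, and so is treating `attempts >= MAX_ATTEMPTS` as the exclusion boundary.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the drain query's selection rules. */
final class DrainSelectionSketch {
  static final int MAX_ATTEMPTS = 100; // from commit note I2
  static final int BATCH_LIMIT = 500;  // from the LIMIT 500 commit

  /** Simplified outbox row; the real table has more columns. */
  record Entry(long id, String taskId, Instant createdAt, int attempts) {}

  static List<Entry> selectBatch(List<Entry> pending) {
    // Equivalent of ROW_NUMBER() OVER (PARTITION BY taskId ORDER BY createdAt, id) = 1.
    // The id tie-breaker is an assumption so identical timestamps still yield one row.
    Comparator<Entry> oldestFirst =
        Comparator.comparing(Entry::createdAt).thenComparingLong(Entry::id);

    Map<String, Entry> oldestPerTask = new LinkedHashMap<>();
    for (Entry entry : pending) {
      if (entry.attempts() >= MAX_ATTEMPTS) {
        continue; // assumed boundary: exhausted entries are left for investigation
      }
      oldestPerTask.merge(
          entry.taskId(), entry, (a, b) -> oldestFirst.compare(a, b) <= 0 ? a : b);
    }

    // ORDER BY attempts ASC, createdAt ASC LIMIT 500: fresh messages before stuck ones.
    return oldestPerTask.values().stream()
        .sorted(Comparator.comparingInt(Entry::attempts).thenComparing(Entry::createdAt))
        .limit(BATCH_LIMIT)
        .toList();
  }

  public static void main(String[] args) {
    Instant t0 = Instant.parse("2026-03-30T00:00:00Z");
    List<Entry> pending = List.of(
        new Entry(1, "task-a", t0, 0),
        new Entry(2, "task-a", t0.plusSeconds(1), 0),
        new Entry(3, "task-b", t0, 3));
    selectBatch(pending).forEach(e -> System.out.println(e.id() + " " + e.taskId()));
  }
}
```

In this model only entry 1 is eligible for task-a, so the newer entry 2 waits until entry 1 is delivered or excluded, which is the per-task ordering the commits aim to preserve.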

…OutboxIT

Move E2E test to integration-tests module where the full application stack is running (CE pipeline, schedulers, drainer). The test verifies the complete outbox delivery pipeline through observable outcomes:
1. Deploy workflow → create table → workflow triggers → task created
2. PATCH task InProgress → PATCH task Completed
3. Poll workflow instance until FINISHED status
4. Assert stage results contain expected status transitions

Strangler Fig bridge that syncs Task lifecycle events to TCRS records, enabling workflow-managed incidents to keep the existing incident UI/API working while Tasks become the source of truth.

Components:
- aboutEntityLink field on Task schema (EntityLink format with testCase FQN + incident stateId), backed by generated DB columns + index
- IncidentTcrsSyncHandler: lifecycle handler mapping Task events to TCRS records (New, Ack, Assigned, Resolved) with entity relationships
- TCRS guard in openOrAssignTask() to skip Task creation when a workflow-managed Task already exists for the incident
- SetupImpl builds aboutEntityLink for incident task types
- testCaseStatus added to WorkflowTriggerFields enum
- E2E integration test verifying full pipeline

…al status check

- Rename specificUsers to specificAssignees using EntityLink strings (e.g. <#E::user::alice>, <#E::team::engineers>) to support both users and teams, consistent with the existing user task pattern
- CloseTaskImpl.isAlreadyClosed() now checks terminal status from the resolved template instead of checking resolution != null
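
The two EntityLink strings quoted in this commit show the shape a specificAssignees entry takes. The following is an illustrative sketch only: it is not OpenMetadata's own EntityLink parser, the class, record, and method names are hypothetical, and it handles just the two-part `<#E::type::name>` form shown in the examples.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the two-part link form quoted in the commit message. */
final class AssigneeLinkSketch {

  /** Entity type ("user" or "team" in the examples) and the entity name. */
  record Assignee(String entityType, String name) {}

  // <#E::TYPE::NAME> where TYPE has no ':' or '>' and NAME has no '>'.
  private static final Pattern LINK = Pattern.compile("<#E::([^:>]+)::([^>]+)>");

  /** Returns the parsed assignee, or empty when the string is not in the expected form. */
  static Optional<Assignee> parse(String link) {
    if (link == null) {
      return Optional.empty();
    }
    Matcher matcher = LINK.matcher(link.trim());
    return matcher.matches()
        ? Optional.of(new Assignee(matcher.group(1), matcher.group(2)))
        : Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(parse("<#E::user::alice>"));
    System.out.println(parse("<#E::team::engineers>"));
    System.out.println(parse("alice"));
  }
}
```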

Adds configurable duplicate trigger handling for governance workflows. When the same entity triggers a workflow again while an instance is already running, the trigger's onConflict policy controls the behavior:
- restart (default): terminate old instance, start new (existing behavior)
- skip: keep existing instance, ignore new event
- forward: deliver 'retrigger' status to the active ManualTask

The retrigger status flows through the existing IntermediateCatchEvent path, so no BPMN changes are needed. ManualTask exposes 'retrigger' as a routable branch only when the trigger uses onConflict=forward, allowing workflow authors to wire it via normal graph edges (self-loop for incidents, back-to-decision-tree for approvals).

github-actions · 2026-03-30T15:43:48Z

❌ Lint Check Failed — ESLint + Prettier (core-components)

The following files have style issues that need to be fixed:

Fix locally (fast — changed files only):

cd openmetadata-ui-core-components/src/main/resources/ui
yarn ui-checkstyle:changed

Or to fix all files: yarn lint:fix && yarn pretty

github-actions · 2026-03-30T15:47:42Z

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

github-actions · 2026-03-30T15:49:20Z

The Python checkstyle failed.

Please run `make py_format` and `py_format_check` in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with `make install_test precommit_install`.

gitar-bot · 2026-03-30T15:49:54Z

Code Review 👍 Approved with suggestions 2 resolved / 3 findings

Introduces onConflict trigger policy with retrigger-as-branch capability, resolving head-of-line blocking in outbox drainer and failed delivery termination issues. Consider addressing the extractStringValue crash on single-quote string input in IncidentTcrsSyncHandler.

💡 Bug: extractStringValue crashes on single-quote string input

📄 openmetadata-service/src/main/java/org/openmetadata/service/events/lifecycle/handlers/IncidentTcrsSyncHandler.java:225-230

In `IncidentTcrsSyncHandler.extractStringValue()`, if the input string is a single `"` character, `s.substring(1, s.length() - 1)` becomes `s.substring(1, 0)`, which throws `StringIndexOutOfBoundsException`. Similarly, a string like `"value` (starts with a quote but doesn't end with one) will incorrectly lose its last character.

This is called on `fc.getNewValue()` from ChangeDescription field changes, which could contain unexpected formats.

Suggested fix

private String extractStringValue(Object value) {
    if (value instanceof String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }
    return value != null ? value.toString() : null;
}
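
The edge cases named in the finding can be exercised with a small standalone harness. This is a sketch: it restates the suggested fix with the quote literals written as `"\""`, and the wrapper class, the `static` modifier, and the sample inputs are additions for illustration rather than code from IncidentTcrsSyncHandler.

```java
/** Hypothetical harness around the suggested fix; not the handler class itself. */
final class ExtractStringValueSketch {

  /** Strips one pair of surrounding double quotes only when both are present. */
  static String extractStringValue(Object value) {
    if (value instanceof String s) {
      if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
        return s.substring(1, s.length() - 1);
      }
      return s; // unquoted, half-quoted, or a lone quote: returned unchanged
    }
    return value != null ? value.toString() : null;
  }

  public static void main(String[] args) {
    System.out.println(extractStringValue("\"Resolved\"")); // fully quoted input
    System.out.println(extractStringValue("\""));           // lone quote, no exception
    System.out.println(extractStringValue("\"value"));      // opening quote only
    System.out.println(extractStringValue(42));             // non-string input
  }
}
```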

✅ 2 resolved

✅ Edge Case: onConflict=forward doesn't terminate old instance on failed delivery

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/triggers/impl/FilterEntityImpl.java:92-105
📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/triggers/impl/FilterEntityImpl.java:83-90
📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/WorkflowHandler.java:1459-1473

In FilterEntityImpl.java, when onConflict=forward and forwardRetrigger() returns false (no active ManualTask subscription found), the code falls through to start a new workflow instance without terminating the existing one. This means two concurrent instances will run for the same entity: the old one (stuck or between ManualTask cycles) and the new one.

This contradicts the documented intent of forward ("keeps the existing instance"). The restart default terminates duplicates, and skip blocks new instances, but forward on failed delivery does neither.

Impact: Duplicate workflow instances for the same entity, potentially creating duplicate tasks and conflicting state.

✅ Performance: Outbox drainer head-of-line blocking for ~50 min on persistent failures

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/outbox/TaskWorkflowOutboxDrainer.java:21-23
📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/outbox/TaskWorkflowOutboxDrainer.java:85-99

In TaskWorkflowOutboxDrainer, if the oldest pending message for a task consistently fails delivery (e.g., ManualTask subprocess is between re-entry cycles for an extended period), it blocks all newer messages for that task until it reaches MAX_ATTEMPTS=100. At POLL_INTERVAL_SECONDS=30, this means up to ~50 minutes of head-of-line blocking per task.

While this preserves ordering (which is correct), the combination of 100 attempts × 30s interval may be excessive for transient cases where the subprocess simply isn't ready yet. Consider exponential backoff or a lower max-attempts with a separate retry-later queue.
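
The ~50 minute figure follows directly from the two constants quoted in the finding. A minimal worked check, with a hypothetical class and method name and the assumption of exactly one delivery attempt per poll cycle:

```java
import java.time.Duration;

/** Hypothetical arithmetic check for the head-of-line blocking estimate. */
final class HeadOfLineBlockingSketch {

  /** Worst-case blocking, assuming one failed attempt per poll until the limit is hit. */
  static Duration worstCaseBlocking(int maxAttempts, Duration pollInterval) {
    return pollInterval.multipliedBy(maxAttempts);
  }

  public static void main(String[] args) {
    Duration blocked = worstCaseBlocking(100, Duration.ofSeconds(30));
    System.out.println(blocked.toMinutes() + " minutes"); // 100 x 30s = 3000s = 50 minutes
  }
}
```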

gitar-bot · 2026-03-30T15:50:40Z

+  private String extractStringValue(Object value) {
+    if (value instanceof String s) {
+      return s.startsWith("\"") ? s.substring(1, s.length() - 1) : s;
+    }
+    return value != null ? value.toString() : null;
+  }


IceS2 and others added 30 commits March 16, 2026 14:41

Implement IntermediateCatchEventBuilder

010f3ea

Implement IntermediateCatchEventBuilder

9403129

Update generated TypeScript types

77d0d61

Merge branch 'feat/incident-lifecycle-workflow' into feat/ilw-pr2-man…

bb29b37

…ual-task-node

Address PR review: configurable assignees, null-guard, required fields

0ab66d0

Update generated TypeScript types

1a5c2b8

fix: safe boolean cast and dedup assignees in resolveAssignees

a5020bc

feat(outbox): add task_workflow_outbox table migrations

aa52ef1

feat(outbox): add OutboxEntry POJO and TaskWorkflowOutboxDAO

5e78598

feat(outbox): add TaskWorkflowOutboxDrainer with unit tests

f431868

test(outbox): add consumer routing filter tests

7d39894

test(outbox): increase ManualTaskWorkflowTest timeouts for polling la…

4cb3ed6

…tency

fix(outbox): wrap enqueueTaskMessage in try-catch to prevent signal b…

521d2f4

…roadcast disruption

fix(outbox): rename index prefix from idx_two_ to idx_outbox_ for cla…

35205ef

…rity

fix(outbox): add batch limit and prioritized ordering to drain query

926787c

fix(outbox): handle raw string FieldChange.newValue in enqueueTaskMes…

f73830b

…sage

style: spotless formatting on IncidentTaskIntegrationIT

56ec631

fix(outbox): wrap enqueue retry exhaustion in EventPublisherException

b5512e0

IceS2 and others added 9 commits March 23, 2026 08:29

fix(outbox): null createdBy fallback, exhausted cleanup, idempotent e…

b001e36

…nqueue

fix(outbox): use INSERT IGNORE / ON CONFLICT DO NOTHING for idempoten…

f7a40dc

…t enqueue

Update generated TypeScript types

46bdff2

Merge branch 'feat/ilw-pr2-manual-task-node' into feat/ilw-pr3-task-w…

5380b1b

…orkflow-bridge

Merge branch 'feat/ilw-pr3-task-workflow-bridge' into feat/ilw-item2-…

ea7a316

…incident-tcrs-sync-hook

Merge branch 'feat/ilw-item2-incident-tcrs-sync-hook' of github.com:o…

0bc1464

…pen-metadata/OpenMetadata into feat/ilw-item2-incident-tcrs-sync-hook

IceS2 requested a review from a team as a code owner March 30, 2026 15:41

github-actions bot added the Ingestion and safe to test labels Mar 30, 2026

Update generated TypeScript types

31297ae

gitar-bot bot reviewed Mar 30, 2026

Comment thread ...a/org/openmetadata/service/governance/workflows/elements/triggers/impl/FilterEntityImpl.java

gitar-bot bot reviewed Mar 30, 2026

Comment thread ...ain/java/org/openmetadata/service/governance/workflows/outbox/TaskWorkflowOutboxDrainer.java

IceS2 closed this Apr 17, 2026

IceS2 wants to merge 40 commits into feat/incident-lifecycle-workflow from feat/ilw-retrigger-as-branch
