Fix test suite hang by reordering shutdown: stop event publishers before AsyncService and destroy Flowable after app shutdown by yan-3005 · Pull Request #26160 · open-metadata/OpenMetadata

yan-3005 · 2026-02-28T06:25:52Z

Describe your changes:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

Shutdown ordering fix for test hang:
- Moved Flowable ProcessEngine destruction after Dropwizard app shutdown in TestSuiteBootstrap.cleanup()
- Reordered event publishers/schedulers before AsyncService shutdown in OpenMetadataApplication.stop()

_This will update automatically on new commits._

gitar-bot · 2026-02-28T06:28:13Z

🔍 CI failure analysis for 6fec1ca: Multiple test failures (47 unit tests + 2 E2E tests) appear unrelated to PR's shutdown ordering changes, as failures occur during test execution, not during cleanup.

Issue

Multiple test failures across different test suites:

Playwright E2E Tests (postgresql shard 1, 6) - 2 tests failed/flaky:
- SearchIndexApplication test - Timed out waiting for API responses and UI elements
- Metric test (flaky) - "Target page, context or browser has been closed" errors
Maven Unit/Integration Tests - 47 tests failed across multiple areas:
- DataProductResourceTest (6 failures) - Domain migration and data product workflow issues
- EventSubscriptionResourceTest (16 failures) - NullPointerExceptions in Slack/MSTeams/Webhook callbacks
- NotificationResourceTest (7 failures) - Connection/timeout issues with notification endpoints
- APIServiceResourceTest (8 failures) - Type mismatch in OpenAPI schema deserialization
- IngestionPipelineLogStorageTest (5 failures) - S3 multipart upload and streaming issues
- JsonUtilsTest (1 failure) - Unexpected "entityStatus" field in serialization
- Others (4 failures) - Miscellaneous issues

Root Cause

The test failures appear UNRELATED to the PR changes. Here's why:

PR Changes:

- Modified Java shutdown ordering in TestSuiteBootstrap.cleanup(): moved Flowable ProcessEngine destruction to occur after Dropwizard app shutdown
- Reordered event publishers/schedulers in OpenMetadataApplication.stop(): EventPubSub and EventSubscriptionScheduler now shut down before AsyncService

Evidence these failures are unrelated:

- Timing: All failures occur during test execution, NOT during cleanup/shutdown phases
- Nature of failures:
  - EventSubscriptionResourceTest NPEs suggest callback infrastructure issues, not shutdown ordering
  - APIServiceResourceTest type mismatches indicate schema/deserialization problems
  - IngestionPipelineLogStorageTest S3 failures suggest infrastructure/configuration issues
  - Playwright timeout waiting for API responses is a runtime issue, not cleanup
- Scope: 47 diverse test failures across completely different subsystems (events, notifications, API services, S3 storage, UI tests)
- Shutdown code is only executed AFTER tests complete, not during test execution

Details

Failure categories breakdown

Event Subscription Failures (16 tests):

Cannot invoke "...EventDetails.getEvents()" because "details" is null

Pattern: Callback resources (Slack, MSTeams, Webhook) returning null EventDetails. This suggests the event subscription callback infrastructure has pre-existing issues.
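
To make the failure shape concrete, here is a minimal sketch of how a lookup that returns null produces this kind of NullPointerException, next to a guarded variant. Everything in it is hypothetical: `CallbackLookupSketch`, its map, and the `EventDetails` record are stand-ins for illustration, since the real callback resources are not shown in this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a test callback resource; not the OpenMetadata class.
final class CallbackLookupSketch {

  // Hypothetical stand-in for the EventDetails type named in the error message.
  record EventDetails(List<String> events) {
    List<String> getEvents() {
      return events;
    }
  }

  private final Map<String, EventDetails> detailsByEndpoint = new ConcurrentHashMap<>();

  // Records one delivered event for an endpoint, creating its entry on first use.
  void receive(String endpoint, String event) {
    detailsByEndpoint
        .computeIfAbsent(endpoint, key -> new EventDetails(new CopyOnWriteArrayList<>()))
        .getEvents()
        .add(event);
  }

  // Unguarded: throws NullPointerException if nothing was delivered to the endpoint.
  int unsafeCount(String endpoint) {
    EventDetails details = detailsByEndpoint.get(endpoint);
    return details.getEvents().size();
  }

  // Guarded: treats a missing entry as "no events yet".
  int safeCount(String endpoint) {
    EventDetails details = detailsByEndpoint.get(endpoint);
    return details == null ? 0 : details.getEvents().size();
  }
}
```

Whether the real tests should guard the lookup or wait until the callback has fired is a separate question that this PR does not address.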

API Service Failures (8 tests):

expected: <org.openmetadata.schema.services.connections.api.OpenAPISchemaURL@...> 
but was: <{openAPISchemaURL=...}>

Pattern: Expected a proper object but got a Map. This is a deserialization issue in the API service connection schema handling.
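
The mismatch can be illustrated with plain Java: a typed value never equals an untyped map, even when both carry the same field. The sketch below is an assumption-laden illustration only. `SchemaUrl` is a hypothetical stand-in for the real `OpenAPISchemaURL` class, the URL is a placeholder, and `fromUntyped` is not an OpenMetadata API; only the `openAPISchemaURL` key comes from the assertion message above.

```java
import java.util.Map;

final class SchemaUrlSketch {

  // Hypothetical typed holder; the real OpenAPISchemaURL class is not shown in this PR.
  record SchemaUrl(String openAPISchemaURL) {}

  // Converts a value that may have been deserialized as a generic Map into the typed form.
  static SchemaUrl fromUntyped(Object value) {
    if (value instanceof SchemaUrl typed) {
      return typed;
    }
    if (value instanceof Map<?, ?> map && map.get("openAPISchemaURL") instanceof String url) {
      return new SchemaUrl(url);
    }
    throw new IllegalArgumentException("Unsupported connection config: " + value);
  }

  public static void main(String[] args) {
    Object untyped = Map.of("openAPISchemaURL", "https://example.invalid/openapi.json");
    SchemaUrl expected = new SchemaUrl("https://example.invalid/openapi.json");

    // A record is never equal to a Map, so an assertEquals-style comparison fails here.
    System.out.println(expected.equals(untyped));
    // After conversion both sides are the same type and compare by field value.
    System.out.println(expected.equals(fromUntyped(untyped)));
  }
}
```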

Notification Test Failures (7 tests):

Connection refused (Connection refused)
Timeout waiting for connection from pool

Pattern: Tests trying to connect to external notification services that aren't available in the test environment.

Data Product Failures (6 tests):

expected: <success> but was: <failure>
expected: <1> but was: <0>
Output port should be in target domain after migration

Pattern: Data product domain migration logic issues.

Ingestion Pipeline Log Storage (5 tests):

Failed to complete multipart upload
S3 object should exist at: test-logs/...
The specified key does not exist

Pattern: S3 storage issues - likely LocalStack or test infrastructure problems.

Playwright E2E (2 tests):

SearchIndexApplication: 180s timeout waiting for API response /api/v1/apps/name/SearchIndexingApplication/status
Metric: Page closed unexpectedly (flaky test indicator)

Why shutdown ordering changes cannot cause these failures

The PR changes only affect cleanup operations that happen after all tests finish:

- TestSuiteBootstrap.cleanup() is called in @AfterAll, so it runs AFTER the entire test suite completes
- OpenMetadataApplication.stop() is called when the Dropwizard app shuts down, which is part of cleanup, not runtime

The failures occur during:

- Test execution (EventSubscription callbacks returning null)
- API calls during tests (deserialization issues)
- Test infrastructure setup (connection refused)
- Test assertions (domain migration verification)

These are all runtime behaviors that execute before cleanup code runs.
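
For the test-side change, the "stop app before engine" order can be sketched as a cleanup routine that runs its steps in a fixed sequence and does not let a failure in one step skip the next. This is a simplified illustration, not the body of TestSuiteBootstrap.cleanup(), which is not shown here. `CleanupOrderSketch`, `Step`, and the step names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupOrderSketch {

  // Hypothetical functional interface for one teardown step.
  interface Step {
    void run() throws Exception;
  }

  // Runs the app shutdown first and the engine teardown second, recording each outcome.
  static List<String> cleanup(Step stopApp, Step destroyEngine) {
    List<String> log = new ArrayList<>();
    runStep("stop-app", stopApp, log);
    runStep("destroy-engine", destroyEngine, log);
    return log;
  }

  // A failing step is logged and swallowed so the remaining steps still run.
  private static void runStep(String name, Step step, List<String> log) {
    try {
      step.run();
      log.add(name + ":ok");
    } catch (Exception e) {
      log.add(name + ":failed");
    }
  }

  public static void main(String[] args) {
    System.out.println(cleanup(() -> {}, () -> {}));
  }
}
```

In a JUnit suite, a routine like this would typically be invoked from the @AfterAll hook mentioned above.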

The reordering in OpenMetadataApplication.stop():

// NEW ORDER:
EventPubSub.shutdown();
EventSubscriptionScheduler.shutDown();
AsyncService.getInstance().shutdown();
AppScheduler.shutDown();

This only matters when the application is stopping, not when it's serving requests during tests.
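
As a rough illustration of why producers are usually stopped before the executor they feed, the sketch below shuts down a publisher and a worker pool in both orders and counts late events that the pool rejects. It uses only `java.util.concurrent`. `ShutdownOrderSketch` and `Publisher` are hypothetical stand-ins, not EventPubSub or AsyncService, and the sketch shows one generic hazard of the old order rather than the specific mechanism behind the test-suite hang, which this PR does not spell out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

final class ShutdownOrderSketch {

  // Hypothetical producer that hands work to a shared executor while it is running.
  static final class Publisher {
    private final ExecutorService workers;
    private volatile boolean running = true;
    private int rejected = 0;

    Publisher(ExecutorService workers) {
      this.workers = workers;
    }

    void publish(String event) {
      if (!running) {
        return; // a stopped publisher emits nothing
      }
      try {
        workers.execute(() -> event.length()); // placeholder for real event handling
      } catch (RejectedExecutionException e) {
        rejected++; // the executor was already shut down
      }
    }

    void stop() {
      running = false;
    }

    int rejected() {
      return rejected;
    }
  }

  // Simulates one event arriving between the two shutdown steps.
  static int rejectedDuringShutdown(boolean stopPublisherFirst) {
    ExecutorService workers = Executors.newFixedThreadPool(2);
    Publisher publisher = new Publisher(workers);
    publisher.publish("before-shutdown");

    if (stopPublisherFirst) {
      publisher.stop();
      publisher.publish("late-event"); // ignored at the source
      workers.shutdown();
    } else {
      workers.shutdown();
      publisher.publish("late-event"); // reaches an executor that no longer accepts work
      publisher.stop();
    }

    try {
      workers.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return publisher.rejected();
  }

  public static void main(String[] args) {
    System.out.println("publisher first: " + rejectedDuringShutdown(true));
    System.out.println("workers first:   " + rejectedDuringShutdown(false));
  }
}
```

With the publisher stopped first, the late event never reaches the pool. With the pool stopped first, the same event is rejected.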

Code Review ✅ Approved

Clean shutdown reordering fix with no logic changes. All shutdown methods are independent with no cross-dependencies, making the reorder safe. The new sequence correctly follows "stop producers before consumers" for the app and "stop app before engine" for tests.

sonarqubecloud · 2026-02-28T07:32:31Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-02-28T18:36:10Z

Changes have been cherry-picked to the 1.12.2 branch.

Ordering of async service and app shutdown, and test bootstrap workflowhandler reordering

6fec1ca

yan-3005 self-assigned this Feb 28, 2026

yan-3005 added the "safe to test", "To release", and "backend" labels Feb 28, 2026

yan-3005 had a problem deploying to test February 28, 2026 06:25 — with GitHub Actions Error

yan-3005 temporarily deployed to test February 28, 2026 06:25 — with GitHub Actions Inactive

yan-3005 had a problem deploying to test February 28, 2026 06:25 — with GitHub Actions Failure

yan-3005 temporarily deployed to test February 28, 2026 06:25 — with GitHub Actions Inactive

harshach approved these changes Feb 28, 2026

harshach merged commit ecb6650 into main Feb 28, 2026
55 of 80 checks passed

harshach deleted the ram/fix-aync-service-shutdown branch February 28, 2026 18:34

github-actions Bot pushed a commit that referenced this pull request Feb 28, 2026

Ordering of async service and app shutdown, and test bootstrap workflowhandler reordering (#26160) (cherry picked from commit ecb6650)

e18aafb

gitar-bot Bot mentioned this pull request Mar 1, 2026

Fix Metrics collection; reduce no.of metrics; improve slow request lo… #25751

Merged

9 tasks

harshach merged 1 commit into main from ram/fix-aync-service-shutdown
