Skip to content

Fix test suite hang by reordering shutdown: stop event publishers before AsyncService and destroy Flowable after app shutdown#26160

Merged
harshach merged 1 commit intomainfrom
ram/fix-aync-service-shutdown
Feb 28, 2026
Merged

Fix test suite hang by reordering shutdown: stop event publishers before AsyncService and destroy Flowable after app shutdown#26160
harshach merged 1 commit intomainfrom
ram/fix-aync-service-shutdown

Conversation

@yan-3005
Copy link
Copy Markdown
Contributor

@yan-3005 yan-3005 commented Feb 28, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Shutdown ordering fix for test hang:
    • Moved Flowable ProcessEngine destruction after Dropwizard app shutdown in TestSuiteBootstrap.cleanup()
    • Reordered event publishers/schedulers before AsyncService shutdown in OpenMetadataApplication.stop()

This will update automatically on new commits.

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Feb 28, 2026

🔍 CI failure analysis for 6fec1ca: Multiple test failures (47 unit tests + 2 E2E tests) appear unrelated to PR's shutdown ordering changes, as failures occur during test execution, not during cleanup.

Issue

Multiple test failures across different test suites:

  1. Playwright E2E Tests (postgresql shard 1, 6) - 2 tests failed/flaky:

    • SearchIndexApplication test - Timed out waiting for API responses and UI elements
    • Metric test (flaky) - "Target page, context or browser has been closed" errors
  2. Maven Unit/Integration Tests - 47 tests failed across multiple areas:

    • DataProductResourceTest (6 failures) - Domain migration and data product workflow issues
    • EventSubscriptionResourceTest (16 failures) - NullPointerExceptions in Slack/MSTeams/Webhook callbacks
    • NotificationResourceTest (7 failures) - Connection/timeout issues with notification endpoints
    • APIServiceResourceTest (8 failures) - Type mismatch in OpenAPI schema deserialization
    • IngestionPipelineLogStorageTest (5 failures) - S3 multipart upload and streaming issues
    • JsonUtilsTest (1 failure) - Unexpected "entityStatus" field in serialization
    • Others (4 failures) - Miscellaneous issues

Root Cause

The test failures appear UNRELATED to the PR changes. Here's why:

PR Changes:

  • Modified Java shutdown ordering in TestSuiteBootstrap.cleanup() - moved Flowable ProcessEngine destruction to occur after Dropwizard app shutdown
  • Reordered event publishers/schedulers in OpenMetadataApplication.stop() - now shuts down EventPubSub and EventSubscriptionScheduler before AsyncService

Evidence these failures are unrelated:

  1. Timing: All failures occur during test execution, NOT during cleanup/shutdown phases
  2. Nature of failures:
    • EventSubscriptionResourceTest NPEs suggest callback infrastructure issues, not shutdown ordering
    • APIServiceResourceTest type mismatches indicate schema/deserialization problems
    • IngestionPipelineLogStorageTest S3 failures suggest infrastructure/configuration issues
    • Playwright timeout waiting for API responses is a runtime issue, not cleanup
  3. Scope: 47 diverse test failures across completely different subsystems (events, notifications, API services, S3 storage, UI tests)
  4. Shutdown code is only executed AFTER tests complete, not during test execution

Details

Failure categories breakdown

Event Subscription Failures (16 tests):

Cannot invoke "...EventDetails.getEvents()" because "details" is null

Pattern: Callback resources (Slack, MSTeams, Webhook) returning null EventDetails. This suggests the event subscription callback infrastructure has pre-existing issues.

API Service Failures (8 tests):

expected: <org.openmetadata.schema.services.connections.api.OpenAPISchemaURL@...> 
but was: <{openAPISchemaURL=...}>

Pattern: Expected a proper object but got a Map. This is a deserialization issue in the API service connection schema handling.

Notification Test Failures (7 tests):

Connection refused (Connection refused)
Timeout waiting for connection from pool

Pattern: Tests trying to connect to external notification services that aren't available in the test environment.

Data Product Failures (6 tests):

expected: <success> but was: <failure>
expected: <1> but was: <0>
Output port should be in target domain after migration

Pattern: Data product domain migration logic issues.

Ingestion Pipeline Log Storage (5 tests):

Failed to complete multipart upload
S3 object should exist at: test-logs/...
The specified key does not exist

Pattern: S3 storage issues - likely LocalStack or test infrastructure problems.

Playwright E2E (2 tests):

  • SearchIndexApplication: 180s timeout waiting for API response /api/v1/apps/name/SearchIndexingApplication/status
  • Metric: Page closed unexpectedly (flaky test indicator)
Why shutdown ordering changes cannot cause these failures

The PR changes only affect cleanup operations that happen after all tests finish:

  1. TestSuiteBootstrap.cleanup() is called in @AfterAll - runs AFTER the entire test suite completes
  2. OpenMetadataApplication.stop() is called when the Dropwizard app shuts down - part of cleanup, not runtime

The failures occur during:

  • Test execution (EventSubscription callbacks returning null)
  • API calls during tests (deserialization issues)
  • Test infrastructure setup (connection refused)
  • Test assertions (domain migration verification)

These are all runtime behaviors that execute before cleanup code runs.

The reordering in OpenMetadataApplication.stop():

// NEW ORDER:
EventPubSub.shutdown();
EventSubscriptionScheduler.shutDown();
AsyncService.getInstance().shutdown();
AppScheduler.shutDown();

This only matters when the application is stopping, not when it's serving requests during tests.

Code Review ✅ Approved

Clean shutdown reordering fix with no logic changes. All shutdown methods are independent with no cross-dependencies, making the reorder safe. The new sequence correctly follows "stop producers before consumers" for the app and "stop app before engine" for tests.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

@harshach harshach merged commit ecb6650 into main Feb 28, 2026
55 of 80 checks passed
@harshach harshach deleted the ram/fix-aync-service-shutdown branch February 28, 2026 18:34
@github-actions
Copy link
Copy Markdown
Contributor

Changes have been cherry-picked to the 1.12.2 branch.

github-actions Bot pushed a commit that referenced this pull request Feb 28, 2026
…owhandler reordering (#26160)

(cherry picked from commit ecb6650)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants