Skip to content

ISSUE #3058 - Bulk Endpoint For Bundle TestSuite Selection#26490

Open
TeddyCr wants to merge 10 commits intoopen-metadata:mainfrom
TeddyCr:ISSUE-3058
Open

ISSUE #3058 - Bulk Endpoint For Bundle TestSuite Selection#26490
TeddyCr wants to merge 10 commits intoopen-metadata:mainfrom
TeddyCr:ISSUE-3058

Conversation

@TeddyCr
Copy link
Collaborator

@TeddyCr TeddyCr commented Mar 13, 2026

Describe your changes:

Fixes #3058

  • Add new bulk endpoint PUT /v1/dataQuality/testCases/logicalTestCases/bulk that supports adding test cases to a logical test suite by explicit IDs (mode: "ids") or selecting all test cases with optional filter (right now supports id exclusions (mode: "all"))
  • Deprecate the existing PUT /logicalTestCases endpoint with Deprecation and Sunset response headers (removal in 2.0)
  • Replace N+1 individual postUpdate calls with batch processing: bulk Entity.getEntities() fetch, postUpdateMany(), and bulk search index updates via SearchIndexHandler.onEntitiesUpdated()
  • Add onEntitiesUpdated to the lifecycle event system (EntityLifecycleEventHandler, EntityLifecycleEventDispatcher) with per-entity fallback for backward compatibility

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Improvement

  • I have added tests around the new logic.
  • For connector/ingestion changes: I updated the documentation.

@github-actions github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Mar 13, 2026
@github-actions
Copy link
Contributor

⚠️ TypeScript Types Need Update

The generated TypeScript types are out of sync with the JSON schema changes.

Since this is a pull request from a forked repository, the types cannot be automatically committed.
Please generate and commit the types manually:

cd openmetadata-ui/src/main/resources/ui
./json2ts-generate-all.sh -l true
git add src/generated/
git commit -m "Update generated TypeScript types"
git push

After pushing the changes, this check will pass automatically.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new bulk endpoint PUT /v1/dataQuality/testCases/logicalTestCases/bulk for adding test cases to a logical test suite, supporting two modes: explicit IDs (ids) and select-all with optional exclusions (all). It also deprecates the existing single endpoint, replaces N+1 update patterns with batch processing, and extends the lifecycle event system with onEntitiesUpdated.

Changes:

  • New JSON schemas and bulk endpoint for adding test cases to logical test suites by IDs or select-all with exclusions
  • Batch processing via postUpdateMany(), onEntitiesUpdated() lifecycle events, and bulk search index updates with per-entity fallback
  • Deprecation headers on the existing /logicalTestCases endpoint and comprehensive integration tests

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
bundleSuiteBulkAddRequest.json Root schema for bulk add request with mode and selection
bundleSuiteBulkAddRequestBulkByIds.json Schema for ID-based selection
bundleSuiteBulkAddRequestBulkAll.json Schema for select-all with exclusion filter
TestCaseResource.java New bulk endpoint, deprecation headers on old endpoint, extracted validation helper
TestCaseRepository.java Batch relationship insert, diff-based update logic, addAllTestCasesToLogicalTestSuite
EntityRepository.java New postUpdateMany() for batch cache + lifecycle + RDF updates
CollectionDAO.java New SQL methods for bulk-inserting all/filtered relationships
EntityLifecycleEventHandler.java Default onEntitiesUpdated with per-entity fallback
EntityLifecycleEventDispatcher.java Dispatch bulk updated events to handlers
SearchIndexHandler.java Bulk search index update with individual fallback
SearchRepository.java Renamed updateEntitiesBulkupdateEntitiesIndex, kept old as delegate
EntityLifecycleEventDispatcherTest.java Tests for bulk dispatch with per-entity change descriptions
SearchIndexHandlerTest.java Tests for bulk and fallback search index updates
TestCaseResourceIT.java Integration tests for both modes, edge cases, idempotency, deprecated endpoint

You can also share your feedback on Copilot code review. Take the survey.

@TeddyCr TeddyCr requested a review from a team as a code owner March 13, 2026 20:50
@gitar-bot
Copy link

gitar-bot bot commented Mar 13, 2026

Code Review ⚠️ Changes requested 0 resolved / 5 findings

Bulk endpoint for bundle TestSuite selection loads all test cases into memory unbounded in "select all" mode, and deadlock retry logic retries immediately with zero delay and only 2 attempts, risking re-deadlock in production. Additionally, mode enum is required in schema but still null-checked, schema descriptions incorrectly reference "test suites" instead of "test cases", and new JDBI query methods use raw string interpolation into SQL.

⚠️ Performance: "Select all" mode loads all test cases into memory unbounded

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:930 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:954 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:964 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:2128 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:909 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:932

The addAllTestCasesToLogicalTestSuite method in TestCaseRepository.java has no pagination or batching. After bulk-inserting relationships for ALL test cases, it:

  1. Calls findTo() which returns ALL relationship records (no LIMIT)
  2. Calls Entity.getEntities(refs, "*", Include.ALL) loading every newly-added test case with all fields into a single in-memory list
  3. Passes all entities to postUpdateMany() which writes them all to cache and dispatches lifecycle events

In a system with thousands or tens of thousands of test cases, this will cause excessive memory usage and potentially OOM errors. The same pattern applies to SearchRepository.updateEntitiesIndex() which receives the full list.

Consider processing in batches (e.g., 500 at a time) using Lists.partition() or similar, and add a LIMIT/pagination to the findTo() call or use a streaming approach.

Suggested fix
// In addAllTestCasesToLogicalTestSuite, batch the post-update processing:
int BATCH_SIZE = 500;
List<List<EntityReference>> batches = Lists.partition(testCaseReferences, BATCH_SIZE);
for (List<EntityReference> batch : batches) {
    List<TestCase> updatedBatch = getLogicalSuiteUpdatedTestCase(batch);
    postUpdateMany(updatedBatch);
}
⚠️ Performance: Deadlock retry has no backoff delay, likely to re-deadlock

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:5451 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:5432

The executeWithDeadlockRetry method retries immediately after a deadlock exception with zero delay and only 2 total attempts. When two concurrent threads deadlock on tag_usage rows, an immediate retry means they will likely acquire locks in the same conflicting order and deadlock again. The sorting by targetFQNHash/tagFQNHash/source helps reduce deadlocks, but when they do occur, the retry is nearly useless without a small random backoff.

This affects applyTagsBatch, applyTagsBatchMultiTarget, and deleteTagsBatch — all called during bulk operations.

Suggested fix
default void executeWithDeadlockRetry(Runnable operation) {
  for (int attempt = 1; attempt <= TAG_USAGE_MAX_ATTEMPTS; attempt++) {
    try {
      operation.run();
      return;
    } catch (RuntimeException ex) {
      if (!isTransientDeadlock(ex) || attempt == TAG_USAGE_MAX_ATTEMPTS) {
        throw ex;
      }
      try {
        Thread.sleep(ThreadLocalRandom.current().nextLong(10, 50 * attempt));
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw ex;
      }
    }
  }
}
💡 Edge Case: Mode enum is required in schema but still null-checked

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/dqtests/TestCaseResource.java:1216 📄 openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequest.json:29

In bundleSuiteBulkAddRequest.json, mode is listed in the required array, meaning JSON schema validation should reject requests without it. However, TestCaseResource.java:1216-1219 checks nullOrEmpty(mode) and throws an IllegalArgumentException. This is defensive but the error message says modes are [ids, all] while the enum already enforces this. More importantly, if mode is required and validated by the framework, this check is dead code. If it's NOT validated by the framework, then requests with mode=null would get a generic error rather than reaching this check (since @Valid + required should reject it first).

💡 Quality: Schema descriptions say "test suites" but feature adds test cases

📄 openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequest.json:5 📄 openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequestBulkAll.json:5 📄 openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequestBulkByIds.json:5

The JSON schema descriptions in all three new schema files consistently refer to "test suites" (e.g., "Add all test suites", "List of test suite IDs to add", "selecting all suites with optional exclusions") when the feature actually adds test cases to a logical test suite. This is misleading for API consumers reading the OpenAPI documentation.

💡 Security: @define("cond") methods should be package-private or documented

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1827 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:2005 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:2071 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:2102

Four new JDBI query methods (findToBatchWithRelationsAndCondition, findFromBatchWithRelationsAndCondition, findToRelationshipsForEntityWithCondition, findFromRelationshipsForEntityWithCondition) use @Define("cond") which performs raw string interpolation into SQL. Currently, all callers go through safe wrapper methods that only pass hardcoded strings derived from the Include enum. However, these ...WithCondition methods are public interface methods — any future caller could inadvertently pass unsanitized input, creating a SQL injection vector. Consider adding a Javadoc warning or restricting access.

Suggested fix
Add Javadoc to each ...WithCondition method:
/**
 * Internal — callers MUST use the typed-Include overload.
 * The {@code cond} parameter is interpolated directly into SQL;
 * passing user-controlled values causes SQL injection.
 */
🤖 Prompt for agents
Code Review: Bulk endpoint for bundle TestSuite selection loads all test cases into memory unbounded in "select all" mode, and deadlock retry logic retries immediately with zero delay and only 2 attempts, risking re-deadlock in production. Additionally, mode enum is required in schema but still null-checked, schema descriptions incorrectly reference "test suites" instead of "test cases", and new JDBI query methods use raw string interpolation into SQL.

1. ⚠️ Performance: "Select all" mode loads all test cases into memory unbounded
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:930, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:954, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:964, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:2128, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:909, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:932

   The `addAllTestCasesToLogicalTestSuite` method in `TestCaseRepository.java` has no pagination or batching. After bulk-inserting relationships for ALL test cases, it:
   1. Calls `findTo()` which returns ALL relationship records (no LIMIT)
   2. Calls `Entity.getEntities(refs, "*", Include.ALL)` loading every newly-added test case with all fields into a single in-memory list
   3. Passes all entities to `postUpdateMany()` which writes them all to cache and dispatches lifecycle events
   
   In a system with thousands or tens of thousands of test cases, this will cause excessive memory usage and potentially OOM errors. The same pattern applies to `SearchRepository.updateEntitiesIndex()` which receives the full list.
   
   Consider processing in batches (e.g., 500 at a time) using `Lists.partition()` or similar, and add a LIMIT/pagination to the `findTo()` call or use a streaming approach.

   Suggested fix:
   // In addAllTestCasesToLogicalTestSuite, batch the post-update processing:
   int BATCH_SIZE = 500;
   List<List<EntityReference>> batches = Lists.partition(testCaseReferences, BATCH_SIZE);
   for (List<EntityReference> batch : batches) {
       List<TestCase> updatedBatch = getLogicalSuiteUpdatedTestCase(batch);
       postUpdateMany(updatedBatch);
   }

2. 💡 Edge Case: Mode enum is required in schema but still null-checked
   Files: openmetadata-service/src/main/java/org/openmetadata/service/resources/dqtests/TestCaseResource.java:1216, openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequest.json:29

   In `bundleSuiteBulkAddRequest.json`, `mode` is listed in the `required` array, meaning JSON schema validation should reject requests without it. However, `TestCaseResource.java:1216-1219` checks `nullOrEmpty(mode)` and throws an `IllegalArgumentException`. This is defensive but the error message says modes are `[ids, all]` while the enum already enforces this. More importantly, if `mode` is required and validated by the framework, this check is dead code. If it's NOT validated by the framework, then requests with `mode=null` would get a generic error rather than reaching this check (since `@Valid` + required should reject it first).

3. 💡 Quality: Schema descriptions say "test suites" but feature adds test cases
   Files: openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequest.json:5, openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequestBulkAll.json:5, openmetadata-spec/src/main/resources/json/schema/api/tests/bundleSuiteBulkAddRequestBulkByIds.json:5

   The JSON schema descriptions in all three new schema files consistently refer to "test suites" (e.g., "Add all test suites", "List of test suite IDs to add", "selecting all suites with optional exclusions") when the feature actually adds **test cases** to a logical test suite. This is misleading for API consumers reading the OpenAPI documentation.

4. ⚠️ Performance: Deadlock retry has no backoff delay, likely to re-deadlock
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:5451, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:5432

   The `executeWithDeadlockRetry` method retries immediately after a deadlock exception with zero delay and only 2 total attempts. When two concurrent threads deadlock on tag_usage rows, an immediate retry means they will likely acquire locks in the same conflicting order and deadlock again. The sorting by `targetFQNHash/tagFQNHash/source` helps reduce deadlocks, but when they do occur, the retry is nearly useless without a small random backoff.
   
   This affects `applyTagsBatch`, `applyTagsBatchMultiTarget`, and `deleteTagsBatch` — all called during bulk operations.

   Suggested fix:
   default void executeWithDeadlockRetry(Runnable operation) {
     for (int attempt = 1; attempt <= TAG_USAGE_MAX_ATTEMPTS; attempt++) {
       try {
         operation.run();
         return;
       } catch (RuntimeException ex) {
         if (!isTransientDeadlock(ex) || attempt == TAG_USAGE_MAX_ATTEMPTS) {
           throw ex;
         }
         try {
           Thread.sleep(ThreadLocalRandom.current().nextLong(10, 50 * attempt));
         } catch (InterruptedException ie) {
           Thread.currentThread().interrupt();
           throw ex;
         }
       }
     }
   }

5. 💡 Security: @Define("cond") methods should be package-private or documented
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1827, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:2005, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:2071, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:2102

   Four new JDBI query methods (`findToBatchWithRelationsAndCondition`, `findFromBatchWithRelationsAndCondition`, `findToRelationshipsForEntityWithCondition`, `findFromRelationshipsForEntityWithCondition`) use `@Define("cond")` which performs raw string interpolation into SQL. Currently, all callers go through safe wrapper methods that only pass hardcoded strings derived from the `Include` enum. However, these `...WithCondition` methods are public interface methods — any future caller could inadvertently pass unsanitized input, creating a SQL injection vector. Consider adding a Javadoc warning or restricting access.

   Suggested fix:
   Add Javadoc to each ...WithCondition method:
   /**
    * Internal — callers MUST use the typed-Include overload.
    * The {@code cond} parameter is interpolated directly into SQL;
    * passing user-controlled values causes SQL injection.
    */

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Contributor

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 66%
66.04% (57478/87031) 45.65% (30408/66606) 48.62% (9113/18742)

@sonarqubecloud
Copy link

@sonarqubecloud
Copy link

@github-actions
Copy link
Contributor

🟡 Playwright Results — all passed (15 flaky)

✅ 3329 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 183 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 454 0 1 2
✅ Shard 2 305 0 0 1
🟡 Shard 3 651 0 7 33
🟡 Shard 4 725 0 5 47
✅ Shard 5 591 0 0 67
🟡 Shard 6 603 0 2 33
🟡 15 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › API Endpoint - customization should work (shard 1, 1 retry)
  • Features/BulkImport.spec.ts › Database (shard 3, 1 retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › EditAll User: Complete export-import-validate flow (shard 3, 1 retry)
  • Features/DataQuality/TestCaseIncidentPermissions.spec.ts › User with TEST_CASE.EDIT_ALL can see edit icon on incidents (shard 3, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with TEST_CASE.EDIT_ALL can see edit action on test case (shard 3, 1 retry)
  • Features/EntitySummaryPanel.spec.ts › should display summary panel for table (shard 3, 1 retry)
  • Features/Glossary/GlossaryP3Tests.spec.ts › should handle multiple rapid API calls (shard 3, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should not show online status for inactive users (shard 4, 1 retry)
  • Flow/ObservabilityAlerts.spec.ts › Ingestion Pipeline alert (shard 4, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Domain announcement create, edit & delete (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with assets (tables, topics, dashboards) preserves associations (shard 4, 1 retry)
  • Pages/Login.spec.ts › Refresh should work (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sort users within teams page

2 participants