Skip to content

fix: cherry-pick 413 iterative bisection and TaggedOperation refactor into 1.12.5#27127

Closed
manerow wants to merge 1 commit into1.12.5from
fix/413-bisection-1.12.5
Closed

fix: cherry-pick 413 iterative bisection and TaggedOperation refactor into 1.12.5#27127
manerow wants to merge 1 commit into1.12.5from
fix/413-bisection-1.12.5

Conversation

@manerow
Copy link
Copy Markdown
Contributor

@manerow manerow commented Apr 7, 2026

Cherry-pick of #26347 (da400385d4) from main into 1.12.5.

Changes

  • Adds iterative bisection on 413 (Request Entity Too Large) responses in ElasticSearchIndexSink and OpenSearchIndexSink — instead of failing the whole batch, the sink splits it in half recursively until the oversized document is isolated and skipped with a clear error
  • Introduces TaggedOperation wrapper so entity references are tracked through the bulk pipeline, ensuring errors report the actual entity instead of a positional index
  • Conflict resolution: kept all 1.12.5-specific test dependencies in openmetadata-service/pom.xml and dropped junit-jupiter-params which was removed by the original commit

Motivation

The Data Insights Application on customer environments was failing with 413 errors when bulk indexing large entity batches. This fix was already merged to main but was missing from the 1.12.5 release branch.

Fixes #26344

…ctor in bulk index sinks (#26347)

* fix: add iterative bisection on 413 and TaggedOperation refactor in bulk index sinks

* fix: align JUnit 5 versions via BOM to fix local test discovery

* fix: pass ReportDataType via constructor to CreateReportDataProcessor
@github-actions github-actions bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 7, 2026
Comment on lines 289 to +292
List<?> bulkOps =
(List<?>) entityProcessor.process(enriched, contextData);
operationsQueue.addAll(bulkOps);
EntityReference ref = entity.getEntityReference();
bulkOps.forEach(op -> opsQueue.add(new TaggedOperation<>(op, ref)));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Raw-type TaggedOperation loses compile-time type safety

In DataAssetsWorkflow.processSource, the processor returns List<?> and each element is wrapped as TaggedOperation<>(op, ref) where op is Object. The sink field is declared as raw Sink searchIndexSink, so write(batch) compiles without type checking. If the processor ever returns a non-BulkOperation object, the error would only surface at runtime deep inside the sink.

This is a pre-existing raw-type pattern on the Sink field, but the new code adds another layer of type erasure through TaggedOperation<?>. Consider narrowing the types to catch mismatches at compile time.

Suggested fix:

// In DataAssetsWorkflow, type the sink and processor fields properly:
private Sink<List<TaggedOperation<BulkOperation>>, BulkResponse> searchIndexSink;
private Processor<List<BulkOperation>, EntityInterface> entityProcessor;

// Then the forEach becomes type-safe:
List<BulkOperation> bulkOps = entityProcessor.process(enriched, contextData);
bulkOps.forEach(op -> opsQueue.add(new TaggedOperation<>(op, ref)));

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 7, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Cherry-picks iterative bisection and TaggedOperation refactor into 1.12.5 to address issue #26344. Consider adding type parameters to TaggedOperation usage in DataAssetsWorkflow to maintain compile-time type safety.

💡 Quality: Raw-type TaggedOperation loses compile-time type safety

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/insights/workflows/dataAssets/DataAssetsWorkflow.java:289-292 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/insights/workflows/dataAssets/DataAssetsWorkflow.java:347-356

In DataAssetsWorkflow.processSource, the processor returns List<?> and each element is wrapped as TaggedOperation<>(op, ref) where op is Object. The sink field is declared as raw Sink searchIndexSink, so write(batch) compiles without type checking. If the processor ever returns a non-BulkOperation object, the error would only surface at runtime deep inside the sink.

This is a pre-existing raw-type pattern on the Sink field, but the new code adds another layer of type erasure through TaggedOperation<?>. Consider narrowing the types to catch mismatches at compile time.

Suggested fix
// In DataAssetsWorkflow, type the sink and processor fields properly:
private Sink<List<TaggedOperation<BulkOperation>>, BulkResponse> searchIndexSink;
private Processor<List<BulkOperation>, EntityInterface> entityProcessor;

// Then the forEach becomes type-safe:
List<BulkOperation> bulkOps = entityProcessor.process(enriched, contextData);
bulkOps.forEach(op -> opsQueue.add(new TaggedOperation<>(op, ref)));
🤖 Prompt for agents
Code Review: Cherry-picks iterative bisection and TaggedOperation refactor into 1.12.5 to address issue #26344. Consider adding type parameters to TaggedOperation usage in DataAssetsWorkflow to maintain compile-time type safety.

1. 💡 Quality: Raw-type TaggedOperation loses compile-time type safety
   Files: openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/insights/workflows/dataAssets/DataAssetsWorkflow.java:289-292, openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/insights/workflows/dataAssets/DataAssetsWorkflow.java:347-356

   In `DataAssetsWorkflow.processSource`, the processor returns `List<?>` and each element is wrapped as `TaggedOperation<>(op, ref)` where `op` is `Object`. The sink field is declared as raw `Sink searchIndexSink`, so `write(batch)` compiles without type checking. If the processor ever returns a non-BulkOperation object, the error would only surface at runtime deep inside the sink.
   
   This is a pre-existing raw-type pattern on the `Sink` field, but the new code adds another layer of type erasure through `TaggedOperation<?>`. Consider narrowing the types to catch mismatches at compile time.

   Suggested fix:
   // In DataAssetsWorkflow, type the sink and processor fields properly:
   private Sink<List<TaggedOperation<BulkOperation>>, BulkResponse> searchIndexSink;
   private Processor<List<BulkOperation>, EntityInterface> entityProcessor;
   
   // Then the forEach becomes type-safe:
   List<BulkOperation> bulkOps = entityProcessor.process(enriched, contextData);
   bulkOps.forEach(op -> opsQueue.add(new TaggedOperation<>(op, ref)));

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@manerow
Copy link
Copy Markdown
Contributor Author

manerow commented Apr 7, 2026

Dropping this for now — will cherry-pick into 1.12.6 once the branch is up.

@manerow manerow closed this Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant