Fixes #28138: Pass selective fields instead of "*" in batch entity fetches to prevent OOM #28151
Pull request overview
This PR replaces ad-hoc "*" fields arguments at several Entity.getEntity / EntityRepository.listAfter / PaginatedEntitiesSource call sites with explicit, narrower field selections. The goal is to stop fetching every relation/extension field when only a small subset is actually consumed, reducing DB joins and serialization cost in workflow, ingestion, and app-context paths.
Changes:
- Governance workflow delegates (SinkTaskDelegate, SetGlossaryTermStatusImpl, SetEntityCertificationImpl, RollbackEntityImpl, CheckChangeDescriptionTaskImpl) now request either an empty or specific field set, or ReindexingUtil.getSearchIndexFields(entityType), instead of "*".
- Insights workflows: DataAssetsWorkflow now uses the entity-specific search-index fields helper, and CostAnalysisWorkflow requests only lifeCycle for tables and no extra fields for the database service listing.
- SearchIndexRetryWorker.reindexEntityCascade and ApplicationContext.initialize switch from "*" to targeted fields (getSearchIndexFields(...) and "pipelines", respectively).
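To make the effect of the `fields` argument concrete, here is a minimal, self-contained sketch of how a fields string might be resolved into the set of optional fields a repository hydrates. `FieldScopedLoader` and its field catalogue are hypothetical stand-ins, not OpenMetadata's `Entity.getEntity`; rejecting unknown field names is also an assumption of the sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a repository that hydrates only the optional fields a
// caller names. It is NOT OpenMetadata's Entity.getEntity; names are illustrative.
final class FieldScopedLoader {
  // Illustrative catalogue of optional (relation/extension) fields for one entity type.
  static final List<String> SUPPORTED = List.of(
      "owners", "tags", "followers", "columns",
      "lineage", "sampleData", "changeDescription", "certification");

  /** Resolves a fields argument such as "*", "" or "owners,tags" into the fields to hydrate. */
  static Set<String> resolve(String fields) {
    Set<String> out = new LinkedHashSet<>();
    if (fields == null || fields.isBlank()) {
      return out; // no optional fields: only the base entity is loaded
    }
    if (fields.trim().equals("*")) {
      out.addAll(SUPPORTED); // wildcard: every optional field, so every extra lookup
      return out;
    }
    for (String f : fields.split(",")) {
      String name = f.trim();
      if (!SUPPORTED.contains(name)) {
        throw new IllegalArgumentException("Unsupported field: " + name); // assumption
      }
      out.add(name);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(resolve("*").size());      // 8
    System.out.println(resolve("certification")); // [certification]
    System.out.println(resolve("").isEmpty());    // true
  }
}
```

In this model each name in the resolved set corresponds to one more lookup and more serialized data per entity, which is why `"*"` is the expensive case.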
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| openmetadata-service/.../search/SearchIndexRetryWorker.java | Use per-type search-index fields when re-fetching entities for cascade reindex. |
| openmetadata-service/.../automatedTask/sink/SinkTaskDelegate.java | Replace "*" with ReindexingUtil.getSearchIndexFields(...) in batch and single-entity sink paths. |
| openmetadata-service/.../automatedTask/impl/SetGlossaryTermStatusImpl.java | Load glossary term with no extra fields prior to status patch. |
| openmetadata-service/.../automatedTask/impl/SetEntityCertificationImpl.java | Load entity with only certification field for the patch. |
| openmetadata-service/.../automatedTask/impl/RollbackEntityImpl.java | Load current entity without extra fields (full version is reloaded later via getVersion). |
| openmetadata-service/.../automatedTask/impl/CheckChangeDescriptionTaskImpl.java | Load entity without extra fields; only changeDescription is consumed. |
| openmetadata-service/.../insights/workflows/dataAssets/DataAssetsWorkflow.java | Use getSearchIndexFields(entityType) for paginated source fields. |
| openmetadata-service/.../insights/workflows/costAnalysis/CostAnalysisWorkflow.java | Database services pull no extra fields; tables pull only lifeCycle. |
| openmetadata-service/.../apps/ApplicationContext.java | List installed apps with only the pipelines field instead of all fields. |
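Some of these call sites (for example the paginated source in DataAssetsWorkflow) scan entities page by page, so whatever the fields argument costs is paid on every page of the scan. The sketch below is a hypothetical cursor-paginated scan assuming a `listAfter(fields, limit, after)`-style lister; `PagedScan`, `Lister`, and `InMemoryLister` are illustrative names and do not reproduce the `PaginatedEntitiesSource` or `EntityRepository.listAfter` signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor-paginated scan; names and signatures are illustrative and do not
// reproduce PaginatedEntitiesSource or EntityRepository.listAfter.
final class PagedScan {
  record Page(List<String> items, String after) {}

  interface Lister {
    Page listAfter(String fields, int limit, String after);
  }

  /** Reads every page, passing the same fields argument on each call. */
  static List<String> scanAll(Lister lister, String fields, int batchSize) {
    List<String> all = new ArrayList<>();
    String cursor = null;
    do {
      Page page = lister.listAfter(fields, batchSize, cursor);
      all.addAll(page.items());
      cursor = page.after(); // null means there is no next page
    } while (cursor != null);
    return all;
  }

  // In-memory lister over `total` entities that records the fields string of every call.
  static final class InMemoryLister implements Lister {
    final int total;
    final List<String> fieldsSeen = new ArrayList<>();

    InMemoryLister(int total) {
      this.total = total;
    }

    @Override
    public Page listAfter(String fields, int limit, String after) {
      fieldsSeen.add(fields);
      int start = after == null ? 0 : Integer.parseInt(after);
      int end = Math.min(start + limit, total);
      List<String> items = new ArrayList<>();
      for (int i = start; i < end; i++) {
        items.add("entity-" + i);
      }
      return new Page(items, end < total ? Integer.toString(end) : null);
    }
  }

  public static void main(String[] args) {
    InMemoryLister lister = new InMemoryLister(5);
    System.out.println(scanAll(lister, "owners,tags", 2).size()); // 5
    System.out.println(lister.fieldsSeen); // [owners,tags, owners,tags, owners,tags]
  }
}
```

Because the fields string is threaded through unchanged, narrowing it once at the call site narrows every page fetch in the scan.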
🟡 Playwright Results: all passed (11 flaky). ✅ 4110 passed · ❌ 0 failed · 🟡 11 flaky · ⏭️ 86 skipped
🟡 11 flaky test(s) (passed on retry)
How to debug locally: download the `playwright-test-results-<shard>` artifact, unzip it, and view a trace with `npx playwright show-trace path/to/trace.zip`.
Code Review (Gitar) ✅ Approved: Replaces wildcard field selectors with specific fields across multiple workflows and search indexers to minimize unnecessary data fetching. No issues found.
Failed to cherry-pick changes to the 1.12.9 branch.
Changes have been cherry-picked to the 1.13 branch.
(cherry picked from commit 71893d5)



Fixes #28138
I replaced blind "*" (all-fields) entity fetches with selective field lists across background apps, workflows, and governance tasks. Hydrating every field of entities processed in bulk, including heavy embedded arrays such as columns, tags, followers, owners, lineage, sampleData, and changeDescription, causes excessive memory consumption and OOMs.
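A rough way to see why bulk hydration leads to OOMs: the heap held by one batch scales with the batch size times the per-entity payload, and that payload is dominated by whichever heavy arrays are hydrated. The following is a back-of-envelope model, not a measurement; `BatchFootprint` and all byte sizes used with it are illustrative assumptions.

```java
import java.util.Map;
import java.util.Set;

// Back-of-envelope model (an assumption, not a measurement): the heap held by one batch
// is roughly batchSize * (base entity bytes + bytes of each hydrated optional field).
final class BatchFootprint {
  static long estimateBytes(int batchSize, long baseBytes,
                            Map<String, Long> fieldBytes, Set<String> requested) {
    long perEntity = baseBytes;
    for (String field : requested) {
      perEntity += fieldBytes.getOrDefault(field, 0L); // unknown fields count as 0
    }
    return Math.multiplyExact((long) batchSize, perEntity); // fail loudly on overflow
  }

  public static void main(String[] args) {
    // Made-up per-entity sizes in bytes, for illustration only.
    Map<String, Long> sizes = Map.of("columns", 40_000L, "sampleData", 100_000L, "tags", 1_000L);
    System.out.println(estimateBytes(1_000, 2_000L, sizes, sizes.keySet())); // 143000000
    System.out.println(estimateBytes(1_000, 2_000L, sizes, Set.of("tags")));  // 3000000
  }
}
```

With these made-up sizes, a batch of 1,000 entities holds about 143 MB when columns, sampleData, and tags are all hydrated versus about 3 MB with tags alone; real ratios depend entirely on the data.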
This continues the earlier selective-fields work: DataInsightsApp and SearchIndexApp already pass selective fields, but several adjacent batch processes still passed "*", so the OOM symptoms persisted. This change closes those remaining gaps.
Tier 1 — reuse ReindexingUtil.getSearchIndexFields(entityType) (the same per-entity allow-list SearchIndexApp uses):
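Tier 1 relies on keeping one allow-list per entity type and sharing it across every caller that builds search documents. A minimal sketch of that idea follows; `SearchIndexFieldCatalog`, its entries, and the choice to throw for unknown types are assumptions for illustration, not the contents or behavior of `ReindexingUtil.getSearchIndexFields`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical single allow-list keyed by entity type, shared by every caller that builds
// search documents. Entries are illustrative, not ReindexingUtil's actual lists.
final class SearchIndexFieldCatalog {
  private static final Map<String, List<String>> FIELDS = Map.of(
      "table", List.of("owners", "tags", "columns", "domains"),
      "glossaryTerm", List.of("owners", "tags", "reviewers"),
      "databaseService", List.of("owners", "tags"));

  /** Returns the comma-separated fields argument for one entity type. */
  static String fieldsFor(String entityType) {
    List<String> fields = FIELDS.get(entityType);
    if (fields == null) {
      // Sketch design choice: fail loudly instead of falling back to "*".
      throw new IllegalArgumentException("No search-index field list for " + entityType);
    }
    return String.join(",", fields);
  }

  public static void main(String[] args) {
    System.out.println(fieldsFor("table")); // owners,tags,columns,domains
  }
}
```

Throwing for an unmapped type, rather than falling back to `"*"`, keeps a newly added entity type from silently reintroducing the all-fields fetch; that is a choice of this sketch, not necessarily of the real helper.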
Tier 2 — minimal hand-picked field lists (tied 1:1 to the getters each consumer actually calls):
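One way to keep a hand-picked field list tied 1:1 to the getters a consumer calls is to declare the list next to the code that reads it and, in tests, read through a view that rejects unrequested fields. This is a hypothetical sketch: `StrictEntityView` and `CertificationStep` are illustrative names, and entities are modeled as plain maps rather than OpenMetadata's generated entity classes.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical test aid: a view over a partially loaded entity (modeled as a map) that
// throws when a consumer reads a field it did not request.
final class StrictEntityView {
  private final Map<String, Object> values;
  private final Set<String> loaded;

  StrictEntityView(Map<String, Object> values, Set<String> loaded) {
    this.values = values;
    this.loaded = loaded;
  }

  Object get(String field) {
    if (!loaded.contains(field)) {
      throw new IllegalStateException("Field '" + field + "' was read but not requested");
    }
    return values.get(field); // may be null if requested but unset
  }
}

// Hypothetical consumer that declares exactly the fields its logic reads.
final class CertificationStep {
  static final Set<String> REQUIRED_FIELDS = Set.of("certification");

  /** Builds the fields argument from the declared set. */
  static String fieldsArg() {
    return String.join(",", REQUIRED_FIELDS);
  }

  /** Reads only "certification", matching REQUIRED_FIELDS. */
  static boolean alreadyCertified(StrictEntityView entity) {
    return entity.get("certification") != null;
  }

  public static void main(String[] args) {
    StrictEntityView view = new StrictEntityView(Map.of("certification", "Gold"), REQUIRED_FIELDS);
    System.out.println(fieldsArg());            // certification
    System.out.println(alreadyCertified(view)); // true
  }
}
```

If a later change adds a getter call without updating `REQUIRED_FIELDS`, a test using the strict view fails instead of the consumer silently reading an unfetched (null) field.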
CheckEntityAttributesImpl, DataCompletenessImpl, SetEntityAttributeImpl, CreateTask, and RdfIndexApp were intentionally left as-is: they either genuinely need full entity state (rule-engine or arbitrary field-path evaluation) or, in the case of RdfIndexApp, warrant their own analysis.
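Rule-engine style consumers resist a fixed list because their field paths are authored at runtime, and a field that was never fetched is indistinguishable from one that is empty. The sketch below is a hypothetical dotted-path resolver over map-shaped entities (not OpenMetadata's rule engine) showing how a narrowed fetch could silently flip a rule's result.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not OpenMetadata's rule engine: resolve a user-authored
// dotted field path against an entity modeled as nested maps.
final class FieldPathRule {
  static Object resolve(Map<String, Object> entity, String path) {
    Object current = entity;
    for (String part : path.split("\\.")) {
      if (!(current instanceof Map<?, ?> map)) {
        return null; // path walks past a leaf value
      }
      current = map.get(part); // null if the field is absent, e.g. never fetched
    }
    return current;
  }

  // Example rule "entity has no owners": an unfetched field looks exactly like an empty one.
  static boolean hasNoOwners(Map<String, Object> entity) {
    Object owners = resolve(entity, "owners");
    return owners == null || (owners instanceof List<?> list && list.isEmpty());
  }

  public static void main(String[] args) {
    Map<String, Object> full = Map.of("owners", List.of("alice"));
    Map<String, Object> partial = Map.of(); // same entity fetched without "owners"
    System.out.println(hasNoOwners(full));    // false
    System.out.println(hasNoOwners(partial)); // true
  }
}
```

Since the referenced paths are only known when a rule is evaluated, no static allow-list can guarantee every path is loaded, which is the case for keeping these consumers on full entity state.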
Type of change:
High-level design:
N/A — small change.
Tests:
Note: no automated tests were added in this PR. The Tier 1 changes reuse an allow-list already exercised by SearchIndexApp's test coverage. The Tier 2 governance-task changes (which feed JSON-patch flows) would benefit from an integration test confirming that no fields are unintentionally clobbered; I'm happy to add openmetadata-integration-tests coverage if reviewers want it before merge.
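The clobber risk in JSON-patch flows comes from diffing two documents that were not loaded with the same fields. The following simplified top-level diff, in the spirit of RFC 6902 JSON Patch but not the patch library OpenMetadata uses, shows that a field absent from both sides produces no operation, while comparing a fully loaded original with an update built from a partial load emits a spurious `remove`. `TopLevelPatch` is a hypothetical name; nested values and JSON Pointer escaping are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Simplified top-level diff in the spirit of RFC 6902 (a sketch, not a real patch library):
// emits add/replace/remove per top-level key; nested diffs and pointer escaping are omitted.
final class TopLevelPatch {
  static List<String> diff(Map<String, Object> before, Map<String, Object> after) {
    Set<String> keys = new LinkedHashSet<>(before.keySet());
    keys.addAll(after.keySet());
    List<String> ops = new ArrayList<>();
    for (String key : keys) {
      boolean inBefore = before.containsKey(key);
      boolean inAfter = after.containsKey(key);
      if (inBefore && !inAfter) {
        ops.add("remove /" + key); // the clobber case
      } else if (!inBefore && inAfter) {
        ops.add("add /" + key);
      } else if (!Objects.equals(before.get(key), after.get(key))) {
        ops.add("replace /" + key);
      }
    }
    return ops;
  }

  public static void main(String[] args) {
    Map<String, Object> partialOriginal = Map.of("id", 1); // loaded without tags
    Map<String, Object> updated = Map.of("id", 1, "certification", "Gold");
    System.out.println(diff(partialOriginal, updated)); // [add /certification]
    Map<String, Object> fullOriginal = Map.of("id", 1, "tags", List.of("PII"));
    // Two ops, remove /tags and add /certification (order may vary).
    System.out.println(diff(fullOriginal, updated));
  }
}
```

An integration test along these lines would load the entity the way the task does, apply the patch, and assert that fields outside the task's field list are unchanged afterwards.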
UI screen recording / screenshots:
Not applicable.
Checklist: