Fix payload size issue #27342

Open
mohityadav766 wants to merge 21 commits into main from fix-payload-issue

@mohityadav766 (Member)

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

@mohityadav766 mohityadav766 self-assigned this Apr 14, 2026
@mohityadav766 mohityadav766 requested a review from a team as a code owner April 14, 2026 08:19
Copilot AI review requested due to automatic review settings April 14, 2026 08:19
@github-actions github-actions bot added the backend and safe to test labels Apr 14, 2026
@github-actions (Contributor)

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses search-index payload size limits by deduplicating repeated lineage SQL text across edges, storing the SQL once per document and referencing it via a key on each edge. It also tightens bulk payload sizing/headroom and adds safeguards for oversized single documents.

Changes:

  • Introduces sqlQueryKey on lineage edges and a doc-level lineageSqlQueries map; deduplicates SQL during index doc construction and in the lineage update script.
  • Updates Elasticsearch/OpenSearch index mappings (EN) to include sqlQueryKey and to store lineageSqlQueries as a non-indexed object (enabled: false).
  • Reduces effective bulk payload target to ~9MB, adds direct-index fallback for oversized documents, and adds unit tests for SQL deduplication.
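The deduplication described in the first bullet can be illustrated with a minimal Java sketch. It assumes a simplified, hypothetical `LineageEdge` class and a helper named `dedupSql`; the PR's actual `deduplicateSqlAcrossEdges` in SearchIndexUtils may differ in signature and key format. The idea is that each distinct SQL string is stored once in a key-to-SQL map, and each edge keeps only the key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a lineage edge; not the PR's generated type.
class LineageEdge {
    final String fromEntity;
    final String toEntity;
    String sqlQuery;    // full SQL text; cleared once deduplicated
    String sqlQueryKey; // key into the doc-level lineageSqlQueries map

    LineageEdge(String fromEntity, String toEntity, String sqlQuery) {
        this.fromEntity = fromEntity;
        this.toEntity = toEntity;
        this.sqlQuery = sqlQuery;
    }
}

class SqlDedupSketch {
    /**
     * Stores each distinct SQL string once in a key -> SQL map and replaces the
     * per-edge SQL text with its key. Keys "1", "2", ... are assigned in
     * first-seen order; this key scheme is an assumption for illustration.
     */
    static Map<String, String> dedupSql(List<LineageEdge> edges) {
        Map<String, String> keyBySql = new LinkedHashMap<>();
        Map<String, String> sqlByKey = new LinkedHashMap<>();
        for (LineageEdge edge : edges) {
            if (edge.sqlQuery == null || edge.sqlQuery.isEmpty()) {
                continue; // edges without SQL get no key
            }
            String key = keyBySql.get(edge.sqlQuery);
            if (key == null) {
                // size() + 1 is safe here because this map is built from scratch and has no gaps.
                key = String.valueOf(sqlByKey.size() + 1);
                keyBySql.put(edge.sqlQuery, key);
                sqlByKey.put(key, edge.sqlQuery);
            }
            edge.sqlQueryKey = key;
            edge.sqlQuery = null; // the text now lives only in the returned map
        }
        return sqlByKey;
    }
}
```

The returned map would be written to the document as `lineageSqlQueries`. Because the mappings declare that field with `enabled: false`, Elasticsearch/OpenSearch keeps it in `_source` without parsing or indexing its contents, so a repeated multi-kilobyte query costs its size once per document instead of once per edge.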

Reviewed changes

Copilot reviewed 66 out of 67 changed files in this pull request and generated 4 comments.

Summary per file (file path followed by its description):
openmetadata-ui/src/main/resources/ui/src/generated/api/lineage/esLineageData.ts Adds sqlQueryKey to the generated UI type for ES lineage edge payloads.
openmetadata-spec/src/main/resources/json/schema/api/lineage/esLineageData.json Extends lineage edge schema with sqlQueryKey.
openmetadata-spec/src/main/resources/elasticsearch/en/worksheet_index_mapping.json Adds sqlQueryKey to upstreamLineage mapping and adds lineageSqlQueries (disabled object).
openmetadata-spec/src/main/resources/elasticsearch/en/topic_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/table_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping to support SQL dedup in table lineage edges.
openmetadata-spec/src/main/resources/elasticsearch/en/stored_procedure_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/spreadsheet_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/search_entity_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/prompt_template_index_mapping.json Adds lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/pipeline_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/mlmodel_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/metric_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/llm_model_index_mapping.json Adds lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/file_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/directory_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/dashboard_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/dashboard_data_model_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/container_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/chart_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/api_endpoint_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/api_collection_index_mapping.json Adds sqlQueryKey and lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/ai_governance_policy_index_mapping.json Adds lineageSqlQueries mapping.
openmetadata-spec/src/main/resources/elasticsearch/en/ai_agent_index_mapping.json Adds lineageSqlQueries mapping.
openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/SearchIndexTest.java Adds unit tests validating SQL deduplication behavior and large-edge-count scenario.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/WorksheetIndex.java Switches to SearchIndex.populateLineageData to write deduped lineage fields.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TopicIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TableIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/StoredProcedureIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/StorageServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SpreadsheetIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SecurityServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchIndex.java Adds populateLineageData to populate upstreamLineage + lineageSqlQueries and invoke dedup.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchEntityIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PromptTemplateIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PipelineServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PipelineIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MlModelServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MlModelIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MetricIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MetadataServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MessagingServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/McpServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/McpServerIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/LlmServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/LlmModelIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/FileIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DriveServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DomainIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DirectoryIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DatabaseServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DataProductIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardDataModelIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContainerIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ChartIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/AiGovernancePolicyIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/AiApplicationIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/APIServiceIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/APIEndpointIndex.java Switches to SearchIndex.populateLineageData.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java Lowers the bulk max payload target to ~9MB to leave headroom.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexUtils.java Adds deduplicateSqlAcrossEdges utility that clears sqlQuery and sets sqlQueryKey.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClusterMetrics.java Applies the same 10% payload headroom in conservative defaults.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClient.java Updates lineage upsert script to deduplicate SQL into doc-level lineageSqlQueries and set sqlQueryKey.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OpenSearchBulkSink.java Adjusts bulk overhead estimate, flush thresholding, and adds direct-index fallback for oversized docs.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java Same as OpenSearch sink: overhead, flush behavior, and direct-index fallback.

@github-actions (Contributor)

github-actions bot commented Apr 14, 2026

Jest test Coverage

UI tests summary

  • Lines: Coverage 63%
  • Statements: 63.99% (59824/93485)
  • Branches: 43.71% (31313/71630)
  • Functions: 46.81% (9410/20102)

Copilot AI review requested due to automatic review settings April 14, 2026 09:34

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 67 out of 68 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings April 14, 2026 17:07

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 144 out of 145 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings April 14, 2026 17:38
Resolved conflicts:
- Index classes: took main's refactored structure (getEntityTypeName, new interfaces,
  removed getCommonAttributesMap calls to base class) while preserving our
  getExcludedFields() additions for APICollection, Container, Dashboard, Database,
  DatabaseSchema, GlossaryTerm, LlmService, and Team
- Locale mapping add/add conflicts: took main's locale-specific analyzer configs
  (jp/ru/zh use different tokenizers) and reapplied our lineageSqlQueries additions
  to the new entity types (ai_agent, ai_governance_policy, llm_model, prompt_template)
- DashboardIndex: also added charts to excluded fields (fix for missing exclusion)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 153 out of 154 changed files in this pull request and generated no new comments.

@gitar-bot commented Apr 14, 2026

Code Review ✅ Approved: 5 of 5 findings resolved

Fixes payload size issues. The resolved findings cover submission stats for directly-indexed oversized docs, buffer flush reordering, Painless script key collisions, redundant byte-array allocations, and the duplicated 9MB buffer threshold, which is now standardized.

✅ 5 resolved
Bug: Missing totalSubmitted increment for directly-indexed oversized docs

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java:317-322 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OpenSearchBulkSink.java:368-374 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java:311-323 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OpenSearchBulkSink.java:362-374
When a document exceeds maxPayloadSizeBytes, indexDocumentDirectly() increments totalSuccess (or totalFailed), and the caller increments processSuccess, but totalSubmitted is never incremented. For normal bulk documents, totalSubmitted is incremented in flushInternal(). This causes stats to report fewer documents submitted than actually succeeded, breaking the invariant totalSubmitted >= totalSuccess.
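A minimal sketch of the counting fix, assuming hypothetical `SinkStats` counters that mirror the names in the finding (`totalSubmitted`, `totalSuccess`, `totalFailed`). The point is that the direct-index path counts a document as submitted, just as `flushInternal()` does for bulk batches.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters modeled on the stats named in the finding.
class SinkStats {
    final AtomicLong totalSubmitted = new AtomicLong();
    final AtomicLong totalSuccess = new AtomicLong();
    final AtomicLong totalFailed = new AtomicLong();

    /** Bulk path: documents are counted as submitted when the batch is flushed. */
    void recordBulkFlush(int batchSize, int succeeded) {
        totalSubmitted.addAndGet(batchSize);
        totalSuccess.addAndGet(succeeded);
        totalFailed.addAndGet(batchSize - succeeded);
    }

    /** Direct-index path for an oversized doc: it must count as submitted too. */
    void recordDirectIndex(boolean succeeded) {
        totalSubmitted.incrementAndGet(); // the increment the finding reports as missing
        if (succeeded) {
            totalSuccess.incrementAndGet();
        } else {
            totalFailed.incrementAndGet();
        }
    }

    /** The invariant from the finding; reads are not atomic, so this is a single-threaded check. */
    boolean invariantHolds() {
        return totalSubmitted.get() >= totalSuccess.get();
    }
}
```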

Bug: Buffer flush reorder can leave single oversized op stuck in buffer

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java:822-828 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OpenSearchBulkSink.java:965-971
The refactored CustomBulkProcessor.add() now flushes before adding the operation and only when !buffer.isEmpty(). If a single operation larger than maxPayloadSizeBytes arrives when the buffer is empty, it gets added without triggering a flush. The next operation will then trigger a flush of the oversized batch. While the addEntity guard mostly prevents this, time-series and column paths (and any future callers) can still hit this scenario.
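One way to guarantee that a bulk request never carries an oversized operation is to handle the single-operation case before the flush-before-add check. The sketch below is a hypothetical `SizeBoundedBuffer`, not the PR's `CustomBulkProcessor`; the oversized handler stands in for something like the direct-index fallback, and sizes are raw byte counts with no per-request overhead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical simplification of a size-bounded bulk buffer.
class SizeBoundedBuffer {
    private final long maxBytes;
    private final Consumer<List<byte[]>> bulkSender; // sends one bulk request
    private final Consumer<byte[]> oversizedHandler; // e.g. a direct single-doc index call
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    SizeBoundedBuffer(long maxBytes, Consumer<List<byte[]>> bulkSender, Consumer<byte[]> oversizedHandler) {
        this.maxBytes = maxBytes;
        this.bulkSender = bulkSender;
        this.oversizedHandler = oversizedHandler;
    }

    void add(byte[] op) {
        if (op.length > maxBytes) {
            // Never buffer an op that cannot fit in any bulk request on its own.
            flush(); // send earlier ops first to preserve ordering
            oversizedHandler.accept(op);
            return;
        }
        if (!buffer.isEmpty() && bufferedBytes + op.length > maxBytes) {
            flush(); // flush before adding so a batch never exceeds maxBytes
        }
        buffer.add(op);
        bufferedBytes += op.length;
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        bulkSender.accept(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }
}
```

With this ordering, the empty-buffer case from the finding can no longer produce an oversized batch, regardless of which caller (entity, time-series, or column path) submits the operation.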

Bug: Painless script sqlKey assignment can collide on non-contiguous maps

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClient.java:264-265
The ADD_UPDATE_LINEAGE Painless script computes the new SQL key as String.valueOf(sqlMap.size() + 1). This assumes keys are a dense sequence starting at 1. If lineageSqlQueries ever has gaps (e.g., stale entries from removed edges that weren't cleaned up, or manual modifications), size() + 1 can collide with an existing key, silently overwriting a different SQL query. The REMOVE_LINEAGE_SCRIPT removes edges from upstreamLineage but does not clean up the corresponding lineageSqlQueries entries, so orphaned keys accumulate and the map size diverges from the max key.
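The collision can be reproduced outside Painless. The Java sketch below (a hypothetical `SqlKeyAllocator`; the real logic lives in the ADD_UPDATE_LINEAGE script) contrasts `size() + 1` with allocating one past the largest numeric key, and reuses the existing key when the same SQL is already stored. Cleaning up orphaned entries in REMOVE_LINEAGE_SCRIPT would address the other half of the finding and is not shown.

```java
import java.util.Map;

// Hypothetical helper illustrating key allocation for a lineageSqlQueries-style map.
class SqlKeyAllocator {
    /** The pattern the finding flags: only safe if keys are exactly "1".."n". */
    static String nextKeyBySize(Map<String, String> sqlMap) {
        return String.valueOf(sqlMap.size() + 1);
    }

    /** Gap-tolerant alternative: one past the largest numeric key. */
    static String nextKeyByMax(Map<String, String> sqlMap) {
        long max = 0;
        for (String k : sqlMap.keySet()) {
            try {
                max = Math.max(max, Long.parseLong(k));
            } catch (NumberFormatException e) {
                // Assumption: only numeric keys are allocated, so others are ignored.
            }
        }
        return String.valueOf(max + 1);
    }

    /** Reuses the key if this SQL is already stored, otherwise allocates a fresh one. */
    static String putSql(Map<String, String> sqlMap, String sql) {
        for (Map.Entry<String, String> e : sqlMap.entrySet()) {
            if (sql.equals(e.getValue())) {
                return e.getKey();
            }
        }
        String key = nextKeyByMax(sqlMap);
        sqlMap.put(key, sql);
        return key;
    }
}
```

The linear scan in `putSql` keeps the sketch short; a per-document map of this kind is usually small, but a reverse SQL-to-key lookup would avoid the scan at the cost of keeping two structures in sync.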

Performance: Redundant multi-MB byte array allocations in strip methods

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexUtils.java:104-105 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexUtils.java:131-132
Both stripLineageForSize and stripDocMapIfOversized call json.getBytes(StandardCharsets.UTF_8) multiple times on the same (unchanged) string. For example in stripLineageForSize, lines 104 and 105 compute .getBytes(UTF_8).length on the same json string twice, allocating a potentially multi-megabyte byte array each time only to discard it. The same pattern repeats in stripDocMapIfOversized at lines 131-132.

Store the byte length in a local variable after each serialization to avoid the redundant allocation.
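A minimal sketch of the suggested fix, assuming a hypothetical `stripIfOversized` helper that receives the stripping step as a function: each serialized string is encoded to UTF-8 exactly once, and its length is kept in a local and reused for comparisons and reporting.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical result holder so callers can log sizes without re-encoding.
final class StripResult {
    final String json;
    final int originalBytes;
    final int finalBytes;

    StripResult(String json, int originalBytes, int finalBytes) {
        this.json = json;
        this.originalBytes = originalBytes;
        this.finalBytes = finalBytes;
    }
}

class OversizeStripper {
    static StripResult stripIfOversized(String json, long maxBytes, UnaryOperator<String> strip) {
        int originalBytes = json.getBytes(StandardCharsets.UTF_8).length; // encode once, reuse below
        if (originalBytes <= maxBytes) {
            return new StripResult(json, originalBytes, originalBytes);
        }
        String stripped = strip.apply(json);
        int strippedBytes = stripped.getBytes(StandardCharsets.UTF_8).length; // encode the new string once
        return new StripResult(stripped, originalBytes, strippedBytes);
    }
}
```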

Quality: Magic number 9MB duplicated across three files

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:1085 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexRetryWorker.java:484 📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ReindexingConfiguration.java:49
The 9 MB payload limit (9L * 1024L * 1024L) is hardcoded independently in SearchRepository.java:1085, SearchIndexRetryWorker.java:484, and ReindexingConfiguration.java:49. If the limit needs to change, it's easy to miss one of these locations, leading to inconsistent behavior between live indexing, retry, and reindexing paths.

Consider extracting a shared constant (e.g., in ReindexingConfiguration or a common constants class) and referencing it from all three call sites.
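A sketch of the suggested shared constant. The holder class `SearchPayloadLimits` is hypothetical, and deriving 9 MiB as a 10 MiB base minus 10% headroom is an assumption based on the 10% headroom mentioned for SearchClusterMetrics; either way the value equals the hardcoded `9L * 1024L * 1024L`.

```java
// Hypothetical constants holder; the finding suggests ReindexingConfiguration or a common constants class.
final class SearchPayloadLimits {
    private SearchPayloadLimits() {}

    // Assumption: the 10% headroom is taken off a 10 MiB base.
    static final long BASE_PAYLOAD_BYTES = 10L * 1024L * 1024L;
    static final int HEADROOM_PERCENT = 10;

    /** Single source of truth for the live indexing, retry, and reindexing paths. */
    static final long MAX_BULK_PAYLOAD_BYTES =
        BASE_PAYLOAD_BYTES - BASE_PAYLOAD_BYTES * HEADROOM_PERCENT / 100;
}
```

Integer arithmetic keeps the result exact (10485760 - 1048576 = 9437184), which avoids any floating-point rounding from multiplying by 0.9. SearchRepository, SearchIndexRetryWorker, and ReindexingConfiguration would then all read `SearchPayloadLimits.MAX_BULK_PAYLOAD_BYTES`.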
