✅ TypeScript Types Auto-Updated
The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.
Pull request overview
This PR addresses search-index payload size limits by deduplicating repeated lineage SQL text across edges, storing the SQL once per document and referencing it via a key on each edge. It also tightens bulk payload sizing/headroom and adds safeguards for oversized single documents.
Changes:
- Introduces sqlQueryKey on lineage edges and a doc-level lineageSqlQueries map; deduplicates SQL during index doc construction and in the lineage update script.
- Updates Elasticsearch/OpenSearch index mappings (EN) to include sqlQueryKey and to store lineageSqlQueries as a non-indexed object (enabled: false).
- Reduces effective bulk payload target to ~9MB, adds direct-index fallback for oversized documents, and adds unit tests for SQL deduplication.
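The deduplication described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `LineageSqlDedupSketch` class, the simplified `Edge` type, and the `q<N>` key format are all assumptions made for the example. Only the field names sqlQuery, sqlQueryKey, and lineageSqlQueries come from the PR summary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class name, Edge shape, and key format are illustrative only.
final class LineageSqlDedupSketch {

  /** Minimal stand-in for a lineage edge; a real edge carries many more fields. */
  static final class Edge {
    String sqlQuery;    // full SQL text, cleared once it has been deduplicated
    String sqlQueryKey; // reference into the doc-level key -> SQL map

    Edge(String sqlQuery) {
      this.sqlQuery = sqlQuery;
    }
  }

  /**
   * Moves each distinct SQL string into a doc-level map and leaves only a key on the edge.
   * The returned key -> SQL map plays the role of lineageSqlQueries.
   */
  static Map<String, String> deduplicate(List<Edge> edges) {
    Map<String, String> keyBySql = new LinkedHashMap<>(); // SQL text -> key
    Map<String, String> sqlByKey = new LinkedHashMap<>(); // key -> SQL text
    for (Edge edge : edges) {
      if (edge.sqlQuery == null || edge.sqlQuery.isEmpty()) {
        continue; // nothing to deduplicate on this edge
      }
      String key = keyBySql.get(edge.sqlQuery);
      if (key == null) {
        key = "q" + keyBySql.size(); // assumed key format, safe here because the map is fresh
        keyBySql.put(edge.sqlQuery, key);
        sqlByKey.put(key, edge.sqlQuery);
      }
      edge.sqlQueryKey = key;
      edge.sqlQuery = null; // the SQL now lives once at document level
    }
    return sqlByKey;
  }

  public static void main(String[] args) {
    List<Edge> edges = new ArrayList<>();
    edges.add(new Edge("SELECT * FROM orders"));
    edges.add(new Edge("SELECT * FROM orders"));
    edges.add(new Edge("SELECT id FROM users"));
    Map<String, String> queries = deduplicate(edges);
    System.out.println(queries.size() + " distinct queries for " + edges.size() + " edges");
  }
}
```

With many edges sharing the same SQL, the document stores one copy of the text plus a short key per edge, which is where the payload saving would come from.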
Reviewed changes
Copilot reviewed 66 out of 67 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/generated/api/lineage/esLineageData.ts | Adds sqlQueryKey to the generated UI type for ES lineage edge payloads. |
| openmetadata-spec/src/main/resources/json/schema/api/lineage/esLineageData.json | Extends lineage edge schema with sqlQueryKey. |
| openmetadata-spec/src/main/resources/elasticsearch/en/worksheet_index_mapping.json | Adds sqlQueryKey to upstreamLineage mapping and adds lineageSqlQueries (disabled object). |
| openmetadata-spec/src/main/resources/elasticsearch/en/topic_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/table_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping to support SQL dedup in table lineage edges. |
| openmetadata-spec/src/main/resources/elasticsearch/en/stored_procedure_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/spreadsheet_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/search_entity_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/prompt_template_index_mapping.json | Adds lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/pipeline_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/mlmodel_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/metric_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/llm_model_index_mapping.json | Adds lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/file_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/directory_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/dashboard_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/dashboard_data_model_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/container_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/chart_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/api_endpoint_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/api_collection_index_mapping.json | Adds sqlQueryKey and lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/ai_governance_policy_index_mapping.json | Adds lineageSqlQueries mapping. |
| openmetadata-spec/src/main/resources/elasticsearch/en/ai_agent_index_mapping.json | Adds lineageSqlQueries mapping. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/SearchIndexTest.java | Adds unit tests validating SQL deduplication behavior and large-edge-count scenario. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/WorksheetIndex.java | Switches to SearchIndex.populateLineageData to write deduped lineage fields. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TopicIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TableIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/StoredProcedureIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/StorageServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SpreadsheetIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SecurityServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchIndex.java | Adds populateLineageData to populate upstreamLineage + lineageSqlQueries and invoke dedup. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchEntityIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PromptTemplateIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PipelineServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/PipelineIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MlModelServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MlModelIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MetricIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MetadataServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/MessagingServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/McpServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/McpServerIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/LlmServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/LlmModelIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/FileIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DriveServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DomainIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DirectoryIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DatabaseServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DataProductIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/DashboardDataModelIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ContainerIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/ChartIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/AiGovernancePolicyIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/AiApplicationIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/APIServiceIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/APIEndpointIndex.java | Switches to SearchIndex.populateLineageData. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Lowers the bulk max payload target to ~9MB to leave headroom. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexUtils.java | Adds deduplicateSqlAcrossEdges utility that clears sqlQuery and sets sqlQueryKey. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClusterMetrics.java | Applies the same 10% payload headroom in conservative defaults. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClient.java | Updates lineage upsert script to deduplicate SQL into doc-level lineageSqlQueries and set sqlQueryKey. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OpenSearchBulkSink.java | Adjusts bulk overhead estimate, flush thresholding, and adds direct-index fallback for oversized docs. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java | Same as OpenSearch sink: overhead, flush behavior, and direct-index fallback. |
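
The bulk sink behavior summarized in the table (a payload target with headroom, flush thresholding, and a direct-index fallback for oversized documents) might look something like the sketch below. It is an illustration under stated assumptions, not the sink implementation: the `BulkBufferSketch` class, its counters, the 10MB base limit, and the choice to flush pending operations before indexing an oversized document directly are all hypothetical. The actual bulk and index requests are replaced by counters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names, the 10MB base limit, and the flush ordering are assumptions.
final class BulkBufferSketch {
  static final long ASSUMED_LIMIT_BYTES = 10L * 1024 * 1024;
  // 10% headroom below the assumed limit gives the ~9MB target.
  static final long TARGET_BYTES = ASSUMED_LIMIT_BYTES * 9 / 10;

  private final long targetBytes;
  private final List<byte[]> buffer = new ArrayList<>();
  private long bufferedBytes = 0;

  int bulkFlushes = 0;     // number of bulk requests that would have been sent
  int directIndexed = 0;   // documents sent on their own because they exceed the target
  long totalSubmitted = 0; // every document handed to the cluster, bulk or direct

  BulkBufferSketch(long targetBytes) {
    this.targetBytes = targetBytes;
  }

  void add(byte[] doc) {
    if (doc.length > targetBytes) {
      // Oversized document: flush pending operations first so it never sits in the buffer,
      // then index it by itself and still count it as submitted.
      flush();
      directIndexed++;
      totalSubmitted++;
      return;
    }
    if (bufferedBytes + doc.length > targetBytes) {
      flush(); // adding this document would exceed the target, so send what is pending
    }
    buffer.add(doc);
    bufferedBytes += doc.length;
  }

  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    totalSubmitted += buffer.size();
    bulkFlushes++;
    buffer.clear();
    bufferedBytes = 0;
  }

  public static void main(String[] args) {
    BulkBufferSketch sink = new BulkBufferSketch(TARGET_BYTES);
    sink.add(new byte[4 * 1024 * 1024]);
    sink.add(new byte[4 * 1024 * 1024]);
    sink.add(new byte[4 * 1024 * 1024]);  // triggers a flush of the first two
    sink.add(new byte[12 * 1024 * 1024]); // larger than the target, indexed directly
    sink.flush();
    System.out.println("bulk=" + sink.bulkFlushes + " direct=" + sink.directIndexed
        + " submitted=" + sink.totalSubmitted);
  }
}
```

Counting the directly indexed document in `totalSubmitted` mirrors the kind of bookkeeping the review findings call out for oversized documents.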
Resolved review threads on:
openmetadata-spec/src/main/resources/elasticsearch/en/table_index_mapping.json
...e/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java
...vice/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/OpenSearchBulkSink.java
openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/SearchIndex.java
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClient.java
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchIndexUtils.java
...ervice/src/test/java/org/openmetadata/service/search/indexes/AddUpdateLineageScriptTest.java
Resolved conflicts:
- Index classes: took main's refactored structure (getEntityTypeName, new interfaces, removed getCommonAttributesMap calls to base class) while preserving our getExcludedFields() additions for APICollection, Container, Dashboard, Database, DatabaseSchema, GlossaryTerm, LlmService, and Team
- Locale mapping add/add conflicts: took main's locale-specific analyzer configs (jp/ru/zh use different tokenizers) and reapplied our lineageSqlQueries additions to the new entity types (ai_agent, ai_governance_policy, llm_model, prompt_template)
- DashboardIndex: also added charts to excluded fields (fix for missing exclusion)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review ✅ Approved 5 resolved / 5 findings
Fixes payload size issues by resolving buffer flush reordering, incorrect doc indexing, and memory allocation inefficiencies. Standardizes the 9MB buffer threshold and addresses Painless script mapping collisions.
✅ 5 resolved
✅ Bug: Missing totalSubmitted increment for directly-indexed oversized docs
✅ Bug: Buffer flush reorder can leave single oversized op stuck in buffer
✅ Bug: Painless script sqlKey assignment can collide on non-contiguous maps
✅ Performance: Redundant multi-MB byte array allocations in strip methods
✅ Quality: Magic number 9MB duplicated across three files
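
The Painless key-collision finding can be illustrated with a small sketch in Java rather than Painless. The script itself is not shown on this page, so this is only a guess at the failure mode: it assumes keys of the form `q<N>` and a key derived from the map size, both of which are hypothetical. Under that assumption, a map whose keys are not contiguous (for example after entries were removed) can already contain the size-derived key, and probing for the next unused key avoids the overwrite.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the "q<N>" key format and the size-derived key are assumptions.
final class SqlKeyAllocationSketch {

  /** Naive allocation: collides when the existing keys are not contiguous. */
  static String sizeBasedKey(Map<String, String> queries) {
    return "q" + queries.size();
  }

  /** Safer allocation: start from the size, then probe until the key is unused. */
  static String nextFreeKey(Map<String, String> queries) {
    int index = queries.size();
    while (queries.containsKey("q" + index)) {
      index++;
    }
    return "q" + index;
  }

  public static void main(String[] args) {
    Map<String, String> queries = new HashMap<>();
    queries.put("q0", "SELECT 1");
    queries.put("q2", "SELECT 2"); // "q1" was removed earlier, leaving a gap
    System.out.println("size-based key: " + sizeBasedKey(queries));
    System.out.println("probed key: " + nextFreeKey(queries));
  }
}
```

In the example, the size-based key would overwrite an existing entry, while the probed key does not.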