feat: column name hashing in FQN construction #26950

mohittilala wants to merge 8 commits into `main`
Conversation
Pull request overview
Implements hashed column-name segments in column FQN construction (using `md5_<32hex>`) to prevent long/raw column names from exceeding `entityLink` VARCHAR(3072) limits, and updates backend/ingestion/UI code paths to work with hashed column FQNs while keeping `Column.name` readable.
Changes:

- Backend: Introduces `ColumnNameHash`, hashes column segments when building column FQNs, updates joins + validation to support raw or hashed identifiers, and adds a v1.13.0 data migration to recompute stored column FQNs and test-case entity links.
- Ingestion: Hashes the column segment when building FQNs, hashes entity-link → FQN conversions, removes column-name truncation, and updates/extends unit tests for hashing + special characters.
- UI: Prefers `column.name` for building entity links and attempts to resolve readable column names in lineage views/modals instead of showing hashed FQN segments.
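The scheme described above can be sketched in a few lines. This is a minimal re-implementation for illustration, not the actual `ColumnNameHash`/`column_name_hash` utilities from this PR; `build_column_fqn` and the `HASH_LENGTH` constant are assumptions made for the sketch:

```python
import hashlib

HASH_PREFIX = "md5_"
HASH_LENGTH = len(HASH_PREFIX) + 32  # "md5_" + 32 hex characters


def hash_column_name(column_name: str) -> str:
    """Hash a raw column name into a fixed-length FQN segment.

    UTF-8 encoding is assumed here so that Java and Python produce
    identical digests for the same column name.
    """
    digest = hashlib.md5(column_name.encode("utf-8")).hexdigest()
    return f"{HASH_PREFIX}{digest}"


def build_column_fqn(table_fqn: str, column_name: str) -> str:
    # Only the column segment is hashed; the table FQN stays readable,
    # and the segment length is constant regardless of the raw name.
    return f"{table_fqn}.{hash_column_name(column_name)}"


print(build_column_fqn("svc.db.schema.tbl", "customer_email"))
```

Because the hashed segment is always the same length, even 1000+ character struct column names can no longer push the FQN past the `entityLink` column limit.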
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/utils/TableUtils.tsx | Prefer column.name over parsing the column name from FQN when building entity links. |
| openmetadata-ui/src/main/resources/ui/src/utils/Lineage/LineageUtils.tsx | Adds column-name lookup by column FQN to populate readable lineage table values. |
| openmetadata-ui/src/main/resources/ui/src/utils/EntityLineageUtils.tsx | Updates lineage delete-modal text to resolve readable column names from node data when possible. |
| openmetadata-ui/src/main/resources/ui/src/context/LineageProvider/LineageProvider.tsx | Passes ReactFlow nodes into getModalBodyText for column-name resolution. |
| openmetadata-ui/src/main/resources/ui/src/components/Tag/TagsContainerV2/TagsContainerV2.tsx | Build column entity links using columnData.name when available. |
| openmetadata-ui/src/main/resources/ui/src/components/LineageTable/LineageTable.tsx | Displays resolved fromColumnName/toColumnName in lineage column tables (fallback to FQN segment). |
| openmetadata-ui/src/main/resources/ui/src/components/Lineage/Lineage.interface.ts | Extends ColumnLevelLineageNode with fromColumnName/toColumnName. |
| openmetadata-ui/src/main/resources/ui/src/components/Database/TableDescription/TableDescription.component.tsx | Build column entity links using raw column name when available; fixes memo deps. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/table.json | Removes :: prohibition pattern from columnName schema. |
| openmetadata-service/src/main/java/org/openmetadata/service/util/ColumnNameHash.java | New backend utility for hashing/resolving column FQN segments. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1130/MigrationUtil.java | Adds v1.13.0 migration logic to recompute column FQNs and update test-case entity links. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/postgres/v1130/Migration.java | Runs the new column-FQN migration for Postgres. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/mysql/v1130/Migration.java | Runs the new column-FQN migration for MySQL. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java | Hashes column segment when persisting joins; resolves join column names back to raw for API output. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java | Updates validateColumn to accept raw names or hashed column segments. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnUtil.java | Hashes column segment during setColumnFQN(...) for columns and nested children. |
| ingestion/tests/unit/utils/test_fqn_special_chars.py | Updates special-character column FQN expectations to hashed column segments. |
| ingestion/tests/unit/utils/test_column_name_hash.py | New unit tests validating hashing behavior and MD5 compatibility. |
| ingestion/src/metadata/utils/fqn.py | Hashes column_name for Column FQN building; stops reserved-keyword encoding for column segments. |
| ingestion/src/metadata/utils/entity_link.py | Hashes column segment when converting entity links to column FQNs if not already hashed. |
| ingestion/src/metadata/utils/datalake/datalake_utils.py | Removes column-name truncation from datalake column extraction. |
| ingestion/src/metadata/utils/column_name_hash.py | New ingestion utility for hashing and detecting hashed column segments. |
| ingestion/src/metadata/ingestion/source/database/json_schema_extractor.py | Removes column-name truncation when creating columns from JSON schema. |
| ingestion/src/metadata/ingestion/source/database/glue/metadata.py | Removes column-name truncation for Glue columns. |
| ingestion/src/metadata/ingestion/source/database/column_helpers.py | Deletes the truncate_column_name helper implementation. |
| ingestion/src/metadata/ingestion/source/dashboard/tableau/metadata.py | Removes column-name truncation for Tableau columns. |
| ingestion/src/metadata/ingestion/source/dashboard/superset/mixin.py | Removes column-name truncation for Superset columns. |
| ingestion/src/metadata/ingestion/source/dashboard/sigma/metadata.py | Removes column-name truncation for Sigma columns. |
| ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py | Removes column-name truncation for QuickSight columns. |
| ingestion/src/metadata/ingestion/source/dashboard/qliksense/metadata.py | Removes column-name truncation for QlikSense columns. |
| ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py | Removes column-name truncation for PowerBI columns/measures. |
| ingestion/src/metadata/ingestion/source/dashboard/microstrategy/metadata.py | Removes column-name truncation for MicroStrategy columns. |
| ingestion/src/metadata/ingestion/source/dashboard/microstrategy/helpers.py | Removes column-name truncation for MicroStrategy parsing helper. |
| ingestion/src/metadata/ingestion/source/dashboard/looker/columns.py | Removes column-name truncation for Looker columns. |
| ingestion/src/metadata/ingestion/models/custom_basemodel_validation.py | Removes reserved-keyword encoding entries for column-name-related wrapper models. |
| ingestion/src/metadata/data_quality/validations/mixins/sqa_validator_mixin.py | Adds hash-based fallback when resolving columns from entity links. |
| ingestion/src/metadata/data_quality/validations/mixins/pandas_validator_mixin.py | Adds hash-based fallback for dataframe column resolution. |
| ingestion/src/metadata/data_quality/processor/test_case_runner.py | Adds hash-based fallback when matching test-case column entity links to table columns. |
Comments suppressed due to low confidence (1)
ingestion/src/metadata/data_quality/processor/test_case_runner.py:358
`column` can still be `None` here (e.g., when neither the raw name nor the hashed fallback matches any table column). The subsequent `column.dataType`/`column.name` access will then raise an `AttributeError`. Add an explicit guard after the lookup (raise a clear error or mark the test case as incompatible and `continue`) before dereferencing `column`.

```python
if column.dataType not in test_definition.supportedDataTypes:
    self.status.failed(
        StackTraceError(
            name="Incompatible Column for Test Case",
            error=f"Test case {tc.name.root} of type {test_definition.name.root}"
```
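The guard suggested above could be structured like this. The `Column` dataclass and `resolve_column` are simplified stand-ins for the real `test_case_runner` types and lookup logic, used only to show the None-guard shape:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    dataType: str


def resolve_column(
    columns: List[Column], raw_name: str, hashed_name: str
) -> Optional[Column]:
    """Match by raw name first, then fall back to the hashed segment."""
    for col in columns:
        if col.name == raw_name or col.name == hashed_name:
            return col
    return None


columns = [Column(name="email", dataType="VARCHAR")]
column = resolve_column(
    columns, "missing_col", "md5_0123456789abcdef0123456789abcdef"
)

# Explicit guard before dereferencing: avoids AttributeError on None.
if column is None:
    status = "Incompatible Column for Test Case"
else:
    status = column.dataType
```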
```python
if not is_hashed_column_fqn_segment(col_segment):
    col_segment = hash_column_name(col_segment)
return f"{split_entity_link[1]}.{col_segment}"
```

`split()` returns the raw entity-link segment, which may still be URL-encoded (spaces/unicode/special chars). Hashing `col_segment` without decoding can produce a different hash than the one generated from the actual column name, causing mismatched FQNs. Decode/unquote the column segment (same way as `get_decoded_column`) before hashing or checking `is_hashed_column_fqn_segment`.

Suggested change:

```python
decoded_col_segment = unquote_plus(col_segment)
if not is_hashed_column_fqn_segment(decoded_col_segment):
    decoded_col_segment = hash_column_name(decoded_col_segment)
return f"{split_entity_link[1]}.{decoded_col_segment}"
```
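The mismatch the reviewer describes is easy to demonstrate: hashing the URL-encoded segment yields a different digest than hashing the decoded column name. `hash_column_name` here is a stand-in for the real utility:

```python
import hashlib
from urllib.parse import unquote_plus


def hash_column_name(name: str) -> str:
    # Stand-in for the ingestion hashing utility (UTF-8 MD5 assumed).
    return "md5_" + hashlib.md5(name.encode("utf-8")).hexdigest()


encoded_segment = "order+id"                      # as it may appear in an entity link
decoded_segment = unquote_plus(encoded_segment)   # the actual column name

# Hashing before decoding produces a digest that will never match the
# FQN built from the real column name.
print(hash_column_name(encoded_segment) == hash_column_name(decoded_segment))
```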
```typescript
const columns = (entityData as unknown as { columns?: Column[] }).columns;
if (!columns) {
  return undefined;
}
const match = columns.find((col) => col.fullyQualifiedName === columnFqn);
```

This lookup only checks the top-level columns array and ignores nested children. For structured columns, this will fail to resolve and will fall back to displaying the hashed FQN segment in the lineage UI. Consider performing a recursive search across columns/children (similar to `findColumnByEntityLink` in TableUtils.tsx) and add a type guard instead of the `unknown` cast so the function behaves predictably when `entityData` doesn't include columns.
```typescript
for (const node of nodes) {
  const columns: Column[] = node.data?.node?.columns ?? [];
  const match = columns.find(
    (col) => col.fullyQualifiedName === columnFqn
  );
```

This only searches `node.data.node.columns` at the top level. If the lineage column FQN corresponds to a nested/child column, the modal will fall back to `getPartialNameFromTableFQN(...)` and show the hashed segment. Update the lookup to traverse `Column.children` recursively when matching by `fullyQualifiedName`.
```typescript
render: (columnFqn: string, record: SourceType) =>
  columnNameRender(
    columnFqn,
    record as unknown as ColumnLevelLineageNode,
    'fromColumnName'
  ),
```

The render function relies on `record as unknown as ColumnLevelLineageNode` to access `fromColumnName`. This hides type mismatches and can break silently if the table `dataSource` changes. Prefer typing `columnImpactColumns` as `ColumnsType<ColumnLevelLineageNode>` (and the corresponding `dataSource`) so the renderer receives the correct record type without `unknown` casts.
```typescript
render: (columnFqn: string, record: SourceType) =>
  columnNameRender(
    columnFqn,
    record as unknown as ColumnLevelLineageNode,
    'toColumnName'
  ),
```

Same as above for `toColumnName`: the `unknown` cast defeats type safety. Adjust the column table typing (columns + `dataSource`) so `record` is a `ColumnLevelLineageNode` here and remove the cast.
```java
Column firstColumn = columns.get(0);
if (firstColumn.getFullyQualifiedName() == null) {
  return false;
}
String fqn = firstColumn.getFullyQualifiedName();
String lastSegment = fqn.substring(fqn.lastIndexOf('.') + 1);
return ColumnNameHash.isHashedColumnFQNSegment(lastSegment);
```

`columnsAlreadyHashed` only inspects the first column's FQN segment. If the first entry is already hashed but other columns (or nested children) are not, the migration will incorrectly skip the entity and leave mixed/legacy column FQNs. Consider checking all columns (and recursively their children) for a hashed last segment, or recompute unconditionally when any column is not hashed.

Suggested change:

```java
for (Column column : columns) {
  if (!isColumnAndChildrenHashed(column)) {
    return false;
  }
}
return true;
}

private static boolean isColumnAndChildrenHashed(Column column) {
  if (column == null) {
    return false;
  }
  String fqn = column.getFullyQualifiedName();
  if (fqn == null) {
    return false;
  }
  String lastSegment = fqn.substring(fqn.lastIndexOf('.') + 1);
  if (!ColumnNameHash.isHashedColumnFQNSegment(lastSegment)) {
    return false;
  }
  List<Column> children = column.getChildren();
  if (!nullOrEmpty(children)) {
    for (Column child : children) {
      if (!isColumnAndChildrenHashed(child)) {
        return false;
      }
    }
  }
  return true;
```
```python
def is_hashed_column_fqn_segment(segment: str) -> bool:
    """Check whether a string is a hashed column FQN segment."""
    return (
        segment is not None
        and segment.startswith(HASH_PREFIX)
```

`is_hashed_column_fqn_segment` is typed as accepting `str`, but the implementation explicitly checks for `None`. Update the type hint to `Optional[str]` (and import `Optional`) to match the actual supported inputs and avoid static-analysis inconsistencies.
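The typing fix suggested above could look like this. The `HASH_PREFIX`/`HASH_LENGTH` values and the length check are assumptions reconstructed from the truncated snippet:

```python
from typing import Optional

HASH_PREFIX = "md5_"
HASH_LENGTH = len(HASH_PREFIX) + 32  # "md5_" + 32 hex characters


def is_hashed_column_fqn_segment(segment: Optional[str]) -> bool:
    """Check whether a string is a hashed column FQN segment.

    Accepts Optional[str] so the explicit None check matches the signature.
    """
    return (
        segment is not None
        and segment.startswith(HASH_PREFIX)
        and len(segment) == HASH_LENGTH
    )
```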
The Python checkstyle failed. Please run the checkstyle locally; you can install the pre-commit hooks to catch this before pushing.
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| libpng-dev | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng-dev | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Java
Vulnerabilities (37)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (13)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/extended_sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/lineage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (24)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| Authlib | CVE-2026-27962 | 🔥 CRITICAL | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28490 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28498 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28802 | 🚨 HIGH | 1.6.6 | 1.6.7 |
| PyJWT | CVE-2026-32597 | 🚨 HIGH | 2.11.0 | 2.12.0 |
| Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.12.12 | 3.13.3 |
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow-providers-http | CVE-2025-69219 | 🚨 HIGH | 5.6.4 | 6.0.0 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| protobuf | CVE-2026-0994 | 🚨 HIGH | 4.25.8 | 6.33.5, 5.29.6 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| pyasn1 | CVE-2026-30922 | 🚨 HIGH | 0.6.2 | 0.6.3 |
| ray | CVE-2025-62593 | 🔥 CRITICAL | 2.47.1 | 2.52.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| tornado | CVE-2026-31958 | 🚨 HIGH | 6.5.4 | 6.5.5 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: usr/bin/docker
Vulnerabilities (2)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| stdlib | CVE-2025-68121 | 🔥 CRITICAL | v1.25.6 | 1.24.13, 1.25.7, 1.26.0-rc.3 |
| stdlib | CVE-2026-25679 | 🚨 HIGH | v1.25.6 | 1.25.8, 1.26.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO
No Vulnerabilities Found
The Java checkstyle failed. Please run the checkstyle locally; you can install the pre-commit hooks to catch this before pushing.
The Python checkstyle failed. Please run the checkstyle locally; you can install the pre-commit hooks to catch this before pushing.
Code Review ✅ Approved — 4 resolved / 4 findings

Implements column name hashing in FQN construction with consistent UTF-8 encoding across Java and Python, adds `columnsAlreadyHashed` migration tracking, removes unsafe type casts, and consolidates duplicate recursive column search functions. All findings have been addressed.

✅ 4 resolved:
- ✅ Bug: Java hash uses platform charset; Python uses UTF-8 explicitly
- ✅ Edge Case: Migration `columnsAlreadyHashed` only checks first column
- ✅ Quality: Unsafe cast to `ColumnLevelLineageNode` in LineageTable
- ✅ Quality: Duplicate recursive column search functions across two files
— Gitar
```typescript
const columnName =
  column.name ??
  EntityLink.getTableColumnNameFromColumnFqn(
    column.fullyQualifiedName ?? '',
    false
  );
```

`findColumnByEntityLink` now prefers `column.name` when generating the entity link, but `EntityLink.getTableEntityLink` does not escape/encode `columnName`. With column-name hashing, column entity links are expected to use the hashed FQN segment; since `column.name` is typically always defined and may contain reserved separators (e.g. `::`, `>`), this will prevent matching hashed entity links and can also produce invalid/ambiguous entity links. Consider deriving the entity-link column segment from `column.fullyQualifiedName` (via `getTableColumnNameFromColumnFqn`) and, if needed for backward compatibility, compare against both raw-name and hashed-segment forms.
```diff
 if (!isGlossaryType && entityType === EntityType.TABLE) {
   const entityLink = EntityLink.getTableEntityLink(
     entityFqn ?? '',
-    EntityLink.getTableColumnNameFromColumnFqn(columnData?.fqn ?? '', false)
+    columnData?.name ??
+      EntityLink.getTableColumnNameFromColumnFqn(
+        columnData?.fqn ?? '',
+        false
+      )
   );
```

When building the column entity link for tag suggestions, this code prefers `columnData?.name`. `EntityLink.getTableEntityLink` does not encode the column segment, and the table schema now allows `::` in column names; using the raw name here can generate invalid/ambiguous entity links and will also fail to match suggestions stored using the hashed column identifier. Prefer using the column segment extracted from `columnData?.fqn` (hashed segment) for the entity link, and keep the readable name only for display.
```diff
 const entityLink = useMemo(
   () =>
     entityType === EntityType.TABLE
       ? EntityLink.getTableEntityLink(
           entityFqn,
-          EntityLink.getTableColumnNameFromColumnFqn(columnData.fqn, false)
+          columnData.record?.name ??
+            EntityLink.getTableColumnNameFromColumnFqn(columnData.fqn, false)
         )
```

`EntityLink.getTableEntityLink` does not escape/encode the column segment. Preferring `columnData.record?.name` here can produce invalid/ambiguous entity links now that column names may legally contain `::` and other reserved characters, and it can prevent matching against stored entity links that use the hashed column identifier. Prefer using the column segment from `columnData.fqn` (i.e., `getTableColumnNameFromColumnFqn(columnData.fqn, false)`) when constructing the entity link.
```typescript
const findColumnNameByFqn = (
  columnFqn: string,
  entityData: NodeData['entity']
): string | undefined => {
  const columns = (entityData as unknown as { columns?: Column[] }).columns;
  if (!columns) {
    return undefined;
```

`findColumnNameByFqn` relies on `entityData as unknown as { columns?: Column[] }` because `NodeData['entity']` is typed as `EntityReference`, but LineageProvider populates `NodeData.entity` with full `LineageNodeType` objects (including columns). Consider tightening the `NodeData` type to reflect the actual shape (e.g., `LineageNodeType | EntityReference`) and adding a type guard for `columns` to avoid unsafe casts and make this helper safer to evolve.
```java
public static boolean isHashedColumnFQNSegment(String segment) {
  return segment != null && segment.startsWith(HASH_PREFIX) && segment.length() == HASH_LENGTH;
}
```

`isHashedColumnFQNSegment` only checks prefix + length. A real column segment like `md5_<32 chars>` would be misclassified as already-hashed, which can break validation/migration logic. Consider also validating that the suffix is 32 hex characters (e.g., `[0-9a-fA-F]{32}`) to avoid accidental collisions.
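The stricter check suggested above, sketched in Python (mirroring the Java utility's behavior; the pattern constant is an assumption): require 32 hex characters after the prefix, not just prefix plus length.

```python
import re
from typing import Optional

# "md5_" followed by exactly 32 hex characters.
HASHED_SEGMENT_PATTERN = re.compile(r"md5_[0-9a-fA-F]{32}")


def is_hashed_column_fqn_segment(segment: Optional[str]) -> bool:
    """True only for segments that are genuinely md5_<32 hex chars>."""
    return bool(segment and HASHED_SEGMENT_PATTERN.fullmatch(segment))


# A real column whose name merely *looks* hash-like (right prefix and
# length, but non-hex suffix) is now correctly rejected:
print(is_hashed_column_fqn_segment("md5_" + "g" * 32))
print(is_hashed_column_fqn_segment("md5_" + "a" * 32))
```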
```python
# Now populate the dictionary with the imported classes.
# Note: ColumnName, ColumnProfile, and CustomColumnName entries have been
# removed — column FQN segments are now hashed, making reserved-keyword
# encoding unnecessary for column names. Table-level transforms are kept
# for backward compatibility with existing encoded data in the database.
TRANSFORMABLE_ENTITIES.update(
```

The comment indicates reserved-keyword encoding for column names was removed, but `BaseModel.parse_name` still applies `replace_separators` recursively to `CreateTableRequest` (and other `Create*` models), which will continue encoding column `name.root` values containing `::`, `>` or `"`. With the new hashing-based FQN segment, this would cause the server and client to hash different inputs (encoded vs raw), breaking cross-language FQN/entity-link consistency. To align with the new approach, update the name-transform logic to skip column-name encoding for create/store models (or otherwise ensure both Java and Python hash the same normalized form).
🔴 Playwright Results — 50 failure(s), 21 flaky

✅ 3360 passed · ❌ 50 failed · 🟡 21 flaky · ⏭️ 259 skipped

Genuine Failures (failed on all attempts) ❌


Describe your changes:
Fixes https://github.com/open-metadata/openmetadata-collate/issues/3253
Summary

Implements column name hashing to decouple FQN storage constraints from raw column names. Column names from sources like BigQuery structs and Snowflake can exceed `entityLink` VARCHAR(3072) limits when used directly in FQNs. This is the long-term fix for ingestion failures previously patched in #26530.

Approach: Hash only the column segment in FQN construction. `column.name` stays as the raw readable source name; `column.fullyQualifiedName` uses `md5_<32 hex chars>` for column segments.

BEFORE: name = "customer_email"  FQN = "svc.db.schema.tbl.customer_email"
AFTER:  name = "customer_email"  FQN = "svc.db.schema.tbl.md5_a1b2c3d4..."
Why this approach:

- `name` is the column identity — `EntityUtil.columnMatch` uses it for re-ingestion matching
- `displayName` is user-editable — can't be the sole store of the original name
- `col.name` is read directly — completely unaffected

Changes
Core

- `ColumnNameHash` utility (Java + Python) — MD5-based, consistent with existing `EntityUtil.hash()`
- `ColumnUtil.setColumnFQN()`, `ContainerRepository`, `TableRepository` joins — hash column name before building FQN
- `fqn.py` Column builder — hash the `column_name` parameter
- `validateColumn()` — supports both raw names and hashed segments
- `resolveColumnName()` — reverse lookup from hash to readable name for API responses

Cleanup

- Deleted `truncate_column_name()` and removed its usages from 12 connector/dashboard files
- Removed the `columnName` pattern constraint (`::` prohibition) from `table.json`
- Removed `ColumnName`, `ColumnProfile`, `CustomColumnName` from reserved-keyword encoding

Data Quality

- `test_case_runner.py`, `sqa_validator_mixin.py`, `pandas_validator_mixin.py` — hash-based fallback when matching columns from entity links
- `entity_link.py` — hash column name when converting an entity link to an FQN

Migration (v1.13.0)
Frontend

- Prefer `column.name` when building entity links
- Added `fromColumnName`/`toColumnName` to the `ColumnLevelLineageNode` interface

What is NOT affected

- `col.name` (raw)
- `col.name` (raw)
- `columnMatch` uses `name` (raw)
- `col.getName()` (raw)
- `getEntityName()` (raw)

Test plan

- Verified against `EntityUtil.hash()`
- Ran `mvn spotless:apply`
- Tested `::`, `>`, `"`, unicode, and 1000+ char names

Type of change:
Checklist:
Fixes <issue-number>: <short explanation>