Skip to content

Fixes: Datalake Connector#25953

Merged
ulixius9 merged 7 commits intomainfrom
fixes/datalake-connector
Feb 18, 2026
Merged

Fixes: Datalake Connector#25953
ulixius9 merged 7 commits intomainfrom
fixes/datalake-connector

Conversation

@keshavmohta09
Copy link
Copy Markdown
Contributor

@keshavmohta09 keshavmohta09 commented Feb 18, 2026

Describe your changes:

Fixes #25956

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • CSV parser robustness:
    • Removed engine="python" parameter from pd.read_csv() in dsv.py to default to C engine for handling NUL bytes and large fields
  • Table name length constraint:
    • Implemented MD5 hashing for table names exceeding 256 characters with displayName fallback for full path preservation

This will update automatically on new commits.


@keshavmohta09 keshavmohta09 self-assigned this Feb 18, 2026
@keshavmohta09 keshavmohta09 added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Feb 18, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 18, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (42)

Package Vulnerability ID Severity Installed Version Fixed Version
libpng-dev CVE-2026-22695 🚨 HIGH 1.6.39-2+deb12u1 1.6.39-2+deb12u2
libpng-dev CVE-2026-22801 🚨 HIGH 1.6.39-2+deb12u1 1.6.39-2+deb12u2
libpng-dev CVE-2026-25646 🚨 HIGH 1.6.39-2+deb12u1 1.6.39-2+deb12u3
libpng16-16 CVE-2026-22695 🚨 HIGH 1.6.39-2+deb12u1 1.6.39-2+deb12u2
libpng16-16 CVE-2026-22801 🚨 HIGH 1.6.39-2+deb12u1 1.6.39-2+deb12u2
libpng16-16 CVE-2026-25646 🚨 HIGH 1.6.39-2+deb12u1 1.6.39-2+deb12u3
linux-libc-dev CVE-2024-46786 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-21946 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22022 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22083 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22107 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-22121 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-37926 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38022 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38129 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38361 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-38718 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-39871 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-68340 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-68349 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-68800 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-71085 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2025-71116 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-22984 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-22990 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23001 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23010 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23054 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23074 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23097 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23120 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23121 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23124 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23125 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23126 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23128 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23133 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23139 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23140 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23142 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23144 🚨 HIGH 6.1.159-1 6.1.162-1
linux-libc-dev CVE-2026-23156 🚨 HIGH 6.1.159-1 6.1.162-1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 18, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (19)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.5 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2025-61726 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2025-61728 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2025-61730 🚨 HIGH v1.25.5 1.24.12, 1.25.6

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Feb 18, 2026

🔍 CI failure analysis for b6c5bb4: SonarCloud Quality Gate failed due to Security Review Rating E on new code (requires ≥ A). Previous test failures (breakpoint issue) have been fixed in commit c0caf5e.

Current CI Failures

1. SonarCloud Code Analysis (ACTIVE - Job ID: 64000186844)

Status: Failed on commit b6c5bb49d5ef1aec914ad92064817633fbb3241f (latest)

Issue: Quality Gate failed

Failed condition:

  • Security Review Rating on New Code: E (required ≥ A)

Details:
SonarCloud detected security review issues in the new code added by this PR. The security rating is E (worst rating), but the quality gate requires at least an A rating.

Analysis link: https://sonarcloud.io/dashboard?id=open-metadata-ingestion&pullRequest=25953

What this means:
SonarCloud has flagged potential security concerns in the new code. Common causes include:

  • Use of potentially unsafe operations (e.g., MD5 hashing, which is cryptographically weak)
  • Missing input validation
  • Potential injection vulnerabilities
  • Unsafe file operations
  • Weak cryptographic practices

Files modified in this PR:

  • ingestion/src/metadata/ingestion/source/database/datalake/metadata.py - Uses MD5 hashing for table names
  • ingestion/src/metadata/readers/dataframe/dsv.py - CSV parser engine changes
  • ingestion/tests/unit/readers/test_dsv_reader.py - Test additions
  • ingestion/tests/unit/topology/database/test_datalake.py - Test additions

Most likely cause:
The MD5 hash usage in metadata.py for generating table names from long paths:

from hashlib import md5
...
table_name = md5(table_name.encode()).hexdigest()

While MD5 is cryptographically weak, it's being used here for non-cryptographic purposes (generating stable, short identifiers from long strings), which is an acceptable use case. This is likely a false positive from a security perspective.

2. Test Failures - Python 3.10 & 3.11 (RESOLVED)

Previous failures:

  • Python 3.10 (job ID: 63990590748) - Failed on commit 59f117d
  • Python 3.11 (job ID: 63990590766) - Failed on commit 59f117d

Issue: breakpoint() debugging statement in test code

Status:Fixed in commit c0caf5e ("fix: removed breakpoint from test_datalake")

The breakpoint has been removed and is no longer in the latest commit. These jobs just need to be re-run on the latest commit.

Root Cause Summary

SonarCloud Failure (Active)

Type: Code quality/security review
Impact: Blocking PR merge due to quality gate
Likely trigger: MD5 usage in table name hashing (line 304-306 in datalake/metadata.py)
Risk level: Low - MD5 is used for non-cryptographic purposes (identifier generation, not security)

Test Failures (Resolved)

Type: Test execution error
Impact: Tests failed on older commits, but fixed now
Resolution: Breakpoint removed in later commit
Status: Requires CI re-run on latest commit

Details

Commit timeline:

  1. 59f117d - Had breakpoint, test failures occurred
  2. c0caf5e - Breakpoint removed, tests should pass
  3. b6c5bb49 - Latest commit, SonarCloud failure detected

Current PR state:

  • Latest commit: b6c5bb49d5ef1aec914ad92064817633fbb3241f
  • Test failures: Fixed (re-run needed)
  • SonarCloud: Failing (needs review/exception)
Code Review 👍 Approved with suggestions 0 resolved / 1 findings

The MD5 hashing logic and CSV engine switch look correct. The previously identified minor issue (error message logging the MD5 hash instead of original table name) remains unaddressed but is low impact.

💡 Quality: Error message logs MD5 hash instead of original table name

📄 ingestion/src/metadata/ingestion/source/database/datalake/metadata.py:332

In the except block at line 332, if an exception occurs after the table name has been hashed (lines 302-304), the error message will contain the MD5 hash instead of the original file path. This makes debugging harder since operators would see something like Unexpected exception to yield table [a1b2c3d4...] instead of the original file path.

Consider preserving the original table name for error reporting, e.g., by using the display_name variable (which holds the original path when hashing occurred).

Suggested fix
                    error=f"Unexpected exception to yield table [{display_name or table_name}]: {exc}",

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
E Security Review Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

@ulixius9 ulixius9 merged commit 9d0b032 into main Feb 18, 2026
29 of 31 checks passed
@ulixius9 ulixius9 deleted the fixes/datalake-connector branch February 18, 2026 12:44
keshavmohta09 added a commit that referenced this pull request Feb 18, 2026
* csv field_size_limit to 1gb & datalake hash as table name when string contains more than 256 chars

* fix: switched engine from python to c in pandas read_csv | issues: line contains NUL & pandas.errors.ParserError

* fix: removed breakpoint from test_datalake
keshavmohta09 added a commit that referenced this pull request Feb 19, 2026
* csv field_size_limit to 1gb & datalake hash as table name when string contains more than 256 chars

* fix: switched engine from python to c in pandas read_csv | issues: line contains NUL & pandas.errors.ParserError

* fix: removed breakpoint from test_datalake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Datalake Ingestion: Multiple CSV Parsing & Table Name Errors During Metadata Ingestion

2 participants