Skip to content

Feat/issue 28620 exasol add access to comments#28621

Open
ArBridgeman wants to merge 23 commits into
open-metadata:mainfrom
exasol:feat/ISSUE-28620-exasol_add_access_to_comments
Open

Feat/issue 28620 exasol add access to comments#28621
ArBridgeman wants to merge 23 commits into
open-metadata:mainfrom
exasol:feat/ISSUE-28620-exasol_add_access_to_comments

Conversation

@ArBridgeman
Copy link
Copy Markdown
Contributor

@ArBridgeman ArBridgeman commented Jun 2, 2026

Describe your changes:

Fixes #28620

I worked on:

Type of change:

  • New feature

High-level design:

N/A

Tests:

Use cases covered

  • Users can comment on one or more columns in a table. When they run metadata ingestion, this would show up in the UI.

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: I attached a screen recording and/or screenshots above.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

Summary by Gitar

  • Exasol metadata ingestion:
    • Added support for fetching column-level comments via EXA_ALL_COLUMNS.
    • Updated sqla_utils.py and metadata.py to integrate column comment reflection into the Exasol inspector.
  • Profiler enhancements:
    • Implemented ExasolSystemMetricsComputer to track DML operations (INSERT, UPDATE, DELETE) using system audit logs.
    • Added ExasolProfilerInterface and updated service_spec.py to support system metrics computation.
    • Refactored ExasolTableMetricComputer to improve view handling and catalog query accuracy.
  • Test suite expansion:
    • Added integration tests for query and system metric logic in tests/integration/sources/database/exasol/.
    • Added automated CLI E2E tests in tests/cli_e2e/test_cli_exasol.py to verify metadata and comment extraction.
  • Miscellaneous improvements:
    • Enabled ModuloFn for the Exasol dialect in modulo.py.
    • Updated exasol.json test connection configuration to include audit table requirements.

This will update automatically on new commits.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

schema: str | None = None,
**kw: Any,
) -> list[ReflectedColumn]:
"""
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed as currently sqlalchemy-exasol does not provide the comments with the returned columns.

Comment thread ingestion/src/metadata/ingestion/source/database/exasol/queries.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/exasol/sqla_utils.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/exasol/queries.py Outdated
from metadata.utils.service_spec.default import DefaultDatabaseSpec

ServiceSpec = DefaultDatabaseSpec(
profiler_class=ExasolProfilerInterface, # pyright: ignore[reportArgumentType]
Copy link
Copy Markdown
Contributor Author

@ArBridgeman ArBridgeman Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to otherwise activate profiler yet, as not all the components are ready based on our local testing.

This was mostly to resolve this test from failing test_create_table_with_profiler, and we believe is a result of the work done by another OMD developer in #27912

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/ingestion/source/database/exasol/queries.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@ArBridgeman ArBridgeman marked this pull request as ready for review June 2, 2026 15:05
@ArBridgeman ArBridgeman requested a review from a team as a code owner June 2, 2026 15:05
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 2, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jun 2, 2026

Code Review ✅ Approved 4 resolved / 4 findings

Enables Exasol column-level comment ingestion and system metric computation, addressing previous issues with parameterized SQL, caching, and table matching filters. No issues found.

✅ 4 resolved
Security: EXASOL_SYSTEM_METRICS_QUERY uses string formatting, not parameterized SQL

📄 ingestion/src/metadata/ingestion/source/database/exasol/queries.py:63-77 📄 ingestion/src/metadata/profiler/metrics/system/exasol/system.py:82-89
The EXASOL_SYSTEM_METRICS_QUERY injects schema, table, database_name, and operations via Python .format() directly into the SQL string (line 75: LIKE UPPER('%{schema}.{table}%')). While these values currently originate from internal metadata (runner.schema_name, runner.table_name), this pattern is vulnerable to SQL injection if any of these values contain special characters (e.g., a schema named foo'; DROP TABLE --). In contrast, the EXASOL_GET_COLUMN_COMMENTS query in the same file correctly uses parameterized queries with :schema and :table_name placeholders.

Note: the LIKE clause makes full parameterization tricky, but at minimum the values should be sanitized or the query restructured to use bind parameters where possible.

Performance: Double @reflection.cache on get_columns may cause stale results

📄 ingestion/src/metadata/ingestion/source/database/exasol/sqla_utils.py:32-43
The custom get_columns in sqla_utils.py is decorated with @reflection.cache, but it delegates to get_columns._original which is EXADialect.get_columns — itself likely decorated with @reflection.cache. This creates a double-caching layer. The outer cache will cache the enriched result (with comments), meaning if comments are updated the cache won't be invalidated. Additionally, the inner cached result is already stored, so there's unnecessary overhead.

Consider removing the @reflection.cache from the custom wrapper if the original already handles caching, or verify that caching the enriched result is the intended behavior.

Edge Case: LIKE-based table matching in system metrics may match wrong tables

📄 ingestion/src/metadata/ingestion/source/database/exasol/queries.py:75
The EXASOL_SYSTEM_METRICS_QUERY uses UPPER(s.sql_text) LIKE UPPER('%{schema}.{table}%') to identify DML operations on a specific table. This pattern can produce false positives — for example, a table named foo would match queries against foo_bar, or a query that references schema.table in a comment or string literal would be counted. Other system metrics implementations typically use more precise catalog views rather than text matching on SQL statements.

Bug: NOT LIKE filters broken after removing .format() escaping

📄 ingestion/src/metadata/ingestion/source/database/exasol/queries.py:76-77
The query EXASOL_SYSTEM_METRICS_QUERY previously had {{ and }} as escape sequences for Python's .format() method (producing literal { and }). Now that the query is used with text() + bind parameters instead of .format(), these doubled braces are no longer unescaped and will be sent literally as {{ and }} to the database.

This means the NOT LIKE filters on lines 76-77 will look for literal {{"app": "OpenMetadata"...}} in SQL text, but actual query headers use single braces {"app": "OpenMetadata"...}. As a result, internal OpenMetadata and dbt queries will no longer be excluded from system metrics, potentially inflating row counts.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend Exasol implementation to include column-level comments

1 participant