Fix: Optimise redshift memory usage by ulixius9 · Pull Request #25904 · open-metadata/OpenMetadata

ulixius9 · 2026-02-16T09:33:37Z

Describe your changes:

Fixes #20649

This PR optimizes memory usage in the Redshift connector during metadata ingestion by preventing unbounded memory growth when processing multiple schemas. The changes replace global SQLAlchemy reflection caches with per-schema caching and move partition detail loading from database-level to schema-level scope.

Summary by Gitar

Memory optimization:
- Removed @reflection.cache decorators from get_columns, _get_schema_column_info, and _get_all_relation_info to prevent cross-schema cache accumulation
- Added _clear_reflection_cache() method to clear SQLAlchemy inspector cache before processing each schema
Per-schema caching:
- Implemented custom single-schema cache in _get_schema_column_info and _get_all_relation_info that auto-invalidates when schema changes
Scoped partition loading:
- Modified get_partition_details() to accept optional schema_name parameter for targeted queries
- Moved partition loading from global database initialization to per-schema processing in query_table_names_and_types()
Resource cleanup:
- Added explicit result.close() calls to release database cursors

_{This will update automatically on new commits.}

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

gitar-bot · 2026-02-16T09:38:18Z

-            ).fetchall()
+            query = REDSHIFT_PARTITION_DETAILS
+            if schema_name:
+                query += f" AND \"schema\" = '{schema_name}'"


⚠️ Security: SQL injection via string formatting in get_partition_details

The new schema_name parameter is concatenated directly into the SQL query via f-string interpolation:

query += f" AND \"schema\" = '{schema_name}'"

While the schema_name value typically comes from Redshift's own metadata catalog (and is thus unlikely to contain malicious content in normal usage), this is still a SQL injection vector. A schema name containing a single quote (e.g., foo'; DROP TABLE ...--) would break the query or allow injection.

Note that the existing codebase has similar patterns in _get_schema_column_info and _get_all_relation_info, so this is consistent with the current style. However, for new code, it's better to use parameterized queries.

The method query_table_names_and_types in the same file already demonstrates the correct pattern using sql.text() with named parameters (line 236-246). The same approach should be used here.

Suggested fix: Use a parameterized query approach. Since the base query is a raw string constant, wrap the full query with sql.text() and use a bind parameter:

query = REDSHIFT_PARTITION_DETAILS if schema_name: query += " AND \"schema\" = :schema_name" results = self.connection.execute( statement=sql.text(query), {"schema_name": schema_name} ).fetchall()

Suggested fix:

query = REDSHIFT_PARTITION_DETAILS if schema_name: query += " AND \"schema\" = :schema_name" results = self.connection.execute( statement=sql.text(query), {"schema_name": schema_name} if schema_name else {}, ).fetchall()

_{Was this helpful? React with 👍 / 👎}

github-actions · 2026-02-16T09:44:51Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion:trivy (debian 12.12)`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`libpam-modules`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam-modules-bin`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam-runtime`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2
`libpam0g`	CVE-2025-6020	🚨 HIGH	1.5.2-6+deb12u1	1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (38)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.12.7	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.13.4	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.15.2	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.16.1	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.airlift:aircompressor`	CVE-2025-67721	🚨 HIGH	0.27	2.0.3
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (21)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`Werkzeug`	CVE-2024-34069	🚨 HIGH	2.2.3	3.0.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.12.12	3.13.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.13.2	3.13.3
`apache-airflow`	CVE-2025-68438	🚨 HIGH	3.1.5	3.1.6
`apache-airflow`	CVE-2025-68675	🚨 HIGH	3.1.5	3.1.6, 2.11.1
`azure-core`	CVE-2026-21226	🚨 HIGH	1.37.0	1.38.0
`cryptography`	CVE-2026-26007	🚨 HIGH	42.0.8	46.0.5
`google-cloud-aiplatform`	CVE-2026-2472	🚨 HIGH	1.130.0	1.131.0
`google-cloud-aiplatform`	CVE-2026-2473	🚨 HIGH	1.130.0	1.133.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	5.3.0	6.1.0
`jaraco.context`	CVE-2026-23949	🚨 HIGH	6.0.1	6.1.0
`protobuf`	CVE-2026-0994	🚨 HIGH	4.25.8	6.33.5, 5.29.6
`pyasn1`	CVE-2026-23490	🚨 HIGH	0.6.1	0.6.2
`python-multipart`	CVE-2026-24486	🚨 HIGH	0.0.20	0.0.22
`ray`	CVE-2025-62593	🔥 CRITICAL	2.47.1	2.52.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `usr/bin/docker`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`stdlib`	CVE-2025-68121	🔥 CRITICAL	v1.25.5	1.24.13, 1.25.7, 1.26.0-rc.3
`stdlib`	CVE-2025-61726	🚨 HIGH	v1.25.5	1.24.12, 1.25.6
`stdlib`	CVE-2025-61728	🚨 HIGH	v1.25.5	1.24.12, 1.25.6
`stdlib`	CVE-2025-61730	🚨 HIGH	v1.25.5	1.24.12, 1.25.6

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

No Vulnerabilities Found

github-actions · 2026-02-16T09:45:26Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion-base-slim:trivy (debian 12.13)`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (37)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.12.7	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.13.4	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-core`	GHSA-72hv-8253-57qq	🚨 HIGH	2.15.2	2.18.6, 2.21.1, 3.1.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.airlift:aircompressor`	CVE-2025-67721	🚨 HIGH	0.27	2.0.3
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (9)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`apache-airflow`	CVE-2025-68438	🚨 HIGH	3.1.5	3.1.6
`apache-airflow`	CVE-2025-68675	🚨 HIGH	3.1.5	3.1.6, 2.11.1
`cryptography`	CVE-2026-26007	🚨 HIGH	42.0.8	46.0.5
`jaraco.context`	CVE-2026-23949	🚨 HIGH	6.0.1	6.1.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3
`wheel`	CVE-2026-24049	🚨 HIGH	0.45.1	0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/extended_sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/lineage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data_aut.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage_aut.yaml`

No Vulnerabilities Found

gitar-bot · 2026-02-16T11:58:48Z

+    # Single-schema cache: invalidate when schema changes
+    cached = getattr(self, "_relation_info_cache", None)
+    if cached is not None and cached[0] == schema:
+        return cached[1]


⚠️ Bug: Cache key ignores table_name, may return stale results

The single-schema cache in _get_all_relation_info keys only on schema, but the function also filters by table_name (lines 400-403). If the function is first called with (schema="public", table_name="users"), the cached result contains only the "users" relation. A subsequent call with (schema="public", table_name="orders") would hit the cache and return the stale result (only "users"), missing "orders".

The previous @reflection.cache decorator keyed on all arguments (including table_name), so this is a behavioral regression.

The fix should include table_name in the cache key, or (better) only cache when table_name is None (the common bulk-fetch path), since caching single-table lookups provides little benefit.

Suggested fix:

# Single-schema cache: invalidate when schema or table_name changes table_name = kw.get("table_name", None) cached = getattr(self, "_relation_info_cache", None) if cached is not None and cached[0] == schema and cached[1] == table_name: return cached[2]

_{Was this helpful? React with 👍 / 👎}

d-miranda · 2026-03-02T15:47:31Z

Hey guys!
We were being affected with this issue. The changes this MR introduces, based on the 1.11.9 release solved the issue.
Can you guys include it in the next release?

gitar-bot · 2026-03-02T18:54:09Z

+    def _clear_reflection_cache(self) -> None:
+        """Clear the SQLAlchemy inspector's info_cache to release
+        cached column / relation data from prior schemas.
+
+        This prevents unbounded memory growth when ingesting many
+        schemas, since _get_schema_column_info, get_columns, and
+        _get_all_relation_info all use @reflection.cache.
+        """


💡 Quality: Stale docstring references removed @reflection.cache

The docstring for _clear_reflection_cache() states:

"since _get_schema_column_info, get_columns, and _get_all_relation_info all use @reflection.cache"

However, this PR specifically removes @reflection.cache from all three of those functions. The docstring should be updated to reflect that the info_cache is being cleared for the remaining @reflection.cache-decorated functions (e.g., get_table_comment, get_view_definition, _get_pg_column_info) and to clean up any residual cached data.

Suggested fix:

def _clear_reflection_cache(self) -> None: """Clear the SQLAlchemy inspector's info_cache to release cached column / relation data from prior schemas. This prevents unbounded memory growth when ingesting many schemas by clearing residual cached data from remaining @reflection.cache-decorated functions (e.g., get_table_comment). """

_{Was this helpful? React with 👍 / 👎}

gitar-bot · 2026-03-02T18:54:42Z

🔍 CI failure analysis for 51b740d: SonarCloud Quality Gate failed due to insufficient code coverage (6.1%) on new code, below the required ≥20% threshold. Unit test and Playwright E2E failures from previous runs are unrelated to this PR's Redshift changes.

Issue

The [open-metadata-ingestion] SonarCloud Code Analysis check failed due to insufficient code coverage on new code.

Root Cause

Code coverage on new code is 6.1%, below the required threshold of ≥20%.

The PR introduces new methods and logic in the Redshift connector without corresponding test coverage:

New method: _clear_reflection_cache() in metadata.py (lines 209–221)
Modified method: get_partition_details() with new schema parameter logic (lines 193–207)
New per-schema caching logic in _get_schema_column_info() (lines 137–153)
New per-schema caching logic in _get_all_relation_info() (lines 393–417)

Details

To pass the Quality Gate, unit tests should be added covering:

Cache clearing functionality in _clear_reflection_cache()
Schema-specific partition loading in get_partition_details(schema_name)
Per-schema cache invalidation logic in both _get_schema_column_info() and _get_all_relation_info()
Edge cases: schema changes, cache hit/miss scenarios, error handling

View SonarCloud dashboard

Previously observed unrelated failures (may or may not still be present):

Unit Tests & Static Checks (3.11): 3 REST API connector tests failing with 'str' object has no attribute 'get' — unrelated to Redshift changes
playwright-ci-postgresql (3/6): Frontend E2E test timeouts/flakiness — unrelated to Redshift changes

Code Review ⚠️ Changes requested 0 resolved / 3 findings

Memory optimization approach is sound, but the SQL injection risk in get_partition_details and the table_name-ignoring cache key in _get_all_relation_info remain unaddressed from prior review.

⚠️

Bug: Cache key ignores table_name, may return stale results

📄 ingestion/src/metadata/ingestion/source/database/redshift/utils.py:393-396

The single-schema cache in _get_all_relation_info keys only on schema, but the function also filters by table_name (lines 400-403). If the function is first called with (schema="public", table_name="users"), the cached result contains only the "users" relation. A subsequent call with (schema="public", table_name="orders") would hit the cache and return the stale result (only "users"), missing "orders".

The previous @reflection.cache decorator keyed on all arguments (including table_name), so this is a behavioral regression.

The fix should include table_name in the cache key, or (better) only cache when table_name is None (the common bulk-fetch path), since caching single-table lookups provides little benefit.

Suggested fix

    # Single-schema cache: invalidate when schema or table_name changes
    table_name = kw.get("table_name", None)
    cached = getattr(self, "_relation_info_cache", None)
    if cached is not None and cached[0] == schema and cached[1] == table_name:
        return cached[2]

⚠️

Security: SQL injection via string formatting in get_partition_details

📄 ingestion/src/metadata/ingestion/source/database/redshift/metadata.py:201

The new schema_name parameter is concatenated directly into the SQL query via f-string interpolation:

query += f" AND \"schema\" = '{schema_name}'"

While the schema_name value typically comes from Redshift's own metadata catalog (and is thus unlikely to contain malicious content in normal usage), this is still a SQL injection vector. A schema name containing a single quote (e.g., foo'; DROP TABLE ...--) would break the query or allow injection.

Note that the existing codebase has similar patterns in _get_schema_column_info and _get_all_relation_info, so this is consistent with the current style. However, for new code, it's better to use parameterized queries.

The method query_table_names_and_types in the same file already demonstrates the correct pattern using sql.text() with named parameters (line 236-246). The same approach should be used here.

Suggested fix: Use a parameterized query approach. Since the base query is a raw string constant, wrap the full query with sql.text() and use a bind parameter:

query = REDSHIFT_PARTITION_DETAILS
if schema_name:
    query += " AND \"schema\" = :schema_name"
results = self.connection.execute(
    statement=sql.text(query),
    {"schema_name": schema_name}
).fetchall()

Suggested fix

            query = REDSHIFT_PARTITION_DETAILS
            if schema_name:
                query += " AND \"schema\" = :schema_name"
            results = self.connection.execute(
                statement=sql.text(query),
                {"schema_name": schema_name} if schema_name else {},
            ).fetchall()

💡 Quality: Stale docstring references removed @reflection.cache

📄 ingestion/src/metadata/ingestion/source/database/redshift/metadata.py:209-216

The docstring for _clear_reflection_cache() states:

"since _get_schema_column_info, get_columns, and _get_all_relation_info all use @reflection.cache"

However, this PR specifically removes @reflection.cache from all three of those functions. The docstring should be updated to reflect that the info_cache is being cleared for the remaining @reflection.cache-decorated functions (e.g., get_table_comment, get_view_definition, _get_pg_column_info) and to clean up any residual cached data.

Suggested fix

    def _clear_reflection_cache(self) -> None:
        """Clear the SQLAlchemy inspector's info_cache to release
        cached column / relation data from prior schemas.

        This prevents unbounded memory growth when ingesting many
        schemas by clearing residual cached data from remaining
        @reflection.cache-decorated functions (e.g., get_table_comment).
        """

🤖 Prompt for agents

Code Review: Memory optimization approach is sound, but the SQL injection risk in `get_partition_details` and the `table_name`-ignoring cache key in `_get_all_relation_info` remain unaddressed from prior review.

1. ⚠️ Bug: Cache key ignores table_name, may return stale results
   Files: ingestion/src/metadata/ingestion/source/database/redshift/utils.py:393-396

   The single-schema cache in `_get_all_relation_info` keys only on `schema`, but the function also filters by `table_name` (lines 400-403). If the function is first called with `(schema="public", table_name="users")`, the cached result contains only the "users" relation. A subsequent call with `(schema="public", table_name="orders")` would hit the cache and return the stale result (only "users"), missing "orders".
   
   The previous `@reflection.cache` decorator keyed on all arguments (including `table_name`), so this is a behavioral regression.
   
   The fix should include `table_name` in the cache key, or (better) only cache when `table_name` is None (the common bulk-fetch path), since caching single-table lookups provides little benefit.

   Suggested fix:
   # Single-schema cache: invalidate when schema or table_name changes
   table_name = kw.get("table_name", None)
   cached = getattr(self, "_relation_info_cache", None)
   if cached is not None and cached[0] == schema and cached[1] == table_name:
       return cached[2]

2. ⚠️ Security: SQL injection via string formatting in get_partition_details
   Files: ingestion/src/metadata/ingestion/source/database/redshift/metadata.py:201

   The new `schema_name` parameter is concatenated directly into the SQL query via f-string interpolation:
   
   ```python
   query += f" AND \"schema\" = '{schema_name}'"
   ```
   
   While the `schema_name` value typically comes from Redshift's own metadata catalog (and is thus unlikely to contain malicious content in normal usage), this is still a SQL injection vector. A schema name containing a single quote (e.g., `foo'; DROP TABLE ...--`) would break the query or allow injection.
   
   Note that the existing codebase has similar patterns in `_get_schema_column_info` and `_get_all_relation_info`, so this is consistent with the current style. However, for new code, it's better to use parameterized queries.
   
   The method `query_table_names_and_types` in the same file already demonstrates the correct pattern using `sql.text()` with named parameters (line 236-246). The same approach should be used here.
   
   **Suggested fix**: Use a parameterized query approach. Since the base query is a raw string constant, wrap the full query with `sql.text()` and use a bind parameter:
   
   ```python
   query = REDSHIFT_PARTITION_DETAILS
   if schema_name:
       query += " AND \"schema\" = :schema_name"
   results = self.connection.execute(
       statement=sql.text(query),
       {"schema_name": schema_name}
   ).fetchall()
   ```

   Suggested fix:
   query = REDSHIFT_PARTITION_DETAILS
   if schema_name:
       query += " AND \"schema\" = :schema_name"
   results = self.connection.execute(
       statement=sql.text(query),
       {"schema_name": schema_name} if schema_name else {},
   ).fetchall()

3. 💡 Quality: Stale docstring references removed @reflection.cache
   Files: ingestion/src/metadata/ingestion/source/database/redshift/metadata.py:209-216

   The docstring for `_clear_reflection_cache()` states:
   
   > "since _get_schema_column_info, get_columns, and _get_all_relation_info all use @reflection.cache"
   
   However, this PR specifically removes `@reflection.cache` from all three of those functions. The docstring should be updated to reflect that the info_cache is being cleared for the remaining `@reflection.cache`-decorated functions (e.g., `get_table_comment`, `get_view_definition`, `_get_pg_column_info`) and to clean up any residual cached data.

   Suggested fix:
   def _clear_reflection_cache(self) -> None:
       """Clear the SQLAlchemy inspector's info_cache to release
       cached column / relation data from prior schemas.
   
       This prevents unbounded memory growth when ingesting many
       schemas by clearing residual cached data from remaining
       @reflection.cache-decorated functions (e.g., get_table_comment).
       """

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

`Auto-apply`	`Compact`
`gitar auto-apply:on`	`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-03-04T05:56:04Z

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
6.1% Coverage on New Code (required ≥ 20%)

See analysis details on SonarQube Cloud

Fix: Optimise redshift memory usage

c6d8350

ulixius9 requested a review from a team as a code owner February 16, 2026 09:33

ulixius9 temporarily deployed to test February 16, 2026 09:33 — with GitHub Actions Inactive

ulixius9 had a problem deploying to test February 16, 2026 09:33 — with GitHub Actions Failure

ulixius9 temporarily deployed to test February 16, 2026 09:33 — with GitHub Actions Inactive

github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Feb 16, 2026

ulixius9 mentioned this pull request Feb 16, 2026

Fix memory leak in Redshift ingestion by adding proper connection management #22888

Closed

gitar-bot Bot reviewed Feb 16, 2026

View reviewed changes

Merge branch 'main' into redshift_mem

51b740d

keshavmohta09 temporarily deployed to test March 2, 2026 18:51 — with GitHub Actions Inactive

keshavmohta09 had a problem deploying to test March 2, 2026 18:51 — with GitHub Actions Failure

keshavmohta09 temporarily deployed to test March 2, 2026 18:51 — with GitHub Actions Inactive

gitar-bot Bot reviewed Mar 2, 2026

View reviewed changes

keshavmohta09 temporarily deployed to test March 4, 2026 05:35 — with GitHub Actions Inactive

keshavmohta09 approved these changes Mar 4, 2026

View reviewed changes

ulixius9 merged commit f16bcd3 into main Mar 4, 2026
43 of 46 checks passed

ulixius9 deleted the redshift_mem branch March 4, 2026 06:23

ulixius9 added a commit that referenced this pull request Mar 4, 2026

Fix: Optimise redshift memory usage (#25904)

2c03620

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Optimise redshift memory usage#25904

Fix: Optimise redshift memory usage#25904
ulixius9 merged 2 commits intomainfrom
redshift_mem

ulixius9 commented Feb 16, 2026 •

edited by gitar-bot Bot

Loading

Uh oh!

gitar-bot Bot Feb 16, 2026

Uh oh!

github-actions Bot commented Feb 16, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Feb 16, 2026 •

edited

Loading

Uh oh!

gitar-bot Bot Feb 16, 2026

Uh oh!

d-miranda commented Mar 2, 2026

Uh oh!

gitar-bot Bot Mar 2, 2026

Uh oh!

gitar-bot Bot commented Mar 2, 2026 •

edited

Loading

Issue

Root Cause

Details

Uh oh!

sonarqubecloud Bot commented Mar 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ulixius9 commented Feb 16, 2026 • edited by gitar-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Summary by Gitar

Type of change:

Checklist:

Uh oh!

gitar-bot Bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (38)

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (21)

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (4)

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

Uh oh!

github-actions Bot commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

Uh oh!

gitar-bot Bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

ulixius9 commented Feb 16, 2026 •

edited by gitar-bot Bot

Loading

github-actions Bot commented Feb 16, 2026 •

edited

Loading

Target: `openmetadata-ingestion:trivy (debian 12.12)`

Target: `Java`

Target: `Node.js`

Target: `Python`

Target: `usr/bin/docker`

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

github-actions Bot commented Feb 16, 2026 •

edited

Loading

Target: `openmetadata-ingestion-base-slim:trivy (debian 12.13)`

Target: `Java`

Target: `Node.js`

Target: `Python`

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

Target: `/ingestion/pipelines/extended_sample_data.yaml`

Target: `/ingestion/pipelines/lineage.yaml`

Target: `/ingestion/pipelines/sample_data.json`

Target: `/ingestion/pipelines/sample_data.yaml`

Target: `/ingestion/pipelines/sample_data_aut.yaml`

Target: `/ingestion/pipelines/sample_usage.json`

Target: `/ingestion/pipelines/sample_usage.yaml`

Target: `/ingestion/pipelines/sample_usage_aut.yaml`

gitar-bot Bot commented Mar 2, 2026 •

edited

Loading