
Python CI Optimizations #26913

Open

SumanMaharana wants to merge 23 commits into main from optimize-py-ci

Conversation

Contributor

SumanMaharana commented Apr 1, 2026

Describe your changes:

Fixes #26898
This pull request refactors the Python test workflow to separate static checks from unit tests, introduces a more flexible dependency installation mode for the test environment setup, and improves test coverage handling. The changes streamline CI runs, reduce resource usage for static checks, and make coverage reporting more accurate.

Workflow and CI improvements:

  • Split the Python test workflow into two jobs: py-static-checks for static analysis (using only the main Python version and minimal dependencies) and py-unit-tests for running unit tests across multiple Python versions. This reduces unnecessary dependency installation and speeds up static checks.
  • Added a new install-mode input to .github/actions/setup-openmetadata-test-environment, allowing either a full install (all dependencies) or a lightweight "dev-only" install for static checks. The static check job now uses the "dev-only" mode.

Dependency and caching enhancements:

  • Improved caching for Python dependencies in the setup action by specifying relevant files for cache invalidation, speeding up CI runs.

Test coverage and reporting:

  • Modified unit_tests in noxfile.py to allow skipping coverage collection via a --no-cov flag, which is now used for non-main Python versions to avoid redundant coverage runs.
  • Updated the coverage configuration in pyproject.toml to omit generated code from coverage reports, making coverage metrics more meaningful.
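The --no-cov plumbing described above can be sketched as a small helper. This is an illustration, not the PR's actual noxfile code: the function name, the default main version, and the exact coverage flags are assumptions.

```python
def build_pytest_args(posargs, current_version, main_version="3.10"):
    """Assemble the pytest invocation for the unit-tests nox session.

    '--no-cov' is a session-level flag, not a pytest option: it is stripped
    here, and coverage flags are only added for the main Python version so
    the matrix does not produce redundant coverage data.
    """
    skip_cov = "--no-cov" in posargs or current_version != main_version
    passthrough = [a for a in posargs if a != "--no-cov"]
    cmd = ["pytest"]
    if not skip_cov:
        cmd += ["--cov=metadata", "--cov-report=xml"]
    return cmd + passthrough
```

With this shape, CI only has to append --no-cov on non-main matrix entries; local runs without the flag keep coverage on by default.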

Other workflow tweaks:

  • Adjusted integration test sharding for better distribution.
  • Updated workflow dependencies to include the new static check job in status and coverage aggregation.

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copilot AI review requested due to automatic review settings April 1, 2026 06:46
The github-actions bot added the Ingestion and safe to test labels on Apr 1, 2026 (safe to test allows secure GitHub workflows to run on the PR).

Check failure

Code scanning / CodeQL

Cache Poisoning via execution of untrusted code High

Potential cache poisoning in the context of the default branch due to privileged checkout of untrusted code (pull_request_target). This alert is reported five times, once per affected workflow trigger.

Copilot Autofix

AI 12 days ago

General approach: Avoid running workflows that both (a) execute untrusted PR code and (b) use shared caches in the default-branch security context (pull_request_target). The standard fix is to trigger such workflows on pull_request instead of pull_request_target, so they run in the PR’s own context and cannot poison default-branch caches. This aligns with GitHub’s guidance for running untrusted code.

Best fix here: Convert the three workflows that invoke or gate Python tests/builds from pull_request_target to pull_request, keeping their behavior otherwise the same. This removes the privileged default-branch context from these jobs while preserving label-based gating, change detection, and all existing steps. The composite action already avoids saving caches for pull_request_target events; after the fix, these workflows will never run as pull_request_target, so they will operate entirely in the unprivileged PR context and any caching will be scoped to the PR branch.

Concretely:

  1. In .github/workflows/py-operator-build-test.yml

    • Change the on: section to use pull_request instead of pull_request_target, preserving the same types and paths filters.
    • No other changes are strictly needed; the checkout of github.event.pull_request.head.sha is still valid under pull_request.
  2. In .github/workflows/py-tests-postgres.yml

    • Change the on: section trigger from pull_request_target to pull_request with the same types.
    • The rest of the workflow still functions: it already assumes github.event.pull_request is present.
  3. In .github/workflows/py-tests.yml

    • Similarly, update the on: section from pull_request_target to pull_request.
    • Environment variables and jobs that reference github.event.pull_request continue to work for pull_request.

These changes eliminate the CodeQL-reported “privileged checkout of untrusted code in the default branch context” for the composite action path, and also address all the alert variants pointing at the pull_request_target triggers.


Suggested changeset 3
.github/workflows/py-operator-build-test.yml
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/py-operator-build-test.yml b/.github/workflows/py-operator-build-test.yml
--- a/.github/workflows/py-operator-build-test.yml
+++ b/.github/workflows/py-operator-build-test.yml
@@ -12,7 +12,7 @@
 name: py-operator-build-test
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
     paths:
       - "ingestion/**"
EOF
.github/workflows/py-tests-postgres.yml
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/py-tests-postgres.yml b/.github/workflows/py-tests-postgres.yml
--- a/.github/workflows/py-tests-postgres.yml
+++ b/.github/workflows/py-tests-postgres.yml
@@ -12,7 +12,7 @@
 name: py-tests-postgres
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
 
 permissions:
EOF
.github/workflows/py-tests.yml
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/py-tests.yml b/.github/workflows/py-tests.yml
--- a/.github/workflows/py-tests.yml
+++ b/.github/workflows/py-tests.yml
@@ -12,7 +12,7 @@
 name: py-tests
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
 
 permissions:
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment thread .github/actions/setup-openmetadata-test-environment/action.yml
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes Python CI by splitting static analysis from unit tests, adding a lighter dependency-install path for static checks, and refining how coverage is produced/combined to reduce redundant work across the test matrix.

Changes:

  • Split the Python test workflow into py-static-checks (single main Python) and py-unit-tests (matrix) to speed up static checks and reduce installs.
  • Add install-mode to the composite test-environment action to support a lighter “dev-only” install path.
  • Add a --no-cov option to the unit-tests nox session so only the main Python version collects coverage; update coverage config to omit generated code.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
ingestion/pyproject.toml Omits generated code from coverage collection for more meaningful metrics.
ingestion/noxfile.py Adds --no-cov support and durations reporting to reduce redundant coverage runs in CI.
.github/workflows/py-tests.yml Splits static checks vs unit tests jobs; adjusts sharding and job dependencies for coverage/status aggregation.
.github/actions/setup-openmetadata-test-environment/action.yml Introduces install-mode and improves uv caching behavior to speed CI setup.

@@ -104,11 +101,52 @@
nox --no-venv -s static-checks

Copilot AI Apr 1, 2026


The static-checks nox session is parameterized with python=get_python_versions(), so it may attempt to run for multiple Python versions. Since this job intentionally sets up only MAIN_PYTHON_VERSION, consider setting PYTHON_VERSIONS=${{ env.MAIN_PYTHON_VERSION }} for this step/job (the noxfile already honors PYTHON_VERSIONS) to ensure only one session runs and avoid redundant/incorrect interpreter selection when using --no-venv.

Suggested change:

  - nox --no-venv -s static-checks
  + nox --no-venv -s static-checks
  + env:
  +   PYTHON_VERSIONS: ${{ env.MAIN_PYTHON_VERSION }}

Copilot uses AI. Check for mistakes.
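A noxfile helper that honors PYTHON_VERSIONS, as the comment above assumes, might look like this sketch. The default version tuple and the separator handling are illustrative assumptions, not the repository's actual implementation.

```python
import os

def get_python_versions(default=("3.10", "3.11", "3.12")):
    """Python versions used to parameterize nox sessions.

    CI can narrow the matrix to a single interpreter by exporting
    PYTHON_VERSIONS (space- or comma-separated), e.g. PYTHON_VERSIONS=3.10
    for the static-checks job.
    """
    raw = os.environ.get("PYTHON_VERSIONS", "").strip()
    if not raw:
        return list(default)
    return [v for v in raw.replace(",", " ").split() if v]
```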
Comment on lines 145 to 150
  - name: Run Unit Tests
    run: |
      source env/bin/activate
      cd ingestion
-     nox --no-venv -s unit-tests
+     nox --no-venv -s unit-tests -- ${{ matrix.py-version != env.MAIN_PYTHON_VERSION && '--no-cov' || '' }}
    shell: bash

Copilot AI Apr 1, 2026


This job runs under a single matrix.py-version, but the nox sessions are defined with python=get_python_versions() (multiple versions). With --no-venv, nox may still enumerate/run/skip other versioned sessions depending on what interpreters are present on the runner. Setting PYTHON_VERSIONS=${{ matrix.py-version }} for this step/job would make the executed nox session unambiguous and avoid extra work.

Comment on lines 168 to 176
matrix:
  py-version: ["3.10", "3.11", "3.12"]
  shard:
    - name: "shard-1"
      nox-args: >-
        tests/integration/ometa
    - name: "shard-2"
      nox-args: >-
        tests/integration/postgres

Copilot AI Apr 1, 2026


The integration job is matrixed by py-version, but the nox sessions are defined with python=get_python_versions() (multiple versions). Consider setting PYTHON_VERSIONS=${{ matrix.py-version }} at the job/step level so nox selects only the intended versioned session when running with --no-venv.

Comment thread ingestion/noxfile.py
Comment on lines +157 to +160
    skip_cov = "--no-cov" in args
    if skip_cov:
        extra_flags = [f for f in extra_flags if f != "--no-cov"]


Copilot AI Apr 1, 2026


--no-cov is treated as a custom flag for this nox session and stripped before passing args to pytest. Please add a short inline comment explaining that this flag is meant for CI to skip coverage on non-main Python versions (and that it is not a pytest option) to avoid confusion for local runs.

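The inline comment the reviewer asks for could look like the sketch below; the surrounding variable names follow the snippet quoted above, and the example flag list is made up for illustration.

```python
extra_flags = ["--no-cov", "-x"]  # example flags taken from session.posargs

# --no-cov is a custom flag for this nox session, NOT a pytest option:
# CI passes it on non-main Python versions to skip redundant coverage runs.
# Strip it before the remaining flags are forwarded to pytest.
skip_cov = "--no-cov" in extra_flags
if skip_cov:
    extra_flags = [f for f in extra_flags if f != "--no-cov"]
```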
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpng-dev CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng-dev CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (27)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.11.0 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.4 6.0.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
litellm CVE-2026-35030 🔥 CRITICAL 1.81.6 1.83.0
litellm CVE-2026-35029 🚨 HIGH 1.81.6 1.83.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.2 0.6.3
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.4 6.5.5
tornado CVE-2026-35536 🚨 HIGH 6.5.4 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (2)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.6 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2026-25679 🚨 HIGH v1.25.6 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

Copilot AI review requested due to automatic review settings April 1, 2026 08:00
Comment thread ingestion/tests/integration/test_suite/test_e2e_workflow.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Comment on lines 143 to 147
@classmethod
def setUpClass(cls):
    """set up class"""
    cls.addClassCleanup(cls.tearDownClass)
    service: DatabaseService = cls.metadata.create_or_update(

Copilot AI Apr 1, 2026


setUpClass already triggers tearDownClass automatically in unittest. Adding cls.addClassCleanup(cls.tearDownClass) will cause tearDownClass to run twice on a normal successful run, which will likely error here because tearDownClass assumes the service still exists (get_by_name(...).id.root without a None check) and also removes the sqlite file. Instead, register idempotent cleanup callbacks for the individual resources (service deletion, file removal), or make tearDownClass safe to call multiple times and avoid registering it as a class cleanup.


@classmethod
def setUpClass(cls) -> None:
    cls.addClassCleanup(cls.tearDownClass)

Copilot AI Apr 1, 2026


unittest will call tearDownClass automatically; registering cls.addClassCleanup(cls.tearDownClass) means the class teardown runs twice on success. Even if most operations are wrapped in accumulate_errors, this can still introduce redundant work and potential flakiness (e.g., stopping an already-stopped container). Prefer adding class cleanups for the specific resources created in setUpClass (container stop + service deletion) or ensure tearDownClass is fully idempotent and do not register it as a class cleanup.

Suggested change:

  - cls.addClassCleanup(cls.tearDownClass)

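Per-resource class cleanups, as the review suggests, might look like the sketch below. create_service/delete_service and the in-memory registry are stand-ins for the real metadata calls, used here only to make the pattern runnable.

```python
import unittest

CREATED = []  # stand-in registry for resources created against the server

def create_service(name):
    CREATED.append(name)
    return name

def delete_service(name):
    # Idempotent: safe even if the resource was already removed.
    if name in CREATED:
        CREATED.remove(name)

class SampleServiceTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Register one cleanup per resource instead of
        # addClassCleanup(cls.tearDownClass): unittest already invokes
        # tearDownClass, so registering it would run the teardown twice.
        cls.service = create_service("test-service")
        cls.addClassCleanup(delete_service, cls.service)

    def test_service_exists(self):
        self.assertIn(self.service, CREATED)
```

Each cleanup targets exactly one resource, so the order of teardown is explicit and a second invocation is harmless.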

- name: Install msodbcsql18
  if: ${{ inputs.install-mode == 'full' }}
  run: ACCEPT_EULA=Y sudo apt-get -qq install -y msodbcsql18

Copilot AI Apr 1, 2026


The msodbcsql18 install step no longer runs apt-get update (the previous combined install did). If the package lists are stale or the caching action doesn't refresh them, this can fail with "Unable to locate package" and the -qq flag will make diagnosis harder. Consider running sudo apt-get update (or ensuring the cache step refreshes indexes) before installing msodbcsql18, and/or avoid -qq for this step so failures are debuggable.

Suggested change:

  - run: ACCEPT_EULA=Y sudo apt-get -qq install -y msodbcsql18
  + run: |
  +   sudo apt-get update
  +   ACCEPT_EULA=Y sudo apt-get install -y msodbcsql18

Comment thread .github/workflows/py-tests.yml
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Comment thread .github/workflows/py-tests.yml
Comment thread ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread ingestion/noxfile.py Outdated
Comment on lines 152 to 156
@@ -154,19 +154,37 @@ def unit_tests(session):
    test_paths = [a for a in args if not a.startswith("-")]
    extra_flags = [a for a in args if a.startswith("-")]


Copilot AI Apr 3, 2026


The current posargs parsing splits flags vs paths by startswith('-'), which breaks common pytest flags that take a separate value (e.g. the docstring example -k test_name becomes ['test_name', '-k'] and will be interpreted as a path + a flag missing its value). It also breaks --dist load (the load token is treated as a path). Consider preserving argument order and/or using an argparse-style parser for the handful of flags you need to intercept (--no-cov, --dist) while passing the rest through unchanged.

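An order-preserving pass-through, as the comment suggests, can be as simple as removing only the session's own flags and leaving everything else untouched. The helper name and the default flag tuple below are illustrative assumptions.

```python
def split_session_flags(posargs, session_flags=("--no-cov", "--standalone")):
    """Separate session-only flags from arguments destined for pytest.

    Splitting on startswith('-') breaks flags that take a separate value
    (-k expr, --dist load); instead, keep the original order and drop only
    the exact tokens this nox session consumes itself.
    """
    consumed = {a for a in posargs if a in session_flags}
    passthrough = [a for a in posargs if a not in session_flags]
    return consumed, passthrough
```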
Comment thread ingestion/noxfile.py
Comment on lines 218 to +223
    if standalone:
        args.remove("--standalone")

    skip_cov = "--no-cov" in args
    if skip_cov:
        args.remove("--no-cov")

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --no-cov handling here relies on later posargs processing that splits flags vs paths using startswith('-'). That approach breaks pytest flags that take a separate value (e.g. -k expr, -m marker, --maxfail 1), since the value token is treated as a test path and the flag loses its value. Consider switching to an order-preserving pass-through (only removing --standalone, --no-cov, and --workers from the args list) to avoid reordering or losing paired values.

Comment on lines +35 to +39
- name: Cache and install Ubuntu dependencies
  uses: awalsh128/cache-apt-pkgs-action@latest
  with:
    packages: unixodbc-dev python3-venv librdkafka-dev gcc libsasl2-dev build-essential libssl-dev libffi-dev libevent-dev python3-dev libkrb5-dev tdsodbc
    version: 1.0

Copilot AI Apr 3, 2026


Using awalsh128/cache-apt-pkgs-action@latest makes the CI supply chain non-deterministic (the referenced code can change over time) and is generally discouraged for security/reproducibility. Pin this action to an immutable version tag or, ideally, a commit SHA.

Suggested change:

  - - name: Cache and install Ubuntu dependencies
  -   uses: awalsh128/cache-apt-pkgs-action@latest
  -   with:
  -     packages: unixodbc-dev python3-venv librdkafka-dev gcc libsasl2-dev build-essential libssl-dev libffi-dev libevent-dev python3-dev libkrb5-dev tdsodbc
  -     version: 1.0
  + - name: Install Ubuntu dependencies
  +   run: |
  +     sudo apt-get update -qq
  +     sudo apt-get install -y \
  +       unixodbc-dev \
  +       python3-venv \
  +       librdkafka-dev \
  +       gcc \
  +       libsasl2-dev \
  +       build-essential \
  +       libssl-dev \
  +       libffi-dev \
  +       libevent-dev \
  +       python3-dev \
  +       libkrb5-dev \
  +       tdsodbc
  +   shell: bash

Comment on lines +167 to +192
while status not in ("success", "completed"):
    try:
        response = metadata.client.get(
            "/apps/name/SearchIndexingApplication/status?offset=0&limit=1"
        )
        time.sleep(2)
        if len(response["data"]) == 0:
            raise RuntimeError("No reindexing status found after triggering")

        status = response["data"][0]["status"]

        if status in ("failed", "error"):
            reindex_failed = True
            break

        if time.time() - start_complete > complete_timeout:
            raise TimeoutError(
                f"Timed out waiting for reindexing to complete. "
                f"Current status: {status}, elapsed: {int(time.time() - start_complete)}s"
            )
    except Exception as e:
        if "TimeoutError" in str(type(e).__name__) or "RuntimeError" in str(
            type(e).__name__
        ):
            raise
        time.sleep(1)
        continue

Copilot AI Apr 3, 2026


In the polling loop, the timeout check is only evaluated inside the try block after a successful response parse. If metadata.client.get(...) repeatedly raises a non-TimeoutError/non-RuntimeError exception (e.g. transient network/JSON/KeyError), the except path will continue indefinitely without ever checking complete_timeout, so this loop can hang forever. Consider checking elapsed time on every iteration (including exception paths) and failing after complete_timeout (and optionally counting consecutive failures).

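A polling helper that enforces the deadline on every iteration, including the exception path, might look like this sketch. The function name, terminal states, and parameters are illustrative, not the PR's actual code.

```python
import time

def poll_until_terminal(fetch_status, timeout=60.0, interval=1.0,
                        clock=time.monotonic):
    """Poll fetch_status() until a terminal state or the deadline passes.

    The deadline is checked at the top of every loop iteration, so repeated
    transient errors from fetch_status() cannot make the loop spin forever.
    """
    deadline = clock() + timeout
    last_error = None
    while clock() < deadline:
        try:
            status = fetch_status()
        except Exception as exc:  # transient network/parse failures
            last_error = exc
            time.sleep(interval)
            continue
        if status in ("success", "completed", "failed", "error"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"polling timed out; last error: {last_error!r}")
```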
Comment thread ingestion/tests/integration/trino/conftest.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Comment thread ingestion/noxfile.py Outdated
Comment on lines +152 to +167
# pytest doesn't receive conflicting values.
has_dist_override = any(
f == "--dist" or f.startswith("--dist=") for f in extra_flags
)

Copilot AI Apr 4, 2026


The posargs parsing splits session.posargs into test_paths vs extra_flags based only on a leading -. This breaks for pytest flags that take a separate value (e.g. --dist load, -k expr, --maxfail 1): the value is incorrectly treated as a test path, and for --dist specifically pytest will receive --dist without its required argument. Consider either (a) not splitting at all and just append session.posargs as-is, or (b) use a proper argparse-like scan that keeps flag/value pairs together (handling --flag=value and --flag value).
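Option (b) could look like the sketch below; the `VALUE_FLAGS` allowlist is an assumption and would need to match the flags the nox sessions actually expect to forward:

```python
# Hypothetical scan that keeps "--flag value" pairs together instead of
# classifying every non-dash token as a test path. "--flag=value" tokens
# start with "-" and carry their value inline, so they need no pairing.
VALUE_FLAGS = {"--dist", "-k", "--maxfail", "-m", "-p"}

def split_posargs(posargs):
    """Split posargs into (test_paths, extra_flags) without orphaning values."""
    test_paths, extra_flags = [], []
    it = iter(posargs)
    for token in it:
        if token.startswith("-"):
            extra_flags.append(token)
            # Pull the value token along with its flag so pytest never
            # receives a value-taking flag without its argument.
            if token in VALUE_FLAGS:
                value = next(it, None)
                if value is not None:
                    extra_flags.append(value)
        else:
            test_paths.append(token)
    return test_paths, extra_flags
```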

Comment on lines +81 to +87
try:
cm = try_bind(container, 1433, 1433)
container = cm.__enter__()
except (docker.errors.BuildError, docker.errors.APIError) as exc:
pytest.skip(f"MSSQL container unavailable: {exc}")

try:

Copilot AI Apr 4, 2026


The fixture manually calls cm.__enter__() / cm.__exit__(None, None, None) instead of using a with statement. In the finally block, __exit__ is always invoked with (None, None, None) even if an exception occurred inside the fixture body; this can prevent context managers from handling cleanup correctly based on the exception info. Prefer with try_bind(...) as container: and do the skip logic around that, or call cm.__exit__(*sys.exc_info()) in finally so the context manager receives the real exception details.
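A minimal sketch of the `cm.__exit__(*sys.exc_info())` pattern, using a stand-in context manager rather than the real `try_bind`:

```python
import sys

class Recorder:
    """Stand-in context manager that records what __exit__ receives."""
    def __init__(self):
        self.exit_args = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.exit_args = (exc_type, exc, tb)
        return False  # do not suppress the exception

def fixture_body(cm, fail=False):
    value = cm.__enter__()
    try:
        if fail:
            raise ValueError("setup failed")
        return value
    finally:
        # Forward the live exception info instead of (None, None, None),
        # so the context manager can base its cleanup on the real error.
        cm.__exit__(*sys.exc_info())
```

On the success path `sys.exc_info()` is `(None, None, None)`, so behavior matches a normal `with` exit; on the failure path the context manager sees the actual exception.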

Comment thread ingestion/noxfile.py
has_dist_override = any(a == "--dist" or a.startswith("--dist=") for a in args)

# If the caller provided no explicit test paths, inject the default.
has_explicit_paths = any(not a.startswith("-") for a in args)

⚠️ Bug: has_explicit_paths false-positive on paired-flag values

The comment at lines 152-156 correctly explains why splitting args by startswith("-") is dangerous — paired options like -k expr or --maxfail 1 have value tokens that don't start with -. However, the has_explicit_paths check on line 170 does exactly that split:

has_explicit_paths = any(not a.startswith("-") for a in args)

If a developer runs nox -s unit-tests -- -k test_something, args is ["-k", "test_something"]. The value "test_something" doesn't start with -, so has_explicit_paths is True, and the default tests/unit/ path is never injected. Pytest then receives no test path and falls back to testpaths from config or . — which may discover wrong tests or none at all.

The same issue exists in integration_tests at line 244.

This doesn't affect CI today because all shards pass explicit test paths in nox-args, but it breaks local developer workflows using -k, --maxfail N, etc.

Suggested fix:

Use a more robust heuristic, e.g. check if any arg looks like a path (contains '/' or os.path.exists):

    has_explicit_paths = any(
        not a.startswith("-") and ("/" in a or os.path.exists(a))
        for a in args
    )


Comment on lines +85 to +99
try:
with try_bind(container, 1433, 1433) as container:
docker_container = container.get_wrapped_container()
copy_dir_to_container(str(data_dir), docker_container, "/data")
res = docker_container.exec_run(
[
"bash",
"-c",
" ".join(
[
"/opt/mssql-tools*/bin/sqlcmd",
"-U",
container.username,
"-P",
f"'{container.password}'",


💡 Edge Case: Wider exception catch may skip setup failures as skipped tests

The refactored mssql_container fixture now wraps the entire setup (docker exec, engine creation, SQL execution) inside try: with try_bind(...): ... except (docker.errors.BuildError, docker.errors.APIError). Previously, only the try_bind / __enter__ call was guarded by the docker-error catch; setup failures like exec_run or create_engine raising an APIError would propagate as real failures.

Now, any docker.errors.APIError raised during data copying, SQL execution, or engine creation would be caught and turned into pytest.skip(), masking legitimate setup bugs. While these specific exception types are uncommon after container start, the catch scope is broader than intended.
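The narrower scoping this finding describes can be sketched like this; `DockerError`, `Skipped`/`skip`, `start_container`, and `load_data` are stand-ins for the real `docker.errors` types, `pytest.skip`, and the fixture's setup steps:

```python
# Hypothetical sketch: only container startup is guarded by the skip;
# data loading and engine creation run outside the guarded region so
# their failures surface as real errors rather than skipped tests.
class DockerError(Exception):
    """Stand-in for docker.errors.BuildError / docker.errors.APIError."""

class Skipped(Exception):
    """Stand-in for the exception pytest.skip() raises."""

def skip(reason):
    raise Skipped(reason)

def setup_mssql_fixture(start_container, load_data):
    try:
        container = start_container()
    except DockerError as exc:
        skip(f"MSSQL container unavailable: {exc}")
    # Past this point, even a DockerError is a genuine setup bug, not a skip.
    load_data(container)
    return container
```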



Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

Comment thread ingestion/noxfile.py
Comment on lines +169 to +193
# If the caller provided no explicit test paths, inject the default.
has_explicit_paths = any(not a.startswith("-") for a in args)

pytest_args = [
"-c",
"pyproject.toml",
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
"--junitxml=junit/test-results-unit.xml",
"-n",
"auto",
"--dist",
"loadfile",
"--durations=20",
]

pytest_args.extend(test_paths or ["tests/unit/"])
pytest_args.extend(extra_flags)
if not has_dist_override:
pytest_args += ["--dist", "loadfile"]

if not skip_cov:
pytest_args += [
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
]

if not has_explicit_paths:
pytest_args.append("tests/unit/")


Copilot AI Apr 5, 2026


has_explicit_paths = any(not a.startswith('-') for a in args) will incorrectly treat values for paired pytest options (e.g. -k expr, --maxfail 1) as “explicit paths”. As a result, nox -s unit-tests -- -k foo won’t inject the default tests/unit/ path and will run collection from the repo root instead. Consider detecting explicit paths via filesystem existence / globbing (e.g. Path(a).exists() for non-flag tokens) or a stricter heuristic (tokens containing / or ending with .py) so flag values don’t suppress the default path.

Comment thread ingestion/noxfile.py
Comment on lines +223 to +268
# We only strip our custom flags (--standalone, --no-cov, --workers) from posargs
# and pass everything else through to pytest in original order. We avoid splitting
# by startswith("-") because that breaks paired options like "-k expr" or
# "--maxfail 1" — the value token would be misclassified as a test path.
args = list(session.posargs)
standalone = "--standalone" in args
if standalone:
args.remove("--standalone")

# --no-cov is a CI-only flag to skip coverage on non-main Python versions.
skip_cov = "--no-cov" in args
if skip_cov:
args.remove("--no-cov")

workers = os.environ.get("PYTEST_INTEGRATION_WORKERS", "0")
if "--workers" in args:
idx = args.index("--workers")
workers = args[idx + 1]
args = args[:idx] + args[idx + 2 :]

# Separate test paths from pytest flags in posargs
test_paths = [a for a in args if not a.startswith("-")]
extra_flags = [a for a in args if a.startswith("-")]
# If no explicit test paths, inject the default directory.
has_explicit_paths = any(not a.startswith("-") for a in args)

pytest_args = [
"-c",
"pyproject.toml",
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
"--junitxml=junit/test-results-integration.xml",
]

if not standalone:
if not skip_cov:
pytest_args += [
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
]

if not standalone and not skip_cov:
pytest_args.append("--cov-append")

use_xdist = int(workers) > 0
if use_xdist:
pytest_args.extend([f"-n{workers}", "--dist=loadgroup"])

pytest_args.extend(test_paths or ["tests/integration/"])
pytest_args.extend(extra_flags)
if not has_explicit_paths:
pytest_args.append("tests/integration/")


Copilot AI Apr 5, 2026


Same issue as unit-tests: has_explicit_paths = any(not a.startswith('-') for a in args) will be true for paired-option values like -k expr or --maxfail 1, which prevents injecting the default tests/integration/ path. This can unexpectedly run all tests when callers only intended to filter. Use a stricter “is this a path?” check (e.g. Path(token).exists()/glob, or //.py heuristics) before deciding to omit the default path.

Comment on lines +233 to +237
logger = logging.getLogger(__name__)


def _verify_tables_readable(engine: Engine, table_names: list):
"""Verify all tables are readable after data loading.

Copilot AI Apr 5, 2026


logger is defined but never used. If logging isn’t needed here, remove import logging and the logger = logging.getLogger(__name__) assignment; otherwise, use logger in the retry loop (e.g., to log attempts/exceptions) to justify keeping it.

Comment on lines +118 to +124
transaciton = conn.begin()
conn.execute(
text(
f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
)
)
transaciton.commit()

Copilot AI Apr 5, 2026


Typo in variable name transaciton makes the fixture harder to read/maintain. Rename to transaction (and update uses) to avoid confusion.

Suggested change
transaciton = conn.begin()
conn.execute(
text(
f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
)
)
transaciton.commit()
transaction = conn.begin()
conn.execute(
text(
f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
)
)
transaction.commit()

Comment on lines +83 to +92
- name: Restore uv package cache
id: cache-uv
uses: actions/cache/restore@v4
with:
path: ${{ env.UV_CACHE_DIR }}
key: uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-${{ hashFiles('ingestion/setup.py') }}
restore-keys: |
uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-
uv-${{ runner.os }}-py${{ inputs.python-version }}-


Copilot AI Apr 5, 2026


The cache steps use path: ${{ env.UV_CACHE_DIR }}, but this composite action doesn’t define UV_CACHE_DIR anywhere. If astral-sh/setup-uv doesn’t export it on the runner, this resolves to an empty path and can cause actions/cache to error or silently skip caching. Consider setting UV_CACHE_DIR explicitly (e.g. ~/.cache/uv) and/or using a known output/input from setup-uv to determine the cache directory.

@gitar-bot

gitar-bot bot commented Apr 6, 2026

Code Review ⚠️ Changes requested 6 resolved / 8 findings

Python CI optimizations resolved six issues including dev dependency installation and fixture teardown problems, but four findings remain open: has_explicit_paths misidentifies paired-flag values as test paths in both test shards, exception handling in the MSSQL fixture may skip setup failures, and the docker error catch is both too narrow and too broad for safe operation.

⚠️ Bug: has_explicit_paths false-positive on paired-flag values

📄 ingestion/noxfile.py:170 📄 ingestion/noxfile.py:244

The comment at lines 152-156 correctly explains why splitting args by startswith("-") is dangerous — paired options like -k expr or --maxfail 1 have value tokens that don't start with -. However, the has_explicit_paths check on line 170 does exactly that split:

has_explicit_paths = any(not a.startswith("-") for a in args)

If a developer runs nox -s unit-tests -- -k test_something, args is ["-k", "test_something"]. The value "test_something" doesn't start with -, so has_explicit_paths is True, and the default tests/unit/ path is never injected. Pytest then receives no test path and falls back to testpaths from config or . — which may discover wrong tests or none at all.

The same issue exists in integration_tests at line 244.

This doesn't affect CI today because all shards pass explicit test paths in nox-args, but it breaks local developer workflows using -k, --maxfail N, etc.

Suggested fix
Use a more robust heuristic, e.g. check if any arg looks like a path (contains '/' or os.path.exists):

    has_explicit_paths = any(
        not a.startswith("-") and ("/" in a or os.path.exists(a))
        for a in args
    )
💡 Edge Case: Wider exception catch may skip setup failures as skipped tests

📄 ingestion/tests/integration/sql_server/conftest.py:85-99

The refactored mssql_container fixture now wraps the entire setup (docker exec, engine creation, SQL execution) inside try: with try_bind(...): ... except (docker.errors.BuildError, docker.errors.APIError). Previously, only the try_bind / __enter__ call was guarded by the docker-error catch; setup failures like exec_run or create_engine raising an APIError would propagate as real failures.

Now, any docker.errors.APIError raised during data copying, SQL execution, or engine creation would be caught and turned into pytest.skip(), masking legitimate setup bugs. While these specific exception types are uncommon after container start, the catch scope is broader than intended.

✅ 6 resolved
Bug: dev-only install mode doesn't install dev deps for static checks

📄 .github/actions/setup-openmetadata-test-environment/action.yml:99-104 📄 .github/workflows/py-tests.yml:95 📄 ingestion/noxfile.py:41-42 📄 ingestion/noxfile.py:126-128
The new dev-only install mode (action.yml lines 99-104) only installs nox, but the static-checks nox session requires basedpyright (and other tools from the [dev] extra). Since the workflow runs with --no-venv, the noxfile's install() function is a no-op (it skips installs for PassthroughEnv), so the session cannot install its own dependencies either. This means the py-static-checks job will fail at runtime with a "basedpyright not found" error.

The fix is to have the dev-only mode install the [dev] extra in addition to nox.

Bug: Missing service_name in deepcopy'd config may affect test_ingestion

📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:204-209 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:249 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:320 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:352 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:438 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:524 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:618 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:727
In test_orm_profiler_e2e.py, the ingest fixture now correctly avoids mutating the global ingestion_config by creating a shallow copy with spread syntax. The subsequent test functions (test_profiler_workflow, test_workflow_sample_profile, etc.) each add workflow_config["source"]["serviceName"] = service_name after deepcopy(ingestion_config). However, ingestion_config is a module-level dict that previously had serviceName set by the ingest fixture (via mutation). Now that ingest no longer mutates it, the deepcopy in each test will get a config without serviceName set — which is correctly fixed by the added lines. This looks correct as-is. However, check that there are no other test functions in this file that deepcopy(ingestion_config) without adding serviceName.

Security: Third-party GitHub Action pinned to @latest tag

📄 .github/actions/setup-openmetadata-test-environment/action.yml:36
The new awalsh128/cache-apt-pkgs-action@latest usage pins to a mutable tag that can change at any time without notice. If the upstream repository is compromised or a breaking change is pushed, CI pipelines would be affected immediately. Every other action in the file and workflow uses a pinned version (e.g., @v4, @v1.3.4), so this breaks the existing security posture.

This is a supply chain risk: a malicious update to the action could exfiltrate secrets or inject code into the build environment.

Bug: addClassCleanup(tearDownClass) causes double teardown execution

📄 ingestion/tests/integration/test_suite/test_e2e_workflow.py:146 📄 ingestion/tests/integration/test_suite/test_e2e_workflow.py:227-241 📄 ingestion/tests/integration/profiler/test_nosql_profiler.py:134 📄 ingestion/tests/integration/profiler/test_nosql_profiler.py:155-159
In Python's unittest, when setUpClass succeeds, the framework automatically calls tearDownClass after all tests, and then runs any registered addClassCleanup callbacks. So tearDownClass will be called twice in the normal (success) path.

In test_e2e_workflow.py, this will crash on the second invocation because:

  • metadata.get_by_name(...) returns None after the service was already deleted, causing AttributeError: 'NoneType' object has no attribute 'id' on line 234
  • os.remove(cls.db_path) raises FileNotFoundError since the file was already removed

In test_nosql_profiler.py, the accumulate_errors context manager may swallow errors, but mongo_container.stop() on an already-stopped container is still wasteful/error-prone.

The correct pattern is to either:

  1. Extract cleanup into a separate helper and register that with addClassCleanup instead of tearDownClass itself, removing the tearDownClass override entirely
  2. Or make tearDownClass fully idempotent with guard checks
Edge Case: Workflow expression comparison to env var may not work

📄 .github/workflows/py-tests.yml:158 📄 .github/workflows/py-tests.yml:308
In the GitHub Actions workflow, the expression ${{ matrix.py-version != env.MAIN_PYTHON_VERSION && '--no-cov' || '' }} compares matrix.py-version against env.MAIN_PYTHON_VERSION. However, in GitHub Actions expression contexts, env.* references may not resolve correctly inside ${{ }} expressions — the env context is available but the comparison depends on the exact context where it's evaluated. If MAIN_PYTHON_VERSION is defined as a workflow-level env var (which it appears to be), this should work, but it's worth verifying in CI that non-3.10 versions actually receive --no-cov and 3.10 does not. A safer approach would be to use a matrix include to set a skip-cov flag explicitly per version.

...and 1 more resolved from earlier reviews




Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.

@sonarqubecloud

sonarqubecloud bot commented Apr 6, 2026

@IceS2

IceS2 commented Apr 7, 2026

Hey @SumanMaharana, I have only one question...

What are we gaining from splitting the static checks and unit tests jobs if the bottleneck for the whole workflow is the integration tests? Aren't we just repeating some steps and using more resources for no gain?

@SumanMaharana

Hey @SumanMaharana, I have only one question...

What are we gaining from splitting the static checks and unit tests jobs if the bottleneck for the whole workflow is the integration tests? Aren't we just repeating some steps and using more resources for no gain?

Before this PR, static checks ran inside the py-unit-tests matrix across all 3 Python versions. Since linting and type checking are not Python-version-sensitive, we were running identical static checks 3 times (~5 min × 3 runners = ~15 runner-minutes), and the unit tests couldn't start until static checks on each runner finished.

After this PR, static checks run exactly once on Python 3.10 with a lighter install-mode: dev-only (~6 runner-minutes total), while the unit test shards and integration test shards all start at T=0 in parallel. The CI critical path looks like:

T=0 ──┬── py-static-checks   (~6 min)      ──┐
      ├── py-unit-tests      (~12-15 min)  ──┤── py-tests-status
      └── py-integration-tests(~40-43 min) ──┘

The "repeated steps" (checkout, wait-for-label, env setup) are all sub-minute cached operations — they don't materially contribute to runner-minutes. The expensive step, running nox, is cheaper for py-static-checks than it used to be (dev-only deps, no test collection overhead).


Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python CI tests optimization

4 participants