
Python CI Optimizations #26913

Open

SumanMaharana wants to merge 23 commits into main from optimize-py-ci

Conversation

Contributor

SumanMaharana commented Apr 1, 2026

Describe your changes:

Fixes #26898
This pull request refactors the Python test workflow to separate static checks from unit tests, introduces a more flexible dependency installation mode for the test environment setup, and improves test coverage handling. The changes streamline CI runs, reduce resource usage for static checks, and make coverage reporting more accurate.

Workflow and CI improvements:

  • Split the Python test workflow into two jobs: py-static-checks for static analysis (using only the main Python version and minimal dependencies) and py-unit-tests for running unit tests across multiple Python versions. This reduces unnecessary dependency installation and speeds up static checks.
  • Added a new install-mode input to .github/actions/setup-openmetadata-test-environment, allowing either a full install (all dependencies) or a lightweight "dev-only" install for static checks. The static check job now uses the "dev-only" mode.

Dependency and caching enhancements:

  • Improved caching for Python dependencies in the setup action by specifying relevant files for cache invalidation, speeding up CI runs.

Test coverage and reporting:

  • Modified unit_tests in noxfile.py to allow skipping coverage collection via a --no-cov flag, which is now used for non-main Python versions to avoid redundant coverage runs.
  • Updated the coverage configuration in pyproject.toml to omit generated code from coverage reports, making coverage metrics more meaningful.
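The --no-cov plumbing described above can be sketched as a small helper. This is an illustration, not the PR's actual noxfile code: the function name, the default main version, and the exact coverage flags are assumptions.

```python
def build_pytest_args(posargs, current_version, main_version="3.10"):
    """Assemble the pytest invocation for the unit-tests nox session.

    '--no-cov' is a session-level flag, not a pytest option: it is stripped
    here, and coverage flags are only added for the main Python version so
    the matrix does not produce redundant coverage data.
    """
    skip_cov = "--no-cov" in posargs or current_version != main_version
    passthrough = [a for a in posargs if a != "--no-cov"]
    cmd = ["pytest"]
    if not skip_cov:
        cmd += ["--cov=metadata", "--cov-report=xml"]
    return cmd + passthrough
```

With this shape, CI only has to append --no-cov on non-main matrix entries; local runs without the flag keep coverage on by default.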

Other workflow tweaks:

  • Adjusted integration test sharding for better distribution.
  • Updated workflow dependencies to include the new static check job in status and coverage aggregation.

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copilot AI review requested due to automatic review settings April 1, 2026 06:46
The github-actions bot added the Ingestion and safe to test labels on Apr 1, 2026 (safe to test allows secure GitHub workflows to run on the PR).

Check failure

Code scanning / CodeQL

Cache Poisoning via execution of untrusted code High

Potential cache poisoning in the context of the default branch due to privileged checkout of untrusted code (pull_request_target). This alert is reported five times, once per affected workflow trigger.

Copilot Autofix

AI 12 days ago

General approach: Avoid running workflows that both (a) execute untrusted PR code and (b) use shared caches in the default-branch security context (pull_request_target). The standard fix is to trigger such workflows on pull_request instead of pull_request_target, so they run in the PR’s own context and cannot poison default-branch caches. This aligns with GitHub’s guidance for running untrusted code.

Best fix here: Convert the three workflows that invoke or gate Python tests/builds from pull_request_target to pull_request, keeping their behavior otherwise the same. This removes the privileged default-branch context from these jobs while preserving label-based gating, change detection, and all existing steps. The composite action already avoids saving caches for pull_request_target events; after the fix, these workflows will never run as pull_request_target, so they will operate entirely in the unprivileged PR context and any caching will be scoped to the PR branch.

Concretely:

  1. In .github/workflows/py-operator-build-test.yml

    • Change the on: section to use pull_request instead of pull_request_target, preserving the same types and paths filters.
    • No other changes are strictly needed; the checkout of github.event.pull_request.head.sha is still valid under pull_request.
  2. In .github/workflows/py-tests-postgres.yml

    • Change the on: section trigger from pull_request_target to pull_request with the same types.
    • The rest of the workflow still functions: it already assumes github.event.pull_request is present.
  3. In .github/workflows/py-tests.yml

    • Similarly, update the on: section from pull_request_target to pull_request.
    • Environment variables and jobs that reference github.event.pull_request continue to work for pull_request.

These changes eliminate the CodeQL-reported “privileged checkout of untrusted code in the default branch context” for the composite action path, and also address all the alert variants pointing at the pull_request_target triggers.


Suggested changeset 3
.github/workflows/py-operator-build-test.yml
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/py-operator-build-test.yml b/.github/workflows/py-operator-build-test.yml
--- a/.github/workflows/py-operator-build-test.yml
+++ b/.github/workflows/py-operator-build-test.yml
@@ -12,7 +12,7 @@
 name: py-operator-build-test
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
     paths:
       - "ingestion/**"
EOF
.github/workflows/py-tests-postgres.yml
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/py-tests-postgres.yml b/.github/workflows/py-tests-postgres.yml
--- a/.github/workflows/py-tests-postgres.yml
+++ b/.github/workflows/py-tests-postgres.yml
@@ -12,7 +12,7 @@
 name: py-tests-postgres
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
 
 permissions:
EOF
.github/workflows/py-tests.yml
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/py-tests.yml b/.github/workflows/py-tests.yml
--- a/.github/workflows/py-tests.yml
+++ b/.github/workflows/py-tests.yml
@@ -12,7 +12,7 @@
 name: py-tests
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
 
 permissions:
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment thread .github/actions/setup-openmetadata-test-environment/action.yml
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes Python CI by splitting static analysis from unit tests, adding a lighter dependency-install path for static checks, and refining how coverage is produced/combined to reduce redundant work across the test matrix.

Changes:

  • Split the Python test workflow into py-static-checks (single main Python) and py-unit-tests (matrix) to speed up static checks and reduce installs.
  • Add install-mode to the composite test-environment action to support a lighter “dev-only” install path.
  • Add a --no-cov option to the unit-tests nox session so only the main Python version collects coverage; update coverage config to omit generated code.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
ingestion/pyproject.toml Omits generated code from coverage collection for more meaningful metrics.
ingestion/noxfile.py Adds --no-cov support and durations reporting to reduce redundant coverage runs in CI.
.github/workflows/py-tests.yml Splits static checks vs unit tests jobs; adjusts sharding and job dependencies for coverage/status aggregation.
.github/actions/setup-openmetadata-test-environment/action.yml Introduces install-mode and improves uv caching behavior to speed CI setup.

@@ -104,11 +101,52 @@
nox --no-venv -s static-checks

Copilot AI Apr 1, 2026


The static-checks nox session is parameterized with python=get_python_versions(), so it may attempt to run for multiple Python versions. Since this job intentionally sets up only MAIN_PYTHON_VERSION, consider setting PYTHON_VERSIONS=${{ env.MAIN_PYTHON_VERSION }} for this step/job (the noxfile already honors PYTHON_VERSIONS) to ensure only one session runs and avoid redundant/incorrect interpreter selection when using --no-venv.

Suggested change:

  - nox --no-venv -s static-checks
  + nox --no-venv -s static-checks
  + env:
  +   PYTHON_VERSIONS: ${{ env.MAIN_PYTHON_VERSION }}

Copilot uses AI. Check for mistakes.
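A noxfile helper that honors PYTHON_VERSIONS, as the comment above assumes, might look like this sketch. The default version tuple and the separator handling are illustrative assumptions, not the repository's actual implementation.

```python
import os

def get_python_versions(default=("3.10", "3.11", "3.12")):
    """Python versions used to parameterize nox sessions.

    CI can narrow the matrix to a single interpreter by exporting
    PYTHON_VERSIONS (space- or comma-separated), e.g. PYTHON_VERSIONS=3.10
    for the static-checks job.
    """
    raw = os.environ.get("PYTHON_VERSIONS", "").strip()
    if not raw:
        return list(default)
    return [v for v in raw.replace(",", " ").split() if v]
```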
Comment on lines 145 to 150
  - name: Run Unit Tests
    run: |
      source env/bin/activate
      cd ingestion
-     nox --no-venv -s unit-tests
+     nox --no-venv -s unit-tests -- ${{ matrix.py-version != env.MAIN_PYTHON_VERSION && '--no-cov' || '' }}
    shell: bash

Copilot AI Apr 1, 2026


This job runs under a single matrix.py-version, but the nox sessions are defined with python=get_python_versions() (multiple versions). With --no-venv, nox may still enumerate/run/skip other versioned sessions depending on what interpreters are present on the runner. Setting PYTHON_VERSIONS=${{ matrix.py-version }} for this step/job would make the executed nox session unambiguous and avoid extra work.

Comment on lines 168 to 176
matrix:
  py-version: ["3.10", "3.11", "3.12"]
  shard:
    - name: "shard-1"
      nox-args: >-
        tests/integration/ometa
    - name: "shard-2"
      nox-args: >-
        tests/integration/postgres

Copilot AI Apr 1, 2026


The integration job is matrixed by py-version, but the nox sessions are defined with python=get_python_versions() (multiple versions). Consider setting PYTHON_VERSIONS=${{ matrix.py-version }} at the job/step level so nox selects only the intended versioned session when running with --no-venv.

Comment thread ingestion/noxfile.py
Comment on lines +157 to +160
    skip_cov = "--no-cov" in args
    if skip_cov:
        extra_flags = [f for f in extra_flags if f != "--no-cov"]


Copilot AI Apr 1, 2026


--no-cov is treated as a custom flag for this nox session and stripped before passing args to pytest. Please add a short inline comment explaining that this flag is meant for CI to skip coverage on non-main Python versions (and that it is not a pytest option) to avoid confusion for local runs.

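The inline comment the reviewer asks for could look like the sketch below; the surrounding variable names follow the snippet quoted above, and the example flag list is made up for illustration.

```python
extra_flags = ["--no-cov", "-x"]  # example flags taken from session.posargs

# --no-cov is a custom flag for this nox session, NOT a pytest option:
# CI passes it on non-main Python versions to skip redundant coverage runs.
# Strip it before the remaining flags are forwarded to pytest.
skip_cov = "--no-cov" in extra_flags
if skip_cov:
    extra_flags = [f for f in extra_flags if f != "--no-cov"]
```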
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpng-dev CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng-dev CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33416 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4
libpng16-16 CVE-2026-33636 🚨 HIGH 1.6.39-2+deb12u3 1.6.39-2+deb12u4

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (13)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (37)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http CVE-2026-33870 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.10.Final
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 CVE-2026-33871 🚨 HIGH 4.1.96.Final 4.1.132.Final, 4.2.11.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (27)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.11.0 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.7 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.7 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.4 6.0.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
litellm CVE-2026-35030 🔥 CRITICAL 1.81.6 1.83.0
litellm CVE-2026-35029 🚨 HIGH 1.81.6 1.83.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.2 0.6.3
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.4 6.5.5
tornado CVE-2026-35536 🚨 HIGH 6.5.4 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (2)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.6 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2026-25679 🚨 HIGH v1.25.6 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

Copilot AI review requested due to automatic review settings April 1, 2026 08:00
Comment thread ingestion/tests/integration/test_suite/test_e2e_workflow.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Comment on lines 143 to 147
@classmethod
def setUpClass(cls):
    """set up class"""
    cls.addClassCleanup(cls.tearDownClass)
    service: DatabaseService = cls.metadata.create_or_update(

Copilot AI Apr 1, 2026


setUpClass already triggers tearDownClass automatically in unittest. Adding cls.addClassCleanup(cls.tearDownClass) will cause tearDownClass to run twice on a normal successful run, which will likely error here because tearDownClass assumes the service still exists (get_by_name(...).id.root without a None check) and also removes the sqlite file. Instead, register idempotent cleanup callbacks for the individual resources (service deletion, file removal), or make tearDownClass safe to call multiple times and avoid registering it as a class cleanup.


@classmethod
def setUpClass(cls) -> None:
    cls.addClassCleanup(cls.tearDownClass)

Copilot AI Apr 1, 2026


unittest will call tearDownClass automatically; registering cls.addClassCleanup(cls.tearDownClass) means the class teardown runs twice on success. Even if most operations are wrapped in accumulate_errors, this can still introduce redundant work and potential flakiness (e.g., stopping an already-stopped container). Prefer adding class cleanups for the specific resources created in setUpClass (container stop + service deletion) or ensure tearDownClass is fully idempotent and do not register it as a class cleanup.

Suggested change:

  - cls.addClassCleanup(cls.tearDownClass)

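Per-resource class cleanups, as the review suggests, might look like the sketch below. create_service/delete_service and the in-memory registry are stand-ins for the real metadata calls, used here only to make the pattern runnable.

```python
import unittest

CREATED = []  # stand-in registry for resources created against the server

def create_service(name):
    CREATED.append(name)
    return name

def delete_service(name):
    # Idempotent: safe even if the resource was already removed.
    if name in CREATED:
        CREATED.remove(name)

class SampleServiceTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Register one cleanup per resource instead of
        # addClassCleanup(cls.tearDownClass): unittest already invokes
        # tearDownClass, so registering it would run the teardown twice.
        cls.service = create_service("test-service")
        cls.addClassCleanup(delete_service, cls.service)

    def test_service_exists(self):
        self.assertIn(self.service, CREATED)
```

Each cleanup targets exactly one resource, so the order of teardown is explicit and a second invocation is harmless.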

- name: Install msodbcsql18
  if: ${{ inputs.install-mode == 'full' }}
  run: ACCEPT_EULA=Y sudo apt-get -qq install -y msodbcsql18

Copilot AI Apr 1, 2026


The msodbcsql18 install step no longer runs apt-get update (the previous combined install did). If the package lists are stale or the caching action doesn't refresh them, this can fail with "Unable to locate package" and the -qq flag will make diagnosis harder. Consider running sudo apt-get update (or ensuring the cache step refreshes indexes) before installing msodbcsql18, and/or avoid -qq for this step so failures are debuggable.

Suggested change:

  - run: ACCEPT_EULA=Y sudo apt-get -qq install -y msodbcsql18
  + run: |
  +   sudo apt-get update
  +   ACCEPT_EULA=Y sudo apt-get install -y msodbcsql18

Comment thread .github/workflows/py-tests.yml
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Comment thread .github/workflows/py-tests.yml
Comment thread ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread ingestion/noxfile.py Outdated
Comment on lines 152 to 156
@@ -154,19 +154,37 @@ def unit_tests(session):
    test_paths = [a for a in args if not a.startswith("-")]
    extra_flags = [a for a in args if a.startswith("-")]


Copilot AI Apr 3, 2026


The current posargs parsing splits flags vs paths by startswith('-'), which breaks common pytest flags that take a separate value (e.g. the docstring example -k test_name becomes ['test_name', '-k'] and will be interpreted as a path + a flag missing its value). It also breaks --dist load (the load token is treated as a path). Consider preserving argument order and/or using an argparse-style parser for the handful of flags you need to intercept (--no-cov, --dist) while passing the rest through unchanged.

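An order-preserving pass-through, as the comment suggests, can be as simple as removing only the session's own flags and leaving everything else untouched. The helper name and the default flag tuple below are illustrative assumptions.

```python
def split_session_flags(posargs, session_flags=("--no-cov", "--standalone")):
    """Separate session-only flags from arguments destined for pytest.

    Splitting on startswith('-') breaks flags that take a separate value
    (-k expr, --dist load); instead, keep the original order and drop only
    the exact tokens this nox session consumes itself.
    """
    consumed = {a for a in posargs if a in session_flags}
    passthrough = [a for a in posargs if a not in session_flags]
    return consumed, passthrough
```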
Comment thread ingestion/noxfile.py
Comment on lines 218 to +223
    if standalone:
        args.remove("--standalone")

    skip_cov = "--no-cov" in args
    if skip_cov:
        args.remove("--no-cov")

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --no-cov handling here relies on later posargs processing that splits flags vs paths using startswith('-'). That approach breaks pytest flags that take a separate value (e.g. -k expr, -m marker, --maxfail 1), since the value token is treated as a test path and the flag loses its value. Consider switching to an order-preserving pass-through (only removing --standalone, --no-cov, and --workers from the args list) to avoid reordering or losing paired values.

Comment on lines +35 to +39
- name: Cache and install Ubuntu dependencies
  uses: awalsh128/cache-apt-pkgs-action@latest
  with:
    packages: unixodbc-dev python3-venv librdkafka-dev gcc libsasl2-dev build-essential libssl-dev libffi-dev libevent-dev python3-dev libkrb5-dev tdsodbc
    version: 1.0

Copilot AI Apr 3, 2026


Using awalsh128/cache-apt-pkgs-action@latest makes the CI supply chain non-deterministic (the referenced code can change over time) and is generally discouraged for security/reproducibility. Pin this action to an immutable version tag or, ideally, a commit SHA.

Suggested change:

  - - name: Cache and install Ubuntu dependencies
  -   uses: awalsh128/cache-apt-pkgs-action@latest
  -   with:
  -     packages: unixodbc-dev python3-venv librdkafka-dev gcc libsasl2-dev build-essential libssl-dev libffi-dev libevent-dev python3-dev libkrb5-dev tdsodbc
  -     version: 1.0
  + - name: Install Ubuntu dependencies
  +   run: |
  +     sudo apt-get update -qq
  +     sudo apt-get install -y \
  +       unixodbc-dev \
  +       python3-venv \
  +       librdkafka-dev \
  +       gcc \
  +       libsasl2-dev \
  +       build-essential \
  +       libssl-dev \
  +       libffi-dev \
  +       libevent-dev \
  +       python3-dev \
  +       libkrb5-dev \
  +       tdsodbc
  +   shell: bash

Comment on lines +167 to +192
while status not in ("success", "completed"):
    try:
        response = metadata.client.get(
            "/apps/name/SearchIndexingApplication/status?offset=0&limit=1"
        )
        time.sleep(2)
        if len(response["data"]) == 0:
            raise RuntimeError("No reindexing status found after triggering")

        status = response["data"][0]["status"]

        if status in ("failed", "error"):
            reindex_failed = True
            break

        if time.time() - start_complete > complete_timeout:
            raise TimeoutError(
                f"Timed out waiting for reindexing to complete. "
                f"Current status: {status}, elapsed: {int(time.time() - start_complete)}s"
            )
    except Exception as e:
        if "TimeoutError" in str(type(e).__name__) or "RuntimeError" in str(
            type(e).__name__
        ):
            raise
        time.sleep(1)
        continue

Copilot AI Apr 3, 2026


In the polling loop, the timeout check is only evaluated inside the try block after a successful response parse. If metadata.client.get(...) repeatedly raises a non-TimeoutError/non-RuntimeError exception (e.g. transient network/JSON/KeyError), the except path will continue indefinitely without ever checking complete_timeout, so this loop can hang forever. Consider checking elapsed time on every iteration (including exception paths) and failing after complete_timeout (and optionally counting consecutive failures).

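A polling helper that enforces the deadline on every iteration, including the exception path, might look like this sketch. The function name, terminal states, and parameters are illustrative, not the PR's actual code.

```python
import time

def poll_until_terminal(fetch_status, timeout=60.0, interval=1.0,
                        clock=time.monotonic):
    """Poll fetch_status() until a terminal state or the deadline passes.

    The deadline is checked at the top of every loop iteration, so repeated
    transient errors from fetch_status() cannot make the loop spin forever.
    """
    deadline = clock() + timeout
    last_error = None
    while clock() < deadline:
        try:
            status = fetch_status()
        except Exception as exc:  # transient network/parse failures
            last_error = exc
            time.sleep(interval)
            continue
        if status in ("success", "completed", "failed", "error"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"polling timed out; last error: {last_error!r}")
```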
Comment thread ingestion/tests/integration/trino/conftest.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Comment thread ingestion/noxfile.py Outdated
Comment on lines +152 to +167
# pytest doesn't receive conflicting values.
has_dist_override = any(
f == "--dist" or f.startswith("--dist=") for f in extra_flags
)

Copilot AI Apr 4, 2026


The posargs parsing splits session.posargs into test_paths vs extra_flags based only on a leading -. This breaks for pytest flags that take a separate value (e.g. --dist load, -k expr, --maxfail 1): the value is incorrectly treated as a test path, and for --dist specifically pytest will receive --dist without its required argument. Consider either (a) not splitting at all and just append session.posargs as-is, or (b) use a proper argparse-like scan that keeps flag/value pairs together (handling --flag=value and --flag value).
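Option (b) could look like the sketch below; the `VALUE_FLAGS` allowlist is an assumption and would need to match the flags the nox sessions actually expect to forward:

```python
# Hypothetical scan that keeps "--flag value" pairs together instead of
# classifying every non-dash token as a test path. "--flag=value" tokens
# start with "-" and carry their value inline, so they need no pairing.
VALUE_FLAGS = {"--dist", "-k", "--maxfail", "-m", "-p"}

def split_posargs(posargs):
    """Split posargs into (test_paths, extra_flags) without orphaning values."""
    test_paths, extra_flags = [], []
    it = iter(posargs)
    for token in it:
        if token.startswith("-"):
            extra_flags.append(token)
            # Pull the value token along with its flag so pytest never
            # receives a value-taking flag without its argument.
            if token in VALUE_FLAGS:
                value = next(it, None)
                if value is not None:
                    extra_flags.append(value)
        else:
            test_paths.append(token)
    return test_paths, extra_flags
```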

Comment on lines +81 to +87
try:
cm = try_bind(container, 1433, 1433)
container = cm.__enter__()
except (docker.errors.BuildError, docker.errors.APIError) as exc:
pytest.skip(f"MSSQL container unavailable: {exc}")

try:

Copilot AI Apr 4, 2026


The fixture manually calls cm.__enter__() / cm.__exit__(None, None, None) instead of using a with statement. In the finally block, __exit__ is always invoked with (None, None, None) even if an exception occurred inside the fixture body; this can prevent context managers from handling cleanup correctly based on the exception info. Prefer with try_bind(...) as container: and do the skip logic around that, or call cm.__exit__(*sys.exc_info()) in finally so the context manager receives the real exception details.
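A minimal sketch of the `cm.__exit__(*sys.exc_info())` pattern, using a stand-in context manager rather than the real `try_bind`:

```python
import sys

class Recorder:
    """Stand-in context manager that records what __exit__ receives."""
    def __init__(self):
        self.exit_args = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.exit_args = (exc_type, exc, tb)
        return False  # do not suppress the exception

def fixture_body(cm, fail=False):
    value = cm.__enter__()
    try:
        if fail:
            raise ValueError("setup failed")
        return value
    finally:
        # Forward the live exception info instead of (None, None, None),
        # so the context manager can base its cleanup on the real error.
        cm.__exit__(*sys.exc_info())
```

On the success path `sys.exc_info()` is `(None, None, None)`, so behavior matches a normal `with` exit; on the failure path the context manager sees the actual exception.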

Comment thread ingestion/noxfile.py
has_dist_override = any(a == "--dist" or a.startswith("--dist=") for a in args)

# If the caller provided no explicit test paths, inject the default.
has_explicit_paths = any(not a.startswith("-") for a in args)

⚠️ Bug: has_explicit_paths false-positive on paired-flag values

The comment at lines 152-156 correctly explains why splitting args by startswith("-") is dangerous — paired options like -k expr or --maxfail 1 have value tokens that don't start with -. However, the has_explicit_paths check on line 170 does exactly that split:

has_explicit_paths = any(not a.startswith("-") for a in args)

If a developer runs nox -s unit-tests -- -k test_something, args is ["-k", "test_something"]. The value "test_something" doesn't start with -, so has_explicit_paths is True, and the default tests/unit/ path is never injected. Pytest then receives no test path and falls back to testpaths from config or . — which may discover wrong tests or none at all.

The same issue exists in integration_tests at line 244.

This doesn't affect CI today because all shards pass explicit test paths in nox-args, but it breaks local developer workflows using -k, --maxfail N, etc.

Suggested fix:

Use a more robust heuristic, e.g. check if any arg looks like a path (contains '/' or os.path.exists):

    has_explicit_paths = any(
        not a.startswith("-") and ("/" in a or os.path.exists(a))
        for a in args
    )


Comment on lines +85 to +99
try:
with try_bind(container, 1433, 1433) as container:
docker_container = container.get_wrapped_container()
copy_dir_to_container(str(data_dir), docker_container, "/data")
res = docker_container.exec_run(
[
"bash",
"-c",
" ".join(
[
"/opt/mssql-tools*/bin/sqlcmd",
"-U",
container.username,
"-P",
f"'{container.password}'",


💡 Edge Case: Wider exception catch may skip setup failures as skipped tests

The refactored mssql_container fixture now wraps the entire setup (docker exec, engine creation, SQL execution) inside try: with try_bind(...): ... except (docker.errors.BuildError, docker.errors.APIError). Previously, only the try_bind / __enter__ call was guarded by the docker-error catch; setup failures like exec_run or create_engine raising an APIError would propagate as real failures.

Now, any docker.errors.APIError raised during data copying, SQL execution, or engine creation would be caught and turned into pytest.skip(), masking legitimate setup bugs. While these specific exception types are uncommon after container start, the catch scope is broader than intended.
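The narrower scoping this finding describes can be sketched like this; `DockerError`, `Skipped`/`skip`, `start_container`, and `load_data` are stand-ins for the real `docker.errors` types, `pytest.skip`, and the fixture's setup steps:

```python
# Hypothetical sketch: only container startup is guarded by the skip;
# data loading and engine creation run outside the guarded region so
# their failures surface as real errors rather than skipped tests.
class DockerError(Exception):
    """Stand-in for docker.errors.BuildError / docker.errors.APIError."""

class Skipped(Exception):
    """Stand-in for the exception pytest.skip() raises."""

def skip(reason):
    raise Skipped(reason)

def setup_mssql_fixture(start_container, load_data):
    try:
        container = start_container()
    except DockerError as exc:
        skip(f"MSSQL container unavailable: {exc}")
    # Past this point, even a DockerError is a genuine setup bug, not a skip.
    load_data(container)
    return container
```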



Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

Comment thread ingestion/noxfile.py
Comment on lines +169 to +193
# If the caller provided no explicit test paths, inject the default.
has_explicit_paths = any(not a.startswith("-") for a in args)

pytest_args = [
"-c",
"pyproject.toml",
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
"--junitxml=junit/test-results-unit.xml",
"-n",
"auto",
"--dist",
"loadfile",
"--durations=20",
]

pytest_args.extend(test_paths or ["tests/unit/"])
pytest_args.extend(extra_flags)
if not has_dist_override:
pytest_args += ["--dist", "loadfile"]

if not skip_cov:
pytest_args += [
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
]

if not has_explicit_paths:
pytest_args.append("tests/unit/")


Copilot AI Apr 5, 2026


has_explicit_paths = any(not a.startswith('-') for a in args) will incorrectly treat values for paired pytest options (e.g. -k expr, --maxfail 1) as “explicit paths”. As a result, nox -s unit-tests -- -k foo won’t inject the default tests/unit/ path and will run collection from the repo root instead. Consider detecting explicit paths via filesystem existence / globbing (e.g. Path(a).exists() for non-flag tokens) or a stricter heuristic (tokens containing / or ending with .py) so flag values don’t suppress the default path.

Comment thread ingestion/noxfile.py
Comment on lines +223 to +268
# We only strip our custom flags (--standalone, --no-cov, --workers) from posargs
# and pass everything else through to pytest in original order. We avoid splitting
# by startswith("-") because that breaks paired options like "-k expr" or
# "--maxfail 1" — the value token would be misclassified as a test path.
args = list(session.posargs)
standalone = "--standalone" in args
if standalone:
args.remove("--standalone")

# --no-cov is a CI-only flag to skip coverage on non-main Python versions.
skip_cov = "--no-cov" in args
if skip_cov:
args.remove("--no-cov")

workers = os.environ.get("PYTEST_INTEGRATION_WORKERS", "0")
if "--workers" in args:
idx = args.index("--workers")
workers = args[idx + 1]
args = args[:idx] + args[idx + 2 :]

# Separate test paths from pytest flags in posargs
test_paths = [a for a in args if not a.startswith("-")]
extra_flags = [a for a in args if a.startswith("-")]
# If no explicit test paths, inject the default directory.
has_explicit_paths = any(not a.startswith("-") for a in args)

pytest_args = [
"-c",
"pyproject.toml",
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
"--junitxml=junit/test-results-integration.xml",
]

if not standalone:
if not skip_cov:
pytest_args += [
"--cov=metadata",
"--cov-branch",
"--cov-config=pyproject.toml",
]

if not standalone and not skip_cov:
pytest_args.append("--cov-append")

use_xdist = int(workers) > 0
if use_xdist:
pytest_args.extend([f"-n{workers}", "--dist=loadgroup"])

pytest_args.extend(test_paths or ["tests/integration/"])
pytest_args.extend(extra_flags)
if not has_explicit_paths:
pytest_args.append("tests/integration/")


Copilot AI Apr 5, 2026


Same issue as unit-tests: has_explicit_paths = any(not a.startswith('-') for a in args) will be true for paired-option values like -k expr or --maxfail 1, which prevents injecting the default tests/integration/ path. This can unexpectedly run all tests when callers only intended to filter. Use a stricter “is this a path?” check (e.g. Path(token).exists()/glob, or //.py heuristics) before deciding to omit the default path.

Comment on lines +233 to +237
logger = logging.getLogger(__name__)


def _verify_tables_readable(engine: Engine, table_names: list):
"""Verify all tables are readable after data loading.

Copilot AI Apr 5, 2026


logger is defined but never used. If logging isn’t needed here, remove import logging and the logger = logging.getLogger(__name__) assignment; otherwise, use logger in the retry loop (e.g., to log attempts/exceptions) to justify keeping it.

Comment on lines +118 to +124
transaciton = conn.begin()
conn.execute(
text(
f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
)
)
transaciton.commit()

Copilot AI Apr 5, 2026


Typo in variable name transaciton makes the fixture harder to read/maintain. Rename to transaction (and update uses) to avoid confusion.

Suggested change
transaciton = conn.begin()
conn.execute(
text(
f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
)
)
transaciton.commit()
transaction = conn.begin()
conn.execute(
text(
f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
)
)
transaction.commit()

Comment on lines +83 to +92
- name: Restore uv package cache
id: cache-uv
uses: actions/cache/restore@v4
with:
path: ${{ env.UV_CACHE_DIR }}
key: uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-${{ hashFiles('ingestion/setup.py') }}
restore-keys: |
uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-
uv-${{ runner.os }}-py${{ inputs.python-version }}-


Copilot AI Apr 5, 2026


The cache steps use path: ${{ env.UV_CACHE_DIR }}, but this composite action doesn’t define UV_CACHE_DIR anywhere. If astral-sh/setup-uv doesn’t export it on the runner, this resolves to an empty path and can cause actions/cache to error or silently skip caching. Consider setting UV_CACHE_DIR explicitly (e.g. ~/.cache/uv) and/or using a known output/input from setup-uv to determine the cache directory.

@gitar-bot

gitar-bot bot commented Apr 6, 2026

Code Review ⚠️ Changes requested 6 resolved / 8 findings

Python CI optimizations resolved six issues including dev dependency installation and fixture teardown problems, but four findings remain open: has_explicit_paths misidentifies paired-flag values as test paths in both test shards, exception handling in the MSSQL fixture may skip setup failures, and the docker error catch is both too narrow and too broad for safe operation.

⚠️ Bug: has_explicit_paths false-positive on paired-flag values

📄 ingestion/noxfile.py:170 📄 ingestion/noxfile.py:244

The comment at lines 152-156 correctly explains why splitting args by startswith("-") is dangerous — paired options like -k expr or --maxfail 1 have value tokens that don't start with -. However, the has_explicit_paths check on line 170 does exactly that split:

has_explicit_paths = any(not a.startswith("-") for a in args)

If a developer runs nox -s unit-tests -- -k test_something, args is ["-k", "test_something"]. The value "test_something" doesn't start with -, so has_explicit_paths is True, and the default tests/unit/ path is never injected. Pytest then receives no test path and falls back to testpaths from config or . — which may discover wrong tests or none at all.

The same issue exists in integration_tests at line 244.

This doesn't affect CI today because all shards pass explicit test paths in nox-args, but it breaks local developer workflows using -k, --maxfail N, etc.

Suggested fix
Use a more robust heuristic, e.g. check if any arg looks like a path (contains '/' or os.path.exists):

    has_explicit_paths = any(
        not a.startswith("-") and ("/" in a or os.path.exists(a))
        for a in args
    )
💡 Edge Case: Wider exception catch may skip setup failures as skipped tests

📄 ingestion/tests/integration/sql_server/conftest.py:85-99

The refactored mssql_container fixture now wraps the entire setup (docker exec, engine creation, SQL execution) inside try: with try_bind(...): ... except (docker.errors.BuildError, docker.errors.APIError). Previously, only the try_bind / __enter__ call was guarded by the docker-error catch; setup failures like exec_run or create_engine raising an APIError would propagate as real failures.

Now, any docker.errors.APIError raised during data copying, SQL execution, or engine creation would be caught and turned into pytest.skip(), masking legitimate setup bugs. While these specific exception types are uncommon after container start, the catch scope is broader than intended.

✅ 6 resolved
Bug: dev-only install mode doesn't install dev deps for static checks

📄 .github/actions/setup-openmetadata-test-environment/action.yml:99-104 📄 .github/workflows/py-tests.yml:95 📄 ingestion/noxfile.py:41-42 📄 ingestion/noxfile.py:126-128
The new dev-only install mode (action.yml lines 99-104) only installs nox, but the static-checks nox session requires basedpyright (and other tools from the [dev] extra). Since the workflow runs with --no-venv, the noxfile's install() function is a no-op (it skips installs for PassthroughEnv), so the session cannot install its own dependencies either. This means the py-static-checks job will fail at runtime with a "basedpyright not found" error.

The fix is to have the dev-only mode install the [dev] extra in addition to nox.

Bug: Missing service_name in deepcopy'd config may affect test_ingestion

📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:204-209 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:249 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:320 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:352 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:438 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:524 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:618 📄 ingestion/tests/integration/orm_profiler/test_orm_profiler_e2e.py:727
In test_orm_profiler_e2e.py, the ingest fixture now correctly avoids mutating the global ingestion_config by creating a shallow copy with spread syntax. The subsequent test functions (test_profiler_workflow, test_workflow_sample_profile, etc.) each add workflow_config["source"]["serviceName"] = service_name after deepcopy(ingestion_config). However, ingestion_config is a module-level dict that previously had serviceName set by the ingest fixture (via mutation). Now that ingest no longer mutates it, the deepcopy in each test will get a config without serviceName set — which is correctly fixed by the added lines. This looks correct as-is. However, check that there are no other test functions in this file that deepcopy(ingestion_config) without adding serviceName.

Security: Third-party GitHub Action pinned to @latest tag

📄 .github/actions/setup-openmetadata-test-environment/action.yml:36
The new awalsh128/cache-apt-pkgs-action@latest usage pins to a mutable tag that can change at any time without notice. If the upstream repository is compromised or a breaking change is pushed, CI pipelines would be affected immediately. Every other action in the file and workflow uses a pinned version (e.g., @v4, @v1.3.4), so this breaks the existing security posture.

This is a supply chain risk: a malicious update to the action could exfiltrate secrets or inject code into the build environment.

Bug: addClassCleanup(tearDownClass) causes double teardown execution

📄 ingestion/tests/integration/test_suite/test_e2e_workflow.py:146 📄 ingestion/tests/integration/test_suite/test_e2e_workflow.py:227-241 📄 ingestion/tests/integration/profiler/test_nosql_profiler.py:134 📄 ingestion/tests/integration/profiler/test_nosql_profiler.py:155-159
In Python's unittest, when setUpClass succeeds, the framework automatically calls tearDownClass after all tests, and then runs any registered addClassCleanup callbacks. So tearDownClass will be called twice in the normal (success) path.

In test_e2e_workflow.py, this will crash on the second invocation because:

  • metadata.get_by_name(...) returns None after the service was already deleted, causing AttributeError: 'NoneType' object has no attribute 'id' on line 234
  • os.remove(cls.db_path) raises FileNotFoundError since the file was already removed

In test_nosql_profiler.py, the accumulate_errors context manager may swallow errors, but mongo_container.stop() on an already-stopped container is still wasteful/error-prone.

The correct pattern is to either:

  1. Extract cleanup into a separate helper and register that with addClassCleanup instead of tearDownClass itself, removing the tearDownClass override entirely
  2. Or make tearDownClass fully idempotent with guard checks
Edge Case: Workflow expression comparison to env var may not work

📄 .github/workflows/py-tests.yml:158 📄 .github/workflows/py-tests.yml:308
In the GitHub Actions workflow, the expression ${{ matrix.py-version != env.MAIN_PYTHON_VERSION && '--no-cov' || '' }} compares matrix.py-version against env.MAIN_PYTHON_VERSION. However, in GitHub Actions expression contexts, env.* references may not resolve correctly inside ${{ }} expressions — the env context is available but the comparison depends on the exact context where it's evaluated. If MAIN_PYTHON_VERSION is defined as a workflow-level env var (which it appears to be), this should work, but it's worth verifying in CI that non-3.10 versions actually receive --no-cov and 3.10 does not. A safer approach would be to use a matrix include to set a skip-cov flag explicitly per version.

...and 1 more resolved from earlier reviews




Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated no new comments.

@sonarqubecloud

sonarqubecloud bot commented Apr 6, 2026

@IceS2

IceS2 commented Apr 7, 2026

Hey @SumanMaharana, I have only one question...

What are we gaining from splitting the static checks and unit tests jobs if the bottleneck for the whole workflow is the integration tests? Aren't we just repeating some steps and using more resources for no gain?

@SumanMaharana

Hey @SumanMaharana, I have only one question...

What are we gaining from splitting the static checks and unit tests jobs if the bottleneck for the whole workflow is the integration tests? Aren't we just repeating some steps and using more resources for no gain?

Before this PR, static checks ran inside the py-unit-tests matrix across all 3 Python versions. Since linting and type checking are not Python-version-sensitive, we were running identical static checks 3 times (~5 min × 3 runners = ~15 runner-minutes), and the unit tests couldn't start until static checks on each runner finished.

After this PR, static checks run exactly once on Python 3.10 with a lighter install-mode: dev-only (~6 runner-minutes total), while the unit test shards and integration test shards all start at T=0 in parallel. The CI critical path looks like:

T=0 ──┬── py-static-checks   (~6 min)      ──┐
      ├── py-unit-tests      (~12-15 min)  ──┤── py-tests-status
      └── py-integration-tests(~40-43 min) ──┘

The "repeated steps" (checkout, wait-for-label, env setup) are all sub-minute cached operations — they don't materially contribute to runner-minutes. The expensive step, running nox, is cheaper for py-static-checks than it used to be (dev-only deps, no test collection overhead).


Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python CI tests optimization

4 participants