Check failure
Code scanning / CodeQL
Cache Poisoning via execution of untrusted code (High)
Copilot Autofix
AI 12 days ago
General approach: avoid workflows that both (a) execute untrusted PR code and (b) use shared caches in the default-branch security context (`pull_request_target`). The standard fix is to change such workflows to trigger on `pull_request` instead of `pull_request_target`, so they run in the PR's own context and cannot poison default-branch caches. This aligns with GitHub's guidance for running untrusted code and with the background you provided.

Best fix here: convert the three workflows that invoke or gate Python tests/builds from `pull_request_target` to `pull_request`, keeping their behavior otherwise the same. This removes the privileged default-branch context from these jobs while preserving label-based gating, change detection, and all existing steps. The composite action already avoids saving caches for `pull_request_target` events; after the fix, these workflows will never run as `pull_request_target`, so they will operate entirely in the unprivileged PR context and any caching will be scoped to the PR branch.
Concretely:

- In `.github/workflows/py-operator-build-test.yml`:
  - Change the `on:` section to use `pull_request` instead of `pull_request_target`, preserving the same `types` and `paths` filters.
  - No other changes are strictly needed; the checkout of `github.event.pull_request.head.sha` is still valid under `pull_request`.
- In `.github/workflows/py-tests-postgres.yml`:
  - Change the `on:` trigger from `pull_request_target` to `pull_request` with the same `types`.
  - The rest of the workflow still functions: it already assumes `github.event.pull_request` is present.
- In `.github/workflows/py-tests.yml`:
  - Similarly, update the `on:` section from `pull_request_target` to `pull_request`.
  - Environment variables and jobs that reference `github.event.pull_request` continue to work for `pull_request`.
These changes eliminate the CodeQL-reported “privileged checkout of untrusted code in the default branch context” for the composite action path, and also address all the alert variants pointing at the pull_request_target triggers.
```diff
@@ -12,7 +12,7 @@
 name: py-operator-build-test
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]
     paths:
       - "ingestion/**"
```
```diff
@@ -12,7 +12,7 @@
 name: py-tests-postgres
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]

 permissions:
```
```diff
@@ -12,7 +12,7 @@
 name: py-tests
 on:
   workflow_dispatch:
-  pull_request_target:
+  pull_request:
     types: [labeled, opened, synchronize, reopened, ready_for_review]

 permissions:
```
Pull request overview
This PR optimizes Python CI by splitting static analysis from unit tests, adding a lighter dependency-install path for static checks, and refining how coverage is produced/combined to reduce redundant work across the test matrix.
Changes:

- Split the Python test workflow into `py-static-checks` (single main Python) and `py-unit-tests` (matrix) to speed up static checks and reduce installs.
- Add `install-mode` to the composite test-environment action to support a lighter "dev-only" install path.
- Add a `--no-cov` option to the `unit-tests` nox session so only the main Python version collects coverage; update coverage config to omit generated code.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `ingestion/pyproject.toml` | Omits generated code from coverage collection for more meaningful metrics. |
| `ingestion/noxfile.py` | Adds `--no-cov` support and durations reporting to reduce redundant coverage runs in CI. |
| `.github/workflows/py-tests.yml` | Splits static checks vs unit tests jobs; adjusts sharding and job dependencies for coverage/status aggregation. |
| `.github/actions/setup-openmetadata-test-environment/action.yml` | Introduces `install-mode` and improves uv caching behavior to speed CI setup. |
```diff
@@ -104,11 +101,52 @@
         nox --no-venv -s static-checks
```
The static-checks nox session is parameterized with `python=get_python_versions()`, so it may attempt to run for multiple Python versions. Since this job intentionally sets up only `MAIN_PYTHON_VERSION`, consider setting `PYTHON_VERSIONS=${{ env.MAIN_PYTHON_VERSION }}` for this step/job (the noxfile already honors `PYTHON_VERSIONS`) to ensure only one session runs and avoid redundant/incorrect interpreter selection when using `--no-venv`.
```diff
         nox --no-venv -s static-checks
+      env:
+        PYTHON_VERSIONS: ${{ env.MAIN_PYTHON_VERSION }}
```
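The suggestion relies on the noxfile restricting its parameterized sessions from an environment variable. A minimal sketch of how such a helper might honor `PYTHON_VERSIONS`; the function name matches the snippet above, but the parsing details (comma- or space-separated override) are assumptions, not the repository's actual implementation:

```python
import os

# Hypothetical sketch: the real noxfile defines get_python_versions(); the
# override format below (comma- or space-separated) is an assumption.
DEFAULT_PYTHON_VERSIONS = ["3.10", "3.11", "3.12"]


def get_python_versions():
    # A PYTHON_VERSIONS env override (e.g. "3.11") restricts the
    # parameterized nox sessions to just those interpreters.
    raw = os.environ.get("PYTHON_VERSIONS", "")
    if raw.strip():
        return [v for v in raw.replace(",", " ").split() if v]
    return list(DEFAULT_PYTHON_VERSIONS)
```

With that shape, setting `PYTHON_VERSIONS` to a single version makes the session list unambiguous for `--no-venv` runs.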
```diff
       - name: Run Unit Tests
         run: |
           source env/bin/activate
           cd ingestion
-          nox --no-venv -s unit-tests
+          nox --no-venv -s unit-tests -- ${{ matrix.py-version != env.MAIN_PYTHON_VERSION && '--no-cov' || '' }}
         shell: bash
```
This job runs under a single matrix.py-version, but the nox sessions are defined with python=get_python_versions() (multiple versions). With --no-venv, nox may still enumerate/run/skip other versioned sessions depending on what interpreters are present on the runner. Setting PYTHON_VERSIONS=${{ matrix.py-version }} for this step/job would make the executed nox session unambiguous and avoid extra work.
```yaml
matrix:
  py-version: ["3.10", "3.11", "3.12"]
  shard:
    - name: "shard-1"
      nox-args: >-
        tests/integration/ometa
    - name: "shard-2"
      nox-args: >-
        tests/integration/postgres
```
The integration job is matrixed by py-version, but the nox sessions are defined with python=get_python_versions() (multiple versions). Consider setting PYTHON_VERSIONS=${{ matrix.py-version }} at the job/step level so nox selects only the intended versioned session when running with --no-venv.
```python
skip_cov = "--no-cov" in args
if skip_cov:
    extra_flags = [f for f in extra_flags if f != "--no-cov"]
```
--no-cov is treated as a custom flag for this nox session and stripped before passing args to pytest. Please add a short inline comment explaining that this flag is meant for CI to skip coverage on non-main Python versions (and that it is not a pytest option) to avoid confusion for local runs.
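The requested pattern can be sketched as a small helper; the helper name and surrounding session code are illustrative, not the repository's code:

```python
# Minimal sketch of the flag-stripping pattern discussed above; the helper
# name is hypothetical.
def strip_no_cov(args):
    # --no-cov is a custom flag for this nox session, used by CI to skip
    # coverage on non-main Python versions. It is NOT a pytest option, so
    # it must be removed before the remaining args reach pytest.
    skip_cov = "--no-cov" in args
    cleaned = [a for a in args if a != "--no-cov"]
    return skip_cov, cleaned
```

Keeping the explanation next to the stripping logic makes the intent clear for local runs as well.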
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| libpng-dev | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng-dev | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33416 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
| libpng16-16 | CVE-2026-33636 | 🚨 HIGH | 1.6.39-2+deb12u3 | 1.6.39-2+deb12u4 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Java
Vulnerabilities (37)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (13)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/extended_sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/lineage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.airlift:aircompressor | CVE-2025-67721 | 🚨 HIGH | 0.27 | 2.0.3 |
| io.netty:netty-codec-http | CVE-2026-33870 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.10.Final |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | CVE-2026-33871 | 🚨 HIGH | 4.1.96.Final | 4.1.132.Final, 4.2.11.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.spark:spark-core_2.12 | CVE-2025-54920 | 🚨 HIGH | 3.5.6 | 3.5.7 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (27)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| Authlib | CVE-2026-27962 | 🔥 CRITICAL | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28490 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28498 | 🚨 HIGH | 1.6.6 | 1.6.9 |
| Authlib | CVE-2026-28802 | 🚨 HIGH | 1.6.6 | 1.6.7 |
| PyJWT | CVE-2026-32597 | 🚨 HIGH | 2.11.0 | 2.12.0 |
| Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.12.12 | 3.13.3 |
| apache-airflow | CVE-2026-26929 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-28779 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow | CVE-2026-30911 | 🚨 HIGH | 3.1.7 | 3.1.8 |
| apache-airflow-providers-http | CVE-2025-69219 | 🚨 HIGH | 5.6.4 | 6.0.0 |
| cryptography | CVE-2026-26007 | 🚨 HIGH | 42.0.8 | 46.0.5 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 5.3.0 | 6.1.0 |
| jaraco.context | CVE-2026-23949 | 🚨 HIGH | 6.0.1 | 6.1.0 |
| litellm | CVE-2026-35030 | 🔥 CRITICAL | 1.81.6 | 1.83.0 |
| litellm | CVE-2026-35029 | 🚨 HIGH | 1.81.6 | 1.83.0 |
| protobuf | CVE-2026-0994 | 🚨 HIGH | 4.25.8 | 6.33.5, 5.29.6 |
| pyOpenSSL | CVE-2026-27459 | 🚨 HIGH | 24.1.0 | 26.0.0 |
| pyasn1 | CVE-2026-30922 | 🚨 HIGH | 0.6.2 | 0.6.3 |
| ray | CVE-2025-62593 | 🔥 CRITICAL | 2.47.1 | 2.52.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| tornado | CVE-2026-31958 | 🚨 HIGH | 6.5.4 | 6.5.5 |
| tornado | CVE-2026-35536 | 🚨 HIGH | 6.5.4 | 6.5.5 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
| wheel | CVE-2026-24049 | 🚨 HIGH | 0.45.1 | 0.46.2 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: usr/bin/docker
Vulnerabilities (2)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| stdlib | CVE-2025-68121 | 🔥 CRITICAL | v1.25.6 | 1.24.13, 1.25.7, 1.26.0-rc.3 |
| stdlib | CVE-2026-25679 | 🚨 HIGH | v1.25.6 | 1.25.8, 1.26.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO
No Vulnerabilities Found
```python
@classmethod
def setUpClass(cls):
    """set up class"""
    cls.addClassCleanup(cls.tearDownClass)
    service: DatabaseService = cls.metadata.create_or_update(
```
setUpClass already triggers tearDownClass automatically in unittest. Adding cls.addClassCleanup(cls.tearDownClass) will cause tearDownClass to run twice on a normal successful run, which will likely error here because tearDownClass assumes the service still exists (get_by_name(...).id.root without a None check) and also removes the sqlite file. Instead, register idempotent cleanup callbacks for the individual resources (service deletion, file removal), or make tearDownClass safe to call multiple times and avoid registering it as a class cleanup.
```python
@classmethod
def setUpClass(cls) -> None:
    cls.addClassCleanup(cls.tearDownClass)
```
unittest will call tearDownClass automatically; registering cls.addClassCleanup(cls.tearDownClass) means the class teardown runs twice on success. Even if most operations are wrapped in accumulate_errors, this can still introduce redundant work and potential flakiness (e.g., stopping an already-stopped container). Prefer adding class cleanups for the specific resources created in setUpClass (container stop + service deletion) or ensure tearDownClass is fully idempotent and do not register it as a class cleanup.
```diff
-        cls.addClassCleanup(cls.tearDownClass)
```
```yaml
- name: Install msodbcsql18
  if: ${{ inputs.install-mode == 'full' }}
  run: ACCEPT_EULA=Y sudo apt-get -qq install -y msodbcsql18
```
The msodbcsql18 install step no longer runs apt-get update (the previous combined install did). If the package lists are stale or the caching action doesn't refresh them, this can fail with "Unable to locate package" and the -qq flag will make diagnosis harder. Consider running sudo apt-get update (or ensuring the cache step refreshes indexes) before installing msodbcsql18, and/or avoid -qq for this step so failures are debuggable.
```diff
-  run: ACCEPT_EULA=Y sudo apt-get -qq install -y msodbcsql18
+  run: |
+    sudo apt-get update
+    ACCEPT_EULA=Y sudo apt-get install -y msodbcsql18
```
The Python checkstyle failed. Please run the checkstyle locally and push the fixes. You can install the pre-commit hooks to catch this before pushing.
```diff
@@ -154,19 +154,37 @@ def unit_tests(session):
     test_paths = [a for a in args if not a.startswith("-")]
     extra_flags = [a for a in args if a.startswith("-")]
```
The current posargs parsing splits flags vs paths by startswith('-'), which breaks common pytest flags that take a separate value (e.g. the docstring example -k test_name becomes ['test_name', '-k'] and will be interpreted as a path + a flag missing its value). It also breaks --dist load (the load token is treated as a path). Consider preserving argument order and/or using an argparse-style parser for the handful of flags you need to intercept (--no-cov, --dist) while passing the rest through unchanged.
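The order-preserving pass-through suggested here can be sketched as a small helper; the helper name is hypothetical, but the intercepted flag names match the snippets in this review:

```python
# Hedged sketch of the suggestion: intercept only the session's own flags
# and pass everything else to pytest unchanged and in order, so paired
# options like "-k expr" or "--dist load" keep their values.
SESSION_FLAGS = ("--no-cov", "--standalone")


def extract_session_flags(posargs):
    found = {flag: False for flag in SESSION_FLAGS}
    passthrough = []
    for tok in posargs:
        if tok in found:
            found[tok] = True
        else:
            # Order is preserved, so "-k"/"expr" stay adjacent for pytest.
            passthrough.append(tok)
    return found, passthrough
```

Because the remaining tokens are never re-sorted or split into "paths" and "flags", pytest receives them exactly as typed.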
```python
if standalone:
    args.remove("--standalone")

skip_cov = "--no-cov" in args
if skip_cov:
    args.remove("--no-cov")
```
The --no-cov handling here relies on later posargs processing that splits flags vs paths using startswith('-'). That approach breaks pytest flags that take a separate value (e.g. -k expr, -m marker, --maxfail 1), since the value token is treated as a test path and the flag loses its value. Consider switching to an order-preserving pass-through (only removing --standalone, --no-cov, and --workers from the args list) to avoid reordering or losing paired values.
```yaml
- name: Cache and install Ubuntu dependencies
  uses: awalsh128/cache-apt-pkgs-action@latest
  with:
    packages: unixodbc-dev python3-venv librdkafka-dev gcc libsasl2-dev build-essential libssl-dev libffi-dev libevent-dev python3-dev libkrb5-dev tdsodbc
    version: 1.0
```
Using awalsh128/cache-apt-pkgs-action@latest makes the CI supply chain non-deterministic (the referenced code can change over time) and is generally discouraged for security/reproducibility. Pin this action to an immutable version tag or, ideally, a commit SHA.
```diff
-      - name: Cache and install Ubuntu dependencies
-        uses: awalsh128/cache-apt-pkgs-action@latest
-        with:
-          packages: unixodbc-dev python3-venv librdkafka-dev gcc libsasl2-dev build-essential libssl-dev libffi-dev libevent-dev python3-dev libkrb5-dev tdsodbc
-          version: 1.0
+      - name: Install Ubuntu dependencies
+        run: |
+          sudo apt-get update -qq
+          sudo apt-get install -y \
+            unixodbc-dev \
+            python3-venv \
+            librdkafka-dev \
+            gcc \
+            libsasl2-dev \
+            build-essential \
+            libssl-dev \
+            libffi-dev \
+            libevent-dev \
+            python3-dev \
+            libkrb5-dev \
+            tdsodbc
+        shell: bash
```
```python
while status not in ("success", "completed"):
    try:
        response = metadata.client.get(
            "/apps/name/SearchIndexingApplication/status?offset=0&limit=1"
        )
        time.sleep(2)
        if len(response["data"]) == 0:
            raise RuntimeError("No reindexing status found after triggering")

        status = response["data"][0]["status"]

        if status in ("failed", "error"):
            reindex_failed = True
            break

        if time.time() - start_complete > complete_timeout:
            raise TimeoutError(
                f"Timed out waiting for reindexing to complete. "
                f"Current status: {status}, elapsed: {int(time.time() - start_complete)}s"
            )
    except Exception as e:
        if "TimeoutError" in str(type(e).__name__) or "RuntimeError" in str(
            type(e).__name__
        ):
            raise
        time.sleep(1)
        continue
```
In the polling loop, the timeout check is only evaluated inside the try block after a successful response parse. If metadata.client.get(...) repeatedly raises a non-TimeoutError/non-RuntimeError exception (e.g. transient network/JSON/KeyError), the except path will continue indefinitely without ever checking complete_timeout, so this loop can hang forever. Consider checking elapsed time on every iteration (including exception paths) and failing after complete_timeout (and optionally counting consecutive failures).
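A deadline-first polling loop along the lines suggested might look like this; the function and parameter names are illustrative stand-ins for the `metadata.client.get` polling code, not the project's API:

```python
import time

# Sketch of the suggested fix: the deadline is checked on every iteration,
# including after a failed request, and repeated transient failures are
# bounded instead of looping forever.
def wait_for_completion(fetch_status, timeout, poll_interval=0.0,
                        max_consecutive_failures=50):
    deadline = time.monotonic() + timeout
    failures = 0
    while True:
        # Deadline check happens before every attempt, so exception paths
        # can no longer bypass it.
        if time.monotonic() > deadline:
            raise TimeoutError("Timed out waiting for reindexing to complete")
        try:
            status = fetch_status()
        except (TimeoutError, RuntimeError):
            raise
        except Exception:
            failures += 1
            if failures >= max_consecutive_failures:
                raise RuntimeError("Status endpoint failed repeatedly")
            time.sleep(poll_interval)
            continue
        failures = 0
        if status in ("success", "completed"):
            return status
        if status in ("failed", "error"):
            raise RuntimeError(f"Reindexing ended with status {status!r}")
        time.sleep(poll_interval)
```

Counting consecutive failures additionally caps how long a persistently broken endpoint can stall the test even before the deadline.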
```python
# pytest doesn't receive conflicting values.
has_dist_override = any(
    f == "--dist" or f.startswith("--dist=") for f in extra_flags
)
```
The posargs parsing splits session.posargs into test_paths vs extra_flags based only on a leading -. This breaks for pytest flags that take a separate value (e.g. --dist load, -k expr, --maxfail 1): the value is incorrectly treated as a test path, and for --dist specifically pytest will receive --dist without its required argument. Consider either (a) not splitting at all and just append session.posargs as-is, or (b) use a proper argparse-like scan that keeps flag/value pairs together (handling --flag=value and --flag value).
```python
try:
    cm = try_bind(container, 1433, 1433)
    container = cm.__enter__()
except (docker.errors.BuildError, docker.errors.APIError) as exc:
    pytest.skip(f"MSSQL container unavailable: {exc}")

try:
```
The fixture manually calls cm.__enter__() / cm.__exit__(None, None, None) instead of using a with statement. In the finally block, __exit__ is always invoked with (None, None, None) even if an exception occurred inside the fixture body; this can prevent context managers from handling cleanup correctly based on the exception info. Prefer with try_bind(...) as container: and do the skip logic around that, or call cm.__exit__(*sys.exc_info()) in finally so the context manager receives the real exception details.
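The difference can be demonstrated with a small stand-in context manager (the fixture's `try_bind` is not reproduced here): forwarding `sys.exc_info()` lets the manager see the real exception, whereas `(None, None, None)` would make it believe the body succeeded.

```python
import sys
from contextlib import contextmanager

# Illustrates why cm.__exit__(None, None, None) in a finally block hides the
# in-flight exception from the context manager; forwarding sys.exc_info()
# (or simply using a with statement) passes the real details.
@contextmanager
def record_exit(log):
    try:
        yield "resource"
    except Exception as exc:
        # The manager only observes this if __exit__ received real exc info.
        log.append(type(exc).__name__)
        raise


def use_manually(log):
    cm = record_exit(log)
    cm.__enter__()
    try:
        raise ValueError("boom inside fixture body")
    finally:
        # Forward the in-flight exception instead of (None, None, None).
        cm.__exit__(*sys.exc_info())
```

A plain `with record_exit(log) as resource:` block achieves the same thing with less machinery, which is why the review prefers it.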
```python
has_dist_override = any(a == "--dist" or a.startswith("--dist=") for a in args)

# If the caller provided no explicit test paths, inject the default.
has_explicit_paths = any(not a.startswith("-") for a in args)
```
⚠️ Bug: has_explicit_paths false-positive on paired-flag values
The comment at lines 152-156 correctly explains why splitting args by `startswith("-")` is dangerous: paired options like `-k expr` or `--maxfail 1` have value tokens that don't start with `-`. However, the `has_explicit_paths` check on line 170 does exactly that split:

```python
has_explicit_paths = any(not a.startswith("-") for a in args)
```

If a developer runs `nox -s unit-tests -- -k test_something`, `args` is `["-k", "test_something"]`. The value `"test_something"` doesn't start with `-`, so `has_explicit_paths` is `True`, and the default `tests/unit/` path is never injected. Pytest then receives no test path and falls back to `testpaths` from config or `.`, which may discover wrong tests or none at all.
The same issue exists in integration_tests at line 244.
This doesn't affect CI today because all shards pass explicit test paths in nox-args, but it breaks local developer workflows using -k, --maxfail N, etc.
Suggested fix:

```python
# Use a more robust heuristic, e.g. check if any arg
# looks like a path (contains '/' or os.path.exists):
has_explicit_paths = any(
    not a.startswith("-") and ("/" in a or os.path.exists(a))
    for a in args
)
```
```python
try:
    with try_bind(container, 1433, 1433) as container:
        docker_container = container.get_wrapped_container()
        copy_dir_to_container(str(data_dir), docker_container, "/data")
        res = docker_container.exec_run(
            [
                "bash",
                "-c",
                " ".join(
                    [
                        "/opt/mssql-tools*/bin/sqlcmd",
                        "-U",
                        container.username,
                        "-P",
                        f"'{container.password}'",
```
💡 Edge Case: Wider exception catch may skip setup failures as skipped tests
The refactored mssql_container fixture now wraps the entire setup (docker exec, engine creation, SQL execution) inside try: with try_bind(...): ... except (docker.errors.BuildError, docker.errors.APIError). Previously, only the try_bind / __enter__ call was guarded by the docker-error catch; setup failures like exec_run or create_engine raising an APIError would propagate as real failures.
Now, any docker.errors.APIError raised during data copying, SQL execution, or engine creation would be caught and turned into pytest.skip(), masking legitimate setup bugs. While these specific exception types are uncommon after container start, the catch scope is broader than intended.
```python
# If the caller provided no explicit test paths, inject the default.
has_explicit_paths = any(not a.startswith("-") for a in args)

pytest_args = [
    "-c",
    "pyproject.toml",
    "--cov=metadata",
    "--cov-branch",
    "--cov-config=pyproject.toml",
    "--junitxml=junit/test-results-unit.xml",
    "-n",
    "auto",
    "--dist",
    "loadfile",
    "--durations=20",
]

pytest_args.extend(test_paths or ["tests/unit/"])
pytest_args.extend(extra_flags)
if not has_dist_override:
    pytest_args += ["--dist", "loadfile"]

if not skip_cov:
    pytest_args += [
        "--cov=metadata",
        "--cov-branch",
        "--cov-config=pyproject.toml",
    ]

if not has_explicit_paths:
    pytest_args.append("tests/unit/")
```
has_explicit_paths = any(not a.startswith('-') for a in args) will incorrectly treat values for paired pytest options (e.g. -k expr, --maxfail 1) as “explicit paths”. As a result, nox -s unit-tests -- -k foo won’t inject the default tests/unit/ path and will run collection from the repo root instead. Consider detecting explicit paths via filesystem existence / globbing (e.g. Path(a).exists() for non-flag tokens) or a stricter heuristic (tokens containing / or ending with .py) so flag values don’t suppress the default path.
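A path-detection heuristic along those lines might be sketched as below; the helper names are hypothetical, and the default path mirrors the snippets in this review:

```python
import os

# Hedged sketch of the suggested heuristic: a token counts as an explicit
# test path only if it plausibly is one, so values of paired flags like
# "-k expr" no longer suppress the default path injection.
def looks_like_path(token):
    return (not token.startswith("-")) and (
        "/" in token or token.endswith(".py") or os.path.exists(token)
    )


def with_default_path(args, default="tests/unit/"):
    if any(looks_like_path(a) for a in args):
        return list(args)
    return list(args) + [default]
```

The `os.path.exists` check also handles bare directory names like `tests` that contain no slash.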
```python
# We only strip our custom flags (--standalone, --no-cov, --workers) from posargs
# and pass everything else through to pytest in original order. We avoid splitting
# by startswith("-") because that breaks paired options like "-k expr" or
# "--maxfail 1" — the value token would be misclassified as a test path.
args = list(session.posargs)
standalone = "--standalone" in args
if standalone:
    args.remove("--standalone")

# --no-cov is a CI-only flag to skip coverage on non-main Python versions.
skip_cov = "--no-cov" in args
if skip_cov:
    args.remove("--no-cov")

workers = os.environ.get("PYTEST_INTEGRATION_WORKERS", "0")
if "--workers" in args:
    idx = args.index("--workers")
    workers = args[idx + 1]
    args = args[:idx] + args[idx + 2 :]

# Separate test paths from pytest flags in posargs
test_paths = [a for a in args if not a.startswith("-")]
extra_flags = [a for a in args if a.startswith("-")]
# If no explicit test paths, inject the default directory.
has_explicit_paths = any(not a.startswith("-") for a in args)

pytest_args = [
    "-c",
    "pyproject.toml",
    "--cov=metadata",
    "--cov-branch",
    "--cov-config=pyproject.toml",
    "--junitxml=junit/test-results-integration.xml",
]

if not standalone:
    if not skip_cov:
        pytest_args += [
            "--cov=metadata",
            "--cov-branch",
            "--cov-config=pyproject.toml",
        ]

if not standalone and not skip_cov:
    pytest_args.append("--cov-append")

use_xdist = int(workers) > 0
if use_xdist:
    pytest_args.extend([f"-n{workers}", "--dist=loadgroup"])

pytest_args.extend(test_paths or ["tests/integration/"])
pytest_args.extend(extra_flags)
if not has_explicit_paths:
    pytest_args.append("tests/integration/")
```
Same issue as unit-tests: has_explicit_paths = any(not a.startswith('-') for a in args) will be true for paired-option values like -k expr or --maxfail 1, which prevents injecting the default tests/integration/ path. This can unexpectedly run all tests when callers only intended to filter. Use a stricter "is this a path?" check (e.g. Path(token).exists() or globbing, or a `/` / `.py` token heuristic) before deciding to omit the default path.
logger = logging.getLogger(__name__)


def _verify_tables_readable(engine: Engine, table_names: list):
    """Verify all tables are readable after data loading.
logger is defined but never used. If logging isn’t needed here, remove import logging and the logger = logging.getLogger(__name__) assignment; otherwise, use logger in the retry loop (e.g., to log attempts/exceptions) to justify keeping it.
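One way to make the logger earn its keep, per the comment above: log each attempt in the readiness retry loop. The retry shape below is an assumption — the real `_verify_tables_readable` loop may differ — but it shows the intended pattern.

```python
import logging
import time

logger = logging.getLogger(__name__)

def read_with_retries(read_once, attempts: int = 3, delay: float = 0.0):
    """Call read_once() until it succeeds, logging each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return read_once()
        except Exception as exc:
            # Surface transient failures instead of swallowing them silently.
            logger.warning("read attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)

# Demo with a reader that fails once, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("table not ready")
    return "ok"

print(read_with_retries(flaky))  # -> ok
```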
transaciton = conn.begin()
conn.execute(
    text(
        f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
    )
)
transaciton.commit()
Typo in variable name transaciton makes the fixture harder to read/maintain. Rename to transaction (and update uses) to avoid confusion.
Suggested change: replace

transaciton = conn.begin()
conn.execute(
    text(
        f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
    )
)
transaciton.commit()

with

transaction = conn.begin()
conn.execute(
    text(
        f"SELECT * INTO {db_name}.SalesLT.CustomerCopy FROM {db_name}.SalesLT.Customer;"
    )
)
transaction.commit()
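An alternative that sidesteps the typo class entirely: SQLAlchemy's `Connection.begin()` works as a context manager, committing on a clean exit and rolling back on error, so there is no named transaction variable to misspell. The sketch below uses in-memory SQLite in place of the real SQL Server database, and the table names are illustrative (SQLite has no `SELECT ... INTO`, so `CREATE TABLE ... AS SELECT` stands in).

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.connect() as conn:
    with conn.begin():  # commits automatically on clean exit
        conn.execute(text("CREATE TABLE Customer (id INTEGER, name TEXT)"))
        conn.execute(text("INSERT INTO Customer VALUES (1, 'Ada')"))
    with conn.begin():
        conn.execute(text("CREATE TABLE CustomerCopy AS SELECT * FROM Customer"))

with engine.connect() as conn:
    rows = conn.execute(text("SELECT * FROM CustomerCopy")).fetchall()
print(len(rows))  # -> 1
```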
- name: Restore uv package cache
  id: cache-uv
  uses: actions/cache/restore@v4
  with:
    path: ${{ env.UV_CACHE_DIR }}
    key: uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-${{ hashFiles('ingestion/setup.py') }}
    restore-keys: |
      uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-
      uv-${{ runner.os }}-py${{ inputs.python-version }}-
The cache steps use path: ${{ env.UV_CACHE_DIR }}, but this composite action doesn’t define UV_CACHE_DIR anywhere. If astral-sh/setup-uv doesn’t export it on the runner, this resolves to an empty path and can cause actions/cache to error or silently skip caching. Consider setting UV_CACHE_DIR explicitly (e.g. ~/.cache/uv) and/or using a known output/input from setup-uv to determine the cache directory.
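A minimal sketch of that suggestion. The cache path below is an assumption (uv's default on Linux runners); verify it against setup-uv's documented behavior before relying on it:

```yaml
# Illustrative fragment: pin the uv cache directory explicitly so
# ${{ env.UV_CACHE_DIR }} can never resolve to an empty string.
env:
  UV_CACHE_DIR: /home/runner/.cache/uv  # assumed default; confirm for your runner image

steps:
  - name: Restore uv package cache
    uses: actions/cache/restore@v4
    with:
      path: ${{ env.UV_CACHE_DIR }}
      key: uv-${{ runner.os }}-py${{ inputs.python-version }}-${{ inputs.install-mode }}-${{ hashFiles('ingestion/setup.py') }}
```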
Hey @SumanMaharana, I have only one question... What are we gaining from splitting the static checks and unit tests jobs if the bottleneck for the whole workflow is the integration tests? Aren't we just repeating some steps and using more resources for no gain?
Before this PR, static checks ran inside the py-unit-tests matrix across all 3 Python versions. Since linting and type checking are not Python-version-sensitive, we were running identical static checks 3 times (~5 min × 3 runners = ~15 runner-minutes), and the unit tests couldn't start until static checks on each runner finished.

After this PR, static checks run exactly once on Python 3.10 with a lighter install-mode: dev-only (~6 runner-minutes total), while the unit test shards and integration test shards all start at T=0 in parallel. The CI critical path looks like:

The "repeated steps" (checkout, wait-for-label, env setup) are all sub-minute cached operations — they don't materially contribute to runner-minutes. The expensive step, running nox, is cheaper for py-static-checks than it used to be (dev-only deps, no test collection overhead).



Describe your changes:
Fixes #26898
This pull request refactors the Python test workflow to separate static checks from unit tests, introduces a more flexible dependency installation mode for the test environment setup, and improves test coverage handling. The changes streamline CI runs, reduce resource usage for static checks, and make coverage reporting more accurate.
Workflow and CI improvements:
- Split the Python test workflow into `py-static-checks` for static analysis (using only the main Python version and minimal dependencies) and `py-unit-tests` for running unit tests across multiple Python versions. This reduces unnecessary dependency installation and speeds up static checks. [1] [2] [3]
- Added an `install-mode` input to `.github/actions/setup-openmetadata-test-environment`, allowing either a full install (all dependencies) or a lightweight "dev-only" install for static checks. The static check job now uses the "dev-only" mode. [1] [2] [3]

Dependency and caching enhancements:
Test coverage and reporting:
- Updated `unit_tests` in `noxfile.py` to allow skipping coverage collection via a `--no-cov` flag, which is now used for non-main Python versions to avoid redundant coverage runs. [1] [2]
- Updated `pyproject.toml` to omit generated code from coverage reports, making coverage metrics more meaningful.
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>