fix: route ONNX pb files by content #1029

Merged
mldangelo-oai merged 2 commits into main from mdangelo/codex/route-onnx-pb-by-content on Apr 16, 2026

@mldangelo-oai
Contributor

Summary

  • detect ONNX content hints before falling back to the generic .pb/protobuf extension route
  • treat ONNX-in-.pb as valid file-type routing and suppress the expected mismatch warning
  • add filetype and core dispatch regressions for ONNX payloads saved with a .pb suffix
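The routing order in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `modelaudit/utils/file/detection.py`: the names `EXTENSION_FORMATS`, `EXPECTED_MISMATCHES`, `detect_format`, and `should_warn_mismatch` are hypothetical, and the content detector is passed in as a plain predicate.

```python
from pathlib import Path
from typing import Callable, Mapping

# Assumed extension table, for illustration only.
EXTENSION_FORMATS = {".pb": "protobuf", ".onnx": "onnx"}

# (extension format, detected format) pairs that are accepted silently.
# ONNX content inside a .pb file is the case this PR treats as valid.
EXPECTED_MISMATCHES = {("protobuf", "onnx")}

Detector = Callable[[bytes], bool]


def detect_format(path: Path, header: bytes, detectors: Mapping[str, Detector]) -> str:
    """Try content hints first, then fall back to the extension route."""
    for fmt, matches in detectors.items():
        if matches(header):
            return fmt
    return EXTENSION_FORMATS.get(path.suffix.lower(), "unknown")


def should_warn_mismatch(path: Path, detected: str) -> bool:
    """Warn only when extension and content disagree in an unexpected way."""
    expected = EXTENSION_FORMATS.get(path.suffix.lower(), "unknown")
    if expected == detected:
        return False
    return (expected, detected) not in EXPECTED_MISMATCHES
```

With a detector that recognizes the payload, `detect_format(Path("model.pb"), header, detectors)` would return `"onnx"` and `should_warn_mismatch(Path("model.pb"), "onnx")` would return `False`; a `.pb` file with no ONNX hint still falls through to `"protobuf"`.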

Validation

  • uv run ruff format modelaudit/utils/file/detection.py modelaudit/core.py tests/utils/file/test_filetype.py tests/test_core.py
  • uv run ruff check modelaudit/utils/file/detection.py modelaudit/core.py tests/utils/file/test_filetype.py tests/test_core.py
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/utils/file/test_filetype.py tests/test_core.py -q
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1

@github-actions
Contributor

github-actions bot commented Apr 15, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 19 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 19 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 167.39ms -> 168.15ms (+0.5%).

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 21.0us | 22.5us | +7.1% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 39.80ms | 40.92ms | +2.8% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 37.4us | 38.2us | +2.0% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_skip_filter_plain_text_files | - | 4.6 KiB | 256 | 10.35ms | 10.55ms | +1.9% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_small] | safe_small | 68 B | 1 | 52.6us | 53.6us | +1.9% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_multi_stream_padded_payload | multi_stream_padded | 4.1 KiB | 1 | 110.8us | 112.6us | +1.6% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 66.53ms | 65.60ms | -1.4% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 9.91ms | 10.04ms | +1.4% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_hex] | nested_hex | 130 B | 1 | 119.8us | 121.3us | +1.2% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_opcode_budget_tail_payload | opcode_budget_tail | 14 B | 1 | 69.8us | 69.0us | -1.2% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[stack_global] | stack_global | 21 B | 1 | 61.5us | 60.9us | -1.0% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_stream | chunked_stream | 278.2 KiB | 1 | 6.85ms | 6.92ms | +1.0% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_base64] | nested_base64 | 98 B | 1 | 105.8us | 104.8us | -0.9% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 27.79ms | 27.98ms | +0.7% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_large] | safe_large | 278.2 KiB | 1 | 3.93ms | 3.91ms | -0.6% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_hidden_suspicious_string_budget | hidden_suspicious_string | 8.0 KiB | 1 | 508.6us | 511.6us | +0.6% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[malicious_reduce] | malicious_reduce | 52 B | 1 | 72.2us | 71.8us | -0.5% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[long_benign_string] | long_benign_string | 1.0 MiB | 1 | 972.6us | 971.2us | -0.1% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_raw] | nested_raw | 78 B | 1 | 89.4us | 89.4us | -0.1% | stable |
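
For readers checking the Change and Status columns, the arithmetic can be reproduced with a short sketch. The 15% threshold comes from the report; the function names are hypothetical, and treating a drop of more than 15% as "improved" is an assumption about how the benchmark job labels results, not something the report states.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline to current, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark result against a symmetric threshold (assumed)."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# Aggregate median from the report: 167.39ms -> 168.15ms
print(f"{percent_change(167.39, 168.15):+.1f}%")  # prints: +0.5%
# First table row: 21.0us -> 22.5us
print(f"{percent_change(21.0, 22.5):+.1f}%", classify(21.0, 22.5))  # prints: +7.1% stable
```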

@mldangelo-oai mldangelo-oai marked this pull request as ready for review April 15, 2026 23:31

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca7e41601e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


def _looks_like_onnx_model_candidate_file(path: Path, size: int, header: bytes) -> bool:
    """Run the bounded ONNX parser only for plausible protobuf tag starts."""
    return _has_onnx_model_tag_start(header) and _looks_like_onnx_model_file(path, size)

P1: Parse ONNX candidates without first-byte whitelisting

_looks_like_onnx_model_candidate_file short-circuits on _has_onnx_model_tag_start(header). Protobuf allows unknown and reordered fields, so a valid ONNX ModelProto can legally start with a different tag (for example, when an unknown field precedes the known ones). In that case the function returns False, which routes a malicious .pb file as generic protobuf and bypasses ONNX-specific checks such as PythonOp detection.

Useful? React with 👍 / 👎.
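
To make the concern in the comment above concrete, here is a small sketch that hand-encodes a few protobuf fields and prints the first byte of each payload. It uses the standard wire-format rule that a tag is `(field_number << 3) | wire_type`, encoded as a varint. The field numbers for `ir_version` (1) and `producer_name` (2) follow `onnx.proto`; field 1000 is an arbitrary number assumed to be undefined in `ModelProto`, chosen only for illustration. Which of these bytes the project's whitelist accepts is not shown in this thread, so the sketch makes no claim about that.

```python
VARINT, LEN = 0, 2  # protobuf wire types used below


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """Encode a field tag: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


# ModelProto.ir_version: field 1, varint.
ir_version = encode_tag(1, VARINT) + encode_varint(8)
# ModelProto.producer_name: field 2, length-delimited.
producer_name = encode_tag(2, LEN) + encode_varint(4) + b"demo"
# Hypothetical unknown field 1000 (assumed unused); parsers skip unknown fields.
unknown = encode_tag(1000, VARINT) + encode_varint(1)

canonical = ir_version + producer_name  # fields in declaration order
reordered = producer_name + ir_version  # same fields, different order
prefixed = unknown + canonical          # unknown field first

print(hex(canonical[0]), hex(reordered[0]), hex(prefixed[0]))
# prints: 0x8 0x12 0xc0
```

All three payloads carry the same known fields, yet each starts with a different byte, which is why a check on `header[0]` alone can disagree with what a full protobuf parser would accept.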

@mldangelo-oai mldangelo-oai force-pushed the mdangelo/codex/route-onnx-pb-by-content branch from ca7e416 to 7a711c2 April 16, 2026 06:32
@mldangelo-oai mldangelo-oai merged commit 6e9aa45 into main Apr 16, 2026
8 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/route-onnx-pb-by-content branch April 16, 2026 06:32
@github-actions github-actions bot mentioned this pull request Apr 16, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a711c276a

def _looks_like_onnx_model_candidate_file(path: Path, size: int, header: bytes) -> bool:
    """Run the bounded ONNX parser only for plausible protobuf tag starts."""
    return _has_onnx_model_tag_start(header) and _looks_like_onnx_model_file(path, size)

P1: Parse ONNX candidates without first-byte whitelist

_looks_like_onnx_model_candidate_file() only runs the ONNX structural parser when header[0] is in _ONNX_MODEL_TOP_LEVEL_TAG_START_BYTES. Protobuf permits reordered/unknown fields, so valid ModelProto payloads can start with other tags (e.g., unknown-field prefixes or other top-level fields). Those .pb files are then routed as generic protobuf, skipping ONNX-specific checks like PythonOp detection.

Useful? React with 👍 / 👎.
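
One possible direction for addressing this, sketched under assumptions: walk the top-level fields with a field-count budget, skip anything unrecognized, and look for structural evidence of a `ModelProto` instead of gating on the first byte. This is not the project's `_looks_like_onnx_model_file`; the helper names are hypothetical, the sketch operates on a complete in-memory payload, and using the presence of `graph` (field 7, length-delimited in `onnx.proto`) as the signal is a deliberately loose heuristic.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at pos; return (value, next_pos) or raise ValueError."""
    result = shift = 0
    while True:
        if pos >= len(buf) or shift > 63:
            raise ValueError("truncated or oversized varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def top_level_fields(buf: bytes, max_fields: int = 64) -> list[tuple[int, int]]:
    """Return (field_number, wire_type) pairs for up to max_fields fields."""
    fields: list[tuple[int, int]] = []
    pos = 0
    while pos < len(buf) and len(fields) < max_fields:
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if field_number == 0:
            raise ValueError("field number 0 is invalid")
        if wire_type == 0:    # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit fixed
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # 32-bit fixed
            pos += 4
        else:                 # groups (3, 4) and undefined types (6, 7)
            raise ValueError("unsupported wire type")
        if pos > len(buf):
            raise ValueError("field overruns buffer")
        fields.append((field_number, wire_type))
    return fields


def looks_like_onnx_model(buf: bytes) -> bool:
    """Loose heuristic: well-formed top-level fields that include graph."""
    try:
        fields = top_level_fields(buf)
    except ValueError:
        return False
    return (7, 2) in fields  # ModelProto.graph
```

For example, `looks_like_onnx_model(b"\xc0\x3e\x01\x08\x08\x3a\x00")` accepts a payload whose first field is unknown (field 1000), followed by `ir_version` and an empty `graph`. The trade-off is cost: every `.pb` file gets a bounded structural pass, which is the work the first-byte check was presumably meant to avoid.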
