
fix: bound pytorch zip jit reads #1048

Merged

mldangelo-oai merged 7 commits into main from mdangelo/codex/fix-pytorch-jit-bounded-reads on Apr 17, 2026

Conversation

@mldangelo-oai
Contributor

Summary

  • bound PyTorch ZIP JIT/network member reads with a configurable per-member cap
  • skip pickle members and numeric storage blobs in the JIT pass to avoid duplicate/raw memory-heavy scans
  • mark oversized JIT/network coverage inconclusive and finish unsuccessful when coverage is incomplete

Finding

Fixes finding 6: JIT scan reads full ZIP members into memory.

Validation

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py::test_pytorch_zip_jit_scan_size_limit_marks_inconclusive --maxfail=1
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1

@mldangelo-oai mldangelo-oai force-pushed the mdangelo/codex/fix-pytorch-jit-bounded-reads branch from c7a2d07 to 92b7eda on April 17, 2026 00:54
@github-actions
Contributor

github-actions bot commented Apr 17, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 19 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 19 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 194.79ms -> 191.82ms (-1.5%).

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
|---|---|---|---|---|---|---|---|
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_opcode_budget_tail_payload | opcode_budget_tail | 14 B | 1 | 71.0us | 68.5us | -3.6% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 31.8us | 30.6us | -3.6% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 53.3us | 51.4us | -3.4% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_multi_stream_padded_payload | multi_stream_padded | 4.1 KiB | 1 | 134.5us | 130.0us | -3.4% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[stack_global] | stack_global | 21 B | 1 | 67.6us | 65.5us | -3.2% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_skip_filter_plain_text_files | - | 4.6 KiB | 256 | 13.64ms | 13.24ms | -2.9% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_base64] | nested_base64 | 98 B | 1 | 102.9us | 100.6us | -2.2% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 77.33ms | 75.89ms | -1.9% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_raw] | nested_raw | 78 B | 1 | 101.2us | 99.4us | -1.7% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 48.03ms | 47.27ms | -1.6% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_stream | chunked_stream | 278.2 KiB | 1 | 6.62ms | 6.51ms | -1.5% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_large] | safe_large | 278.2 KiB | 1 | 3.47ms | 3.51ms | +1.3% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_hex] | nested_hex | 130 B | 1 | 109.2us | 108.1us | -1.0% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_hidden_suspicious_string_budget | hidden_suspicious_string | 8.0 KiB | 1 | 584.9us | 590.5us | +0.9% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 31.55ms | 31.26ms | -0.9% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[long_benign_string] | long_benign_string | 1.0 MiB | 1 | 1.09ms | 1.08ms | -0.9% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_small] | safe_small | 68 B | 1 | 55.9us | 55.8us | -0.3% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 11.68ms | 11.66ms | -0.2% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[malicious_reduce] | malicious_reduce | 52 B | 1 | 78.2us | 78.1us | -0.1% | stable |
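
The status column above can be read as a simple threshold comparison. A hedged sketch of the assumed classification logic (the bot's actual implementation is not shown in this PR; `classify` is an illustrative name):

```python
def classify(baseline: float, current: float, threshold: float = 0.15) -> str:
    """Label a benchmark against a relative regression threshold (assumed 15%)."""
    change = (current - baseline) / baseline
    if change > threshold:       # slower than baseline beyond the threshold
        return "regression"
    if change < -threshold:      # faster than baseline beyond the threshold
        return "improved"
    return "stable"
```

Under this reading, every row in the table falls well inside the ±15% band, hence "19 stable".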

@mldangelo-oai mldangelo-oai marked this pull request as ready for review April 17, 2026 01:10

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92b7eda723

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/pytorch_zip_scanner.py Outdated
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 392aba22d4

Comment thread modelaudit/scanners/pytorch_zip_scanner.py Outdated
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: d28f386e00

Comment thread modelaudit/scanners/pytorch_zip_scanner.py Outdated
mldangelo-oai and others added 2 commits April 16, 2026 21:34
…ants

Address the three nice-to-fix items from review:

1. Aggregate JIT/network-pass oversize and read-failure events into a
   single summary INFO check per kind. Adversarial archives with many
   unreachable members previously produced one INFO finding per member
   (verified live: 11 oversize members → 11 checks), flooding SARIF
   output and dashboards. Collect entries into `details["zip_entries"]`
   and `details["entries"]` instead, with a `skipped_count` /
   `failed_count` summary. The `mark_inconclusive_scan_result` call is
   hoisted out of the loop so the metadata reason is recorded once.

2. Document the identity-based pickle dedup. `pickle_files` and
   `safe_entries` share `ZipInfo` instances because both come from the
   same `infolist()` walk upstream, which is what makes `id()` work.
   A future refactor that rebuilds `pickle_files` from filenames or
   from a separate `infolist()` call would silently defeat the dedup;
   the inline comment now calls that out and suggests a fallback key.

3. Document the `pickle_members_scanned` proxy. It means the scanner is
   wired up, not that every pickle member was actually processed — if
   the pickle scanner crashed mid-scan on a member, that member is
   still skipped here. The trade-off is intentional; the comment makes
   it explicit.

Also document why `max_jit_scan_member_bytes=0` falls back to the
default (32 MiB) instead of meaning "unlimited" the way
`ZipScanner.max_entry_size=0` does: this pass cannot safely run
unbounded. Expand the CHANGELOG to mention the aggregation,
duplicate-name handling, pickle dedup, and directory-entry skip.

Test expectations updated: the existing size-limit and read-failure
tests now assert aggregation shape (single check, per-entry list),
and a new `test_pytorch_zip_jit_scan_aggregates_many_oversize_members_
into_one_check` proves 25 generated oversize members collapse to a
single check with a deduplicated reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
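
The aggregation described in item 1 of the commit message can be sketched as follows. Names and shapes here are illustrative, not the scanner's real API; only the pattern (collect during the loop, emit one summary check after it) comes from the commit text.

```python
def scan_members(members, cap):
    """Scan ZIP members, aggregating oversize skips into one INFO check."""
    skipped = []
    for info in members:
        if info.file_size > cap:
            # Collect instead of emitting one finding per member, so an
            # adversarial archive with many oversize members yields one check.
            skipped.append({"name": info.filename, "size": info.file_size})
            continue
        # ... bounded scan of the member would go here ...
    checks = []
    if skipped:
        checks.append({
            "severity": "INFO",
            "message": f"{len(skipped)} member(s) exceeded the JIT scan cap",
            "details": {
                "zip_entries": [e["name"] for e in skipped],
                "entries": skipped,
                "skipped_count": len(skipped),
            },
        })
        # Per the commit message, mark_inconclusive_scan_result(...) is
        # called once here, hoisted out of the loop.
    return checks
```

This mirrors the regression test described above, where 25 generated oversize members collapse into a single check.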
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: b8e0cd8574
Comment thread modelaudit/scanners/pytorch_zip_scanner.py Outdated
mldangelo added a commit that referenced this pull request Apr 17, 2026
Apply the same refinements this reviewer just landed on PR #1048 to the
hidden-pickle discovery path:

1. Aggregate probe-failure INFO checks. An adversarial archive with many
   members that raise on decompression (for example, unsupported methods
   or intermittent I/O) previously produced one `Pickle Discovery` INFO
   finding per member, flooding the checks list. Collect failures into
   a single summary check carrying the per-member exceptions under
   `details["entries"]`, with `details["zip_entries"]` and
   `details["failed_count"]` for quick consumers. `mark_inconclusive_
   scan_result` is hoisted out of the loop so the metadata reason is
   recorded exactly once even if many members fail.

2. Document the identity-based dedup invariant in
   `_discover_pickle_files`. Both passes iterate the same
   `safe_entries` list, so `id(ZipInfo)` is stable for the duration of
   discovery; a future refactor that rebuilds the list from separate
   `infolist()` walks or fresh `ZipInfo` constructions would silently
   defeat the dedup. Call that out inline so we don't re-learn it the
   hard way.

3. Explain the magic thresholds in `_looks_like_binary_pickle_prefix`.
   The `>= 4` clean-parse and `>= 2` truncation thresholds are tuned to
   balance tensor-storage false positives against real pickle prefixes;
   undocumented they read like arbitrary numbers.

4. Add "keep in sync" comments on the standalone picklescan copies of
   `_looks_like_binary_pickle_prefix` and `_looks_like_proto0_or_1_
   pickle` so the duplication (intentional for the standalone package)
   is at least signposted.

Expand the CHANGELOG bullet to describe the always-on second-pass sniff,
the fail-closed aggregation behavior, and the standalone-package mirror.
Adds a new `test_pytorch_zip_discovery_aggregates_probe_failures_
into_single_check` regression test that proves 5 failing members
collapse to one `Pickle Discovery` check with a deduplicated reason and
all five per-member records under `details["entries"]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
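
The identity-based dedup invariant documented in item 2 can be sketched like this. The function and variable names are hypothetical; the point is that `id()` is only a valid key because both lists share the same `ZipInfo` instances from a single `infolist()` walk.

```python
def discover_hidden_pickles(safe_entries, pickle_files):
    """Return entries not already covered by the pickle pass.

    Invariant: pickle_files and safe_entries share ZipInfo instances
    (both come from the same infolist() walk), so id() works as a key.
    """
    already_scanned = {id(info) for info in pickle_files}
    candidates = []
    for info in safe_entries:
        if id(info) in already_scanned:
            # Would silently stop matching if pickle_files were rebuilt
            # from filenames or a fresh infolist() call -- the failure
            # mode the inline comment warns about.
            continue
        candidates.append(info)
    return candidates
```

A filename-based key (the "fallback key" the commit suggests) would survive such a refactor, at the cost of conflating duplicate member names.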
mldangelo-oai added a commit that referenced this pull request Apr 17, 2026
* fix: detect hidden pytorch zip pickles

* fix: fail closed on hidden pickle probe errors

* fix: aggregate hidden-pickle probe failures and document invariants

(Commit body repeats the "fix: aggregate hidden-pickle probe failures and document invariants" message quoted in full above.)

* fix: align picklescan proto0 probe trivia

---------

Co-authored-by: mldangelo <michael.l.dangelo@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…torch-jit-bounded-reads

# Conflicts:
#	CHANGELOG.md
#	modelaudit/scanners/pytorch_zip_scanner.py
#	tests/scanners/test_pytorch_zip_scanner.py
@mldangelo-oai mldangelo-oai merged commit f920d76 into main Apr 17, 2026
28 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/fix-pytorch-jit-bounded-reads branch April 17, 2026 14:59
@github-actions github-actions bot mentioned this pull request Apr 17, 2026