fix: detect hidden pytorch zip pickles by mldangelo-oai · Pull Request #1043 · promptfoo/modelaudit

mldangelo-oai · 2026-04-17T00:33:18Z

Summary

- always bounded-sniff unselected PyTorch ZIP members for hidden pickle streams, even after discovering data.pkl
- route extensionless and storage-path pickle payloads into the pickle scanner while keeping benign storage blobs and pickle-ish text clean
- mirror hidden-member discovery in the standalone modelaudit-picklescan PyTorch ZIP path
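
To make the first bullet concrete, here is a minimal sketch of a bounded sniff over ZIP members that were not already selected for scanning. It is not the scanner's actual code: `SNIFF_BYTES`, `looks_like_pickle_prefix`, and `find_hidden_pickle_members` are hypothetical names, and the prefix check is a deliberately crude stand-in that only recognizes the protocol 2+ `PROTO` header, whereas the real heuristics also handle protocol 0/1 streams and are tuned against tensor-storage false positives.

```python
import io
import pickle
import zipfile

SNIFF_BYTES = 64  # hypothetical bound; the real scanner's limit is not shown on this page


def looks_like_pickle_prefix(prefix: bytes) -> bool:
    """Crude check: PROTO opcode (0x80) followed by a protocol number in 2..5."""
    return len(prefix) >= 2 and prefix[0] == 0x80 and 2 <= prefix[1] <= 5


def find_hidden_pickle_members(zip_bytes: bytes, selected: set[str]) -> list[str]:
    """Return members outside `selected` whose first bytes resemble a pickle."""
    hidden = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir() or info.filename in selected:
                continue
            with zf.open(info) as member:
                prefix = member.read(SNIFF_BYTES)  # bounded read, never the whole member
            if looks_like_pickle_prefix(prefix):
                hidden.append(info.filename)
    return hidden


def build_demo_archive() -> bytes:
    """Build an in-memory archive shaped loosely like a PyTorch ZIP checkpoint."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.pkl", pickle.dumps({"weights": [1, 2, 3]}, protocol=2))
        zf.writestr("archive/data/0", b"\x00" * 32)  # benign storage blob
        zf.writestr("archive/data/1", pickle.dumps(["hidden"], protocol=2))  # pickle on a storage path
    return buf.getvalue()


demo = build_demo_archive()
print(find_hidden_pickle_members(demo, selected={"archive/data.pkl"}))
# ['archive/data/1']
```

The point of the sketch is that discovering `data.pkl` does not end the search: every remaining member still gets a cheap, size-limited look before it is treated as an opaque storage blob.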

Finding

Fixes finding 1: hidden PyTorch ZIP pickles are skipped after data.pkl.

Validation

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py packages/modelaudit-picklescan/tests/test_api.py --maxfail=1
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1

github-actions · 2026-04-17T00:34:24Z

Workflow run and artifacts

Performance Benchmarks

Compared 19 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 19 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 190.45ms -> 192.14ms (+0.9%).

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_small]` | `safe_small` | 68 B | 1 | 44.8us | 48.1us | +7.3% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_multi_stream_padded_payload` | `multi_stream_padded` | 4.1 KiB | 1 | 103.8us | 107.2us | +3.3% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip` | `state_dict.pt` | 1.5 MiB | 1 | 32.16ms | 33.04ms | +2.8% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_raw]` | `nested_raw` | 78 B | 1 | 80.0us | 78.0us | -2.5% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle` | `safe_model.pkl` | 49.4 KiB | 1 | 23.9us | 24.3us | +1.7% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip` | `state_dict.pt` | 1.5 MiB | 1 | 43.2us | 43.9us | +1.6% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_large]` | `safe_large` | 278.2 KiB | 1 | 3.77ms | 3.72ms | -1.4% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_skip_filter_plain_text_files` | `-` | 4.6 KiB | 256 | 10.14ms | 10.02ms | -1.3% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_opcode_budget_tail_payload` | `opcode_budget_tail` | 14 B | 1 | 54.9us | 55.6us | +1.2% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory` | `mixed-corpus` | 1.7 MiB | 54 | 77.06ms | 77.91ms | +1.1% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_stream` | `chunked_stream` | 278.2 KiB | 1 | 6.73ms | 6.66ms | -1.0% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[malicious_reduce]` | `malicious_reduce` | 52 B | 1 | 61.3us | 60.7us | -1.0% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle` | `safe_model.pkl` | 49.4 KiB | 1 | 12.03ms | 11.95ms | -0.7% | stable |
| `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory` | `duplicate-corpus` | 840.0 KiB | 81 | 46.15ms | 46.43ms | +0.6% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_base64]` | `nested_base64` | 98 B | 1 | 80.4us | 79.9us | -0.6% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_hidden_suspicious_string_budget` | `hidden_suspicious_string` | 8.0 KiB | 1 | 597.4us | 599.8us | +0.4% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[stack_global]` | `stack_global` | 21 B | 1 | 52.0us | 52.1us | +0.3% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[long_benign_string]` | `long_benign_string` | 1.0 MiB | 1 | 1.17ms | 1.18ms | +0.1% | stable |
| `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_hex]` | `nested_hex` | 130 B | 1 | 83.8us | 83.7us | -0.1% | stable |
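
For readers checking the report's arithmetic, the following sketch shows how a percent change and status could be derived from a baseline/current pair. The 15% regression threshold comes from the report; the `classify` helper and the symmetric rule for "improved" are assumptions, since the benchmark tooling's actual classification logic is not shown here.

```python
REGRESSION_THRESHOLD_PCT = 15.0  # from the report: "regression threshold of 15%"


def classify(baseline: float, current: float,
             threshold_pct: float = REGRESSION_THRESHOLD_PCT) -> tuple[float, str]:
    """Return (percent change, status) for one baseline/current pair in the same unit."""
    change_pct = (current - baseline) / baseline * 100.0
    if change_pct > threshold_pct:
        status = "regression"
    elif change_pct < -threshold_pct:
        # Assumption: "improved" mirrors the regression threshold.
        status = "improved"
    else:
        status = "stable"
    return change_pct, status


# Aggregate shared-benchmark median from the report: 190.45ms -> 192.14ms
change, status = classify(190.45, 192.14)
print(f"{change:+.1f}% {status}")  # +0.9% stable
```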

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4e6c684e3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Apply the same refinements this reviewer just landed on PR #1048 to the hidden-pickle discovery path:

1. Aggregate probe-failure INFO checks. An adversarial archive with many members that raise on decompression (for example, unsupported methods or intermittent I/O) previously produced one `Pickle Discovery` INFO finding per member, flooding the checks list. Collect failures into a single summary check carrying the per-member exceptions under `details["entries"]`, with `details["zip_entries"]` and `details["failed_count"]` for quick consumers. `mark_inconclusive_scan_result` is hoisted out of the loop so the metadata reason is recorded exactly once even if many members fail.
2. Document the identity-based dedup invariant in `_discover_pickle_files`. Both passes iterate the same `safe_entries` list, so `id(ZipInfo)` is stable for the duration of discovery; a future refactor that rebuilds the list from separate `infolist()` walks or fresh `ZipInfo` constructions would silently defeat the dedup. Call that out inline so we don't re-learn it the hard way.
3. Explain the magic thresholds in `_looks_like_binary_pickle_prefix`. The `>= 4` clean-parse and `>= 2` truncation thresholds are tuned to balance tensor-storage false positives against real pickle prefixes; undocumented they read like arbitrary numbers.
4. Add "keep in sync" comments on the standalone picklescan copies of `_looks_like_binary_pickle_prefix` and `_looks_like_proto0_or_1_pickle` so the duplication (intentional for the standalone package) is at least signposted.

Expand the CHANGELOG bullet to describe the always-on second-pass sniff, the fail-closed aggregation behavior, and the standalone-package mirror.

Adds a new `test_pytorch_zip_discovery_aggregates_probe_failures_into_single_check` regression test that proves 5 failing members collapse to one `Pickle Discovery` check with a deduplicated reason and all five per-member records under `details["entries"]`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
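
The aggregation described in item 1 can be sketched roughly as follows. Only the check name `Pickle Discovery` and the `details` keys (`entries`, `zip_entries`, `failed_count`) come from the commit message; the `Check` dataclass, the `summarize_probe_failures` helper, and the per-entry field names are hypothetical stand-ins, not modelaudit's real types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Check:
    """Hypothetical stand-in for a scanner check record."""
    name: str
    severity: str
    message: str
    details: dict[str, Any] = field(default_factory=dict)


def summarize_probe_failures(failures: list[tuple[str, Exception]]) -> list[Check]:
    """Collapse per-member probe failures into at most one summary check."""
    if not failures:
        return []
    # One record per failing member, kept for consumers that need the full picture.
    entries = [
        {"zip_entry": name, "exception_type": type(exc).__name__, "exception": str(exc)}
        for name, exc in failures
    ]
    # Deduplicate the human-readable reasons so the message stays short.
    reasons = sorted({str(exc) for _, exc in failures})
    return [
        Check(
            name="Pickle Discovery",
            severity="INFO",
            message=f"Could not probe {len(failures)} ZIP member(s): {'; '.join(reasons)}",
            details={
                "entries": entries,
                "zip_entries": [name for name, _ in failures],
                "failed_count": len(failures),
            },
        )
    ]


failures = [
    (f"archive/data/{i}", NotImplementedError("unsupported compression method"))
    for i in range(5)
]
checks = summarize_probe_failures(failures)
print(len(checks), checks[0].details["failed_count"])  # 1 5
```

Because the summary is built once after the loop, any "inconclusive" marker on the scan result can likewise be recorded a single time rather than once per failing member.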

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 234dfe66d3



mldangelo-oai marked this pull request as ready for review April 17, 2026 00:48

mldangelo-oai mentioned this pull request Apr 17, 2026

fix: close archive payload scan gaps #1042

Closed

chatgpt-codex-connector bot reviewed Apr 17, 2026

Comment thread packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py Outdated

fix: detect hidden pytorch zip pickles

a274fc8

mldangelo-oai force-pushed the mdangelo/codex/fix-pytorch-hidden-pickles branch from f4e6c68 to a274fc8 Compare April 17, 2026 00:56

mldangelo-oai and others added 2 commits April 16, 2026 18:10

fix: fail closed on hidden pickle probe errors

a1cef40

chatgpt-codex-connector bot reviewed Apr 17, 2026

Comment thread packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py Outdated

mldangelo-oai added 3 commits April 17, 2026 07:30

fix: align picklescan proto0 probe trivia

e3d779e

Merge remote-tracking branch 'origin/main' into mdangelo/codex/fix-py…

2c7153e

…torch-hidden-pickles # Conflicts: # CHANGELOG.md

Merge remote-tracking branch 'origin/main' into mdangelo/codex/fix-py…

ad90f1d

…torch-hidden-pickles # Conflicts: # CHANGELOG.md # modelaudit/scanners/pytorch_zip_scanner.py

mldangelo-oai merged commit 19b6ebe into main Apr 17, 2026
31 checks passed

mldangelo-oai deleted the mdangelo/codex/fix-pytorch-hidden-pickles branch April 17, 2026 14:50

This was referenced Apr 17, 2026

chore: release main #1007

Merged

chore: release main #1050

Merged

mldangelo-oai merged 6 commits into main from mdangelo/codex/fix-pytorch-hidden-pickles
