Conversation
Force-pushed from c7a2d07 to 92b7eda
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92b7eda723
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: 392aba22d4
💡 Codex Review
Reviewed commit: d28f386e00
…ants

Address the three nice-to-fix items from review:

1. Aggregate JIT/network-pass oversize and read-failure events into a single summary INFO check per kind. Adversarial archives with many unreachable members previously produced one INFO finding per member (verified live: 11 oversize members → 11 checks), flooding SARIF output and dashboards. Collect entries into `details["zip_entries"]` and `details["entries"]` instead, with a `skipped_count` / `failed_count` summary. The `mark_inconclusive_scan_result` call is hoisted out of the loop so the metadata reason is recorded once.

2. Document the identity-based pickle dedup. `pickle_files` and `safe_entries` share `ZipInfo` instances because both come from the same `infolist()` walk upstream, which is what makes `id()` work. A future refactor that rebuilds `pickle_files` from filenames or from a separate `infolist()` call would silently defeat the dedup; the inline comment now calls that out and suggests a fallback key.

3. Document the `pickle_members_scanned` proxy. It means the scanner is wired up, not that every pickle member was actually processed: if the pickle scanner crashed mid-scan on a member, that member is still skipped here. The trade-off is intentional; the comment makes it explicit.

Also document why `max_jit_scan_member_bytes=0` falls back to the default (32 MiB) instead of meaning "unlimited" the way `ZipScanner.max_entry_size=0` does: this pass cannot safely run unbounded.

Expand the CHANGELOG to mention the aggregation, duplicate-name handling, pickle dedup, and directory-entry skip.

Test expectations updated: the existing size-limit and read-failure tests now assert the aggregation shape (single check, per-entry list), and a new `test_pytorch_zip_jit_scan_aggregates_many_oversize_members_into_one_check` proves 25 generated oversize members collapse to a single check with a deduplicated reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
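The aggregation in item 1 can be sketched roughly as follows. `Check` and `summarize_oversize_members` are illustrative stand-ins, not the actual modelaudit API; only the `details` keys (`zip_entries`, `entries`, `skipped_count`) come from the commit message:

```python
# Hypothetical sketch: collect per-member oversize events into ONE summary
# INFO check instead of emitting one finding per member.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Check:                      # stand-in for the scanner's finding type
    name: str
    severity: str
    reason: str
    details: dict[str, Any] = field(default_factory=dict)

def summarize_oversize_members(members: list[dict], max_bytes: int):
    """Return a single summary check for all oversize members, or None."""
    skipped = [
        {"name": m["name"], "size": m["size"]}
        for m in members
        if m["size"] > max_bytes
    ]
    if not skipped:
        return None
    # One check for the whole archive; per-entry data lives in details.
    return Check(
        name="JIT Scan",
        severity="info",
        reason=f"{len(skipped)} member(s) exceeded the {max_bytes}-byte limit",
        details={
            "zip_entries": [e["name"] for e in skipped],
            "entries": skipped,
            "skipped_count": len(skipped),
        },
    )

# The 11-member case from the commit message collapses to one check.
members = [{"name": f"code/{i}.py", "size": 64 * 2**20} for i in range(11)]
check = summarize_oversize_members(members, max_bytes=32 * 2**20)
print(check.details["skipped_count"])  # 11
```

The point of the shape is that SARIF consumers see one finding with a count, while dashboards that need per-member detail can still read `details["entries"]`.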
💡 Codex Review
Reviewed commit: b8e0cd8574
Apply the same refinements this reviewer just landed on PR #1048 to the hidden-pickle discovery path:

1. Aggregate probe-failure INFO checks. An adversarial archive with many members that raise on decompression (for example, unsupported methods or intermittent I/O) previously produced one `Pickle Discovery` INFO finding per member, flooding the checks list. Collect failures into a single summary check carrying the per-member exceptions under `details["entries"]`, with `details["zip_entries"]` and `details["failed_count"]` for quick consumers. `mark_inconclusive_scan_result` is hoisted out of the loop so the metadata reason is recorded exactly once even if many members fail.

2. Document the identity-based dedup invariant in `_discover_pickle_files`. Both passes iterate the same `safe_entries` list, so `id(ZipInfo)` is stable for the duration of discovery; a future refactor that rebuilds the list from separate `infolist()` walks or fresh `ZipInfo` constructions would silently defeat the dedup. Call that out inline so we don't re-learn it the hard way.

3. Explain the magic thresholds in `_looks_like_binary_pickle_prefix`. The `>= 4` clean-parse and `>= 2` truncation thresholds are tuned to balance tensor-storage false positives against real pickle prefixes; undocumented, they read like arbitrary numbers.

4. Add "keep in sync" comments on the standalone picklescan copies of `_looks_like_binary_pickle_prefix` and `_looks_like_proto0_or_1_pickle` so the duplication (intentional for the standalone package) is at least signposted.

Expand the CHANGELOG bullet to describe the always-on second-pass sniff, the fail-closed aggregation behavior, and the standalone-package mirror.

Adds a new `test_pytorch_zip_discovery_aggregates_probe_failures_into_single_check` regression test that proves 5 failing members collapse to one `Pickle Discovery` check with a deduplicated reason and all five per-member records under `details["entries"]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
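The identity-based dedup invariant from item 2 can be illustrated with plain `zipfile`. The entry names here are hypothetical; the key observation is that `id()` only works because both lists hold the same `ZipInfo` objects from a single walk:

```python
# Illustrative sketch of the identity-based dedup: both discovery passes
# iterate ZipInfo objects from the SAME infolist() walk, so id() is a
# stable key. Rebuilding either list from filenames or a fresh walk
# would silently defeat the dedup.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("archive/data.pkl", b"\x80\x04N.")      # conventional pickle
    out.writestr("archive/extra/hidden", b"\x80\x04N.")  # hidden pickle

with zipfile.ZipFile(buf) as zf:
    safe_entries = zf.infolist()  # the single walk shared by both passes
    pickle_files = [e for e in safe_entries if e.filename.endswith(".pkl")]

    # Identity, not filename, is the dedup key: valid only because
    # pickle_files holds the very same ZipInfo objects as safe_entries.
    seen = {id(e) for e in pickle_files}
    extra = [e for e in safe_entries if id(e) not in seen]
    print([e.filename for e in extra])  # ['archive/extra/hidden']
```

A filename-based fallback key would survive refactors but loses the ability to distinguish duplicate-named members, which is presumably why the commit message suggests documenting a fallback rather than switching to one.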
* fix: detect hidden pytorch zip pickles
* fix: fail closed on hidden pickle probe errors
* fix: aggregate hidden-pickle probe failures and document invariants (full message above)
* fix: align picklescan proto0 probe trivia

---------
Co-authored-by: mldangelo <michael.l.dangelo@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
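The `>= 4` clean-parse and `>= 2` truncation thresholds mentioned above could be sketched as an opcode-count heuristic. The implementation below is an assumption (using `pickletools.genops` as the parser); only the thresholds themselves come from the commit message:

```python
# Hedged sketch of a pickle-prefix sniff: count how many opcodes
# pickletools can parse before hitting STOP or an error.
import io
import pickle
import pickletools

def looks_like_binary_pickle_prefix(data: bytes) -> bool:
    count = 0
    try:
        for _op, _arg, _pos in pickletools.genops(io.BytesIO(data)):
            count += 1
    except Exception:
        # Truncated or corrupt mid-stream: still pickle-like if we
        # parsed at least 2 opcodes before failing.
        return count >= 2
    # Parsed cleanly to STOP: require at least 4 opcodes so short runs
    # of tensor-storage bytes don't register as false positives.
    return count >= 4

print(looks_like_binary_pickle_prefix(pickle.dumps({"a": 1})))  # True
print(looks_like_binary_pickle_prefix(b"\x00" * 16))            # False
```

Raw tensor storage is the adversarial case here: arbitrary binary bytes occasionally decode as one or two valid opcodes by chance, which is what the asymmetric thresholds guard against.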
…torch-jit-bounded-reads

# Conflicts:
#	CHANGELOG.md
#	modelaudit/scanners/pytorch_zip_scanner.py
#	tests/scanners/test_pytorch_zip_scanner.py
Summary
Finding
Fixes finding 6: the JIT scan previously read full ZIP members into memory; member reads are now bounded (32 MiB by default via `max_jit_scan_member_bytes`).
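A minimal sketch of what a bounded member read could look like, assuming a hard per-member byte limit; `read_member_bounded` is a hypothetical name, not the scanner's actual helper:

```python
# Sketch: stream at most `limit` bytes from a ZIP member instead of
# loading the whole member into memory.
import io
import zipfile
from typing import Optional

def read_member_bounded(zf: zipfile.ZipFile, name: str, limit: int) -> Optional[bytes]:
    """Return at most `limit` bytes of a member, or None if it is too large."""
    info = zf.getinfo(name)
    if info.file_size > limit:
        return None  # caller records an aggregated "oversize" entry instead
    with zf.open(info) as fh:
        data = fh.read(limit + 1)  # +1 guards against a lying file_size header
    return data if len(data) <= limit else None

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("model/code.py", b"x" * 1024)

zf = zipfile.ZipFile(buf)  # reopen for reading
print(len(read_member_bounded(zf, "model/code.py", limit=2048)))  # 1024
print(read_member_bounded(zf, "model/code.py", limit=512))        # None
```

Returning `None` rather than truncated bytes keeps the pass fail-closed: a member that cannot be fully inspected is reported as skipped, never partially scanned and passed.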
Validation