fix: route disguised nested archives in sevenzip scans #1017

Merged
mldangelo-oai merged 2 commits into main from mdangelo/codex/harden-sevenzip-nested-routing on Apr 16, 2026

@mldangelo-oai
Contributor

Summary

Teach the 7z scanner to probe likely disguised members for bounded, directly routable header formats instead of checking only for nested 7z magic. This lets nested ZIP and TAR payloads without trustworthy suffixes become scannable, and keeps oversized disguised members in the explicit inconclusive path.

Security impact

Before this change, the 7z scanner only treated probed members as nested content when the first bytes matched 7z magic. A hidden ZIP or TAR inside a .7z file with no extension or a misleading suffix could be skipped entirely, which meant the nested scanner dispatch never ran. The new probe keeps the same budgeted workflow, but it recognizes directly routable header formats from a bounded prefix, including TAR headers at the standard offset.
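
The header check described above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the function name `detect_routable_format`, the constant names, and the 262-byte probe budget are all assumptions made for the example. Only the signatures themselves (the 7z magic, the ZIP `PK` signatures, and the `ustar` marker at byte offset 257 of a TAR header) come from the respective file formats.

```python
from __future__ import annotations

# Well-known format signatures. Constant names are hypothetical.
SEVENZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
TAR_MAGIC_OFFSET = 257  # position of the "ustar" marker in a TAR header
TAR_MAGIC = b"ustar"

# Assumed budget: just enough bytes to see the TAR marker.
PROBE_BUDGET = TAR_MAGIC_OFFSET + len(TAR_MAGIC)


def detect_routable_format(prefix: bytes) -> str | None:
    """Return a format label for a bounded member prefix, or None if unknown."""
    head = prefix[:PROBE_BUDGET]  # never look past the budget
    if head.startswith(SEVENZIP_MAGIC):
        return "7z"
    if head.startswith(ZIP_MAGICS):
        return "zip"
    # TAR has no leading magic; the marker sits at a fixed offset instead.
    if head[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + len(TAR_MAGIC)] == TAR_MAGIC:
        return "tar"
    return None
```

Because the decision depends only on the first few hundred bytes, a member's name or suffix plays no role in whether it is routed to a nested scanner.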

Validation

  • uv run ruff format modelaudit/scanners/sevenzip_scanner.py tests/scanners/test_sevenzip_scanner.py
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_sevenzip_scanner.py -q
  • uv run mypy modelaudit/scanners/sevenzip_scanner.py tests/scanners/test_sevenzip_scanner.py

@github-actions
Contributor

github-actions bot commented Apr 15, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 19 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 19 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 185.22ms -> 183.71ms (-0.8%).

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[malicious_reduce] | malicious_reduce | 52 B | 1 | 84.3us | 77.0us | -8.6% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_small] | safe_small | 68 B | 1 | 59.4us | 56.0us | -5.8% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_base64] | nested_base64 | 98 B | 1 | 102.6us | 106.3us | +3.5% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_multi_stream_padded_payload | multi_stream_padded | 4.1 KiB | 1 | 137.5us | 133.0us | -3.3% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 30.8us | 29.8us | -3.2% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 29.98ms | 29.10ms | -2.9% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 10.53ms | 10.34ms | -1.8% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 50.6us | 51.4us | +1.6% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_hidden_suspicious_string_budget | hidden_suspicious_string | 8.0 KiB | 1 | 574.3us | 582.8us | +1.5% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_stream | chunked_stream | 278.2 KiB | 1 | 6.71ms | 6.64ms | -1.0% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_dangerous_global_payloads[stack_global] | stack_global | 21 B | 1 | 67.1us | 67.7us | +1.0% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[long_benign_string] | long_benign_string | 1.0 MiB | 1 | 1.10ms | 1.09ms | -0.7% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 72.03ms | 71.67ms | -0.5% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_safe_payloads[safe_large] | safe_large | 278.2 KiB | 1 | 3.43ms | 3.44ms | +0.4% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_opcode_budget_tail_payload | opcode_budget_tail | 14 B | 1 | 79.2us | 79.5us | +0.3% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_skip_filter_plain_text_files | - | 4.6 KiB | 256 | 13.44ms | 13.41ms | -0.2% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_raw] | nested_raw | 78 B | 1 | 102.9us | 102.8us | -0.1% | stable |
| tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payloads[nested_hex] | nested_hex | 130 B | 1 | 108.9us | 108.8us | -0.1% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 46.60ms | 46.62ms | +0.0% | stable |

@mldangelo-oai mldangelo-oai changed the title [codex] fix: route disguised nested archives in sevenzip scans fix: route disguised nested archives in sevenzip scans Apr 15, 2026
@mldangelo-oai mldangelo-oai marked this pull request as ready for review April 15, 2026 23:31

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a311bcb34c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +589 to +591
def _probe_extensionless_members(self, archive: Any, file_names: list[str]) -> dict[str, str | None]:
    """Probe disguised members while stopping each extraction at the header budget."""
    return {file_name: self._member_probe_detected_format(archive, file_name) for file_name in file_names}

P1: Restore single-pass probe extraction for candidate members

_probe_extensionless_members() now iterates members and calls _member_probe_detected_format() per file, and each call performs archive.extract(targets=[...]) then archive.reset(). On solid 7z archives, this restarts decompression from the beginning for each probe, turning probing into repeated work (up to 100 probes) and enabling severe scan slowdowns/timeouts on crafted inputs.
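
To illustrate why per-member probing is costly on solid archives, here is a toy model. It does not use the real archive library: `ToySolidArchive`, `probe_per_member`, and `probe_single_pass` are hypothetical names, and the model simply assumes that reaching member i of a solid stream requires decoding members 0 through i from the start.

```python
from __future__ import annotations


class ToySolidArchive:
    """Toy stand-in for a solid archive; counts how many members get decoded."""

    def __init__(self, names: list[str]) -> None:
        self.names = names
        self.members_decoded = 0  # running total of decode work

    def extract(self, targets: list[str]) -> None:
        # Assumption: a solid stream is decoded from the beginning up to
        # the last requested member, regardless of how many are requested.
        last = max(self.names.index(t) for t in targets)
        self.members_decoded += last + 1


def probe_per_member(archive: ToySolidArchive, names: list[str]) -> int:
    """One extract call per member: restarts decoding every time."""
    for name in names:
        archive.extract([name])
    return archive.members_decoded


def probe_single_pass(archive: ToySolidArchive, names: list[str]) -> int:
    """One extract call for all members: decodes the stream once."""
    archive.extract(names)
    return archive.members_decoded
```

Under this model, probing 100 members one at a time decodes 1 + 2 + ... + 100 = 5050 members in total, while a single pass decodes 100. That quadratic growth is the slowdown the review comment warns about.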

@mldangelo-oai mldangelo-oai force-pushed the mdangelo/codex/harden-sevenzip-nested-routing branch from a311bcb to de6ad24 Compare April 16, 2026 06:56
@mldangelo-oai mldangelo-oai merged commit cb2572e into main Apr 16, 2026
8 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/harden-sevenzip-nested-routing branch April 16, 2026 06:57
@github-actions github-actions bot mentioned this pull request Apr 16, 2026