fix: route disguised nested archives in sevenzip scans#1017
fix: route disguised nested archives in sevenzip scans#1017mldangelo-oai merged 2 commits intomainfrom
Conversation
Performance BenchmarksCompared
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a311bcb34c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _probe_extensionless_members(self, archive: Any, file_names: list[str]) -> dict[str, str | None]: | ||
| """Probe disguised members while stopping each extraction at the header budget.""" | ||
| return {file_name: self._member_probe_detected_format(archive, file_name) for file_name in file_names} |
There was a problem hiding this comment.
Restore single-pass probe extraction for candidate members
_probe_extensionless_members() now iterates members and calls _member_probe_detected_format() per file, and each call performs archive.extract(targets=[...]) then archive.reset(). On solid 7z archives, this restarts decompression from the beginning for each probe, turning probing into repeated work (up to 100 probes) and enabling severe scan slowdowns/timeouts on crafted inputs.
Useful? React with 👍 / 👎.
a311bcb to
de6ad24
Compare
Summary
Teach the 7z scanner to probe likely disguised members for bounded, directly routable header formats instead of checking only for nested 7z magic. This lets nested ZIP and TAR payloads without trustworthy suffixes become scannable, and keeps oversized disguised members in the explicit inconclusive path.
Security impact
Before this change, the 7z scanner only treated probed members as nested content when the first bytes matched 7z magic. A hidden ZIP or TAR inside a
.7zfile with no extension or a misleading suffix could be skipped entirely, which meant the nested scanner dispatch never ran. The new probe keeps the same budgeted workflow, but it recognizes directly routable header formats from a bounded prefix, including TAR headers at the standard offset.Validation
uv run ruff format modelaudit/scanners/sevenzip_scanner.py tests/scanners/test_sevenzip_scanner.pyPROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_sevenzip_scanner.py -quv run mypy modelaudit/scanners/sevenzip_scanner.py tests/scanners/test_sevenzip_scanner.py