fix: reject compressed wrapper raw trailers#905
Conversation
Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Performance BenchmarksCompared
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b4a44225d0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run.
Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run.
* fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames.
ec6c194
into
mdangelo/codex/archive-container-routing-audit
* fix: route misnamed compressed wrappers by header Detect gzip, bzip2, xz, lz4, and zlib wrappers from magic bytes even when the filename uses a misleading extension. Preserve existing joblib and R serialized scanner precedence, and add positive/negative routing regressions. * fix: reject compressed wrapper raw trailers (#905) * fix: reject compressed wrapper raw trailers Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions. * fix: bound compressed decompression probes Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run. * fix: bound zlib decompression probes Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run. * fix: harden lz4 wrapper decoding (#906) * fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames.
* fix: narrow legacy pickle stdlib policy Remove broad CRITICAL treatment for noisy stdlib modules in the root pickle scanner while preserving exact dangerous helper coverage and warning-level suspicious refs. * fix: preserve exact stdlib helper detections Keep logging configuration loaders and uuid subprocess getnode helpers as exact dangerous callables in both root and standalone pickle policies. * fix: flag uuid getnode pickle calls Keep uuid.getnode in the exact dangerous-call policy after narrowing broad uuid module severity, since it can dispatch to platform helper probes that invoke subprocess-backed paths. Cover both root PickleScanner and standalone picklescan policy regressions. * fix: route misnamed compressed wrappers by header (#904) * fix: route misnamed compressed wrappers by header Detect gzip, bzip2, xz, lz4, and zlib wrappers from magic bytes even when the filename uses a misleading extension. Preserve existing joblib and R serialized scanner precedence, and add positive/negative routing regressions. * fix: reject compressed wrapper raw trailers (#905) * fix: reject compressed wrapper raw trailers Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions. * fix: bound compressed decompression probes Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run. * fix: bound zlib decompression probes Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run. * fix: harden lz4 wrapper decoding (#906) * fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames.
* fix: add standalone-primary pickle migration mode * fix: isolate standalone pickle primary fallback state * fix: narrow legacy pickle stdlib policy (#903) * fix: narrow legacy pickle stdlib policy Remove broad CRITICAL treatment for noisy stdlib modules in the root pickle scanner while preserving exact dangerous helper coverage and warning-level suspicious refs. * fix: preserve exact stdlib helper detections Keep logging configuration loaders and uuid subprocess getnode helpers as exact dangerous callables in both root and standalone pickle policies. * fix: flag uuid getnode pickle calls Keep uuid.getnode in the exact dangerous-call policy after narrowing broad uuid module severity, since it can dispatch to platform helper probes that invoke subprocess-backed paths. Cover both root PickleScanner and standalone picklescan policy regressions. * fix: route misnamed compressed wrappers by header (#904) * fix: route misnamed compressed wrappers by header Detect gzip, bzip2, xz, lz4, and zlib wrappers from magic bytes even when the filename uses a misleading extension. Preserve existing joblib and R serialized scanner precedence, and add positive/negative routing regressions. * fix: reject compressed wrapper raw trailers (#905) * fix: reject compressed wrapper raw trailers Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions. * fix: bound compressed decompression probes Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run. * fix: bound zlib decompression probes Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run. * fix: harden lz4 wrapper decoding (#906) * fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames.
* fix: harden standalone pickle scanner * fix: fail closed on truncated pickle literal scans * fix: fail closed on oversized encoded pickle literals * fix: deep-freeze standalone pickle reports * fix: mark incomplete nested pickle scans inconclusive * fix: narrow standalone pickle wildcard globals * fix: preserve deep nested pickle findings * fix: bound nested pickle recursion resources * fix: add standalone-primary pickle migration mode (#902) * fix: add standalone-primary pickle migration mode * fix: isolate standalone pickle primary fallback state * fix: narrow legacy pickle stdlib policy (#903) * fix: narrow legacy pickle stdlib policy Remove broad CRITICAL treatment for noisy stdlib modules in the root pickle scanner while preserving exact dangerous helper coverage and warning-level suspicious refs. * fix: preserve exact stdlib helper detections Keep logging configuration loaders and uuid subprocess getnode helpers as exact dangerous callables in both root and standalone pickle policies. * fix: flag uuid getnode pickle calls Keep uuid.getnode in the exact dangerous-call policy after narrowing broad uuid module severity, since it can dispatch to platform helper probes that invoke subprocess-backed paths. Cover both root PickleScanner and standalone picklescan policy regressions. * fix: route misnamed compressed wrappers by header (#904) * fix: route misnamed compressed wrappers by header Detect gzip, bzip2, xz, lz4, and zlib wrappers from magic bytes even when the filename uses a misleading extension. Preserve existing joblib and R serialized scanner precedence, and add positive/negative routing regressions. * fix: reject compressed wrapper raw trailers (#905) * fix: reject compressed wrapper raw trailers Use bounded low-level bzip2 and xz decompression so valid first members cannot hide raw unscanned trailing bytes. Preserve concatenated valid-member support and add fail-closed regressions. * fix: bound compressed decompression probes Cap bzip2/xz per-call decompression max_length to a chunk-sized probe so a single compressed chunk cannot allocate up to the full remaining scan budget before limit checks run. * fix: bound zlib decompression probes Use the same chunk-sized decompression probe for zlib streams so per-call output is bounded before size and ratio limit checks run. * fix: harden lz4 wrapper decoding (#906) * fix: harden lz4 wrapper decoding Use the lz4 frame decompressor API through the bounded concatenated-stream loop so raw trailers fail closed and per-call output probes stay chunk-sized. Add fake-frame regressions for optional lz4 coverage. * fix: support legacy lz4 chunk decoding Fall back to lz4.frame decompression contexts when LZ4FrameDecompressor is unavailable while keeping bounded output probes and raw-trailer rejection. Add fallback-specific lz4 regressions for bounded reads, raw trailers, and concatenated frames. * docs: normalize unreleased changelog * fix: address pickle scanner review feedback * fix: surface nested pickle incomplete notices
Summary
This PR continues the picklescan/container audit stack with a fail-closed compressed-wrapper fix. Python bzip2 and xz readers can return the valid first decompressed member while silently consuming raw trailing bytes. For .pkl.bz2 and .pkl.xz wrappers, that could let a benign first pickle hide an unscanned malicious pickle trailer.
Root cause: CompressedScanner used high-level BZ2File and LZMAFile readers for standalone bzip2/xz wrappers. Those readers support convenient streaming, but they do not expose enough member-boundary state for us to distinguish valid concatenated members from raw trailing data. The new low-level loops also needed chunk-sized per-call output probes so configured decompression limits cannot still permit one oversized allocation.
Fix:
Validation
Full local result: 3376 passed, 75 skipped, 16 warnings.