Streaming filter decompression #3429

@jakiki6

Explanation

I've recently been experimenting with stacked stream filters and realized that you can build enormous zip bombs by nesting FlateDecode filters, as in this file: bomb.pdf
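As a rough illustration of why nesting matters (this is not how the attached file was built, and `nest_deflate` is a hypothetical helper name), the sketch below applies zlib compression twice to a buffer of zero bytes and records the size after each layer. It only assumes Python's standard `zlib` module.

```python
import zlib


def nest_deflate(payload: bytes, layers: int) -> list[int]:
    """Return the payload size followed by the size after each zlib layer."""
    sizes = [len(payload)]
    data = payload
    for _ in range(layers):
        # Each layer compresses the output of the previous one.
        data = zlib.compress(data, 9)
        sizes.append(len(data))
    return sizes


# 10 MB of zero bytes, compressed twice (hypothetical demo values).
sizes = nest_deflate(b"\0" * 10_000_000, layers=2)
print(sizes)
```

The first layer's output is itself highly repetitive, so the second layer shrinks it again. A reader that undoes both layers eagerly has to materialize the full original buffer.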

The file isn't actually a valid PDF, and most PDF parsers I tested recognize that without decompressing the whole content first. pypdf, however, tries to decompress the entire stream before parsing it. That normally works, but the file I provided unpacks to over 1 PB of zero bytes.

A fix for this would be to decompress in a streaming fashion, since zlib lets you process decompressed data before the whole stream has been decompressed.

I'm pretty sure this would require significant changes to the decompression logic, but it could also be seen as a security flaw, since parsing a small untrusted PDF file could lead to a denial of service (DoS).

I'm not sure what your policy is for these kinds of issues, but I wanted to report it in case someone wants to fix it.
