Excess Base64 data ignored after padding by default #145264

@serhiy-storchaka

Bug report

After adding the ignorechars parameter for the Base64 decoder (see #144001), decoding in non-strict mode is almost equivalent to decoding with ignorechars including all characters, except for one detail: in non-strict mode the first valid padding stops decoding, and any following data is silently ignored. This leads to issues like #137687.

This contradicts RFC 4648, section 3.3, which only allows ignoring the pad character if it is present before the end of the encoded data:

Furthermore, such specifications MAY ignore the pad
character, "=", treating it as non-alphabet data, if it is present
before the end of the encoded data. If more than the allowed number
of pad characters is found at the end of the string (e.g., a base 64
string terminated with "==="), the excess pad characters MAY also be
ignored.

b'YW==Jj' and b'YWJ=j' should be decoded to b'abc', not to b'a' or b'ab'.
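
To make the expected result concrete, here is a minimal sketch of the lenient decoding described in the RFC quote: the pad character is treated as non-alphabet data and dropped wherever it appears, and the remaining symbols are decoded. The helper name `lenient_b64decode` is hypothetical and not part of the `base64` module; it only illustrates the outcome argued for above.

```python
import base64

# Standard Base64 alphabet (RFC 4648, table 1), without the pad character.
_ALPHABET = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def lenient_b64decode(data: bytes) -> bytes:
    """Hypothetical helper: ignore every non-alphabet byte, including '='."""
    # Keep only alphabet symbols; '=' is dropped like any other foreign byte.
    symbols = bytes(c for c in data if c in _ALPHABET)
    # Restore the padding that the standard decoder expects at the end.
    symbols += b"=" * (-len(symbols) % 4)
    # Input with a single leftover symbol (length % 4 == 1) is still invalid
    # and is left for base64.b64decode to reject.
    return base64.b64decode(symbols)


print(lenient_b64decode(b"YW==Jj"))
print(lenient_b64decode(b"YWJ=j"))
```

Both calls print `b'abc'`, whereas the issue reports that the default non-strict decoder stops at the first valid padding and returns `b'a'` and `b'ab'` respectively.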

So, how are we going to fix this issue? We can simply change the default behavior: this may be a breaking change, but it is a bugfix that breaks only incorrect behavior. We can start a long process of emitting a FutureWarning and then change the behavior a few releases later. Or we can add a new option to alter the behavior and start emitting a FutureWarning by default.
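
As a rough illustration of the FutureWarning option, the sketch below wraps the existing decoder and warns when the current result differs from what RFC 4648 lenient decoding would produce. It is written at the Python level purely for discussion; the name `b64decode_with_warning` and the warning text are assumptions, and an actual change would presumably live in the decoder itself.

```python
import base64
import warnings

# Standard Base64 alphabet without the pad character.
_ALPHABET = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def b64decode_with_warning(data: bytes) -> bytes:
    """Hypothetical wrapper: keep the current result, warn if it would change."""
    current = base64.b64decode(data)  # whatever the running Python does today
    # Compute the RFC-style lenient result: drop all non-alphabet bytes
    # (including '='), then re-pad to a multiple of four symbols.
    symbols = bytes(c for c in data if c in _ALPHABET)
    symbols += b"=" * (-len(symbols) % 4)
    try:
        lenient = base64.b64decode(symbols)
    except ValueError:  # binascii.Error is a ValueError subclass
        return current
    if lenient != current:
        warnings.warn(
            "data after Base64 padding is currently ignored; "
            "this may change in a future release",
            FutureWarning,
            stacklevel=2,
        )
    return current
```

Well-formed input such as `b"YWJj"` decodes without any warning; only input whose lenient and current results differ would trigger one.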

In 3.15+ we can simply pass the ignorechars argument to enable RFC 4648 compliant lenient behavior. The question is about the default behavior.


Labels: 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), type-bug (an unexpected behavior, bug, or error)
