-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base64 throws 'incorrect padding' exception when the issue is NOT with the padding #77951
Comments
All base64 decoding methods fail to decode a valid base64 string, throwing 'incorrect padding' regardless of the string padding. Here's an example: >>> base64.urlsafe_b64decode('AQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ===')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/apps/python/3.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode
return b64decode(s)
File "/export/apps/python/3.6/lib/python3.6/base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding The same string gets decoded without any issues using Perl's MIME::Base64 module or Java. So far Python's base64 module is the only one that fails to decode it. |
This doesn't look like a valid base64 string to me: the padding (if present) at the end of the string should be either "=" or "==", never "===". Is the length of the decoded string equal to 58? If so, what's the last character of that decoded string? Whatever it is, it should end up being encoded as "xx==" (for some values of "x"), not as "Q===". |
It doesn’t matter how many “=“ characters are appended - it’s always throwing the same exception :( |
Apologies: apparently this particular string was given with one character missing in the beginning - should be "AAQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ". In this case, the problem is the exception itself: it's not an "incorrect padding" issue. |
I always assumed that any string composed of valid base64 characters could be decoded to *something* if you added some padding characters, but apparently that was an invalid assumption. I agree that the message is incorrect for this case. |
Ideally there needs to be an option to ignore non-fatal errors. Indeed, pretty much any string should be base64-decodeable. Take a look at Perl's MIME::Base64 - it can decode pretty much any random string (so long as it only contains valid base64 characters). |
A base64-encoded string's length can only have a remainder of 0, 2 or 3, but never 1. This is why the padding you get when encoding can only be '=' or '==' but never '==='. |
Oops: when I wrote "length can only have a remainder of ..." I meant remainder upon division by 4. |
I'm not sure that "Invalid length" is better than "Invalid padding". Doesn't it mean the same? |
They're not the same. When the encoded string's length modulu 4 is 2 or 3, there just need to be (at least) 2 or 1 padding characters ('=') for decoding to be successful, due to our decoder being rather strict. Less strict decoders may ignore the missing padding and successfully decode the encoded string. When the remainder is 0, no padding is needed and everything is fine. When the remainder is 1, the encoded string is simply invalid. It is not a padding issue. There is no valid way to decode the encoded string. |
I think something like “invalid length” message does make more sense in this case than the misleading “incorrect padding”. It would also be great if there was a way to force it to try its best at decoding the string. :) |
I agree, but that would be a separate enhancement PR. The email library would use that capability if it existed...currently it adds padding in a loop trying to do the decode if it gets the invalid padding message. Which of course is going to fail for this case. |
Would an exception message as following be acceptable? "Invalid base64-encoded string: length cannot be 1 more than a multiple of 4" |
@taleinat - yes, that does look much better! |
Is the new message a clarification, in which case we would typically not backport, or a correction, which we usually would? |
It is more a correction than a clarification. After looking through the module some more, I'm considering using binascii.Incomplete for this case, since it seems appropriate enough that it's worth using it rather than adding another, separate exception. |
I think that would move it even more into the realm of a bugfix, then, since code that cared about specific binascii exceptions could be looking for that one already. |
Code using only a2b_base64() would likely not be expecting this exception. We have at least one such example in the stdlib: decode_b() in Lib/email/_encoded_words.py (which needs to be fixed regardless; see bpo-27397). |
PR is ready with an improved exception message and raising binascii.Incomplete rather than binascii.Error. A final review would be welcome! |
I backported Tal's fix for 3.7.0rc1. I am less certain about backporting to 3.6 and 2.7 at this stage of their lives but I don't have a strong feeling about it so I'll leave the issue open for that. |
The change is not entirely backward-compatible, so not back-porting before 3.7 seems good to me. IMO this should be closed. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: