Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't decode bytes (which may not be UTF8) when displaying SecretBytes #8012

Merged
merged 1 commit into from Nov 6, 2023

Conversation

alexmojaki
Copy link
Contributor

@alexmojaki alexmojaki commented Nov 4, 2023

Change Summary

Removes unneeded .decode().

I couldn't find any justification for decoding the bytes. The code was introduced in this commit in #5674 and I don't see any PR comments explaining it. @adriangb any idea?

There are some encodings where a non-empty byte string can decode to any empty string:

>>> ''.encode('utf16')
b'\xff\xfe'
>>> ''.encode('utf16').decode('utf16')
''
>>> ''.encode('utf16').decode('utf16') == ''
True

but .decode() without arguments should always use UTF8 in Python 3 regardless of locale settings.

I've checked that any non-empty byte string with up to 3 bytes decodes to a non-empty string if it can be decoded at all:

import itertools

count = 0
total = 0
for length in [1, 2, 3]:
    for b in itertools.product(range(256), repeat=length):
        total += 1
        b = bytes(b)
        assert b
        assert len(b) == length > 0
        try:
            s = b.decode()
        except UnicodeDecodeError:
            continue
        assert s
        assert len(s) > 0
        assert b == s.encode()
        count += 1

print(count)  # 2668544
print(total)  # 16843008

I also think that any non-empty byte string should be treated as non-empty when displaying even if it can be decoded to an empty unicode string. Although I'm also not sure about revealing whether any secret value is empty.

Related issue number

Closes #7971

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @lig

@alexmojaki alexmojaki marked this pull request as ready for review November 4, 2023 19:51
@alexmojaki
Copy link
Contributor Author

please review

@alexmojaki
Copy link
Contributor Author

I couldn't find any justification for decoding the bytes. The code was introduced in this commit in #5674 and I don't see any PR comments explaining it. @adriangb any idea?

Pinging @adriangb because I wrote the above in an edit and I don't know if that triggers a notification.

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t see any problem with this. In not sure why I did that in that commit, may have been carryover from moving files or something. I don’t see the decode() call in v1

@adriangb
Copy link
Member

adriangb commented Nov 4, 2023

Mind adding a small release note as a bug fix?

@alexmojaki
Copy link
Contributor Author

Mind adding a small release note as a bug fix?

Where would I add that? I can't see anything in https://docs.pydantic.dev/latest/contributing/ and adding to HISTORY.md doesn't look right.

@adriangb adriangb added the relnotes-fix Used for bugfixes. label Nov 4, 2023
@adriangb adriangb merged commit 833faaf into pydantic:main Nov 6, 2023
59 of 61 checks passed
@adriangb
Copy link
Member

adriangb commented Nov 6, 2023

I guess we just use a tag on the PR now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'utf-8' codec can't decode byte error when serializing SecretBytes object
3 participants