-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-94684 uuid3/5 support name argument as bytes #94709
gh-94684 uuid3/5 support name argument as bytes #94709
Conversation
RFC 4122 does not specify that name should be a string, so for completness the functions should also support a name given as a raw byte sequence.
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
To amend the discussion from #94684, RFC 4122 does explicitly state in section 4.3 that "The concept of name and name space should be broadly construed, and not limited to textual names." (emphasis added) So this makes the current implementation non-compliant. The documentation also explicitly states that the X.500 DN name space may be used with DER-encoded names, which are also byte sequences rather than strings, making the implementation inconsistent with the documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs a new test.
Doc/library/uuid.rst
Outdated
@@ -186,7 +186,7 @@ The :mod:`uuid` module defines the following functions: | |||
.. function:: uuid3(namespace, name) | |||
|
|||
Generate a UUID based on the MD5 hash of a namespace identifier (which is a | |||
UUID) and a name (which is a string). | |||
UUID) and a name (which is a byte sequence or a UTF-8-encoded string). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UUID) and a name (which is a byte sequence or a UTF-8-encoded string). | |
UUID) and a name (which is a byte sequence or a string). |
Strings aren't encoded in any encoding, they are sequences of characters.
Also, I'd prefer to be precise and say "bytes object" (linking to :class:bytes
) instead of the vague term "byte sequence".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to be explicit that the string would then be UTF-8 encoded, if provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed "bytes sequence" to "bytes object" (with link). For the second part, I removed the word "encoded" and left "UTF-8", is it a valid compromise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"UTF-8 string" is still not a meaningful statement. Strings in Python aren't UTF-8 or ASCII or any other specific encoding; they are a sequence of abstract characters. We can say something like "a string which will be encoded using UTF-8".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explaination. I've updated the lines with your suggestion.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I have made the requested changes; please review again |
Thanks for making the requested changes! @JelleZijlstra: please review the changes made to this pull request. |
Misc/NEWS.d/next/Library/2022-07-09-13-07-30.gh-issue-94684.nV5yno.rst
Outdated
Show resolved
Hide resolved
Is there anything we can do to move this forward? Release 3.11 has happened and 3.12 is about to happen since the test and documentation changes were completed, and it looks like it hasn't been pulled in yet. This bug is a regression from Python 2. |
This PR python/cpython#94709 allowed `bytes` to be passed as the second argument on 3.12+
…ython#94709) RFC 4122 does not specify that name should be a string, so for completness the functions should also support a name given as a raw byte sequence.
…ython#94709) RFC 4122 does not specify that name should be a string, so for completness the functions should also support a name given as a raw byte sequence.
As discussed in #94684, RFC 4122 does not specify that the name argument in uuid3/uui5 functions should be a string, so for completness the functions should also support a name given as a raw byte sequence.