Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-94684 uuid3/5 support name argument as bytes #94709

Merged

Conversation

MonadChains
Copy link
Contributor

@MonadChains MonadChains commented Jul 9, 2022

As discussed in #94684, RFC 4122 does not specify that the name argument in uuid3/uui5 functions should be a string, so for completness the functions should also support a name given as a raw byte sequence.

RFC 4122 does not specify that name should be a string, so for completness the functions should also support a name given as a raw byte sequence.
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@thehatmakesbling
Copy link

To amend the discussion from #94684, RFC 4122 does explicitly state in section 4.3 that "The concept of name and name space should be broadly construed, and not limited to textual names." (emphasis added) So this makes the current implementation non-compliant.

The documentation also explicitly states that the X.500 DN name space may be used with DER-encoded names, which are also byte sequences rather than strings, making the implementation inconsistent with the documentation.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs a new test.

@@ -186,7 +186,7 @@ The :mod:`uuid` module defines the following functions:
.. function:: uuid3(namespace, name)

Generate a UUID based on the MD5 hash of a namespace identifier (which is a
UUID) and a name (which is a string).
UUID) and a name (which is a byte sequence or a UTF-8-encoded string).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UUID) and a name (which is a byte sequence or a UTF-8-encoded string).
UUID) and a name (which is a byte sequence or a string).

Strings aren't encoded in any encoding, they are sequences of characters.

Also, I'd prefer to be precise and say "bytes object" (linking to :class:bytes) instead of the vague term "byte sequence".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to be explicit that the string would then be UTF-8 encoded, if provided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed "bytes sequence" to "bytes object" (with link). For the second part, I removed the word "encoded" and left "UTF-8", is it a valid compromise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"UTF-8 string" is still not a meaningful statement. Strings in Python aren't UTF-8 or ASCII or any other specific encoding; they are a sequence of abstract characters. We can say something like "a string which will be encoded using UTF-8".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explaination. I've updated the lines with your suggestion.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@MonadChains
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@JelleZijlstra: please review the changes made to this pull request.

@thehatmakesbling
Copy link

Is there anything we can do to move this forward? Release 3.11 has happened and 3.12 is about to happen since the test and documentation changes were completed, and it looks like it hasn't been pulled in yet. This bug is a regression from Python 2.

@JelleZijlstra JelleZijlstra merged commit 413b7db into python:main Mar 23, 2023
sobolevn added a commit to python/typeshed that referenced this pull request Mar 24, 2023
This PR python/cpython#94709 allowed `bytes` to be passed as the second argument on 3.12+
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this pull request Mar 27, 2023
…ython#94709)

RFC 4122 does not specify that name should be a string, so for completness the functions should also support a name given as a raw byte sequence.
warsaw pushed a commit to warsaw/cpython that referenced this pull request Apr 11, 2023
…ython#94709)

RFC 4122 does not specify that name should be a string, so for completness the functions should also support a name given as a raw byte sequence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants