
uuid(3|5) generation does not accept names which are not utf-8 decodable #94684

Closed
marla396 opened this issue Jul 8, 2022 · 7 comments

Comments

@marla396

marla396 commented Jul 8, 2022

From the documentation:

uuid.uuid3(namespace, name)
Generate a UUID based on the MD5 hash of a namespace identifier (which is a UUID) and a name (which is a string).

As far as I can tell from RFC 4122, name is not required to be a string, so there are UUIDs which we cannot generate. Is this a bug, or am I missing something in the RFC?
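The gap can be checked directly. The uuid3 source quoted later in the thread converts name with bytes(name, "utf-8"), so the bytes that get hashed are always valid UTF-8, and a byte string such as b"\xff" never is. A small sketch; is_utf8 is a hypothetical helper, not part of the uuid module:

```python
# uuid3 converts a str name with bytes(name, "utf-8") before hashing, so the
# hashed name bytes are always valid UTF-8.
raw_name = b"\xff"  # a single byte that is never valid UTF-8


def is_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Any str that uuid3 can hash produces valid UTF-8 bytes...
print(is_utf8("python.org".encode("utf-8")))  # True
# ...so there is no str whose hashed bytes would be exactly b"\xff".
print(is_utf8(raw_name))                      # False
```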

@yourlefthandman
Contributor

yourlefthandman commented Jul 8, 2022

** Not a UUID expert here **
I'm not sure I understand the issue here. Since the input is a string, it can be any sequence of Unicode characters, so I don't see any inputs that you are prevented from passing.
Can you give an example of a name / use case that you would like to pass as input but cannot?

Looking at the RFC examples:

For example, some name spaces are the domain name system, URLs, ISO Object IDs (OIDs), X.500 Distinguished Names (DNs), and reserved words in a programming language.

All of these examples have a string equivalent.
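For the name spaces the RFC lists, Python's uuid module provides predefined namespace UUIDs, and the names used with them are typically plain text. A short sketch; the sample names below are illustrative choices, not taken from the thread:

```python
import uuid

# RFC 4122 predefined namespaces as exposed by Python's uuid module, each
# paired with an illustrative name written as an ordinary str.
examples = {
    "DNS": (uuid.NAMESPACE_DNS, "python.org"),
    "URL": (uuid.NAMESPACE_URL, "https://www.python.org/"),
    "OID": (uuid.NAMESPACE_OID, "1.3.6.1"),
    "X.500": (uuid.NAMESPACE_X500, "CN=Example,O=Example Org,C=US"),
}

for label, (namespace, name) in examples.items():
    # uuid3 is deterministic: the same namespace and name always give the same UUID.
    u = uuid.uuid3(namespace, name)
    print(f"{label:6} {u} (version {u.version})")
```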

@MonadChains
Contributor

Looking at the source code of the function:

cpython/Lib/uuid.py

Lines 704 to 711 in da49128

```python
def uuid3(namespace, name):
    """Generate a UUID from the MD5 hash of a namespace UUID and a name."""
    from hashlib import md5
    digest = md5(
        namespace.bytes + bytes(name, "utf-8"),
        usedforsecurity=False
    ).digest()
    return UUID(bytes=digest[:16], version=3)
```

If you need to pass a bytes object as name, you can decode it into a str with bytes.decode("utf-8"), provided the bytes are valid UTF-8.
Maybe the function could be modified to also accept bytes objects as the name argument.
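That modification could look roughly like this. A minimal sketch, assuming only the hashing scheme visible in the uuid3 source above (version 3 hashes namespace.bytes plus the name bytes with MD5; version 5 does the same with SHA-1). The names uuid3_any, uuid5_any and _name_bytes are hypothetical and not part of the uuid module:

```python
import hashlib
import uuid


def _name_bytes(name) -> bytes:
    # Hypothetical helper: str names are encoded as UTF-8 (matching the
    # current uuid3/uuid5 behavior); bytes-like names are hashed as given.
    if isinstance(name, str):
        return name.encode("utf-8")
    if isinstance(name, (bytes, bytearray)):
        return bytes(name)
    raise TypeError(f"name must be str or bytes, not {type(name).__name__}")


def uuid3_any(namespace: uuid.UUID, name) -> uuid.UUID:
    """Name-based UUID (MD5, version 3) accepting str or bytes names."""
    digest = hashlib.md5(namespace.bytes + _name_bytes(name)).digest()
    return uuid.UUID(bytes=digest[:16], version=3)


def uuid5_any(namespace: uuid.UUID, name) -> uuid.UUID:
    """Name-based UUID (SHA-1, version 5) accepting str or bytes names."""
    digest = hashlib.sha1(namespace.bytes + _name_bytes(name)).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


# A name that is not valid UTF-8 can now be hashed directly.
print(uuid3_any(uuid.NAMESPACE_DNS, b"\xff"))
```

Encoding str names as UTF-8 keeps the existing results unchanged, so the sketch only adds behavior for bytes names.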

@marla396
Author

marla396 commented Jul 9, 2022

Sorry for the confusion; I should have posted a better example. But as @MonadChains points out, if name is a byte sequence that is not valid UTF-8, we cannot create a uuid3 for that particular name.

With the current implementation we can, for example, create a uuid3 with a namespace of arbitrary data. It seems reasonable that the same should apply to name.
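The asymmetry described here follows from the uuid3 source above: the namespace enters the hash as raw namespace.bytes, so it may hold any 16 bytes, while name must first pass through UTF-8. A sketch; the all-0xff namespace is an arbitrary example value:

```python
import uuid

# A namespace is just a UUID, and any 16 bytes form a valid UUID object,
# including bytes that are not valid UTF-8.
arbitrary_namespace = uuid.UUID(bytes=b"\xff" * 16)

# uuid3 hashes namespace.bytes directly, so no decoding is involved here.
u = uuid.uuid3(arbitrary_namespace, "example")
print(u.version)  # 3
```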

@yourlefthandman
Contributor

yourlefthandman commented Jul 9, 2022

I think there may be an issue with the definitions here, since name needs to be a string (i.e. Unicode). You may mean that some byte sequences are not valid UTF-8.
In the case you are mentioning, is name a byte sequence that cannot be decoded as UTF-8? For example, b'\xff'?
If so, you can decode the byte stream using a different encoding (such as latin-1, which as far as I know can decode any byte sequence), and that way you get a str to pass to the function. Does that solve your issue?
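The latin-1 route does always produce a str, but because uuid3 re-encodes that str with UTF-8 before hashing (per the source quoted earlier), the result is not the UUID you would get by hashing the original bytes. A quick check, using b'\xff' from the comment above and NAMESPACE_DNS as an arbitrary namespace:

```python
import hashlib
import uuid

namespace = uuid.NAMESPACE_DNS
raw_name = b"\xff"

# Latin-1 maps each byte 0x00-0xff to one code point, so decoding never fails.
as_text = raw_name.decode("latin-1")  # "\xff", i.e. U+00FF

# uuid3 re-encodes the str as UTF-8, which turns U+00FF into two bytes.
print(as_text.encode("utf-8"))  # b'\xc3\xbf'

via_latin1 = uuid.uuid3(namespace, as_text)

# What hashing the original raw bytes would give instead.
raw_digest = hashlib.md5(namespace.bytes + raw_name).digest()
via_raw_bytes = uuid.UUID(bytes=raw_digest[:16], version=3)

print(via_latin1 == via_raw_bytes)  # False
```

So the workaround yields a valid, deterministic UUID, just not the one RFC 4122 would assign to the raw byte name.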

@tiran
Member

tiran commented Jul 9, 2022

I don't see a specification of string in RFC 4122. That leads to the conclusion that the RFC treats string as a C string (bytes). IMHO it would be reasonable to support both str and bytes types for name here.

@marla396
Author

marla396 commented Jul 9, 2022

@yourlefthandman The problem isn't that it's unsolvable; one could just override the uuid3 function to do what you want. It's a matter of reporting a possible improvement.

JelleZijlstra pushed a commit that referenced this issue Mar 23, 2023
RFC 4122 does not specify that name should be a string, so for completeness the functions should also support a name given as a raw byte sequence.
@hauntsaninja
Contributor

Thanks, it looks like this was completed.

Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
…ython#94709)

warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…ython#94709)


5 participants