Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation regex doesn't validate against RFC #172

Closed
Garbee opened this issue Jun 21, 2017 · 12 comments
Closed

Validation regex doesn't validate against RFC #172

Garbee opened this issue Jun 21, 2017 · 12 comments
Labels
contact after release Respond to issue after functionality is released. enhancement
Milestone

Comments

@Garbee
Copy link

Garbee commented Jun 21, 2017

The validation regex was changed in commit 77c9c15 to let invalid UUIDs through. I'm not seeing in the RFC where this is allowed.

The version number is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field).

From section 4.1.3 which also defines only [1-5] as valid versions. Therefore, the library currently isn't validating UUIDs properly.

Attempting to make the proper reversal on the regex, yields a handful of broken tests. For example tests/Codec/GuidStringCodecTest.php uses the following array for the fields:

[
    'time_low' => '12345678',
    'time_mid' => '1234',
    'time_hi_and_version' => 'abcd',
    'clock_seq_hi_and_reserved' => 'ab',
    'clock_seq_low' => 'ef',
    'node' => '1234abcd4321'
]

The time_hi_and_version here is peculiar, as it doesn't have a valid version. Of course the tests here will fail. However, I'm not sure why the regex was changed originally.

What was the intended action by making the validation less strict to allow non-RFC valid UUIDs?

We should at least have the original regex as a constant if the current isn't going to change. Something like RFC_VALID_PATTERN. However, if the package is for UUIDs it should work by default strictly against the RFC first with opt-in mechanisms to looser restrictions.

@ramsey
Copy link
Owner

ramsey commented Jun 21, 2017

The regex was changed to allow any GUID/UUID to be validated. RFC 4122 is a specific variant of UUID. Others may exist, so the regex was relaxed to allow others to be parsed. In these cases, the getVariant() and getVersion() methods may be used to determine that it’s not an RFC 4122 UUID. Maybe a good idea would be for the library to add a validator that fails if the string is not a valid RFC 4122 UUID.

@ramsey
Copy link
Owner

ramsey commented Jun 21, 2017

Sorry. Your last paragraph suggests what I said. :-)

I’ll add this to the list for version 4.

@ramsey ramsey added this to the Version 4.0.0 milestone Jun 21, 2017
@Garbee
Copy link
Author

Garbee commented Jun 21, 2017

Cool thank you! I was just surprised that it wasn't strictly validating how I expected. I understand others may exist, but IMO (which ofc when talking about strictness is always debateable) strict should be default with a loose opt-in. Which currently there is no option to even opt-in to strictness.

@ramsey
Copy link
Owner

ramsey commented Jun 21, 2017

I'll be introducing some BC breaks in version 4, so that’ll be a good time to introduce strictness here. I’ll just need to come up with an upgrade path that I can use to deprecate existing functionality in an upcoming 3.x release.

@Garbee
Copy link
Author

Garbee commented Jun 21, 2017

I don't think the current method really needs to be deprecated outright. There may be some very valid use-case for it that people need. V4 would look like this (with regards to this issue) in my mind:

  • Current constant is changed to validate the version number as well.
  • New constant is added for a "Non-strict" validation
  • Something is put into place (possibly a static boolean property) to toggle strict validation. Defaulting to true.
  • The previous something is checked against in the isValid method to pick which constant to use for the regex check.

That way when you upgrade if you don't need the loose checking, v4 "just works" for you. While if you relied upon the loose checking for some reason, it's easy to turn on in your application.

@jmauerhan
Copy link
Contributor

Hi @Garbee! I noticed this because I wrote the test you referenced that broke when you tried to make the changes. As far as I can tell "12345678-1234-abcd-abef-1234abcd4321" is a valid UUID, it just also happens to be easy to type and remember. I did a lot of research when I worked on those but (a) it's been a while and (b) I would never claim I understood everything I read :-P

However, after reviewing it again it sounds like basically since the time_hi_and_version should contain the version number, it should always have at least one number in it? (the first?) What will happen if they come up with 10 versions?

Stupid question - can someone explain to me what "most significant" means here?

The version number is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field).

@Garbee
Copy link
Author

Garbee commented Jun 25, 2017

it should always have at least one number in it? (the first?)

Yes and Yes.

What will happen if they come up with 10 versions?

Then I'd assume they'd figure that out then. It's hardly an issue though considering how many UUID possibilities there are. It would be a long time before any more official UUID version are necessary.

@Garbee
Copy link
Author

Garbee commented Jun 25, 2017

On the MSB part I believe this is what it is meaning:

uuid-bit-diagram

With this diagram in mind the spec says the MSB of the time_hi_and_version field. Most Significant Bit is simply saying that the data will "arrive" first. This would mean it is placed first in the bytes for that section of the UUID. In this case, since it is the version number it essentially saying the diagram would be version_and_time_hi if it were to be written as things are ordered.

Knowing this it also helps answer the "what about more than 10 versions" question. As 4 bits has 16 total possible values, simple hex. Which is where a can technically fit the bits needed since that is equal to what would be version 10. However, 10 doesn't exist so... It's incorrect by the current RFC. But if needed they could create different UUID schema's all the way from 6-17 if they needed various forms. Since it would be a simple hexadecimal view of the data at that point. So while a currently does fit the bit requirements, it is not a technically valid UUID version (10) by any RFC specification. In the future 6-17 (so 6-9, a-f) could be valid versions. However, those are capable of being created today by the loose validation, which means in the future if those versions get specified, anything made with the current system could be invalid and cause breakage in the future.

@ramsey
Copy link
Owner

ramsey commented Jun 25, 2017

The new RFC 4122 validation should also account for the variant bits, as well as the version we've been discussing. According to the RFC,

Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.

This will set the variant properly in the UUID (see section 4.1.1).

For validation purposes, in hexadecimal format, this position will always be 8, 9, a, or b (I think). If it’s not one of these, then it’s not an RFC 4122 UUID.

More reference: https://speakerdeck.com/ramsey/identify-all-the-things-with-uuids-true-north-php-2016?slide=20

@jmauerhan
Copy link
Contributor

jmauerhan commented Jun 26, 2017

@Garbee Thanks for the very detailed explanation!

@ramsey do you want me to update the tests with these rules for the version and variant? Or this will only apply to the 4.0 updates?

@ramsey
Copy link
Owner

ramsey commented Jun 26, 2017

@jmauerhan We don't have any code for the validation rules, yet, but if you want to change the existing tests to account for it in the future, that'd be fine.

@ramsey ramsey added the contact after release Respond to issue after functionality is released. label Jan 12, 2020
@ramsey
Copy link
Owner

ramsey commented Jan 22, 2020

Please see version 4.0.0-alpha1. This version includes a validator for RFC 4122 variant UUIDs. It is not the default validator, but you may swap out default validator used by the factory with this one, if you like.

$validator = new \Ramsey\Uuid\Rfc4122\Validator();

if ($validator->validate('8ed543d7-a6ad-4746-8c11-7b27437c0d38')) {
    echo 'The UUID is valid!';
}

@ramsey ramsey closed this as completed Jan 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contact after release Respond to issue after functionality is released. enhancement
Projects
None yet
Development

No branches or pull requests

3 participants