-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add URI Validation #457
Add URI Validation #457
Conversation
… validating some URIs. Add the error text for URI validation.
My preference would be for a single, well documented regex we can just copy into joi, over adding another dependency. isemail is an exception given the complexity of that workflow. |
Could you point me in the direction of what you are looking for in terms of documentation for a regular expression? The reason why I made it into its own module is because I felt it was more easily manageable to break it up in the module based on how the RFC defines a URI since the end result is a > 2100 character Regular Expression. |
If you can condense it more so that it looks like the ABNF block, it would help readability. I generally just use references to the RFC instead of big blocks of test. You just need to play with it and see if you can come up with something more compact. It's just very long right now. |
Gotcha. I can do that pretty easily, as I already have it in ABNF just spread out. Yeah, I understand it is long the specification itself is quite specific (as it should be) and accounting for that specificity can get pretty hairy. Even though it is long it is still quite fast, here are some benchmark numbers that I have on it. > Benchmarking complex-ipv4.js
Testing URI "http://asdf:qw%20er@127.0.0.1:8000?asdf=12345&asda=fc%2F#bacon"
isUri#test(uri) x 1,916,525 ops/sec ±0.22% (196 runs sampled)
> Benchmarking complex-ipv6.js
Testing URI "http://asdf:qw%20er@[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]:8000?asdf=12345&asda=fc%2F#bacon"
isUri#test(uri) x 1,367,745 ops/sec ±0.18% (196 runs sampled)
> Benchmarking complex-ipvFuture.js
Testing URI "http://asdf:qw%20er@[v1.09azAZ-._~!$&'()*+,;=:]:8000?asdf=12345&asda=fc%2F#bacon"
isUri#test(uri) x 1,485,227 ops/sec ±0.19% (195 runs sampled)
> Benchmarking complex-uri.js
Testing URI "http://asdf:qw%20er@localhost:8000?asdf=12345&asda=fc%2F#bacon"
isUri#test(uri) x 2,100,513 ops/sec ±0.18% (197 runs sampled)
> Benchmarking simple-uri.js
Testing URI "mailto:John.Doe@example.com"
isUri#test(uri) x 2,314,868 ops/sec ±0.19% (195 runs sampled) |
…oved the library. Added documentation for the regular expression and a few additional tests.
@Marsup and @hueniverse It took me a little bit to get to this, but I pulled in the regular expression and added some documentation to describe each of the blocks and what they do. The most complex piece of the Regex is the part dealing with IP addresses as the formats that are valid are more complex. |
Didn't @hueniverse have in mind a variable decomposition of that ? |
It seems like I didn't understand correctly. If you take a look at the library I already put together I think it documents the pieces as you are describing it. Take the host for example, it is described in the same style as the ABNF: /**
* Host (http://tools.ietf.org/html/rfc3986#page-18)
*
* host = IP-literal / IPv4address / reg-name
*
* @type {string}
*/
var host = '(' + IPLiteral + or + IPv4address + or + regName + ')'; Is that more what you are looking for just where all of that is pulled into |
@hueniverse can you check if that's what you had in mind ? |
Almost. The regex construction needs to move out into an internals value (no need to construct that every time). I also think that 99% of use cases care about a specific scheme or schemes like http and https. You usually don't want a mailto or ftp links in a form asking for a home page. So I think there should be an optional argument taking a single or list of schemes. |
V8 usually does regex caching but yeah that makes sense. I agree on the scheme part as well. |
Thanks @hueniverse and @Marsup I'll pull in the checks with the accompanying ABNF descriptions. Providing the ability to specify schemes makes a lot of sense, I'll go ahead and add that as well. It will just mean that when someone specifies a scheme a new Regex will be compiled since it's being created dynamically. If you guys know something I don't about that just let me know :) |
…e more to the ABNF order instead of prematurely grouping. Reduced number of capture groups.
Don't know about performance but splitting the uri string on |
Thanks @hueniverse I'll do a benchmark test on it. During initial development Regex proved to be faster, but that was tested with a less complex Regex. I definitely want to go through and re-evaluate a little to see if I can simplify/gain processing velocity. |
This would be really useful. I'm currently trying to validate URL's sent to me in query parameters. |
@@ -295,5 +316,181 @@ internals.String.prototype.trim = function () { | |||
return obj; | |||
}; | |||
|
|||
function createUriRegex(optionalScheme) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please attach this to internals as well. Also we put a space after function names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
return null; | ||
} | ||
|
||
return Errors.create('string.uri', null, state, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be worth having 2 messages depending on whether we have schemes so that the error is more explicit on the expectations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I added one which includes the scheme that was passed. So providing http
and https?
would yield something similar to: must be a valid uri with a scheme matching the http|https? pattern
…dd the create Uri Regex method to internals (and call it from there). Fix spacing with Function and parameters (add space)
…String (which will be automatically escaped), or an Array of String/RegExp which work the same way as schemes.
…tional information.
Conflicts: README.md
Went ahead and caught this up to master. I think I just have the documentation left on it. Other than that I really want to see Eran's comment on benchmarks. I did them during the original development and saw more performance from RegExp than from String parsing/splitting but I'm interested in seeing where they land now with the additional complexity in the RegExp. |
I got the documentation setup and now I'm moving on to the benchmarking. I put together a simple project which runs the benchmarks on Travis for me with Node 0.10 (latest) and Node 0.11 (latest) through a few URI scenarios. I'm going to go through and add some different ways that I can think of parsing this and see how they benchmark. |
I went through a few iterations and was able to get a little more than double the performance by switching over the regular expression groups to non-capturing groups. Here's an example of the difference (the first one uses capturing the groups, the second does not): Testing URI "http://mail.google.com/"
createUriRegex()#test(uri) x 1,534,151 ops/sec ±0.73% (96 runs sampled)
createUriRegexImproved()#test(uri) x 3,754,082 ops/sec ±0.99% (92 runs sampled)
createUriRegex('https?')#test(uri) x 1,710,824 ops/sec ±0.41% (93 runs sampled)
createUriRegexImproved('https?')#test(uri) x 3,424,541 ops/sec ±1.20% (95 runs sampled)
Testing URI "http://asdf:qw%20er@mail.google.com"
createUriRegex()#test(uri) x 1,672,742 ops/sec ±0.48% (96 runs sampled)
createUriRegexImproved()#test(uri) x 4,623,601 ops/sec ±1.15% (94 runs sampled)
createUriRegex('https?')#test(uri) x 1,928,301 ops/sec ±0.64% (96 runs sampled)
createUriRegexImproved('https?')#test(uri) x 4,949,729 ops/sec ±0.62% (90 runs sampled)
Testing URI "http://asdf:qw%20er@mail.google.com:8000"
createUriRegex()#test(uri) x 1,634,704 ops/sec ±0.64% (98 runs sampled)
createUriRegexImproved()#test(uri) x 4,438,129 ops/sec ±0.69% (93 runs sampled)
createUriRegex('https?')#test(uri) x 1,870,456 ops/sec ±0.51% (96 runs sampled)
createUriRegexImproved('https?')#test(uri) x 4,504,974 ops/sec ±1.40% (92 runs sampled) |
if (uriOptions.scheme) { | ||
Hoek.assert(uriOptions.scheme instanceof RegExp || typeof uriOptions.scheme === 'string' || Array.isArray(uriOptions.scheme), 'scheme must be a RegExp, String, or Array'); | ||
// If someone wants to match HTTP or HTTPS for example then we need to support both RegExp and String so we don't escape their pattern unknowingly. | ||
if (uriOptions.scheme instanceof RegExp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put both of these cases into an array and have a single parameter pre-processing path ? Trying to keep it DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. this changed the error messages a little bit, so now they will mention "at position 0" when only a single String is passed, but I think that's okay.
Didn't think of non-capturing groups, indeed makes sense, good job ! |
…always treated as an array and cast to an array if it is not currently.
All your feedback should be implemented now. I also added some Positive and Negative tests that I pulled from Node's URL parser tests to try to find anything not working like it should. But things seem good. |
I've fixed a few things and squashed it, I'll release soon, thanks a lot ! |
Good work guys! Really looking forward to joi v6 😄 |
This thread has been automatically locked due to inactivity. Please open a new issue for related bugs or questions following the new issue template instructions. |
Hi there! Just started using this library and really enjoying it. Came up with a use case where I need to validate some URIs and noticed there were a few Pulls that came up but they were missing functionality. I went ahead and created a new library isUri which implements the full RFC 3986 in a build-up to a regular expression which it then uses to validate the URIs.
I added accompanying tests (similar to how email is tested currently) and also the error message as needed. I included all of the URLs that I've gathered thus far for testing my library as part of the tests, if there are any additional I'd love to add them to both implementations.