Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ip validation #108

Merged
merged 6 commits into from
Feb 2, 2018
Merged

Add ip validation #108

merged 6 commits into from
Feb 2, 2018

Conversation

remusao
Copy link
Collaborator

@remusao remusao commented Sep 10, 2017

This PR addresses #104. Changes:

  • Introduce a fast ip validation to check if a hostname is also an IP, and return consistent results.
  • Extend benchmark.js to show performance when a valid hostname is directly given as input (no URL parsing necessary) vs. when a URL is given (parsing necessary).

Here is a more detailed view of the current performance:

>>> -------------------- <<<
>>> Only valid hostnames <<<
>>> -------------------- <<<
While interpreting the results, keep in mind that each "op" reported by the benchmark is processing 16 domains
tldjs#isIp x 4,543,750 ops/sec ±0.97% (91 runs sampled)
tldjs#isValid x 528,309 ops/sec ±0.48% (95 runs sampled)
tldjs#extractHostname x 485,646 ops/sec ±1.42% (93 runs sampled)
tldjs#tldExists x 125,752 ops/sec ±0.81% (90 runs sampled)
tldjs#getPublicSuffix x 65,652 ops/sec ±1.25% (90 runs sampled)
tldjs#getDomain x 62,128 ops/sec ±0.56% (95 runs sampled)
tldjs#getSubdomain x 58,663 ops/sec ±0.90% (93 runs sampled)
tldjs#parse x 49,101 ops/sec ±0.96% (92 runs sampled)

>>> ----------- <<<
>>> Random URLs <<<
>>> ----------- <<<
While interpreting the results, keep in mind that each "op" reported by the benchmark is processing 16 domains
tldjs#isIp x 4,270,354 ops/sec ±1.30% (87 runs sampled)
tldjs#isValid x 1,389,816 ops/sec ±0.70% (92 runs sampled)
tldjs#extractHostname x 23,037 ops/sec ±1.26% (86 runs sampled)
tldjs#tldExists x 13,854 ops/sec ±0.88% (94 runs sampled)
tldjs#getPublicSuffix x 10,905 ops/sec ±1.00% (96 runs sampled)
tldjs#getDomain x 11,018 ops/sec ±0.58% (92 runs sampled)
tldjs#getSubdomain x 11,081 ops/sec ±1.08% (96 runs sampled)
tldjs#parse x 10,665 ops/sec ±0.58% (95 runs sampled)

@remusao
Copy link
Collaborator Author

remusao commented Sep 10, 2017

If this seems reasonable to you, I will also need to update the README with the extra fields in .parse's result.

@remusao
Copy link
Collaborator Author

remusao commented Sep 10, 2017

It seems that whatwg-url needs ES6 :/

@thom4parisot
Copy link
Owner

@remusao maybe the URL parser could be configured as you did for the hostname extraction. This way the strictness cost is taken only by people who need id.

@remusao
Copy link
Collaborator Author

remusao commented Sep 10, 2017

@oncletom Yes why not! Although technically speaking, the url parsing is done inside of extractHostname, so it's already configurable, but at a high level only. What do you think should be the default implementation, if we were to propose several options?

@remusao
Copy link
Collaborator Author

remusao commented Sep 10, 2017

I will revert the change on the parser and only do the ip validation for this PR. Then we can change the parsing in another PR later since that does seem to require a bit more work than expected!

@remusao remusao changed the title Switch to whatwg-url for url parsing + add ip validation Add ip validation Sep 11, 2017
@remusao
Copy link
Collaborator Author

remusao commented Sep 11, 2017

@oncletom Done, but it seems that is-ip needs ES6 as well... So, I was thinking, do we actually need to know if a hostname is a valid ip address? The goal is just to be consistent and not try to extract a suffix on something that is not a hostname. We could then just check if the hostname is probably an ip, which would be very fast. I think it's ok if we don't expose this in the public API of the library and just use it internally. This way we also avoid introducing any extra dependency.

@thom4parisot
Copy link
Owner

Agreed, I was thinking about it earlier for #107/#106 and did not feel the need to add extra values in the response which are not really related to the purpose of tldjs.

The IP can be checked by a third party module on the hostname field.

@remusao
Copy link
Collaborator Author

remusao commented Sep 11, 2017

@oncletom Done, let me know what you think.

test/tld.js Outdated
it('should return true on valid ip addresses', function () {
expect(tld.isIp('::1')).to.be(true);
expect(tld.isIp('2001:0db8:85a3:0000:0000:8a2e:0370:7334')).to.be(true);
expect(tld.isIp('192.168.0.1')).to.be(true);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make a difference if there is a port in the IP address?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. isIp will just check that some properties of the hostname that usually indicates an ip address are met. The port would be stripped during the parsing of the URL itself, before we check if the input is an ip so that should not be an issue in practice.

@remusao
Copy link
Collaborator Author

remusao commented Sep 19, 2017

@oncletom Is there any other change needed for this PR?

@remusao
Copy link
Collaborator Author

remusao commented Oct 26, 2017

@oncletom Sorry to bother, I updated the README + added a few comments. Let me know if you think it could be merged, or if it requires any change.

@thom4parisot thom4parisot merged commit ac2768d into thom4parisot:master Feb 2, 2018
@remusao remusao mentioned this pull request Mar 16, 2018
@remusao remusao deleted the handle-ip branch January 10, 2019 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants