Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL validator invalid for IPs and localhost #2130

Closed
GiantCrocodile opened this issue Sep 19, 2019 · 31 comments

Comments

@GiantCrocodile
Copy link

@GiantCrocodile GiantCrocodile commented Sep 19, 2019

Describe the bug
The URL validator is partially broken. A field with URL validation in a structure returns false for the provided examples.

Expected behavior
These two entries should be valid URLs (with and without httpS):

  1. https://127.0.0.1/kirby/panel/pages/blog+vvvv
  2. https://localhost/kirby/panel/pages/blog+vvvv
  3. URLs with ports specified

Kirby Version
3.2.5-rc.2

Console output
No errors.

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 19, 2019

URL validation is a pretty complex issue, see the comment in our implementation. 😢

Nr. 2 is a known issue (see our unit tests).

I can reproduce Nr. 1 and 3 but I'm wondering why those two happen. We do have unit tests for similar cases that pass... 🤔

@GiantCrocodile

This comment has been minimized.

Copy link
Author

@GiantCrocodile GiantCrocodile commented Sep 19, 2019

It's all about strict vs same ;) *.

  • just kidding, I know it's not the case here. Sadly I don't know why it happens.
@afbora

This comment has been minimized.

Copy link
Contributor

@afbora afbora commented Sep 19, 2019

I wonder that what is the FILTER_VALIDATE_URL filter not able to do?

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 20, 2019

It doesn‘t support internationalized domain names (IDN) for example. If I remember correctly, it also has various other bugs so that it is sometimes too strict and sometimes accepts strings that aren‘t valid URLs.

@bastianallgeier

This comment has been minimized.

Copy link
Contributor

@bastianallgeier bastianallgeier commented Sep 20, 2019

I wonder if we could make it less strict by doing a three-tier layered validation approach.

1.) use filter_validate_url
2.) if that fails use our regex
3.) if that fails check for localhost addresses

We do something similar in the V::email validator.

@afbora

This comment has been minimized.

Copy link
Contributor

@afbora afbora commented Sep 20, 2019

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 20, 2019

@bastianallgeier Hm, but that would break for false-positives.

We could try the Laravel/Symfony one, maybe it‘s better than ours.

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 20, 2019

I love it how everyone copies implementations from somewhere else. The PHP team should really fix the official filter. 😅

@nilshoerrmann

This comment has been minimized.

Copy link
Contributor

@nilshoerrmann nilshoerrmann commented Sep 20, 2019

Hm, but that would break for false-positives.

Doesn't any validator break for false-positives? That's true for the Laravel/Symfony one, too, isn't it?

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 20, 2019

Yes, but because we know that the PHP filter is buggy, we probably shouldn‘t use that one in our validator chain.

@bastianallgeier

This comment has been minimized.

Copy link
Contributor

@bastianallgeier bastianallgeier commented Sep 23, 2019

You are right. We should skip the filter. I don't mind testing the Symfony regex.

@lukasbestle lukasbestle self-assigned this Sep 23, 2019
@lukasbestle lukasbestle added this to the 3.3.0 milestone Sep 23, 2019
@GiantCrocodile

This comment has been minimized.

Copy link
Author

@GiantCrocodile GiantCrocodile commented Sep 23, 2019

Btw any clue why case 1 and 3 aren't triggered? Is it really related to the current (weak) regex or maybe some other reason?

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 23, 2019

Probably some other reason, otherwise the tests wouldn't pass. I will do some research when implementing a fix for this issue.

@GiantCrocodile

This comment has been minimized.

Copy link
Author

@GiantCrocodile GiantCrocodile commented Sep 23, 2019

Thank you :) that's why I asked; I didn't want to risk that the regex is fixed but the 2 cases stay unresolved. I appreciate your afford.

@bastianallgeier

This comment has been minimized.

Copy link
Contributor

@bastianallgeier bastianallgeier commented Sep 28, 2019

First result of my tests with Symfony's validator pattern: their pattern does not accept URLs with unicode characters in host names.

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 28, 2019

I have run the Symfony RegEx against our existing unit tests with the following results:

Should validate, but fails:

http://foo.com/unicode_(✪)_in_parens
http://foo.bar/?q=Test%20URL-encoded%20stuff
http://उदाहरण.परीक्षा
http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com

Should fail, but validates:

  http://-error-.invalid/
  http://a.b--c.de/
  http://-a.b.co
  http://a.b-.co
  http://0.0.0.0
  http://10.1.1.0
  http://10.1.1.255
  http://224.1.1.1
  http://1.1.1.1.1
  http://123.123.123
* http://3628126748
  http://.www.foo.bar/
  http://www.foo.bar./
  http://.www.foo.bar./
* http://10.1.1.1
* http://10.1.1.254

(URLs marked with a * are ones where I'm not sure why they should fail as those URLs should actually be valid.)

Works (unlike the current RegEx):

http://special---offer.com/
http://localhost/test/
http://localhost:8080/test
@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 28, 2019

And the results with filter_var($value, FILTER_VALIDATE_URL) !== false for comparison:

Should validate, but fails:

http://✪df.ws/123
http://➡.ws/䨹
http://⌘.ws
http://⌘.ws/
http://foo.com/unicode_(✪)_in_parens
http://☺.damowmow.com/
http://مثال.إختبار
http://例子.测试
http://उदाहरण.परीक्षा

Should fail, but validates:

  rdar://1234
  h://test
  ftps://foo.bar/
  http://a.b--c.de/
  http://0.0.0.0
  http://10.1.1.0
  http://10.1.1.255
  http://224.1.1.1
  http://1.1.1.1.1
  http://123.123.123
* http://3628126748
  http://www.foo.bar./
* http://10.1.1.1
* http://10.1.1.254

Works (unlike the current RegEx):

http://special---offer.com/
http://localhost/test/
http://localhost:8080/test
@afbora

This comment has been minimized.

Copy link
Contributor

@afbora afbora commented Sep 28, 2019

Here Laravel's method results based on Kirby unit test urls:

Status URL Expect Result
SUCCESS http://www.getkirby.com true true
SUCCESS http://www.getkirby.com/docs/param:value/?foo=bar/#anchor true true
SUCCESS https://www.getkirby.de.vu true true
SUCCESS http://foo.com/blah_blah true true
SUCCESS http://foo.com/blah_blah/ true true
SUCCESS http://foo.com/blah_blah_(wikipedia) true true
SUCCESS http://foo.com/blah_blah_(wikipedia)_(again) true true
SUCCESS http://www.example.com/wpstyle/?p=364 true true
SUCCESS https://www.example.com/foo/?bar=baz&inga=42&quux true true
SUCCESS http://✪df.ws/123 true true
SUCCESS http://userid:password@example.com:8080 true true
SUCCESS http://userid:password@example.com:8080/ true true
SUCCESS http://userid@example.com true true
SUCCESS http://userid@example.com/ true true
SUCCESS http://userid@example.com:8080 true true
SUCCESS http://userid@example.com:8080/ true true
SUCCESS http://userid:password@example.com true true
SUCCESS http://userid:password@example.com/ true true
SUCCESS http://142.42.1.1/ true true
SUCCESS http://142.42.1.1:8080/ true true
SUCCESS http://➡.ws/䨹 true true
SUCCESS http://⌘.ws true true
SUCCESS http://⌘.ws/ true true
SUCCESS http://foo.com/blah_(wikipedia)#cite-1 true true
SUCCESS http://foo.com/blah_(wikipedia)_blah#cite-1 true true
SUCCESS http://foo.com/unicode_(✪)_in_parens true true
SUCCESS http://foo.com/(something)?after=parens true true
SUCCESS http://☺.damowmow.com/ true true
SUCCESS http://code.google.com/events/#&product=browser true true
SUCCESS http://j.mp true true
SUCCESS ftp://foo.bar/baz true true
SUCCESS http://foo.bar/?q=Test%20URL-encoded%20stuff true true
SUCCESS http://مثال.إختبار true true
SUCCESS http://例子.测试 true true
FAIL http://उदाहरण.परीक्षा true false
FAIL http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com true false
SUCCESS http://1337.net true true
SUCCESS http://a.b-c.de true true
SUCCESS http://223.255.255.254 true true
SUCCESS http://special---offer.com/ true true
SUCCESS http://localhost/test/ true true
SUCCESS http://localhost:8080/test true true
SUCCESS foo false false
SUCCESS http:// false false
SUCCESS http://. false false
SUCCESS http://.. false false
SUCCESS http://../ false false
SUCCESS http://? false false
SUCCESS http://?? false false
SUCCESS http://??/ false false
SUCCESS http://# false false
SUCCESS http://## false false
SUCCESS http://##/ false false
SUCCESS http://foo.bar?q=Spaces should be encoded false false
SUCCESS // false false
SUCCESS //a false false
SUCCESS ///a false false
SUCCESS /// false false
SUCCESS http:///a false false
SUCCESS foo.com false false
SUCCESS rdar://1234 false false
SUCCESS h://test false false
SUCCESS http:// shouldfail.com false false
SUCCESS :// should fail false false
SUCCESS http://foo.bar/foo(bar)baz quux false false
SUCCESS ftps://foo.bar/ false false
FAIL http://-error-.invalid/ false true
FAIL http://a.b--c.de/ false true
FAIL http://-a.b.co false true
FAIL http://a.b-.co false true
FAIL http://0.0.0.0 false true
FAIL http://10.1.1.0 false true
FAIL http://10.1.1.255 false true
FAIL http://224.1.1.1 false true
SUCCESS http://1.1.1.1.1 false false
SUCCESS http://123.123.123 false false
SUCCESS http://3628126748 false false
FAIL http://.www.foo.bar/ false true
FAIL http://www.foo.bar./ false true
FAIL http://.www.foo.bar./ false true
FAIL http://10.1.1.1 false true
FAIL http://10.1.1.254 false true
@afbora

This comment has been minimized.

Copy link
Contributor

@afbora afbora commented Sep 29, 2019

I tried to rewrite (added localhost) our validation myself. What do you think?

'url' => function ($value): bool {
    // In search for the perfect regular expression: https://mathiasbynens.be/demo/url-regex
    $regex = '_^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:localhost)|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?$_iu';
    return preg_match($regex, $value) !== 0;
}

Results

Status URL Expect Result
SUCCESS http://www.getkirby.com true true
SUCCESS http://www.getkirby.com/docs/param:value/?foo=bar/#anchor true true
SUCCESS https://www.getkirby.de.vu true true
SUCCESS http://foo.com/blah_blah true true
SUCCESS http://foo.com/blah_blah/ true true
SUCCESS http://foo.com/blah_blah_(wikipedia) true true
SUCCESS http://foo.com/blah_blah_(wikipedia)_(again) true true
SUCCESS http://www.example.com/wpstyle/?p=364 true true
SUCCESS https://www.example.com/foo/?bar=baz&inga=42&quux true true
SUCCESS http://✪df.ws/123 true true
SUCCESS http://userid:password@example.com:8080 true true
SUCCESS http://userid:password@example.com:8080/ true true
SUCCESS http://userid@example.com true true
SUCCESS http://userid@example.com/ true true
SUCCESS http://userid@example.com:8080 true true
SUCCESS http://userid@example.com:8080/ true true
SUCCESS http://userid:password@example.com true true
SUCCESS http://userid:password@example.com/ true true
SUCCESS http://142.42.1.1/ true true
SUCCESS http://142.42.1.1:8080/ true true
SUCCESS http://➡.ws/䨹 true true
SUCCESS http://⌘.ws true true
SUCCESS http://⌘.ws/ true true
SUCCESS http://foo.com/blah_(wikipedia)#cite-1 true true
SUCCESS http://foo.com/blah_(wikipedia)_blah#cite-1 true true
SUCCESS http://foo.com/unicode_(✪)_in_parens true true
SUCCESS http://foo.com/(something)?after=parens true true
SUCCESS http://☺.damowmow.com/ true true
SUCCESS http://code.google.com/events/#&product=browser true true
SUCCESS http://j.mp true true
SUCCESS ftp://foo.bar/baz true true
SUCCESS http://foo.bar/?q=Test%20URL-encoded%20stuff true true
SUCCESS http://مثال.إختبار true true
SUCCESS http://例子.测试 true true
SUCCESS http://उदाहरण.परीक्षा true true
SUCCESS http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com true true
SUCCESS http://1337.net true true
SUCCESS http://a.b-c.de true true
SUCCESS http://223.255.255.254 true true
FAIL http://special---offer.com/ true false
SUCCESS https://127.0.0.1/kirby/ true true
SUCCESS http://localhost/test/ true true
SUCCESS http://localhost:8080/test true true
SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true
SUCCESS foo false false
SUCCESS http:// false false
SUCCESS http://. false false
SUCCESS http://.. false false
SUCCESS http://../ false false
SUCCESS http://? false false
SUCCESS http://?? false false
SUCCESS http://??/ false false
SUCCESS http://# false false
SUCCESS http://## false false
SUCCESS http://##/ false false
SUCCESS http://foo.bar?q=Spaces should be encoded false false
SUCCESS // false false
SUCCESS //a false false
SUCCESS ///a false false
SUCCESS /// false false
SUCCESS http:///a false false
SUCCESS foo.com false false
SUCCESS rdar://1234 false false
SUCCESS h://test false false
SUCCESS http:// shouldfail.com false false
SUCCESS :// should fail false false
SUCCESS http://foo.bar/foo(bar)baz quux false false
SUCCESS ftps://foo.bar/ false false
SUCCESS http://-error-.invalid/ false false
SUCCESS http://a.b--c.de/ false false
SUCCESS http://-a.b.co false false
SUCCESS http://a.b-.co false false
SUCCESS http://0.0.0.0 false false
SUCCESS http://10.1.1.0 false false
SUCCESS http://10.1.1.255 false false
SUCCESS http://224.1.1.1 false false
SUCCESS http://1.1.1.1.1 false false
SUCCESS http://123.123.123 false false
SUCCESS http://3628126748 false false
SUCCESS http://.www.foo.bar/ false false
SUCCESS http://www.foo.bar./ false false
SUCCESS http://.www.foo.bar./ false false
SUCCESS http://10.1.1.1 false false
SUCCESS http://10.1.1.254 false false

Note: I think, http://special---offer.com/ should be fail because of hyphen restrictions

The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
the third and fourth character positions and MUST NOT start or end
with a "-" (hyphen).

Reference: https://tools.ietf.org/html/rfc5891#section-4.2.3.1

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 29, 2019

@afbora The restriction with the double hyphens only applies to the hyphens being in the third and fourth character positions, e.g. http://sp---ecialoffer.com. So the test case should succeed.

Have you tried the two other test cases that have been originally reported by @GiantCrocodile?

@afbora

This comment has been minimized.

Copy link
Contributor

@afbora afbora commented Sep 29, 2019

@lukasbestle Hmm, this is really hard. I understand the rule is missing.

SUCCESS http://localhost/test/ true true
SUCCESS http://localhost:8080/test true true
SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true

Yes, you can see them in the above results.

Only reported by @GiantCrocodile tests

Status URL Expect Result
SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true
SUCCESS https://127.0.0.1:8080/kirby/panel/pages/blog+vvvv true true
SUCCESS https://localhost/kirby/panel/pages/blog+vvvv true true
SUCCESS https://localhost:8080/kirby/panel/pages/blog+vvvv true true
SUCCESS http://www.getkirby.com:8080/kirby/panel/pages/blog+vvvv true true
@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 29, 2019

Awesome! Could you please send a PR with the changes you made (new RegEx, new tests)? It's already better than the current state, so it's no problem that that one strange test case doesn't pass.

@GiantCrocodile

This comment has been minimized.

Copy link
Author

@GiantCrocodile GiantCrocodile commented Sep 29, 2019

How comes that tests like http://10.1.1.255 should be false? I think that HTTP protocol is fine for IPs instead of domains. Is it about a / at the end of the IP? My browser detects them as valid links too.

@lukasbestle

This comment has been minimized.

Copy link
Contributor

@lukasbestle lukasbestle commented Sep 29, 2019

@GiantCrocodile 10.1.1.255 is a broadcast address (because of the 255 value in the last position). It's a valid IP address, but cannot be part of a valid URL as it won't point to a specific computer.

lukasbestle added a commit that referenced this issue Sep 29, 2019
lukasbestle added a commit that referenced this issue Sep 29, 2019
bastianallgeier added a commit that referenced this issue Sep 30, 2019
@bastianallgeier

This comment has been minimized.

Copy link
Contributor

@bastianallgeier bastianallgeier commented Sep 30, 2019

@GiantCrocodile

This comment has been minimized.

Copy link
Author

@GiantCrocodile GiantCrocodile commented Oct 27, 2019

I have an issue with this: I'm using latest RC and when I put in an URL like http://localhost/kirby/panel/pages/blog+cdwcwd the URL field is highlighted red (invalid input) but I can save it anyway. Some visual validator is still wrong. @bastianallgeier @afbora

grafik

@afbora

This comment has been minimized.

Copy link
Contributor

@afbora afbora commented Oct 28, 2019

@GiantCrocodile I guess this is vuelidate library bug: vuelidate/vuelidate#375
And i have no idea how to figure it out.

@nilshoerrmann

This comment has been minimized.

Copy link
Contributor

@nilshoerrmann nilshoerrmann commented Oct 28, 2019

Might be related to #1514 as well.

@bastianallgeier

This comment has been minimized.

Copy link
Contributor

@bastianallgeier bastianallgeier commented Oct 28, 2019

Yep, it's related. We wanted to be smart with the instant validation, but in this case it's sending mixed messages because the JS validator is different from the PHP validator. I guess it would be best to remove the instant validation entirely.

@GiantCrocodile

This comment has been minimized.

Copy link
Author

@GiantCrocodile GiantCrocodile commented Oct 31, 2019

I guess this is something which need a bigger refactoring @bastianallgeier. Shall I create a new issue about deleting the instant validation or do you keep care of tracking this?

@afbora afbora reopened this Oct 31, 2019
@bastianallgeier

This comment has been minimized.

Copy link
Contributor

@bastianallgeier bastianallgeier commented Nov 1, 2019

@GiantCrocodile a new issue for the instant validator would be fantastic. I will close this again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.