New FILTER_VALIDATE_DOMAIN and better RFC conformance for FILTER_VALIDATE_URL #826
5c721f6
to
475238e
Compare
It's kind of hard to tell what you have changed because you moved the function. |
@datibbaw only in the last commit (IPv6 handling). It is necessary to move either |
@datibbaw do you want me to put a diff of just the function in the comments? |
You can add a forward declaration to keep it in place :) |
@datibbaw Isn't it too much just for using a diff UI? |
It would make it easier for somebody reviewing your code changes (i.e. yours truly); no changes are required if you don't care about that. |
I'll add the definition for reviewing and move the function before the merge if you agree with that. |
3594cde
to
6df900c
Compare
@datibbaw done. |
6df900c
to
81ac5b9
Compare
What should be done to get this PR merged? |
This applies specifically to a DNS name. A URI may contain a registered name that is not a DNS name. To quote the relevant passage of RFC 3986 sec 3.2.2 (emphasis mine):
Note "most common" and not "only". This also applies to the other DNS-specific changes here. The IPv6 change is valid, though. |
@DaveRandom according to the WHATWG URL Living Standard and RFC 2396, this is not true: a URI can have any valid host, but a URL host must be a valid IP address (v4 or v6) or domain (DNS). Relevant quotes: https://url.spec.whatwg.org/#valid-domain
http://www.unicode.org/reports/tr46/tr46-12.html#ToASCII
RFC 2396 (RFC 1034 and RFC 1123 specify DNS)
Indeed, a URI may contain other host types, but this filter is called
    var_dump(filter_var("urn:ietf:rfc:2141", FILTER_VALIDATE_URL));
    // bool(false)
I think it makes sense that |
And from the same RFC you quoted:
In my patch, DNS validity is checked only for the HTTP and HTTPS schemes (it should also apply to FTP, btw). |
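To make the scheme-specific behavior concrete, here is a minimal sketch. It assumes a PHP build that already includes this change (7.0 or later), and `url_passes()` is a hypothetical wrapper introduced only for illustration. The FTP line is left without an expected result because the comment above only describes the HTTP and HTTPS check.

```php
<?php
// Hypothetical wrapper: turns filter_var()'s mixed return value into a bool.
function url_passes(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(url_passes('http://example.com/'));  // bool(true)
var_dump(url_passes('http://exa_mple.com/')); // bool(false): "_" is not allowed in an HTTP hostname
var_dump(url_passes('urn:ietf:rfc:2141'));    // bool(false): a URN has no host component
var_dump(url_passes('ftp://exa_mple.com/'));  // not covered by the HTTP/HTTPS hostname check
```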
@dunglas Indeed, I re-checked the PHP docs for what this filter actually does and I see it is validating specifically URLs and not generic URIs. In order to make this check valid and fill in a couple of blanks in the filter API, IMHO you should do the following things as part of this PR:
This would plug a couple of holes in the userland API - I know I've written code to work around the square bracket validation with values extracted via Thoughts? |
+1 for the new IMO |
While this is absolutely true and unarguable, all I'm suggesting is a convenience flag to make this userland code a bit tidier (slightly modified from real code, only error handling and class refs removed):
    $parts = parse_url($url);
    if ($host = filter_var($parts['host'], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)) {
        $mode = IPv4_ADDR;
    } else if (($host = filter_var($parts['host'], FILTER_VALIDATE_IP, FILTER_FLAG_IPV6)) || ($parts['host'][0] == '[' && substr($parts['host'], -1) == ']' && $host = filter_var(substr($parts['host'], 1, -1), FILTER_VALIDATE_IP, FILTER_FLAG_IPV6))) {
        $host = '[' . $host . ']';
        $mode = IPv6_ADDR;
    } else if (/* userland DNS name validation routine */) {
        $mode = DNS_NAME;
    } else {
        // invalid
    }
As you can imagine, I would much rather have written something like:
    $parts = parse_url($url);
    if ($host = filter_var($parts['host'], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)) {
        $mode = IPv4_ADDR;
    } else if ($host = filter_var($parts['host'], FILTER_VALIDATE_IP, FILTER_FLAG_IPV6 | FILTER_FLAG_IPV6_FROM_URL)) {
        $mode = IPv6_ADDR;
    } else if ($host = filter_var($parts['host'], FILTER_VALIDATE_DNS_NAME)) {
        $mode = DNS_NAME;
    } else {
        // invalid
    }
All I'm suggesting is an optional, non-default flag to allow the square brackets. This code also shows where I would have found the DNS name validation useful.
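For comparison, here is a runnable sketch of the same host classification using only filters that exist in PHP 7 and later. `classify_host()` and its return values are hypothetical names chosen for illustration. The DNS branch relies on `FILTER_VALIDATE_DOMAIN` with `FILTER_FLAG_HOSTNAME`, the filter this PR introduces, and the square brackets are still stripped by hand because no bracket-aware flag exists.

```php
<?php
// Hypothetical helper: classify the host of a URL as 'ipv4', 'ipv6' or 'dns'.
// Returns null when the host is missing or matches none of these.
function classify_host(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component
    }
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return 'ipv4';
    }
    // parse_url() keeps the square brackets around an IPv6 literal,
    // so remove them before handing the address to FILTER_VALIDATE_IP.
    if ($host[0] === '[' && substr($host, -1) === ']') {
        $inner = substr($host, 1, -1);
        return filter_var($inner, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false ? 'ipv6' : null;
    }
    if (filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false) {
        return 'dns';
    }
    return null;
}

var_dump(classify_host('http://[::1]:8080/'));   // string(4) "ipv6"
var_dump(classify_host('http://exa_mple.com/')); // NULL
```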
Indeed. We have spoken about this before on-list, and as I mentioned at the time I have two major thoughts about this:
I know that when we initially spoke about this I said I would tidy my work up and make it public. Owing to various issues IRL I haven't yet had time to do this; I will either get to it in the next few days or hand it over to @rdlowrey or @datibbaw to do because I OWN THEM [1]. [1] Actually I just like to annoy them. |
A hostname validator can be useful in a lot of cases. However all URL hostnames ( For this new flag (for the new filter), I suggest implementing this URL Living Standard algorithm: https://url.spec.whatwg.org/#host-parsing (for that too we can use ICU if we make it a core dependency of PHP: http://icu-project.org/apiref/icu4c432/uidna_8h.html#aaf3bec2415dd99b4221eeebb723eb082). About It can be used internally by
    $parts = parse_url($url);
    switch ($parts['host_type']) {
        case PHP_URL_HOST_IPV6:
            $mode = IPv6_ADDR;
            // $ipv6 = substr($parts['host'], 1, -1);
            break;
        default:
            if (filter_var($parts['host'], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)) {
                $mode = IPv4_ADDR;
            } else {
                $mode = DNS_NAME;
            }
    }
If everyone agrees on this proposal, the work to do on PHP is:
Should we formalize that with an RFC or, as it's mostly fixing the current behavior, is this issue enough? |
Once #890 is merged, it will be easy to add IDN support to this new domain validator. What should be done next to have this code merged? |
7e410d3
to
728945d
Compare
@dunglas Should there be a flag for |
@whatthejeff Why not. Can be useful! |
It's a very nice addition, but it is missing documentation in the manual... |
Where can we find the source for this documentation page: http://php.net/manual/en/filter.filters.validate.php? I was looking for it so I could contribute the missing entries based on this PR. |
Hi, I don't recommend using that filter for now. |
…but never documented. This attempts to document it according to the notes on the original PR 826: php/php-src#826 Fixes #72013 -- Provided by anonymous 90461 (kevin.boyd@gmail.com) git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@344641 c90b9560-bf6c-de11-be94-00142212c4b1
- PHP way of doing so (php/php-src#826) - Improved and shortened regexp a bit
* Let isValidHost() determine validness of host - PHP way of doing so (php/php-src#826) - Improved and shortened regexp a bit * Give more specific debug message in case host(entry) is invalid * Rewrite filter_var checks * Host [[<ipv6>]] is not valid
@jaisato sorry for the very old reply but it's intended and stated in the PR description. Domains can contain special characters. Hostnames can't. In your case, you should use the |
Introduce a new FILTER_VALIDATE_DOMAIN filter to validate domain name and label lengths according to RFCs. It does not check characters. A FILTER_FLAG_HOSTNAME flag is also available to specifically validate hostnames (they must start with an alphanumeric character and contain only [a-z-]). This flag is used in FILTER_VALIDATE_URL.
See bug #68039
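As a rough illustration of the difference between the plain filter and the hostname flag, the sketch below classifies a few names. `describe_domain()` is a hypothetical helper and not part of the PR. The 63-character label limit it exercises comes from the DNS RFCs the description refers to.

```php
<?php
// Hypothetical helper: report the strictest level at which a name validates.
function describe_domain(string $name): string
{
    if (filter_var($name, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) !== false) {
        return 'hostname'; // lengths and hostname characters are both valid
    }
    if (filter_var($name, FILTER_VALIDATE_DOMAIN) !== false) {
        return 'domain';   // only the length rules are satisfied
    }
    return 'invalid';
}

echo describe_domain('example.com'), "\n";                // hostname
echo describe_domain('_dmarc.example.com'), "\n";         // domain
echo describe_domain(str_repeat('a', 64) . '.com'), "\n"; // invalid (label longer than 63 characters)
```

Names such as `_dmarc.example.com` are legal DNS names but not hostnames, which is why the character check is opt-in through the flag.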