Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.7] [HttpFoundation] [Request] Added check for valid host size limit #13522

Closed
wants to merge 4 commits into from

Conversation

phansys
Copy link
Contributor

@phansys phansys commented Jan 26, 2015

Q A
Bug fix? no
New feature? yes
BC breaks? kinda
Deprecations? no
Tests pass? yes
Fixed tickets #12349
License MIT
Doc PR none

Added check for valid host size limit according to RFC 1035 - 2.3.4 (ticket #12349):
http://www.ietf.org/rfc/rfc1035.txt

  • 63 bytes limit for each domain label
  • 255 bytes limit for whole domain name

TODO:

  • Replace preg_match() by preg_replace(). (after a timing benchmark I found preg_match() faster than preg_replace(), but I also have made some little improvements in regexs).

@fabpot
Copy link
Member

fabpot commented Jan 26, 2015

Don't you think that this should be done by PHP directly instead?

@phansys
Copy link
Contributor Author

phansys commented Jan 26, 2015

Yeah, it would be great; but I don't know how PHP is handling this internally given the ticket #12349. Have you some hint @fabpot?
Since Request::getHost() relies on $_SERVER, I'll read the CGI/1.1 specification that is supposed to follow and I'll do some "real" tests setting a very long hostname in my local environment in order to check how $_SERVER is populated.
I'll ask #12349's author (@cilefen) how to reproduce this issue.

@linaori
Copy link
Contributor

linaori commented Jan 26, 2015

What kind effect would this have with (for example) Arabic or Chinese characters?

http://en.wikipedia.org/wiki/Internationalized_domain_name

@phansys
Copy link
Contributor Author

phansys commented Jan 26, 2015

Hi @iltar. The effect with these characters is the expected according to RFC 1035 (the check is about bytes length instead chars).

@phansys
Copy link
Contributor Author

phansys commented Jan 26, 2015

Well, after doing the tests in the way that @mpdonadio exposed in #12349 I get the conclussion that PHP isn't taking care about this RFC. I was thinking that Drupal was overriding or manipulating in some way the $_SERVER's content; but simply sending a long host header is enough to reproduce this issue. I've even tested it against parse_url(), and it does not emit any kind of message.
What do you think @fabpot?

@fabpot
Copy link
Member

fabpot commented Jan 26, 2015

@phansys I think this should be reported to PHP itself and fixed there.

@jderusse
Copy link
Member

unfortunately it's not so easy. IDN are translated to ascii chars (for instance www.académie-française.fr will be converted to www.xn--acadmie-franaise-npb1a.fr) and the length of the hostname is the computed with the translated version.

The length of www.académie-française.fr is not 27 (strlen) nor 25 (mb_strlen) but 33.

@phansys
Copy link
Contributor Author

phansys commented Jan 26, 2015

OK @fabpot, I'll open a ticket in PHP in order to ask this feature.
@jeremy-derusse, I think in that case IDN must be converted before with idn_to_ascii().

@dunglas
Copy link
Member

dunglas commented Jan 26, 2015

I've added support for that in PHP's filter extension some times ago: php/php-src#826

IDN domain names are not currently supported unless calling idn_to_ascii() before the call to filter_var(). I've a working patch for that but it needs a PHP RFC to add a core dependency to ICU and I'm too busy to work on it for now.

throw new \UnexpectedValueException(sprintf('Invalid Host "%s"', $host));
if ($host) {
if ('' !== preg_replace('/(?:^\[)?[a-zA-Z0-9-:\]_]+\.?/', '', $host) ||
0 !== preg_match('/(?:^\[)?[a-zA-Z0-9-:\.\]_]{256,}|[a-zA-Z0-9-:\]_]{64,}\.?/', idn_to_ascii($host))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idn_to_ascii() is only available if the INTL extension is enabled and this is not always the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right @dunglas, thank you. I'm updating the tests and this logic in order to check if idn_to_ascii() returns false (while invalid IDNs). Later I'll check filter_var() with FILTER_VALIDATE_URL based in your PR.

@dunglas
Copy link
Member

dunglas commented Jan 26, 2015

IDN are not an issue here because they must be sent encoded in punycode by the client in the HOST header.
However, is it the responsibility of the Request class to validate the given host?

Actually, a lot of invalid hosts will be considered valid including but not limited to:

  • les-tilleuls.coop:8080test
  • http:les-tilleuls.coop
  • [a:b:c:z:e:f:]

I think it's not the responsibility of the Request class to validate the hostname but if you really want this feature, something like the following will probably do a better job (depending if my patch on PHP is running or not, invalid hosts can still be allowed):

$filtered = filter_var($host, FILTER_VALIDATE_URL);
if (!$filtered || $filtered !== parse_url('http://'.$host, PHP_URL_HOST)) {
   // throw the exception
}

@phansys
Copy link
Contributor Author

phansys commented Jan 27, 2015

Well said @dunglas.
I agree with you and @fabpot about the responsability of this class about hostname validation, even of Symfony itself; but the check was already here.
Taking in count that currently hostnames like www.académie-française.fr are considered invalid and in view of these facts, I need to know which is the kind of validation supposed to be performed here:

  • for IDNs case, if idn_to_ascii() is available we can trust it; but if isn't filter_var() with FILTER_VALIDATE_URL will fail.
  • with filter_var() is much simpler to do the checks, but it is more restrictive than current validation (IE [::1] doesn't match with FILTER_VALIDATE_IP).

When host is an IDN and idn_to_ascii() is no available, which way we will choose?

  • throw the exception (as now)
  • try to validate through other rules (regex, etc)

@dunglas
Copy link
Member

dunglas commented Jan 27, 2015

IMO this method should just blindly forward the host sent by the client, without any kind of validation. Not sure if removing an exception is considered BC but anyway this can be done for 3.0.

For 2.7, if we really want a better validation (we must be careful about performance and DOS attacks vector) IDNs aren't an issue here because, As said in my previous message, the client (the browser) must sent the host encoded in punycode. académie-française.fr isn't a valid Host header value. You can just remove the call to idn_to_ascii().
About filter_var(), I've added support for IPs like ::1 in PHP 7.

@linaori
Copy link
Contributor

linaori commented Jan 27, 2015

@dunglas I've written some code to validate the hostname before, for domain forwards actually. What I did, was actually the parse_url trick that you just posted. That solution seems to work pretty nicely, 👍 from me.

On a side note, this is validation. This sort of seems to me like it could be part of a DNS or Domain name related (validation) component. #10467

@phansys
Copy link
Contributor Author

phansys commented Jan 27, 2015

@dunglas, @iltar; sure.

My concern respect to BC is that if this check surpluses, it can be removed in 3.0 (in 2.7 someone can be relying on it as side effect). For the meantime, we could then trust in the header sent by the browser through filter_var() whit its IP and URL flags, and fallback IPv6 (yet) unsupported syntax to a regex; while if the header doesn't respect these rules throw the exception.

In respect to validation component, it would be great to use it here or where needed; but note the symfony/validator requirement and the possible circular dependency.

About where to put this validation until PHP take the responsibility, I think it must be done at population time, not while trying to get the value. Based on @fabpot's comments about delegating this issue to PHP, I can't imagine (from a user point of view) other way than PHP validating it at moment of populating the $_SERVER superglobal. Given that our equivalent operation in Symfony land is done in Request::initialize() I think we could attach this logic to ServerBag's constructor; but I have no idea how PHP could handle this validation if it fails (which type of message emit? left the value blank?). A thinking about this possible behavior is that if PHP (or us) start to validating hostname in this way, other $_SERVER values should be validated also.

@dunglas
Copy link
Member

dunglas commented Jan 27, 2015

IMO the responsibility of $_SERVER is just to give access to the value of the Host header even if it is invalid. The developper can still call filter_var on that value if he needs to.

My suggestion: leave this code as is in 2.x (no change, no BC break), drop it in 3.0.

Added check for valid host according to RFC 1035 - 2.3.4 (ticket symfony#12349):
http://www.ietf.org/rfc/rfc1035.txt
* 63 bytes limit for each domain label
* 255 bytes limit for whole domain name

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | kinda
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | symfony#12349
| License       | MIT
| Doc PR        | none

Updated tests.
@phansys
Copy link
Contributor Author

phansys commented Jan 27, 2015

Thanks @dunglas, it seems to be the cleaner solution.
I just rebased the PR to my first commit and I added tests for IDN hostnames (raw and encoded with Punycode).
Depending in if gets merged or not, #12349 can be closed as "resolved" or "won't fix".

* Removed uppercased patterns since $host has already applied strtolower().
* Narrowed quantifiers to only match one char for invalid sizes.
throw new \UnexpectedValueException(sprintf('Invalid Host "%s"', $host));
if ($host) {
if ('' !== preg_replace('/(?:^\[)?[a-z0-9-:\]_]+\.?/', '', $host) ||
0 !== preg_match('/(?:^\[)?[a-z0-9-:\.\]_]{256,257}|[a-z0-9-:\]_]{64,65}\.?/', $host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be robust against the DoS attack mentionned in the comment ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @stof, I updated these lines and added a test for this case.

@phansys
Copy link
Contributor Author

phansys commented Feb 2, 2015

Travis failures are not related to this feature (Symfony\Component\Process\Tests\SigchildDisabledProcessTest and Symfony\Component\Process\Tests\SigchildEnabledProcessTest).

@fabpot
Copy link
Member

fabpot commented Feb 10, 2015

Closing, this piece of code does not try to validate the host, just to avoid having illegal characters in there.

@fabpot fabpot closed this Feb 10, 2015
@phansys phansys deleted the ticket_12349 branch February 10, 2015 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants