Pipe character not supported in URL #6
$url = new http\Url("http://www.example.com/?x=a|b");
The above code results in a fatal error:
However, pipe characters are accepted by browsers and widely used in practice (for example by Google).
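For comparison, PHP's built-in parse_url() happily accepts the same URL:

```php
<?php
// PHP's built-in parse_url() is lenient: it accepts the pipe character
// and simply returns it as part of the query string.
$parts = parse_url("http://www.example.com/?x=a|b");
echo $parts['host'], "\n";  // www.example.com
echo $parts['query'], "\n"; // x=a|b
```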
In general, I would be in favour of making http\Url::__construct() as tolerant as possible: even when you pass an invalid URL to the constructor, the object should do its best to parse as much from it as is reasonably possible.
I'm not sure where the errors come from or which real-life URLs trigger them. I just see them pop up in the server log files after upgrading the pecl-http extension, which is undesirable. For example, in the following code:
But the same problem occurs when parsing URLs with other characters:
It may well be that curly braces and pipes are not allowed in URLs; I'm not sure about that. Even if it turns out that such characters are formally illegal in URLs, though, it is still considered good practice to be forgiving when parsing input.
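For reference, RFC 3986 does leave '|', '{' and '}' out of both its reserved and unreserved character sets, so a strictly conforming URL would percent-encode them. PHP's rawurlencode() shows the expected encoding:

```php
<?php
// Per RFC 3986, '|', '{' and '}' are neither reserved nor unreserved
// characters, so a strictly conforming URL percent-encodes them.
echo rawurlencode("a|b"), "\n"; // a%7Cb
echo rawurlencode("{x}"), "\n"; // %7Bx%7D
```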
BTW, you could use
That does indeed solve the issue, so I'm going to add this flag to the code. However, I am still of the opinion that this is a bug in the pecl-http extension, even though the workaround is probably sufficient for us. Our code now looks like this:
// flags to be passed to the http\Url constructor
$flags = 0;

// Newer versions of the pecl-http library do their own multi-byte decoding, which results
// in exceptions being thrown by the pecl-http library when a user enters a URL that
// contains pipe characters, even though that URL is perfectly well recognized by the
// browser, the Apache web server and the Zend engine. As a workaround, we pass a
// number of flags to the http\Url constructor to rewrite the URL so that no exceptions
// are thrown.
if (defined('http\\Url::PARSE_MBUTF8')) $flags |= http\Url::PARSE_MBUTF8;
if (defined('http\\Url::PARSE_TOPCT'))  $flags |= http\Url::PARSE_TOPCT;

// construct the current URL
$url = new http\Url($_SERVER['REQUEST_URI'], null, $flags);
As you can see, we're simply parsing the $_SERVER['REQUEST_URI'] variable, which holds the incoming URL. We're not in control of its contents: it is whatever address the user manually entered in his browser, or the target of a hyperlink on a remote website. We're not in a position to "allow" or "disallow" a URL; we're simply faced with the reality that someone we don't know entered this address and ended up on our website, and all we want to do is split the address into a scheme, hostname, and so on. This has always worked, but the upgrade to the latest version of the pecl-http library suddenly breaks it. This is a regression.
As I mentioned before, when you deal with data it is good practice (in fact a requirement in most RFCs!) to be tolerant and forgiving when parsing it, and strict when generating it. It would therefore be much better if the default behavior of the http\Url constructor were as tolerant as possible, throwing exceptions on (small) errors only when you explicitly pass a specific flag (for example http\Url::STRICT).
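Until such a flag exists, a userland fallback is possible. The sketch below (assuming, as described in this issue, that the pecl-http constructor throws on such input) tries the strict parser first and falls back to PHP's lenient built-in parse_url():

```php
<?php
// Fallback sketch: try the strict pecl-http parser first, and fall back
// to PHP's lenient parse_url() if pecl-http rejects the input.
// Assumes the pecl-http extension throws an exception on invalid characters.
function parse_request_url(string $input): array
{
    if (class_exists('http\\Url')) {
        try {
            return (new \http\Url($input))->toArray();
        } catch (\Exception $e) {
            // fall through to the lenient parser
        }
    }
    return parse_url($input) ?: [];
}

$parts = parse_request_url("http://www.example.com/?x=a|b");
echo $parts['query'], "\n"; // x=a|b
```

This keeps the pecl-http behavior where it works, while never failing hard on a URL the browser and web server already accepted.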