Skip to content

Conversation

nawarian
Copy link
Contributor

@nawarian nawarian commented Mar 1, 2020

This PR aims to fix the bug ticket #79178.

There's something bothering me though:
php_replace_controlchars_ex seems to replace every control char with _ instead of panicking.

Meanwhile, RFC 1738 says the following:

URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded.

The above states that characters like new line should be encoded. PHP, for some reason I couldn't figure out so far, replaces them with underscores.

Even though this fixes the bug on PHP_URL_HOST, the same issue still happens on other URL parts.

<?php

// There's no backslash after .net and before PHP_EOL!!
var_dump(parse_url('https://php.net' . PHP_EOL));
// bool(false)

// There IS a backslash after .net and before PHP_EOL!!
var_dump(parse_url('https://php.net/' . PHP_EOL));
// array(3) {
//   ["scheme"]=>
//   string(5) "https"
//   ["host"]=>
//   string(7) "php.net"
//   ["path"]=>
//   string(2) "/_"
// }

Path seems to be still wrong in my opinion.

Shall we discuss a better solution for this maybe?

@nawarian nawarian changed the base branch from master to PHP-7.3 March 1, 2020 00:20
@nawarian nawarian changed the base branch from PHP-7.3 to master March 1, 2020 00:21
@nawarian nawarian force-pushed the bug-79178_parse_url branch from db463cf to 3351a6c Compare March 1, 2020 00:57
@nawarian nawarian changed the base branch from master to PHP-7.3 March 1, 2020 00:57
@nawarian nawarian force-pushed the bug-79178_parse_url branch from 3351a6c to b916f86 Compare March 25, 2020 08:24
@patrickallaert
Copy link
Contributor

Last message on related issue:

The "_" has been used to prevent CR/LF injection issues, see: 184323c

I don't really know what you should expect from parse_url() provided that you give him an invalid URL.

Expecting "yandex.ru" while the host "yandex.ru\n" has been provided feels wrong to me.

What would you expect from:

var_dump(parse_url("http://yan\ndex.ru\n", PHP_URL_HOST));

for example?

To me, your case and the one above could equally give: bool(false) IMHO.

I am closing this issue (and the related PR) as it isn't really a bug, nor it has evolved in 17 months and there is no clear outcome of what to expect from parsing invalid URLs.

You are very welcome to re-open this issue and/or it's PR if you have more thoughts about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants