parse() does not handle custom protocol with root slash, like whatever:///whatever #11

felixfbecker · 2016-11-20T23:03:57Z

As this package is called sabre/uri, and not sabre/url, I expected it to handle any valid URI. For example, it should handle any URI scheme. However, that is not the case, because it uses parse_url internally. The docs clearly say that

This function is intended specifically for the purpose of parsing URLs and not URIs. However, to comply with PHP's backwards compatibility requirements it makes an exception for the file:// scheme where triple slashes (file:///...) are allowed. For any other scheme this is invalid.

http:// and file:// work, any other URI results in an Errror: Unsupported operand types.

Examples:

Uri\parse('ftp:///whatever')
Uri\parse('ssh://git@github.com/fruux/sabre-uri.git')

This really limits the usefulness of this function.

The text was updated successfully, but these errors were encountered:

evert · 2016-11-21T04:01:48Z

Hi!

My goal is definitely to support as many schemes as possible. I was under the impression that parse_url worked fine for a bunch of different schemes. At the very least mailto: works.

I question the validity of both those urls. As far as I can tell a triple-slash in ftp uri's is not allowed. The ssh uri is not officially registered anywhere, and I also question if it's correct. The colon in particular is weird. The only draft spec for the ssh uri I could find actually uses the format ssh://host:port/path. A path after a colon is weird and would be different than any other uri.

felixfbecker · 2016-11-21T10:23:47Z

There is a difference between URL and URI:

A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URIs
that, in addition to identifying a resource, provide a means of
locating the resource by describing its primary access mechanism
(e.g., its network "location"). The term "Uniform Resource Name"
(URN) has been used historically to refer to both URIs under the
"urn" scheme [RFC2141], which are required to remain globally unique
and persistent even when the resource ceases to exist or becomes
unavailable, and to any other URI with the properties of a name.

An individual scheme does not have to be classified as being just one
of "name" or "locator". Instances of URIs from any given scheme may
have the characteristics of names or locators or both, often
depending on the persistence and care in the assignment of
identifiers by the naming authority, rather than on any quality of
the scheme. Future specifications and related documentation should
use the general term "URI" rather than the more restrictive terms
"URL" and "URN" [RFC3305].

A valid URI is anything that follows has this form:


      URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

      hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

There is no constraint on what schemes are allowed. What I gave were just examples, they don't have to be syntactically correct according to the scheme, but they are syntactically correct according to the definition of URIs. For example, I could invent my own URI protocol:

felixfbecker://yepitsawesome?isitreally#yesitis

and that would be a valid URI, but parse_url, and it turn Sabre\Uri\parse will fail to parse it.

You might think this is not a big use case but my language server gets URIs from the outside and the only thing I know is that they are valid URIs, not the protocol. VS Code for example sends new files with a URI like inmemory://1. Other clients have a virtual file system and send vfs:///somefile. I need a way to parse these, without imposing any constraints on the scheme - I just want to split it up into components.

evert · 2016-11-21T13:27:30Z

That last example you gave actually parses just fine again, because it's valid.

Can you give an example of a URI that's valid and does not parse?

felixfbecker · 2016-11-21T13:36:04Z

\Sabre\Uri\parse('whatever:///whatever');

Seems to be related to the root slash - which works fine for file:// URIs, but not anything else

evert · 2016-11-21T13:52:31Z

I had to do a bunch of digging but eventually found this:

If the URI scheme defines a default for host, then that default applies when the host subcomponent is undefined or when the registered name is empty (zero length). For example, the "file" URI scheme is defined so that no authority, an empty host, and "localhost" all mean the end-user's machine, whereas the "http" scheme considers a missing authority or empty host invalid.

So the triple slash issue is a legit one. I wouldn't go as far as saying that the library only parsers URLs and not URIs though, but rather that it can't parse a URI with an empty hostname if it's not a file: scheme.

felixfbecker · 2016-11-21T14:01:04Z

Yep, sorry. I just saw that comment on parse_url which states that it only parses URLs and file:// is only there for legacy reasons, so I too quickly assumed that no custom protocol is supported.

felixfbecker · 2016-11-21T16:14:39Z

For my use case it would be enough if it doesn't parse, but at least doesn't generate an Error - I think it is a PHP4-style error, so we could silence it with @.

evert · 2016-11-24T02:16:21Z

Marking this as a bug, but I want to say that given your situation Sabre\Uri simply not be the best choice. One reason for this is that this:

inmemory://1

Could technically pass as a valid uri.. maybe? I'm not even sure. 1 here will be interpreted as the host name. If I understand this URI correctly, this really should have been something like this:

inmemory:1

But ideally even this:

urn:x-vendorname:inmemory:1

If you run into URIs of the first kind, and you don't expect just any uri, the thing you are actually probably interested in is:

The scheme
Everything that's not the scheme

In that case you might simply be better off doing a split on :. It just doesn't feel like these are real uris, but rather just something that's made to look like one.

Just a thought! I still want to fix this though, even though it's a bit depressing because the solution to this means a custom parser which is gonna be slower for the 99.99% use-case.

felixfbecker · 2016-11-24T02:28:47Z

@evert I agree with you - I think a URN would be a better fit for this. But I get these from the client and can't control them. What I am actually interested in is only the path, I need to match it against a few conditions. It's okay if the condition doesn't match because path is actually null because the URI is one of the weird inmemory://n URIs, but getting a PHP4-style error (that gets turned into an exception) should be avoided at all cost. Splitting by : will not be sufficient, because other URIs I get (those that are not the weird inmemory scheme) may have a query and/or hash fragment (I think they could also be relative). So I would like to rely on the parse() function to give me just the path fragment, if there is one. If not, or the URI is weird, just set all the values to null like it is done already, but please avoid triggering an error. Would that be a compromise you can live with? Using parse_url, but putting an @ in front of it?

evert · 2016-11-24T02:33:45Z

I would ideally like the API to parse any valid URI you throw at it, but always throw an exception for those that aren't.

What's stopping you from catching the exception and handling it that way?

felixfbecker · 2016-11-24T02:39:59Z

Seems a bit weird to use @ on such a high-level third-party function, but I could do it that way of course

evert · 2016-11-24T14:06:54Z

What about a try..catch block? Don't use @.

felixfbecker · 2016-11-24T15:34:58Z

It's a PHP4-style error iirc. You can only catch it if you have an error handler registered, and that is an assumption I don't want to make. Maybe Sabre could @ and then throw an error depending on the return and error_get_last?

evert · 2016-11-24T15:57:19Z

I see, yes that would be perfect! Totally agree with that idea

@felixfbecker

This is a new implementation of PHP's parse_url. It's likely not perfect, but it catches specific cases that the built-in function does not support. Fixes Issue #9 Fixes Issue #11 @felixfbecker / @sii / @PVince81 Does this sufficiently solve your problem?

@felixfbecker

This is a new implementation of PHP's parse_url. It's likely not perfect, but it catches specific cases that the built-in function does not support. Fixes Issue #9 Fixes Issue #11 @felixfbecker / @sii / @PVince81 Does this sufficiently solve your problem?

felixfbecker changed the title ~~URI parser cannot handle URIs....~~ parse() does not handle custom protocol with root slash, like whatever:///whatever Nov 21, 2016

evert added the bug label Nov 24, 2016

evert mentioned this issue Dec 6, 2016

Workaround for parsing colons in partial URLs #12

Closed

evert mentioned this issue Dec 6, 2016

Fallback URI parser. #13

Merged

evert closed this as completed Dec 7, 2016

evert mentioned this issue Dec 7, 2016

Backport fallback parser #14

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parse() does not handle custom protocol with root slash, like whatever:///whatever #11

parse() does not handle custom protocol with root slash, like whatever:///whatever #11

felixfbecker commented Nov 20, 2016 •

edited

Loading

evert commented Nov 21, 2016

felixfbecker commented Nov 21, 2016 •

edited

Loading

evert commented Nov 21, 2016

felixfbecker commented Nov 21, 2016 •

edited

Loading

evert commented Nov 21, 2016

felixfbecker commented Nov 21, 2016

felixfbecker commented Nov 21, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 24, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 24, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 24, 2016

evert commented Nov 24, 2016

parse() does not handle custom protocol with root slash, like whatever:///whatever #11

parse() does not handle custom protocol with root slash, like whatever:///whatever #11

Comments

felixfbecker commented Nov 20, 2016 • edited Loading

evert commented Nov 21, 2016

felixfbecker commented Nov 21, 2016 • edited Loading

evert commented Nov 21, 2016

felixfbecker commented Nov 21, 2016 • edited Loading

evert commented Nov 21, 2016

felixfbecker commented Nov 21, 2016

felixfbecker commented Nov 21, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 24, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 24, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 24, 2016

evert commented Nov 24, 2016

felixfbecker commented Nov 20, 2016 •

edited

Loading

felixfbecker commented Nov 21, 2016 •

edited

Loading

felixfbecker commented Nov 21, 2016 •

edited

Loading