Fixed parse_userinfo breaking with complex passwords #293

Thomspoon · 2017-04-19T01:10:56Z

Fixed the issue with my last pull request.

The problem with the parse_userinfo right now is that it breaks for a fragment delimiter or query string. That's all fine and dandy for further parsing functions, however, this check shouldn't be done with looking for the username and password, because these values can and will be used in passwords.

For example: let url = Url::parse("mysql://root:pass#word@example.com");

Results in: thread 'main' panicked at 'called Result::unwrap()on anErr value: InvalidPort' Due to the early break at the # sign.

After the change, the result is as expected: mysql://root:pass%23word@example.com/

Unless I'm mistaken, a fragment or query will never appear this early on in a url.

This change is

SimonSapin · 2017-04-19T01:31:26Z

this check shouldn't be done […]

Why shouldn’t it?

This code implements the algorithm specified at https://url.spec.whatwg.org/#url-parsing. Interoperability with other implementations is important. If you think the algorithm should be changed, it should be changed in the specification first. The specification’s repository is https://github.com/whatwg/url. When asking for a change, expect to be asked for justification of why this change is desirable and why it won’t break existing content.

Thomspoon · 2017-04-19T01:35:44Z

Why shouldn’t it?

Can you think of an instance where either of those elements would be used in the user_info section of a url?

It's obvious they cannot be because if the algorithm is identical, then the algorithm would have a bug. It's not uncommon to have an octothorp in a password, so I don't understand how this isn't more of an issue.

SimonSapin · 2017-04-19T01:51:00Z

a://b:c#d@e doesn’t have a # in its password, it has @ in its fragment. # is a delimiter in URLs like many other ASCII punctuation characters. If you want to include # without it being a delimiter, you need to percent-encode it as %23. That’s what percent-encoding is for. This is similar to using @ in a username, which is common when some system uses email addresses as usernames. You also have to percent-encode it.

Thomspoon · 2017-04-19T01:57:05Z

Testing that address with your code throws a panic.

extern crate url;

use url::{Url, percent_encoding};

fn main() {
    let url = Url::parse("a://b:c#d@e");
    println!("{:?}", url.unwrap());
}

thread 'main' panicked at 'called Result::unwrap()on anErr value: InvalidPort'

The url a://b:c is not a valid address, so fragmenting to a section of it isn't valid.

In my opinion, the specification at https://url.spec.whatwg.org/#url-parsing is a little ambiguous, and in the relative state section, they should delineate that those flags are looking at the end of a url. They don't say it, but /, ?, # would only be found at the end for fragmentation, queries, and pathing.

SimonSapin · 2017-04-19T02:15:20Z

In my opinion, the specification at https://url.spec.whatwg.org/#url-parsing is a little ambiguous

Ambiguity is a specification bug that needs to be fixed there.

Thomspoon · 2017-04-19T02:29:48Z

Issue created: whatwg/url#294

nox · 2017-04-19T11:11:02Z

@Thomspoon Why aren't you just passing %23?

Thomspoon · 2017-04-19T17:43:36Z

That seems like a workaround, shouldn't be a permanent fix. I don't agree with the result of the issue I created, but I guess I don't have a choice.

nox · 2017-04-19T18:34:25Z

What would http://foo:80#bar@blah mean if # was allowed unescaped in a password? The change you want would make the grammar ambiguous.

Thomspoon · 2017-04-19T21:06:15Z

The @ is the delimiter that specifies if it should be a scheme://username:password@domain:port or scheme://domain:port#fragment.

From RFC1738:

3.1. Common Internet Scheme Syntax

While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:
    //<user>:<password>@<host>:<port>/<url-path>
Some or all of the parts ":@", ":",
":", and "/" may be excluded. The scheme specific
data start with a double slash "//" to indicate that it complies with
the common Internet scheme syntax. The different components obey the
following rules:
user
    An optional user name. Some schemes (e.g., ftp) allow the
    specification of a user name.

password
    An optional password. If present, it follows the user
    name separated from it by a colon.
The user name (and password), if present, are followed by a
commercial at-sign "@". Within the user and password field, any ":",
"@", or "/" must be encoded.

Doesn't mention anything about the limitations of the password itself.

nox · 2017-04-19T21:42:24Z

This RFC is updated by RFC 3986, but we follow the URL standard anyway.

Note that the fact that it doesn't say anything about which characters are allowed or not makes it ambiguous to parse http://foo:80#bar@blah, as I stated earlier. That's one of the many reasons why RFC 3986 and then the URL standard exist.

Thomspoon · 2017-04-19T22:08:49Z

Well, the guys over at the URL Standard don't seem to think a retroactive change would be beneficial, so I guess this is a lost cause. I'll just accept the fact that I have to use percent-encoding 🤕

nox · 2017-04-19T22:19:17Z

Thanks for your contribution anyway!

Fixed userinfo breaking with complex passwords

6c8032a

Thomspoon changed the title ~~Fixed userinfo breaking with complex passwords~~ Fixed parse_userinfo breaking with complex passwords Apr 19, 2017

nox closed this Apr 19, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixed parse_userinfo breaking with complex passwords #293

Fixed parse_userinfo breaking with complex passwords #293

Uh oh!

Thomspoon commented Apr 19, 2017 •

edited

Loading

Uh oh!

SimonSapin commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017

Uh oh!

SimonSapin commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017 •

edited

Loading

Uh oh!

SimonSapin commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017 •

edited

Loading

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017 •

edited

Loading

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Uh oh!

Fixed parse_userinfo breaking with complex passwords #293

Fixed parse_userinfo breaking with complex passwords #293

Uh oh!

Conversation

Thomspoon commented Apr 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SimonSapin commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017

Uh oh!

SimonSapin commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SimonSapin commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Thomspoon commented Apr 19, 2017

Uh oh!

nox commented Apr 19, 2017

Uh oh!

Uh oh!

Thomspoon commented Apr 19, 2017 •

edited

Loading

Thomspoon commented Apr 19, 2017 •

edited

Loading

Thomspoon commented Apr 19, 2017 •

edited

Loading

Thomspoon commented Apr 19, 2017 •

edited

Loading