Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url.parse would be better to not decode authority? #12168

Closed
yosuke-furukawa opened this issue Apr 2, 2017 · 12 comments
Closed

url.parse would be better to not decode authority? #12168

yosuke-furukawa opened this issue Apr 2, 2017 · 12 comments
Labels
security Issues and PRs related to security. semver-major PRs that contain breaking changes and should be released in the next major version. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@yosuke-furukawa
Copy link
Member

  • Version:
    N/A

  • Platform:
    N/A

  • Subsystem:

I found this article.
According to the article, Java and Python has ftp protocol injection to decode CRLF in the url.

ftp://foo:bar%0d%0aINJECTED@example.net/file.png

The url has the newline(CRLF) in authority part. Java and Python ftp server recognize following injected code.

USER foo
PASS bar
INJECTED
TYPE I
EPSV ALL
PASV
...

And our url module has the same issue for CRLF.

> url.parse('ftp://foo:bar%0d%0aINJECTED@example.net/file.png')
Url {
  protocol: 'ftp:',
  slashes: true,
  auth: 'foo:bar\r\nINJECTED',
  host: 'example.net',
  port: null,
  hostname: 'example.net',
  hash: null,
  search: null,
  query: null,
  pathname: '/file.png',
  path: '/file.png',
  href: 'ftp://foo:bar%0D%0AINJECTED@example.net/file.png' }

I tried WHATWG URL, the new url parser does not decode the authority part.

new URL('ftp://foo:bar%0d%0aINJECTED@example.net/file.png')
// Chrome
{
  hash: "",
  host: "example.net",
  hostname: "example.net",
  href: "ftp://foo:bar%0d%0aINJECTED@example.net/file.png",
  origin: "ftp://example.net",
  password: "bar%0d%0aINJECTED",
  pathname: "/file.png",
  port: "",
  protocol: "ftp:",
  search: "",
  username: "foo"
}

// Node v7.8 REPL
URL {
  href: ftp://foo:bar%0d%0aINJECTED@example.net/file.png
  protocol: ftp:
  username: foo
  password: --------
  hostname: example.net
  pathname: /file.png
}

Question

Why our url.parse decode authority by default?
And should we fix this CRLF problem?

I tried to fix this problem to sanitize CRLF url. but this change breaks compatibility. so I would like to hear some opinions.

I tried some npm modules related to ftp, but I cannot find vulnerabilities using this problem.

related urls

@vsemozhetbyt vsemozhetbyt added security Issues and PRs related to security. semver-major PRs that contain breaking changes and should be released in the next major version. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Apr 2, 2017
@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Apr 2, 2017

If this should be addressed, beware 8.x rc deadline will be closing in 2 days.

@watilde
Copy link
Member

watilde commented Apr 3, 2017

I'd +1 on this to put the patch into 8.0.0, and I think it would be nice if you could open a pull request either.

cc @nodejs/url

@TimothyGu
Copy link
Member

TimothyGu commented Apr 3, 2017

My fear is this change is too drastic to be included in 8.0.0 without a prolonged period of testing. In fact, I am not certain the URL parser behavior should be changed at all, for compatibility with implementations in Java, Python, and older versions of Node.js, and for the fact that the bug only manifests itself with FTP, and is not a deficiency per se of the URL parser.

On the other hand, I support additional error conditions in userland FTP client modules, such as @mscdex's ftp module.

@addaleax
Copy link
Member

addaleax commented Apr 3, 2017

/cc @nodejs/security

@jasnell
Copy link
Member

jasnell commented Apr 3, 2017

While I've never been fond of the behavior of url.parse() I'm afraid that this would be too drastic of a breaking change to get into 8.0.0 with so little time left. The ecosystem is incredibly sensitive to breaking changes in the url.parse() functionality (as we've seen before).

Given that there is a functional alternative in core (the WHATWG parser), I would rather see efforts focused there than on changing the existing parser. But that's just me. I'm interested in hearing what @nodejs/ctc folks have to say :-)

@bnoordhuis
Copy link
Member

not a deficiency per se of the URL parser

I feel the same way. The problem is with (some) users of the module, not the module itself.

This probably can't be changed anyway because of backwards compatibility. You wouldn't be able to tell if foo:bar%baz contains an URL-encoded octet or the characters %, b and a without looking at the node.js version.

@yosuke-furukawa
Copy link
Member Author

I don't target v8.0 to fix this change.
But I would like to know our url module stance.

not a deficiency per se of the URL parser

Ya, I think so. but even though we would be better not to decode any url parts by default.
At the least, we would be better to provide more secure / useful option like { decodeAuth: false }.

In my humble opinion,

  1. we would be better to stop decodeUriComponent by default (or provide secure option)
  2. recommend WHATWG URL parser and deprecate url.parse gradually.

@jasnell
Copy link
Member

jasnell commented Apr 4, 2017

Item 2 (recommending WHATWG URL Parser and slowly deprecating url.parse() is the track we are already on. My preference would be to document the deficiencies in url.parse() but leave the behavior as is without any modification.

@joyeecheung
Copy link
Member

Item 2 (recommending WHATWG URL Parser and slowly deprecating url.parse() is the track we are already on.

Speaking of which, a guide on how to migrate from the legacy API to WHATWG URL API would help, but I am not sure if this should be in doc/api/url.md, or in the website repo?

@bnoordhuis
Copy link
Member

At the least, we would be better to provide more secure / useful option like { decodeAuth: false }.

That requires people to opt in, though. If someone is security-savvy enough to do that, they probably are already careful to encodeURIComponent() their inputs.

@Trott
Copy link
Member

Trott commented Aug 1, 2017

Should this remain open?

@TimothyGu
Copy link
Member

I don't think so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
security Issues and PRs related to security. semver-major PRs that contain breaking changes and should be released in the next major version. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

No branches or pull requests

9 participants