Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: URL.Parse Multiple Parsing Issues #29098

Open
wir3less opened this issue Dec 4, 2018 · 28 comments

Comments

@wir3less
Copy link

commented Dec 4, 2018

What version of Go are you using (go version)?

$ go version
go version go1.11.2 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\wir3less\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\wir3less\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\wir3less\AppData\Local\Temp\go-build829182294=/tmp/go-build -gno-record-gcc-switches

What did you do?

While playing around with URL.Parse I found a few problems I'd like to share.
I'll gladly share more details if anything is unclear or if someone is interested.

Normally, javascript:alert(1) when parsed by url.parse has no Hostname()
But javascript://alert(1) has a hostname of alert(1)
This can be taken further...
javascript://%250aalert(1)+'aa@google.com/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)

IPV6 support also has it's issues...
this URL http://[google.com]:80 has the hostname of google.com
But also do all of these:
http://google.com]:80
http://google.com]:80__Anything_you'd_like_sir
http://[google.com]FreeTextZoneHere]:80

Even without thinking about how this would interact with other systems and parsers,
Just considering code used URL hostname validations and Go's https functions (http.Get() for instance) leveraging url.parse should explain how this could be used maliciously.

Again, will be glad to provide more details if needed.
All POCs can be found here
https://play.golang.org/p/UoqEcxCFY8z

What did you expect to see?

Errors for most of it...

What did you see instead?

Hostnames

@ALTree ALTree changed the title Net/URL: URL.Parse Multiple Parsing Issues net/url: URL.Parse Multiple Parsing Issues Dec 4, 2018

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 5, 2018

/cc @bradfitz

@mengzhuo

This comment has been minimized.

Copy link
Contributor

commented Dec 5, 2018

What did you see instead?
Hostnames

I think an invalid error is more appropriate.

@wir3less

This comment has been minimized.

Copy link
Author

commented Dec 5, 2018

I agree. Thats what I wrote on the "Expected" field.
Recieving a Hostname is what currently happens.

@mengzhuo

This comment has been minimized.

Copy link
Contributor

commented Dec 5, 2018

I've take a look into url_test.go

go/src/net/url/url_test.go

Lines 423 to 432 in 5e17278

// worst case host, still round trips
{
"scheme://!$&'()*+,;=hello!:port/path",
&URL{
Scheme: "scheme",
Host: "!$&'()*+,;=hello!:port",
Path: "/path",
},
"",
},

You can see this is design on purpose.

@wir3less

This comment has been minimized.

Copy link
Author

commented Dec 6, 2018

I see...
Well this is still the wrong behaviour, both in my opinion as well as by the spec.
The fact that we have tests for it doesn't make it right...
Try testing other parsers, non will allow that kind of parsing

@mees-

This comment has been minimized.

Copy link

commented Dec 27, 2018

I don't know if this is the right place to mention this but url.Parse("localhost") sets the Path to "localhost" and the host is empty. I believe this is also an error in the parsing

@fraenkel

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2018

@mees- Why would that be an error? You have provided a valid relative path.

@mees-

This comment has been minimized.

Copy link

commented Dec 27, 2018

I'm sorry, I misunderstood the documentation

@wir3less

This comment has been minimized.

Copy link
Author

commented Dec 28, 2018

Can I provide any more info to get this issue resolved? No point in leaving this issue open for nothing...

@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 29, 2018

Try testing other parsers, non will allow that kind of parsing

If you can take each case, and show a comparison of the behavior in Python, Node and Ruby, it will help us understand the situation better.

@wir3less

This comment has been minimized.

Copy link
Author

commented Jan 1, 2019

So I've taken the time and compiled the following table
https://docs.google.com/spreadsheets/d/1HNyNO6dVYNhdsd_oLZELw4L17L3hK0r2OcZZv1lrx3Y/edit?usp=sharing

You can notice that for the Javascript URLs, Chrome is the only one actually following the spec, and python is doing a half decent job (has hostname, but ignored http specific chars like '@' in non http schemes.)

For the IPv6 URLs, you can see that Go is by far the most permissive parser.

Please keep in mind that all these vectors can be used to trick and bypass code trying to validate the host of a user-supplied URL.

I'm also going to open bugs for Node and Ruby as they are also in risk of that.

@agnivade

This comment has been minimized.

Copy link
Member

commented Jan 1, 2019

Thank you for spending time on this !

@wir3less

This comment has been minimized.

Copy link
Author

commented Jan 1, 2019

I dug a little deeper on the Node cases.
I used node 10.13.0 for my testing, apparently, there's already 10.15.0
One of the patches is of interest to this issue - https://nodejs.org/en/blog/vulnerability/november-2018-security-releases/#hostname-spoofing-in-url-parser-for-javascript-protocol-cve-2018-12123

So using url.parse() is fixed on Node now, these problems still exist it if you use the new URL() syntax

> new URL("javascript://google.com").host
'google.com'
> url.parse("javascript://google.com").host
null
@wir3less

This comment has been minimized.

Copy link
Author

commented Jan 9, 2019

Guys,
Any update here?

@bradfitz bradfitz added this to the Go1.13 milestone Jan 9, 2019

@opennota

This comment has been minimized.

Copy link

commented Jan 10, 2019

   javascript://alert(1) has a hostname of alert(1)
   javascript://%250aalert(1)+'aa@google.com/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)

I don't see an issue here. If you're using unescaped and unsanitized data in HTML code, then nothing would help you. And whitelisting is not an option here, while blacklisting isn't a solution as it always can be bypassed by a malicious.actor.

@wir3less

This comment has been minimized.

Copy link
Author

commented Jan 10, 2019

Thanks @bradfitz .

@opennota - The URL parser is meant to deal with such data.
You're supposed to be able to pass user-data into it, and it should return an error in case the URL is malformed.
Otherwise, it should return a safe object that correctly represents the URL. Meaning, for example, no Hostname for hostless schemes such as JS.

As for the blacklist approach, I think a piece of code such as the one implemented by Node, should do the trick:
https://github.com/nodejs/node/blob/master/lib/url.js#L277

@wir3less

This comment has been minimized.

Copy link
Author

commented Jan 13, 2019

@bradfitz I just realized that go releases a major version every 6 months, and that 1.13 will be released on August 19.
Does that mean this issue will wait till then to get resolved?

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jan 14, 2019

At least, if ever. We don't have a great history of being able to make changes to the URL type without breaking code. So it might need to remain unchanged (or at least only documented). But I can't say because I haven't looked into this bug at all yet or your spreadsheet. But thanks for gathering data.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

@andybons andybons modified the milestones: Go1.14, Go1.13 Aug 6, 2019

@andybons andybons removed the early-in-cycle label Aug 6, 2019

@gopherbot

This comment has been minimized.

Copy link

commented Aug 7, 2019

Change https://golang.org/cl/189258 mentions this issue: net/url: make Hostname and Port predictable for invalid Host values

@wir3less

This comment has been minimized.

Copy link
Author

commented Aug 9, 2019

Glad to see this is getting fixed, I'm be happy to assist or just verify the fix if you'd like.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Aug 9, 2019

@wir3less We'd definitely appreciate it if you could verify the fix. For a security patch we needed to reduce disruption as much as possible, so we focused on protecting against validation bypasses, rather than rejecting all invalid hostnames (which will come in Go 1.14 as much as possible).

If you can still see validation bypass avenues, please let us know. Also, please let us know how you'd like to be credited for reporting this, if at all.

For next time, this kind of issue is worth reporting to security@golang.org rather than the issue tracker.

@wir3less

This comment has been minimized.

Copy link
Author

commented Aug 9, 2019

Great, and I completely understand. I'll go over the patch and let you know if I find anything interesting (via the security email this time :) ).

As for credit, the following would be great
Adi Cohen - adico.me

Thanks

@odeke-em

This comment has been minimized.

Copy link
Member

commented Aug 10, 2019

/cc @mikesamuel who might also be interested

gopherbot pushed a commit that referenced this issue Aug 12, 2019

net/url: make Hostname and Port predictable for invalid Host values
When Host is not valid per RFC 3986, the behavior of Hostname and Port
was wildly unpredictable, to the point that Host could have a suffix
that didn't appear in neither Hostname nor Port.

This is a security issue when applications are applying checks to Host
and expecting them to be meaningful for the contents of Hostname.

To reduce disruption, this change only aims to guarantee the following
two security-relevant invariants.

* Host is either Hostname or [Hostname] with Port empty, or
  Hostname:Port or [Hostname]:Port.

* Port is only decimals.

The second invariant is the one that's most likely to cause disruption,
but I believe it's important, as it's conceivable an application might
do a suffix check on Host and expect it to be meaningful for the
contents of Hostname (if the suffix is not a valid port).

There are three ways to ensure it.

1) Reject invalid ports in Parse. Note that non-numeric ports are
   already rejected if and only if the host starts with "[".

2) Consider non-numeric ports as part of Hostname, not Port.

3) Allow non-numeric ports, and hope they only flow down to net/http,
   which will reject them (#14353).

This change adopts both 1 and 2. We could do only the latter, but then
these invalid hosts would flow past port checks, like in
http_test.TestTransportRejectsAlphaPort. Non-numeric ports weren't fully
supported anyway, because they were rejected after IPv6 literals, so
this restores consistency. We could do only the former, but at this
point 2) is free and might help with manually constructed Host values
(or if we get something wrong in Parse).

Note that net.SplitHostPort and net.Dial explicitly accept service names
in place of port numbers, but this is an URL package, and RFC 3986,
Section 3.2.3, clearly specifies ports as a number in decimal.

net/http uses a mix of net.SplitHostPort and url.Parse that would
deserve looking into, but in general it seems that it will still accept
service names in Addr fields as they are passed to net.Listen, while
rejecting them in URLs, which feels correct.

This leaves a number of invalid URLs to reject, which however are not
security relevant once the two invariants above hold, so can be done in
Go 1.14: IPv6 literals without brackets (#31024), invalid IPv6 literals,
hostnames with invalid characters, and more.

Tested with 200M executions of go-fuzz and the following Fuzz function.

	u, err := url.Parse(string(data))
	if err != nil {
		return 0
	}
	h := u.Hostname()
	p := u.Port()

	switch u.Host {
	case h + ":" + p:
		return 1
	case "[" + h + "]:" + p:
		return 1
	case h:
		fallthrough
	case "[" + h + "]":
		if p != "" {
			panic("unexpected Port()")
		}
		return 1
	}
	panic("Host is not a variant of [Hostname]:Port")

Fixes CVE-2019-14809
Updates #29098

Change-Id: I7ef40823dab28f29511329fa2d5a7fb10c3ec895
Reviewed-on: https://go-review.googlesource.com/c/go/+/189258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@dmitshur

This comment has been minimized.

Copy link
Member

commented Aug 13, 2019

@gopherbot Please backport this to 1.12 and 1.11. This was a security problem.

@gopherbot

This comment has been minimized.

Copy link

commented Aug 13, 2019

Backport issue(s) opened: #33632 (for 1.11), #33633 (for 1.12).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

gopherbot pushed a commit that referenced this issue Aug 13, 2019

[release-branch.go1.11-security] net/url: make Hostname and Port pred…
…ictable for invalid Host values

When Host is not valid per RFC 3986, the behavior of Hostname and Port
was wildly unpredictable, to the point that Host could have a suffix
that didn't appear in neither Hostname nor Port.

This is a security issue when applications are applying checks to Host
and expecting them to be meaningful for the contents of Hostname.

To reduce disruption, this change only aims to guarantee the following
two security-relevant invariants.

* Host is either Hostname or [Hostname] with Port empty, or
  Hostname:Port or [Hostname]:Port.

* Port is only decimals.

The second invariant is the one that's most likely to cause disruption,
but I believe it's important, as it's conceivable an application might
do a suffix check on Host and expect it to be meaningful for the
contents of Hostname (if the suffix is not a valid port).

There are three ways to ensure it.

1) Reject invalid ports in Parse. Note that non-numeric ports are
   already rejected if and only if the host starts with "[".

2) Consider non-numeric ports as part of Hostname, not Port.

3) Allow non-numeric ports, and hope they only flow down to net/http,
   which will reject them (#14353).

This change adopts both 1 and 2. We could do only the latter, but then
these invalid hosts would flow past port checks, like in
http_test.TestTransportRejectsAlphaPort. Non-numeric ports weren't fully
supported anyway, because they were rejected after IPv6 literals, so
this restores consistency. We could do only the former, but at this
point 2) is free and might help with manually constructed Host values
(or if we get something wrong in Parse).

Note that net.SplitHostPort and net.Dial explicitly accept service names
in place of port numbers, but this is an URL package, and RFC 3986,
Section 3.2.3, clearly specifies ports as a number in decimal.

net/http uses a mix of net.SplitHostPort and url.Parse that would
deserve looking into, but in general it seems that it will still accept
service names in Addr fields as they are passed to net.Listen, while
rejecting them in URLs, which feels correct.

This leaves a number of invalid URLs to reject, which however are not
security relevant once the two invariants above hold, so can be done in
Go 1.14: IPv6 literals without brackets (#31024), invalid IPv6 literals,
hostnames with invalid characters, and more.

Tested with 200M executions of go-fuzz and the following Fuzz function.

	u, err := url.Parse(string(data))
	if err != nil {
		return 0
	}
	h := u.Hostname()
	p := u.Port()

	switch u.Host {
	case h + ":" + p:
		return 1
	case "[" + h + "]:" + p:
		return 1
	case h:
		fallthrough
	case "[" + h + "]":
		if p != "" {
			panic("unexpected Port()")
		}
		return 1
	}
	panic("Host is not a variant of [Hostname]:Port")

Fixes CVE-2019-14809
Updates #29098

Change-Id: I7ef40823dab28f29511329fa2d5a7fb10c3ec895
Reviewed-on: https://go-review.googlesource.com/c/go/+/189258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 61bb56a)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/526408
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
(cherry picked from commit 3226f2d)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/526409

gopherbot pushed a commit that referenced this issue Aug 13, 2019

[release-branch.go1.12-security] net/url: make Hostname and Port pred…
…ictable for invalid Host values

When Host is not valid per RFC 3986, the behavior of Hostname and Port
was wildly unpredictable, to the point that Host could have a suffix
that didn't appear in neither Hostname nor Port.

This is a security issue when applications are applying checks to Host
and expecting them to be meaningful for the contents of Hostname.

To reduce disruption, this change only aims to guarantee the following
two security-relevant invariants.

* Host is either Hostname or [Hostname] with Port empty, or
  Hostname:Port or [Hostname]:Port.

* Port is only decimals.

The second invariant is the one that's most likely to cause disruption,
but I believe it's important, as it's conceivable an application might
do a suffix check on Host and expect it to be meaningful for the
contents of Hostname (if the suffix is not a valid port).

There are three ways to ensure it.

1) Reject invalid ports in Parse. Note that non-numeric ports are
   already rejected if and only if the host starts with "[".

2) Consider non-numeric ports as part of Hostname, not Port.

3) Allow non-numeric ports, and hope they only flow down to net/http,
   which will reject them (#14353).

This change adopts both 1 and 2. We could do only the latter, but then
these invalid hosts would flow past port checks, like in
http_test.TestTransportRejectsAlphaPort. Non-numeric ports weren't fully
supported anyway, because they were rejected after IPv6 literals, so
this restores consistency. We could do only the former, but at this
point 2) is free and might help with manually constructed Host values
(or if we get something wrong in Parse).

Note that net.SplitHostPort and net.Dial explicitly accept service names
in place of port numbers, but this is an URL package, and RFC 3986,
Section 3.2.3, clearly specifies ports as a number in decimal.

net/http uses a mix of net.SplitHostPort and url.Parse that would
deserve looking into, but in general it seems that it will still accept
service names in Addr fields as they are passed to net.Listen, while
rejecting them in URLs, which feels correct.

This leaves a number of invalid URLs to reject, which however are not
security relevant once the two invariants above hold, so can be done in
Go 1.14: IPv6 literals without brackets (#31024), invalid IPv6 literals,
hostnames with invalid characters, and more.

Tested with 200M executions of go-fuzz and the following Fuzz function.

	u, err := url.Parse(string(data))
	if err != nil {
		return 0
	}
	h := u.Hostname()
	p := u.Port()

	switch u.Host {
	case h + ":" + p:
		return 1
	case "[" + h + "]:" + p:
		return 1
	case h:
		fallthrough
	case "[" + h + "]":
		if p != "" {
			panic("unexpected Port()")
		}
		return 1
	}
	panic("Host is not a variant of [Hostname]:Port")

Fixes CVE-2019-14809
Updates #29098

Change-Id: I7ef40823dab28f29511329fa2d5a7fb10c3ec895
Reviewed-on: https://go-review.googlesource.com/c/go/+/189258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 61bb56a)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/526408
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

mholt added a commit to caddyserver/caddy that referenced this issue Aug 13, 2019

ernado added a commit to gortc/stun that referenced this issue Aug 14, 2019

@mikesamuel

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

Is net/url based on STD66 or https://url.spec.whatwg.org/ ?

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Aug 14, 2019

@mikesamuel STD66. It cites RFC 3986 and RFC 2396. Is it relevant here? Do either allow non-numeric ports?

@mikesamuel

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

@FiloSottile

I don't know that they differ around ports.
IIRC, url.spec does differ around javascript://example.com/%0aalert(1).

The Appendix B regex is perfectly happy with non-numeric ports, so it really depends on whether you're strictly matching the grammar or doing approximate decomposition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.