net/url: URL.Parse Multiple Parsing Issues #29098

Open
wir3less opened this Issue Dec 4, 2018 · 18 comments

8 participants
@wir3less

commented Dec 4, 2018

What version of Go are you using (go version)?

$ go version
go version go1.11.2 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\wir3less\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\wir3less\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\wir3less\AppData\Local\Temp\go-build829182294=/tmp/go-build -gno-record-gcc-switches

What did you do?

While playing around with URL.Parse I found a few problems I'd like to share.
I'll gladly share more details if anything is unclear or if someone is interested.

Normally, javascript:alert(1), when parsed by url.Parse, has an empty Hostname().
But javascript://alert(1) has a hostname of alert(1).
This can be taken further:
javascript://%250aalert(1)+'aa@google.com/a'a has a hostname of google.com and will pop an alert if a browser is redirected to it (after decoding).
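
A minimal sketch of how the three javascript: inputs above could be checked locally. The helper name hostnameOf is an assumption made for illustration and is not part of net/url. Expected values are deliberately not listed, since the result may differ between Go releases.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostnameOf is a hypothetical helper: it parses raw with url.Parse and
// returns whatever Hostname() reports, or the parse error.
func hostnameOf(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

func main() {
	// The three inputs discussed above.
	inputs := []string{
		"javascript:alert(1)",
		"javascript://alert(1)",
		"javascript://%250aalert(1)+'aa@google.com/a'a",
	}
	for _, raw := range inputs {
		host, err := hostnameOf(raw)
		fmt.Printf("%-50q hostname=%q err=%v\n", raw, host, err)
	}
}
```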

IPv6 support also has its issues.
The URL http://[google.com]:80 has a hostname of google.com,
but so do all of these:
http://google.com]:80
http://google.com]:80__Anything_you'd_like_sir
http://[google.com]FreeTextZoneHere]:80
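
The bracket cases can be probed the same way. This is only a sketch: parseReport is a hypothetical helper, the first input is a genuine IPv6 literal added as a control, and whether each of the other inputs yields a hostname or an error may depend on the Go release in use.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseReport is a hypothetical helper that summarizes what url.Parse
// returns for raw: either the parse error or the host-related fields.
func parseReport(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Sprintf("error: %v", err)
	}
	return fmt.Sprintf("host=%q hostname=%q port=%q", u.Host, u.Hostname(), u.Port())
}

func main() {
	inputs := []string{
		"http://[::1]:80", // control: a real IPv6 literal (not from the report)
		"http://[google.com]:80",
		"http://google.com]:80",
		"http://google.com]:80__Anything_you'd_like_sir",
		"http://[google.com]FreeTextZoneHere]:80",
	}
	for _, raw := range inputs {
		fmt.Printf("%-50q %s\n", raw, parseReport(raw))
	}
}
```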

Even without considering how this interacts with other systems and parsers,
code that validates a URL's hostname and then hands the URL to Go's HTTP functions (http.Get(), for instance), which rely on url.Parse, should show how this could be used maliciously.

Again, will be glad to provide more details if needed.
All PoCs can be found here:
https://play.golang.org/p/UoqEcxCFY8z

What did you expect to see?

Errors for most of these inputs.

What did you see instead?

Hostnames

@ALTree ALTree changed the title Net/URL: URL.Parse Multiple Parsing Issues net/url: URL.Parse Multiple Parsing Issues Dec 4, 2018

@agnivade

Member

commented Dec 5, 2018

/cc @bradfitz

@mengzhuo

Contributor

commented Dec 5, 2018

   What did you see instead?
   Hostnames

I think returning an "invalid URL" error would be more appropriate.

@wir3less

Author

commented Dec 5, 2018

I agree. That's what I wrote in the "Expected" field.
Receiving a hostname is what currently happens.

@mengzhuo

Contributor

commented Dec 5, 2018

I've taken a look at url_test.go:

go/src/net/url/url_test.go

Lines 423 to 432 in 5e17278

// worst case host, still round trips
{
	"scheme://!$&'()*+,;=hello!:port/path",
	&URL{
		Scheme: "scheme",
		Host:   "!$&'()*+,;=hello!:port",
		Path:   "/path",
	},
	"",
},

You can see this is by design.

@wir3less

Author

commented Dec 6, 2018

I see...
Well, this is still the wrong behaviour, both in my opinion and according to the spec.
The fact that we have tests for it doesn't make it right...
Try testing other parsers; none will allow that kind of parsing.

@mees-


commented Dec 27, 2018

I don't know if this is the right place to mention this, but url.Parse("localhost") sets the Path to "localhost" and leaves the Host empty. I believe this is also a parsing error.

@fraenkel

Contributor

commented Dec 27, 2018

@mees- Why would that be an error? You have provided a valid relative path.
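
To illustrate the point about relative references, here is a small sketch comparing a bare "localhost" with forms that carry an authority. The helper hostAndPath is a hypothetical name introduced only for this example. A string with no scheme and no leading "//" is treated as a relative path, so Host stays empty, as described in the comment above.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAndPath is a hypothetical helper returning the Host and Path fields
// that url.Parse fills in for raw.
func hostAndPath(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	// A bare name, a network-path reference, and an absolute URL.
	for _, raw := range []string{"localhost", "//localhost", "http://localhost"} {
		host, path, err := hostAndPath(raw)
		fmt.Printf("%-20q host=%q path=%q err=%v\n", raw, host, path, err)
	}
}
```

If a host is what the caller wants, the input needs a scheme or a leading "//" before it is parsed.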

@mees-


commented Dec 27, 2018

I'm sorry, I misunderstood the documentation.

@wir3less

Author

commented Dec 28, 2018

Can I provide any more info to get this issue resolved? No point in leaving this issue open for nothing...

@agnivade

Member

commented Dec 29, 2018

   Try testing other parsers, none will allow that kind of parsing

If you can take each case and show a comparison of the behavior in Python, Node, and Ruby, it will help us understand the situation better.

@wir3less

Author

commented Jan 1, 2019

So I've taken the time to compile the following table:
https://docs.google.com/spreadsheets/d/1HNyNO6dVYNhdsd_oLZELw4L17L3hK0r2OcZZv1lrx3Y/edit?usp=sharing

You can see that for the JavaScript URLs, Chrome is the only one actually following the spec, and Python does a half-decent job (it returns a hostname, but ignores HTTP-specific characters like '@' in non-HTTP schemes).

For the IPv6 URLs, you can see that Go is by far the most permissive parser.

Please keep in mind that all of these vectors can be used to trick and bypass code that tries to validate the host of a user-supplied URL.

I'm also going to open bugs for Node and Ruby, as they are also at risk.

@agnivade

Member

commented Jan 1, 2019

Thank you for spending time on this!

@wir3less

Author

commented Jan 1, 2019

I dug a little deeper into the Node cases.
I used Node 10.13.0 for my testing; apparently, 10.15.0 is already out.
One of the patches is of interest to this issue: https://nodejs.org/en/blog/vulnerability/november-2018-security-releases/#hostname-spoofing-in-url-parser-for-javascript-protocol-cve-2018-12123

So url.parse() is now fixed in Node, but the problem still exists if you use the new URL() syntax:

> new URL("javascript://google.com").host
'google.com'
> url.parse("javascript://google.com").host
null

@wir3less

Author

commented Jan 9, 2019

Guys,
Any update here?

@bradfitz bradfitz added this to the Go1.13 milestone Jan 9, 2019

@opennota


commented Jan 10, 2019

   javascript://alert(1) has a hostname of alert(1)
   javascript://%250aalert(1)+'aa@google.com/a'a has a hostname of google.com and will pop an alert if relocated to by a browser (after decoding)

I don't see an issue here. If you're using unescaped and unsanitized data in HTML code, then nothing will help you. And whitelisting is not an option here, while blacklisting isn't a solution, as it can always be bypassed by a malicious actor.

@wir3less

Author

commented Jan 10, 2019

Thanks, @bradfitz.

@opennota - The URL parser is meant to deal with such data.
You're supposed to be able to pass user data into it, and it should return an error if the URL is malformed.
Otherwise, it should return a safe object that correctly represents the URL. That means, for example, no Hostname for hostless schemes such as javascript:.

As for the blacklist approach, I think a piece of code such as the one implemented by Node should do the trick:
https://github.com/nodejs/node/blob/master/lib/url.js#L277
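
The denylist idea could be sketched in Go roughly as follows. This is an illustrative assumption, not Node's code and not a proposed patch to net/url: checkedHostname and hostlessSchemes are hypothetical names, and the one-entry scheme list is deliberately incomplete.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// hostlessSchemes is a hypothetical, intentionally short list of schemes
// that this sketch treats as never carrying a host.
var hostlessSchemes = map[string]bool{
	"javascript": true,
}

// checkedHostname is a hypothetical wrapper around url.Parse that refuses
// to report a hostname for schemes listed in hostlessSchemes.
func checkedHostname(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if hostlessSchemes[strings.ToLower(u.Scheme)] {
		return "", fmt.Errorf("scheme %q does not carry a host", u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return "", errors.New("URL has no host")
	}
	return host, nil
}

func main() {
	for _, raw := range []string{
		"https://google.com/a",
		"javascript://google.com",
	} {
		host, err := checkedHostname(raw)
		fmt.Printf("%-30q host=%q err=%v\n", raw, host, err)
	}
}
```

A wrapper like this only covers callers that opt into it; code that reads u.Host or u.Hostname() directly is unaffected.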

@wir3less

Author

commented Jan 13, 2019

@bradfitz I just realized that Go releases a major version every 6 months, and that 1.13 will be released on August 19.
Does that mean this issue will wait until then to get resolved?

@bradfitz

Member

commented Jan 14, 2019

At least, if ever. We don't have a great history of being able to make changes to the URL type without breaking code, so it might need to remain unchanged (or at least only be documented). But I can't say, because I haven't yet looked into this bug or your spreadsheet at all. But thanks for gathering data.
