Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: username in authority is not strictly in RFC3986 #22655

Open
mattn opened this issue Nov 10, 2017 · 2 comments

Comments

Projects
None yet
4 participants
@mattn
Copy link
Member

commented Nov 10, 2017

What version of Go are you using (go version)?

go version devel +5a5223297a Wed Nov 1 11:43:41 2017 -0700 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Windows 7 64bit

What did you do?

package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	u, err := url.Parse("http://foo@evil.com:80@google.com/")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%#v\n", u)
	fmt.Printf("%#v\n", u.User)
}

https://play.golang.org/p/HFm27EmRPU

What did you expect to see?

error should be returned

What did you see instead?

no errors.

https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf

In this slide, some URL parsers are mentioned. And seems to be different from cURL. RFC3986 says username is filled with unreserved / pct-encoded / sub-delims.

userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )

pct-encoded   = "%" HEXDIG HEXDIG

sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

https://tools.ietf.org/html/rfc3986

And whatwg-url says

If the @ flag is set, prepend "%40" to buffer.

https://url.spec.whatwg.org/#authority-state

Go's implementation find @ in authority with using strings.LastIndex.

i := strings.LastIndex(authority, "@")

If implementation should be strictly in RFC3986 and whatwg-url, multiple @ should be treated as error, I think.

related issue #3439

@bradfitz bradfitz added this to the Unplanned milestone Nov 10, 2017

@opennota

This comment has been minimized.

Copy link

commented Nov 10, 2017

FWIW, node.js accepts this userinfo:

$ node -e 'console.log(require("url").parse("http://foo@evil.com:80@google.com/"))'

Url {
  protocol: 'http:',
  slashes: true,
  auth: 'foo@evil.com:80',
  host: 'google.com',
  port: null,
  hostname: 'google.com',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'http://foo%40evil.com:80@google.com/' }
@agnivade

This comment has been minimized.

Copy link
Member

commented Dec 22, 2018

This is tricky behavior. I think it would be an undeniable bug if we would have parsed "evil.com" as the hostname instead of google.com. As is evident from slide 34 of https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf.

But we don't. And so does node and python.

>>> from urllib.parse import urlparse
>>> o = urlparse('http://foo@evil.com:80@google.com/')
>>> o.username
'foo@evil.com'
>>> o.password
'80'
>>> o.hostname
'google.com'

Now if we follow the RFC to the letter, a @ is indeed not part of sub-delims and it is valid to throw an error. But at this point, I am not sure whether it is worth it or not. We have no idea how many different types of urls are there in the wild and suddenly returning an error instead of parsing it in a safe manner, might start breaking codebases.

How about we just document this behavior ?

WDYT @bradfitz ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.