Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: make URL parsing return an error on IPv6 literal without brackets #31024

Open
noneymous opened this issue Mar 25, 2019 · 16 comments
Open
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@noneymous
Copy link

noneymous commented Mar 25, 2019

What version of Go are you using (go version)?

go version go1.11.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=amd64
set GOHOSTARCH=amd64
set GOHOSTOS=windows

What did you do?

url.URL provides the methods .Hostname() and .Port() which schould split the URL's host (.Host) into its address and port part. In certain cases, these functions are not able to interpret IPv6 addresses correctly, leading to invalid output.

Here is a test function feeding different sample URLs, demonstrating the issue (all test URLs are valid and should succeed!):

func TestUrl(t *testing.T) {
	tests := []struct {
		name        string
		input string
		wantHost     string
		wantPort string
	}{
		{"domain-unknonw-scheme", "asfd://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-implicit", "http://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-explicit", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},
		{"domain-explicit-other-port", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},

		{"ipv4-implicit", "http://127.0.0.1/?q=0&p=1#frag", "127.0.0.1", ""},
		{"ipv4-explicit", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},
		{"ipv4-explicit-other-port", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},

		{"ipv6-explicit", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},
		{"ipv6-explicit-other-port", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},

		{"ipv6-implicit-1", "http://[1::]/?q=0&p=1#frag", "1::", ""},

		{"ipv6-implicit-2", "http://1::/?q=0&p=1#frag", "1::", ""},
		{"ipv6-implicit-3", "http://1::2008/?q=0&p=1#frag", "1::2008", ""},
		{"ipv6-implicit-4", "http://1::2008:1/?q=0&p=1#frag", "1::2008:1", ""},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Prepare URL
			u, err := url.Parse(tt.input)
			if err != nil {
				t.Errorf("could not parse url: %v", err)
				return
			}

			// Extract hostname and port
			host := u.Hostname()
			port := u.Port()

			// Compare result
			if host != tt.wantHost {
				t.Errorf("TestUrl() got = %v, want %v", host, tt.wantHost)
			}
			if port != tt.wantPort {
				t.Errorf("TestUrl() got1 = %v, want %v", port, tt.wantPort)
			}
		})
	}
}

Output:
grafik

What did you expect to see?

All sample URLs in the test cases above are valid ones, hence, all tests should succeed as defined

What did you see instead?

The top test samples work as expected, however, the bottom three return incorrect results. The bottom three samples are valid IPv6 URLs with inplicit port specification, but .Hostname() and .Port() interpres them as IPv4 addresses returning parts of the IPv6 address as if it was the explicit port of an IPv4 input. E.g., in one of the test outputs, ":2008" is returned as the port, but it is actually part of the IPv6 address.

Where is the bug?

.Hostname() and .Port() implement their own logic to split the port from the hostname. I've found that there is already a close function in the net package, called net.SplitHostPort(), which does it's job correctly. If .Hostname() and .Port() just called that function instead of re-implementing the logic, everything should work as expected. Below is the proof in form of a test function with exactly the same inputs as above, but using net.SplitHostPort() instead of .Hostname() / .Port().

func TestUrlNetSplit(t *testing.T) {
	tests := []struct {
		name        string
		input string
		wantHost     string
		wantPort string
	}{
		{"domain-unknonw-scheme", "asfd://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-implicit", "http://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-explicit", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},
		{"domain-explicit-other-port", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},

		{"ipv4-implicit", "http://127.0.0.1/?q=0&p=1#frag", "127.0.0.1", ""},
		{"ipv4-explicit", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},
		{"ipv4-explicit-other-port", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},

		{"ipv6-explicit", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},
		{"ipv6-explicit-other-port", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},

		{"ipv6-implicit-1", "http://[1::]/?q=0&p=1#frag", "1::", ""},

		{"ipv6-implicit-2", "http://1::/?q=0&p=1#frag", "1::", ""},
		{"ipv6-implicit-3", "http://1::2008/?q=0&p=1#frag", "1::2008", ""},
		{"ipv6-implicit-4", "http://1::2008:1/?q=0&p=1#frag", "1::2008:1", ""},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Prepare URL
			u, err := url.Parse(tt.input)
			if err != nil {
				t.Errorf("could not parse url: %v", err)
				return
			}

			host, port, err := net.SplitHostPort(u.Host)
			if err != nil {
				// Could not split port from host, as there is no port specified
				host = u.Host
				port = ""
			}

			// Compare result
			if host != tt.wantHost {
				t.Errorf("TestUrl() got = %v, want %v", host, tt.wantHost)
			}
			if port != tt.wantPort {
				t.Errorf("TestUrl() got1 = %v, want %v", port, tt.wantPort)
			}
		})
	}
}

Output:
grafik

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 24, 2019
@dmitshur dmitshur added this to the Go1.13 milestone Apr 24, 2019
@dmitshur
Copy link
Contributor

/cc @rsc @bradfitz (per net/url owners)

@bradfitz
Copy link
Contributor

Your ipv6-implicit-2, ipv6-implicit-3, and ipv6-implicit-4 aren't valid URLs. IPv6 literals need to be in square brackets.

@bradfitz
Copy link
Contributor

Closing for now. Let me know if I'm wrong, though.

@noneymous
Copy link
Author

Thank you for your response @bradfitz !

The rfc2732 recommends the use of brackets in all cases. However, it is not mandatory and can be technically omitted if no explicit port needs to be defined.

The thing is: In many cases the URL might not be built by the developer, but be received from some other source. Using url.Parse() is already an small indicator that URLs are received and parsed rather than built by the developer.

One common example, would be parsing URLs returned by a website. The website is free to return any URL format (can be decided by the web application developer). Relative URLs but also absolute ones. If the website returns absolute URLs, they might not follow the recommendation of the RFC but omit the brackets. url.Parse() would not return an error but garbage difficult to catch.

Sanitizing every received URL before parsing would be a fix, but unnecessasry overhead. It would be related to re-implementing the url.Parse() function.

The following might be a less valid argument in the views of some, but there is already the net.SplitHostPort() function in a closeby package. It is a little more forgiving and having duplicate code might not be the best practice either.

It would be also okay if parsing "http://1::2008" and calling .Hostname() returned "[1::2008]" in compliance with the RFC's suggestion for

rfc2732 : https://www.ietf.org/rfc/rfc2732.txt
definition of "should": https://tools.ietf.org/html/rfc2119

@bradfitz
Copy link
Contributor

Well, that SHOULD is news to me. I thought it was a requirement. I think it might've been upgraded to a requirement in later (post 1999) RFCs.

In any case,

3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

Who decided that the valid reasons were to not use square brackets? Did they carefully weigh that libraries (such as Go) might reject them?

Is there any other software today that generates them? That is, how did this bug come about? Is this real or hypothetical?

@noneymous
Copy link
Author

noneymous commented Apr 25, 2019

It was hypothetical based on the RFC's definition and that IPv6 addresses go without brackets by default unless a specific endpoint (port) is to be specified.

However, I now tried how browsers (Chrome, FF, IE, Edge) react on IPv6 URLs without brackets. It seems they don't accept it and understand such input as search terms. Obviously, the general understanding is that brackets are mandatory for IPv6 URLs. Hence, no working website could apply it without. Web developers could build it but it wouldn't be useful.

If being forgiving is not desired in this case, I'm suggesting, that url.Parse() throws an error in order to prevent from continuing with garbage data! See garbage output in sample below.

func main() {
	u, err := url.Parse("http://1::12:2008")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.Hostname()) // => 1
	fmt.Println(u.Port()) // => :12:2008
}

(If the port is not empty or an integer, something went bad.)

@bradfitz bradfitz changed the title net/url: Host/Port extraction malfunctions with IPv6 addresses net/url: make URL parsing return an error on IPv6 literal without brackets Apr 25, 2019
@bradfitz bradfitz reopened this Apr 25, 2019
@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. help wanted labels Apr 25, 2019
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 25, 2019
@bradfitz
Copy link
Contributor

Okay, I've repurposed this bug to be about rejecting them instead.

@philandstuff
Copy link

I'm pretty sure that RFC 3986 (which obsoletes RFC 2732) requires square brackets for IPv6 literals. The ABNF rules defined in RFC 3986 only allow an IPv6address inside square brackets, and there is nothing in the text to suggest that a reg-name that looks like an IPv6address should be interpreted that way.

This silence on IPv6 addresses without square brackets is notable, because the text does acknowledge the ambiguity between reg-name and IPv4address:

The syntax rule for host is ambiguous because it does not completely
distinguish between an IPv4address and a reg-name. In order to
disambiguate the syntax, we apply the "first-match-wins" algorithm:
If host matches the rule for IPv4address, then it should be
considered an IPv4 address literal and not a reg-name.

@noneymous
Copy link
Author

noneymous commented Apr 28, 2019

Well, an IPv6 address cannot be interpreted as a hostname due to the colons, in contrast to an IPv4 address. Anyways, it seems clear now that brackets are required to be compliant.

Any suggestions how to make the function detect and reject mentioned invalid URLs? Validating the host part might be error prone because its content could be IPv4, IPv6 or a hostname. The error actually materializes in the port part. So I thought, a test whether the port is either empty or a valid integer would catch the issue. It would also catch potential other issues not directly related with IPv6 and brackets. But I can't help it feels too cheap.

@philandstuff
Copy link

I would suggest that RFC 3986 provides the solution - it is the literal source of truth for what can be a valid URL. That does mean parsing it as IPv4, IPv6, IPvFuture (yes, that's a thing), or a reg-name.

I note also that currently, net/url thinks that https://[::y] is a valid URL, which is another non-RFC-3986-compliant URL string.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/184117 mentions this issue: net/url: Parse returns an error on IPv6 literal without brackets

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/189258 mentions this issue: net/url: make Hostname and Port predictable for invalid Host values

gopherbot pushed a commit that referenced this issue Aug 12, 2019
When Host is not valid per RFC 3986, the behavior of Hostname and Port
was wildly unpredictable, to the point that Host could have a suffix
that didn't appear in neither Hostname nor Port.

This is a security issue when applications are applying checks to Host
and expecting them to be meaningful for the contents of Hostname.

To reduce disruption, this change only aims to guarantee the following
two security-relevant invariants.

* Host is either Hostname or [Hostname] with Port empty, or
  Hostname:Port or [Hostname]:Port.

* Port is only decimals.

The second invariant is the one that's most likely to cause disruption,
but I believe it's important, as it's conceivable an application might
do a suffix check on Host and expect it to be meaningful for the
contents of Hostname (if the suffix is not a valid port).

There are three ways to ensure it.

1) Reject invalid ports in Parse. Note that non-numeric ports are
   already rejected if and only if the host starts with "[".

2) Consider non-numeric ports as part of Hostname, not Port.

3) Allow non-numeric ports, and hope they only flow down to net/http,
   which will reject them (#14353).

This change adopts both 1 and 2. We could do only the latter, but then
these invalid hosts would flow past port checks, like in
http_test.TestTransportRejectsAlphaPort. Non-numeric ports weren't fully
supported anyway, because they were rejected after IPv6 literals, so
this restores consistency. We could do only the former, but at this
point 2) is free and might help with manually constructed Host values
(or if we get something wrong in Parse).

Note that net.SplitHostPort and net.Dial explicitly accept service names
in place of port numbers, but this is an URL package, and RFC 3986,
Section 3.2.3, clearly specifies ports as a number in decimal.

net/http uses a mix of net.SplitHostPort and url.Parse that would
deserve looking into, but in general it seems that it will still accept
service names in Addr fields as they are passed to net.Listen, while
rejecting them in URLs, which feels correct.

This leaves a number of invalid URLs to reject, which however are not
security relevant once the two invariants above hold, so can be done in
Go 1.14: IPv6 literals without brackets (#31024), invalid IPv6 literals,
hostnames with invalid characters, and more.

Tested with 200M executions of go-fuzz and the following Fuzz function.

	u, err := url.Parse(string(data))
	if err != nil {
		return 0
	}
	h := u.Hostname()
	p := u.Port()

	switch u.Host {
	case h + ":" + p:
		return 1
	case "[" + h + "]:" + p:
		return 1
	case h:
		fallthrough
	case "[" + h + "]":
		if p != "" {
			panic("unexpected Port()")
		}
		return 1
	}
	panic("Host is not a variant of [Hostname]:Port")

Fixes CVE-2019-14809
Updates #29098

Change-Id: I7ef40823dab28f29511329fa2d5a7fb10c3ec895
Reviewed-on: https://go-review.googlesource.com/c/go/+/189258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
gopherbot pushed a commit that referenced this issue Aug 13, 2019
…ictable for invalid Host values

When Host is not valid per RFC 3986, the behavior of Hostname and Port
was wildly unpredictable, to the point that Host could have a suffix
that didn't appear in neither Hostname nor Port.

This is a security issue when applications are applying checks to Host
and expecting them to be meaningful for the contents of Hostname.

To reduce disruption, this change only aims to guarantee the following
two security-relevant invariants.

* Host is either Hostname or [Hostname] with Port empty, or
  Hostname:Port or [Hostname]:Port.

* Port is only decimals.

The second invariant is the one that's most likely to cause disruption,
but I believe it's important, as it's conceivable an application might
do a suffix check on Host and expect it to be meaningful for the
contents of Hostname (if the suffix is not a valid port).

There are three ways to ensure it.

1) Reject invalid ports in Parse. Note that non-numeric ports are
   already rejected if and only if the host starts with "[".

2) Consider non-numeric ports as part of Hostname, not Port.

3) Allow non-numeric ports, and hope they only flow down to net/http,
   which will reject them (#14353).

This change adopts both 1 and 2. We could do only the latter, but then
these invalid hosts would flow past port checks, like in
http_test.TestTransportRejectsAlphaPort. Non-numeric ports weren't fully
supported anyway, because they were rejected after IPv6 literals, so
this restores consistency. We could do only the former, but at this
point 2) is free and might help with manually constructed Host values
(or if we get something wrong in Parse).

Note that net.SplitHostPort and net.Dial explicitly accept service names
in place of port numbers, but this is an URL package, and RFC 3986,
Section 3.2.3, clearly specifies ports as a number in decimal.

net/http uses a mix of net.SplitHostPort and url.Parse that would
deserve looking into, but in general it seems that it will still accept
service names in Addr fields as they are passed to net.Listen, while
rejecting them in URLs, which feels correct.

This leaves a number of invalid URLs to reject, which however are not
security relevant once the two invariants above hold, so can be done in
Go 1.14: IPv6 literals without brackets (#31024), invalid IPv6 literals,
hostnames with invalid characters, and more.

Tested with 200M executions of go-fuzz and the following Fuzz function.

	u, err := url.Parse(string(data))
	if err != nil {
		return 0
	}
	h := u.Hostname()
	p := u.Port()

	switch u.Host {
	case h + ":" + p:
		return 1
	case "[" + h + "]:" + p:
		return 1
	case h:
		fallthrough
	case "[" + h + "]":
		if p != "" {
			panic("unexpected Port()")
		}
		return 1
	}
	panic("Host is not a variant of [Hostname]:Port")

Fixes CVE-2019-14809
Updates #29098

Change-Id: I7ef40823dab28f29511329fa2d5a7fb10c3ec895
Reviewed-on: https://go-review.googlesource.com/c/go/+/189258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 61bb56a)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/526408
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
(cherry picked from commit 3226f2d)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/526409
gopherbot pushed a commit that referenced this issue Aug 13, 2019
…ictable for invalid Host values

When Host is not valid per RFC 3986, the behavior of Hostname and Port
was wildly unpredictable, to the point that Host could have a suffix
that didn't appear in neither Hostname nor Port.

This is a security issue when applications are applying checks to Host
and expecting them to be meaningful for the contents of Hostname.

To reduce disruption, this change only aims to guarantee the following
two security-relevant invariants.

* Host is either Hostname or [Hostname] with Port empty, or
  Hostname:Port or [Hostname]:Port.

* Port is only decimals.

The second invariant is the one that's most likely to cause disruption,
but I believe it's important, as it's conceivable an application might
do a suffix check on Host and expect it to be meaningful for the
contents of Hostname (if the suffix is not a valid port).

There are three ways to ensure it.

1) Reject invalid ports in Parse. Note that non-numeric ports are
   already rejected if and only if the host starts with "[".

2) Consider non-numeric ports as part of Hostname, not Port.

3) Allow non-numeric ports, and hope they only flow down to net/http,
   which will reject them (#14353).

This change adopts both 1 and 2. We could do only the latter, but then
these invalid hosts would flow past port checks, like in
http_test.TestTransportRejectsAlphaPort. Non-numeric ports weren't fully
supported anyway, because they were rejected after IPv6 literals, so
this restores consistency. We could do only the former, but at this
point 2) is free and might help with manually constructed Host values
(or if we get something wrong in Parse).

Note that net.SplitHostPort and net.Dial explicitly accept service names
in place of port numbers, but this is an URL package, and RFC 3986,
Section 3.2.3, clearly specifies ports as a number in decimal.

net/http uses a mix of net.SplitHostPort and url.Parse that would
deserve looking into, but in general it seems that it will still accept
service names in Addr fields as they are passed to net.Listen, while
rejecting them in URLs, which feels correct.

This leaves a number of invalid URLs to reject, which however are not
security relevant once the two invariants above hold, so can be done in
Go 1.14: IPv6 literals without brackets (#31024), invalid IPv6 literals,
hostnames with invalid characters, and more.

Tested with 200M executions of go-fuzz and the following Fuzz function.

	u, err := url.Parse(string(data))
	if err != nil {
		return 0
	}
	h := u.Hostname()
	p := u.Port()

	switch u.Host {
	case h + ":" + p:
		return 1
	case "[" + h + "]:" + p:
		return 1
	case h:
		fallthrough
	case "[" + h + "]":
		if p != "" {
			panic("unexpected Port()")
		}
		return 1
	}
	panic("Host is not a variant of [Hostname]:Port")

Fixes CVE-2019-14809
Updates #29098

Change-Id: I7ef40823dab28f29511329fa2d5a7fb10c3ec895
Reviewed-on: https://go-review.googlesource.com/c/go/+/189258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit 61bb56a)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/526408
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@Neustradamus
Copy link

Have you progressed on this bug?

@dekimsey
Copy link

Not to go off-tangent too much. But what is the intent in url.Parse on strings that have no scheme? I came across this in a library that was using url.Parse on ipv4 addresses and would fail when given implicit or explicit ipv6 addresses. The library should not have been using url.Parse on plain ip address strings, but it did make me wonder why net.Parse was not returning an error.

For example, url.Parse("127.0.0.1") parses, but returns url.URL{Path: "127.0.0.1"} which struck me as nonsensical. When I checked RFC3986 Section 3.1 and Section 1.1.1 I got the impression scheme has to exist to be a valid URL.

The behavior I'm looking at is demonstrated here: https://play.golang.org/p/MzW9lCD7Min

@noneymous
Copy link
Author

noneymous commented Aug 19, 2021

Not to go off-tangent too much. But what is the intent in url.Parse on strings that have no scheme?

I'm using url.Parse to parse hrefs extracted from web sites, but they can be defined in various ways (and are seldomly with scheme):

  • absolute: "/app/page"
  • relative: "page" (without leading slash) meaning "/app/page" if the current browser location is "app"
  • full URL: "https://www.domain.tld/app/page"

Different web developers or content management systems may chose different variants and or a combination for reasons.

@slrz
Copy link

slrz commented Aug 19, 2021

For example, url.Parse("127.0.0.1") parses, but returns url.URL{Path: "127.0.0.1"} which struck me as nonsensical. When I checked RFC3986 Section 3.1 and Section 1.1.1 I got the impression scheme has to exist to be a valid URL.

5.2. Relative Resolution allows those as URI references. The ResolveReference method on url.URL strongly hints at the intention to support this in net/url.

The parsing of 127.0.0.1 seems to be correct as per the above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

10 participants