Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: make URL parsing return an error on IPv6 literal without brackets #31024

Open
noneymous opened this issue Mar 25, 2019 · 10 comments

Comments

Projects
None yet
5 participants
@noneymous
Copy link

commented Mar 25, 2019

What version of Go are you using (go version)?

go version go1.11.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=amd64
set GOHOSTARCH=amd64
set GOHOSTOS=windows

What did you do?

url.URL provides the methods .Hostname() and .Port() which schould split the URL's host (.Host) into its address and port part. In certain cases, these functions are not able to interpret IPv6 addresses correctly, leading to invalid output.

Here is a test function feeding different sample URLs, demonstrating the issue (all test URLs are valid and should succeed!):

func TestUrl(t *testing.T) {
	tests := []struct {
		name        string
		input string
		wantHost     string
		wantPort string
	}{
		{"domain-unknonw-scheme", "asfd://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-implicit", "http://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-explicit", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},
		{"domain-explicit-other-port", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},

		{"ipv4-implicit", "http://127.0.0.1/?q=0&p=1#frag", "127.0.0.1", ""},
		{"ipv4-explicit", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},
		{"ipv4-explicit-other-port", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},

		{"ipv6-explicit", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},
		{"ipv6-explicit-other-port", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},

		{"ipv6-implicit-1", "http://[1::]/?q=0&p=1#frag", "1::", ""},

		{"ipv6-implicit-2", "http://1::/?q=0&p=1#frag", "1::", ""},
		{"ipv6-implicit-3", "http://1::2008/?q=0&p=1#frag", "1::2008", ""},
		{"ipv6-implicit-4", "http://1::2008:1/?q=0&p=1#frag", "1::2008:1", ""},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Prepare URL
			u, err := url.Parse(tt.input)
			if err != nil {
				t.Errorf("could not parse url: %v", err)
				return
			}

			// Extract hostname and port
			host := u.Hostname()
			port := u.Port()

			// Compare result
			if host != tt.wantHost {
				t.Errorf("TestUrl() got = %v, want %v", host, tt.wantHost)
			}
			if port != tt.wantPort {
				t.Errorf("TestUrl() got1 = %v, want %v", port, tt.wantPort)
			}
		})
	}
}

Output:
grafik

What did you expect to see?

All sample URLs in the test cases above are valid ones, hence, all tests should succeed as defined

What did you see instead?

The top test samples work as expected, however, the bottom three return incorrect results. The bottom three samples are valid IPv6 URLs with inplicit port specification, but .Hostname() and .Port() interpres them as IPv4 addresses returning parts of the IPv6 address as if it was the explicit port of an IPv4 input. E.g., in one of the test outputs, ":2008" is returned as the port, but it is actually part of the IPv6 address.

Where is the bug?

.Hostname() and .Port() implement their own logic to split the port from the hostname. I've found that there is already a close function in the net package, called net.SplitHostPort(), which does it's job correctly. If .Hostname() and .Port() just called that function instead of re-implementing the logic, everything should work as expected. Below is the proof in form of a test function with exactly the same inputs as above, but using net.SplitHostPort() instead of .Hostname() / .Port().

func TestUrlNetSplit(t *testing.T) {
	tests := []struct {
		name        string
		input string
		wantHost     string
		wantPort string
	}{
		{"domain-unknonw-scheme", "asfd://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-implicit", "http://localhost/?q=0&p=1#frag", "localhost", ""},
		{"domain-explicit", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},
		{"domain-explicit-other-port", "http://localhost:80/?q=0&p=1#frag", "localhost", "80"},

		{"ipv4-implicit", "http://127.0.0.1/?q=0&p=1#frag", "127.0.0.1", ""},
		{"ipv4-explicit", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},
		{"ipv4-explicit-other-port", "http://127.0.0.1:80/?q=0&p=1#frag", "127.0.0.1", "80"},

		{"ipv6-explicit", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},
		{"ipv6-explicit-other-port", "http://[1::]:80/?q=0&p=1#frag", "1::", "80"},

		{"ipv6-implicit-1", "http://[1::]/?q=0&p=1#frag", "1::", ""},

		{"ipv6-implicit-2", "http://1::/?q=0&p=1#frag", "1::", ""},
		{"ipv6-implicit-3", "http://1::2008/?q=0&p=1#frag", "1::2008", ""},
		{"ipv6-implicit-4", "http://1::2008:1/?q=0&p=1#frag", "1::2008:1", ""},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			// Prepare URL
			u, err := url.Parse(tt.input)
			if err != nil {
				t.Errorf("could not parse url: %v", err)
				return
			}

			host, port, err := net.SplitHostPort(u.Host)
			if err != nil {
				// Could not split port from host, as there is no port specified
				host = u.Host
				port = ""
			}

			// Compare result
			if host != tt.wantHost {
				t.Errorf("TestUrl() got = %v, want %v", host, tt.wantHost)
			}
			if port != tt.wantPort {
				t.Errorf("TestUrl() got1 = %v, want %v", port, tt.wantPort)
			}
		})
	}
}

Output:
grafik

@dmitshur dmitshur added this to the Go1.13 milestone Apr 24, 2019

@dmitshur

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

/cc @rsc @bradfitz (per net/url owners)

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

Your ipv6-implicit-2, ipv6-implicit-3, and ipv6-implicit-4 aren't valid URLs. IPv6 literals need to be in square brackets.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

Closing for now. Let me know if I'm wrong, though.

@bradfitz bradfitz closed this Apr 24, 2019

@noneymous

This comment has been minimized.

Copy link
Author

commented Apr 25, 2019

Thank you for your response @bradfitz !

The rfc2732 recommends the use of brackets in all cases. However, it is not mandatory and can be technically omitted if no explicit port needs to be defined.

The thing is: In many cases the URL might not be built by the developer, but be received from some other source. Using url.Parse() is already an small indicator that URLs are received and parsed rather than built by the developer.

One common example, would be parsing URLs returned by a website. The website is free to return any URL format (can be decided by the web application developer). Relative URLs but also absolute ones. If the website returns absolute URLs, they might not follow the recommendation of the RFC but omit the brackets. url.Parse() would not return an error but garbage difficult to catch.

Sanitizing every received URL before parsing would be a fix, but unnecessasry overhead. It would be related to re-implementing the url.Parse() function.

The following might be a less valid argument in the views of some, but there is already the net.SplitHostPort() function in a closeby package. It is a little more forgiving and having duplicate code might not be the best practice either.

It would be also okay if parsing "http://1::2008" and calling .Hostname() returned "[1::2008]" in compliance with the RFC's suggestion for

rfc2732 : https://www.ietf.org/rfc/rfc2732.txt
definition of "should": https://tools.ietf.org/html/rfc2119

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

Well, that SHOULD is news to me. I thought it was a requirement. I think it might've been upgraded to a requirement in later (post 1999) RFCs.

In any case,

3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

Who decided that the valid reasons were to not use square brackets? Did they carefully weigh that libraries (such as Go) might reject them?

Is there any other software today that generates them? That is, how did this bug come about? Is this real or hypothetical?

@noneymous

This comment has been minimized.

Copy link
Author

commented Apr 25, 2019

It was hypothetical based on the RFC's definition and that IPv6 addresses go without brackets by default unless a specific endpoint (port) is to be specified.

However, I now tried how browsers (Chrome, FF, IE, Edge) react on IPv6 URLs without brackets. It seems they don't accept it and understand such input as search terms. Obviously, the general understanding is that brackets are mandatory for IPv6 URLs. Hence, no working website could apply it without. Web developers could build it but it wouldn't be useful.

If being forgiving is not desired in this case, I'm suggesting, that url.Parse() throws an error in order to prevent from continuing with garbage data! See garbage output in sample below.

func main() {
	u, err := url.Parse("http://1::12:2008")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.Hostname()) // => 1
	fmt.Println(u.Port()) // => :12:2008
}

(If the port is not empty or an integer, something went bad.)

@bradfitz bradfitz changed the title net/url: Host/Port extraction malfunctions with IPv6 addresses net/url: make URL parsing return an error on IPv6 literal without brackets Apr 25, 2019

@bradfitz bradfitz reopened this Apr 25, 2019

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

Okay, I've repurposed this bug to be about rejecting them instead.

@philandstuff

This comment has been minimized.

Copy link

commented Apr 28, 2019

I'm pretty sure that RFC 3986 (which obsoletes RFC 2732) requires square brackets for IPv6 literals. The ABNF rules defined in RFC 3986 only allow an IPv6address inside square brackets, and there is nothing in the text to suggest that a reg-name that looks like an IPv6address should be interpreted that way.

This silence on IPv6 addresses without square brackets is notable, because the text does acknowledge the ambiguity between reg-name and IPv4address:

The syntax rule for host is ambiguous because it does not completely
distinguish between an IPv4address and a reg-name. In order to
disambiguate the syntax, we apply the "first-match-wins" algorithm:
If host matches the rule for IPv4address, then it should be
considered an IPv4 address literal and not a reg-name.

@noneymous

This comment has been minimized.

Copy link
Author

commented Apr 28, 2019

Well, an IPv6 address cannot be interpreted as a hostname due to the colons, in contrast to an IPv4 address. Anyways, it seems clear now that brackets are required to be compliant.

Any suggestions how to make the function detect and reject mentioned invalid URLs? Validating the host part might be error prone because its content could be IPv4, IPv6 or a hostname. The error actually materializes in the port part. So I thought, a test whether the port is either empty or a valid integer would catch the issue. It would also catch potential other issues not directly related with IPv6 and brackets. But I can't help it feels too cheap.

@philandstuff

This comment has been minimized.

Copy link

commented Apr 28, 2019

I would suggest that RFC 3986 provides the solution - it is the literal source of truth for what can be a valid URL. That does mean parsing it as IPv4, IPv6, IPvFuture (yes, that's a thing), or a reg-name.

I note also that currently, net/url thinks that https://[::y] is a valid URL, which is another non-RFC-3986-compliant URL string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.