Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: Parse does not permit percent escapes in host #30844

Open
zombiezen opened this issue Mar 14, 2019 · 6 comments

Comments

Projects
None yet
6 participants
@zombiezen
Copy link
Member

commented Mar 14, 2019

RFC 3986 permits percent-encoded characters in the host part of a URL. *net/url.URL.String will percent-encode the net/URL.Host field, but confusingly, url.Parse does not accept URLs produced in this form.

Applications like PostgreSQL that use net/url.Parse to parse URLs that have this form produce an error, see lib/pq#796. The prior #16127 notes some mismatch between net/url.Parse and RFC 3986. This issue is around the specific issue of percent-encoded characters in the host, not any larger scope.

What version of Go are you using (go version)?

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE=[redacted]
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=[redacted]
GOPROXY=""
GORACE=""
GOROOT=[redacted]
GOTMPDIR=""
GOTOOLDIR=[redacted]
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=[redacted]
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS=[redacted]

What did you do?

https://play.golang.org/p/l9oOmcNkYUd

What did you expect to see?

URL1: postgres://foo@%2Fvar%2Frun%2Fpostgres/mydb
Host: /var/run/postgres
URL2: postgres://foo@%2Fvar%2Frun%2Fpostgres/mydb
Host: /var/run/postgres

What did you see instead?

URL1: postgres://foo@%2Fvar%2Frun%2Fpostgres/mydb
Host: /var/run/postgres
Parse: parse postgres://foo@%2Fvar%2Frun%2Fpostgres/mydb: invalid URL escape "%2F"

@zombiezen zombiezen added the Suggested label Mar 14, 2019

@alin04

This comment has been minimized.

Copy link

commented Apr 2, 2019

In addition, both "<" and ">" are allowed in host (specifically registered name) when they shouldn't be as per: https://tools.ietf.org/html/rfc3986#section-3.2.2

https://golang.org/src/net/url/url.go?s=2590:2650#L100

This should pct-encode the ">", but doesn't:
https://play.golang.org/p/aVV6I6q-mPV

@bcmills bcmills added this to the Unplanned milestone Apr 12, 2019

@thoeni

This comment has been minimized.

Copy link
Contributor

commented Apr 13, 2019

@zombiezen I wrote a test with your expected result, and I got it passing by removing a check here:
https://github.com/golang/go/blob/master/src/net/url/url.go#L221-L223
Those lines appear to explicitly exclude ASCII (but for the %25) from escaping on parse

The note on the if branch mentions page 21 of that same RFC that you've linked: https://github.com/golang/go/blob/master/src/net/url/url.go#L215-L220

I think this is the relevant bit:

The reg-name syntax allows percent-encoded octets in order to
represent non-ASCII registered names in a uniform way that is
independent of the underlying name resolution technology. Non-ASCII
characters must first be encoded according to UTF-8 [STD63], and then
each octet of the corresponding UTF-8 sequence must be percent-
encoded to be represented as URI characters.

Also, I can see that the mention about RFC implementation by the package has been "relaxed" a while ago: https://go-review.googlesource.com/c/go/+/22859/
And here: #16127 (comment)

Not sure how to interpret the RFC, but happy to hear your thoughts.

Also worth noticing that some tests are actually checking the opposite behaviour:
https://github.com/golang/go/blob/master/src/net/url/url_test.go#L1432

@zombiezen

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2019

@bradfitz, any thoughts on this issue? I remember there being some subtleties about what sort of changes we want to permit around net/url.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 15, 2019

Given that we String-ify such Hosts, I'm not opposed to trying to Parse them too. We'll see what breaks. But please send a change soon, as the freeze is coming, and net/url changes are notorious for breaking things & getting reverted.

@thoeni

This comment has been minimized.

Copy link
Contributor

commented Apr 15, 2019

What's not entirely clear to me is:
The RFC says:

The reg-name syntax allows percent-encoded octets in order to
represent non-ASCII registered names in a uniform way that is
independent of the underlying name resolution technology.

The example of / is an "ASCII" character after all. 🤔

I've made a change, and I'll open a CL, but there were tests explicitly asserting the opposite of what this issue says, see:

Also the current comment on the if statement mentions:

// Per https://tools.ietf.org/html/rfc3986#page-21
// in the host component %-encoding can only be used
// for non-ASCII bytes.
// But https://tools.ietf.org/html/rfc6874#section-2
// introduces %25 being allowed to escape a percent sign
// in IPv6 scoped-address literals. Yay.

Have a look at the changes.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 15, 2019

Change https://golang.org/cl/172157 mentions this issue: net/url: Parse allow ASCII percent-encoded chars in host

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.