Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: URL.String() does not escape Query #30922

Open
vladimiroff opened this Issue Mar 19, 2019 · 9 comments

Comments

Projects
None yet
6 participants
@vladimiroff
Copy link

commented Mar 19, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/kiril/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/kiril/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/go"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build120012875=/tmp/go-build -gno-record-gcc-switches"

What did you do?

url.Parse("https://тест.бг/път?ключ=стойност#фрагмент")

https://play.golang.org/p/xe9E6LAZhLs

What did you expect to see?

https://%D1%82%D0%B5%D1%81%D1%82.%D0%B1%D0%B3/%D0%BF%D1%8A%D1%82?%D0%BA%D0%BB%D1%8E%D1%87=%D1%81%D1%82%D0%BE%D0%B9%D0%BD%D0%BE%D1%81%D1%82#%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82

What did you see instead?

https://%D1%82%D0%B5%D1%81%D1%82.%D0%B1%D0%B3/%D0%BF%D1%8A%D1%82?ключ=стойност#%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82

Notice how that ключ=стойност is left unescaped.

@agnivade

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

@bradfitz - any reason we are not escaping the query string ?

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

I forget. But every time we touch this package we break tons of people, so might be best alone or documented.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

I invite people to read RFCs for guidance, compare other implementations, and explore what would break if we changed this.

@vladimiroff

This comment has been minimized.

Copy link
Author

commented Mar 21, 2019

Simply applying u.Query().Encode() breaks a lot, even trivial round trips due to sorting of keys and always writing = (e.g. /?foo=bar&baz goes to /?baz=&foo=bar).

Decided to try doing something similar to url.ParseQuery except for simply writing keys and values instead of storing them in a map for the sake of preserving keys' order. At first glance it doesn't seem to be breaking a lot.

In all cases, though, outputting magnet links will suffer from fixing this issue (see #20054).

@gopherbot

This comment has been minimized.

Copy link

commented Mar 21, 2019

Change https://golang.org/cl/168559 mentions this issue: net/url: make URL.String parse and escape query

@vladimiroff

This comment has been minimized.

Copy link
Author

commented Mar 22, 2019

Currently trying to find what this CL above breaks out in the wild. So far the only thing that bothers me is checking for u.User == nil && u.Host == "" && u.Path == "" in order not to break magnet links.

Question remains whether this is a valid approach, though. I wasn't able to find any URI scheme other than magnet that looks like opaque URIs but not quite and uses query values instead. If that's the case, shouldn't we just do if u.Scheme == "magnet"? What's worse with magnet links is that people seem to validate and convert them with regular expressions, instead of parsing the URL and extracting query values from there.

@alin04

This comment has been minimized.

Copy link

commented Apr 4, 2019

Also related #22907.

@klaus

This comment has been minimized.

Copy link

commented Apr 11, 2019

fwiw, I believe the correct way of encoding the domain part wold be ...
https://xn--babb6r0eedb.xn--obae9fb/
If at all. And do we want that for String()? I want that for DNS resolving, yes but not for reading the domain.

@vladimiroff

This comment has been minimized.

Copy link
Author

commented Apr 18, 2019

After spending about a month on testing it against different projects, I think that uploaded CL fixes this fairly well. I guess leaving it for a while on tip would expose real-world breakages.

As for what others do, it's complicated:

  • Java's java.net.URL doesn't do any escaping.
  • Python's urllib.parse.urlparse doesn't do any escaping (has methods for doing that).
  • Ruby's URI accepts only ASCII symbols.
  • Rust's url::Url escapes everything by default (playground).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.