Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: URL.String() does not escape Query #30922

Open
vladimiroff opened this issue Mar 19, 2019 · 11 comments
Open

net/url: URL.String() does not escape Query #30922

vladimiroff opened this issue Mar 19, 2019 · 11 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@vladimiroff
Copy link

vladimiroff commented Mar 19, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/kiril/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/kiril/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/go"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build120012875=/tmp/go-build -gno-record-gcc-switches"

What did you do?

url.Parse("https://тест.бг/път?ключ=стойност#фрагмент")

https://play.golang.org/p/xe9E6LAZhLs

What did you expect to see?

https://%D1%82%D0%B5%D1%81%D1%82.%D0%B1%D0%B3/%D0%BF%D1%8A%D1%82?%D0%BA%D0%BB%D1%8E%D1%87=%D1%81%D1%82%D0%BE%D0%B9%D0%BD%D0%BE%D1%81%D1%82#%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82

What did you see instead?

https://%D1%82%D0%B5%D1%81%D1%82.%D0%B1%D0%B3/%D0%BF%D1%8A%D1%82?ключ=стойност#%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82

Notice how that ключ=стойност is left unescaped.

@agnivade
Copy link
Contributor

@bradfitz - any reason we are not escaping the query string ?

@bradfitz
Copy link
Contributor

I forget. But every time we touch this package we break tons of people, so might be best alone or documented.

@bradfitz bradfitz added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 19, 2019
@bradfitz bradfitz added this to the Go1.13 milestone Mar 19, 2019
@bradfitz
Copy link
Contributor

I invite people to read RFCs for guidance, compare other implementations, and explore what would break if we changed this.

@vladimiroff
Copy link
Author

Simply applying u.Query().Encode() breaks a lot, even trivial round trips due to sorting of keys and always writing = (e.g. /?foo=bar&baz goes to /?baz=&foo=bar).

Decided to try doing something similar to url.ParseQuery except for simply writing keys and values instead of storing them in a map for the sake of preserving keys' order. At first glance it doesn't seem to be breaking a lot.

In all cases, though, outputting magnet links will suffer from fixing this issue (see #20054).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/168559 mentions this issue: net/url: make URL.String parse and escape query

@vladimiroff
Copy link
Author

vladimiroff commented Mar 22, 2019

Currently trying to find what this CL above breaks out in the wild. So far the only thing that bothers me is checking for u.User == nil && u.Host == "" && u.Path == "" in order not to break magnet links.

Question remains whether this is a valid approach, though. I wasn't able to find any URI scheme other than magnet that looks like opaque URIs but not quite and uses query values instead. If that's the case, shouldn't we just do if u.Scheme == "magnet"? What's worse with magnet links is that people seem to validate and convert them with regular expressions, instead of parsing the URL and extracting query values from there.

@alin04
Copy link

alin04 commented Apr 4, 2019

Also related #22907.

@klaus
Copy link

klaus commented Apr 11, 2019

fwiw, I believe the correct way of encoding the domain part wold be ...
https://xn--babb6r0eedb.xn--obae9fb/
If at all. And do we want that for String()? I want that for DNS resolving, yes but not for reading the domain.

@vladimiroff
Copy link
Author

After spending about a month on testing it against different projects, I think that uploaded CL fixes this fairly well. I guess leaving it for a while on tip would expose real-world breakages.

As for what others do, it's complicated:

  • Java's java.net.URL doesn't do any escaping.
  • Python's urllib.parse.urlparse doesn't do any escaping (has methods for doing that).
  • Ruby's URI accepts only ASCII symbols.
  • Rust's url::Url escapes everything by default (playground).

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@vladimiroff
Copy link
Author

@bradfitz you've mentioned that most of the changes in net/url break somebody. I've spent some fair amount to time testing proposed change against different projects, but that's obviously not enough.

Now that 1.13 is released does it make sense to land CL168559 on master in order to have enough time to test and realize whether this actually breaks somebody and if so revert it before 1.14?

@vladimiroff
Copy link
Author

I occasionally rebase this CL on top of releases in order to use it. Since 1.16 my change breaks a test regarding IA5String encoding: 1eeaff7

--- FAIL: TestIA5SANEnforcement (0.00s)
    --- FAIL: TestIA5SANEnforcement/marshal:_unicode_uniformResourceIdentifier (0.00s)
        x509_test.go:2783: expected CreateCertificate to fail with template: &{[] [] [] [] [] [] 0 0 <nil> 0 0   0001-01-01 00:00:00 +0000 UTC 0001-01-01 00:00:00 +0000 UTC 0 [] [] [] [] [] false false 0 false [] [] [] [] [] [] [] [https://example.com/?%E2%88%9E] false [] [] [] [] [] [] [] [] [] []}

@rolandshoemaker could you advise whether it makes sense NOT to expect error anymore once the whole URL is properly escaped? Wouldn't quoting actually fix those rare badly encoded names you mention in the commit message?
Probably @FiloSottile might want to take a look as well, given that he reviewed the commit above.

I think removing the test case should be fine, yet it feels terribly wrong to just remove a test case in a crypto/* package and declare it win 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants