Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: figure out IDNA Punycode story #13835

Closed
bradfitz opened this issue Jan 6, 2016 · 19 comments

Comments

Projects
None yet
8 participants
@bradfitz
Copy link
Member

commented Jan 6, 2016

(Fork of http://thread.gmane.org/gmane.comp.lang.go.devel/101215)

net/url and net/http say nothing about whose responsibility it is to map non-ASCII hostnames to/from Punycode.

Whose is it?

I was going to fix #13686 for Go 1.6 by vendoring the golang.org/x/net/idna package and using ToASCII when sending HTTP requests (when writing the Host header) in the HTTP client code.

But should I also use ToUnicode when reading Host lines in the HTTP server code? Or should I leave the url.URL.Host string in Punycode form?

@nigeltao said:

I don't know whose responsibility it is, but FWIW, the
PublicSuffixList.PublicSuffix doc comment at the top of
https://go.googlesource.com/go/+/master/src/net/http/cookiejar/jar.go
also has a TODO to decide whose responsibility it is.

Some canonicalization is also done by
https://go.googlesource.com/go/+/master/src/net/http/cookiejar/punycode.go
which is a copy/paste of the x/net/idna code. Or maybe x/net/idna is
the copy/paste and net/http/cookiejar is the original. I forget.

We should figure this out for Go 1.7.

Should net.Dial work on UTF-8 domain names?

Should url.Parse decode them into URL.Host? Should URL.Host accept either UTF-8 or Punycode?

Should Punycode be exposed in the stdlib?

@bradfitz bradfitz added this to the Go1.7Early milestone Jan 6, 2016

@robpike

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2016

I suppose it's too late to delete Punycode from the world, but I would like to understand exactly what security aspects are affected both by adding support to the library and by not adding it.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Jan 6, 2016

The security aspect I'm most aware of is homograph attacks: https://en.wikipedia.org/wiki/IDN_homograph_attack

Those are why presumably why browsers render http://кто.рф/eng/ as http://xn--j1ail.xn--p1ai/eng/ in the URL bar.

But should this work?

     c, err := net.Dial("кто.рф:80")

Or:

    res, err := http.Get("http://кто.рф/")

I can type http://кто.рф/ into my URL bar and it loads. Requiring all users working with such URLs to do this seems onerous:

    u, err := url.Parse("http://кто.рф/")
    if err != nil { 
        ... 
    }
    u.Host, err = idna.ToASCII(u.Host)
    if err != nil { 
        ... 
    }
    res, err := http.Get(u.String())

The other security aspect is substring searching.

But it's already the case that strings.Contains(req.URL.Host, "рф") in an http.Handler won't work, as browsers already send Punycode.

I think for security the best thing to do is be consistent which fields contain which form at all times, and document it.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2016

For what little it's worth, I did look into the topic of Unicode host names when working on url.Parse earlier in the cycle. Go 1.4 and earlier preserved them as is. Go 1.5 %-encoded the non-ASCII bytes (clearly not right). Go 1.6 goes back to the earlier preservation. For URL parsing the general rule is that the fields of the URL struct are unencoded, so it seemed best to keep Host in its original form, especially in non-web uses of URLs (like when people use mysql://whatever).

I looked at what seemed to be relevant RFCs at the time and it looked like the HTTP protocol needed one algorithm for mangling to ASCII while DNS needed a different one. Maybe I misunderstood, but that split seemed another argument in favor of keeping the URL.Host field unmangled, and for keeping URL.String() unmangled at least for now.

I agree that we should figure out how to make http.Get etc do the right thing. Internally http.Get already calls url.Parse, so if there is http-specific url mangling to do, it can be done there.

Similarly we should figure out how to make net.Dial do the right thing, whatever that is.

@jehiah

This comment has been minimized.

Copy link

commented Feb 8, 2016

I've experienced needing to handle this at Bitly, so i'll add my 2cents.

I think url.Parse should not handle IDN/Punycode conversion, as an IDN formatted Host is a valid URL format, and a URL struct constructed with an IDN Host (without using Parse) should also work. Changing round trip semantics of url.Parse(u).String() also seems like a backwards incompatible change of the type that's hard to detect. The Host just needs to be ascii safe when used in a HTTP Host header, which implies to me net/http responsibility. That means, as suggested, calling idna.ToASCII() before sending the Host header or TLS SNI where appropriate.

Also, as a side note, the same IDN/Punycode issue also applies to the construction of the Referer host header in refererForURL as that should also be ascii safe for use in a header.

When handling requests on the server side, I think the Host header should be left as-is (presumably in punycode if from a well behaved browser/client, but we all know that isn't always the case), but ToUnicode would also be reasonable.

It's unclear to me what, if any, restriction net.Dial should have on unicode or IDN domains as DNS as i believe punycode is a DNS convention not a requirement (i need to do some RFC reading to be better informed on this one).

@bradfitz bradfitz modified the milestones: Go1.8Early, Go1.7Early May 5, 2016

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 5, 2016

Didn't happen for 1.7.

@themihai

This comment has been minimized.

Copy link

commented Aug 27, 2016

The parser and encoder of net/mail.Address doesn't handle Punycode either. I've found it as a surprise(perhaps should be documented) because Address.Name is word-encoded/decoded.
The latest email RFC obsoletes both Punycode and word-encoding so I would say that the trend is to support native UTF-8. How it's displayed to the end user may be a different story.

Edit: net/mail doesn't need additional documentation. After more research I've found that
"IDNA does not update the existing email standards, which allow only ASCII characters in local parts". Therefore a domain label that holds the local part of an email address should not be encoded/decoded. Sorry for the noise.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 11, 2016

CL https://golang.org/cl/28961 mentions this issue.

@bradfitz bradfitz self-assigned this Sep 11, 2016

gopherbot pushed a commit that referenced this issue Sep 11, 2016

vendor: add golang.org/x/net/idna to the vendor directory for Punycode
Adds golang.org/x/net/idna to the Go repo from the
golang.org/x/net repo's git rev 7db922ba (Dec 2012).

Punycode is needed for http.Get("привет.рф") etc., which will
come in separate commits.

Updates #13835

Change-Id: I313ed82d292737579a3ec5dcf8a9e766f2f75138
Reviewed-on: https://go-review.googlesource.com/28961
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Sep 12, 2016

CL https://golang.org/cl/29072 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 12, 2016

CL https://golang.org/cl/29071 mentions this issue.

gopherbot pushed a commit to golang/net that referenced this issue Sep 12, 2016

http2: add Transport support for unicode domain names
No tests, because tests are in std. (See TestTransportIDNA_h2)

Updates golang/go#13835

Change-Id: I0a327d9652ea5e6f32dfa279550915af61567bed
Reviewed-on: https://go-review.googlesource.com/29071
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Sep 13, 2016

CL https://golang.org/cl/29110 mentions this issue.

gopherbot pushed a commit that referenced this issue Sep 13, 2016

net/http: update bundled x/net/http2
Updates x/net/http2 (and x/net/lex/httplex) to git rev 749a502 for:

   http2: don't sniff first Request.Body byte in Transport until we have a conn
   https://golang.org/cl/29074
   Fixes #17071

   http2: add Transport support for unicode domain names
   https://golang.org/cl/29071
   Updates #13835

   http2: don't send bogus :path pseudo headers if Request.URL.Opaque is set
   https://golang.org/cl/27632
     +
   http2: fix bug where '*' as a valid :path value in Transport
   https://golang.org/cl/29070
   Updates #16847

   http2: fix all vet warnings
   https://golang.org/cl/28344
   Updates #16228
   Updates #11041

Also uses the new -underscore flag to x/tools/cmd/bundle from
https://golang.org/cl/29086

Change-Id: Ica0f6bf6e33266237e37527a166a783d78c059c4
Reviewed-on: https://go-review.googlesource.com/29110
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>

gopherbot pushed a commit that referenced this issue Sep 13, 2016

net/http: make Transport support international domain names
This CL makes code like this work:

     res, err := http.Get("https://фу.бар/баз")

So far, IDNA support is limited to the http1 and http2 Transports.
The http package is currently responsible for converting domain names
into Punycode before calling the net layer. The http package also has
to Punycode-ify the hostname for the Host & :authority headers for
HTTP/1 and HTTP/2, respectively.

No automatic translation from Punycode back to Unicode is performed,
per Go's historical behavior. Docs are updated where relevant.  No
changes needed to the Server package. Things are already in ASCII
at that point.

No changes to the net package, at least yet.

Updates x/net/http2 to git rev 57c7820 for https://golang.org/cl/29071

Updates #13835

Change-Id: I1e9a74c60d00a197ea951a9505da5c3c3187099b
Reviewed-on: https://go-review.googlesource.com/29072
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2016

What is the status here? Is this going to be part of Go 1.8? What is left?

@rsc

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2016

Are there any decisions remaining here or is it just all implementation? Assuming the latter.

@rsc rsc added the NeedsFix label Sep 26, 2016

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Sep 26, 2016

Per earlier chat with @ianlancetaylor, we're going to make net.Dial work too before this is closed. That's easy. I'll do it soon.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Sep 26, 2016

Actually, I forgot why we might not want to do this in net: case folding Unicode domain names brings in the unicode package and tables, which is prohibited by the go/build deps_test.go policy.

I think we might just want to keep this in net/http for Go 1.8.

I'll keep this bug open until I add some more tests (around case folding), and then we'll call it good for Go 1.8.

@minux

This comment has been minimized.

Copy link
Member

commented Sep 27, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Sep 27, 2016

CL https://golang.org/cl/29860 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 27, 2016

CL https://golang.org/cl/29859 mentions this issue.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Sep 27, 2016

@minux, perhaps. But feels slightly odd. We kinda do that for DNS if you include cgo, but that's pretty explicit. I'd rather consider that later, in Go 1.9. I have other things to focus on during 1.8.

gopherbot pushed a commit that referenced this issue Sep 27, 2016

vendor: add golang.org/x/text/unicode/norm + x/test/width for IDNA su…
…pport

Add golang.org/x/text/unicode/norm from x/text git rev a7c02369.

Needed by net/http for IDNA normalization.

Updates #13835

Change-Id: I8b024e179d573f2b093c209a4b9e4f71f7d4a1f2
Reviewed-on: https://go-review.googlesource.com/29859
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>

@gopherbot gopherbot closed this in a73020f Sep 27, 2016

@Xeoncross

This comment has been minimized.

Copy link

commented Nov 14, 2016

In order to store URL's they need to be normalized. If I read all this correctly, then the following is still required to normalize URLs even in 1.8 correct?

    u, err := url.Parse(urlString)
    if err != nil {
        fmt.Println(err)
    }
    u.Host, err = idna.ToASCII(u.Host)
    if err != nil {
        fmt.Println(err)
    }
    u.Host, err = idna.ToUnicode(u.Host)
    if err != nil {
        fmt.Println(err)
    }

    fmt.Println(u)

@golang golang locked and limited conversation to collaborators Nov 14, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.