Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/publicsuffix: PublicSuffix() is case sensitive #25254

Open
OrBaruk opened this issue May 4, 2018 · 5 comments
Open

x/net/publicsuffix: PublicSuffix() is case sensitive #25254

OrBaruk opened this issue May 4, 2018 · 5 comments

Comments

@OrBaruk
Copy link

@OrBaruk OrBaruk commented May 4, 2018

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

PublicSuffix() returns incorrect values when the domain contains letters with uppercase

According to RFC4343 I believe PublicSuffix() should be case insensitive.

package main

import (
	"fmt"
	"golang.org/x/net/publicsuffix"
)

func main() {
	suffix, icann := publicsuffix.PublicSuffix("abc.CoM.Br")
	fmt.Printf("suffix: %s\nicann: %v\n", suffix, icann)
}

What did you expect to see?

suffix: com.br
icann: true

What did you see instead?

suffix: Br
icann: false
@gopherbot gopherbot added this to the Unreleased milestone May 4, 2018
@OrBaruk OrBaruk changed the title x/net/publicsuffix: x/net/publicsuffix: PublicSuffix() is case sensitive May 4, 2018
@meirf
Copy link
Contributor

@meirf meirf commented May 5, 2018

Test cases from the official PSL repo:

// Mixed case.
checkPublicSuffix('COM', null);
checkPublicSuffix('example.COM', 'example.com');
checkPublicSuffix('WwW.example.COM', 'example.com');

This indicates that PublicSuffix should be case insensitive.

Note these test cases are missing from x/net/publicsuffix. I wonder if they were omitted intentionally. @vdobler @nigeltao

@vdobler
Copy link
Contributor

@vdobler vdobler commented May 5, 2018

Yes this is deliberate because the tests would not pass.

EffectiveTLDPlusOne returns the eTLD+1 of its argument
in the original input capitalisation, it doesn't lowercase
the input.

EffectiveTLDPlus1("b.DOmaiN.BiZ") == "DoMaiN.BiZ"
@OrBaruk
Copy link
Author

@OrBaruk OrBaruk commented May 5, 2018

I think the EffectiveTLDPlusOne should preserve the original capitalisation, but it should not give different results depending on how the input was capitalised for example:

Current:

publicsuffix.EffectiveTLDPlusOne("SuBdOmAiN.aBc.BlOgSpOt.cOm") == "BlOgSpOt.cOm"
publicsuffix.EffectiveTLDPlusOne("subdomain.abc.blogspot.com") == "abc.blogspot.com"

Expected:

publicsuffix.EffectiveTLDPlusOne("SuBdOmAiN.aBc.BlOgSpOt.cOm") == "aBc.BlOgSpOt.cOm"
publicsuffix.EffectiveTLDPlusOne("subdomain.abc.blogspot.com") == "abc.blogspot.com"

These different results was how I originally found the issue.

@vdobler
Copy link
Contributor

@vdobler vdobler commented May 7, 2018

@OrBaruk

The unexpected result you see with mixed case inputs is addressed in
two TODOs

The current implementation of package publicsuffix requires arguments
to PublicSuffix and EffectiveTLDPlus1 to be: Lowercases, free from leading
and trailing dots and punycoded.
As the main use of package publicsuffix is in package cookiejar and cookiejar
calls PublicSuffix only with lowercased, dot-free, punycoded arguments and
ensuring this once more in package publicsuffix is wasteful.

But of course this is unintuitive and not documented.

You demonstrated un-intuitive behavior with uppercase domain names.
How should EffectiveTLDPlusOne behave on IDNs?
How on full qualified domain names (trailing dot)?
Should it strip leading dots (still common in the domain
attribute of a cookie)=

publicsuffix.EffectiveTLDPlusOne("世界.食狮.中国") = ??
publicsuffix.EffectiveTLDPlusOne("www.foo.com.") = ??
publicsuffix.EffectiveTLDPlusOne(".example.com") = ??

@nigeltao
I think a natural expectation would be that package publicsuffix

  • works well with uppercase/mixed case domains
  • works well with IDNs.
    With "works well" in the sense of produces the naturally expected value
    like suggested by OrBaruk and by handling IDNs transparently as this
    package seems to be used not only as an Option to cookiejar.Jar.
    (Leading/trailing dots must be removed by the caller.)

What do you think?

@nigeltao
Copy link
Contributor

@nigeltao nigeltao commented May 18, 2018

I understand the "works well" expectation, but I think that there's a bigger question of how does Go generally handle IDNs (and upper/mixed case domains)? I think any answer needs to be consistent across net, net/http, net/http/cookiejar, net/url, x/net/publicsuffix, etc, and not just a tactical modification only to x/net/publicsuffix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants