
Any single label is considered a TLD #48

Closed
m6w6 opened this issue Feb 15, 2016 · 28 comments

@m6w6

m6w6 commented Feb 15, 2016

Is there a rationale behind this behavior? Why not use the existing TLD list instead of a match-all rule[1]?

I'm not sure the following statement holds true: psl_is_public_suffix(ctx, "localhost")

Currently, that means that libcurl drops any cookies from localhost etc.[2] when built with libpsl support.

[1] https://github.com/rockdaboot/libpsl/blob/master/src/psl.c#L810
[2] https://github.com/curl/curl/blob/master/lib/cookie.c#L801

/cc @remicollet, @bagder

@bagder

bagder commented Feb 15, 2016

According to the documentation: "This function checks if domain is a public suffix by the means of the Mozilla Public Suffix List." and "localhost" is not a public suffix, so I would argue that the function should then return false.

@rockdaboot
Owner

Algorithm, point 2, says: if no rules match, the prevailing rule is "*".
Am I interpreting this sentence wrongly?
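The rule quoted above can be illustrated with a minimal C sketch of the PSL matching step. It is not libpsl's implementation: the three-entry rule list is a hypothetical stand-in for the real list, only plain rules are handled (no "*." wildcards or "!" exceptions), and input is assumed to be lower-case. With the implicit "*" fallback, any single label that matches no rule, such as localhost, comes out as a public suffix:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PSL: plain, lower-case rules only. */
static const char *const RULES[] = { "com", "uk", "co.uk" };
#define NRULES (sizeof RULES / sizeof RULES[0])

/* Number of dot-separated labels in a name. */
static size_t label_count(const char *s)
{
    size_t n = 1;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

/* True if `suffix` equals the trailing labels of `domain`,
   starting at a label boundary. */
static bool tail_matches(const char *domain, const char *suffix)
{
    size_t dlen = strlen(domain), slen = strlen(suffix);
    if (slen > dlen)
        return false;
    const char *tail = domain + (dlen - slen);
    return strcmp(tail, suffix) == 0 && (tail == domain || tail[-1] == '.');
}

/* Labels in the public suffix of `domain`: the longest matching rule
   prevails; if no rule matches, the implicit "*" rule gives one label. */
static size_t public_suffix_labels(const char *domain)
{
    size_t best = 0;
    for (size_t i = 0; i < NRULES; i++) {
        size_t n = label_count(RULES[i]);
        if (tail_matches(domain, RULES[i]) && n > best)
            best = n;
    }
    return best ? best : 1;
}

/* `domain` is a public suffix if all of its labels form the suffix. */
static bool is_public_suffix(const char *domain)
{
    return public_suffix_labels(domain) == label_count(domain);
}
```

Here is_public_suffix("localhost") returns true although no rule mentions localhost, while is_public_suffix("foo.localhost") returns false.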

@m6w6
Author

m6w6 commented Feb 15, 2016

Hi, which document are you referring to?

Thank you!

@m6w6
Author

m6w6 commented Feb 15, 2016

The more I think about all that, it seems that such an API should only be used with FQDNs.

Say you query a host named com in the domain local and, e.g., local is in your DNS search path: how should anything but the caller know that we actually mean com.local? (Does the caller even know?!)

...and would the server send a cookie with domain=com? It could, if some server-side logic detects that we're accessing com and not com.local, or maybe it doesn't send a domain at all. So how is that usually handled with regard to cookies?

/cc @bagder

@bagder

bagder commented Feb 15, 2016

In cookies, the domain attribute is tail-matched against the host name used in the URL, and I don't think there's ever any guarantee that it is in fact an FQDN. It is not even that easy to figure out, since the name resolving functions etc. don't tell us that very easily. But yes, the PSL matching pretty much assumes that we give it a true and global domain name. Which, if we access public URLs, we will, but in private circumstances we may not always...
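The tail match described here corresponds to the "domain-match" test of RFC 6265, section 5.1.3. A minimal C sketch, assuming both names are already lower-cased and that the host is a name rather than an IP address (the RFC excludes IP addresses from tail matching); the function name is illustrative. Note that nothing in it knows or cares whether the host name is an FQDN:

```c
#include <stdbool.h>
#include <string.h>

/* Sketch of RFC 6265 domain-matching: does `host` domain-match
   `cookie_domain`? Assumes lower-case input and a non-IP host. */
static bool domain_match(const char *host, const char *cookie_domain)
{
    size_t hlen = strlen(host), dlen = strlen(cookie_domain);

    if (strcmp(host, cookie_domain) == 0)
        return true;  /* identical strings always match */
    if (dlen >= hlen)
        return false; /* cannot be a proper suffix of host */

    /* cookie_domain must be a suffix of host, and the character of host
       just before that suffix must be a dot (label boundary). */
    const char *tail = host + (hlen - dlen);
    return strcmp(tail, cookie_domain) == 0 && tail[-1] == '.';
}
```

For example, www.example.com domain-matches example.com, but badexample.com does not, because the suffix does not start at a label boundary.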

@m6w6
Author

m6w6 commented Feb 15, 2016

Looking at the provided test data, it seems that single-label names do not match at all:
https://raw.githubusercontent.com/publicsuffix/list/master/tests/test_psl.txt

@rockdaboot
Owner

Hi, sorry for the latency... (and short answer from mobile phone)

Looking at the provided test data, it seems that single-label names do not match at all
What exactly (e.g. line number) lets you assume this?

Hi, which document are you referring to?
https://publicsuffix.org/list/

@bagder Agreed: in a private area the PSL might not always do the right thing.
Is there a curl CLI option (and/or API) to switch PSL on and off?

@m6w6
Author

m6w6 commented Feb 15, 2016

What exactly (e.g. line number) lets you assume this?

Any check of a single label.

@bagder

bagder commented Feb 15, 2016

No, there's currently no way to switch off PSL in curl, but there probably should be, for example for users who run it within organizations using custom domains etc. I think it hasn't been that widely used yet.

But can you elaborate on why the function should return TRUE for a domain that is clearly not listed in the public suffix list? I don't understand how that can be the job of this function.

@remicollet

Point 2. Says if no rules match, the prevailing rule is "*".

but "*" is not part of the public_list ;)
while all TLDs are, including the recent exotic ones (beer, pizza...)

Quick test from a browser: cookies from non-qualified domains are accepted, which is very common.

@remicollet

@bagder looking at other consumer of libpsl (e.g. wget), it seems using psl_is_cookie_domain_acceptable could be a better solution than psl_is_public_suffix.

psl_is_cookie_domain_acceptable(psl, "localhost", "localhost") => 1

@bagder

bagder commented Feb 16, 2016

Maybe, yes. The irony here is of course that @rockdaboot himself wrote the libcurl adaption that uses libpsl =)

@remicollet

@bagder

bagder commented Feb 16, 2016

We could certainly work around this issue in curl that way, but that won't make psl_is_public_suffix work as documented, which this bug report is about... The curl bug is here: curl/curl#658

@rockdaboot
Owner

@bagder There must have been a reason for using psl_is_public_suffix at the time I wrote the patch. Sadly, I hardly remember and do not have the time to investigate right now. But today, I would use psl_is_cookie_domain_acceptable() in a cookie context, as @remicollet correctly found out. I am willing to have a look at the curl code in the next few days, if that is fine with you.

but that won't make psl_is_public_suffix work as documented, which this bug report is about

The documentation has to be clearer. I won't work against the proposed PSL... if there is something wrong or unclear in those rules, we have to open an issue at https://github.com/publicsuffix/list, asking for clarification. There are a few points unclear, see publicsuffix/list#145.
BTW, *.*.foo.bar is explicitly allowed for the PSL. But that breaks current Chromium and libpsl implementations, which allow only one wildcard at the left position. @weppos already reverted a commit with *.*.githubcloudusercontent.com due to this.

psl_is_cookie_domain_acceptable(psl, "localhost", "localhost") => 1

@remicollet, this is of course accepted and circumvents a check against the PSL (due to localhost==localhost). The real question is whether xyz.localhost may set a cookie for localhost. The PSL rulez say NO. I personally would say default=NO, but YES if the user allows it explicitly. But that is beyond libpsl and should be handled at application level.

IMO, there are these TODOs:

  • fix libpsl documentation
  • fix curl code to use psl_is_cookie_domain_acceptable()
  • add curl CLI options for finer user control over the PSL (optional)
  • open an issue at https://github.com/publicsuffix/list for clarification (optional)

@bagder

bagder commented Feb 16, 2016

The real question is if xyz.localhost may set a cookie for localhost. The PSL rulez say NO

Why? "localhost" is not a public suffix, so why do the PSL rules limit non-PSL domains?

@rockdaboot
Owner

@m6w6

Looking at the provided test data, it seems that single-label names do not match at all:

What exactly (e.g. line number) lets you assume this?

Any check of a single label.

The 'checkPublicSuffix' function checks for the 'shortest registrable domain part' of a given input domain, e.g. checkPublicSuffix('COM', null); means '.com' is not registrable (it is a public suffix).
checkPublicSuffix('a.b.example.example', 'example.example'); means '.example.example' is the shortest domain part that is not a PS (.example is a PS because it is a single label domain / TLD).
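The checkPublicSuffix() semantics can be sketched in C: the expected value is the public suffix plus one more label, or NULL when the input is itself a public suffix. This is a sketch, not libpsl code; the rule list is a hypothetical stand-in with no "example" rule, so "example" falls through to the implicit "*" rule just like an unlisted TLD, and input is assumed to be lower-case (the test file itself passes 'COM'):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PSL: plain, lower-case rules only. */
static const char *const RULES[] = { "com", "uk", "co.uk" };
#define NRULES (sizeof RULES / sizeof RULES[0])

static size_t label_count(const char *s)
{
    size_t n = 1;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

/* True if `suffix` equals the trailing labels of `domain`. */
static bool tail_matches(const char *domain, const char *suffix)
{
    size_t dlen = strlen(domain), slen = strlen(suffix);
    if (slen > dlen)
        return false;
    const char *tail = domain + (dlen - slen);
    return strcmp(tail, suffix) == 0 && (tail == domain || tail[-1] == '.');
}

/* Longest matching rule, or one label via the implicit "*" rule. */
static size_t public_suffix_labels(const char *domain)
{
    size_t best = 0;
    for (size_t i = 0; i < NRULES; i++) {
        size_t n = label_count(RULES[i]);
        if (tail_matches(domain, RULES[i]) && n > best)
            best = n;
    }
    return best ? best : 1;
}

/* Suffix of `domain` made of its last `n` labels, or NULL if it has fewer. */
static const char *last_labels(const char *domain, size_t n)
{
    size_t seen = 0;
    for (const char *p = domain + strlen(domain); p > domain; ) {
        p--;
        if (*p == '.' && ++seen == n)
            return p + 1;
    }
    return seen + 1 == n ? domain : NULL;
}

/* Shortest registrable part of `domain`: its public suffix plus one
   label, or NULL if `domain` is itself a public suffix. */
static const char *registrable_domain(const char *domain)
{
    return last_labels(domain, public_suffix_labels(domain) + 1);
}
```

With this, registrable_domain("com") and registrable_domain("example") are both NULL, and "a.b.example.example" yields "example.example", mirroring the test lines quoted above.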

@bagder

bagder commented Feb 16, 2016

.example is a PS because it is a single label domain

Ugh. I find that very counter-intuitive and strange. But sure, it explains the functionality.

@rockdaboot
Owner

Point 2. Says if no rules match, the prevailing rule is "*".

but "*" is not part of the public_list ;)

Yes, it is implicit, because of the rule mentioned above.

The real question is if xyz.localhost may set a cookie for localhost. The PSL rulez say NO

Why? "localhost" is not a public suffix, so why do the PSL rules limit non-PSL domains?

"localhost" is a PS, same explanation as above (prevailing * rule). Please look at https://publicsuffix.org/list/ if you don't believe me. I am not aware of a list of exceptions (e.g. private domains). Why not open an issue or PR to add "!localhost" to the PSL? An official decision is beyond libpsl.

But I am fine with discussing additional libpsl functionality for private PSL rules, e.g. a second, private list of rules that adds rules to, or removes rules from, the PSL. A user could simply add e.g. "!localhost" or, for testing, remove/overwrite existing rules. That way we cover all kinds of private and 'exotic' PSL usages. WDYT? If you like, open another issue just for this... I guess some details have to be discussed there.
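One way to picture such a private list is a second rule array consulted together with the public one. The sketch below is hypothetical throughout (array names, rule contents and functions are illustrative, not libpsl API); it handles plain and "!" exception rules but not "*." wildcards. It follows the PSL algorithm's handling of exceptions: a matching exception rule prevails over other matches, and its leftmost label is removed to obtain the public suffix. A private "!localhost" therefore gives localhost an empty public suffix, so localhost stops being a public suffix itself:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule sets: a stand-in for the public list and a private,
   user-supplied list layered on top of it. '!' marks an exception rule. */
static const char *const PUBLIC_RULES[]  = { "com", "co.uk" };
static const char *const PRIVATE_RULES[] = { "!localhost" };

static size_t label_count(const char *s)
{
    size_t n = 1;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

static bool tail_matches(const char *domain, const char *suffix)
{
    size_t dlen = strlen(domain), slen = strlen(suffix);
    if (slen > dlen)
        return false;
    const char *tail = domain + (dlen - slen);
    return strcmp(tail, suffix) == 0 && (tail == domain || tail[-1] == '.');
}

/* Scans one rule list, recording the longest plain match and the label
   count of a matching exception rule (0 if none matched). */
static void scan_rules(const char *domain, const char *const *rules, size_t n,
                       size_t *best, size_t *exception_labels)
{
    for (size_t i = 0; i < n; i++) {
        const char *rule = rules[i];
        bool exception = (rule[0] == '!');
        if (exception)
            rule++; /* strip the '!' marker */
        if (!tail_matches(domain, rule))
            continue;
        if (exception)
            *exception_labels = label_count(rule);
        else if (label_count(rule) > *best)
            *best = label_count(rule);
    }
}

/* Public-suffix length in labels, consulting both lists. */
static size_t public_suffix_labels(const char *domain)
{
    size_t best = 0, exception_labels = 0;
    scan_rules(domain, PUBLIC_RULES,
               sizeof PUBLIC_RULES / sizeof PUBLIC_RULES[0], &best, &exception_labels);
    scan_rules(domain, PRIVATE_RULES,
               sizeof PRIVATE_RULES / sizeof PRIVATE_RULES[0], &best, &exception_labels);
    if (exception_labels > 0)
        return exception_labels - 1; /* exception: drop its leftmost label */
    return best ? best : 1;          /* no match: implicit "*" rule */
}

static bool is_public_suffix(const char *domain)
{
    return public_suffix_labels(domain) == label_count(domain);
}
```

Unlisted names without a private rule, e.g. "unlisted", still fall back to the implicit "*" rule and remain public suffixes.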

@bagder

bagder commented Feb 16, 2016

My surprise is not that "localhost" specifically isn't listed as a PSL. My surprise is that an API for PSL (being "Public Suffix List" - a list of public suffixes) gives back a response about a domain that is clearly not specified as a PSL. I think it is outside of PSL's jurisdiction. I would claim that the limitations on a "single label domain" are not a PSL job to enforce. If you're just following the PSL guidelines, then my beef is with the PSL guidelines and I'm just here barking up the wrong tree.

I'm thankful for your work and your library, don't mistake my complaining for anything else.

@m6w6
Author

m6w6 commented Feb 16, 2016

As already mentioned on the PSL site, this "algorithm" is merely a plumber-ed list.
I found an old HTML-WG thread about that, too: http://lists.w3.org/Archives/Public/public-html/2009Jan/0529.html

For me the key question remains, why would any implementation say, "yes, foo is a public suffix", despite foo not being listed in the PSL?

"localhost" is a PS, same explanation as above (prevailing * rule)

Then any single-label rule from the PSL could be spared.

Anyway, thanks for your time! ;)

@weppos

weppos commented Feb 16, 2016

Why? "localhost" is not a public suffix, so why do the PSL rules limit non-PSL domains?

My surprise is that an API for PSL (being "Public Suffix List" - a list of public suffixes) gives back a response about a domain that is clearly not specified as a PSL. I think it is outside of PSL's jurisdiction.

@bagder as @rockdaboot correctly mentioned, there is a specific rule in the PSL algorithm that says:

If no rules match, the prevailing rule is "*".

And there are corresponding tests:

// Unlisted TLD.
checkPublicSuffix('example', null);
checkPublicSuffix('example.example', 'example.example');
checkPublicSuffix('b.example.example', 'example.example');
checkPublicSuffix('a.b.example.example', 'example.example');

Which means, if a TLD is not listed, it should be considered a "standard" TLD. I did not make that rule, hence I can't share the exact initial motivations, however my assumption is that it was done because the PSL is considered "a list of specific configuration" to be applied on top of the standard practice where the domain is third-level.second-level.tld.

This potentially makes it possible to clear all standard suffixes from the list (e.g. in a pre-processing phase). Moreover, the list will not cause any denial of service to new TLDs (think about all the new gTLDs) or to TLDs that for one reason or another were not listed (although I'm quite sure we currently include all the TLDs that were available at ICANN before the new gTLD phase).
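The pre-processing idea can be checked with a small sketch: under the implicit "*" rule, a plain single-label rule such as "com" yields exactly the same public suffix as having no rule at all, so dropping such rules does not change any result. The helper names and the rule list used in the usage note are hypothetical, and wildcard and exception rules are outside the scope of this sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static size_t label_count(const char *s)
{
    size_t n = 1;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

/* True if `suffix` equals the trailing labels of `domain`. */
static bool tail_matches(const char *domain, const char *suffix)
{
    size_t dlen = strlen(domain), slen = strlen(suffix);
    if (slen > dlen)
        return false;
    const char *tail = domain + (dlen - slen);
    return strcmp(tail, suffix) == 0 && (tail == domain || tail[-1] == '.');
}

/* Public-suffix length (in labels) of `domain` under a given rule list,
   falling back to the implicit "*" rule. Plain rules only. */
static size_t ps_labels(const char *domain, const char *const *rules, size_t n)
{
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        size_t k = label_count(rules[i]);
        if (tail_matches(domain, rules[i]) && k > best)
            best = k;
    }
    return best ? best : 1;
}

/* Pre-processing: keep only rules with more than one label. Returns the
   number of rules copied to `out`, which must have room for `n` entries. */
static size_t drop_single_label_rules(const char *const *in, size_t n,
                                      const char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (strchr(in[i], '.') != NULL)
            out[kept++] = in[i];
    return kept;
}
```

Starting from a list such as { "com", "uk", "co.uk", "beer" }, only "co.uk" survives the pre-processing, yet ps_labels() returns the same value for names like "com", "foo.uk", "a.co.uk" or "x.beer" under both lists.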

I agree that this rule is very centric to the idea of the public web and it may cause conflicts when the PSL is used within a local network context. However, it also depends on the specific usage and implementation that the library makes of the list. For example, libpsl (or any other lib) may decide to provide a flag to use * or not if there is no match.

I hope this provides a little bit of extra context. If you should decide to suggest some specific changes to the PSL, I agree with @rockdaboot that the best thing to do is to open a ticket in the PSL repo itself.

@bagder

bagder commented Mar 2, 2016

I still disagree. A single label is indeed a TLD, but if it isn't listed in the PSL I disagree that it should be returned as such.

@m6w6
Author

m6w6 commented Mar 3, 2016

Same feeling here.

@rockdaboot
Owner

So did you open an issue (discussion) at https://github.com/publicsuffix/list, where it belongs?

Background is that the PSL is an (optional) tool to prevent privacy leaks via cookies AND that not all TLDs are listed in the PSL (e.g. when new ones come, it takes a while before an updated PSL is spread around everywhere). In some scenarios (e.g. a secured local network, testing purposes), the PSL isn't appropriate and you don't want to use it.

You will know when it isn't the right tool and should have a switch (e.g. command line option) to turn it off.

Also, libpsl supports loading your own version of a PSL - you could bring in your own exceptions, thus have a very fine grained control (this also needs some support via curl).

If you think there are other, automated, ways to detect when the PSL should be used and when not, you could give us a hint. But I believe such measures (e.g. a DNS lookup to see if the IP is private) belong at the application level.

@weppos

weppos commented Mar 3, 2016

@rockdaboot ultimately it's your decision on how to tweak the lib, but what about a flag that allows the developer to optionally decide (or switch) the behavior when a rule doesn't match?

This is for example what I did here:
https://github.com/weppos/publicsuffix-ruby/blob/master/lib/public_suffix/list.rb#L245-L251

By default, I return a "*" rule in case of no match. However, a developer can either return nil or a different rule to customize the behavior. If I correctly interpreted the request here, returning nil in my Ruby implementation, for example, will match the behavior proposed by @bagder
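A flag of that kind could look like the following C sketch. The enum and function names are hypothetical (this is not an existing libpsl option), the rule list is a tiny stand-in, and only plain rules are handled. With NO_MATCH_NONE, an unlisted single label such as localhost is no longer reported as a public suffix, which is the behavior argued for above, while listed suffixes are unaffected:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PSL: plain, lower-case rules only. */
static const char *const RULES[] = { "com", "co.uk" };
#define NRULES (sizeof RULES / sizeof RULES[0])

/* What to do when no rule matches (hypothetical knob, not libpsl API). */
enum no_match_policy {
    NO_MATCH_IMPLICIT_STAR, /* PSL default: the last label is the suffix */
    NO_MATCH_NONE           /* unlisted names have no public suffix */
};

static size_t label_count(const char *s)
{
    size_t n = 1;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

static bool tail_matches(const char *domain, const char *suffix)
{
    size_t dlen = strlen(domain), slen = strlen(suffix);
    if (slen > dlen)
        return false;
    const char *tail = domain + (dlen - slen);
    return strcmp(tail, suffix) == 0 && (tail == domain || tail[-1] == '.');
}

/* Public-suffix length in labels; 0 means "no public suffix". */
static size_t public_suffix_labels(const char *domain, enum no_match_policy policy)
{
    size_t best = 0;
    for (size_t i = 0; i < NRULES; i++) {
        size_t n = label_count(RULES[i]);
        if (tail_matches(domain, RULES[i]) && n > best)
            best = n;
    }
    if (best == 0 && policy == NO_MATCH_IMPLICIT_STAR)
        return 1; /* implicit "*" rule */
    return best;
}

static bool is_public_suffix(const char *domain, enum no_match_policy policy)
{
    return public_suffix_labels(domain, policy) == label_count(domain);
}
```

Keeping NO_MATCH_IMPLICIT_STAR as the default preserves the behavior of the PSL algorithm; the other value would have to be chosen explicitly by the application.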

@rockdaboot
Owner

Back to start. As I understand, @m6w6 had the problem that server 'localhost' was not able to set a cookie for domain 'localhost'. That was a bug due to not calling psl_is_cookie_domain_acceptable(). I provided a patch for curl to get this fixed (curl/curl#658).

The question that remains is whether 'host1.localhost' may set a cookie for 'localhost'; this is called a super-cookie. If this cookie is accepted, the next request to 'host2.localhost' will contain it. Information is transferred from host1 to host2.

IMO, if you explicitly want this behavior, just switch PSL off via e.g. a command line switch.

BTW, you can play around with the PSL using the 'psl' command from libpsl, e.g.:

$ psl --is-cookie-domain-acceptable localhost host1.localhost
host1.localhost: 0
$ psl --is-cookie-domain-acceptable localhost localhost
localhost: 1

@rockdaboot
Owner

I have to add that the reason that 'host1.localhost' MUST NOT set a cookie for 'localhost' has nothing to do with the PSL. It is the RFC 6265 that disallows it.

The code in the function psl_is_cookie_domain_acceptable() is:

    cookie_domain_length = strlen(cookie_domain);
    hostname_length = strlen(hostname);

    if (cookie_domain_length >= hostname_length)
        return 0; /* cookie_domain is too long */
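For context, the following C sketch shows how such a length check can sit inside a complete acceptance test. It is not libpsl's code: the function name, the tiny rule list and the order of the checks are assumptions, inputs are assumed lower-case with no leading dot, and IP-address hosts are not handled. It does reproduce the two psl command results quoted above: the exact match (localhost, localhost) is accepted, while host1.localhost asking for localhost passes the length check and the tail match but is rejected by the final public-suffix step, because with the implicit "*" rule localhost is a public suffix:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PSL: plain, lower-case rules only. */
static const char *const RULES[] = { "com", "uk", "co.uk" };
#define NRULES (sizeof RULES / sizeof RULES[0])

static size_t label_count(const char *s)
{
    size_t n = 1;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

/* True if `suffix` equals the trailing labels of `domain`. */
static bool tail_matches(const char *domain, const char *suffix)
{
    size_t dlen = strlen(domain), slen = strlen(suffix);
    if (slen > dlen)
        return false;
    const char *tail = domain + (dlen - slen);
    return strcmp(tail, suffix) == 0 && (tail == domain || tail[-1] == '.');
}

/* Longest matching rule, or one label via the implicit "*" rule. */
static size_t public_suffix_labels(const char *domain)
{
    size_t best = 0;
    for (size_t i = 0; i < NRULES; i++) {
        size_t n = label_count(RULES[i]);
        if (tail_matches(domain, RULES[i]) && n > best)
            best = n;
    }
    return best ? best : 1;
}

/* Hypothetical check: may `hostname` set a cookie for `cookie_domain`? */
static bool cookie_domain_ok(const char *hostname, const char *cookie_domain)
{
    if (strcmp(hostname, cookie_domain) == 0)
        return true;  /* exact match is always acceptable */
    if (strlen(cookie_domain) >= strlen(hostname))
        return false; /* cookie_domain is too long */
    if (!tail_matches(hostname, cookie_domain))
        return false; /* not a label-aligned tail of hostname */
    /* Refuse super-cookies: cookie_domain must have more labels than
       the public suffix of hostname. */
    return label_count(cookie_domain) > public_suffix_labels(hostname);
}
```

The same final step is what rejects www.example.com setting a cookie for com, or www.foo.co.uk setting one for co.uk.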

5 participants