Clarify number and position of wildcard labels #145

Closed
weppos opened this issue Feb 14, 2016 · 30 comments
Labels
r=dnsguru Marked as approved and ready to merge by @dnsguru


@weppos
Member

weppos commented Feb 14, 2016

I made a commit a few weeks ago that introduced a rule like *.*.private.domain and the commit caused the build to fail.

According to our website, that is supposed to be a valid format:

Wildcards are not restricted to appear only in the leftmost position, but they must wildcard an entire label. (I.e. *.*.foo is a valid rule: *bar.foo is not.)

@rockdaboot mentioned a potential incompatibility of Chromium if we allow multiple wildcards. libpsl is currently not compatible with multiple wildcards, and to be fair I haven't tested my Ruby implementation either.

@gerv @sleevi can we clarify whether multiple wildcard labels are accepted? Specifically, we should be more clear if the following rules are valid:

// multiple leading wildcards (common case)
*.*.foo.bar

// single wildcard, but inside the rule
foo.*.bar

// multiple wildcards, inside the rule
foo.*.*.bar

// multiple wildcards, inside the rule, non-consecutive
foo.*.bar.*.baz

// I suppose this is invalid
foo.*

The current list definition doesn't explicitly deny these rules, so they are supposed to be valid.

Once the decision is taken, I think we should:

  1. Update the website to make it more clear
  2. Add some corresponding tests in the test file
  3. Should the rules be considered invalid, I'll also add the corresponding checks to the linter I'm working on
  4. Should the rules be considered valid, we should ping the various lib maintainers
@rockdaboot
Contributor

// single wildcard, but inside the rule
foo.*.bar

This would not allow exceptions right now, since (AFAIR) the exclamation mark has to be the first character (and must not be followed by a dot).
If this form will be allowed, the rules for exceptions must IMO change (e.g. foo.!baz.bar).

@gerv
Contributor

gerv commented Feb 16, 2016

I believe the Mozilla implementation also didn't allow multiple wildcards. It seems like there are lots of implementations out there which don't. Given that their use is quite niche (what was the use case for the rule you added?), perhaps we need to bow to the inevitable and update the spec to match reality.

@sleevi
Contributor

sleevi commented Feb 20, 2016

Right, the Chrome code doesn't presently, but we could likely fix that. But in the implementations I've seen, I haven't seen any support for multi-wildcards, despite what the website says.

@User4574

Hi

As you can see from our pull request here:

#179

we need to get the double wildcard issue addressed, as we need our customer domain suffix in the PSL.

We name machines under MACHINE.GROUP.ACCOUNT.CLUSTER.bigv.io, so *.*.*.bigv.io would be ideal, or *.*.uk0.bigv.io.

Thanks
Næþ'n

@gerv
Contributor

gerv commented Mar 16, 2016

When you say "get it addressed", do you mean "update all the PSL-using code in the universe to support double wildcards"? It seems to me that the most likely way for this to be addressed would be to document what appears to be a fairly universal client limitation - that is, that double wildcards are not supported.

@weppos
Member Author

weppos commented Mar 16, 2016

@gerv @sleevi I'm curious, what would be the implications of having one of those entries in the list as of today?

Would it cause some sort of crash, or will it simply "not work as expected"?

I'm asking because it may make sense in the long run to support it. And if this is the direction we want to try to take, we may simply document it as "compatibility not guaranteed". Clients and consumers will eventually update their codebase.

In other words, we allow it, but we document it as "it may not work on all clients". It would be different if rolling it out today could cause Firefox or Chrome to crash.

What do you think?

@gerv
Contributor

gerv commented Mar 16, 2016

I suspect the answer is implementation-dependent... We'd have to try it and see.

@sleevi
Contributor

sleevi commented Mar 16, 2016

@weppos I don't think we can draw a line between "crash" and "not-work". That's the sort of decision PSL consumers should take. For example, how would we quantify whether a double wildcard caused subsequent entries in the PSL to be ignored due to parsing issues? We don't have the means to quantify that.

That said, I'm not fundamentally opposed to changing, so long as major platforms will end up supporting them.

@myronmarston

@benkirzhner and I are working on an Elixir public suffix library and have found it much, much easier to support only a single left-most wildcard than to support multiple wildcards or wildcards at any position. If we only support a single wildcard at the left-most position, our algorithm can do a very simple, constant-time set membership check (e.g. ["*" | suffix] in wild_card_rules) to see if a domain matches a wild card rule. Full wildcard support as per the spec is much more complicated, and we would have to do a slow linear scan of all the rules to compare a domain against each or put the wildcard rules into some kind of a tree data structure and traverse that.

Given that up to now, the only usage of wildcards in the data file has been in the left-most position...is it really necessary to add extra complexity to every implementation to support wildcards at any level and multiple wildcards?

If it is decided to support the added complexity, I would ask that the tests be updated to include examples of multiple wildcards and wildcards at less common positions so implementors can use those in their test suites.
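To make the complexity argument above concrete, here is a minimal Python sketch (not the Elixir library's code). It assumes rules are stored as tuples of labels; the function names are hypothetical. The first function is the single set-membership check described above, and the second is the linear scan that wildcards in arbitrary positions would force on a naive implementation.

```python
Labels = tuple[str, ...]


def matches_leftmost_wildcard(labels: Labels, wildcard_rules: set[Labels]) -> bool:
    """Leftmost-only wildcards: swap the first label for "*" and do one hash lookup."""
    return (("*",) + labels[1:]) in wildcard_rules


def matches_any_position(labels: Labels, rules: list[Labels]) -> bool:
    """Wildcards anywhere: every rule is compared label by label (linear in the rule count)."""
    return any(
        len(rule) == len(labels)
        and all(r == "*" or r == l for r, l in zip(rule, labels))
        for rule in rules
    )


# Hypothetical sample data, for illustration only.
leftmost_rules = {("*", "ck"), ("*", "kobe", "jp")}
inline_rules = [("foo", "*", "bar"), ("*", "*", "foo")]

print(matches_leftmost_wildcard(("www", "ck"), leftmost_rules))
print(matches_any_position(("foo", "x", "bar"), inline_rules))
```

A tree keyed on labels from right to left would avoid the linear scan, at the cost of the extra data structure mentioned above.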

@rockdaboot
Contributor

rockdaboot commented Apr 19, 2016

*.label and *.*.label would allow fast lookups as we do it now.

One or more wildcards somewhere in between (e.g. foo.*.bar) result in slower and more complex lookup algorithms.

Apart from that, we need new exception rules.
E.g. when having *.*.bar, we would need exceptions with wildcards like !*.doo-*.bar.

@weppos
Member Author

weppos commented Feb 17, 2018

@rockdaboot @gerv @sleevi given:

  • the conversation above
  • most (if not all) the PSL consumers/libraries today assume one single *
  • this is consistent with other notations (e.g. the CA use of * in the subjectAltName and the dNSName)

barring any objection I am going to update the format documentation and the site to permit the use of one single wildcard, and only as the left outermost label.

Should the need change, we can always revisit the decision. But it looks like there is no current practical application of it.

@peterthomassen
Contributor

@weppos Is there any update on this (updating the documentation)? (There's also another correction outstanding, from #208.)

Inline wildcards pose a bunch of problems, as discussed above. To add one more: It's near impossible to incorporate inline wildcard rules in the PSL DNS lookup service that I set up, due to the constraint that in the DNS, wildcards can only appear in the leftmost label. I would feel a bit better about the "spec coverage" of the service if it was official that inline wildcards don't need to be considered.

@dnsguru dnsguru self-assigned this Feb 27, 2020
@dnsguru
Member

dnsguru commented Feb 27, 2020

Nudging this.

@weppos wrote:

barring any objection I am going to update the format documentation and the site to permit the use of one single wildcard, and only as the left outermost label.

I support this change and agree this should be done. I will be working on documentation for #982 and can update the documentation to reflect what Simone said above within the PR I make for that once we settle on the wording for it - @sleevi any objection?

@sleevi
Contributor

sleevi commented Feb 27, 2020

None here.

@dnsguru
Member

dnsguru commented Feb 27, 2020

Excellent @sleevi, thx. Without letting my bias for action while I have some cycles to donate seem too cavalier, I think there is an opportunity to proceed. Considering that @weppos proposed the idea, I feel it safe to count that as a non-objection, and I think it is smart. Consensus.

I just hate closing the ones with Gerv's notes in them :(

@weppos
Member Author

weppos commented Feb 28, 2020

Green light here too. Thanks for the help @dnsguru

@dnsguru
Member

dnsguru commented Feb 28, 2020

I had proposed to update this with the other PR #982 but that is on the wiki and this is up on publicsuffix.org, so I did some updates that I'd like both of you @weppos and @sleevi to review, and then I will do the needed PR for those in that repo.

The following is the update I want to make, and I want to just be sure I have it right.

From "Specification" in https://publicsuffix.org/list, with removed text in strikethru and new text in bold.

Specification

  • The list is a set of rules, with one rule per line.

  • Each line is only read up to the first whitespace; entire lines can also be commented using //.

  • Each line which is not entirely whitespace or begins with a comment contains a rule.

  • Each rule lists a public suffix, with the subdomain portions separated by dots (.) as usual. There is no leading dot.

  • The wildcard character * (asterisk) matches any valid sequence of characters in a hostname part. (Note: the list uses Unicode, not Punycode forms, and is encoded using UTF-8.) Wildcards are ~~not~~ restricted to appear only in the leftmost position **and** ~~, but they~~ must wildcard an entire label. ~~(I.e. *.*.foo is a valid rule: *bar.foo is not.)~~ **A** ~~Wildcards~~ **wildcard** may only be used to wildcard an entire level. That is, ~~they must be surrounded~~ **must be delimited** by **a** dot **on its right.** ~~s (or implicit dots, if at the beginning of a line)~~

  • If a hostname matches more than one rule in the file, the longest matching rule (the one with the most levels) will be used.

  • An exclamation mark (!) at the start of a rule marks an exception to a previous wildcard rule. An exception rule takes priority over any other matching rule.

  • The list uses Unicode, not Punycode forms, and is encoded using UTF-8.
    The following characters are used explicitly; please avoid Unicode variants of them: Space " " (Dec/Hex: 32/20), Exclamation "!" (Dec/Hex: 33/21), Forward Slash "/" (Dec/Hex: 47/2F), Period/Dot "." (Dec/Hex: 46/2E), and Asterisk "*" (Dec/Hex: 42/2A). When in doubt, use the ASCII characters between Dec/Hex 32/20 and 126/7E.

  • Entries should not have trailing whitespace
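
As an illustration of the two matching bullets above (the longest matching rule wins, and an exception rule beats any other match), here is a minimal Python sketch. It assumes only the single leftmost wildcard form. The function name `public_suffix` and the sample rules are hypothetical, and two steps are taken from the published PSL algorithm rather than from this thread: the implicit `*` fallback when nothing matches, and dropping the leftmost label of a matching exception rule.

```python
def public_suffix(domain: str, rules: list[str]) -> str:
    """Return the public suffix of `domain` under single, leftmost-only wildcard rules."""
    exact, wildcard, exception = set(), set(), set()
    for rule in rules:
        if rule.startswith("!"):
            exception.add(rule[1:])   # "!city.kobe.jp" -> "city.kobe.jp"
        elif rule.startswith("*."):
            wildcard.add(rule[2:])    # "*.kobe.jp" -> "kobe.jp"
        else:
            exact.add(rule)

    labels = domain.lower().split(".")
    # Candidate suffixes of the domain, longest first.
    candidates = [".".join(labels[i:]) for i in range(len(labels))]

    # An exception rule takes priority over any other matching rule;
    # the suffix is the exception rule minus its leftmost label.
    for i, candidate in enumerate(candidates):
        if candidate in exception:
            return ".".join(labels[i + 1:])

    # Otherwise the first (longest) candidate matched by any rule wins.
    for i, candidate in enumerate(candidates):
        parent = ".".join(labels[i + 1:])
        if candidate in exact or (parent and parent in wildcard):
            return candidate

    # Assumption: no rule matched, so fall back to the implicit "*" rule.
    return labels[-1]


sample_rules = ["com", "jp", "*.kobe.jp", "!city.kobe.jp"]
print(public_suffix("a.b.kobe.jp", sample_rules))   # b.kobe.jp
print(public_suffix("city.kobe.jp", sample_rules))  # kobe.jp
print(public_suffix("example.jp", sample_rules))    # jp
```

Because a wildcard can only be the leftmost label, each candidate needs at most three set lookups, which is the property several commenters above want to preserve.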

Examples of valid entries and ! or * / wildcard usage:

| Entry | Valid/Invalid | Why |
| --- | --- | --- |
| *.foo | Valid | Correct Use of Wildcard |
| !specificsite.foo | Valid | Correct use of Exclamation to indicate exception to previous wildcard rule |
| *.bar.foo | Valid | Correct Use of Wildcard |
| *.예 | Valid | Correct Use of Wildcard |
| *.예.예 | Valid | Correct Use of Wildcard |

Examples of invalid entries and ! or * / wildcard usage:

| Entry | Valid/Invalid | Why |
| --- | --- | --- |
| *.*.bar.foo | Invalid | Multiple Wildcard |
| bar.*.foo | Invalid | Nested Wildcard (must be in leftmost position, only) |
| *bar.foo | Invalid | Wildcard Non-delimited by dot |
| 예.*.foo | Invalid | Nested Wildcard |
| ǃspecificsite.예.예 | Invalid | Used latin letter retroflex click unicode char (01C3) instead of exclamation (33) |
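
The valid and invalid entries above can be checked mechanically. The following Python sketch is only an illustration of such a lint check, not the project's actual linter. The function name `check_rule` and the small `LOOKALIKES` table are assumptions, and the table is deliberately incomplete.

```python
# A few Unicode characters that are easily mistaken for the ASCII ones (not exhaustive).
LOOKALIKES = {
    "\u01c3": "!",  # LATIN LETTER RETROFLEX CLICK
    "\uff01": "!",  # FULLWIDTH EXCLAMATION MARK
    "\uff0a": "*",  # FULLWIDTH ASTERISK
    "\uff0e": ".",  # FULLWIDTH FULL STOP
}


def check_rule(rule: str) -> list[str]:
    """Return a list of problems found in one PSL rule; an empty list means it looks valid."""
    problems = []
    for ch in rule:
        if ch in LOOKALIKES:
            problems.append(f"U+{ord(ch):04X} used instead of ASCII {LOOKALIKES[ch]!r}")
    if any(ch.isspace() for ch in rule):
        problems.append("contains whitespace")

    # A leading "!" marks an exception rule and is not part of the first label.
    body = rule[1:] if rule.startswith("!") else rule
    labels = body.split(".")
    if "" in labels:
        problems.append("empty label")

    wildcard_positions = [i for i, label in enumerate(labels) if label == "*"]
    if len(wildcard_positions) > 1:
        problems.append("multiple wildcards")
    if any(i > 0 for i in wildcard_positions):
        problems.append("wildcard is not the leftmost label")
    if any("*" in label and label != "*" for label in labels):
        problems.append("wildcard does not cover an entire label")
    return problems


for entry in ["*.foo", "!specificsite.foo", "*.*.bar.foo", "bar.*.foo", "*bar.foo"]:
    print(entry, check_rule(entry))
```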

@vdobler

vdobler commented Mar 2, 2020

@weppos wrote:

barring any objection I am going to update the format documentation and the site to permit the use of one single wildcard, and only as the left outermost label.

No objection. We only handle single, leftmost wildcards so the documentation change is very welcomed.

@peterthomassen
Contributor

"must be delimited by a dot on its right." sounds like something*.example.org would be a valid rule. That's probably not intended?

@dnsguru
Member

dnsguru commented Mar 2, 2020 via email

@weppos
Member Author

weppos commented Mar 3, 2020

I suggest we take a look at the DNS RFC and how this is described. We are basically shifting to the same behavior.

The wildcard character * (asterisk) matches any valid sequence of characters in a hostname part. It must be the left-outermost character and can only represent an entire label.

(assuming we defined what a label is)

@dnsguru
Member

dnsguru commented Apr 5, 2020

Take bat.bar.foo and say that each of bat, bar and foo is a 'label' within the definition of Mockapetris, RFC1034, section 4.3.3 "Wildcards" (p25), which covers * usage in a zone. Paraphrasing, it states that the wildcard is "...always whole labels". It also goes on to define the explicit placement in the zones.

Maybe we just state that the syntax for wildcard ("*") usage in the PSL is identical to that defined in RFC1034 section 4.3.3, where it is

  • always in the left most position, and
  • always a whole label

@dnsguru
Member

dnsguru commented Apr 5, 2020

The final version of the bullet about wildcard use would read like this:

  • The wildcard character * (asterisk) matches any valid sequence of characters in a hostname part. Wildcards in the PSL follow the syntax defined in RFC1034, section 4.3.3 (pp24-25), and are restricted to appear only in the leftmost position and must wildcard an entire label.

@rushmorem
Contributor

I'm not sure if anyone is using this feature in their own lists but my Rust implementations, both static and dynamic, support multiple wildcards in any position as per the current spec.

@dnsguru
Member

dnsguru commented Jun 3, 2021

I'm not sure if anyone is using this feature in their own lists but my Rust implementations, both static and dynamic, support multiple wildcards in any position as per the current spec.

That sounds really flexible, and the voluntary nature of downstream use or incorporation of the PSL is that it works like a buffet, essentially, where folks can put what they want on their tray or add their own.

This catalog and the maintainers (speaking for myself for sure) are completely non-prescriptive about what folks do downline from the PSL, but because the left-most only, single position wildcard is how DNS behaves, and this is the widest used approach that long-term PSL consumer/integrators have come to know and expect, we're really following that here due to the primary/legacy compatibility.

@rushmorem
Contributor

@dnsguru I'm not personally against this change. I totally understand the motive behind it. I was merely pointing out that there might be users out there who might be impacted by this change, though I'm not aware of any.

@dnsguru
Member

dnsguru commented Feb 8, 2022

We've not received any reports in the past 8 months on affected parties related to compressing to the single, leftmost-only wildcard rules, so this will be closed.

@dnsguru dnsguru closed this as completed Feb 8, 2022
@dnsguru dnsguru added this to To do in Meta Topics, Questions, Process via automation Feb 8, 2022
@dnsguru dnsguru moved this from To do to Done or Won't in Meta Topics, Questions, Process Feb 8, 2022
@peterthomassen
Contributor

this will be closed.

Do I understand correctly that based on the wide consensus and the lack of objections / reports of affected parties, the spec will now be adapted to the phrasing of #145 (comment) (i.e. close + fix)?

For reference,

The wildcard character * (asterisk) matches any valid sequence of characters in a hostname part. Wildcards in the PSL follow the syntax defined in RFC1034, section 4.3.3 (pp24-25), and are restricted to appear only in the leftmost position and must wildcard an entire label.

@dnsguru
Member

dnsguru commented Feb 8, 2022

Yes, with a microtweak to specify the unicode character for the asterisk... and I have added this onto the Wiki here in the format section, hoping to deprecate use of the legacy publicsuffix.org references by referring them to the github.com/publicsuffix/list wiki instead so that the content is more easily maintained within this project.

@peterthomassen
Contributor

Awesome! 🎉 Thanks for the work.

peterthomassen added a commit to sse-secure-systems/publicsuffix.zone that referenced this issue Feb 8, 2022
Spec was changed such that what was previously a limitation of our
implementation is no longer allowed, so no limitations remain.
publicsuffix/list#145 (comment)
peterthomassen added a commit to sse-secure-systems/psl-dns that referenced this issue Feb 8, 2022
Spec was changed such that what was previously a limitation of our
implementation is no longer allowed, so no limitations remain.
publicsuffix/list#145 (comment)