Fails to parse hostnames with leading . #1523

Closed
sammacbeth opened this issue Jan 19, 2023 · 2 comments · Fixed by #1553

Comments

@sammacbeth

Hostnames with a leading . are not parsed by this library, e.g.:

tldts.getDomain('.example.com') === null

From the available specs, it is not clear to me whether a leading . is invalid in a hostname. The URL spec is very loose on the definition of a valid hostname, and the implementation in browsers accepts such a hostname:

new URL('https://.example.com').hostname === '.example.com'
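
To check this in a given runtime, a small probe along these lines can help. It is only a sketch: `hostnameOf` is a hypothetical helper, not part of tldts, and it relies only on the standard `URL` constructor, returning null when the input is rejected.

```javascript
// Hypothetical helper (not part of tldts): returns the hostname the
// runtime's URL parser reports, or null if the parser rejects the input.
function hostnameOf(input) {
  try {
    return new URL(input).hostname;
  } catch {
    return null;
  }
}

// Print what this runtime reports for a few dotted variants.
for (const input of [
  'https://example.com',
  'https://.example.com',
  'https://example.com.',
]) {
  console.log(JSON.stringify(input), '->', JSON.stringify(hostnameOf(input)));
}
```

The output may differ between URL implementations, so the printed values are worth checking in each environment rather than assuming them.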

Additionally, the leading dot notation is commonly used for cookies which span all subdomains of a given domain. This notation is acknowledged as possible in the domain part of a cookie string (though it is apparently ignored in modern implementations):

"Contrary to earlier specifications, leading dots in domain names (.example.com) are ignored."
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
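
For reference, the cookie behaviour quoted above can be sketched as follows. This follows my reading of the Domain attribute handling in RFC 6265 (drop one leading dot, then lowercase); `normalizeCookieDomain` is a hypothetical name used for illustration and is not a tldts API.

```javascript
// Hypothetical helper illustrating how a cookie Domain attribute value
// is normalized: a single leading "." is dropped, then the value is
// lowercased. Only the first character is examined.
function normalizeCookieDomain(attributeValue) {
  const withoutDot = attributeValue.startsWith('.')
    ? attributeValue.slice(1)
    : attributeValue;
  return withoutDot.toLowerCase();
}

console.log(normalizeCookieDomain('.Example.com')); // "example.com"
console.log(normalizeCookieDomain('example.com'));  // "example.com"
```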

@remusao
Owner

remusao commented Jan 29, 2023

Hey @sammacbeth,

I've done a bit of research and found some contradictory information on the matter, which is a bit puzzling. Let me share what I have and we can decide on the best way forward:

  1. From what I can understand in https://url.spec.whatwg.org/#host-miscellaneous, a leading or trailing dot is valid for a hostname (testing with the browser implementation of new URL and with the npm package whatwg-url gives the same result).
  2. Both the spec of the public suffix list and the WHATWG URL spec seem to say that trailing dots are allowed and should be ignored.
  3. When it comes to leading dots, the spec of the public suffix list specifies that "Empty labels are not permitted, meaning that leading and trailing dots are ignored." (it is unclear to me whether this refers to the rules in the public suffix list, to domains, or to both). On the other hand, the tests from the Mozilla implementation specifically have test cases that forbid leading dots.
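
To make the "empty labels" wording concrete: a leading or trailing dot shows up as an empty first or last label once the hostname is split on dots. The sketch below uses a hypothetical `inspectLabels` helper, which does not come from tldts or the public suffix list tooling.

```javascript
// Hypothetical helper: split a hostname into labels and report whether
// the first or last label is empty, i.e. whether the hostname has a
// leading or trailing dot.
function inspectLabels(hostname) {
  const labels = hostname.split('.');
  return {
    labels,
    leadingEmpty: labels.length > 1 && labels[0] === '',
    trailingEmpty: labels.length > 1 && labels[labels.length - 1] === '',
  };
}

console.log(inspectLabels('.example.com'));
// labels: ['', 'example', 'com'], leadingEmpty: true, trailingEmpty: false
console.log(inspectLabels('example.com.'));
// labels: ['example', 'com', ''], leadingEmpty: false, trailingEmpty: true
```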

So it seems that at least the hostname should be allowed to have a leading dot as well as trailing dots (although it's probably not expected for a hostname to have more than one, the whatwg-url does not seem to strictly forbid multiple trailing dots).

On the other hand, the public suffix list seems to forbid hostnames with leading dots. Since tldts can potentially afford to be a bit more lenient, for the sake of convenience, I guess we could allow both leading and trailing dots for all these fields: hostname, domain, publicSuffix, subdomain and domainWithoutSuffix.
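
One way to read "lenient" here is to drop leading and trailing dots before parsing. The following is a minimal sketch of that reading only, under the assumption that such dots are ignored rather than preserved; `stripEdgeDots` is a hypothetical helper and does not describe how tldts actually handles these inputs.

```javascript
// Hypothetical pre-processing step: remove any run of dots at the start
// and at the end of a hostname, leaving interior dots untouched.
function stripEdgeDots(hostname) {
  return hostname.replace(/^\.+/, '').replace(/\.+$/, '');
}

console.log(stripEdgeDots('.example.com'));  // "example.com"
console.log(stripEdgeDots('example.com..')); // "example.com"
```

Stripping up front keeps the parser itself unchanged, at the cost of losing the information that the dots were present in the input.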

This would probably warrant a breaking change version bump, though.

What do you think?

@remusao
Owner

remusao commented Apr 1, 2023

@sammacbeth I have just published a new release which I believe should fix the issues reported here. Let me know how it works for you.
