Handling suffixes not in the public list #15

andersju · 2016-05-20T12:46:40Z

I think it would make sense for the library to handle domains that don't have a valid public suffix (possibly as an option).

Right now (with 0.2.0):

> PublicSuffix.registrable_domain("foobar.thistlddoesntexist")
"foobar.thistlddoesntexist"

Suggested:

> PublicSuffix.registrable_domain("foobar.thistlddoesntexist")
:undefined

Or something like that. This is how publicsuffix-erl does it, e.g.:

> :tld.domain("foobar.thistlddoesntexist")
:undefined

The text was updated successfully, but these errors were encountered:

myronmarston · 2016-05-20T15:34:40Z

The publicsuffix.org spec specifies what compliant libraries are supposed to do in this situation:

If no rules match, the prevailing rule is "*".

(See number 2 under "Algorithm"). We also rely on the test file they provide, which includes test cases for this case (see "Unlisted TLD" section). In order to pass those tests, we must follow the published spec.

andersju · 2016-05-20T15:44:30Z

Ah, I see! Still, perhaps it could be an optional parameter disabled by default? The library could then be used to answer a question like "Does this look like a sane public domain name?", which can sometimes be handy. I understand if you don't want to add it, though. Thanks for the module, anyway, I used publicsuffix-erl before but it gave me headaches with exrm.

Edit: Or just another function, e.g. PublicSuffix.is_suffix_valid?("foobar.commmm"). Would be great for validation purposes where you just need to do a "good enough" check.

myronmarston · 2016-05-20T18:29:33Z

I used publicsuffix-erl before but it gave me headaches with exrm.

Yep, same thing happened with us. That's what led to us creating this library.

Anyhow, one of the issues with returning nil or a function like is_suffix_valid? is that we don't actually have a way to know if it's valid. As I understand the public suffix spec and rules, the reason for the If no rules match, the prevailing rule is "*" bit of the spec is that normally public suffixes have only one part. It creates for a smaller rules list if only the exceptions to that norm have to be listed. Given the language in the spec, I believe the people who maintain the public suffix list assume they do not have to list every one-part public suffix in the list--because compliant libraries will treat that as valid w/o needing a specific rule.

All that's to say: if a domain string does not match any rules in the list, I don't think that necessarily means it is invalid, and I wouldn't want to expose an API that claims it as such.

I think we can give you what you want, with a slightly different API, though:

> PublicSuffix.prevailing_rule("foobar.thistlddoesntexist")
["*"]

Basically, we can expose a function that gives you the prevailing rule. If we return ["*"] then it means we had to fallback because none of the explicitly defined rules matched. In your application, you could choose to interpret this as meaning the provided domain string is invalid.

On a side note, @benkirzhner have been discussing if we should switch to a single parse function that returns a struct. Something like:

> PublicSuffix.parse("mysite.foo.bar.co.uk")
%PublicSuffix.Domain{
  public_suffix: "co.uk",
  registrable_domain: "bar.co.uk",
  prevailing_rule: ["co", "uk"]
}

Thoughts?

andersju · 2016-05-20T19:31:17Z

Basically, we can expose a function that gives you the prevailing rule. If we return ["*"] then it means we had to fallback because none of the explicitly defined rules matched. In your application, you could choose to interpret this as meaning the provided domain string is invalid.

That makes total sense and (since I've now learned a bit about the spec) it seems more reasonable and elegant than the solutions I've seen in similar libraries for other languages.

On a side note, @benkirzhner have been discussing if we should switch to a single parse function that returns a struct.

Sounds like a good idea! Fits well with how Elixir's URI module (which one might want to use at the same time) does it.

myronmarston · 2016-05-20T19:39:45Z

Would you be interested in working up a PR to make the discussed changes, @andersju?

andersju · 2016-05-20T20:30:31Z

Sure thing. I'm still fairly new to Elixir but I'll give it a go (might not have time until a week or two from now though).

bkirz · 2016-05-24T18:50:03Z

Thanks for taking this on, @andersju! I just had a chat with @myronmarston, and we think a good plan for now would be to expose two new functions:

a general-purpose function, PublicSuffix.prevailing_rule(domain), that returns the prevailing rule as a string.
a predicate function, PublicSuffix.matches_explicit_rule?(domain), for explicitly checking whether or not the domain matches an existing rule.

This way, we get the flexibility of being able to see the prevailing rule (for debugging purposes, for example) while having a simpler API for the use case you originally described that doesn't require explicit knowledge of the fallback "*" case. How does that sound?

andersju · 2016-05-25T01:37:18Z

Sounds good! I already cooked up a quick solution (but need to write tests and docs). Question though: should PublicSuffix.matches_explicit_rule?(domain) extract just the public suffix (like public_suffix/2) and check whether that matches an explicit rule, or extract the registrable part of a domain (like registrable_domain/2) and check whether its public suffix matches an explicit rule?

I think the latter would be more useful:

iex(6)> PublicSuffix.registrable_domain("example.com")
"example.com"
iex(7)> PublicSuffix.registrable_domain("example")    
nil
iex(8)> PublicSuffix.matches_explicit_rule?("example.com")
true
iex(9)> PublicSuffix.matches_explicit_rule?("com")        
false

...but matches_explicit_rule? is perhaps then slightly misleading. What I'm after is really registrable_domain_matches_explicit_rule?, but that's a bit verbose :) Although, come to think of it, if we go with the former and matches_explicit_rule? handles nil and treats it as false you could do:

iex(7)> "example" |> PublicSuffix.registrable_domain |> PublicSuffix.matches_explicit_rule?    
false
iex(8)> "example.com" |> PublicSuffix.registrable_domain |> PublicSuffix.matches_explicit_rule?
true

What do you think?

myronmarston · 2016-05-25T06:06:01Z

Question though: should PublicSuffix.matches_explicit_rule?(domain) extract just the public suffix (like public_suffix/2) and check whether that matches an explicit rule, or extract the registrable part of a domain (like registrable_domain/2) and check whether its public suffix matches an explicit rule?

I confess I don't understand the distinction you are making. Both public_suffix and registrable_domain begin by finding the prevailing rule for the given input, and they will always find the same rule given the same input. The difference between the functions is what they do with the prevailing rule: given a prevailing rule of n parts, public_suffix will return the last n parts of the input domain, whereas registrable_domain will return the last n + 1 parts.

So I don't see how public_suffix vs registrable_domain plays into whether or not a provided string matches an explicit rule.

But maybe I'm missing something?

andersju · 2016-05-25T13:02:08Z

Basically, I want to determine "does the given string contain a registrable part and, if so, does its public suffix match an explicit rule?". So I want "example", "example.notatld", "com" to evaluate to false. But I figure a matches_explicit_rule? function ought to return false, false, true, respectively.

So, to answer the initial question, I think we could either create an additional function or simply make it possible to pipe the output from registrable_domain, as registrable_domain returns nil if there's no registrable part. I.e.:

> "example" |> PublicSuffix.registrable_domain
nil
> "example" |> PublicSuffix.registrable_domain |> PublicSuffix.matches_explicit_rule?
false

Make sense?

andersju · 2016-05-25T15:02:03Z

Actually, PublicSuffix.matches_explicit_rule? treating nil as valid input might just make it unnecessarily ugly. For the use case in the previous post it's easy enough to handle it on your own if you want the piping ability (e.g. "example" |> PublicSuffix.registrable_domain |> (&(if &1 == nil, do: "", else: &1)).() |> PublicSuffix.matches_explicit_rule?).

myronmarston · 2016-05-26T18:12:51Z

Thanks for explaining. I think I understand your use case better now. I was confused previously because it sounded like you were trying to decide if PublicSuffix.matches_explicit_rule? should first extract the registrable_domain or the public_suffix but it sounds like we're on the same page about doing neither :).

Actually, PublicSuffix.matches_explicit_rule? treating nil as valid input might just make it unnecessarily ugly.

With pattern matching, it's really not bad...we just need a function clause like:

def matches_explicit_rule?(nil), do: false

While I don't want to support every elixir type, nil seems OK; and the answer to "Does nil match an explicit rule?" is "no".

myronmarston · 2016-06-03T15:00:16Z

I've released 0.3.0 with your changes.

andersju mentioned this issue May 27, 2016

Add prevailing_rule/2 and matches_explicit_rule?/1 #17

Merged

myronmarston closed this as completed in #17 Jun 3, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling suffixes not in the public list #15

Handling suffixes not in the public list #15

andersju commented May 20, 2016

myronmarston commented May 20, 2016

andersju commented May 20, 2016 •

edited

Loading

myronmarston commented May 20, 2016

andersju commented May 20, 2016

myronmarston commented May 20, 2016

andersju commented May 20, 2016

bkirz commented May 24, 2016

andersju commented May 25, 2016 •

edited

Loading

myronmarston commented May 25, 2016

andersju commented May 25, 2016

andersju commented May 25, 2016

myronmarston commented May 26, 2016

myronmarston commented Jun 3, 2016

Handling suffixes not in the public list #15

Handling suffixes not in the public list #15

Comments

andersju commented May 20, 2016

myronmarston commented May 20, 2016

andersju commented May 20, 2016 • edited Loading

myronmarston commented May 20, 2016

andersju commented May 20, 2016

myronmarston commented May 20, 2016

andersju commented May 20, 2016

bkirz commented May 24, 2016

andersju commented May 25, 2016 • edited Loading

myronmarston commented May 25, 2016

andersju commented May 25, 2016

andersju commented May 25, 2016

myronmarston commented May 26, 2016

myronmarston commented Jun 3, 2016

andersju commented May 20, 2016 •

edited

Loading

andersju commented May 25, 2016 •

edited

Loading