New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prevailing_rule/2 and matches_explicit_rule?/1 #17

Merged
merged 1 commit into from Jun 3, 2016

Conversation

Projects
None yet
2 participants
@andersju
Contributor

andersju commented May 27, 2016

This fixes #15 by adding prevailing_rule/2 (returns prevailing rule of domain) and matches_explicit_rule? (checks whether domain matches explicit rule). E.g.:

iex(1)> PublicSuffix.prevailing_rule(".com")                 
nil
iex(2)> PublicSuffix.prevailing_rule("foobar.com")
"com"
iex(3)> PublicSuffix.prevailing_rule("foobar.co.uk")
"co.uk"
iex(4)> PublicSuffix.prevailing_rule("com")         
"com"
iex(5)> PublicSuffix.prevailing_rule("foobar.example")
"*"
iex(6)> PublicSuffix.prevailing_rule("ck")            
"*"
iex(7)> PublicSuffix.prevailing_rule("www.ck")
"ck"
iex(8)> PublicSuffix.prevailing_rule("foobar.net.ck")
"net.ck"
iex(9)> PublicSuffix.matches_explicit_rule?(".com")          
false
iex(10)> PublicSuffix.matches_explicit_rule?("foobar.com")
true
iex(11)> PublicSuffix.matches_explicit_rule?("ck")        
false
iex(12)> PublicSuffix.matches_explicit_rule?("www.ck")
true
iex(13)> PublicSuffix.matches_explicit_rule?("foobar.net.ck")
true
iex(14)> PublicSuffix.matches_explicit_rule?("foobar.example")
false

Tests also added. prevailing_rule is tested in test/generated_test_cases.exs. It should return the same as public_suffix, except for a few cases (e.g. when the domain is unlisted) where it should return "*"; these are handled separately in test/public_suffix_test.exs, where there are also some tests for matches_explicit_rule?.

(Disclaimer: I'd never touched ExUnit before but I think it's OK :))

|> String.split(".")
|> find_prevailing_rule(options)
|> Enum.join(".")
end

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

I think it's confusing to overload parse_domain to do entirely separate things based on the flag provided as the last argument. Instead, WDYT about removing the last arg to parse_domain and inlining this logic in prevailing_rule/2?

This comment has been minimized.

@andersju

andersju May 31, 2016

Contributor

Yep. Will do.

"github.io"
"""
@spec prevailing_rule(String.t) :: nil | String.t
@spec prevailing_rule(String.t, options) :: nil | String.t

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

In what situations will this function return nil? The fallback rule is "*" so I don't see how nil is a possible return value.

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

Nevermind, I see that nil is returned if the input itself is in the invalid form containing a leading dot. That case is weird and we only support it because the spec mentions it :(.

Do you mind updating the doc to mention when nil can be returned since the description (If no rules match, the prevailing rule is "*") makes it sound like nil will never be returned?

This comment has been minimized.

@andersju

andersju May 31, 2016

Contributor

Sure. By the way, I'll also kill the double @spec lines if you don't mind. If my understanding is correct there should only be one @spec per function, no matter whether an argument is optional or not (still the same arity). E.g. there's no function public_suffix/1 so @spec public_suffix(String.t) :: nil | String.t is wrong. I think...

This comment has been minimized.

@myronmarston

myronmarston Jun 1, 2016

Contributor

there's no function public_suffix/1 so @SPEC public_suffix(String.t) :: nil | String.t is wrong. I think...

That's not my understanding at all. There absolutely is a public_suffix/1, which the Elixir compiler is defining as:

def public_suffix(domain), do: public_suffix(domain, [])

Elixir/Erlang have no concept of a function with variable arity. Elixir only supports optional args w/ defaults as a way to cut out the boilerplate of having to define the delegating functions yourself.

That said, I took a look at the source for some functions in Elixir that use default args and there's only one @spec there (for the full list of args), so I think it might take care of the @spec for the alternate arity for you. So yes, feel free to delete the version of the @spec with fewer arguments.

This comment has been minimized.

@andersju

andersju Jun 1, 2016

Contributor

Right you are! Which PublicSuffix.module_info confirms. I did know about no variable arity, but was confused as to how defaults were handled. Sorry, I didn't mean to nitpick -- I'm learning the language as I go along. Which probably means I should talk less and code more...

nil -> false
"*" -> false
_ -> true
end

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

This can be simplified to:

!(prevailing_rule(domain) in [nil, "*"])
"""
@spec matches_explicit_rule?(String.t | nil) :: boolean
def matches_explicit_rule?(nil), do: false
def matches_explicit_rule?(domain) when is_binary(domain) do

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

Should this accept an options arg so the user can decide if private rules should be considered or not?

This comment has been minimized.

@andersju

andersju May 31, 2016

Contributor

I was thinking about that, but wondered whether it would matter - is it possible that there could be a private domain that would not match an explicit non-private rule if ignore_private: true was passed?

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

I don't see why it wouldn't be possible.

This comment has been minimized.

@andersju

andersju May 31, 2016

Contributor

I figured private domains, by definition, must include a public suffix (from the submission page: "owners of privately-registered domains who themselves issue subdomains to mutually-untrusting parties may wish to be added to the PRIVATE section of the list.")

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

I think there's value in API parity with the other functions, though. In addition, while the policies of publicsuffix.org suggest that all private domains should have a shorter non-private rule as well, it's not something we validate or enforce in our library--so at any point in time there could be a case in the data file of a private rule lacking a corresponding public rule.

This comment has been minimized.

@andersju

andersju May 31, 2016

Contributor

Ok - that makes sense.

@@ -37,6 +37,10 @@ defmodule PublicSuffix.TestCaseGenerator do
input: input,
registrable_domain_output: registrable_domain_output,
public_suffix_output: public_suffix_output(registrable_domain_output, input),
# prevailing_rule/2 should always return the same as public_suffix/2 except
# when the domain doesn't match an explicit rule, so the following should be fine,

This comment has been minimized.

@myronmarston

myronmarston May 31, 2016

Contributor

prevailing_rule/2 should always return the same as public_suffix/2 except
when the domain doesn't match an explicit rule

I don't think this is true: there are rules like *.bd and !www.ck, which are clearly different than what the matching public suffix will be.

Actually, your writing this comment made me realize that our implementation of prevailing_rule/2 is buggy! Specifically, the find_prevailing_rule/2 helper function does not return a rule (except in the specific case when no explicit rule matches, and then it returns ["*"]). It returns the domain parts that match the prevailing rule. For example, the @wild_card_rules[["*" | suffix]] in allowed_rule_types -> domain_labels clearly returns the labels from the input domain, not the wild card rule itself. Similarly, find_prevailing_exception_rule does not return the rule, but instead returns the part of the domain that the algorithm says to count as the public suffix according to the rule.

To properly support prevailing_rule/2 I think we need to actually have find_prevailing_rule return the matching rule, but to do that I think the return type needs to be richer so the caller can distinguish between an exception rule and a normal rule (which affects how the rule is used). Maybe we can refactor things so that a tuple like {:normal, ["*", "bd"]} or {:exception, ["www", "ck"]} is returned by the private find_prevailing_rule/2 function? I think that would support both the needs of the existing public APIs as well as this new function you are adding.

Thoughts?

This comment has been minimized.

@andersju

andersju May 31, 2016

Contributor

That sounds like a good idea. I'll have a look at it in the coming days and put together a new pull request (and also fix the other things you mentioned - thanks for the feedback!)

@andersju andersju force-pushed the andersju:prevailing-and-explicit branch from 400c9f3 to 7520462 Jun 2, 2016

@andersju

This comment has been minimized.

Contributor

andersju commented Jun 2, 2016

Alright, should be better now. I did a few slight modifications to make find_prevailing_rule return a tuple with the actual matching rule, as suggested. Also removed the prevailing_rule/2 tests from generated_test_cases.exs (as it was probably just messy and confusing with the exceptions), and instead added more in public_suffix_test.exs.

@andersju andersju force-pushed the andersju:prevailing-and-explicit branch from 7520462 to e88871d Jun 2, 2016

# TODO: "Wildcards are not restricted to appear only in the leftmost position"
@wild_card_rules[["*" | suffix]] in allowed_rule_types -> domain_labels
@wild_card_rules[["*" | suffix]] in allowed_rule_types -> {:normal, ["*"] ++ suffix}

This comment has been minimized.

@myronmarston

myronmarston Jun 2, 2016

Contributor

I recommend using ["*" | suffix] over ["*"] ++ suffix. The ++ operator is O(n) over the size of the list on the left, whereas [a | b] is highly optimized and is always constant time.

In this case, the list of the left has only one element so I doubt there's a perf difference, but as a a matter of habit I think it's best to prefer the [a | b] form when it can be used, as it can here.

|> find_prevailing_rule(options)
|> case do
{:exception, rule} -> "!" <> Enum.join(rule, ".")
{_, rule} -> Enum.join(rule, ".")

This comment has been minimized.

@myronmarston

myronmarston Jun 2, 2016

Contributor

I think I'd prefer matching on {:normal, rule} here (and in the similar case statement below) so as to be more explicit and assertive (which is one of the values of elixir).

end
@doc """
Checks whether the provided domain matches an existing rule in the

This comment has been minimized.

@myronmarston

myronmarston Jun 2, 2016

Contributor

s/existing/explicit/

@myronmarston

This comment has been minimized.

Contributor

myronmarston commented Jun 2, 2016

Well done @andersju! I left a few suggestions but this LGTM otherwise.

Add prevailing_rule/2 and matches_explicit_rule?/2 and add tests for the
same; make find_prevailing_rule return actual matching rule; update README

@andersju andersju force-pushed the andersju:prevailing-and-explicit branch from e88871d to fbbcacd Jun 2, 2016

@andersju

This comment has been minimized.

Contributor

andersju commented Jun 2, 2016

@myronmarston: I fixed the last few things you mentioned. Thanks for the feedback, very helpful!

@myronmarston

This comment has been minimized.

Contributor

myronmarston commented Jun 3, 2016

LGTM. Thanks, @andersju!

@myronmarston myronmarston merged commit 917dc81 into seomoz:master Jun 3, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

myronmarston added a commit that referenced this pull request Jun 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment