issue/inconsistent behavior for all *. rules #18

Closed
theagilehacker opened this issue Oct 20, 2021 · 3 comments

Comments

@theagilehacker

If there is a rule like:
*.abc.com
I would expect that, given
substuf.def.abc.com, the public suffix would be def.abc.com.

from publicsuffixlist import PublicSuffixList

# RULES TESTED:
# *.awdev.ca
# *.advisor.ws
#
# *.compute.amazonaws.com
# *.compute-1.amazonaws.com
# *.compute.amazonaws.com.cn
#
# *.elb.amazonaws.com
# *.elb.amazonaws.com.cn

psl = PublicSuffixList()
# Named `domains` rather than `input` to avoid shadowing the built-in.
domains = [
    'test.awdev.ca',
    'test.advisor.ws',

    'test.compute.amazonaws.com',
    'test.compute-1.amazonaws.com',
    'test.compute.amazonaws.com.cn',

    'test.elb.amazonaws.com',
    'test.amazonaws.com.cn',

    # add another level and it gets weird
    'sub.test.awdev.ca',
    'sub.test.advisor.ws',

    'sub.test.compute.amazonaws.com',
    'sub.test.compute-1.amazonaws.com',
    'sub.test.compute.amazonaws.com.cn',

    'sub.test.elb.amazonaws.com',
    'sub.test.amazonaws.com.cn',
]

# Print the private suffix reported for each domain.
for domain in domains:
    print(f'{domain} -> {psl.privatesuffix(domain)}')

Output from the run:

test.awdev.ca -> None
test.advisor.ws -> None
test.compute.amazonaws.com -> None
test.compute-1.amazonaws.com -> None
test.compute.amazonaws.com.cn -> None
test.elb.amazonaws.com -> None
test.amazonaws.com.cn -> amazonaws.com.cn
sub.test.awdev.ca -> sub.test.awdev.ca
sub.test.advisor.ws -> sub.test.advisor.ws
sub.test.compute.amazonaws.com -> sub.test.compute.amazonaws.com
sub.test.compute-1.amazonaws.com -> sub.test.compute-1.amazonaws.com
sub.test.compute.amazonaws.com.cn -> sub.test.compute.amazonaws.com.cn
sub.test.elb.amazonaws.com -> sub.test.elb.amazonaws.com
sub.test.amazonaws.com.cn -> amazonaws.com.cn

I would have expected the first set to return the domains unchanged and the second set to return each domain without its leading sub. label.

Either way, the behavior looks inconsistent for two reasons:

  1. test.amazonaws.com.cn -> amazonaws.com.cn: the result was not None like all the others.
  2. Why are all the domains with sub returned unchanged, while sub.test.amazonaws.com.cn -> amazonaws.com.cn again behaves differently?
@ko-zu
Owner

ko-zu commented Oct 21, 2021

privatesuffix() returns None if the input has no private part. test.amazonaws.com.cn does not match *.elb.amazonaws.com.cn, so com.cn should be the public part.
Please try publicsuffix() instead of privatesuffix() to see where the PSL matches the input.
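
To make the matching explained above concrete, here is a minimal pure-Python sketch of how a wildcard rule decides the public and private parts. It is not the publicsuffixlist implementation: the names `RULES`, `public_suffix` and `private_suffix` are hypothetical, the rule set is a tiny hand-picked subset, and exception rules (`!`) and the implicit default rule are left out.

```python
# Simplified sketch of Public Suffix List matching. This is NOT the
# publicsuffixlist implementation; RULES is a tiny hand-picked subset
# (an assumption made for illustration only).
RULES = {"com", "cn", "com.cn", "*.elb.amazonaws.com.cn"}


def public_suffix(domain, rules=RULES):
    """Return the longest suffix of `domain` that matches a rule, or None."""
    labels = domain.lower().split(".")
    # Try the longest candidate first so the first hit is the longest match.
    for start in range(len(labels)):
        candidate = labels[start:]
        exact = ".".join(candidate)
        # A wildcard rule stands for exactly one arbitrary leftmost label.
        wildcard = ".".join(["*"] + candidate[1:])
        if exact in rules or wildcard in rules:
            return exact
    return None


def private_suffix(domain, rules=RULES):
    """Return the public suffix plus one more label, or None if none is left."""
    suffix = public_suffix(domain, rules)
    if suffix is None:
        return None
    labels = domain.lower().split(".")
    extra = len(labels) - len(suffix.split("."))
    if extra == 0:
        return None  # the whole name is itself a public suffix
    return ".".join(labels[extra - 1:])


if __name__ == "__main__":
    for name in (
        "test.amazonaws.com.cn",
        "test.elb.amazonaws.com.cn",
        "sub.test.elb.amazonaws.com.cn",
    ):
        print(f"{name}: public={public_suffix(name)} private={private_suffix(name)}")
```

Under these assumptions, test.amazonaws.com.cn only matches com.cn, so its private part is amazonaws.com.cn. test.elb.amazonaws.com.cn is consumed entirely by the wildcard rule, so no private part remains, and sub.test.elb.amazonaws.com.cn keeps exactly one label on top of the matched suffix.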

@theagilehacker
Author

Thank you for responding so quickly. I will try publicsuffix() and let you know what I get on my actual data.

@theagilehacker
Author

You are correct, there is no inconsistency. I messed up my test by accidentally removing the elb label from that test domain. So YAY, and egg on my face then.
