NAME

Net::Domain::PublicSuffix - Fast XS implementation of public_suffix and base_domain

SYNOPSIS

use Net::Domain::PublicSuffix qw(base_domain public_suffix);

my $d0 = public_suffix("www.foo.com");
my $d1 = base_domain("www.foo.com");
# $d0 and $d1 equal "foo.com"

my $d2 = public_suffix("www.smms.pvt.k12.ca.us");
my $d3 = base_domain("www.smms.pvt.k12.ca.us");
# $d2 and $d3 equal "smms.pvt.k12.ca.us"

my $d4 = public_suffix("www.whitbread.co.uk");
my $d5 = base_domain("www.whitbread.co.uk");
# $d4 and $d5 equal "whitbread.co.uk"

my $d6 = public_suffix("www.foo.zz");
my $d7 = base_domain("www.foo.zz");
# $d6 eq "" because .zz is not a valid TLD
# $d7 eq "foo.zz"

DESCRIPTION

Net::Domain::PublicSuffix finds the public suffix, or top level domain (TLD), of a given hostname.

public_suffix()

$public_suffix = public_suffix($hostname)

Given a hostname, return the TLD (top level domain). Returns the empty string for hostnames with an invalid public suffix.

public_suffix() is not an exact replacement for Mozilla::PublicSuffix. See the tests in publicsuffix.t for notable differences. I think some of the tests from publicsuffix.org are just wrong. For instance, publicsuffix.org thinks that "example.example" (a nonexistent TLD) should pass, but "test.om" (a nonexistent second level domain for the valid TLD om) should not.

base_domain()

$tld = base_domain($hostname)

Given a hostname, return the TLD (top level domain).

This function is more permissive than public_suffix() in that it will always try to return a reasonable answer. base_domain() returns an answer even when the given hostname does not have a valid TLD (for example, www.foo.xx returns foo.xx) or is missing a required subdomain (for example, ak.cy returns the incomplete ak.cy).

base_domain() treats truncated TLDs as valid. For instance, base_domain("com.bd") returns "com.bd", but public_suffix("com.bd") returns "" (the empty string) because the TLD rules stipulate that a third level (e.g. "foo.com.bd") is required for the name to be valid.
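The difference between the strict and the permissive function can be made explicit in calling code. The following is a minimal sketch, not part of the module: the helper name domain_with_source is hypothetical, and it simply prefers the public_suffix() answer and falls back to base_domain() when the strict answer is empty.

```perl
use strict;
use warnings;
use Net::Domain::PublicSuffix qw(base_domain public_suffix);

# Hypothetical helper (not provided by the module): prefer the strict
# answer, fall back to the permissive one, and report which function
# produced the result.
sub domain_with_source {
    my ($host) = @_;
    my $strict = public_suffix($host);
    return ($strict, 'public_suffix') if $strict ne '';
    return (base_domain($host), 'base_domain');
}

for my $host (qw(www.foo.com com.bd www.foo.zz)) {
    my ($domain, $source) = domain_with_source($host);
    print "$host -> $domain (via $source)\n";
}
```

Knowing which function produced the answer lets a caller treat fallback results as guesses rather than validated domains.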

has_valid_tld()

$bool = has_valid_tld($hostname)

Returns true if the domain of the provided string exists in the list of valid top level domains. The list of valid domains is constructed from the list of public_suffix rules.

all_valid_tlds()

@tld_list = all_valid_tlds();

Return a list of all valid top level domains.
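As a rough illustration, the two functions above might be used to filter a list of hostnames. This sketch calls them by their fully qualified names because this document does not say whether they can be imported; that they are reachable this way is an assumption.

```perl
use strict;
use warnings;
use Net::Domain::PublicSuffix;

my @hosts = qw(www.foo.com www.foo.zz);

# Keep only hostnames whose TLD appears in the module's rule list.
# Fully qualified calls are an assumption; see the note above.
my @valid = grep { Net::Domain::PublicSuffix::has_valid_tld($_) } @hosts;
print "valid: @valid\n";

# Count the TLDs the module knows about.
my @tlds = Net::Domain::PublicSuffix::all_valid_tlds();
printf "%d valid TLDs known\n", scalar @tlds;
```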

gen_basedomain_tree()

Initialize the base domain trie. This function is called automatically the first time base_domain() is called. It is public so that the trie can be initialized manually before any time-critical code.
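A minimal sketch of warming up the trie at startup, so that the first base_domain() call in a hot path does not pay the initialization cost. The fully qualified call is an assumption (this document does not state whether the function is exportable), and the timing code is only there to show where the cost lands.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Net::Domain::PublicSuffix qw(base_domain);

# Build the trie up front, before any time-critical code runs.
my $start = time;
Net::Domain::PublicSuffix::gen_basedomain_tree();
printf "trie initialized in %.4f s\n", time - $start;

# Later calls reuse the already-built trie.
my $domain = base_domain("www.foo.com");
print "$domain\n";
```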

Rule Data

The list of TLD rules is generated primarily from the Public Suffix List from publicsuffix.org, which can be found at https://publicsuffix.org/list/effective_tld_names.dat

Previously, rules were generated from the list in the Mozilla source (http://lxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat). The publicsuffix.org list now supersedes the Mozilla list.

Additional research was done via Wikipedia (for example http://en.wikipedia.org/wiki/.uk) and by consulting the actual NICs that assign domains (for example http://www.kenic.or.ke/).

.us rules

The United States of America has some unique rule formats (see http://en.wikipedia.org/wiki/.us), including wildcards in the middle of the TLD. For example, in the pattern ci.<locality>.<state>.us, <state> is one of a fixed set of valid state abbreviations, but <locality> is effectively a wildcard (a city, town, county, etc.) that sits next to one of a fixed list of organizational types (ci, town, vil, co).

The Mozilla Public Suffix implementation ignores these patterns and just adds all the known combinations via brute force. This package honors wildcards mid-pattern.
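To illustrate what honoring a mid-pattern wildcard means, here is a self-contained sketch that matches hostnames against the shape <org>.<locality>.<state>.us. It is not the module's implementation (which uses a trie built in XS); the sub name match_us_locality and the abbreviated state list are hypothetical and for illustration only.

```perl
use strict;
use warnings;

# Hypothetical, abbreviated data: the real rule set is much larger.
my %STATE = map { $_ => 1 } qw(ca ny tx);
my %ORG   = map { $_ => 1 } qw(ci town vil co);

# Returns the portion of the hostname matched by the rule
# <org>.<locality>.<state>.us, or "" if the hostname does not fit.
sub match_us_locality {
    my ($host) = @_;
    # Walk labels right to left: us, state, locality, org, ...
    my @labels = reverse split /\./, lc $host;
    return "" unless @labels >= 4;
    my ($tld, $state, $locality, $org) = @labels;
    return "" unless $tld eq 'us';
    return "" unless $STATE{$state};       # fixed set
    return "" unless length $locality;     # wildcard: any non-empty label
    return "" unless $ORG{$org};           # fixed set
    return join '.', $org, $locality, $state, $tld;
}

print match_us_locality("www.ci.sf.ca.us"), "\n";
```

The point of the sketch is that a single rule with a wildcard in the locality position covers every locality, instead of one rule per known city, town, or county.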

Differences with Mozilla's PublicSuffix

Net::Domain::PublicSuffix adds some rules to the list from publicsuffix.org. These rules are the result of additional research. For instance, http://en.wikipedia.org/wiki/.mt lists gov.mt as a valid TLD, but it is missing from the publicsuffix.org list.

These rule lists are kept separate in the code to make future upgrades easier. There are two lists: @publicsuffix_rules, which is autogenerated from the publicsuffix.org list, and @special_rules, which holds these additional missing rules.

Net::Domain::PublicSuffix does not support punycode hostnames. Hostnames need to be decoded before calling base_domain().
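One possible way to decode a punycode hostname before the lookup is sketched below, using domain_to_unicode() from Net::IDN::Encode (listed under SEE ALSO). The hostname is made up for illustration, and whether base_domain() expects the decoded name as characters or as UTF-8 bytes is not stated in this document, so treat this as an assumption to verify.

```perl
use strict;
use warnings;
use Net::Domain::PublicSuffix qw(base_domain);
use Net::IDN::Encode qw(:all);    # provides domain_to_unicode()

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative hostname with a punycode-encoded label.
my $ascii_host = 'www.xn--bcher-kva.example';

# Decode the xn-- labels first, then look up the base domain.
my $decoded_host = domain_to_unicode($ascii_host);
my $base         = base_domain($decoded_host);

print "$ascii_host decodes to $decoded_host\n";
print "base domain: $base\n";
```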

CAVEATS

See Ryan Sleevi's Public Suffix List Problems for potential problems with using the public suffix list.

AUTHOR

Blekko.com

SEE ALSO

Mozilla::PublicSuffix, Domain::PublicSuffix, IO::Socket::SSL::PublicSuffix, ParseUtil::Domain, Net::Domain::Match

Of these, Domain::PublicSuffix gets the answers right most of the time. The rest do not work for much more than the examples they provide, if any.

Net::IDN::Punycode, Net::IDN::Encode, IDNA::Punycode