Net::Domain::PublicSuffix - Fast XS implementation of public_suffix and base_domain
use Net::Domain::PublicSuffix qw(base_domain public_suffix);
my $d0 = public_suffix("www.foo.com");
my $d1 = base_domain("www.foo.com");
# $d0 and $d1 equal "foo.com"
my $d2 = public_suffix("www.smms.pvt.k12.ca.us");
my $d3 = base_domain("www.smms.pvt.k12.ca.us");
# $d2 and $d3 equal "smms.pvt.k12.ca.us"
my $d4 = public_suffix("www.whitbread.co.uk");
my $d5 = base_domain("www.whitbread.co.uk");
# $d4 and $d5 equal "whitbread.co.uk"
my $d6 = public_suffix("www.foo.zz");
my $d7 = base_domain("www.foo.zz");
# $d6 eq "" because .zz is not a valid TLD
# $d7 eq "foo.zz"
Net::Domain::PublicSuffix finds the public suffix, or top level domain (TLD), of a given hostname.
$public_suffix = public_suffix($hostname)
Given a hostname return the TLD (top level domain). Returns the empty string for hostnames with an invalid public suffix.
public_suffix() is not an exact replacement for Mozilla::PublicSuffix. See the tests run in publicsuffix.t for notable differences. I think some of the tests from publicsuffix.org are just wrong. For instance, publicsuffix.org thinks that "example.example" (a nonexistent TLD) should pass, but "test.om" (a nonexistent second level domain for the valid TLD om) should not.
$tld = base_domain($hostname)
Given a hostname return the TLD (top level domain).
This function is more permissive than public_suffix in that it will always try to return a reasonable answer. base_domain returns an answer even when the given hostname does not have a valid TLD (for example www.foo.xx returns foo.xx) or is missing a required subdomain (for example ak.cy returns the incomplete ak.cy).
base_domain() will treat truncated TLDs as valid. For instance base_domain("com.bd") will return "com.bd" but public_suffix("com.bd") will return "" (empty string) because the TLD rules stipulate there should be a third level (i.e. "foo.com.bd") to be valid.
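The difference between the two functions can be seen side by side. The following is a minimal sketch that only uses the hostnames and results already described above; the loop and the output format are illustrative choices.

```perl
use strict;
use warnings;
use Net::Domain::PublicSuffix qw(base_domain public_suffix);

# Hostnames taken from the examples in this document.
for my $host ("www.foo.com", "www.foo.zz", "com.bd") {
    my $strict = public_suffix($host);   # "" when the rules reject the name
    my $loose  = base_domain($host);     # always attempts an answer

    printf "%-12s public_suffix=%-8s base_domain=%s\n",
        $host, ($strict eq "" ? '""' : $strict), $loose;
}
```

A caller that wants to reject unknown or truncated names should test the result of public_suffix() against the empty string; a caller that just needs a grouping key for hostnames can use base_domain().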
$bool = has_valid_tld($hostname)
Returns true if the domain of the provided string exists in the list of valid top level domains. The list of valid domains is constructed from the list of public_suffix rules.
@tld_list = all_valid_tlds();
Return a list of all valid top level domains.
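A short usage sketch for these two functions follows. It calls them by their fully qualified names, which is an assumption made here because this document does not say whether they can be imported; the format of the entries returned by all_valid_tlds() is likewise not specified, so the example only counts them.

```perl
use strict;
use warnings;
use Net::Domain::PublicSuffix;

# .zz is described above as not being a valid TLD.
for my $host ("www.foo.com", "www.foo.zz") {
    my $ok = Net::Domain::PublicSuffix::has_valid_tld($host);
    print "$host: ", ($ok ? "valid TLD" : "unknown TLD"), "\n";
}

# Count the known TLDs without assuming anything about their format.
my @tlds = Net::Domain::PublicSuffix::all_valid_tlds();
print scalar(@tlds), " valid TLDs known\n";
```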
Initialize the base domain trie. This function will get called the first time base_domain() is called. This function is made public so that the trie can be initialized manually before any time critical code.
The list of TLD rules is generated primarily from the Public Suffix List from publicsuffix.org and can be found at https://publicsuffix.org/list/effective_tld_names.dat
Previously rules were generated from the list in the Mozilla source http://lxr.mozilla.org/mozilla/source/netwerk/dns/src/effective_tld_names.dat. The publicsuffix.org list now supersedes the Mozilla list.
Additional research was done via Wikipedia (for example http://en.wikipedia.org/wiki/.uk) and by consulting the actual NICs that assign domains (for example http://www.kenic.or.ke/).
The United States of America has some unique rule formats (see http://en.wikipedia.org/wiki/.us), including wildcards in the middle of the TLD. For example, in the pattern ci.<city>.<state>.us, <state> is one of a fixed set of valid state abbreviations, but <city> is effectively a wildcard city/town/county/etc., followed by a fixed list of organizational types (ci, town, vil, co).
The Mozilla Public Suffix implementation ignores these patterns and just adds all the known combinations via brute force. This package honors wildcards mid-pattern.
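To illustrate what honoring a mid-pattern wildcard means, here is a small pure-Perl sketch. It is not the module's XS trie: the rule list, the '*' notation, and the helper names suffix_matches and sketch_base_domain are all hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# Hypothetical rules for illustration; '*' stands for any single label.
my @rules = ('com', 'ci.*.ca.us', 'co.*.ca.us');

# True if $suffix matches $rule label by label, with '*' matching anything.
sub suffix_matches {
    my ($rule, $suffix) = @_;
    my @r = split /\./, $rule;
    my @s = split /\./, $suffix;
    return 0 unless @r == @s;
    for my $i (0 .. $#r) {
        next if $r[$i] eq '*';
        return 0 unless $r[$i] eq $s[$i];
    }
    return 1;
}

# Return the longest matching rule plus one more label to its left,
# or "" if no rule matches or there is no label left of the suffix.
sub sketch_base_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    my $best_len = 0;
    for my $rule (@rules) {
        my @rule_labels = split /\./, $rule;
        my $n = @rule_labels;
        next if $n >= @labels;
        my $suffix = join '.', @labels[-$n .. -1];
        $best_len = $n if $n > $best_len && suffix_matches($rule, $suffix);
    }
    return '' unless $best_len;
    return join '.', @labels[-($best_len + 1) .. -1];
}

print sketch_base_domain('www.foo.com'), "\n";                 # foo.com
print sketch_base_domain('a.www.ci.springfield.ca.us'), "\n";  # www.ci.springfield.ca.us
```

With a wildcard in the middle of the rule, one entry covers every city in the state, so there is no need to enumerate each known combination.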
There are some rules that Net::Domain::PublicSuffix has added to the list of rules from publicsuffix.org. These rules are the result of additional research. For instance http://en.wikipedia.org/wiki/.mt lists gov.mt as a valid TLD, but it is missing from the publicsuffix.org list.
These rule lists are kept separate in the code to make future upgrades easier. There are two lists: @publicsuffix_rules that are autogenerated from the publicsuffix.org list and @special_rules for these additional missing rules.
Net::Domain::PublicSuffix does not support punycode hostnames. Hostnames need to be decoded before calling base_domain().
See Ryan Sleevi's Public Suffix List Problems for potential problems with using the public suffix list.
Blekko.com
Mozilla::PublicSuffix, Domain::PublicSuffix, IO::Socket::SSL::PublicSuffix, ParseUtil::Domain, Net::Domain::Match
Of these, Domain::PublicSuffix gets the answers right most of the time. The rest do not work for much more than the examples they provide, if any.