This repository has been archived by the owner on Nov 20, 2019. It is now read-only.

Parser bug when subdomain has "-" #43

Open
NizarBlond opened this issue Mar 1, 2019 · 1 comment

The parser fails for the following:

// If the subdomain has "-"
$url = 'https://s3-ap-southeast-2.amazonaws.com/blabla/blabla/wp-content/uploads/media/2019/03/16860571424_31c94205de_b.jpg';

// Extract domain parts
$extract = new \LayerShifter\TLDExtract\Extract();
$domainParser = $extract->parse($url);

parse_url($url, PHP_URL_HOST); // s3-ap-southeast-2.amazonaws.com
$domainParser->getSubdomain(); // null 

jkns commented Mar 13, 2019

I don't believe this is an issue with hyphens; it's an issue with S3 domains.

s3-ap-southeast-2.amazonaws.com is listed in the private domains section of the Public Suffix List: https://github.com/publicsuffix/list/blob/master/public_suffix_list.dat#L10747

Once you parse the S3 domain you end up with:

subdomain: null
hostname: s3-ap-southeast-2.amazonaws.com
suffix: null

So you could use $domainParser->getHostname() instead, which holds the full host in this case.

If you don't care about private domains, you can restrict extraction to ICANN suffixes like this:

$url = 'https://s3-ap-southeast-2.amazonaws.com/blabla/blabla/wp-content/uploads/media/2019/03/16860571424_31c94205de_b.jpg';

// Extract domain parts
$extract = new \LayerShifter\TLDExtract\Extract(null, null, \LayerShifter\TLDExtract\Extract::MODE_ALLOW_ICCAN);
$domainParser = $extract->parse($url);

$domainParser->getSubdomain(); // s3-ap-southeast-2 
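
To illustrate why the mode changes the result, here is a minimal standalone sketch of longest-suffix matching. It is not the library's code: splitHost() and the two suffix constants are hypothetical names, the lists are a tiny hand-picked excerpt of the Public Suffix List, and wildcard and exception rules are ignored.

```php
<?php
// Hypothetical two-entry excerpt of the Public Suffix List; the real list is far larger.
const ICANN_SUFFIXES = ['com'];
const PRIVATE_SUFFIXES = ['s3-ap-southeast-2.amazonaws.com'];

/**
 * Splits a host into [subdomain, hostname, suffix] using the longest matching suffix.
 * Simplified illustration only: no wildcard rules, no exception rules, no IDN handling.
 */
function splitHost(string $host, bool $allowPrivate): array
{
    $suffixes = $allowPrivate
        ? array_merge(ICANN_SUFFIXES, PRIVATE_SUFFIXES)
        : ICANN_SUFFIXES;

    $labels = explode('.', strtolower($host));

    // Try the longest candidate first: the whole host, then drop leading labels one by one.
    for ($i = 0; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (!in_array($candidate, $suffixes, true)) {
            continue;
        }
        if ($i === 0) {
            // The whole host is itself a suffix, so there is nothing left to split off.
            return [null, $host, null];
        }
        $hostname = $labels[$i - 1];
        $subdomain = $i > 1 ? implode('.', array_slice($labels, 0, $i - 1)) : null;
        return [$subdomain, $hostname, $candidate];
    }

    // No known suffix matched.
    return [null, $host, null];
}

$host = parse_url('https://s3-ap-southeast-2.amazonaws.com/blabla/', PHP_URL_HOST);

var_export(splitHost($host, true));  // private suffixes allowed: whole host matches, subdomain is null
var_export(splitHost($host, false)); // ICANN only: subdomain is 's3-ap-southeast-2', suffix is 'com'
```

With private suffixes allowed, the entire host matches a list entry, so no label is left over to act as a subdomain. With only ICANN suffixes, the match stops at com and the first label becomes the subdomain, which mirrors the two results shown above.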
