Extracts metadata (title, description, Open Graph, etc.) from the content of a web page.
Note that this library deals only with raw HTML; it does not tie you to any particular method of retrieving the content of an external URL. (I usually use Guzzle, but making it a dependency could cause versioning difficulties.)
composer require lukaswhite/php-meta-tags-parser
use Lukaswhite\MetaTagsParser\Parser;
$html = '<html><head>...</head></html>';
$parser = new Parser();
$result = $parser->parse($html);
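Because the parser only needs a string of HTML, any HTTP client will do. As a minimal sketch, the following uses PHP's built-in stream functions rather than Guzzle. The fetchHtml() helper, its timeout default and the user agent string are illustrative assumptions, not part of this package, and fetching remote URLs this way requires allow_url_fopen to be enabled.

```php
<?php

/**
 * Fetch the raw HTML for a URL using PHP's built-in stream wrappers.
 * fetchHtml() is a hypothetical helper, not something the package provides.
 */
function fetchHtml(string $url, int $timeoutSeconds = 10): string
{
    // HTTP context options; these also apply to https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'          => 'GET',
            'timeout'         => $timeoutSeconds,
            'follow_location' => 1,
            'user_agent'      => 'meta-tags-example/1.0', // arbitrary example value
        ],
    ]);

    // Suppress the warning and check the return value instead.
    $html = @file_get_contents($url, false, $context);

    if ($html === false) {
        throw new RuntimeException("Could not retrieve {$url}");
    }

    return $html;
}

// Usage with the parser (requires the package to be installed):
// $parser = new Lukaswhite\MetaTagsParser\Parser();
// $result = $parser->parse(fetchHtml('https://example.com/'));
```

Keeping retrieval in a separate function like this means it can be swapped for Guzzle, cURL or a cached copy without touching the parsing code.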
The parse() method returns an object that encapsulates any page data it has extracted from the provided HTML.
$result->getTitle();
$result->getDescription();
$result->getKeywords();
$result->getUrl();
$result->getFacebookAppId();
$result->openGraph()->getSiteName();
$result->openGraph()->getType();
$result->openGraph()->getTitle();
$result->openGraph()->getDescription();
$result->openGraph()->getLocale();
$result->openGraph()->getImages(); // returns an array of URLs
$result->openGraph()->getLatitude();
$result->openGraph()->getLongitude();
$result->openGraph()->getAltitude();
$result->toArray(); // all of the extracted metadata
It also extracts RSS and/or Atom feeds; getFeeds() returns an array of instances of the Feed class:
foreach ($result->getFeeds() as $feed) {
    $feed->getType(); // Feed::RSS or Feed::ATOM
    $feed->isRSS();
    $feed->isAtom();
    $feed->getUri();
    $feed->getTitle();
}
The getFeeds() method accepts an optional $type argument to select one type or the other:
$result->getFeeds(Feed::RSS);
// or
$result->getFeeds(Feed::ATOM);
The package ships with a very simple string cleanser; essentially, it just decodes any HTML entities. To provide your own cleanser, implement the CleansesStrings interface and pass an instance to the parser's constructor. The interface requires only a run() method that accepts a string and returns the cleansed version.
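As a rough illustration, a custom cleanser might decode entities and also collapse runs of whitespace. This sketch declares a local stand-in for the CleansesStrings interface so that it runs on its own; in a real project you would import the package's interface instead, and the type hints on run() are an assumption that should be adjusted to match the package's declared signature. The WhitespaceCollapsingCleanser class name is hypothetical.

```php
<?php

// Stand-in for the package's CleansesStrings interface (assumed signature).
// Replace this with an import of the real interface when using the package.
interface CleansesStrings
{
    public function run(string $text): string;
}

// Hypothetical cleanser: decodes HTML entities, then tidies whitespace.
final class WhitespaceCollapsingCleanser implements CleansesStrings
{
    public function run(string $text): string
    {
        // Decode entities such as &amp; and &quot; into their characters.
        $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

        // Collapse any run of whitespace (including newlines) to one space.
        // preg_replace() returns null on error, so fall back to the decoded text.
        $collapsed = preg_replace('/\s+/u', ' ', $decoded) ?? $decoded;

        return trim($collapsed);
    }
}

$cleanser = new WhitespaceCollapsingCleanser();
$cleanser->run("Tom &amp;   Jerry\n"); // "Tom & Jerry"
```

An instance of such a class would then be passed to the parser's constructor, as described above.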
The package ships with a very simple string sanitizer; under the hood, it simply uses the strip_tags() function. To provide your own sanitizer, implement the SanitizesStrings interface and pass an instance to the parser's constructor. The interface requires only a run() method that accepts a string and returns the sanitized version.
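One reason to supply your own sanitizer is that strip_tags() removes tags but keeps the text between them, including the contents of script and style elements. The sketch below removes those blocks first. As before, the interface is a local stand-in with an assumed signature so that the example is self-contained, and the ScriptAwareSanitizer class name is hypothetical.

```php
<?php

// Stand-in for the package's SanitizesStrings interface (assumed signature).
// Replace this with an import of the real interface when using the package.
interface SanitizesStrings
{
    public function run(string $text): string;
}

// Hypothetical sanitizer: drops script/style blocks, then strips other tags.
final class ScriptAwareSanitizer implements SanitizesStrings
{
    public function run(string $text): string
    {
        // Remove <script>...</script> and <style>...</style> together with
        // their contents; fall back to the input if the regex fails.
        $withoutBlocks = preg_replace(
            '#<(script|style)\b[^>]*>.*?</\1>#is',
            '',
            $text
        ) ?? $text;

        // Strip whatever tags remain, keeping their inner text.
        return trim(strip_tags($withoutBlocks));
    }
}

$sanitizer = new ScriptAwareSanitizer();
$sanitizer->run('Hello <b>world</b><script>alert(1)</script>'); // "Hello world"
```

A regular expression is a blunt tool for HTML; for untrusted input, a dedicated HTML sanitizing library would be a more robust basis for the run() method.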