Skip to content

Latest commit

 

History

History
77 lines (64 loc) · 2.53 KB

README.md

File metadata and controls

77 lines (64 loc) · 2.53 KB

PHP-MetaParser

Inspired by Facebook's link sharing flow, this abstractly accessed class attempts to parse a document (x/html), and retrieve it's meta-information. I emphasize attempts, as x/html documents are exceptionally tough to parse, and data is often lost due to the content structuring delivered.

This class, as seen by the example below, works very well when coupled with the PHP-Curler class.
The following is an example of how data is returned, using http://www.bbc.com/ as an example:

Array
(
    [base] => http://www.bbc.com/
    [favicon] => http://www.bbc.co.uk/favicon.ico
    [meta] => Array
        (
            [description] => Breaking news, sport, ...
            [keywords] => Array
                (
                    [0] => BBC
                    [1] => bbc.co.uk
                    ...
                    [6] => BBCi
                )

        )

    [images] => Array
        (
            [0] => http://sa.bbc.co.uk/bbc/bbc/s?name=home.page&geo_edition=us&ml_name=barlesque&app_type=web&language=en-GB&ml_version=0.6.3
            [1] => http://static.bbc.co.uk/frameworks/barlesque/1.21.3/desktop/3/img/blocks/light.png
            [2] => http://static.bbc.co.uk/wwhomepage-3.5/ic/news/432-259/57632000/jpg/_57632639_013603124-1.jpg
            [3] => http://static.bbc.co.uk/wwhomepage-3.5/ic/news/432-259/57626000/jpg/_57626527_57626526.jpg
            ...
            [25] => http://me.effectivemeasure.net/em_image
        )

    [openGraph] => Array
        (
            [title] => BBC - Homepage
            [type] => website
            [image] => http://static.bbc.co.uk/wwhomepage-3.5/1.0.29/img/iphone.png
            [url] => http://www.bbc.co.uk/
        )

    [title] => BBC - Homepage
    [url] => http://www.bbc.com/
)

Parsing Example

The following code uses the PHP-Curler class to curl the BBC site, store it's content, and pass it along to a MetaParser instance. The URL is passed along as well to ensure any paths (favicons, images) are rewritten relative to the path of the document that was parsed.

<?php

    // booting
    require_once APP . '/vendors/PHP-Curler/Curler.class.php';
    require_once APP . '/vendors/PHP-MetaParser/MetaParser.class.php';
    
    // curling
    $curler = new Curler();
    $url = 'http://www.bbc.com/';
    $body = $curler->get($url);
    $parser = new MetaParser($body, $url);
    print_r($parser->getDetails());