Atrox\Matcher

Matcher is a powerful tool for extracting data from XML and HTML using XPath and pure magic.

Background reading: Why was Matcher made (Czech), XPath intro (Czech)

Installation:

Install Matcher using Composer:

composer require atrox/matcher

Examples:

use Atrox\Matcher;

$m = Matcher::multi('//div[@id="siteTable"]/div[contains(@class, "thing")]', [
  'id'    => '@data-fullname',
  'title' => './/p[@class="title"]/a',
  'url'   => './/p[@class="title"]/a/@href',
  'date'  => './/time/@datetime',
  'img'   => 'a[contains(@class, "thumbnail")]/img/@src',
  'votes' => (object) [
    'ups'   => '@data-ups',
    'downs' => '@data-downs',
    'rank'  => 'span[@class="rank"]',
    'score' => './/div[contains(@class, "score")]',
  ],
])->fromHtml();

$f = file_get_contents('http://www.reddit.com/');

$extractedData = $m($f);

Result:

[
  [
    "id"    => "t3_1ep0c5",
    "title" => "Obligatory funny cat pictures.",
    "url"   => "http://imgur.com/sGu0pEk",
    "date"  => "2013-05-20T14:16:24+00:00",
    "img"   => "http://e.thumbs.redditmedia.com/MZjtg3UnZ8MOVjcd.jpg",
    "votes" => (object) [
      "ups"   => "115036",
      "downs" => "10266",
      "rank"  => "1",
      "score" => "105650"
    ]
  ],
  [
    ...
  ]
]
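
For readers new to XPath, the relative expressions in the example above can be tried without Matcher, using PHP's built-in DOMDocument and DOMXPath. This is only a sketch of what those expressions select, not a description of how Matcher is implemented. The HTML fragment and its values are made up for illustration.

```php
<?php
// Hypothetical HTML fragment, loosely shaped like the markup the example targets.
$html = '<div id="siteTable">'
      . '<div class="thing link" data-fullname="t3_abc">'
      . '<p class="title"><a href="http://example.com/">Example title</a></p>'
      . '</div></div>';

// Collect libxml warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];

// The outer expression picks the context nodes; the inner ones are evaluated relative to each node.
foreach ($xpath->query('//div[@id="siteTable"]/div[contains(@class, "thing")]') as $node) {
    $rows[] = [
        'id'    => $xpath->evaluate('string(@data-fullname)', $node),
        'title' => $xpath->evaluate('string(.//p[@class="title"]/a)', $node),
        'url'   => $xpath->evaluate('string(.//p[@class="title"]/a/@href)', $node),
    ];
}

print_r($rows);
```

Each key maps to an XPath expression evaluated against one matched node, which is the same shape as the array passed to Matcher::multi above.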

Matchers can be arbitrarily chained and nested.

$postMatcher = Matcher::single('.//div[@class="postInfo desktop"]', [
  'id'   => './input/@name',
  'name' => './span[@class="nameBlock"]/span[@class="name"]',
  'date' => './span/@data-utc',
]);

$m = Matcher::multi('//div[@class="thread"]', [
  'op'      => Matcher::single('./div[@class="postContainer opContainer"]', $postMatcher),
  'replies' => Matcher::multi('./div[@class="postContainer replyContainer"]', $postMatcher)
])->fromHtml();

$f = file_get_contents('http://boards.4chan.org/po/');

$extractedData = $m($f);

Result:

[
  [
    "op" => [
      "id"   => "481874858",
      "name" => "Anonymous",
      "date" => "1369242761"
    ],
    "replies" => [
      [
        "id"   => "481879347",
        "name" => "moot",
        "date" => "1369244554"
      ],
      ...
    ]
  ],
  [
    ...
  ],
  ...
]
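
The nested result is a plain PHP array, so it can be walked with ordinary loops. The sketch below assumes the shape shown above and uses a literal copy of the sample values (one thread, one reply) rather than a live Matcher call.

```php
<?php
// Sample data copied from the result above, trimmed to one thread and one reply.
$extractedData = [
    [
        'op' => ['id' => '481874858', 'name' => 'Anonymous', 'date' => '1369242761'],
        'replies' => [
            ['id' => '481879347', 'name' => 'moot', 'date' => '1369244554'],
        ],
    ],
];

$lines = [];
foreach ($extractedData as $thread) {
    // 'op' holds a single post, as produced by Matcher::single.
    $lines[] = sprintf('#%s by %s', $thread['op']['id'], $thread['op']['name']);

    // 'replies' holds a list of posts, as produced by Matcher::multi.
    foreach ($thread['replies'] as $reply) {
        $lines[] = sprintf('  reply #%s by %s', $reply['id'], $reply['name']);
    }
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```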

Use with external parsers:

Because Matcher works internally with DOMDocument and SimpleXML objects, it can be used with external HTML/XML parsers such as html5-php.

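// $html holds the HTML source as a string.
$html5 = new Masterminds\HTML5(['disable_html_ns' => true]);
// html5-php returns a DOMDocument, which is passed to the matcher directly.
$dom = $html5->loadHTML($html);

$m = Matcher::single('//h1');
$title = $m($dom);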
