fizzka/extractor

extractor

An HTML extraction library based on SimpleXML and nokogiri's XpathSubquery.php.


Benefits

  • Simple
  • Minimal code
  • Fast
  • Query results are SimpleXMLElement instances
  • Supports nested CSS/XPath queries
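
To illustrate the last two points without relying on this library's API, here is a minimal sketch that uses only PHP's standard DOM and SimpleXML extensions. The helper `classSelectorToXpath` is hypothetical and handles only `tag.class` selectors. It shows one common way to express a class selector in XPath and is not the library's actual translation code.

```php
<?php
// Hypothetical helper (not part of fizzka/extractor): translate a simple
// "tag.class" CSS selector into a relative XPath expression.
function classSelectorToXpath(string $selector): string
{
    [$tag, $class] = explode('.', $selector, 2);
    // The concat/normalize-space idiom matches one class among several.
    return sprintf(
        ".//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $class
    );
}

$html = '<html><body>'
      . '<div class="post first"><a class="post_title" href="/a">A</a></div>'
      . '<div class="post"><a class="post_title" href="/b">B</a></div>'
      . '</body></html>';

// Parse the HTML with DOM, then switch to the SimpleXML view of the same tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
$root = simplexml_import_dom($dom);

// Each match is a SimpleXMLElement, so it can be queried again (nested query).
$posts = $root->xpath(classSelectorToXpath('div.post'));
foreach ($posts as $post) {
    $links = $post->xpath(classSelectorToXpath('a.post_title'));
    // Attribute access and string casting come from SimpleXMLElement itself.
    echo $links[0]['href'], ' => ', $links[0], PHP_EOL;
}
// prints "/a => A" and "/b => B", one per line
```

Because every result is a plain SimpleXMLElement, attribute access (`['href']`), string casting, and further `xpath()` calls need no extra wrapper types.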

Installation

# Using Packagist:
composer require 'fizzka/extractor'

Basic Usage

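<?php
require_once 'vendor/autoload.php';

// Fetch the page; gzdecode() assumes the response body is gzip-compressed.
$html = gzdecode(file_get_contents('http://habrahabr.ru/'));

// Parse the HTML, then query by selector and dump the result.
$ex = Extractor::fromHtml($html);
var_dump($ex->get('a.habracut'));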
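
The `gzdecode()` call above assumes the server always returns a gzip-compressed body; on any other input it returns `false`. A minimal sketch of a more defensive variant follows. `decodeBody` is a hypothetical helper, not part of the library, and it requires PHP's zlib extension.

```php
<?php
// Hypothetical helper (not part of fizzka/extractor): gunzip the body only
// when it actually looks like a gzip stream, otherwise return it unchanged.
function decodeBody(string $body): string
{
    // Gzip streams start with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

$plain = '<html><body><a class="habracut" href="/x">more</a></body></html>';

var_dump(decodeBody($plain) === $plain);           // bool(true)
var_dump(decodeBody(gzencode($plain)) === $plain); // bool(true)
```

The result of `decodeBody(file_get_contents($url))` could then be passed to `Extractor::fromHtml()` in place of the raw `gzdecode()` output.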

Advanced Usage

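// Chain queries: take the first div.post, then the first href attribute at or below it.
echo $ex->cssPathFirst('div.post')->xpathFirst('.//@href');

// Iterate over every div.post and run a nested CSS query on each one.
foreach ($ex->cssPath('div.post') as $post) {
	var_dump($post->cssPathFirst('a.post_title'));
}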
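
The leading dot in `.//@href` matters in nested queries. The sketch below uses plain SimpleXML rather than this library's methods to show the difference between a relative and an absolute XPath expression. That `xpathFirst` follows the same semantics is an assumption based on the library being built on SimpleXML.

```php
<?php
$xml = simplexml_load_string(
    '<root>'
    . '<div class="post"><a href="/first">1</a></div>'
    . '<div class="post"><a href="/second">2</a></div>'
    . '</root>'
);

// Use the second post as the context node for the nested queries.
$second = $xml->xpath('//div[@class="post"]')[1];

// Relative path: searches only at or below the context node.
$relative = $second->xpath('.//@href');
echo $relative[0], PHP_EOL; // prints "/second"

// Absolute path: starts at the document root and ignores the context node.
$absolute = $second->xpath('//@href');
echo $absolute[0], PHP_EOL; // prints "/first"
```

When a nested query should stay inside the element it is called on, start the expression with `.//` rather than `//`.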

Testing

Run phpunit from the project root.

Contribute

Feel free to use & contribute ;)

License

MIT
