PHP port of html5lib, currently unmaintained.
Branch: master
Clone or download


html5lib - php flavour

This is an implementation of the tokenization and tree-building parts
of the HTML5 specification in PHP.  Potential uses of this library
can be found in web-scrapers and HTML filters.

Warning: This is a pre-alpha release, and as such, certain parts of
this code are not up-to-snuff (e.g. error reporting and performance).
However, the code is very close to spec and passes 100% of tests
not related to parse errors.  Nevertheless, expect to have to update
your code on the next upgrade.

Usage notes:

    require_once '/path/to/HTML5/Parser.php';
    $dom = HTML5_Parser::parse('<html><body>...');
    $nodelist = HTML5_Parser::parseFragment('<b>Boo</b><br>');
    $nodelist = HTML5_Parser::parseFragment('<td>Bar</td>', 'table');


    $text  : HTML to parse
    return : DOMDocument of parsed document

HTML5_Parser::parseFragment($text, $context)
    $text    : HTML to parse
    $context : String name of context element
    return   : DOMDocument of parsed document

Developer notes:

* To setup unit tests, you need to add a small stub file test-settings.php
  that contains $simpletest_location = 'path/to/simpletest/'; This needs to
  be version 1.1 (or, until that is released, SVN trunk) of SimpleTest.

* We don't want to ultimately use PHP's DOM because it is not tolerant
  of certain types of errors that HTML 5 allows (for example, an element
  "foo@bar"). But the current implementation uses it, since it's easy.
  Eventually, this html5lib implementation will get a version of SimpleTree;
  and may possibly start using that by default.

    vim: et sw=4 sts=4