Wrappers around the PHP DOM classes
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
src
tests
.gitattributes
.gitignore
.travis.yml
CHANGELOG.rst
LICENSE
README.rst
composer.json
phpunit.xml.dist

README.rst

DOM

Wrappers around the PHP DOM classes that handle the common DOM extension pitfalls.

https://travis-ci.com/kuria/dom.svg?branch=master

Features

  • HTML documents
    • encoding sniffing
    • optional tidy support (automatically fix broken HTML)
  • HTML fragments
  • XML documents
  • XML fragments
  • XPath queries
  • creating documents from scratch
  • optional error suppression
  • helper methods for common tasks, such as:
    • querying multiple or a single node
    • checking for containment
    • removing a node
    • removing all nodes from a list
    • prepending a child node
    • inserting a node after another node
    • fetching <head> and <body> elements (HTML)
    • fetching root elements (XML)

Requirements

  • PHP 7.1+

Container methods

These methods are shared by both HTML and XML containers.

Loading documents

<?php

use Kuria\Dom\HtmlDocument; // or XmlDocument, HtmlFragment, etc.

// using loadString()
$dom = new HtmlDocument();
$dom->setLibxmlFlags($customLibxmlFlags); // optional
$dom->setIgnoreErrors($ignoreErrors); // optional
$dom->loadString($html);

// using static loadString() shortcut
$dom = HtmlDocument::fromString($html);

// using existing document instance
$dom = new HtmlDocument();
$dom->loadDocument($document);

// using static loadDocument() shortcut
$dom = HtmlDocument::fromDocument($document);

// creating an empty document
$dom = new HtmlDocument();
$dom->loadEmpty();

Getting or changing document encoding

<?php

// get encoding
$encoding = $dom->getEncoding();

// set encoding
$dom->setEncoding($newEncoding);

Note

The DOM extension uses UTF-8 encoding.

This means that text nodes, attributes, etc.:

  • will be encoded using UTF-8 when read (e.g. $elem->textContent)
  • should be encoded using UTF-8 when written (e.g. $elem->setAttribute())

The encoding configured by setEncoding() is used when saving the document, see Saving documents.

Saving documents

<?php

// entire document
$content = $dom->save();

// single element
$content = $dom->save($elem);

// children of a single element
$content = $dom->save($elem, true);

Getting DOM instances

After a document has been loaded, the DOM instances are available via getters:

<?php

$document = $dom->getDocument();
$xpath = $dom->getXpath();

Running XPath queries

<?php

// get a DOMNodeList
$divs = $dom->query('//div');

// get a single DOMNode (or null)
$div = $dom->query('//div');

// check if a query matches
$divExists = $dom->exists('//div');

Escaping strings

<?php

$escapedString = $dom->escape($string);

DOM manipulation and traversal helpers

Helpers for commonly needed tasks that aren't easily achieved via existing DOM methods:

<?php

// check if the document contains a node
$hasNode = $dom->contains($node);

// check if a node contains another node
$hasNode = $dom->contains($node, $parentNode);

// remove a node
$dom->remove($node);

// remove a list of nodes
$dom->removeAll($nodes);

// prepend a child node
$dom->prependChild($newNode, $existingNode);

// insert a node after another node
$dom->insertAfter($newNode, $existingNode);

Usage examples

HTML documents

Loading an existing document

<?php

use Kuria\Dom\HtmlDocument;

$html = <<<HTML
<!doctype html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>Example document</title>
    </head>
    <body>
        <h1>Hello world!</h1>
    </body>
</html>
HTML;

$dom = HtmlDocument::fromString($html);

var_dump($dom->queryOne('//title')->textContent);
var_dump($dom->queryOne('//h1')->textContent);

Output:

string(16) "Example document"
string(12) "Hello world!"

Optionally, the markup can be fixed by Tidy prior to being loaded.

<?php

$dom = new HtmlDocument();
$dom->setTidyEnabled(true);
$dom->loadString($html);

Note

HTML documents ignore errors by default, so there is no need to call $dom->setIgnoreErrors(true).

Creating an new document

<?php

use Kuria\Dom\HtmlDocument;

// initialize empty document
$dom = new HtmlDocument();
$dom->loadEmpty(['formatOutput' => true]);

// add <title>
$title = $dom->getDocument()->createElement('title');
$title->textContent = 'Lorem ipsum';

$dom->getHead()->appendChild($title);

// save
echo $dom->save();

Output:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Lorem ipsum</title>
</head>
<body>
    </body>
</html>

HTML fragments

Loading an existing fragment

<?php

use Kuria\Dom\HtmlFragment;

$dom = HtmlFragment::fromString('<div id="test"><span>Hello</span></div>');

$element = $dom->queryOne('/div[@id="test"]/span');

if ($element) {
    var_dump($element->textContent);
}

Output:

string(5) "Hello"

Note

HTML fragments ignore errors by default, so there is no need to call $dom->setIgnoreErrors(true).

Creating a new fragment

<?php

use Kuria\Dom\HtmlFragment;

// initialize empty fragment
$dom = new HtmlFragment();
$dom->loadEmpty(['formatOutput' => true]);

// add <a>
$link = $dom->getDocument()->createElement('a');
$link->setAttribute('href', 'http://example.com/');
$link->textContent = 'example';

$dom->getBody()->appendChild($link);

// save
echo $dom->save();

Output:

<a href="http://example.com/">example</a>

XML documents

Loading an existing document

<?php

use Kuria\Dom\XmlDocument;

$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
    <book name="Don Quixote" author="Miguel de Cervantes" />
    <book name="Hamlet" author="William Shakespeare" />
    <book name="Alice's Adventures in Wonderland" author="Lewis Carroll" />
</library>
XML;

$dom = XmlDocument::fromString($xml);

foreach ($dom->query('/library/book') as $book) {
   /** @var \DOMElement $book */
   var_dump("{$book->getAttribute('name')} by {$book->getAttribute('author')}");
}

Output:

string(34) "Don Quixote by Miguel de Cervantes"
string(29) "Hamlet by William Shakespeare"
string(49) "Alice's Adventures in Wonderland by Lewis Carroll"

Creating a new document

<?php

use Kuria\Dom\XmlDocument;

// initialize empty document
$dom = new XmlDocument();
$dom->loadEmpty(['formatOutput' => true]);

// add <users>
$document = $dom->getDocument();
$document->appendChild($document->createElement('users'));

// add some users
$bob = $document->createElement('user');
$bob->setAttribute('username', 'bob');
$bob->setAttribute('access-token', '123456');

$john = $document->createElement('user');
$john->setAttribute('username', 'john');
$john->setAttribute('access-token', 'foobar');

$dom->getRoot()->appendChild($bob);
$dom->getRoot()->appendChild($john);

// save
echo $dom->save();

Output:

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user username="bob" access-token="123456"/>
  <user username="john" access-token="foobar"/>
</users>

Handling XML namespaces in XPath queries

<?php

use Kuria\Dom\XmlDocument;

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<lib:root xmlns:lib="http://example.com/">
    <lib:book name="Don Quixote" author="Miguel de Cervantes" />
    <lib:book name="Hamlet" author="William Shakespeare" />
    <lib:book name="Alice's Adventures in Wonderland" author="Lewis Carroll" />
</lib:root>
XML;

$dom = XmlDocument::fromString($xml);

// register namespace in XPath
$dom->getXpath()->registerNamespace('lib', 'http://example.com/');

// query using the prefix
foreach ($dom->query('//lib:book') as $book) {
    /** @var \DOMElement $book */
    var_dump($book->getAttribute('name'));
}

Output:

string(11) "Don Quixote"
string(6) "Hamlet"
string(32) "Alice's Adventures in Wonderland"

XML fragments

Loading an existing fragment

<?php

use Kuria\Dom\XmlFragment;

$dom = XmlFragment::fromString('<fruits><fruit name="Apple" /><fruit name="Banana" /></fruits>');

foreach ($dom->query('/fruits/fruit') as $fruit) {
    /** @var \DOMElement $fruit */
    var_dump($fruit->getAttribute('name'));
}

Output:

string(5) "Apple"
string(6) "Banana"

Creating a new fragment

<?php

use Kuria\Dom\XmlFragment;

// initialize empty fragment
$dom = new XmlFragment();
$dom->loadEmpty(['formatOutput' => true]);

// add a new element
$person = $dom->getDocument()->createElement('person');
$person->setAttribute('name', 'John Smith');

$dom->getRoot()->appendChild($person);

// save
echo $dom->save();

Output:

<person name="John Smith"/>