
Bad performance of DOMDocument::getElementsByTagName #11689

@dtdesign

Description

We rely on DOMDocument to process and manipulate HTML messages, and initially used getElementsByTagName() to process certain types of elements. We then found that this method has a severe performance cost, to the point where its slowness is noticeable in practice.

The same functionality can be replicated with a simple DOMXPath query, which yields the same result in a fraction of the time.
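As a possible interim workaround, the XPath approach can be wrapped in a small helper. This is a minimal sketch, not part of PHP: the function name `elementsByTagNameViaXPath` is hypothetical, and it assumes plain, unprefixed HTML tag names such as "p" or "h1" (the name is validated so it cannot inject extra XPath syntax). One caveat: the list returned by getElementsByTagName() is meant to be live, while an XPath result is a snapshot taken at query time, so the two may differ if the document is modified during iteration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHP): returns the elements named $tagName
 * below $context, using a DOMXPath query instead of getElementsByTagName().
 * Assumes a plain, unprefixed tag name such as "p" or "h1".
 */
function elementsByTagNameViaXPath(DOMDocument $document, string $tagName, ?DOMNode $context = null): DOMNodeList
{
    // Only accept simple names, so the argument cannot inject extra XPath syntax.
    if (preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $tagName) !== 1) {
        throw new InvalidArgumentException("Unsupported tag name: {$tagName}");
    }

    $xpath = new DOMXPath($document);
    // ".//name" selects every descendant element with that name, in document order.
    $result = $xpath->query('.//' . $tagName, $context ?? $document);
    if ($result === false) {
        throw new RuntimeException("XPath query failed for tag name: {$tagName}");
    }

    return $result;
}

// Usage: iterates the same elements as $document->getElementsByTagName('p').
$document = new DOMDocument();
$document->loadHTML(str_repeat('<p>Hello World</p>', 3));

foreach (elementsByTagNameViaXPath($document, 'p') as $p) {
    echo $p->textContent, "\n";
}
```

Validating the name with a strict pattern keeps the helper simple; supporting namespaced or wildcard names would need a different query (for example a `local-name()` predicate), which this sketch deliberately leaves out.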

The following test script runs each approach 100 times and measures the elapsed time. It is a deliberately simplified case for illustration purposes.

```php
<?php
$html = str_repeat("<p>Hello World</p>", 10_000);
$domDocument = new DOMDocument();
$domDocument->loadHTML($html);

function run(string $name, callable $function): void
{
    $start = microtime(true);
    for ($i = 0; $i < 100; $i++) {
        $function();
    }
    $end = microtime(true);

    echo sprintf(
        "%s (100 runs) took %f seconds\n",
        $name,
        round($end - $start, 5),
    );
}

run("getElementsByTagName", function () use ($domDocument) {
    $length = 0;
    foreach ($domDocument->getElementsByTagName("p") as $p) {
        $length = strlen($p->textContent);
    }

    assert($length !== 0);
});

run("xpath", function () use ($domDocument) {
    $xpath = new DOMXPath($domDocument);

    $length = 0;
    foreach ($xpath->query(".//p") as $p) {
        $length = strlen($p->textContent);
    }

    assert($length !== 0);
});
```

Running this test locally using PHP 8.2.6 (macOS 13.4, M1 Pro) yields these numbers:

```
getElementsByTagName (100 runs) took 22.242160 seconds
xpath (100 runs) took 0.177340 seconds
```

Relatively speaking, these numbers are consistent with test runs on several Intel Xeon processors, which we used to rule out aarch64 as the cause of the poor performance. The same behavior was also observed on older PHP versions.
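To back up the claim that both approaches yield the same result, the element lists can be compared node by node. This is a small sketch that is not part of the original benchmark; it reuses the benchmark's setup with a smaller input (1,000 paragraphs instead of 10,000, an arbitrary choice) so the slow path finishes quickly.

```php
<?php

// Smaller input than the benchmark so the getElementsByTagName() path stays fast.
$html = str_repeat("<p>Hello World</p>", 1_000);
$domDocument = new DOMDocument();
$domDocument->loadHTML($html);

// Materialize both result lists as plain arrays with keys 0..n-1.
$byTagName = iterator_to_array($domDocument->getElementsByTagName("p"), false);
$byXPath = iterator_to_array((new DOMXPath($domDocument))->query(".//p"), false);

// Same length, and the same underlying DOM node at every position.
$same = count($byTagName) === count($byXPath);
for ($i = 0; $same && $i < count($byTagName); $i++) {
    $same = $byTagName[$i]->isSameNode($byXPath[$i]);
}

echo $same ? "identical\n" : "different\n";
```

`isSameNode()` checks node identity rather than comparing text, so matching results mean both methods return the very same elements in the same (document) order.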

PHP Version

8.2.6

Operating System

macOS 13.4.1
