Description
We rely on DOMDocument to process and manipulate HTML messages, and initially used getElementsByTagName() to process certain types of elements. We found that this method is slow enough that the delay becomes noticeable in practice.
The same functionality can be replicated with a simple DOMXPath query that runs in a fraction of the time and yields the same result.
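As a quick sanity check of the "same result" claim, the following sketch compares both approaches on a small document. The markup and variable names are illustrative only and are not taken from our production code.

```php
<?php
// Small illustrative document so the comparison finishes instantly.
$doc = new DOMDocument();
$doc->loadHTML('<div><p>one</p><div><p>two</p></div><p>three</p></div>');

// Elements as returned by the DOM API.
$byTag = iterator_to_array($doc->getElementsByTagName('p'), false);

// The same elements selected through XPath.
$xpath = new DOMXPath($doc);
$byXPath = iterator_to_array($xpath->query('.//p'), false);

// Both lists should contain the very same node objects, in the same order.
$sameNodes = count($byTag) === count($byXPath);
for ($i = 0; $sameNodes && $i < count($byTag); $i++) {
    $sameNodes = $byTag[$i]->isSameNode($byXPath[$i]);
}

var_dump($sameNodes);
```

The check uses isSameNode() rather than comparing text content, so it confirms that both approaches return identical nodes and not merely similar ones.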
The following test script runs both methods 100 times and measures the time elapsed. This is a very simplified case for illustration purposes.
<?php
$html = str_repeat("<p>Hello World</p>", 10_000);
$domDocument = new DOMDocument();
$domDocument->loadHTML($html);

function run(string $name, callable $function): void
{
    $start = microtime(true);
    for ($i = 0; $i < 100; $i++) {
        $function();
    }
    $end = microtime(true);

    echo sprintf(
        "%s (100 runs) took %f seconds\n",
        $name,
        round($end - $start, 5),
    );
}

run("getElementsByTagName", function () use ($domDocument) {
    $length = 0;
    foreach ($domDocument->getElementsByTagName("p") as $p) {
        $length = strlen($p->textContent);
    }
    assert($length !== 0);
});

run("xpath", function () use ($domDocument) {
    $xpath = new DOMXPath($domDocument);
    $length = 0;
    foreach ($xpath->query(".//p") as $p) {
        $length = strlen($p->textContent);
    }
    assert($length !== 0);
});
Running this test locally using PHP 8.2.6 (macOS 13.4, M1 Pro) yields these numbers:
getElementsByTagName (100 runs) took 22.242160 seconds
xpath (100 runs) took 0.177340 seconds
To rule out aarch64 as the cause of the bad performance, we repeated the test on several different Intel Xeon processors; the results were consistent in relative terms. The same performance behavior was also observed on older PHP versions.
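For anyone affected in the meantime, a workaround along the lines of the XPath variant above could be wrapped in a small helper. This is only a sketch: elementsByTagNameViaXPath() is a hypothetical name, not part of ext/dom, and it assumes plain, non-namespaced tag names such as those produced by loadHTML().

```php
<?php
/**
 * Hypothetical helper (not part of ext/dom): returns all elements in the
 * document with the given tag name by running an XPath query.
 */
function elementsByTagNameViaXPath(DOMDocument $document, string $tagName): DOMNodeList
{
    // Accept only simple names so the value can be embedded in the expression safely.
    if (preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $tagName) !== 1) {
        throw new InvalidArgumentException("Unsupported tag name: {$tagName}");
    }

    $xpath = new DOMXPath($document);

    return $xpath->query('.//' . $tagName);
}

// Usage: count the <p> elements of a small illustrative document.
$document = new DOMDocument();
$document->loadHTML(str_repeat('<p>Hello World</p>', 3));

$count = 0;
foreach (elementsByTagNameViaXPath($document, 'p') as $p) {
    $count++;
}

echo $count, "\n";
```

One difference to keep in mind: the list returned by getElementsByTagName() is live and reflects later changes to the document, whereas the XPath result is a snapshot taken when the query runs. Code that modifies the tree while iterating may therefore behave differently with this helper.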
PHP Version
8.2.6
Operating System
macOS 13.4.1