Description
URLs with an IPv6 address as the host use square brackets [] around the address, per RFC 3986. The saveHTML() method on DOMDocument incorrectly URL-encodes these square brackets in attributes that expect a URL value (like href, src, and action). Other attributes I tested don't seem to be affected.
This example with various permutations of attributes and IPv6 URLs:
<?php
$html = <<<EOD
<html>
<head>
<link rel='stylesheet' href='http://[::1]:5173/app.css'/>
<script src='https://[::1]:5173/app.js'></script>
</head>
<body>
<a href='http://[::1]' data-custom='http://[::1]'>anchor to http://[::1]</a>
<form action='http://[::1]'></form>
<blockquote cite='http://[::1]'></blockquote>
</body>
</html>
EOD;
$document = new DOMDocument();
$document->loadHTML($html);
print $document->saveHTML();
Resulted in this output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<link rel="stylesheet" href="http://%5B::1%5D:5173/app.css">
<script src="https://%5B::1%5D:5173/app.js"></script>
</head>
<body>
<a href="http://%5B::1%5D" data-custom="http://[::1]">anchor</a>
<form action="http://%5B::1%5D"></form>
<blockquote cite="http://[::1]"></blockquote>
</body>
</html>
But I expected this output instead:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<link rel="stylesheet" href="http://[::1]:5173/app.css">
<script src="https://[::1]:5173/app.js"></script>
</head>
<body>
<a href="http://[::1]" data-custom="http://[::1]">anchor</a>
<form action="http://[::1]"></form>
<blockquote cite="http://[::1]"></blockquote>
</body>
</html>
(The cite attribute on <blockquote> appears to be unaffected, even though the HTML spec defines its value as a URL.)
The internal representation of such an attribute within the class is unaffected; the escaping happens only on output with saveHTML().
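A minimal sketch of how this can be checked, assuming a single anchor element is enough to reproduce the behavior. The variable names are illustrative and not taken from the report; the comments describe what the report observed rather than guaranteed output.

```php
<?php
// Parse a small document containing an IPv6 URL in an href attribute.
$document = new DOMDocument();
$document->loadHTML("<a href='http://[::1]'>anchor</a>");

$anchor = $document->getElementsByTagName('a')->item(0);

// Value as stored in the DOM tree; per the report, the literal brackets are kept here.
$stored = $anchor->getAttribute('href');

// Serialized markup; per the report, this is where the brackets show up as %5B and %5D.
$serialized = $document->saveHTML();

var_dump($stored);
var_dump($serialized);
```

Comparing the two dumped values side by side shows whether the difference is introduced at serialization time rather than at parse time.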
I also checked Dom\HTMLDocument::saveHTML(), and that method returns all attributes correctly without escaping. I know that is the preferred version today, but a great many older codebases still rely on DOMDocument.
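The comparison can be sketched roughly as follows, assuming PHP 8.4 or later (where Dom\HTMLDocument is available). The input markup and variable names are illustrative; LIBXML_NOERROR is passed only to keep parser warnings out of the output.

```php
<?php
// Same markup fed to both the legacy and the newer DOM class.
$html = "<!DOCTYPE html><html><body><a href='http://[::1]'>anchor</a></body></html>";

// Legacy class: the one the report says escapes the brackets on output.
$legacy = new DOMDocument();
$legacy->loadHTML($html);
$legacyOut = $legacy->saveHTML();

// Newer HTML5 class: the one the report says leaves the brackets alone.
$modern = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
$modernOut = $modern->saveHtml();

echo $legacyOut, PHP_EOL;
echo $modernOut, PHP_EOL;
```

Printing both results makes it easy to diff the href values produced by the two serializers.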
Live example comparing both classes: https://3v4l.org/9gXDT#v8.4.18
PHP Version
PHP 8.4.17 (cli) (built: Jan 13 2026 17:17:10) (NTS)
Copyright (c) The PHP Group
Built by Shivam Mathur
Zend Engine v4.4.17, Copyright (c) Zend Technologies
with Xdebug v3.5.0, Copyright (c) 2002-2025, by Derick Rethans
with Zend OPcache v8.4.17, Copyright (c), by Zend Technologies
Operating System
macOS 15.7.4