-
Notifications
You must be signed in to change notification settings - Fork 7.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHP 8.3 regression DomNode->value setting #11469
Comments
@nielsdos as I see you're cleaning up dom extension atm, maybe you have idea? |
This is caused by the fix in 50fdad8 for #8388. Basically, PHP incorrectly unescaped character references when setting nodeValue on DOM attributes. This is not in accordance to the DOM spec. This can cause a whole bunch of subtle issues (including security pitfalls), such as not being able to round-trip data: $attrNode = new DOMAttr('test');
$attrNode->value = '&';
print $attrNode->value; // prints & instead of & This behaviour was fixed in 8.3 to comply with the spec. If really necessary, the fix can be reverted for the time being. |
Thank you! I will ask in community because it can affect more editor's integrations as this code will need more then one place to fix, the broken code is https://git.drupalcode.org/project/drupal/-/blob/11.x/core/modules/editor/src/EditorXssFilter/Standard.php#L97 |
Moreover our CI system is broken with Alpha1 - unable to parse xml-artifacts, Very probably the same cause |
Thanks, please keep me posted. If necessary, this can be reverted and discussed on the internals list to find a possible alternative solution. e.g. something that comes to mind is having a
I don't know how Drupal's parser works. In any case, nothing changed to the DOM parsing at PHP's side. |
My initial reaction would be a concern this inconsistent behavior could lead to unsafe escaping down the line by someone trying to support two versions of PHP. |
Thanks for the input. |
cc @tstarling |
There's no alternative which both fixes the bug and avoids a b/c break. If necessary, I will open an RFC proposing to reapply my patch. There's no need for rawNodeValue since forwards-compatible code is already possible with e.g.
Drupal should do that instead. If an application relies on buggy behaviour, how are we meant to fix the bug without breaking the application? The only solution is to stop relying on the bug. |
@tstarling thank you, maybe |
unset is not implemented for that property. It does nothing.
|
I don't know what the best way to handle this situation is. This isn't the only revert I did, I also reverted another patch after people complained. The main problem with this bug (and many others in ext/dom) is that they've been around for about 20 years. It's hard to asses the impact, but it seems that many people expect the wrong behaviour, and their code breaks hard when the behaviour is corrected. For bugfixes that break BC we generally send them to master. However, given the age of the bug, even this seems risky. Non-bugfix BCs have to go through an RFC process to get a community consensus on what to do with them. Perhaps an RFC for fixing all spec compliance bugs in ext/dom is necessary given how old these issues are and the amount of code relying on wrong behaviour. One solution I suggested in another thread is to introduce a "spec compliant" flag which is opt-in. In that way existing applications will remain working, and new code can opt-in to the correct behaviour. This would avoid BC breaks. I think this also needs an RFC though, and we're close to the feature freeze unfortunately. Please let me know your thoughts. Whatever is chosen, this will probably need a discussion on the internals list. |
The best way is deprecate old property (make it throw warning at least) and introduce new one |
The property is named in the DOM spec. It's non-compliant to deprecate it. |
Sounds similar to how we're stuck with a non ISO DateTimeInterface::ISO8601 :-/ Maybe its just one of those changes that needs a lot of documentation and examples in prominent places to explain why, and how to properly use the property between versions. |
Yes, such big changes should be mentioned in UPGRADING at least Also having a test that shows how old behavior has changed is likely to see in the same commit |
After upgrade to 8.3.0beta3 we got new test failure, sounds like related https://www.drupal.org/project/drupal/issues/3383577#comment-15212429 TL'DR when body contain only <?php
$document = <<<EOD
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
<body><</body>
</html>
EOD;
$dom = new \DOMDocument();
$dom->loadHTML($document, LIBXML_NOBLANKS);
echo $dom->textContent . PHP_EOL;
older version
|
@andypost That's unrelated to this issue report. This is not because of a change in PHP but because of a change in libxml2. I vaguely remember there was a change related to tags and error handling to prevent some kind of bad behaviour with untrusted data. I believe you should report this to the libxml2 developers instead. |
Thank you! Yes, it it depends on libxml version |
Description
The following code:
Resulted in this output:
But I expected this output instead:
This is a cause of few tests failed in Drupal HEAD https://www.drupal.org/project/drupal/issues/3366843
PHP Version
PHP 8.3.0-alpha1
Operating System
Alpinelinux, Debian
The text was updated successfully, but these errors were encountered: