When I pass UTF-8 characters that cannot be converted into a DOMDocument node that has been initialized with an encoding like ISO-8859-15, I expect the library to handle these characters correctly. Since they are not printable in that encoding, I expect them to be encoded as HTML entities.
This is what happens when my DOMDocument encoding is ISO-8859-1. However, it does not work correctly when I'm using ISO-8859-15.
Example
The following code:
<?php
// Decimal Unicode code points: 8224, 8225, 8482, 49, 50, 51
$input = '†‡™123';

// Correct output -> the UTF-8-only characters are encoded as HTML entities
$domISO88591 = new DOMDocument('1.0', 'ISO-8859-1');
$domISO88591->appendChild($domISO88591->createElement('text', $input));
echo $domISO88591->saveXML();

// Incorrect output
$domISO885915 = new DOMDocument('1.0', 'ISO-8859-15');
$domISO885915->appendChild($domISO885915->createElement('text', $input));
echo $domISO885915->saveXML();
Both outputs should be exactly the same because, given the input, both ISO encodings share the same set of printable characters. Also, printable characters should not be encoded as HTML entities (as happened with the 1, which was turned into a character reference).
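The claim that both encodings treat this input identically can be checked outside of DOMDocument. The following is only a rough sketch: isRepresentable() is a hypothetical helper, not part of the original report, and it assumes that the iconv extension is available and that iconv() returns false for characters without a mapping in the target encoding. That behaviour may differ between iconv implementations.

```php
<?php
// Hypothetical helper (an assumption for illustration): reports whether a
// single UTF-8 character can be represented in the given target encoding.
function isRepresentable(string $char, string $encoding): bool
{
    // iconv() is expected to return false (and raise a notice, suppressed
    // here) when the character has no mapping in the target encoding.
    return @iconv('UTF-8', $encoding, $char) !== false;
}

$input = '†‡™123';

// Split the UTF-8 string into individual characters.
$chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $char) {
    printf(
        "%s  ISO-8859-1: %s  ISO-8859-15: %s\n",
        $char,
        isRepresentable($char, 'ISO-8859-1') ? 'yes' : 'no',
        isRepresentable($char, 'ISO-8859-15') ? 'yes' : 'no'
    );
}
```

If the two columns agree for every character, any difference between the two saveXML() outputs comes from the serializer rather than from the encodings themselves.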
PHP Version
PHP 8.2.8
(All of my local PHP binaries are affected, with the newest version being 8.2.8 and the oldest being 7.4.15)
Operating System
macOS 13.4, libxml 2.9.4
This is a libxml2 bug. I can reproduce this with 2.9.4 indeed, but not with version 2.11.4 which is used on my host.
Can you try upgrading libxml2? I believe a more recent version should fix your issue. Version 2.9.4 is also quite old.
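To see which libxml2 version a given PHP binary reports, something like the following could help. It is a minimal sketch: LIBXML_DOTTED_VERSION is a constant from PHP's libxml extension, and as far as I know it reflects the version PHP was built against, which may differ from the library loaded at runtime. The 2.11.4 threshold is only the version mentioned in this thread as working, not a confirmed fix version.

```php
<?php
// Version reported in this thread as not affected. The exact release that
// fixed the bug is unknown, so this threshold is an assumption.
$knownGood = '2.11.4';

printf("libxml2 version: %s\n", LIBXML_DOTTED_VERSION);

if (version_compare(LIBXML_DOTTED_VERSION, $knownGood, '<')) {
    echo "Older than {$knownGood}; the ISO-8859-15 output may be affected.\n";
} else {
    echo "At least {$knownGood}; the bug was not reproducible with that version.\n";
}
```

The output of phpinfo() also lists the compiled and loaded libxml2 versions separately, which is worth comparing if the two might differ.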
@nielsdos thanks for the quick reply. I already suspected libxml2 to be at fault here, but so far I have been unable to update my local instance that ships with Xcode, or to compile PHP against a separate installation of libxml2 2.11.4, due to the removal of --with-libxml-dir in 29d1b7f.
If you're able to reproduce with libxml2 2.9.4 but not with 2.11.4 I think this issue can be closed again. In case an update of libxml2 doesn't fix it for me, I'll reopen.
Just for future reference: Updating my Xcode SDK (which is the default source for libxml2 on macOS) to the latest version did not help. The newest version of libxml2 that Apple ships is 2.9.13, and that version is still affected by this bug.
Having looked through the changelogs of libxml2, I can't really tell which version fixed the faulty behaviour, but I can confirm that it works well using 2.11.4.
Additionally, the docs are outdated on how to configure the path to libxml2 to be used during the compile step. Instead of the no longer supported --with-libxml-dir option, I was successful customizing my version of libxml2 via pkg-config.
Result
Problem: With ISO-8859-15, the † and ™ are omitted, and somehow the 1 has been incorrectly converted to a character reference.