XML parser: stop using CDATA with invalid XML entities, fixes #10765 #4432
Hi @jim-p,
I have been looking further into this double-escaped ampersand issue and I propose the following patch.

It removes an old workaround, added in 2e6a43a, which resulted in invalid XML files where CDATA sections contained HTML entities, some of which are not valid XML entities (such as `ü`, which was translated to `&uuml;`).

This was only a problem because the `htmlentities` function was used (which attempts to escape all characters that have matching HTML entities) instead of `htmlspecialchars` (which only converts the 5 characters that have valid XML entities).

By switching to `htmlspecialchars`, we ensure that the resulting XML will be valid no matter what special characters are used inside any tag, removing the need to maintain a list of special tags.

I initially meant to only remove the double escaping, because text inside CDATA doesn't need to be escaped, so I wrote sbraz@95d37e7. But after understanding why CDATA was used in the first place, I thought it best to get rid of it completely and simplify the code while making it future-proof (no need to maintain a list of tags containing special characters).
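
For illustration, here is a minimal sketch of the difference between the two functions. It is not code from the patch: the sample string, the `<descr>` wrapper and the `is_well_formed` helper are assumptions made up for this example, and it relies on the SimpleXML extension being available.

```php
<?php
// Hypothetical sample value, not taken from the code base.
$value = 'Müller & Söhne <test>';

// htmlentities() converts every character that has a named HTML entity,
// including ü and ö, whose entities are not predefined in XML.
$with_entities = htmlentities($value, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only converts &, <, >, " and '.
$with_specialchars = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: returns true if the string parses as XML.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $ok = simplexml_load_string($xml) !== false;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$xml_entities     = "<descr>{$with_entities}</descr>";
$xml_specialchars = "<descr>{$with_specialchars}</descr>";

echo $with_entities, "\n";      // M&uuml;ller &amp; S&ouml;hne &lt;test&gt;
echo $with_specialchars, "\n";  // Müller &amp; Söhne &lt;test&gt;
var_dump(is_well_formed($xml_entities));     // expected false: &uuml; is undefined in XML
var_dump(is_well_formed($xml_specialchars)); // expected true
```

With `htmlspecialchars`, non-ASCII characters are written as-is in UTF-8 and only the characters that XML itself reserves are escaped, so no CDATA wrapper is needed.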