Remove invalid chars from displayed string per XML specification

Strict XHTML requires that data comply with the XML 1.0 specification [1],
which only allows a subset of the Unicode character set.

Function string_html_specialchars() has been modified to remove from the
string to be printed any character that is not in the range defined by
the Char production:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
         [#x10000-#x10FFFF]
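The Char production can be read directly as a predicate on Unicode code points. A minimal PHP sketch, assuming a hypothetical helper `is_xml_char()` that is not part of MantisBT:

```php
<?php
// Hypothetical helper (not part of MantisBT's string_api.php): returns
// true if the Unicode code point $cp matches the XML 1.0 Char production.
function is_xml_char(int $cp): bool
{
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

// A few boundary checks against the production.
var_dump(is_xml_char(0x0B));    // vertical tab: bool(false)
var_dump(is_xml_char(0xD800));  // surrogate: bool(false)
var_dump(is_xml_char(0xFFFE));  // non-character: bool(false)
var_dump(is_xml_char(0x1F600)); // emoji, above the BMP: bool(true)
```

Surrogates (#xD800-#xDFFF) and #xFFFE/#xFFFF fall outside every range, so they are rejected even though they are valid code point values.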

Fixes #14744

[1] http://www.w3.org/TR/xml/
dregad committed on Sep 26, 2012 (commit 2b5d66217bd4ecf5e7271f1a4b2b339d7681e91c, 1 parent a93121b)
Showing 1 changed file with 4 additions and 0 deletions: core/string_api.php (+4 −0)
@@ -910,6 +910,10 @@ function string_html_entities( $p_string ) {
* @return string
*/
function string_html_specialchars( $p_string ) {
+ # Remove any invalid character from the string per XML 1.0 specification
+ # http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
+ $p_string = preg_replace( '/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u', '', $p_string );
+
# achumakov: @ added to avoid warning output in unsupported codepages
# e.g. 8859-2, windows-1257, Korean, which are treated as 8859-1.
# This is VERY important for Eastern European, Baltic and Korean languages
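One detail of the regular expression is easy to get wrong: in PCRE, `\xhh` reads at most two hex digits, so any code point above U+00FF must be written with braces (`\x{D7FF}`). The sketch below contrasts brace-less escapes (`\xD7FF`, `\xE000`, `\xFFFD`) with the braced form. The wrapper `strip_invalid_xml_chars()` and the constant names are hypothetical, chosen for illustration only.

```php
<?php
// Brace-less escapes: PCRE parses "\xD7FF" as \xD7 followed by the literal
// letters F and F, and "\xE000-\xFFFD" as \xE0, '0', the range '0'-\xFF,
// then F and D. The allowed set collapses to #x9, #xA, #xD, #x20-#xFF and
// #x10000-#x10FFFF, so every character from U+0100 to U+FFFF is stripped.
const XML_INVALID_UNBRACED =
    '/[^\x9\xA\xD\x20-\xD7FF\xE000-\xFFFD\x{10000}-\x{10FFFF}]/u';

// Braced escapes: each \x{...} is a full code point in UTF (/u) mode.
const XML_INVALID_BRACED =
    '/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

// Hypothetical wrapper: removes characters outside the XML 1.0 Char range.
// With /u, preg_replace() returns null when $s is not valid UTF-8,
// so this sketch falls back to an empty string in that case.
function strip_invalid_xml_chars(string $s, string $pattern = XML_INVALID_BRACED): string
{
    return preg_replace($pattern, '', $s) ?? '';
}

$cyrillic = "\u{41F}\u{440}\u{438}\u{432}\u{435}\u{442}"; // "Привет"

var_dump(strip_invalid_xml_chars("a\x01b\x0Bc"));                 // string(3) "abc"
var_dump(strip_invalid_xml_chars($cyrillic) === $cyrillic);       // bool(true)
var_dump(strip_invalid_xml_chars($cyrillic, XML_INVALID_UNBRACED)); // string(0) ""
```

Returning an empty string on malformed UTF-8 is a simplification for the sketch; a caller could instead check `preg_last_error()` and log or re-encode the input.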
