Skip to content

jozef/XML-Char

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NAME

    XML::Char - validate characters for XML

SYNOPSIS

        use XML::Char;
        if (not XML::Char->valid("bell ".chr(7))) {
            die 'no way to store this string directly to XML';
        }
    
        use utf8;
        use XML::Char;
        if (XML::Char->valid("UTF8 je pořádný peklo")) {
            print "fuf, we are fine\n";
        }

DESCRIPTION

    For me it was kind of a surprised to learn that char(0) is a valid
    UTF-8 character. All of the 0-0x7F are...

        Emo: well it's not because that they are valid utf-8 characters that you have to expect XML to accept them

    Well of course not, now I know :-)

    http://www.w3.org/TR/REC-xml/#charsets defines which characters XML
    processors MUST accept:

        [2]         Char       ::=          #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

    This module validates if a given string meets this criteria. In
    addition the string has to be a Perl UTF-8 string (is_utf8_string() -
    see "Unicode-Support" in perlapi).

 valid($value)

    Returns true or false if $value consists of valid UTF-8 XML characters.

LINKS

    How can I strip invalid XML characters from strings in Perl?
    <https://stackoverflow.com/questions/1016910/how-can-i-strip-invalid-xm
    l-characters-from-strings-in-perl>

    Extensible Markup Language (XML) 1.0
    <http://www.w3.org/TR/REC-xml/#charsets>

    Extensible Markup Language (XML) 1.1
    <https://www.w3.org/TR/xml11/#charsets>

AUTHOR

    Jozef Kutej

    Aristotle Pagaltzis - completely rewrote the initial Char.XS to handle
    the SvUTF8 flag

COPYRIGHT

    Copyright 2009 Jozef Kutej, all rights reserved.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.