Item5990: Try and capture the effect of character set and encoding conversions, and hopefully capture some of the intent.

Add some sanity checks based on how I understand the API. I can get the output ASSERT to fail by setting {Site}{CharSet} to 'iso-8859-1' so either I misunderstood something or there is a bug. Either way, I would like to understand what is wrong :)


git-svn-id: http://svn.foswiki.org/trunk@7813 0b4bb1d4-4e5a-0410-9cc4-b2b747904278
MichaelTempest authored and MichaelTempest committed Jun 15, 2010
1 parent ecd7d06 commit cc4d235
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/Handlers.pm
@@ -588,15 +588,49 @@ DEFAULT
sub RESTParameter2SiteCharSet {
    my ($text) = @_;

    # $text is supposed to contain octets that are a valid UTF-8 encoding.
    # $text should certainly not have any codes above 255.
    ASSERT( $text !~ /[^\x00-\xff]/,
        "only octets expected in input to RESTParameter2SiteCharSet" )
      if DEBUG;

    # $text might contain octets that are not a valid UTF-8 encoding
    # because it came from the browser, and so it might be hostile content.
    # Encode::FB_PERLQQ makes decode_utf8 convert invalid octet sequences
    # into a perl escape sequence, octet for octet (e.g. \xFF\x80),
    # instead of throwing an exception. This defuses the invalid sequence.
    $text = Encode::decode_utf8( $text, Encode::FB_PERLQQ );

    # $text now contains unicode characters

    WC::mapUnicode2HighBit($text);

    if ( $Foswiki::cfg{Site}{CharSet} ) {
        $text = Encode::encode( $Foswiki::cfg{Site}{CharSet},
            $text, Encode::FB_PERLQQ );

        # $text is now encoded as per the site charset.
        # For UTF-8, that means octets.

        # SMELL: The use of Encode::FB_PERLQQ is probably incorrect here.
        # If {Site}{CharSet} is set to 'iso-8859-1' then wide characters
        # (with codes greater than 255) which cannot be represented in
        # iso-8859-1 are encoded as perl escapes e.g. \x{03b1}.
        # Encode::FB_HTMLCREF would be far better, as characters that
        # cannot be represented in the specified site character set
        # would be converted to HTML entities e.g. &#945; (α)
    }

    # SMELL: if {Site}{CharSet} is blank (which is the default)
    # then $text may contain wide characters.
    # Thus, $text is NOT encoded in the SiteCharSet!

    # The return value is supposed to be according to the currently selected
    # Foswiki site character set, encoded as octets.
    # Thus, there should not be any codes above 255.
    ASSERT( $text !~ /[^\x00-\xff]/,
        "only octets expected in return value for RESTParameter2SiteCharSet" )
      if DEBUG;
    return $text;
}
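
The fallback behaviour described in the comments above can be tried outside Foswiki. The following is a minimal sketch that uses only the core Encode module; `encode_with_fallback` and the sample strings are hypothetical and exist only for illustration, and `WC::mapUnicode2HighBit` is deliberately left out. It decodes an input containing one invalid octet, encodes U+03B1 to iso-8859-1 with both fallbacks, and shows that a string which is never encoded still fails the octet check.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Foswiki): encode a character string
# into $charset, using $fallback for characters the charset cannot hold.
sub encode_with_fallback {
    my ( $charset, $chars, $fallback ) = @_;
    return Encode::encode( $charset, $chars, $fallback );
}

# Decoding: the lone continuation byte 0x80 is not valid UTF-8.
# With FB_PERLQQ it becomes a literal backslash escape instead of
# raising an exception.
my $decoded = Encode::decode_utf8( "a\x80b", Encode::FB_PERLQQ );

# Greek small letter alpha (U+03B1) cannot be represented in iso-8859-1.
my $alpha = "\x{3b1}";

# FB_PERLQQ yields a perl-style escape, which is meaningless in HTML.
my $perlqq = encode_with_fallback( 'iso-8859-1', $alpha, Encode::FB_PERLQQ );

# FB_HTMLCREF yields a decimal numeric character reference.
my $htmlcref =
  encode_with_fallback( 'iso-8859-1', $alpha, Encode::FB_HTMLCREF );

# If no encoding step runs at all (blank site charset), the wide
# character survives and the "only octets" check would fail.
my $unencoded_is_wide = ( $alpha =~ /[^\x00-\xff]/ ) ? 1 : 0;

print "decoded:   $decoded\n";
print "perlqq:    $perlqq\n";
print "htmlcref:  $htmlcref\n";
print "unencoded string has codes above 255: $unencoded_is_wide\n";
```

Both fallbacks keep the encoded result within octets; the difference is only in what a browser would later make of the substituted text.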
