Figure out if Unicode::UTF8 can be replaced #11
Comments
I think this might illustrate a way to do it with a core module:
use Data::Dumper;
use Encode;
use feature 'say';
use warnings;
use strict;
my $latin_bytes = "\346\370\345";
my $utf8_bytes = "æøå";
my $flagged_utf8_str = decode_utf8("æøå");
say "Decoding two raw strings with byte values corresponding to latin1 and utf8: ";
say Dumper({ latin1_bytes => $latin_bytes, utf8_bytes => $utf8_bytes, flagged_utf8_str => $flagged_utf8_str });
say encode_utf8(
decode( 'utf-8', "$latin_bytes $utf8_bytes", sub {
decode( 'latin1', chr($_[0]), sub {
chr($_[0]);
})
})
);
__END__
Decoding two raw strings with byte values corresponding to latin1 and utf8:
$VAR1 = {
'flagged_utf8_str' => "\x{e6}\x{f8}\x{e5}",
'latin1_bytes' => '���',
'utf8_bytes' => 'æøå'
};
æøå æøå
What do you think @marcusramberg?
@jhthorsen I'm not sure. Would appreciate @chansen's input here as he made the original recommendation.
Is there a test for this in the test suite? I'm not quite certain I understand what the issue actually is, and if I could see code that would verify things, it'd probably be enlightening.
I can't seem to find any... The test should be something like this:
The reason for using this module is that we want to guess the encoding: if a message is received from an IRC client using latin1, we still want the characters decoded as utf8 on our side. Note: I might be very wrong here. Maybe we want the scalar to contain bytes, without any encoding? I can't really remember.
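A minimal sketch of what such a test could look like, using only the core Encode and Test::More modules. The helper name guess_decode is hypothetical and not part of this project. It assumes a whole-message fallback: if strict UTF-8 decoding fails, the entire message is treated as Latin-1, which is a simpler technique than the per-byte callback discussed in this thread.

```perl
use strict;
use warnings;
use Encode ();
use Test::More;

# Hypothetical helper (an assumption, not project code): decode the octets
# as strict UTF-8, and treat the whole message as Latin-1 if that fails.
sub guess_decode {
    my ($octets) = @_;
    my $chars = eval {
        # LEAVE_SRC keeps Encode from modifying $octets in place.
        Encode::decode('UTF-8', $octets,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $chars ? $chars : Encode::decode('ISO-8859-1', $octets);
}

my $expected     = "\x{e6}\x{f8}\x{e5}";        # the characters æøå
my $latin1_bytes = "\xe6\xf8\xe5";              # æøå encoded as Latin-1
my $utf8_bytes   = "\xc3\xa6\xc3\xb8\xc3\xa5";  # æøå encoded as UTF-8

is guess_decode($latin1_bytes), $expected, 'latin1 octets are decoded';
is guess_decode($utf8_bytes),   $expected, 'utf8 octets are decoded';
is length(guess_decode($utf8_bytes)), 3, 'result holds three characters';

done_testing;
```

Note that this sketch does not handle a single message that mixes UTF-8 and Latin-1 sequences; that case needs a per-byte fallback.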
I'm not sure what it's used for either, but for the purpose of guessing+decoding messages received from the IRC server, IRC::Utils::decode_irc will do the job.
That's interesting @Grinnz! Thanks.
@jhthorsen, the code in @nicomen, close but no cigar! You should use @Grinnz
Unicode::UTF8
The callback passed to
The reason I recommend
Encode
It's possible to use
Portability + Efficiency
This code uses an instance of
my $Encoding;
BEGIN {
my $has_unicode_utf8 = !!eval { require Unicode::UTF8; 1 };
unless ($has_unicode_utf8) {
require Encode;
$Encoding = Encode::find_encoding('UTF-8')
or die q/Could not find UTF-8 encoding in Encode/;
}
*HAS_UNICODE_UTF8 = sub () { $has_unicode_utf8 };
}
sub decode_irc {
@_ == 1 or die q/Usage: decode_irc($octets)/;
if (HAS_UNICODE_UTF8) {
no warnings 'utf8';
return Unicode::UTF8::decode_utf8($_[0], sub { $_[0] });
}
else {
# The stringification of $_[0] is intentional!
# Older versions of Encode have had bugs with GETMAGIC and issues
# with references and overloaded objects causing segfaults.
return $Encoding->decode("$_[0]", sub { $Encoding->encode(chr $_[0]) });
}
}
Maybe we can do something funky with this code? https://metacpan.org/source/RIBASUSHI/Devel-PeekPoke-0.03/lib/Devel/PeekPoke/PP.pm#L100