Fix double encoded UTF-8 bytes to the correct one
Perl
branch: master

Checking in changes prior to tagging of version 0.05.

Changelog diff is:

diff --git a/Changes b/Changes
index 0480d1f..77712c6 100644
--- a/Changes
+++ b/Changes
@@ -1,5 +1,8 @@
 Revision history for Perl extension Encode::DoubleEncodedUTF8

+0.05  Thu Jul  7 11:02:53 PDT 2011
+        - Added a big WARNINGS in the pod to prevent people from using in the production code.
+
 0.04  Wed Sep 30 18:45:40 PDT 2009
         - Simplify POD example
latest commit 8ecdaffceb
@miyagawa authored
lib/Encode
t
.gitignore
.shipit
Changes Checking in changes prior to tagging of version 0.05.
MANIFEST
MANIFEST.SKIP
Makefile.PL
README

README

NAME
    Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the
    correct one

SYNOPSIS
      use Encode;
      use Encode::DoubleEncodedUTF8;

      my $dodgy_utf8 = "Some byte strings from the web/DB with double-encoded UTF-8 bytes";
      my $fixed = decode("utf-8-de", $dodgy_utf8); # Fix it

WARNINGS
    Use this module only for testing, debugging, data recovery, and working
    around buggy software you *can't* fix.

    Do not use this module in production code just to *work around* bugs in
    code you *can* fix. This module is slow and imperfect, and it may corrupt
    correctly encoded strings if you run it against them. See perlunitut for
    more details.
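
    The risk to correctly encoded strings can be illustrated without this
    module. The sketch below uses only the core Encode module and a
    hypothetical helper, hex_dump, that is not part of any library. It shows
    that correctly encoded two-character text can have exactly the same bytes
    as double-encoded one-character text, so bytes alone cannot tell the two
    cases apart:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper for this illustration: bytes as space-separated hex.
sub hex_dump { join " ", map { sprintf "%02X", ord($_) } split //, $_[0] }

# Case A: the two-character text U+00C3 U+00A9, encoded once. Correct data.
my $correct = encode("UTF-8", "\x{c3}\x{a9}");

# Case B: the one-character text U+00E9, encoded twice. Damaged data.
my $once  = encode("UTF-8", "\x{e9}");                      # bytes C3 A9
my $twice = encode("UTF-8", decode("ISO-8859-1", $once));   # misread as Latin-1, encoded again

print "correct: ", hex_dump($correct), "\n";   # correct: C3 83 C2 A9
print "twice:   ", hex_dump($twice),   "\n";   # twice:   C3 83 C2 A9
print "identical bytes\n" if $correct eq $twice;
```

    Any repair of case B therefore also rewrites case A, which is why the
    fix should be limited to data known to be damaged.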

DESCRIPTION
    Encode::DoubleEncodedUTF8 adds a new encoding, "utf-8-de", that finds
    double-encoded UTF-8 byte sequences in the input bytes and decodes them
    to the correct Unicode characters.

    Double-encoded UTF-8 often arises when a string with the UTF-8 flag is
    concatenated with one without it, for instance:

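      use Encode;
      use Encode::DoubleEncodedUTF8;

      my $string = "L\x{e9}on";   # latin-1 characters
      utf8::upgrade($string);     # same characters, now with the UTF-8 flag
      my $bytes  = "L\xc3\xa9on"; # utf-8 encoded bytes, no UTF-8 flag

      # The bytes in $bytes are treated as latin-1 characters and encoded again
      my $dodgy_utf8 = encode_utf8($string . " " . $bytes); # $bytes is now double encoded

      my $fixed = decode("utf-8-de", $dodgy_utf8); # "L\x{e9}on L\x{e9}on"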

    See encoding::warnings for more details.
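
    To make the damage visible at the byte level, the sketch below rebuilds
    the same strings with only the core Encode module, dumps the resulting
    bytes, and undoes one encoding layer by hand for the second word. The
    hex_dump helper and the manual repair are assumptions made for
    illustration; they are not taken from this module and do not describe
    how it works internally:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper for this illustration: bytes as space-separated hex.
sub hex_dump { join " ", map { sprintf "%02X", ord($_) } split //, $_[0] }

my $string = "L\x{e9}on";      # characters, one of them U+00E9
utf8::upgrade($string);
my $bytes  = "L\xc3\xa9on";    # UTF-8 encoded bytes

my $dodgy_utf8 = encode("UTF-8", $string . " " . $bytes);

# First word: E9 became C3 A9. Second word: C3 A9 became C3 83 C2 A9.
print hex_dump($dodgy_utf8), "\n";
# 4C C3 A9 6F 6E 20 4C C3 83 C2 A9 6F 6E

# Manual repair of the second word only: decode, map the characters
# back to single bytes via Latin-1, then decode again.
my ($first, $second) = split / /, $dodgy_utf8, 2;
my $repaired = decode("UTF-8", encode("ISO-8859-1", decode("UTF-8", $second)));

print "repaired\n" if $repaired eq "L\x{e9}on";
```

    The manual repair works here only because the damaged span is known in
    advance. Applied to the first word it would produce wrong output, since
    that word was encoded only once.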

AUTHOR
    Tatsuhiko Miyagawa <miyagawa@bulknews.net>

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    encoding::warnings, Test::utf8
