Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Fix double encoded UTF-8 bytes to the correct one

branch: master

Fetching latest commit…

Octocat-spinner-32-eaf2f5

Cannot retrieve the latest commit at this time

Octocat-spinner-32 lib
Octocat-spinner-32 t
Octocat-spinner-32 .gitignore
Octocat-spinner-32 .shipit
Octocat-spinner-32 Changes
Octocat-spinner-32 MANIFEST
Octocat-spinner-32 MANIFEST.SKIP
Octocat-spinner-32 Makefile.PL
Octocat-spinner-32 README
README
NAME
    Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the
    correct one

SYNOPSIS
      use Encode;
      use Encode::DoubleEncodedUTF8;

      my $dodgy_utf8 = "Some byte strings from the web/DB with double-encoded UTF-8 bytes";
      my $fixed = decode("utf-8-de", $dodgy_utf8); # Fix it

WARNINGS
    Use this module only for testing, debugging, data recovery and working
    around with buggy software you *can't* fix.

    Do not use this module in your production code just to *work around*
    bugs in the code you *can* fix. This module is slow, and not perfect and
    may break the encodings if you run against correctly encoded strings.
    See perlunitut for more details.

DESCRIPTION
    Encode::DoubleEncodedUTF8 adds a new encoding "utf-8-de" and fixes
    double encoded utf-8 bytes found in the original bytes to the correct
    Unicode entity.

    The double encoded utf-8 frequently happens when strings with UTF-8 flag
    and without are concatenated, for instance:

      my $string = "L\x{e9}on";   # latin-1
      utf8::upgrade($string);
      my $bytes  = "L\xc3\xa9on"; # utf-8

      my $dodgy_utf8 = encode_utf8($string . " " . $bytes); # $bytes is now double encoded

      my $fixed = decode("utf-8-de", $dodgy_utf8); # "L\x{e9}on L\x{e9}on";

    See encoding::warnings for more details.

AUTHOR
    Tatsuhiko Miyagawa <miyagawa@bulknews.net>

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    encoding::warnings, Test::utf8

Something went wrong with that request. Please try again.