'Unicode wikilinks' t/unicode.t test fails with Encode ≥ 2.53 #121

Open
ppisar opened this issue Apr 29, 2014 · 2 comments

ppisar commented Apr 29, 2014

The 'Unicode wikilinks' test in t/unicode.t fails with Encode ≥ 2.53 because Encode::decode_utf8() now checks that its input is a valid UTF-8 byte stream even when the Perl UTF-8 flag is set:

    $ CATALYST_CONFIG=t/var/mojomojo.yml prove -l -v t/unicode.t
    [...]
    ok 6 - POST /.jsrpc/render
    ok 7 - basic Unicode: page content
    [error] Caught exception in MojoMojo::Controller::Jsrpc->render "Cannot decode string with wide characters at /usr/lib64/perl5/vendor_perl/Encode.pm line 215."
    not ok 8 - Unicode wikilinks

The problem is that a Perl Unicode string is passed to Encode::decode_utf8(), which is effectively a request for double decoding.

See https://bugzilla.redhat.com/show_bug.cgi?id=1092015 for more details.
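The double decoding described above can be reproduced in isolation. The following is only a minimal sketch using the core Encode module; the sample string and variable names are illustrative and not taken from MojoMojo. It contrasts decoding a byte string (the intended use) with decoding a string that already holds characters above 0xFF.

```perl
use strict;
use warnings;
use Encode ();

# A Perl character string containing U+0159 (LATIN SMALL LETTER R WITH CARON),
# a code point above 0xFF. The sample value is illustrative only.
my $chars = "\x{159}";

# The same text as a UTF-8 byte string: the two bytes 0xC5 0x99.
my $bytes = Encode::encode_utf8($chars);

# Decoding a byte string is the intended use and round-trips cleanly.
my $decoded = Encode::decode_utf8($bytes);

# Decoding the already-decoded character string is the double decode.
# Per the report above, Encode >= 2.53 is expected to die here.
my $error = eval { Encode::decode_utf8($chars); 1 } ? '' : $@;

print "round trip ok\n" if $decoded eq $chars;
print "double decode failed: $error" if $error;
```

The eval wrapper is there only so the script can show the error message instead of aborting.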

ppisar commented Apr 29, 2014

This may be the same bug as issue #63 or #118.

ppisar commented Apr 29, 2014

MojoMojo::Schema::ResultSet::Page::normalize_name() does:

    return (
        Encode::decode_utf8(URI::Escape::uri_unescape($name_orig)),
        Encode::decode_utf8(URI::Escape::uri_unescape($name)),
    );

while the input variable is a Unicode string. Unfortunately, uri_unescape() preserves the UTF-8 flag even though the output of URI unescaping is by definition a byte string.

For discussion, see the Encode bug queue: CPAN RT#81460, #70161, and #43859.

This patch fixes it:

From f3fb6a261e047ca7068b08c0c292ae22d9007656 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Petr=20P=C3=ADsa=C5=99?= <ppisar@redhat.com>
Date: Tue, 29 Apr 2014 12:37:39 +0200
Subject: [PATCH] normalize name as a byte string
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since Encode-2.53, Encode::decode_utf8() requires argument to be
a byte string. Because URI::Escape::uri_unescape() returns unicode
string on unicode string input, one need to convert it to a byte
string before.

<https://github.com/mojomojo/mojomojo/issues/121#issuecomment-41651507>

Signed-off-by: Petr Písař <ppisar@redhat.com>
---
 lib/MojoMojo/Schema/ResultSet/Page.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/MojoMojo/Schema/ResultSet/Page.pm b/lib/MojoMojo/Schema/ResultSet/Page.pm
index 306aa6a..3f1a7e2 100644
--- a/lib/MojoMojo/Schema/ResultSet/Page.pm
+++ b/lib/MojoMojo/Schema/ResultSet/Page.pm
@@ -177,8 +177,8 @@ sub normalize_name {
     $name =~ s/\s+/_/g;
     $name = lc($name);
     return (
-        Encode::decode_utf8(URI::Escape::uri_unescape($name_orig)),
-        Encode::decode_utf8(URI::Escape::uri_unescape($name)),
+        Encode::decode_utf8(URI::Escape::uri_unescape(Encode::encode_utf8($name_orig))),
+        Encode::decode_utf8(URI::Escape::uri_unescape(Encode::encode_utf8($name))),
     );
 }

-- 
1.9.0
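The effect of the patched expression can be sketched without the rest of MojoMojo. This is an illustration under stated assumptions, not the project's code: unescape_percent() is a hypothetical stand-in for URI::Escape::uri_unescape() (a plain %XX substitution, so the sketch needs only core modules), and the page name is made up.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for URI::Escape::uri_unescape(): replaces each %XX
# escape with the character whose ordinal is XX. Assumed here so that the
# sketch depends on core modules only.
sub unescape_percent {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

# Illustrative page name: a character string that mixes literal non-ASCII
# characters with a percent-escaped space and a percent-escaped UTF-8 "ř".
my $name = "p\x{159}\x{ed}klad%20%C5%99";

# Unpatched order: unescape the character string, then decode.
# The result of unescaping still contains U+0159, so decoding is expected
# to die with Encode >= 2.53.
my $old_error = eval { Encode::decode_utf8(unescape_percent($name)); 1 } ? '' : $@;

# Patched order: encode to UTF-8 bytes first, unescape the bytes, then decode.
my $fixed = Encode::decode_utf8(unescape_percent(Encode::encode_utf8($name)));

print "unpatched order failed: $old_error" if $old_error;
print "patched order decoded ", length($fixed), " characters\n";
```

Encoding first means the literal characters and the percent-escaped bytes end up in the same UTF-8 byte string, so the single decode at the end handles both.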
