non-ASCII characters in consecutive parent/child page URLs are corrupted #63

Open
dandv opened this Issue Jan 14, 2010 · 2 comments

dandv commented Jan 14, 2010
  1. In a test page, create a link whose name starts with a lowercase non-ASCII character, such as [[βeta]] (it starts with the lowercase Greek letter beta).
  2. Edit the new βeta page, insert another link, such as [[foo]], and save.
  3. Re-save the test page to regenerate the link to βeta so that the trailing '?' disappears (a workaround for issue #43).
  4. Navigate to the βeta page and notice that the non-ASCII characters in the URL (here, just β) are double-UTF-8 encoded: /βeta/foo
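
To make "double-UTF-8 encoded" concrete, here is a minimal Python sketch of one common way it happens: the UTF-8 bytes of a string are misread as Latin-1 characters and then encoded as UTF-8 a second time. This is only an illustration; the helper `double_encode` is hypothetical and is not MojoMojo code, and the Latin-1 misreading is an assumption about the mechanism, not something confirmed for this bug.

```python
from urllib.parse import quote


def double_encode(text: str) -> str:
    """Simulate double encoding: misread UTF-8 bytes as Latin-1 characters.

    Hypothetical helper for illustration; not taken from MojoMojo.
    """
    utf8_bytes = text.encode("utf-8")   # 'β' becomes the two bytes CE B2
    return utf8_bytes.decode("latin-1")  # each byte becomes its own character


# quote() percent-encodes the UTF-8 bytes of its argument.
single = quote("βeta")                 # encoded once: two escaped bytes for β
double = quote(double_encode("βeta"))  # encoded twice: four escaped bytes for β

print(single)
print(double)
```

In the second URL each original byte of β has itself been expanded into a two-byte UTF-8 sequence, which is the signature of double encoding.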
dandv commented Jan 14, 2010

A workaround is to use ASCII-only characters for the link target and put any non-ASCII characters in the link text, e.g. [[beta|βeta]].

tarkhil commented Dec 4, 2010

I've found where the problem is. In MojoMojo::Schema::ResultSet::Page, line 176, URI::Escape::uri_unescape unescapes a (JS-generated) URL such as %D0%B2%D1%82%D0%BE%D1%80%D0%BE%D0%B9_%D1%81%D1%82%D0%B5%D0%BD%D0%B4 into a non-Unicode (byte) string, and Encode::decode_utf8 then garbles it completely.

I have too little experience with JS to check whether it's a JS-related problem.
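
For comparison, here is a rough Python sketch of the two steps described above, applied to the same URL: percent-unescaping to raw bytes, followed by exactly one UTF-8 decode. It also shows what the text looks like if the bytes are treated as Latin-1 characters instead. The variable names are hypothetical, the Latin-1 path is an assumed failure mode, and this does not reproduce the Perl code at line 176.

```python
from urllib.parse import unquote_to_bytes

escaped = "%D0%B2%D1%82%D0%BE%D1%80%D0%BE%D0%B9_%D1%81%D1%82%D0%B5%D0%BD%D0%B4"

# Step 1: percent-unescape to raw bytes (no character decoding yet).
raw = unquote_to_bytes(escaped)

# Step 2: decode those bytes as UTF-8 exactly once.
decoded = raw.decode("utf-8")

# Assumed failure mode: the bytes are treated as Latin-1 characters,
# so every byte turns into a separate (wrong) character.
garbled = raw.decode("latin-1")

# As long as no information was lost, the garbling can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")

print(decoded)
print(len(raw), len(decoded), len(garbled))
```

The lengths differ because the 12 characters of the page name, 11 of them Cyrillic, occupy 23 bytes in UTF-8; a string with one character per byte has been decoded with the wrong assumption somewhere along the way.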
