UTF-8 filenames (in releases) are getting mojibaked #248

rwstauner opened this Issue Jan 19, 2013 · 1 comment



rwstauner commented Jan 19, 2013

I know perldoc utf8 says:

One can have Unicode in identifier names, but not in package/class or subroutine names. While some limited functionality towards this does exist as of Perl 5.8.0, that is more accidental than designed; use of Unicode for the said purposes is unsupported.

One reason of this unfinishedness is its (currently) inherent unportability: since both package names and subroutine names may need to be mapped to file and directory names, the Unicode capability of the filesystem becomes important-- and there unfortunately aren't portable answers.

But we should probably be attempting to decode file names when indexing.
For example:

$ curl http://api.metacpan.org/module/Acme::ǝmɔA?fields=documentation,path
   "documentation" : "Acme::ǝmɔA",
   "path" : "lib/Acme/�m�A.pm"

The file basename is supposed to be the same string as the module name in "documentation", but it looks like it has been encoded twice.
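To make "encoded twice" concrete, here is a minimal sketch of how that kind of mojibake can arise and be reversed. It is written in Python purely to show the byte-level mechanics (the indexer itself is Perl), and it assumes the second pass misread the UTF-8 bytes as Latin-1, which I have not confirmed is what actually happens in the indexer. The helper name `undo_double_encoding` is made up for illustration.

```python
# The basename as it should appear, matching the "documentation" field.
basename = "ǝmɔA.pm"

# Correct: the characters are encoded to UTF-8 exactly once.
once = basename.encode("utf-8")

# Assumed failure mode: the UTF-8 bytes are mistaken for Latin-1
# characters and then encoded to UTF-8 a second time.
twice = once.decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse one extra encoding pass (assumes the misread was Latin-1)."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(len(once), len(twice))          # byte lengths differ after the second pass
    print(undo_double_encoding(twice))    # recovers the original basename
```

The non-ASCII characters each grow from two bytes to four in the second pass, which is why the path no longer matches the module name.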

I took a quick (but insufficient) look at the indexer:

I don't know if File::Find or Path::Class support any options for decoding filenames, but it would probably be simple enough to just decode the name in the indexer (somewhere around line 256).
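As a rough sketch of what "just decode the name" could look like: treat the filename read from the archive as raw bytes and decode it once before storing it. This is Python for illustration only; in the Perl indexer the equivalent step would presumably go through the Encode module. The function name `decode_filename` and the Latin-1 fallback for names that are not valid UTF-8 are my assumptions, not anything the indexer does today.

```python
def decode_filename(raw: bytes) -> str:
    """Decode a raw filename to text exactly once.

    Tries strict UTF-8 first. The Latin-1 fallback is an assumption made
    for this sketch so that non-UTF-8 names still yield some string.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


if __name__ == "__main__":
    raw_path = "lib/Acme/ǝmɔA.pm".encode("utf-8")  # bytes as found on disk
    print(decode_filename(raw_path))
```

Decoding at the point where the path enters the record means whatever serializes the record later encodes it only once.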

I'm not sure why the package name is properly encoded... perhaps we get that from the pod parser or something.

I'm not sure if there is anything else we should also be decoding for files... maybe everything in that hashref.

Anyway, something to play with in the future.

lol, as usual rwstauner is a few months ahead of me here.

I haven't been able to confirm yet whether 5.18+ is supposed to fully support utf8 in filenames/modulenames, but it does look like symbol names (sub names, variable names, hash keys etc) are supposed to be handled, starting with 5.16.0. (I've run into an issue there with PPI, which I'm pursuing, but that doesn't seem to be related to this issue here.)
