
😱 How to use utf8 in Perl and not go crazy

UTF-8 is the only way to fly, and I hope everyone can agree with that. It's like mescaline :P Anyway, while Perl handles any character beautifully within variables, things get messy when you want to save these characters to a file and load them back, for example. God forbid it's json, then you're in for a wild ride. But there's a simple way that just works.

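For starters, and to be on the safe side, use utf8! The pragma, I mean. It tells Perl that the script's own source is UTF-8, so any non-ASCII literals you type are read as characters: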

use utf8;
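A quick way to see what the pragma changes is to compare the character count of a literal with its size in bytes. This is only a small sketch: the word is an arbitrary example, and it assumes the script itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                  # assumption: this script is saved as UTF-8
use Encode qw(encode);

my $word  = "żółw";                           # arbitrary sample word
my $chars = length($word);                    # counts characters
my $bytes = length(encode('UTF-8', $word));   # counts bytes after encoding

binmode STDOUT, ':encoding(UTF-8)';           # avoid "Wide character" warnings
print "$word: $chars characters, $bytes bytes\n";
```

Without use utf8; the same literal would be read as raw bytes, and length($word) would report the byte count instead.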

Next, when saving text (but not binary content), open the file handle in utf8 mode – $file is the full path to the saved file, $data holds the file's contents:

# A lexical handle ($fh) instead of the global bareword FILE
if (open my $fh, '>:utf8', $file) {
  print $fh $data;
  close $fh;
}

Finally, when loading text (again, not binary content unless you want a flood of warnings!), open the file handle in utf8 encoding mode:

if (open my $fh, '<:encoding(utf8)', $file) {
  # Append each decoded line to $data
  while (my $line = <$fh>) {
    $data .= $line;
  }
  close $fh;
}
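Putting the save and load steps together, a round trip might look like the sketch below. The helper names save_text and load_text are made up for illustration, the sample string is arbitrary, and File::Temp is only there to get a scratch file.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Hypothetical helper: write a character string as UTF-8 text.
sub save_text {
    my ($file, $data) = @_;
    open my $fh, '>:utf8', $file or die "Cannot write $file: $!";
    print {$fh} $data;
    close $fh or die "Cannot close $file: $!";
    return 1;
}

# Hypothetical helper: read UTF-8 text back into a character string.
sub load_text {
    my ($file) = @_;
    open my $fh, '<:encoding(utf8)', $file or die "Cannot read $file: $!";
    my $data = '';
    while (my $line = <$fh>) {
        $data .= $line;
    }
    close $fh;
    return $data;
}

my ($tmp_fh, $file) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close $tmp_fh;

my $text = "Zażółć\n";                         # arbitrary sample text
save_text($file, $text);
my $loaded = load_text($file);

# The string has fewer characters than the file has bytes,
# but it comes back identical after decoding.
print "characters: ", length($loaded), ", bytes on disk: ", -s $file, "\n";
print "round trip ok\n" if $loaded eq $text;
```

The point of the comparison at the end is that the layers do the encoding and decoding at the file boundary, so the rest of the code only ever sees characters.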

PS. An excellent Perl file module is File::Util. Its latest version also has a nice switch to write/read utf8 files, or other binmodes for that matter.

Json burn

Now, json. I'm assuming you are including the most popular module like use JSON;. It does some utf8 operations internally, and from my tests it seems that each of the four encoding/decoding subroutines does it differently.
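To see the difference for yourself, here is a small sketch comparing the four subroutines on one value. It assumes the JSON module from CPAN is installed, and the hash is an arbitrary example. As far as the JSON documentation goes, to_json and from_json work on character strings, while encode_json and decode_json work on UTF-8 encoded bytes.

```perl
use strict;
use warnings;
use utf8;
use JSON;   # assumption: the CPAN JSON module is installed

my $object = { name => "żółw" };        # arbitrary sample data

my $chars = to_json($object);           # character string (utf8 => 0)
my $bytes = encode_json($object);       # UTF-8 encoded byte string

# Same JSON text, but the byte string is longer than the character string.
print "to_json length:     ", length($chars), "\n";
print "encode_json length: ", length($bytes), "\n";

# Each decoder expects the matching kind of input.
my $from_chars = from_json($chars);     # takes characters
my $from_bytes = decode_json($bytes);   # takes UTF-8 bytes

print "both decode to the same name\n"
    if $from_chars->{name} eq $from_bytes->{name};
```

Mixing the pairs up (for example feeding characters read through an encoding layer into decode_json) is where the trouble usually starts.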

To make it work with what I described earlier, here's my suggestion. When creating $data to be saved into a file, use:

$data = to_json($object, {pretty => 1});

It seems to_json has {utf8 => 0} set by default. pretty is just useful, but you can live without it. However, the other way around – loading json from a file – is different. Once you get the $data, you have to decode it like this:

$object = from_json($data, {utf8 => 0});

Disable utf8 explicitly. After all, it was already decoded when you opened the file with <:encoding(utf8). Yes, encoding to decode seems strange and any of the Perl Monks will give you a perfectly smart explanation as to why it's like that. I am not one of them. I just test my code and find stuff that works. Enjoy!
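For what it's worth, here is the whole json round trip in one place, as a sketch rather than the one true way. The helper names save_json and load_json are invented for this example, the data is arbitrary, and it assumes the CPAN JSON module is installed.

```perl
use strict;
use warnings;
use utf8;
use JSON;                       # assumption: the CPAN JSON module is installed
use File::Temp qw(tempfile);

# Hypothetical helper: serialize to characters, let the file layer encode.
sub save_json {
    my ($file, $object) = @_;
    my $data = to_json($object, { pretty => 1 });
    open my $fh, '>:utf8', $file or die "Cannot write $file: $!";
    print {$fh} $data;
    close $fh or die "Cannot close $file: $!";
    return 1;
}

# Hypothetical helper: let the file layer decode, then parse characters.
sub load_json {
    my ($file) = @_;
    open my $fh, '<:encoding(utf8)', $file or die "Cannot read $file: $!";
    my $data = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    return from_json($data, { utf8 => 0 });
}

my ($tmp_fh, $file) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close $tmp_fh;

save_json($file, { city => "Łódź", tags => ["ż", "ó"] });   # arbitrary data
my $object = load_json($file);

binmode STDOUT, ':encoding(UTF-8)';
print "city: $object->{city}\n";
```

The encoding happens exactly once on the way out and the decoding exactly once on the way in, both at the file handle, which is why the JSON calls keep utf8 switched off.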

f055