New blog post

f055 committed Dec 11, 2018
1 parent 8fc6461 commit e5d82fa5e7c4fbb02437c18e938e89c7d7c49475
Showing with 51 additions and 0 deletions.
  1. +51 −0 post/How-to-use-utf8-in-Perl-and-dont-go-crazy.md
# 😱 How to use utf8 in Perl and not go crazy

UTF-8 is the only way to fly, and I hope everyone can agree with that. It's like mescaline :P Anyway, while Perl handles any character beautifully within variables, things get messy when you want to save these characters to a file and load them back, for example. God forbid it's JSON, then you're in for a wild ride. But there's a simple way that just works.

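For starters, and to be on the safe side, use utf8! The pragma, I mean. It tells Perl that the source file itself is written in UTF-8, so string literals in your code are read as characters rather than raw bytes: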

```
use utf8;
```
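
To see what the pragma changes, here is a minimal sketch that measures the same one-letter literal with and without it. The variable names are made up for illustration, and it assumes the script itself is saved as UTF-8:

```perl
use strict;
use warnings;

# The pragma is lexically scoped, so each block gets its own setting.
# With it, the literal is read as one character.
my $with_pragma = do { use utf8; length 'ï' };

# Without it, the same literal is read as the two bytes that encode it.
my $without_pragma = do { no utf8; length 'ï' };

print "with utf8: $with_pragma, without utf8: $without_pragma\n";
```

The different lengths are the whole point: without the pragma, Perl sees your non-ASCII literals as raw bytes.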

Next, when saving text (but not binary content), open the file handle in utf8 mode – `$file` is the full path to the saved file, `$data` holds the file's contents:

```
# Write $data as UTF-8 text, using a lexical file handle instead of the global FILE
if (open my $fh, '>:utf8', $file) {
    print $fh $data;
    close $fh;
}
```

Finally, when loading text (again, not binary content unless you want a flood of warnings!), open the file handle in utf8 encoding mode:

```
# Read the file line by line; the layer decodes UTF-8 bytes into characters
if (open my $fh, '<:encoding(utf8)', $file) {
    while (<$fh>) {
        $data .= $_;
    }
    close $fh;
}
```
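
Put together, the save and load steps can be checked with a quick round trip. This is only a sketch: `save_text` and `load_text` are hypothetical helper names, the temp file comes from the core `File::Temp` module, and the helpers `die` on failure instead of silently skipping the block:

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Hypothetical helper: write characters out as UTF-8 bytes.
sub save_text {
    my ($file, $data) = @_;
    open my $fh, '>:utf8', $file or die "Cannot write $file: $!";
    print {$fh} $data;
    close $fh or die "Cannot close $file: $!";
    return;
}

# Hypothetical helper: read UTF-8 bytes back in as characters.
sub load_text {
    my ($file) = @_;
    open my $fh, '<:encoding(utf8)', $file or die "Cannot read $file: $!";
    my $data = '';
    while (my $line = <$fh>) {
        $data .= $line;
    }
    close $fh;
    return $data;
}

my (undef, $file) = tempfile(UNLINK => 1);   # removed again at exit
my $original = "Zażółć gęślą jaźń\n";

save_text($file, $original);
my $loaded = load_text($file);

print $loaded eq $original ? "round trip ok\n" : "round trip failed\n";
```

If the two layers match up, the loaded string is identical to the original, character for character, while the file on disk is larger because each accented letter takes two bytes.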

*PS. An excellent Perl file module is [`File::Util`](https://metacpan.org/pod/File::Util). Its latest version also has a nice switch to write/read utf8 files, or [other binmodes](https://metacpan.org/pod/File::Util#File-Encoding-and-UTF-8) for that matter.*

## JSON burn

Now, JSON. I'm assuming you are using the most popular module, loaded with `use JSON;`. It does some utf8 operations internally, and from my tests it seems that each of the four encoding/decoding subroutines (`encode_json`, `decode_json`, `to_json`, `from_json`) does it differently.

To make it work with what I described earlier, here's my suggestion. When creating `$data` to be saved into a file, use:

```
$data = to_json($object, {pretty => 1});
```

It seems `to_json` has `{utf8 => 0}` set by default, so it returns a string of characters and leaves the byte encoding to the file handle. `pretty` is just useful, but you can live without it. However, the other way around, loading JSON from a file, is different. Once you get the `$data`, you have to decode it like this:

```
$object = from_json($data, {utf8 => 0});
```
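Disable utf8 explicitly. As far as I can tell this is also what `from_json` does when you leave the option out, but spelling it out makes the intent clear. After all, the data was already decoded when you opened the file with `<:encoding(utf8)`, and decoding it a second time would mangle it. Yes, encoding to decode seems strange and any of the Perl Monks will give you a perfectly smart explanation as to why it's like that. I am not one of them. I just test my code and find stuff that works. Enjoy!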
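
Here is a rough sketch of the whole JSON round trip. To keep it runnable without installing anything, it swaps in the object interface of the core `JSON::PP` module as a stand-in for `to_json`/`from_json`; as far as I know, `JSON->new` accepts the same calls. The sample data and variable names are made up for illustration:

```perl
use strict;
use warnings;
use utf8;
use JSON::PP;
use File::Temp qw(tempfile);

# No ->utf8 here: the coder works on characters, like {utf8 => 0}.
my $json = JSON::PP->new->pretty->canonical;

my $object = { city => 'Łódź', tags => ['żółw', 'ąę'] };
my (undef, $file) = tempfile(UNLINK => 1);   # removed again at exit

# Encode to a character string; the :utf8 layer produces the bytes.
open my $out, '>:utf8', $file or die "Cannot write $file: $!";
print {$out} $json->encode($object);
close $out or die "Cannot close $file: $!";

# The :encoding(utf8) layer turns the bytes back into characters.
open my $in, '<:encoding(utf8)', $file or die "Cannot read $file: $!";
my $data = do { local $/; <$in> };   # slurp the whole file at once
close $in;

my $copy = $json->decode($data);
print $copy->{city} eq $object->{city} ? "round trip ok\n" : "round trip failed\n";
```

The rule of thumb is to do the UTF-8 conversion in exactly one place. Here that place is the file handle, so the JSON coder stays out of it.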


[![f055](https://github.com/f055/blog/blob/master/icon.png)](https://github.com/f055/blog)
