UTF-8 Manipulation

Iain Campbell edited this page Oct 6, 2015 · 12 revisions

Enable utf8 pragma

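To manipulate UTF-8 strings, enable the utf8 pragma in every script that contains UTF-8 string literals. The pragma tells Perl that the source code itself is UTF-8 encoded, so string literals become Perl internal (character) strings.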

use Mojolicious::Lite;

use utf8;

my $name = "おおつか たろう";

This is a basic convention in Perl. Remember to also save the script itself as UTF-8.
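
The effect of the pragma can be seen by comparing the same literal with and without it. The following is a minimal sketch using only core Perl; the variable names are illustrative, and it assumes the file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # from here on, string literals are decoded as UTF-8

# With the pragma: a Perl internal string of 3 characters
my $chars = "たろう";

# Without the pragma (switched off inside this block only):
# the same literal is a string of 9 raw UTF-8 bytes
my $bytes = do { no utf8; "たろう" };

print length($chars), "\n";    # 3
print length($bytes), "\n";    # 9
```

Functions such as length(), substr() and regular expressions work per character only on the first form, which is why the pragma matters.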

Request

In Mojolicious, all strings contained in a request, such as parameter values, are converted to Perl internal strings.

# Parameter value of "foo" is a Perl internal string
my $foo = $self->req->param('foo');

If you save it to a data store such as an RDBMS, you must first encode it to a byte string using encode() from the Encode module.

use Encode 'encode';
$foo = encode('UTF-8', $foo);
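
Encoding on the way in implies decoding on the way out. The following is a minimal sketch of that round trip using only the core Encode module; $foo stands in for a decoded request parameter, and $stored and $loaded are hypothetical names for the bytes written to and read back from storage.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Stand-in for a decoded request parameter (a Perl internal string)
my $foo = "すずき";

# Encode to a UTF-8 byte string before writing to storage
my $stored = encode('UTF-8', $foo);

# Decode back to a Perl internal string after reading the bytes
my $loaded = decode('UTF-8', $stored);

print "round trip ok\n" if $loaded eq $foo;
```

The explicit decode step is only needed when the storage layer hands back raw bytes.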

Alternatively, if your DBD driver can convert Perl internal strings to byte strings itself, you can enable that feature instead.

# SQLite
my $dbh = DBI->connect($data_source, undef, undef, {sqlite_unicode => 1});

# MySQL
my $dbh = DBI->connect($data_source, $user, $password, {mysql_enable_utf8 => 1});

Note that this setting should be made when connecting to the database, not in the middle of a connection.

Rendering

When rendering HTML, Perl internal strings are automatically converted to UTF-8 byte strings. The character set should also be declared in the HTML head with a meta tag using the "http-equiv" attribute.

get '/' => 'index';
app->start;

__DATA__

@@ index.html.ep
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>タイトル</title>
  </head>
  <body>
    コンテンツ
  </body>
</html>

JSON configuration file

When you read configuration from a JSON file using the json_config plugin, the data is converted from UTF-8 byte strings to Perl internal strings, so remember to save the configuration file as UTF-8.

# Load JSON configuration file
plugin 'json_config';

JSON Rendering

When you render JSON data, the data is converted from Perl internal strings to UTF-8 byte strings, so all strings passed to the renderer must be Perl internal strings (not pre-encoded as UTF-8 or any other charset).

# JSON rendering
$self->render(json => $data);
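
To illustrate why pre-encoded strings are a problem, here is a minimal sketch that calls encode_json() and decode_json() from Mojo::JSON directly. It assumes the JSON renderer treats strings the same way as these functions; the hash key and variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use Mojo::JSON qw(decode_json encode_json);

# Correct: pass a Perl internal string, get UTF-8 bytes back
my $good = encode_json({name => 'すずき'});

# Wrong: a pre-encoded byte string gets encoded a second time
my $double = encode_json({name => encode('UTF-8', 'すずき')});

print length($good), " bytes vs ", length($double), " bytes\n";

# Decoding the correct output restores the original characters
print "round trip ok\n" if decode_json($good)->{name} eq 'すずき';
```

The double-encoded output is longer and would show up as mojibake in a client that decodes it once.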

Testing

In a test script, enable the utf8 pragma and save the script as UTF-8.

use Test::More tests => 3;
use Test::Mojo;

use utf8;

my $t = Test::Mojo->new(...);

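If you want to include a UTF-8 string in the query string of a URL, encode the value to bytes and then percent-escape it with encode() and url_escape() of Mojo::ByteStream. Escape only the value, not the whole URL, because url_escape() also escapes "/", "?" and "=". b() is a shortcut for Mojo::ByteStream->new.

# Test get request
use Mojo::ByteStream 'b';
my $name = b('すずき')->encode('UTF-8')->url_escape->to_string;
$t->get_ok("/foo?name=$name")
  ->status_is(200);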

When you post form data in a test, it is encoded as UTF-8 by default: all parameter names and values are converted from Perl internal strings to byte strings.

# Test post request
$t->post_ok('/foo', form => {name => 'すずき'})
  ->status_is(200);
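
Putting the pieces together, a self-contained test script might look like the sketch below. The /foo route that echoes the "name" parameter is hypothetical and exists only to give the test something to check; content_is() compares against the decoded response body, so the expected value is written as a Perl internal string.

```perl
use Mojolicious::Lite;
use Test::More;
use Test::Mojo;

use utf8;

# Hypothetical route: echo the decoded "name" parameter back as text
any '/foo' => sub {
  my $c = shift;
  $c->render(text => $c->param('name'));
};

my $t = Test::Mojo->new;

# Form values are encoded as UTF-8 on the way out and decoded by the app
$t->post_ok('/foo', form => {name => 'すずき'})
  ->status_is(200)
  ->content_is('すずき');

done_testing;
```

If the script is saved in another encoding or the utf8 pragma is missing, the content_is() check is the assertion most likely to fail.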