Skip to content

MySQL UTF 8 Encoding issues

Brendon Muir edited this page Jul 10, 2018 · 2 revisions

utf8mb4 is a MySQL encoding that supports all of Unicode. It stores characters as 1-4 bytes. utf8, on the other hand, only does 1-3 bytes, and so can't encode anything on the Supplementary Multilingual Plane.

In InnoDB, indexes have a maximum length of 767 bytes. Because MySQL assumes each character will use the maximum number of bytes it can, that means an index on a utf8mb4 column can only be floor(767/4)=191 bytes long.

Every polymorphic column in the public_activity migration creates a _type column, which is 255 bytes long. In a InnoDB/utf8mb4 DB, when the migration tries to add these columns to indexes, it fails with

Mysql2::Error: Specified key was too long; max key length is 767 bytes.

Solution

To fix this, we need the _type columns or the indexes to shorter. What I did is add this before the the add_indexes:

change_column :activities, :trackable_type, :string, :limit => 100
change_column :activities, :owner_type, :string, :limit => 100
change_column :activities, :recipient_type, :string, :limit => 100

Source: https://github.com/pokonski/public_activity/issues/218

Update

Don't just rely on the above for guidance. There's now a lot of reference material regarding converting to mb4. If you use MySQL 5.7 or greater you don't have to muddle with the index lengths anymore as you can convert to the DYNAMIC row format that allows you to store longer keys without trouble. Here's some gists for the migration process:

https://gist.github.com/tjh/1711329

and a handy guide:

https://railsmachine.com/articles/2017/05/19/converting-a-rails-database-to-utf8mb4.html

NOTE! Read everything thoroughly before you start out on your adventure! Do backups, and all that :)