Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unknown collation: 'utf8mb4_col' #741

Closed
cannikin opened this issue Mar 28, 2016 · 18 comments
Closed

Unknown collation: 'utf8mb4_col' #741

cannikin opened this issue Mar 28, 2016 · 18 comments

Comments

@cannikin
Copy link

Setting up a Rails 5.0.0.beta3 app with paper_trail pointing to master. I'm deploying to an AWS RDS MySQL instance. Deploying fails when trying to migrate the DB with the following error:

Mysql2::Error: Unknown collation: 'utf8mb4_col': CREATE TABLE `versions` (`id` int AUTO_INCREMENT PRIMARY KEY, `item_type` varchar(255) NOT NULL, `item_id` int NOT NULL, `event` varchar(255) NOT NULL, `whodunnit` varchar(255), `object` longtext, `created_at` datetime) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_col

I was able to get the table created by changing the collation to "utf8mb4_unicode_ci" (recommened in the blog post that's linked to from the paper_trail-generated migration file: https://mathiasbynens.be/notes/mysql-utf8mb4)

However, once the table was created I then got another error:

Mysql2::Error: Specified key was too long; max key length is 767 bytes: CREATE  INDEX `index_versions_on_item_type_and_item_id`  ON `versions` (`item_type`, `item_id`)

This is a lot less than the 191 characters you should be able to use for an index in InnoDB with utf8mb4 columns. Even when I manually shortened the name down to "type_item_id" it still complained about it being too long.

This database was originally created with utf8/utf8_general_ci and then modified to utf8mb4. Maybe that has something to do with it? I can do further testing (drop DB and re-create with different encoding/collation) if needed, just let me know!

@cannikin
Copy link
Author

I was able to get the migrations to complete by just commenting out the options completely.

That seems like the nuclear option...hopefully something else can be done. :)

@jaredbeck
Copy link
Member

Thanks for the issue report, Rob. Yeah, it seems like collations are kind of a disaster in MySQL. They're a big source of pain for developers, anyway. I prefer postgres' decision to have a unified collation for the entire database.

Deploying fails when trying to migration .. Mysql2::Error: Unknown collation: 'utf8mb4_col'

Are you using an old version of MySQL Server? I think utf8mb4 was added in MySQL 5.5.3. (https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html)

@cannikin
Copy link
Author

Hi @jaredbeck The AWS RDS instance is at 5.6.22. It doesn't seem to have a problem with the encoding being utf8mb4, but doesn't like that particular collation. When I switched to utf8mb4_unicode_ci it was okay (other than the following problem about index key length).

@jaredbeck
Copy link
Member

I was able to get the table created by changing the collation to "utf8mb4_unicode_ci"

Sounds like we should change the migration template to utf8mb4_unicode_ci then. I don't remember where I got utf8mb4_col from, but it doesn't appear in the list when I run show collation like 'utf8%';.

However, once the table was created I then got another error: Specified key was too long; max key length is 767 bytes: CREATE INDEX index_versions_on_item_type_and_item_id ON versions (item_type, item_id)
This is a lot less than the 191 characters you should be able to use for an index in InnoDB with utf8mb4 columns.

767 bytes divided by a worst-case scenario of 4 bytes per character does give you 191 characters.

Even when I manually shortened the name down to "type_item_id" it still complained about it being too long.

I don't think the error refers to the length of the index name, I think it refers to the length of the data. How is item_type defined? Is it varchar(255)? Maybe we should try varchar(191) instead.

References:

@carsonreinke
Copy link

The collation probably needs to be utf8mb4_general_ci (or _cs), I don't believe unicode is guaranteed to be there. See this.

@carsonreinke
Copy link

@cannikin What does SHOW COLLATION LIKE 'utf8mb4_%'; say?

@cannikin
Copy link
Author

@jaredbeck The migration that was created by the generator specifies the column as:

t.string   :item_type, null: false

@cannikin
Copy link
Author

@carsonreinke Here's the output:

Collation Charset Id Default Compiled Sortlen
utf8mb4_general_ci utf8mb4 45 Yes Yes 1
utf8mb4_bin utf8mb4 46 Yes 1
utf8mb4_unicode_ci utf8mb4 224 Yes 8
utf8mb4_icelandic_ci utf8mb4 225 Yes 8
utf8mb4_latvian_ci utf8mb4 226 Yes 8
utf8mb4_romanian_ci utf8mb4 227 Yes 8
utf8mb4_slovenian_ci utf8mb4 228 Yes 8
utf8mb4_polish_ci utf8mb4 229 Yes 8
utf8mb4_estonian_ci utf8mb4 230 Yes 8
utf8mb4_spanish_ci utf8mb4 231 Yes 8
utf8mb4_swedish_ci utf8mb4 232 Yes 8
utf8mb4_turkish_ci utf8mb4 233 Yes 8
utf8mb4_czech_ci utf8mb4 234 Yes 8
utf8mb4_danish_ci utf8mb4 235 Yes 8
utf8mb4_lithuanian_ci utf8mb4 236 Yes 8
utf8mb4_slovak_ci utf8mb4 237 Yes 8
utf8mb4_spanish2_ci utf8mb4 238 Yes 8
utf8mb4_roman_ci utf8mb4 239 Yes 8
utf8mb4_persian_ci utf8mb4 240 Yes 8
utf8mb4_esperanto_ci utf8mb4 241 Yes 8
utf8mb4_hungarian_ci utf8mb4 242 Yes 8
utf8mb4_sinhala_ci utf8mb4 243 Yes 8
utf8mb4_german2_ci utf8mb4 244 Yes 8
utf8mb4_croatian_ci utf8mb4 245 Yes 8
utf8mb4_unicode_520_ci utf8mb4 246 Yes 8
utf8mb4_vietnamese_ci utf8mb4 247 Yes 8

@cannikin
Copy link
Author

But, should paper_trail really be specifying an encoding and collation in the migration? If I've got UTF8 as my encoding maybe I'll miss out on higher order characters, but isn't that my own problem? It seems like hardcoding it into the generated migration is just asking for trouble.

I don't think I've ever seen a gem that outputs a migration with hardcoded encoding and collations like this. Would it be enough to simply return a warning message in the output of the generator, or a comment in the generated migration explaining the pitfalls of not having utf8mb4?

@jaredbeck
Copy link
Member

But, should paper_trail really be specifying an encoding and collation in the migration?

I think we should try to provide full unicode support out of the box if possible. If we do not specify an encoding, we'll get latin1 which would be a problem for our international users.

@cannikin
Copy link
Author

Ahh, so those settings only apply to this one table, that makes sense.

@jaredbeck
Copy link
Member

The collation probably needs to be utf8mb4_general_ci (or _cs), I don't believe unicode is guaranteed to be there.

Thanks Carson, I'll try utf8mb4_general_ci.

@jaredbeck
Copy link
Member

Rob, I've reproduced the "Unknown collation: 'utf8mb4_col'" error in our test suite, just FYI: https://travis-ci.org/airblade/paper_trail/jobs/119905232

@cannikin
Copy link
Author

Yup that's the one! Now if you change that collation lets see if you get the same key length error...

jaredbeck added a commit that referenced this issue Mar 31, 2016
Should fix the "Unknown collation: 'utf8mb4_col'" error from
#741
jaredbeck added a commit that referenced this issue Mar 31, 2016
Should fix the "Unknown collation: 'utf8mb4_col'" error from
#741
@carsonreinke
Copy link

I believe you can just leave out the COLLATION and just specify the CHARSET.

@jaredbeck
Copy link
Member

I believe you can just leave out the COLLATION and just specify the CHARSET.

Thanks Carson, but unless you see a problem with specifying COLLATE I think I'd rather be explicit.

@jaredbeck
Copy link
Member

Yup that's the one! Now if you change that collation lets see if you get the same key length error...

No dice: https://travis-ci.org/airblade/paper_trail/jobs/119910848 You can see in this build that versions was created with explicit COLLATE=utf8mb4_general_ci but it does not reproduce the "Specified key was too long". I wonder if we need a large value in the item_type column in order to reproduce it. Do you have any really long values in your versions.item_type column, greater than 191 chars?

@jaredbeck
Copy link
Member

Closed by 0435936 which sets the collation to utf8mb4_general_ci

[I cannot] reproduce the "Specified key was too long". I wonder if we need a large value in the item_type column in order to reproduce it. Do you have any really long values in your versions.item_type column, greater than 191 chars?

Rob, if you are able to provide steps to reproduce the "Specified key was too long" using 0435936 or later please open a new issue, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants