Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use utf8mb4 character set by default for MySQL database #33608

Merged
merged 4 commits into from Sep 11, 2018

Conversation

@yahonda
Copy link
Contributor

yahonda commented Aug 13, 2018

Summary

This pull request implements #33596. It includes these changes:

  • Replace utf8 character set with utf8mb4 to support supplementary characters including emoji
  • Removed utf8_unicode_ci collation from Active Record unit test databases to let MySQL server use the default collation for the character set
  • Bump the minimum version of MySQL to 5.7.9 and MariaDB to 10.2.2 to support utf8mb4 character set and 3072 bytes key length with InnoDB
  • Addressed Specified key was too long; max key length is 1000 bytes for MyISAM table in the test by using InnoDB storage engine
  • CI against MySQL 5.7
@rails-bot

This comment has been minimized.

Copy link

rails-bot commented Aug 13, 2018

r? @kamipo

(@rails-bot has picked a reviewer for you, use r? to override)

@yahonda

This comment has been minimized.

Copy link
Contributor Author

yahonda commented Aug 14, 2018

@jeremy I have opened a work in progress PR for #33596. There is one failure with PostgreSQL 9.2 https://travis-ci.org/rails/rails/jobs/415702539 . I don't think it is relevant to my pull request.

@yahonda yahonda force-pushed the yahonda:use_utf8mb4 branch Aug 14, 2018

@yahonda

This comment has been minimized.

Copy link
Contributor Author

yahonda commented Aug 14, 2018

Restarted CI by changing the last commit hash and found the failure with PostgreSQL 9.2 https://travis-ci.org/rails/rails/jobs/415715778 needs addressed by changing .travis.yml not to upgrade MySQL server if PostgreSQL 9.2 is configured or something like that.

Bottom line: All of CI against MySQL 5.7 is green.

@yahonda

This comment has been minimized.

Copy link
Contributor Author

yahonda commented Aug 14, 2018

Another idea is dropping PostgreSQL 9.2 support for Rails 6 since PostgreSQL 9.2 itself already EOLed https://www.postgresql.org/support/versioning/ .

@yahonda yahonda force-pushed the yahonda:use_utf8mb4 branch 24 times, most recently Aug 14, 2018

if version < "5.7.9"
raise "Your version of MySQL (#{version_string}) is too old. Active Record supports MySQL >= 5.7.9."
elsif mariadb? && version < "10.2.2"
raise "Your version of MariaDB (#{version_string}) is too old. Active Record supports MariaDB >= 10.2.2"

This comment has been minimized.

@jeremy

jeremy Sep 4, 2018

Member

Bumping the minimum MySQL/MariaDB versions is important for new apps creating a database using the default charset, but it also affects all existing apps, many of which have already switched to utf8mb4 on their own and would continue to behave properly after this change despite running an older database.

Perhaps we could require a newer db for new apps but allow existing apps to continue to use older versions. One possibility is to have create_database do the version check, but only when :charset is not specified.

This comment has been minimized.

@yahonda

yahonda Sep 5, 2018

Author Contributor

Thanks for the comment.

I still wanted to bump the minimum version of MySQL and MariaDB to support utf8mb4 character set without any my.cnf changes.

I believe many users have successfully migrated to utf8mb4 character set using MySQL 5.5 and 5.6.
Also, there is plenty number of users who just configured utf8mb4 character set and have been struggling with "ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes". i.e. Googled "utf8mb4 Specified key was too long" and found more than 10,000 articles.

To address this error with MySQL 5.5 and 5.6, my.cnf needs update to change innodb_file_per_table, innodb_file_format and innodb_large_prefix to non-default value. MySQL 5.7.9 and MariaDB 10.2.2 introduces innodb_default_row_format to allow longer 3072 byte key size by default. No need to change my.cnf at all to support utf8mb4 character set.

I think the biggest advantage to bump MySQL version to 5.7 and MariaDB 10.2.2 is no need to change my.cnf at all to support utf8mb4 character set.

I might have misunderstood your point. Further discussions are appreciated.

This comment has been minimized.

@rafaelfranca

rafaelfranca Sep 5, 2018

Member

I get all the reasoning but should not that decision come from the users? If they want to change the my.cnf file to have support to utf8mb4 but still use Rails 6.0 with MySQL 5.5 they should be able to. I don't see any reason that an application running MySQL 5.5 with the proper my.cnf config can't work with your current code. If it can't I believe it is better to only change the default when the version is 5.7.9 than forcing everyone to upgrade MySQL just to use Rails 6.

This comment has been minimized.

@yahonda

yahonda Sep 5, 2018

Author Contributor

Thanks for the feedback. I think I get points of review from Jeremy and Rafael and still, I'd like to keep this pull request as it is for now.

Because without this pull request users can use utf8mb4 with MySQL 5.5 with some necessary changes in my.cnf. I am thinking that the main point of this pull request is "by default".

Upgrading MySQL server version is not an easy task,
but still, it is a good timing bump the minimum requirement of MySQL when Rails major version bumps from 5 to 6.

I'd like to see some more feedback if many of these feedback prefers not to bump the minimum version of MySQL I'll revise or close this pull request.

Thanks again for the useful feedback.

This comment has been minimized.

@rafaelfranca

rafaelfranca Sep 6, 2018

Member

@matthewd @tenderlove can you give us your opinion about this?

This comment has been minimized.

@jeremy

jeremy Sep 6, 2018

Member

@yahonda Agreed that Rails 6 apps should have correct "by default" utf-8 charset behavior with MySQL/MariaDB and that bumping the required db version is a strong way to guarantee it.

I suggest we allow existing deployed apps to upgrade to Rails 6 if they have taken responsibility for compatibility. For example, if they have already migrated to utf8mb4 on an AWS Aurora cluster with MySQL 5.6, then they already set encoding: utf8mb4 in config/database.yml and we can trust their configuration.

So perhaps we can require modern databases by default, but also allow "migrated" databases:

  • New Rails 6 apps require modern MySQL/MariaDB.
  • Upgrading Rails 6 apps that do not specify encoding in database.yml, and hence inherit the new default encoding, also require modern MySQL/MariaDB.
  • Upgrading Rails 6 apps that explicitly specify encoding in database.yml are assumed to be using a compatible, properly configured database.

I think we could do this with a version check in initialize that's only triggered when !config.include?(:encoding).

This comment has been minimized.

@jeremy

jeremy Sep 6, 2018

Member

To be clear, I'm not suggesting closing this PR either! This change is very desirable, and so is an "escape hatch" for existing app deployments that are already compatible.

If there are other MySQL/MariaDB features we'd like to rely on, or old database support we'd like to drop from the adapter, we can certainly update our minimum version requirements for those reasons as well.

This comment has been minimized.

@matthewd

matthewd Sep 6, 2018

Member

I think I get lost on the version interactions here... if new/recent MySQL/MariaDB has good default behaviour for encoding, and older versions can be configured to have the good behaviour (but might not be), and if you're using an old version that isn't configured well then everything will still work fine but you'll get an error from some DDL if you aren't overriding default lengths (is that right?)... then I'm not sure we need to do anything.

Bumping our compatible server versions just to enforce "good encodings are good" seems pretty aggressive to me. It may not be our recommended configuration, but I'd want compelling internal driver benefits to warrant outright refusing to talk to peoples databases.

You should use utf8mb4, which is also what you get automatically from a modern DB server... but I don't see why even a brand new application shouldn't still work with an arbitrarily-encoded database, if the only implication is that you might have encoding issues. There are plenty of app-external things you should be doing, but it's not our job to enforce them. ("The kernel version you're using has a known bug in the ext4 driver; refusing to boot Rails"?)

This comment has been minimized.

@yahonda

yahonda Sep 8, 2018

Author Contributor

Thanks for additional feedbacks. I'll update my pull request to relax database version requirements.
I need to take a look at the differences between charset and encoding first.

@rafaelfranca rafaelfranca requested review from matthewd and tenderlove Sep 6, 2018

@@ -251,7 +253,7 @@ def create_database(name, options = {})
if options[:collation]
execute "CREATE DATABASE #{quote_table_name(name)} DEFAULT COLLATE #{quote_table_name(options[:collation])}"
else
execute "CREATE DATABASE #{quote_table_name(name)} DEFAULT CHARACTER SET #{quote_table_name(options[:charset] || 'utf8')}"
execute "CREATE DATABASE #{quote_table_name(name)} DEFAULT CHARACTER SET #{quote_table_name(options[:charset] || 'utf8mb4')}"

This comment has been minimized.

@jeremy

jeremy Sep 6, 2018

Member

I noticed that lib/active_record/tasks/mysql_database_tasks.rb is using config[:encoding] as default charset, whereas we fall back to utf8mb4 here regardless of configured encoding.

def creation_options
  Hash.new.tap do |options|
    options[:charset]     = configuration["encoding"]   if configuration.include? "encoding"

It's preexisting behavior, but should we do the same here? e.g. options[:charset] || @config[:encoding] || 'utf8mb4'

arunit2:
username: rails
encoding: utf8
encoding: utf8mb4

This comment has been minimized.

@jeremy

jeremy Sep 6, 2018

Member

I wish we had named this :charset instead of :encoding long ago 😅

yahonda added some commits Aug 13, 2018

CI against MySQL 5.7
Followed this instruction and changed root password to empty string.
https://docs.travis-ci.com/user/database-setup/#MySQL-57
Bump the minimum version to MySQL 5.7.9 and MariaDB 10.2.2
to support utf8mb4 character set and `innodb_default_row_format`

MySQL 5.7.9 introduces `innodb_default_row_format` to support 3072 byte length index by default.
Users do not have to change MySQL database configuration to support Rails string type.

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_default_row_format

https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> If innodb_large_prefix is enabled (the default),
> the index key prefix limit is 3072 bytes for InnoDB tables that use DYNAMIC or COMPRESSED row format.

* Bump the minimum version of MariaDB to 10.2.2
MariaDB 10.2.2 is the first version of MariaDB supporting `innodb_default_row_format`
Also MariaDB says "MySQL 5.7 is compatible with MariaDB 10.2".

- innodb_default_row_format
https://mariadb.com/kb/en/library/xtradbinnodb-server-system-variables/#innodb_default_row_format

- "MariaDB versus MySQL - Compatibility"
https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/
> MySQL 5.7 is compatible with MariaDB 10.2

- "Supported Character Sets and Collations"
https://mariadb.com/kb/en/library/supported-character-sets-and-collations/
Use utf8mb4 character set by default
* Use utf8mb4 character set

`utf8mb4` character set supports supplementary characters including emoji.
`utf8` character set with 3-Byte encoding is not enough to support them.

There was a downside of 4-Byte length character set with MySQL 5.5 and 5.6:

"ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes"
for Rails string data type which is mapped to varchar(255) type.

MySQL 5.7 supports 3072 byte key prefix length by default.

* Remove `DEFAULT COLLATE` from Active Record unit test databases

There should be no "one size fits all" collation in MySQL 5.7.
Let MySQL server choose the default collation for Active Record
unit test databases.

Users can choose their best collation for their databases
by setting `options[:collation]` based on their requirements.

* InnoDB FULLTEXT indexes support since MySQL 5.6
it does not have to use MyISAM storage engine whose maximum key length is 1000 bytes.
Using MyISAM storag engine with utf8mb4 character set would cause
"Specified key was too long; max key length is 1000 bytes"

https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html

* References

"10.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)"
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html

"10.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)"
https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html

"14.8.1.7 Limits on InnoDB Tables"
https://dev.mysql.com/doc/refman/5.7/en/innodb-restrictions.html
> If innodb_large_prefix is enabled (the default), the index key prefix limit is 3072 bytes
> for InnoDB tables that use DYNAMIC or COMPRESSED row format.

@yahonda yahonda force-pushed the yahonda:use_utf8mb4 branch from bf0d709 to 1b047bf Sep 11, 2018

@jeremy jeremy added this to the 6.0.0 milestone Sep 11, 2018

@jeremy

jeremy approved these changes Sep 11, 2018

Copy link
Member

jeremy left a comment

Excellent. Thank you for this contribution @yahonda!

@jeremy jeremy merged commit d54d0c9 into rails:master Sep 11, 2018

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@yahonda

This comment has been minimized.

Copy link
Contributor Author

yahonda commented Sep 11, 2018

Thanks for merging.
Let me have some time to work on a pull request to fall back to utf8 if utf8mb4 is not available by version or by configuration.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 12, 2018

Use `utf8` character set by default if MySQL server does not support …
…longer index key prefix

Once rails#33608 merged If users create a new database using MySQL 5.1.x, it will fail to create databases
since MySQL 5.1 does not know `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let create_database method handles default character set by MySQL server version.

`supports_longer_index_key_prefix?` method will need to validate if MySQL 5.5 and 5.6 server configured
correctly to support longer index key prefix, but not yet.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 12, 2018

Use `utf8` character set by default if MySQL server does not support …
…longer index key prefix

Once rails#33608 merged If users create a new database using MySQL 5.1.x, it will fail to create databases
since MySQL 5.1 does not know `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let create_database method handles default character set by MySQL server version.

`supports_longer_index_key_prefix?` method will need to validate if MySQL 5.5 and 5.6 server configured
correctly to support longer index key prefix, but not yet.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 12, 2018

Use `utf8` character set by default if MySQL server does not support …
…longer index key prefix

Once rails#33608 merged If users create a new database using MySQL 5.1.x, it will fail to create databases
since MySQL 5.1 does not know `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let create_database method handles default character set by MySQL server version.

`supports_longer_index_key_prefix?` method will need to validate if MySQL 5.5 and 5.6 server configured
correctly to support longer index key prefix, but not yet.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 13, 2018

Validate if `utf8mb4` set and longer index key prefix is supported
Once rails#33608 merged If users create a new database using MySQL 5.1.x, it will fail to create databases
since MySQL 5.1 does not know `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let create_database method handles default character set by MySQL server version.

`supports_longer_index_key_prefix?` method will need to validate if MySQL 5.5 and 5.6 server configured
correctly to support longer index key prefix, but not yet.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 13, 2018

Validate if `utf8mb4` character set and longer index key prefix is su…
…pported

Once rails#33608 merged If users create a new database using MySQL 5.1.x, it will fail to create databases
since MySQL 5.1 does not know `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let create_database method handles default character set by MySQL server version.

`supports_longer_index_key_prefix?` method will need to validate if MySQL 5.5 and 5.6 server configured
correctly to support longer index key prefix, but not yet.

yahonda added a commit to yahonda/rails that referenced this pull request Sep 13, 2018

Validate if `utf8mb4` character set and longer index key prefix is su…
…pported

Once rails#33608 merged If users create a new database using MySQL 5.1.x, it will fail to create databases
since MySQL 5.1 does not know `utf8mb4` character set.

This pull request removes `encoding: utf8mb4` from `mysql.yml.tt`
to let create_database method handles default character set by MySQL server version.

`supports_longer_index_key_prefix?` method will need to validate if MySQL 5.5 and 5.6 server configured
correctly to support longer index key prefix, but not yet.

@yahonda yahonda deleted the yahonda:use_utf8mb4 branch Sep 23, 2018

yahonda added a commit to yahonda/rails that referenced this pull request Dec 16, 2018

yahonda added a commit to yahonda/rails that referenced this pull request Dec 16, 2018

yahonda added a commit to yahonda/rails that referenced this pull request Dec 24, 2018

yahonda added a commit to yahonda/rails that referenced this pull request Dec 24, 2018

yahonda added a commit to yahonda/rails that referenced this pull request Dec 24, 2018

yahonda added a commit to yahonda/rails that referenced this pull request Dec 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.