emoji encoding issue with mysql adapter #11184

Closed
armhold opened this issue Jun 29, 2013 · 3 comments

Comments

@armhold

armhold commented Jun 29, 2013

Hi,

I ran into this while migrating an app from Rails 3 to 4. It seems that some Unicode-escaped strings (emoji specifically) raise ActiveRecord::StatementInvalid: Mysql2::Error: Incorrect string value.

Here's a gist: https://gist.github.com/armhold/5892370

This does not seem to be an issue with the sqlite adapter, so you will have to set up a MySQL instance (sorry!) to reproduce it. I'm using mysql-5.5.23-osx10.6-x86_64.

If you run it against activerecord 3.2.13 it works fine.

Thanks for investigating.

@ryan-endacott
Contributor

It looks like this could be caused by this MySQL issue. You could try the suggestion in the answer as a temporary fix. These questions also look like they may have a fix. If it's supported in Rails 4, setting the encoding to utf8mb4 should work.

As far as ActiveRecord goes, it looks like it uses the same default charset (utf8) in both 3.2.13 and 4.0.0, so I still haven't found what is causing the issue. They also use the same default collation (utf8_unicode_ci). It looks like the bug is with MySQL itself, but that doesn't explain it working correctly for ActiveRecord 3.2.13.

I wasn't able to test with 3.2.13 because bundle install failed with the rails-dev-box on 3.2.13. I did find an article that seems to have the same issue as early as 3.2.12 though. It looks like support for utf8mb4 was added in 8744632.

I hope some of this helps!

@armhold
Author

armhold commented Jul 1, 2013

@ryan-endacott, thanks for taking the time to investigate. I agree that the root of the problem is in fact a MySQL issue, specifically that its utf8 character set does not cover all of UTF-8. As this commenter explains, characters outside the Basic Multilingual Plane need 4 bytes in UTF-8, so they cannot be stored in MySQL's utf8, which uses at most 3 bytes per character.
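
To make the byte-count point concrete, here is a minimal Ruby sketch. The helper name needs_utf8mb4? is made up for illustration and is not part of Rails or mysql2; it only checks whether any character in a string takes more than 3 bytes in UTF-8.

```ruby
# Hypothetical helper (not part of Rails or mysql2): returns true when any
# character needs more than 3 bytes in UTF-8, which is the case for every
# character outside the Basic Multilingual Plane.
def needs_utf8mb4?(str)
  str.each_char.any? { |ch| ch.bytesize > 3 }
end

fire = "\u{1f525}" # the emoji from the report, outside the BMP
euro = "\u{20ac}"  # EURO SIGN, inside the BMP

puts fire.bytesize        # prints 4
puts euro.bytesize        # prints 3
puts needs_utf8mb4?(fire) # prints true
puts needs_utf8mb4?(euro) # prints false
```

A check like this could be used to flag input that a 3-byte utf8 column would reject, though changing the column encoding is the real fix.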

I was a bit mystified as to how it was working under 3.2.13 but not 4.0.0. Fleshing out my test a bit more I find that while the write "succeeds" under 3.2.13, it silently truncates my data; I get an empty string back instead of \u{1f525} if I try to read the record back. The exception thrown by 4.0.0 is obviously preferable to silently losing data.

The path forward here seems to be using the 'utf8mb4' encoding along with the 'utf8mb4_unicode_ci' collation. Rails 4 seems to support this, but unfortunately the latest released mysql2 gem does not. There's a nice writeup here on how to use the git head version of mysql2 to make that work.
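
As a rough sketch of what those settings look like, here they are as a Ruby hash using the same keys one would put in database.yml. The database name and username are placeholders, and the constant name is made up for illustration.

```ruby
# Connection settings mirroring the keys used in database.yml.
# The database name and username are placeholders, not from the report.
UTF8MB4_CONFIG = {
  adapter:   "mysql2",
  encoding:  "utf8mb4",
  collation: "utf8mb4_unicode_ci",
  database:  "myapp_development", # hypothetical
  username:  "root"               # hypothetical
}.freeze

# With ActiveRecord loaded and a mysql2 build that accepts utf8mb4, this hash
# could be handed to the connection setup:
#   ActiveRecord::Base.establish_connection(UTF8MB4_CONFIG)
```

This only covers the connection side; it assumes the tables and columns themselves are also using utf8mb4.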

Either that or I will bite the bullet and finally move all my apps to Postgres. :-)

armhold closed this as completed Jul 1, 2013
@ryan-endacott
Contributor

@armhold, I was also mystified at it seemingly working for 3.2.13! I'm glad you figured it out.

It looks like it no longer silently truncates the data because of this change in the release notes:

mysql and mysql2 connections will set SQL_MODE=STRICT_ALL_TABLES by default to avoid silent data loss. This can be disabled by specifying strict: false in your database.yml.
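
For illustration, the opt-out mentioned in that note might look like the following when written as a Ruby hash with database.yml-style keys. The constant name and database name are placeholders. Turning strict mode off would presumably bring back the 3.2.13 behaviour of silently dropping data, so this is a sketch of the setting rather than a recommendation.

```ruby
# Same keys as database.yml; strict: false is the option named in the
# release note. The database name is a placeholder, not from the report.
NON_STRICT_CONFIG = {
  adapter:  "mysql2",
  encoding: "utf8",
  database: "myapp_development", # hypothetical
  strict:   false                # opt out of SQL_MODE=STRICT_ALL_TABLES
}.freeze
```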
