Contributor forum content gets cut when there's an emoji #435
Comments
@kelimuttu thanks for reporting this. It seems to be happening because of some weird interaction between Django and the DB. From the limited debugging I did, a string with an emoji in it seems to be passed to the DB properly, but on retrieval it is cut as you say: everything from the emoji onward is missing. Our planned upgrade to Python 3 might magically fix this, so we'll revisit it after that. |
Thanks for looking into it, Leo. Can you please update here once it lands on the staging site so I can test it out? |
The Python 3 upgrade has been live in prod for a few months. I suspect this is a limitation of our DB, but let's investigate whether that is the case. |
Blocked by #765 |
Rediscovered this after looking at a recently filed bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1723190 This looks to be a bizarre problem in MySQL where |
Currently our MySQL DB is set up to use utf8mb3, which cannot store 4-byte UTF-8 characters. It'll truncate the field just before the offending character, which can lead to some pretty major data loss.
This change adds a couple of fields which will strip out those characters or, in the case of utf8mb3TextField, can be configured to store them as HTML numeric character references.
All end-user modifiable fields have been changed to use these utf8mb3 fields, and in a few carefully tested cases will store 4-byte characters as HTML numeric references.
mozilla/sumo#435
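For anyone following along, here is a rough sketch of the two behaviours that commit message describes: stripping 4-byte characters, or storing them as HTML numeric character references. This is not the code from the actual change. The names `FOUR_BYTE_RE`, `strip_4byte` and `to_numeric_refs` are made up for illustration, and the only assumption is that "4-byte characters" means code points above U+FFFF.

```python
import re

# Code points outside the Basic Multilingual Plane (U+10000 and up) are
# exactly the ones that take 4 bytes in UTF-8, which utf8mb3 cannot store.
FOUR_BYTE_RE = re.compile("[\U00010000-\U0010FFFF]")


def strip_4byte(text: str) -> str:
    """Drop every character that would need 4 bytes in UTF-8."""
    return FOUR_BYTE_RE.sub("", text)


def to_numeric_refs(text: str) -> str:
    """Replace each 4-byte character with an HTML numeric character reference."""
    return FOUR_BYTE_RE.sub(lambda m: f"&#{ord(m.group())};", text)


if __name__ == "__main__":
    sample = "Release is out \U0001F389 thanks all"
    print(strip_4byte(sample))
    print(to_numeric_refs(sample))
```

Either way the rest of the text survives, which is what the original report asked for. The numeric-reference variant only makes sense for fields that are rendered as HTML.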
When I'm trying to add the following content body in a new article ( /kb/new ):
Sentry fires an Unhandled (1366, "Incorrect string value: '\xF0\x9F\x9E\x87 t...' for column 'content' at row 1") Data Error -> https://mozilla.sentry.io/issues/4104208612/events/b4360388166b4217b968b2b8b0d8e40e/ |
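A quick way to confirm that the bytes in that error are a single 4-byte UTF-8 character (and so fall outside what utf8mb3 can hold) is to decode them. This is only a small sketch. `needs_utf8mb4` is a hypothetical helper name, not something from the codebase.

```python
# The four bytes quoted in the Sentry error message.
raw = b"\xF0\x9F\x9E\x87"

# They decode to exactly one character, above U+FFFF.
char = raw.decode("utf-8")


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in text needs 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    print(len(char), hex(ord(char)), len(char.encode("utf-8")))
    print(needs_utf8mb4(char + " t"))
```

A check like this could be run on submitted content before saving, to catch the problem ahead of the DB error.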
Thanks for testing this issue, @emilghittasv |
This should no longer be an issue once we migrate to Postgres, which we're hoping to complete by the end of June 2023. |
This has been fixed as of today! 🎉 |
Related bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1723190
I posted a community announcement in the contributor forum containing emoji earlier today. But instead of displaying the full content, it got cut off exactly where I put the emoji although the preview seems to be able to display the emoji just fine. Not sure if it was possible in the past, but even if it's not possible, I think it's enough to just remove the emoji and display the rest of the content instead of cutting the content halfway.