Migrate the remaining database tables and columns to utf8mb4 #14597
I ran this migration on a copy of the reference server a few times, and here are the results after tweaks:
Migration tested with this new commit on the production database copy
I have also tested the columns that would remain utf8mb3, just in case. Migrating those is not viable because of the time it would take, mostly due to the number of rows and the indexes that depend on them.
This commit uses utf8mb4_bin for columns where case sensitivity matters: columns we index on where differently capitalized versions of the same string may appear, and columns where fetching rows requires a case-sensitive match, for instance when comparing records.
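To illustrate why the collation choice matters for such columns, here is a minimal Python sketch. It only approximates the behaviour: `str.casefold()` stands in for a case-insensitive collation and exact string equality stands in for utf8mb4_bin, and the `distinct_keys` helper and sample values are hypothetical, not taken from this project.

```python
def distinct_keys(values, case_sensitive):
    """Return the values a unique index would keep as distinct.

    case_sensitive=True  roughly mimics a binary collation (exact match).
    case_sensitive=False roughly mimics a case-insensitive collation,
    using str.casefold() as an approximation of the comparison.
    """
    seen = {}
    for value in values:
        key = value if case_sensitive else value.casefold()
        # Keep the first value seen for each comparison key.
        seen.setdefault(key, value)
    return list(seen.values())


# Hypothetical column contents differing only by capitalization.
tags = ["Rust", "rust", "RUST"]

bin_like = distinct_keys(tags, case_sensitive=True)   # all three stay distinct
ci_like = distinct_keys(tags, case_sensitive=False)   # collapse to one entry
```

With the binary-style comparison all three spellings remain separate keys, while the case-insensitive comparison treats them as the same key, which is the situation the commit avoids for these columns.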
(Won't happen. 6 bytes is max in utf8, but Unicode in 2003 declared 21 bits to be enough for everyone and every codepoint, so it's 4 bytes max in practical utf8.)
That doesn't mean you can't include additional metadata with each character https://twitter.com/Foone/status/1345180935080321034
🤦. Let's leave error correction and cryptography to real cryptographers.
The current database is a mixture of utf8mb3 and utf8mb4, which causes encoding issues with some characters from time to time. To actually fix this and avoid any encoding issues in the future (until we move to utf1024 or something), we need to change the charset and collation of everything that isn't currently utf8mb4.
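As a rough illustration of which values are affected: utf8mb3 stores at most 3 bytes per character, so any character whose UTF-8 encoding takes 4 bytes (emoji, for example) only fits in utf8mb4. The sketch below checks this in Python; the function names and sample strings are hypothetical and not part of the migration.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest UTF-8 encoded size of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if text contains a character that takes 4 bytes in UTF-8,
    which a utf8mb3 column cannot store."""
    return max_utf8_bytes(text) > 3


# Hypothetical sample values: ASCII, accented Latin, a 3-byte symbol, an emoji.
samples = ["plain ascii", "café", "€", "😀"]
report = {s: needs_utf8mb4(s) for s in samples}
```

Only the emoji sample needs utf8mb4; the other samples fit in 3 bytes per character or fewer.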
These migrations will require downtime.
Fixes #14411