Some of the "sjis" characters are not returned correctly #1464
Comments
Yes, this is by design. MySqlConnector delegates all character set conversions to MySQL Server and only transmits UTF-8 on the wire.
That sounds likely. Have you reported it at bugs.mysql.com?
I have not reported a bug at mysql.com, but I looked at it a bit further, and it seems that the 0xFBFC character code is invalid in the sjis encoding because:
Then, MySQL uses simple static arrays for mapping the sjis characters to Unicode, and the region with the 0xFB.. prefix is also empty (see ctype-sjis.cc); this is probably why it does not convert them to utf8mb4 correctly. However, customers use the sjis character set to store these extended characters, and MySQL allows that. These characters can then be retrieved with the official dotnet connector as shown above, but not with MySqlConnector.
This seems like a difference between Shift JIS (https://en.wikipedia.org/wiki/Shift_JIS) and CP932 (https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)). Arguably it's a bug to define a DB column as using
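The Shift JIS versus CP932 distinction can be illustrated outside MySQL. The following is a minimal sketch in Python, not code from this thread; it assumes Python's built-in `shift_jis` and `cp932` codecs are a fair stand-in for the two mappings (they are not MySQL's own tables), and `decode_or_none` is a hypothetical helper introduced only for this example.

```python
# The two bytes that the official connector receives for the character.
raw = bytes([0xFB, 0xFC])


def decode_or_none(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# CP932 (Microsoft's Shift JIS variant) includes the IBM extension rows.
as_cp932 = decode_or_none(raw, "cp932")

# Python's plain Shift JIS codec has no mapping for a 0xFB lead byte.
as_sjis = decode_or_none(raw, "shift_jis")

print("cp932:    ", as_cp932)
print("shift_jis:", as_sjis)
```

If the codecs behave as assumed, the bytes decode under `cp932` but are rejected under `shift_jis`, which matches the behavior described in this thread.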
Customers use our software with their own databases, and we do not control the data structure.
I have no plans to support charset conversion in MySqlConnector; it only speaks UTF-8. Customers should be advised to not store
Software versions
MySQL 5.7.34, MySQL 8.0.22
Describe the bug
Some of the CHARSET=sjis characters (Japanese) are returned as question marks when they are read by MySqlConnector. Exactly the same code works fine with the official MySql.Data.MySqlClient connector.
Exception
n/a
Code sample
The following app prints this when it is compiled with the official MySQL connector (using MySql.Data.MySqlClient namespace) as expected:
but the 髙 character gets replaced with a question mark when using the MySqlConnector namespace:
Expected behavior
A U+9AD9 character (髙) is expected to be returned; however, we get the regular question mark placeholder U+003F (?).
Additional context
MySqlConnector seems to configure the connection to receive all content in Unicode, and there must be a bug in MySQL because it sends the question mark in this case. I can see the 0x3f character in the buffer under the debugger.
However, the official connector receives the 0xFB, 0xFC bytes, and it converts them to the correct 髙 character. The same behavior can be achieved by setting the character_set_results variable as
but then we still get the question mark in the end because we try to read it as a UTF-8 string and this byte sequence is not valid UTF-8:
MySqlConnector/src/MySqlConnector/ColumnReaders/StringColumnReader.cs
Line 12 in bbdbd78
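The reason the raw bytes cannot be read as UTF-8 can be checked in isolation. This is a small Python sketch rather than the connector's code; Python substitutes U+FFFD for invalid bytes in lenient mode, and the exact placeholder produced by .NET may differ, so treat the replacement character here as an assumption about the analogous behavior.

```python
# The sjis/CP932 bytes returned when character_set_results is changed.
raw = bytes([0xFB, 0xFC])

# Strict decoding rejects the sequence: 0xFB is not a valid UTF-8 lead byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes replacement characters instead of raising.
lenient = raw.decode("utf-8", errors="replace")

# For comparison, the UTF-8 encoding of U+9AD9 is a different, 3-byte sequence.
utf8_bytes = "\u9ad9".encode("utf-8")

print("strict decode succeeded:", strict_ok)
print("lenient result:", [hex(ord(ch)) for ch in lenient])
print("UTF-8 bytes for U+9AD9:", utf8_bytes.hex(" "))
```

A reader that always decodes as UTF-8 therefore cannot recover the character from these bytes; the conversion has to happen either on the server or through an explicit CP932 decode on the client.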
This is how I created the test data: