Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection is reset to SET NAMES utf8mb4 COLLATE utf8mb4_general_ci and cannot be changed #626

Closed
nicolasgarfinkiel opened this issue Mar 29, 2019 · 20 comments
Labels

Comments

@nicolasgarfinkiel
Copy link

nicolasgarfinkiel commented Mar 29, 2019

Since MySql 8.0.1, default collation is utf8mb4_0900_ai_ci. Current implementation of connection reset executes SET NAMES utf8mb4 COLLATE utf8mb4_general_ci; and it cannot be changed.

Is it possible to change this behaviour and be able to set charset and collation from connection string?

I'm more than willing to provide a pull request.

Thanks!

@bgrainger
Copy link
Member

bgrainger commented Mar 29, 2019

See also #585.

Given the requests to change this, it may make sense to provide a connection string option to set it. Connector/NET provides a CharSet option (but doesn't allow collation to be specified).

Alternatively, would it "work" to just execute SET NAMES utf8mb4 and let the server pick its default collation for that charset? That might cause unexpected behaviour and hard-to-reproduce errors on different servers; perhaps it's best to stay explicit.

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Mar 29, 2019

Thanks for the quick reply!

Better yet, shouldn't it use SET NAMES DEFAULT; to match server defaults?

I tested it and works as expected and as far as I can tell, it applies up to version 5.5.

Another concern I have is how this plays out with MySQL router, MaxScale and specially ProxySQL.

@bgrainger
Copy link
Member

bgrainger commented Mar 29, 2019

I don't think that would do the right thing (from MySqlConnector's perspective) if the server default were latin1; we want all text on the wire to be UTF-8 (i.e. utf8mb4).

@bgrainger
Copy link
Member

bgrainger commented Mar 29, 2019

BTW, what's the use case for needing connection collation to be utf8mb4_0900_ai_ci? If comparing a column (or a literal to a column), the column collation already takes precedence?

@bgrainger
Copy link
Member

bgrainger commented Mar 29, 2019

Another concern I have is how this plays out with MySQL router, MaxScale and specially ProxySQL.

Can you be more specific about your concerns?

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Mar 29, 2019

You have a point. But then again, if I have a default charset and collation on the server side, I would expect the client to default to that when I open a client connection without specifying otherwise.

So that brings me back to the original issue... Is it possible to add this options (charset and collation) to the connection string? If not set, we can leave the current behaviour, but leave the door open to customize it.

We can also just implement CharSet with SET NAMES <your_charset>; to keep compatible with Connector/NET.

@bgrainger
Copy link
Member

bgrainger commented Mar 29, 2019

But then again, if I have a default charset and collation on the server side, I would expect the client to default to that when I open a client connection without specifying otherwise.

I'm strongly opposed to this. There are dozens of different character sets that MySQL supports (and hundreds of Stack Overflow questions about "why is my string corrupted in the database"). Using the server defaults would require MySqlConnector to support all of them; instead, it will always enforce UTF-8 on the wire, no matter the server default.

Is it possible to add this options (charset and collation) to the connection string?

Yes, it is possible. I'm still curious about the use case that necessitates configuring the connection collation; can you provide more details?

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Mar 29, 2019

My use case is connecting to a bunch of replicas through ProxySQL and don't know if multiple statements over the same command get balanced or go to the same server and if their connection multiplexing feature plays a role in there as well (from the docs it seems it doesn't). Same thing go for multiple commands on a single client connection.

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Mar 30, 2019

I'm strongly opposed to this. There are dozens of different character sets that MySQL supports (and hundreds of Stack Overflow questions about "why is my string corrupted in the database"). Using the server defaults would require MySqlConnector to support all of them; instead, it will always enforce UTF-8 on the wire, no matter the server default.

Agreed. But now the client is setting for charset and collation (SET NAMES utf8mb4 COLLATE utf8mb4_general_ci;), maybe a solution is just SET NAMES utf8mb4?

Although, I would also add this to be changed in the connection string as an advanced option.

Yes, it is possible. I'm still curious about the use case that necessitates configuring the connection collation; can you provide more details?

Sure! Kindly see my reply above.

@bgrainger
Copy link
Member

bgrainger commented Mar 30, 2019

I don't know about ProxySQL internals, but I'd be highly surprised if the same physical connection was moved between different replicas. There's handshaking to connect to a specific server, then each server maintains state (e.g., server variables, user-defined variables) for the lifetime of the connection. It would be a major breaking change for clients if SET @myvar = 'a'; was executed against one backend, then the next command (on that same connection) was executed against another.

On the MySqlConnector side, a single MySqlConnection will represent the same physical connection until it's closed, at which point it'll be returned to the pool and made available to another MySqlConnection object. (There's an exception to this for EnlistTransaction() with AutoEnlist=false but that's a niche edge case.)

So changing the connection collation should be persistent until the connection is closed.

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Mar 30, 2019

I agree with you, and I would expect ProxySQL to do right thing (although multiplexing might throw someone a curve ball there).

But now, using MySQL 8.0.1+ with MySqlConnector, and on a second command I execute ("fresh" connection from the pool), I get:

MySqlException: Illegal mix of collations (utf8mb4_0900_ai_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '='

Which forces me to either set SET NAMES utf8mb4 on each connection, or use utf8mb4_general_ci for my server, tables, etc.

So, I beleive that for better onward support, we can default to SET NAMES utf8mb4 and add CharSet to the connection string. What do you think?

Happy to make the changes myself!

@bgrainger
Copy link
Member

bgrainger commented Mar 30, 2019

on a second command

What is the second command? (I just want to understand this failure better.)

So, I beleive that for better onward support, we can default to SET NAMES utf8mb4 and add CharSet to the connection string.

What possible values do you anticipate the CharSet connection string option taking?

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Apr 1, 2019

What is the second command? (I just want to understand this failure better.)

Here's a test case that reproduces the exception. The first test is from our use case, the second is a simple query that sets a local variable (might be a problem inside stored procedures?). What's more, I haven't been able to replicate it for ut8mb4, but AFAIK, for some charsets, a difference in collation doesn't break the query but renders indexes unusable.

    public class UnitTest
    {
        
        const string Schema = @"
            DROP SCHEMA IF EXISTS test_db;
            CREATE SCHEMA test_db;
            USE test_db;
            CREATE TABLE `test_table` (
              `id` int(11) NOT NULL AUTO_INCREMENT,
              `test_col` varchar(10) DEFAULT NULL,
              PRIMARY KEY (`id`),
              KEY `ix_test` (`test_col`)
            );
            INSERT INTO `test_db`.`test_table` (`test_col`)
            VALUES ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'), ('i'), ('j');
        ";
        
        const string ConnectionString = "server=192.168.11.10;port=31000;uid=root;pwd=root;";
        
        [Fact]
        public async Task MySqlConnectorThrowsOnSecondCommandOverSecondConnection()
        {
            
            using (var db = new MySqlConnection(ConnectionString))
            {
                await db.ExecuteAsync(Schema);
            }

            var sql = @"
                USE test_db;
                WITH params AS (
                    SELECT * FROM JSON_TABLE(
                        @Params,
                        ""$[*]""
                        COLUMNS(
                            param varchar(10) PATH ""$.p""
                        )
                    ) JT
                )
                SELECT * FROM test_table a
                INNER JOIN params b ON a.test_col = b.param
            ";

            using (var db = new MySqlConnection(ConnectionString))
            {
                
                await Assert.ThrowsAsync<MySqlException>(
                    async () => await db.ExecuteReaderAsync(
                        sql,
                        new
                        {
                            Params = JsonConvert.SerializeObject((new [] { "b", "c" }).Select(x => new { param = x }))
                        }
                    )
                );

            }

        }
        
        [Fact]
        public async Task MySqlConnectorThrowsOnSecondCommandOverSecondConnectionTakeTwo()
        {
            
            using (var db = new MySqlConnection(ConnectionString))
            {
                await db.ExecuteAsync(Schema);
            }

            var sql = @"
                USE test_db;
                SET @param = 'b';

                EXPLAIN FORMAT = JSON
                SELECT * FROM test_table a
                WHERE a.test_col = @param
            ";

            using (var db = new MySqlConnection($"{ConnectionString};Allow User Variables=true"))
            {

                await Assert.ThrowsAsync<MySqlException>(
                    async () => await db.ExecuteReaderAsync(
                        sql
                    )
                );

            }

        }
        
    }

I'm using MySQL 8.0.14

What possible values do you anticipate the CharSet connection string option taking?

Well, if we default only to utfmb4, which I completely agree, CharSet wouldn't be of any use. I would make SET NAMES utf8mb4, without any collation, and add a cs option for collation.

Thanks!

@typesbb
Copy link

typesbb commented Apr 2, 2019

I also have this problem too, I'm using MySQL 8.0.15.
But i fixed it. Try change MySql's my.ini or my.cnf(linux), and add next options:

[mysqld]
init_connect='SET collation_connection = utf8mb4_general_ci'
init_connect='SET NAMES utf8mb4'
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci

It's work for me!

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Apr 2, 2019

Thanks @typesbb for your input!

I imagine that when MySqlConnector resets the connection the init_connect gets lost and I actually don't want to use utf8mb4_general_ci, as the new default is usually better and the former somewhat frowned upon.

BTW, current MySqlConnector implementation makes your init_connect queries not necessary as it does the same thing for you.

@bgrainger
Copy link
Member

bgrainger commented Apr 11, 2019

Thanks for the comprehensive repro; I'm investigating.

@bgrainger
Copy link
Member

bgrainger commented Apr 11, 2019

This was probably introduced by #585; before that (with SET NAMES utf8mb4 COLLATE utf8mb4_bin) this bug did not occur.

@bgrainger
Copy link
Member

bgrainger commented Apr 11, 2019

I plan to fix this by always executing SET NAMES utf8mb4; on all new connections (and connections being reset from the pool). This will use the server's default collation for that charset (which is utf8mb4_general_ci for MySQL 5.7 and utf8mb4_0900_ai_ci for MySQL 8.0).

This doesn't regress #585 because the default collation in MySQL 5.7 is the same as the one being explicitly picked there.

I considered adding a new Collation=(Default|Binary|Unicode|Unicode520|...) connection string option but decided against it because:

  • most collations (except utf8mb4_bin and utf8mb4_0900_ai_ci) still reproduce the bug in the code sample above, so picking a correct collation is difficult
  • a server upgrade might break a previously-working program by changing the server's default collation for a charset
  • it seems like an unnecessary feature to add now and have to support in the future

@bgrainger
Copy link
Member

bgrainger commented Apr 11, 2019

Fixed in 0.52.0.

@nicolasgarfinkiel
Copy link
Author

nicolasgarfinkiel commented Apr 17, 2019

Thank you so much for your efforts @bgrainger! That solves the issue perfectly. Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants