Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

charset issues #780

Open
jkklee opened this issue Nov 4, 2016 · 29 comments
Open

charset issues #780

jkklee opened this issue Nov 4, 2016 · 29 comments
Assignees

Comments

@jkklee
Copy link

jkklee commented Nov 4, 2016

Dose proxysql communicate with backend using charset of latin?
My app use utf8, and my table use utf8 too. In normal case, app will send "SET NAMES utf8" to the mysql server, every thing is ok.

After I add proxysql in the middle of app and mysql server, all chinese display incorrect in my app. I found that proxysql send "SET NAMES latin" to the backend server, and I think this is the point of my question.

But in proxysql, I found only one variables about charset mysql-default_charset which contorl the charset between client and proxysql.

@renecannao
Copy link
Contributor

The default for mysql-default_charset is utf8 : this is the charset that the proxy communicates to the client as the default.
Although, the client sends to the server (the proxy in this case) the charset it wants to use during the connect phase, and after with SET NAMES or similar.
ProxySQL will ensure that the connections to backends use the charset specified by the client.
If this doesn't happen, the client is perhaps asking it in a way the proxy doesn't understand.

Can you send an extract of tcpdump traffic, to verify?

Thank you

@melson-jao
Copy link

Hi,

I have the same issue.
I run original service at 3306 and apply proxysql v1.3.0 at 3307 with the same DB and the same APP/PHPMyAdmin.
If I get data from 3306, everything is fine.
If I get data from 3307 via proxysql, the encoding is changed to latin1.

I run a test SQL and SHOW VARIABLES LIKE 'character_set%';" together to 3307 and it shows character_set_client/character_set_connection are both latin1.

I also have confirmed the mysql-default_charset in proxysql is utf8. Cannot figure out the root cause yet.

@renecannao
Copy link
Contributor

This is very odd, and I assume there is a something in your application that is not being handled correctly by ProxySQL. Can you please run this same test?

$ mysql -u root -h 127.0.0.1 -P6033
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 5.5.30 (ProxySQL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
$ mysql -u root -h 127.0.0.1 -P6033 -e "SHOW VARIABLES LIKE 'character_set%'" --default-character-set=latin1
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
$ mysql -u root -h 127.0.0.1 -P6033 -e "SHOW VARIABLES LIKE 'character_set%'" --default-character-set=utf8
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

From the above, character set is handled correctly.
But it is possible that your application is changing charset in a way that the proxy doesn't understand: an extract of tcpdump will help debugging this issue:

tcpdump -i any -s 0 port 3307 -w proxy.3307.tcp

Thank you!

@jkklee
Copy link
Author

jkklee commented Nov 4, 2016

Of course.
here is more details
MySQL:
listening on port 3307

mysql> show global variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

ProxySQL:
listening on port 3306

mysql> select * from runtime_mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 0            | 127.0.0.1     | 3307 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | 172.16.200.24 | 3307 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+

In fact the HG 0 and 1 are the same one mysql, distinguish from hostname, “127.0.0.1” is write node and "172.16.200.24" is read node.

tcpdump file as below:
In this tcpdump file, my app shoud just send one statment that have practical effect, which like select * from xxx where username= ? limit 1
3306-3307.txt

@jkklee
Copy link
Author

jkklee commented Nov 4, 2016

I did what you say:

mysql> show global variables like 'char%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | latin1                              |
| character_set_connection | latin1                              |
| character_set_database   | latin1                              |
| character_set_filesystem | binary                              |
| character_set_results    | latin1                              |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
| character_sets_dir       | /usr/share/percona-server/charsets/ |
+--------------------------+-------------------------------------+
8 rows in set (0.00 sec)

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'char%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | latin1                              |
| character_set_connection | latin1                              |
| character_set_database   | latin1                              |
| character_set_filesystem | binary                              |
| character_set_results    | latin1                              |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
| character_sets_dir       | /usr/share/percona-server/charsets/ |
+--------------------------+-------------------------------------+


[root@test-env ~]# mysql -uroot -pxxxx -h127.0.0.1 -P3306 -e "SHOW VARIABLES LIKE 'character_set%'" --default-character-set=latin1
Warning: Using a password on the command line interface can be insecure.
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | latin1                              |
| character_set_connection | latin1                              |
| character_set_database   | utf8                                |
| character_set_filesystem | binary                              |
| character_set_results    | latin1                              |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
| character_sets_dir       | /usr/share/percona-server/charsets/ |
+--------------------------+-------------------------------------+
[root@test-env ~]# mysql -uroot -pxxxx -h127.0.0.1 -P3306 -e "SHOW VARIABLES LIKE 'character_set%'" --default-character-set=utf8
Warning: Using a password on the command line interface can be insecure.
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | utf8                                |
| character_set_connection | utf8                                |
| character_set_database   | utf8                                |
| character_set_filesystem | binary                              |
| character_set_results    | utf8                                |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
| character_sets_dir       | /usr/share/percona-server/charsets/ |
+--------------------------+-------------------------------------+

In mysql-cli, my result is not same as what you comment previous, what's wrong maybe with my mysql-client? Please help me on this.Thanks

@melson-jao
Copy link

melson-jao commented Nov 4, 2016

Hi Renecannao,

Thanks for your quick reply. I also have done some test via CLI and confirm "--default-character-set" is okay to force proxysql use the specific charset.
But the problem is the method of our APP, ex Laravel, or PHPMyAdmin, to assign charset utf8 is using "SET NAMES utf8 COLLATE 'utf8_unicode_ci'" during creating connection.
It seems not work in this way.

The most strange thing is I run 2 SQL as below via phpmyadmin web GUI but get different results.
SQL 1: SELECT * FROM event limit 1; SHOW VARIABLES LIKE 'character_set%';
RESULT:
character_set_client latin1
character_set_connection latin1

SQL 2: SELECT * FROM event limit 1; SELECT @@session.character_set_client;SHOW VARIABLES LIKE 'character_set%';
RESULT:
character_set_client utf8
character_set_connection utf8

No idea why and maybe more information about how proxysql handle "SET NAMES xxx" is appreciated.

Thanks.

@jkklee
Copy link
Author

jkklee commented Nov 4, 2016

Im my case, mysql_query_rules is:

mysql> select rule_id,active,match_digest,destination_hostgroup,apply from mysql_query_rules;
+---------+--------+----------------------+-----------------------+-------+
| rule_id | active | match_digest         | destination_hostgroup | apply |
+---------+--------+----------------------+-----------------------+-------+
| 1       | 1      | ^select              | 1                     | 1     |
| 2       | 1      | ^select.*for update$ | 0                     | 1     |
+---------+--------+----------------------+-----------------------+-------+
2 rows in set (0.00 sec)

Seems that set names xxx been routed to the write node, but in this case, it shoud be routed to the read node. Is there any way to process this statement correctly?

@melson-jao
Copy link

Our environment is also read/write split case.
In the beginning, we also doubt it is caused by config charset to one node and read data from another node. But since proxysql has supported r/w split for long time, we think it should be okay. And in my above test, it is always Ok in same case which cannot be explained by r/w nodes. But maybe we are wrong. Any comment is appreciated.

@renecannao
Copy link
Contributor

One quick note while I analyze the rest of the data: read/write split and query rules should not affect charset, because ProxySQL tracks it internally.

@renecannao
Copy link
Contributor

From tcpdump, ProxySQL advertises itself to use utf8_general_ci (0x21 at offset 0x004f):

23:07:00.995381 IP 172.16.200.24.3306 > 172.16.200.24.60212: Flags [P.], seq 1:79, ack 1, win 256, options [nop,nop,TS val 3563706833 ecr 3563706832], length 78
        0x0000:  4500 0082 ddbc 4000 4006 7467 ac10 c818  E.....@.@.tg....
        0x0010:  ac10 c818 0cea eb34 c0be 046c 0ed1 1f79  .......4...l...y
        0x0020:  8018 0100 e8c6 0000 0101 080a d469 d9d1  .............i..
        0x0030:  d469 d9d0 4a00 0000 0a35 2e35 2e33 3000  .i..J....5.5.30.
        0x0040:  0403 0000 602d 5b64 6243 6555 002a b221  ....`-[dbCeU.*.!
        0x0050:  0200 0f80 1500 0000 0000 0000 0000 0058  ...............X
        0x0060:  3a58 2e78 384c 5442 2d59 5800 6d79 7371  :X.x8LTB-YX.mysq
        0x0070:  6c5f 6e61 7469 7665 5f70 6173 7377 6f72  l_native_passwor
        0x0080:  6400                                     d.

The client replies it want to use latin1_swedish_ci (0x08 at offset 0x0040):

23:07:00.995464 IP 172.16.200.24.60212 > 172.16.200.24.3306: Flags [P.], seq 1:95, ack 79, win 257, options [nop,nop,TS val 3563706833 ecr 3563706833], length 94
        0x0000:  4508 0092 c6ab 4000 4006 8b60 ac10 c818  E.....@.@..`....
        0x0010:  ac10 c818 eb34 0cea 0ed1 1f79 c0be 04ba  .....4.....y....
        0x0020:  8018 0101 e8d6 0000 0101 080a d469 d9d1  .............i..
        0x0030:  d469 d9d1 5a00 0001 8da2 0f00 0000 0040  .i..Z..........@
        0x0040:  0800 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000 0000 0000 726f 6f74 0014 8fed  ........root....
        0x0060:  6afa 4d42 c8ac d669 fc5e a182 1732 78e0  j.MB...i.^...2x.
        0x0070:  4b39 646d 7a6a 5f75 7365 7200 6d79 7371  K9dmzj_user.mysq
        0x0080:  6c5f 6e61 7469 7665 5f70 6173 7377 6f72  l_native_passwor
        0x0090:  6400                                     d.

Immediately after the explicitly ask to set names 'utf8' collate 'utf8_general_ci':

23:07:00.995612 IP 172.16.200.24.60212 > 172.16.200.24.3306: Flags [P.], seq 95:142, ack 90, win 257, options [nop,nop,TS val 3563706833 ecr 3563706833], length 47
        0x0000:  4508 0063 c6ac 4000 4006 8b8e ac10 c818  E..c..@.@.......
        0x0010:  ac10 c818 eb34 0cea 0ed1 1fd7 c0be 04c5  .....4..........
        0x0020:  8018 0101 e8a7 0000 0101 080a d469 d9d1  .............i..
        0x0030:  d469 d9d1 2b00 0000 1673 6574 206e 616d  .i..+....set.nam
        0x0040:  6573 2027 7574 6638 2720 636f 6c6c 6174  es.'utf8'.collat
        0x0050:  6520 2775 7466 385f 6765 6e65 7261 6c5f  e.'utf8_general_
        0x0060:  6369 27                                  ci'

The proxy replies with an unusual large OK packet:

23:07:00.995677 IP 172.16.200.24.3306 > 172.16.200.24.60212: Flags [P.], seq 90:106, ack 142, win 256, options [nop,nop,TS val 3563706833 ecr 3563706833], length 16
        0x0000:  4500 0044 ddbf 4000 4006 74a2 ac10 c818  E..D..@.@.t.....
        0x0010:  ac10 c818 0cea eb34 c0be 04c5 0ed1 2006  .......4........
        0x0020:  8018 0100 e888 0000 0101 080a d469 d9d1  .............i..
        0x0030:  d469 d9d1 0c00 0001 0001 0000 0000 0000  .i..............
        0x0040:  0000 0000                                ....

But then the proxy set latin1 on backend:

23:07:00.996037 IP 127.0.0.1.32794 > 127.0.0.1.3307: Flags [P.], seq 19:40, ack 23, win 273, options [nop,nop,TS val 3563706833 ecr 3563706833], length 21
        0x0000:  4508 0049 7a01 4000 4006 c2a3 7f00 0001  E..Iz.@.@.......
        0x0010:  7f00 0001 801a 0ceb 642b 30c5 bf7a a8ed  ........d+0..z..
        0x0020:  8018 0111 fe3d 0000 0101 080a d469 d9d1  .....=.......i..
        0x0030:  d469 d9d1 1100 0000 0353 4554 204e 414d  .i.......SET.NAM
        0x0040:  4553 206c 6174 696e 31                   ES.latin1

I believe this is related to #554 : it seems that USE ... COLLATE doesn't really work.

Grated that #554 is a bug, the reason why proxysql set latin1 is because the client explicitly asked latin1 during the connect phase (before sending use name).
Could you try to see if changing the charset at connect phase solves the issue? I think this is relevant:
http://php.net/manual/en/ref.pdo-mysql.connection.php

@jkklee
Copy link
Author

jkklee commented Nov 4, 2016

Could you try to see if changing the charset at connect phase solves the issue?

I understand your thoughts.
In the "Login Request" phase, client say it want the "latin1_swedish_ci"; but after login success, client send the

Command: Prepare Statement
Statement: set names 'utf8' collate 'utf8_general_ci'

immediately. But this statement hasn't be routed to the correct node.
I also can image that in R/W split case, there no good methods to deal set xxxx command

@renecannao
Copy link
Contributor

Oh, from dump I didn't even notice that the statement is sent as a prepared statement (0x16).
That's interesting, so it not even related to case #554 .

Changing the charset during the connect phase should be the best option right now.
Thanks

@jkklee
Copy link
Author

jkklee commented Nov 4, 2016

How proxysql process the "set xxxx" command, I test several kind of "set" commands via proxysql
334455
Which show that when a set names xxxx arrived proxysql, proxysql even not transmit it to the backends;
but other kind of set command, proxysql transmit them as the set of mysql_query_rules and mysql_users.default_hostgroup.
set-command-tcpdump.txt

@renecannao
Copy link
Contributor

Quoting myself:

One quick note while I analyze the rest of the data: read/write split and query rules should not affect charset, because ProxySQL tracks it internally.

That is the case: SET NAMES and few more commands are handed internally because they have to change the status of the session, not of a specific backend connection.
If a client wants to use a specific charset, that charset should be used on all the backends it will sends data to, not just one.

Similarly, if you issue a USE dbname to the proxy, the proxy will reply OK without sending the request to any backend: it will just track internally that from now on the client wants to use that schema and this will be relevant when processing query rules.

@jkklee
Copy link
Author

jkklee commented Nov 4, 2016

Got it, thanks for your reply.

but @renecannao, excuse me
In our previously discussion, you think PDO use the wrong charset in the Login Request phase.

But app(PDO or Laravel) send set names utf8 after it's Login Request(in tihs phase,it use latin1), so shoudn't the set names utf8 will take effet at final?

I simulate the same case with mysql-client (two read node with different data in table test.ljk)
(The result show that even though in Login Request phase mysql-client use latin1, but after is send the set names utf8 command, Chinese can display correct. )

[root@test ~]# mysql -P6033 -uuser -ppwd -h127.0.0.1 --default-character-set=latin1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8958
Server version: 5.5.30 (ProxySQL)

mysql> select * from test.ljk;
+----+---------+
| id | name    |
+----+---------+
|  2 | ?       |    ###look at here
|  7 | lhy     |
|  9 | sdfa    |
| 14 | ???     |    ###look at here
+----+---------+
8 rows in set (0.00 sec)

mysql> select * from test.ljk;
+----+------+
| id | name |
+----+------+
|  1 | ljk  |
|  2 | lhy  |
+----+------+
4 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from test.ljk;
+----+-----------+
| id | name      |
+----+-----------+
|  2 | 凯       |
|  7 | lhy       |
|  9 | sdfa      |
| 14 | 李建凯 |
+----+-----------+
8 rows in set (0.00 sec)

mysql> select * from test.ljk;
+----+------+
| id | name |
+----+------+
|  1 | ljk  |
|  2 | lhy  |
+----+------+
4 rows in set (0.00 sec)

That's the point, so with same phase same charset and same phase send same command, why mysql-client correct but PDO incorrect.

@melson-jao
Copy link

melson-jao commented Nov 7, 2016

Hi,

After double checking, I think proxysql internal cannot handle "SET NAMES XXX" as expected after connection is established. I am not sure it is a bug or a special design.
It will be good if proxysql can support "SET NAMES XXX" for dynamic changing of charset.

It means it is not possible to dynamically change charset after connection is created via proxysql.
The only method/workaround is giving a specific charset at connection initialization phase.
For example, in my case, PHP Laravel(PDO) and phpmyadmin(mysqli)

Laravel:
add one line at vendor/laravel/framework/src/Illuminate/Database/Connectors/Connector.php
protected $options = [
PDO::ATTR_CASE => PDO::CASE_NATURAL,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
PDO::ATTR_ORACLE_NULLS => PDO::NULL_NATURAL,
PDO::ATTR_STRINGIFY_FETCHES => false,
PDO::ATTR_EMULATE_PREPARES => false,

  •   PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8',
    
    ];

phpmyadmin:
add one line at /usr/share/phpmyadmin/libraries/dbi/mysqli.dbi.lib.php

$link = mysqli_init();
mysqli_options($link, MYSQLI_OPT_LOCAL_INFILE, true)

  • mysqli_options($link, MYSQLI_INIT_COMMAND, 'SET NAMES utf8');

@jkklee
Copy link
Author

jkklee commented Nov 7, 2016

@renecannao,@melson-jao
I add the setting which melson-jao said in Laravel, it take effect.

I tcpdump this again, it show us:

With PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8', added in vendor/laravel/framework/src/Illuminate/Database/Connectors/Connector.php, in the phase of "Login Request", client still use latin1, but after this phase, client send a Query Command "SET NAMES utf8", this comand make my app display Chinese correctly;

Then let's consider the previous case:(with out PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8')
After the "Login Request" phase, client send a Prepare Statement "set names 'utf8'", which I think comes from app/config/local/database.php 'charset' => 'utf8'. And in this case, app can't display Chinese correctly.

So, I think there maybe a bug in ProxySQL to process set names xxx with Prepare Command.

@renecannao
Copy link
Contributor

@melson-jao , @jkklee : proxysql handles SET NAMES if sent as a "Query command", but not as a "Prepared statement" .
Yes, I consider this a bug in ProxySQL to not handle SET NAMES as prepared statements.
Yet, to be honest, I don't understand why Laravel sends 3 requests (Prepare,Execute,Close) to change the charset instead of a single request using a "Query Command": seems absolutely inefficient!

Although possible, I do not think you should set a PDO::MYSQL_ATTR_INIT_COMMAND (that will change the charset after connection), as it will be more efficient to set charset at connection phase.
I have seen a PHP application connecting as the follow, so I believe Laravel can be configured similarly:

new PDO("mysql:host=$host;dbname=$name;port=$port;charset=UTF8", $user, $pass, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING, PDO::ATTR_PERSISTENT => true));

Note: charset is specified in the connection string.

With regards to mysqli:
http://php.net/manual/en/mysqli.set-charset.php
http://php.net/manual/en/mysqlinfo.concepts.charset.php
Interesting quotes from the links above:
a)

Using mysqli_query() to set it (such as SET NAMES utf8) is not recommended.

b)

The character set should be understood and defined, as it has an affect on every action, and includes security implications. For example, the escaping mechanism (e.g., mysqli_real_escape_string() for mysqli, mysql_real_escape_string() for mysql, and PDO::quote() for PDO_MySQL) will adhere to this setting. It is important to realize that these functions will not use the character set that is defined with a query

@renecannao renecannao self-assigned this Nov 7, 2016
@renecannao
Copy link
Contributor

Note to myself on how to fix this.

  • COM_STMT_PREPARE won't be forwarded to any backend: proxysql will internally generate a COM_STMT_PREPARE_OK
  • COM_STMT_EXECUTE won't be forwarded to any backend: proxysql will respond with OK or ERR as it if was a normal COM_QUERY

@melson-jao
Copy link

melson-jao commented Nov 7, 2016

@renecannao
Appreciate your reply. "SET NAMES XXX" maybe not a best/recommended method to assign charset but it is widely used and it will be much better if it works with proxysql.

I do not want to focus too much on Laravel, even PHP.
I prefer to following this issue in a general view, sql prepare statement.
For example, with mysql client CLI via proxysql, case A works but case B fails.
It shows something wrong with prepared statement at proxysql internal and I am not sure if the "set names" is the only one case before root cause is clarified.

Case A => OK
mysql -u user -h localhost -P 3307 -p --default-character-set=latin1
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from event limit 1;
=> charset is utf8 => set name works => OK

Case B => FAIL
mysql -u user -h localhost -P 3307 -p --default-character-set=latin1
mysql> PREPARE stmt FROM 'SET NAMES utf8';
Query OK, 0 rows affected (0.01 sec)
Statement prepared
mysql> EXECUTE stmt;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from event limit 1;
=> charset is latin 1 => set names not work => FAIL

PS. I run case B without proxysql and it works. The prepared statement set names take effect as expected. It means something wrong when proxysql handle the prepared statement in some case.

@jkklee
Copy link
Author

jkklee commented Nov 14, 2016

Currently,I think a better way for Laravel to set charset is modify the vendor/laravel/framework/src/Illuminate/Database/Connectors/MySqlConnector.php,
change from

    protected function getHostDsn(array $config)
    {
        extract($config);

        return isset($config['port'])
                        ? "mysql:host={$host};port={$port};dbname={$database}"
                        : "mysql:host={$host};dbname={$database}";
    }

to

    protected function getHostDsn(array $config)
    {
        extract($config);

        return isset($config['port'])
                        ? "mysql:host={$host};port={$port};dbname={$database};charset=utf8"
                        : "mysql:host={$host};dbname={$database};charset=utf8";
    }

This will set the charset in phase of the Login Request and do not need send a set names xxx command to proxysql

@renecannao
Copy link
Contributor

@melson-jao:
I fully agree with you.
On the other end, covering 100% of the protocol and syntax will require years, so I am trying to focus on the most common usage patterns (SET NAMES without prepared statements seem to be one) and try to find workaround when possible (like in the Laravel case).

Thank you

@jkklee
Copy link
Author

jkklee commented Nov 15, 2016

@renecannao
Proxysql also can't process the Query() : SET NAMES 'utf8' COLLATE 'utf8_general_ci' correctly.
If others use the PHP Framework CodeIgniter, they can modify the lib/system/database/drivers/mysql/mysql_driver.php as follows:

function db_set_charset($charset, $collation)
     {
-        return @mysql_query("SET NAMES '" . $this->escape_str($charset) . "' COLLATE '" . $this->escape_str($collation) . "'", $this->conn_id);
+        return @mysql_query("SET NAMES '" . $this->escape_str($charset), $this->conn_id);
     }

@renecannao
Copy link
Contributor

Yes, this is tracked in #554 . This has higher priority that handling SET NAMES in prepared statements.
Thanks

@jkklee
Copy link
Author

jkklee commented Nov 15, 2016

Thanks for your work.
But in "CodeIgniter" framework, it's a Query(): SET NAMES 'utf8' COLLATE 'utf8_general_ci', not a COM_STMT_PREPARE.
While Query(): SET NAMES 'utf8' is correct.
@renecannao

@dwillbrandt
Copy link

I had some issues with setting the right character-set, too. Maybe they are caused by my setup, and i need help :).
After reading the issues that belongs to charsets, i've tested different settings. But first i'll explain my setup:

TL;DR
All ProxySQL instances are in version 1.4.3. All is running on Ubuntu 16.04 LTS.
Some services (POP3/IMAP/SMTP/RADIUS/...), all on different machines ==connection to===> "simple-server" with ProxySQL ===distributes queries over===> ProxySQL(1-4) ===distributes queries over===> MySQL-Server(1-10)
Some of the services don't need utf8 and some other services need utf8.

-- Detailed explaination --
i have a server for simple services (like searchengine and so on) - later named "simple-server" :). And on this machine is a ProxySQL which is configured as simple as it can be. No special logic, only some users. This server distributes all queries over four other ProxySQL-Servers in the second level.

The second level, consists of four dedicated ProxySQL instances - here comes all together. All the logic, all the different database users, R/W-Splitting and so on. They distribute the traffic to our MySQL servers.

Thats the third level: All our different databases are on all of our servers. One of these is a replication master. All MySQL servers are configured the same way, no differences in software configuration.

The problem is this:
Yesterday i reconfigured the services (pop3/imap/smtp/authentication/ftp/...) to not connect directly to our database servers, but over the single ProxySQL instance on the "simple-server".
From that time on, all services that use this instance and needs utf8 are getting latin1 as used character set. Also if the services send a "SET NAMES utf8" as a prequery (or by setting utf8 as default-character-set).

I've tried different things with the mysql-client, for example starting with "--default-character-set=utf8" or "set names utf8". But nothing changed, everytime (before and after doing different queries) i got "latin1" as character_set_(client|results|connection).
At last i've tried a "set names latin1" within the same connection - nothing changed. Then i did a "set names utf8", again in the same connection... From this time on, all runs well - for exactly this single connection. After a reconnect, the steps had to be repeated. Also when the backend server didn't changed.

What i am missing?
Does the problem occours because the charset is set on the connection and not on the session? But why doesn't it work after a reconnect? Is "set names" not passed through all ProxySQL instances because one thinks "hey i'm on utf8, i don't need to send it through"?

Maybe someone can help?

@renecannao
Copy link
Contributor

@dwillbrandt : is this still an issue?

@olafz
Copy link

olafz commented Aug 30, 2018

Hello @renecannao,

We are experiencing the same issue. We are running ProxySQL 1.4.9 on Debian Stretch.

First, a Server Greeting proto=10 version=5.7.20 with Server Language: utf8 COLLATE utf8_general_ci:

screen shot 2018-08-30 at 16 46 25

Then, the client connects Login Request user=xxx db=xxx with Charset: latin1 COLLATE latin1_swedish_ci:

screen shot 2018-08-30 at 16 48 11

Then, the client issues as first request Statement: SET CHARACTER SET utf8:

screen shot 2018-08-30 at 16 50 47

Then, queries are issued to a database/table that has utf8 encoding. The weird thing is, some responses are latin1 encoded and some are encoded as utf8. In the following samples, the same query was executed multiple times on the same client connection:

screen shot 2018-08-30 at 16 51 49

screen shot 2018-08-30 at 16 53 37

In our case, we're hitting this because we are using Mariaex that has the latin1 hardcoded.

@renecannao
Copy link
Contributor

@olafz , what you reported is not relative to this issue #780 . I created a new issue #1692 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants