Corrupted resultsets in Monitor cause crash #1994

Closed
renecannao opened this issue Apr 2, 2019 · 11 comments

@renecannao
Contributor

ProxySQL version affected 2.0, at least up to 2.0.3

MySQL resultsets in the Monitor module occasionally get corrupted, leading to a crash while they are being processed. Although this happens rarely, crashes are not good.
The reason is not clear yet.
In ProxySQL 2.0.4 a series of workarounds will be introduced to prevent the crashes, trying to gracefully handle the corruption.

This issue is a placeholder, until the bug is completely fixed.

@renecannao
Contributor Author

Closing

@rubavz

rubavz commented Dec 29, 2020

Hello, I'm still having this issue on ProxySQL 2.0.15. I reinstalled ProxySQL and that didn't help.

2020-12-29 00:20:57 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.18:3306 . See bug #1994
2020-12-29 00:20:57 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.19:3306 . See bug #1994
2020-12-29 00:20:57 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.12:3306 . See bug #1994
2020-12-29 00:20:57 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.15:3306 . See bug #1994
2020-12-29 00:20:57 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.16:3306 . See bug #1994
2020-12-29 00:21:02 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.18:3306 . See bug #1994
2020-12-29 00:21:02 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.19:3306 . See bug #1994
2020-12-29 00:21:02 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 10.9.0.12:3306 . See bug #1994

@barakseri1

Also having this issue with "ProxySQL version 2.0.10-27-g5b319972, codename Truls" :
2021-01-05 10:17:25 MySQL_Monitor.cpp:1853:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.30.0.188:3306 . See bug #1994

I have 3 Galera clusters behind ProxySQL, and this error prevents ProxySQL from moving the SQL servers to the correct Galera hostgroups for one of the clusters.
With the other 2 clusters, ProxySQL moves the SQL servers into the correct hostgroups just fine.

@a7lan

a7lan commented Jan 6, 2021

I also have this issue.
ProxySQL version 2.0.15-20-g32bb92cd

2021-01-06 20:30:50 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.103:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.104:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1319:monitor_group_replication_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.150.104:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.105:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.103:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1319:monitor_group_replication_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.150.105:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.104:3306 . See bug #1994
2021-01-06 20:30:50 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.105:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1319:monitor_group_replication_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.150.103:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.103:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.104:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1319:monitor_group_replication_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.150.104:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.105:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1319:monitor_group_replication_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.150.105:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.103:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.104:3306 . See bug #1994
2021-01-06 20:30:55 MySQL_Monitor.cpp:1832:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server 172.16.150.105:3306 . See bug #1994

I have one Percona XtraDB Cluster behind ProxySQL, and the SQL servers are not moved to the correct hostgroups.
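When the log is this repetitive, it can help to count the errors per monitor thread and per server before digging further. The following is a minimal sketch, not part of ProxySQL: the function name `summarize` and the regular expression are assumptions written against the log lines quoted in this thread.

```python
import re
from collections import Counter

# Pattern written against the log lines quoted in this thread; other
# ProxySQL versions may format the message differently (assumption).
LINE_RE = re.compile(
    r"MySQL_Monitor\.cpp:\d+:(?P<thread>\w+)\(\): \[ERROR\] "
    r"mysql_fetch_fields returns NULL.*?Server (?P<server>\S+) \. See bug #1994"
)


def summarize(log_text):
    """Count 'See bug #1994' errors per (monitor thread, server) pair."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        counts[(match.group("thread"), match.group("server"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < proxysql.log
    for (thread, server), n in sorted(summarize(sys.stdin.read()).items()):
        print(n, thread, server)
```

A server that appears under only one thread (for example only `monitor_galera_thread`) points at the specific monitor query that is failing on that node.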

@KohoSales

+1, same issue still.

@zhaopinglu

Hit the same error and solved it.

If the following query returns nothing, the monitor function raises the error.
select ... from sys.gr_member_routing_candidate_status;

The definition of view sys.gr_member_routing_candidate_status (https://gist.github.com/lefred/77ddbde301c72535381ae7af9f968322):

CREATE OR REPLACE VIEW sys.gr_member_routing_candidate_status AS
SELECT
    sys.gr_member_in_primary_partition(a.member_id) as viable_candidate,
    IF( (
        SELECT
            (
            SELECT
                GROUP_CONCAT(variable_value)
            FROM
                performance_schema.global_variables
            WHERE
                variable_name IN ('read_only', 'super_read_only')) != 'OFF,OFF'),
        'YES',
        'NO') as read_only,
    sys.gr_applier_queue_length() as transactions_behind,
    Count_Transactions_in_queue as 'transactions_to_cert'
from
    performance_schema.replication_group_member_stats a
    JOIN performance_schema.replication_group_members b ON
    a.member_id = b.member_id
WHERE
    b.member_host IN (
        SELECT
            variable_value
        FROM
            performance_schema.global_variables
        WHERE
            variable_name = 'hostname')$$

In my case, the hostnames returned by the following two queries were different, which caused the issue.

select * from performance_schema.global_variables where variable_name='hostname';
select MEMBER_ID,MEMBER_HOST from performance_schema.replication_group_members;

Solution:
Check the following files on all MGR nodes and make sure that only the real hostname (as returned by the hostname command) is used in the MGR configuration.
/etc/hosts
/etc/my.cnf
/etc/proxysql.cnf
"datadir"/mysqld-auto.cnf # "datadir" defined in /etc/my.cnf
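The mismatch described above can be checked mechanically once the output of the two queries is in hand. This is a minimal sketch under stated assumptions: the function names are hypothetical, the inputs are plain Python values copied from the query results, and exact string equality stands in for the view's `member_host IN (...)` filter (the server's collation may compare differently, for example case-insensitively).

```python
def local_member_rows(local_hostname, group_members):
    """Return the (MEMBER_ID, MEMBER_HOST) rows whose host equals local_hostname.

    This mirrors the view's final WHERE clause: a node only gets a row in
    sys.gr_member_routing_candidate_status if its 'hostname' variable matches
    a MEMBER_HOST in replication_group_members.
    Assumption: exact string equality approximates the SQL comparison.
    """
    return [row for row in group_members if row[1] == local_hostname]


def diagnose(local_hostname, group_members):
    """Explain whether the view would return rows for this node."""
    matches = local_member_rows(local_hostname, group_members)
    if matches:
        return "ok: view should return %d row(s)" % len(matches)
    hosts = sorted({host for _, host in group_members})
    return "mismatch: hostname=%r not in MEMBER_HOST values %r" % (
        local_hostname,
        hosts,
    )


if __name__ == "__main__":
    # Hypothetical values, as if copied from the two queries above.
    members = [("uuid-1", "db1.internal"), ("uuid-2", "db2.internal")]
    print(diagnose("db1", members))
```

A "mismatch" result corresponds to the empty resultset that makes the monitor log the error.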

@nickcFRU

nickcFRU commented Apr 29, 2022

I ran into these error logs, which were preventing the Galera hostgroups feature from working correctly.
This happened on ProxySQL version 2.3.2; my nodes were running Percona version 5.7.36.
2022-04-29 12:41:44 MySQL_Monitor.cpp:2065:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server XXXXXXXX:3306 . See bug #1994
2022-04-29 12:41:49 MySQL_Monitor.cpp:2065:monitor_galera_thread(): [ERROR] mysql_fetch_fields returns NULL. Server XXXXXXXX:3306 . See bug #1994

In my case, the query that ProxySQL runs to retrieve the wsrep status variables was failing because of one specific subquery:
SELECT (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='WSREP_LOCAL_STATE') wsrep_local_state, @@read_only read_only, (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='WSREP_LOCAL_RECV_QUEUE') wsrep_local_recv_queue , @@wsrep_desync wsrep_desync, @@wsrep_reject_queries wsrep_reject_queries, @@wsrep_sst_donor_rejects_queries wsrep_sst_donor_rejects_queries, (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='WSREP_CLUSTER_STATUS') wsrep_cluster_status , (SELECT COALESCE(MAX(VARIABLE_VALUE),'DISABLED') FROM performance_schema.global_variables WHERE variable_name='pxc_maint_mode') pxc_maint_mode;

mysql> select variable_value from performance_schema.global_variables where variable_name='pxc_maint_mode';
ERROR 1682 (HY000): Native table 'performance_schema'.'global_variables' has the wrong structure

This was most likely caused by my starting the cluster from an xtrabackup taken on an older version of Percona. Running mysql_upgrade fixed it:
sudo mysql_upgrade -uroot -p
sudo service mysql restart

I'm not sure why ProxySQL couldn't see the SQL error produced by my nodes, but I was able to figure out the cause and wanted to share it in case someone else runs into something similar.

@renecannao
Contributor Author

Thank you @zhaopinglu and @nickcFRU .

Just to clarify: proxysql didn't crash, right?

To give more context: it was expecting a resultset with a specific number of columns, but it got something else instead (a different number of columns, an empty result, etc.). That was bug 1994.
I think we can declare bug 1994 really fixed, and log the error message reported by the server if any.
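The kind of check described here can be illustrated as follows. This is only a sketch of the idea, not ProxySQL's code (the real checks live in the C++ file MySQL_Monitor.cpp): the function name `validate_resultset` and its arguments are hypothetical, with `column_names` standing in for the field metadata and `rows` for the fetched rows.

```python
def validate_resultset(column_names, rows, expected_columns):
    """Return None if the resultset has the expected shape, else an error string.

    column_names: list of column names, or None when no field metadata is
                  available (analogous to mysql_fetch_fields returning NULL).
    rows:         list of row tuples.
    """
    if column_names is None:
        return "no field metadata"
    if len(column_names) != expected_columns:
        return "expected %d columns, got %d" % (expected_columns, len(column_names))
    if not rows:
        return "empty resultset"
    for index, row in enumerate(rows):
        if len(row) != expected_columns:
            return "row %d has %d values, expected %d" % (
                index,
                len(row),
                expected_columns,
            )
    return None


if __name__ == "__main__":
    # A well-formed two-column result passes; an empty one is reported.
    print(validate_resultset(["a", "b"], [(1, 2)], 2))
    print(validate_resultset(["a", "b"], [], 2))
```

Rejecting a malformed resultset with a logged message, rather than reading columns that are not there, is what lets the monitor keep running.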

@nickcFRU

@renecannao Correct, ProxySQL did not crash. It just logged the specified error and kept running.

@renecannao
Contributor Author

@nickcFRU : thank you for confirming!

@Fanduzi

Fanduzi commented Aug 3, 2022

I still hit this bug in ProxySQL 2.3.2, and it made ProxySQL crash.

2022-08-03 11:00:09 MySQL_Monitor.cpp:2298:monitor_replication_lag_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.23.215:3308 . See bug #1994
2022-08-03 11:00:59 MySQL_Monitor.cpp:2298:monitor_replication_lag_thread(): [ERROR] mysql_fetch_fields returns NULL, or mysql_num_fields is incorrect. Server 172.16.23.215:3308 . See bug #1994
Error: signal 11:
/usr/bin/proxysql(_Z13crash_handleri+0x2a)[0x594b7a]
/lib64/libc.so.6(+0x36340)[0x7fab208bd340]
/usr/bin/proxysql(unpack_field+0x6c)[0xa340ac]
/usr/bin/proxysql(mthd_my_read_metadata_ex+0xac)[0xa353bc]
/usr/bin/proxysql(mthd_my_read_query_result+0x14a)[0xa356ca]
/usr/bin/proxysql[0xa4829b]
/usr/bin/proxysql(my_context_spawn+0x41)[0xa4cf91]
 ---- /usr/bin/proxysql(_Z13crash_handleri+0x2a) [0x594b7a] : crash_handler(int)
2022-08-03 11:00:59 main.cpp:1245:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2022-08-03 11:00:59 [INFO] ProxySQL version 2.3.2-10-g8cd66cf
2022-08-03 11:00:59 [INFO] ProxySQL SHA1 checksum: ab70d6884656c6dbe0dc400edba0665ee0d04229
2022-08-03 11:00:59 [INFO] Angel process started ProxySQL process 1905
2022-08-03 11:00:59 [INFO] Loaded built-in SQLite3
Standard ProxySQL MySQL Logger rev. 2.0.0714 -- MySQL_Logger.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Cluster rev. 0.4.0906 -- ProxySQL_Cluster.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Thu Sep 30 21:22:46 2021

There are also two things I don't understand:

1. Why do these coredump files show a size of 25G when their actual size is only a few tens of megabytes?

2. What are the best practices for ProxySQL upgrades? I see these warnings in the logs after each crash, but the variables they mention are not in the global_variables table.

 ---- /usr/bin/proxysql(_Z13crash_handleri+0x2a) [0x594b7a] : crash_handler(int)
2022-08-03 11:20:51 main.cpp:1245:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2022-08-03 11:20:51 [INFO] ProxySQL version 2.3.2-10-g8cd66cf
2022-08-03 11:20:51 [INFO] ProxySQL SHA1 checksum: ab70d6884656c6dbe0dc400edba0665ee0d04229
2022-08-03 11:20:51 [INFO] Angel process started ProxySQL process 7461
2022-08-03 11:20:51 [INFO] Loaded built-in SQLite3
Standard ProxySQL MySQL Logger rev. 2.0.0714 -- MySQL_Logger.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Cluster rev. 0.4.0906 -- ProxySQL_Cluster.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Thu Sep 30 21:22:46 2021
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_action with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_names with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_sql_safe_updates with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_sql_select_limit with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_sql_mode with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_time_zone with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_isolation_level with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_transaction_read with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_sql_auto_is_null with value "". Deleting. If the variable name is correct, this version doesn't support it
2022-08-03 11:20:51 ProxySQL_Admin.cpp:6178:flush_mysql_variables___database_to_runtime(): [WARNING] Impossible to set not existing variable default_net_write_timeout with value "". Deleting. If the variable name is correct, this version doesn't support it

Impossible to set not existing variable default_names

admin@127.0.0.1 11:23:20 [(none)]>  select * from global_variables where variable_name like '%name%' order by variable_name;
+------------------------------------+----------------+
| variable_name                      | variable_value |
+------------------------------------+----------------+
| admin-cluster_username             | cluster        |
| mysql-auditlog_filename            |                |
| mysql-eventslog_filename           |                |
| mysql-monitor_username             | proxysql       |
| mysql-query_digests_track_hostname | false          |
+------------------------------------+----------------+
5 rows in set (0.00 sec)

admin@127.0.0.1 11:23:21 [(none)]> save mysql variables to disk;
Query OK, 147 rows affected (0.01 sec)

admin@127.0.0.1 11:24:20 [(none)]> save admin variables to disk;
Query OK, 46 rows affected (0.00 sec)

sqlite.db (screenshot)

It seems that saving the variables to disk didn't delete the nonexistent variables.
