Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProxySQL reset connection if it starts with COM_QUIT #3342

Closed
2 tasks done
spumer opened this issue Mar 13, 2021 · 7 comments · Fixed by #3392
Closed
2 tasks done

ProxySQL reset connection if it starts with COM_QUIT #3342

spumer opened this issue Mar 13, 2021 · 7 comments · Fixed by #3392

Comments

@spumer
Copy link

spumer commented Mar 13, 2021

  • ProxySQL version: 2.0.15
  • OS version: Official Docker image: proxysql/proxysql:2.0.15

Description

I'm trying make zero-downtime migration from MASTER (MariaDB) to BALANCER (ProxySQL)
Now i have installed haproxy on my machines and all local applications use that address.
Now HAProxy balance connection between two MASTER IPs (main and reserved).

I have read-only replica (MariaDB) and want split traffic. For that i was add balancer (ProxySQL) address to HAProxy config.
ProxySQL address tagged as unhealthy and traffic not routed to him.

I'm investigate that issue and here is results.

Issue

HAProxy check mysql health status by sending COM_QUIT to the server and wait OK and close connection (see here)
This behaviour described in MySQL protocol official documentation

MariaDB flow (got OK, all fine)

изображение

ProxySQL flow (no OK response, TCP RST only):

изображение

Error log

ProxySQL print this errors for each HAProxy connection attempt:

2021-03-13 17:42:58 mysql_data_stream.cpp:411:check_data_flow(): [ERROR] Session=0x7f53a545b800, DataStream=0x7f53a5448c00 -- Data at both ends of a MySQL data stream: IN <1 bytes 0 packets> , OUT <1 bytes 0 packets>
2021-03-13 17:42:58 MySQL_Thread.cpp:4439:process_all_sessions(): [WARNING] Closing unhealthy client connection 94.19x.xx.xx:51320

@renecannao
Copy link
Contributor

Hi @spumer .
Thank you for the report.
What HAProxy does is, imho, a bit bizarre.
It seems an abuse of the protocol: sending a COM_QUIT when the connection handshake isn't even started:
https://dev.mysql.com/doc/internals/en/connection-phase.html

We can evaluate how easy or complicated is to implement this.

@JavierJF
Copy link
Collaborator

JavierJF commented Apr 2, 2021

Hi, after further inspection we have diagnosed which one was the underlying issue to this.

Looking at the connections that are being terminated by ProxySQL, we can see the exact conditions in which each of those connections are terminated.

(gdb) bt
#0  MySQL_Data_Stream::check_data_flow (this=0x7ffff2645780) at mysql_data_stream.cpp:443
#1  0x0000555555a4dfdb in MySQL_Thread::process_data_on_data_stream (this=0x7ffff2600000, myds=0x7ffff2645780, n=2) at MySQL_Thread.cpp:4690
#2  0x0000555555a4cf46 in MySQL_Thread::run (this=0x7ffff2600000) at MySQL_Thread.cpp:4403
#3  0x000055555598885b in mysql_worker_thread_func (arg=0x7ffff70d1eb0) at main.cpp:776
#4  0x00007ffff7d89299 in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007ffff7948053 in clone () from /usr/lib/libc.so.6
(gdb) x/50hb PSarrayIN->pdata->ptr
0x7ffff260b400:	0x2e	0x00	0x00	0x01	0x00	0x82	0x00	0x00
0x7ffff260b408:	0x00	0x80	0x00	0x01	0x21	0x00	0x00	0x00
0x7ffff260b410:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7ffff260b418:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7ffff260b420:	0x00	0x00	0x00	0x00	0x68	0x61	0x70	0x72
0x7ffff260b428:	0x6f	0x78	0x79	0x5f	0x75	0x73	0x65	0x72
0x7ffff260b430:	0x00	0x00
(gdb) x/50hb PSarrayOUT->pdata->ptr
0x7ffff261f0f0:	0x4a	0x00	0x00	0x00	0x0a	0x35	0x2e	0x35
0x7ffff261f0f8:	0x2e	0x33	0x30	0x00	0x27	0x00	0x00	0x00
0x7ffff261f100:	0x4b	0x7e	0x66	0x71	0x56	0x68	0x3f	0x7f
0x7ffff261f108:	0x00	0x2f	0xf2	0x21	0x02	0x00	0x8f	0x81
0x7ffff261f110:	0x15	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x7ffff261f118:	0x00	0x00	0x00	0x1f	0x2d	0x12	0x47	0x67
0x7ffff261f120:	0x0c	0x40

ProxySQL was going to reply with a 'greeting response' packet when the connection is terminated, at this point ProxySQL is handling the connection just fine. But the state of the dataflow from the datastream is unexpected, since there is data in both ends of it, situation that should never happens, so, why it's taking place here? Theory of the issue was the following:

  1. New client connects, ProxySQL sends the Greeting Message.
  2. As soon as the connection is stablished, HAProxy doesn't wait at all for receiving the greeting message, but it sends the 'AuthRequest + COM_QUIT'.
  3. Now HAProxy starts listening for ProxySQL response (as it expects the OK packet for the COM_QUIT), this is the step that could lead to ProxySQL closing the connection, because if HAProxy hasn't started listening and reading, thus ProxySQL could find itself in the situation of Data at both ends of a MySQL data stream: 'IN <1 bytes 0 packets> , OUT <1 bytes 0 packets>' . This explains why this situation only takes places sometimes, and some other connections succeed just fine.
  4. If the connection hasn't been killed, and ProxySQL didn't fount himself in the 'data at both ends' situation, connection ends just fine, and HAProxy is happy with it.

This explanation is also backed up by the following experimental data. Fist this traffic capture from the unpatched version of ProxySQL:

133	1.811653340	127.0.0.1	127.0.0.1	TCP	121	50950 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476503529 TSecr=1476503529
361	3.812948008	127.0.0.1	127.0.0.1	TCP	121	50972 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476505531 TSecr=1476505531
363	3.813059052	127.0.0.1	127.0.0.1	TCP	144	6033 → 50972 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1476505531 TSecr=1476505531
365	3.813116077	127.0.0.1	127.0.0.1	TCP	77	6033 → 50972 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1476505531 TSecr=1476505531
482	5.813856360	127.0.0.1	127.0.0.1	TCP	121	50984 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476507531 TSecr=1476507531
604	7.815441376	127.0.0.1	127.0.0.1	TCP	121	50996 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476509533 TSecr=1476509533
606	7.815527353	127.0.0.1	127.0.0.1	TCP	144	6033 → 50996 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1476509533 TSecr=1476509533
608	7.815582674	127.0.0.1	127.0.0.1	TCP	77	6033 → 50996 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1476509533 TSecr=1476509533
791	9.816322349	127.0.0.1	127.0.0.1	TCP	121	51014 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476511534 TSecr=1476511534
951	11.817193725	127.0.0.1	127.0.0.1	TCP	121	51030 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476513535 TSecr=1476513535
1067	13.817351113	127.0.0.1	127.0.0.1	TCP	121	51044 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1476515535 TSecr=1476515535
1069	13.817453891	127.0.0.1	127.0.0.1	TCP	144	6033 → 51044 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1476515535 TSecr=1476515535

There is repeated pattern, we can see there are messages of size 121 sent from the client to ProxySQL these are the auth requests by HAProxy:

Frame 133: 121 bytes on wire (968 bits), 121 bytes captured (968 bits) on interface lo, id 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 50950, Dst Port: 6033, Seq: 1, Ack: 1, Len: 55
Data (55 bytes)
    Data: 2e0000010082000000800001210000000000000000000000000000000000000000000000…
    [Length: 55]
0000   00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00   ..............E.
0010   00 6b 99 60 40 00 40 06 a3 2a 7f 00 00 01 7f 00   .k.`@.@..*......
0020   00 01 c7 06 17 91 3d ec d2 a6 8a b3 ff a2 80 18   ......=.........
0030   02 00 fe 5f 00 00 01 01 08 0a 58 01 a7 e9 58 01   ..._......X...X.
0040   a7 e9 2e 00 00 01 00 82 00 00 00 80 00 01 21 00   ..............!.
0050   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0060   00 00 00 00 00 00 68 61 70 72 6f 78 79 5f 75 73   ......haproxy_us
0070   65 72 00 00 01 00 00 00 01                        er.......

So, the only reason there are more than 1 consecutive of these packets, is because of the connections that ProxySQL is closing, because they hit the timing described previously, and HAProxy is retrying, in fact, if more of 4 of these packets are founds, HAProxy will consider the server to be down. If this is correct, to prevent this situation, the simpler solution would be to enforce ProxySQL to always sent the 'initial handshake packet' as soon as it's generated, this way we will be preventing ProxySQL finding data at both ends of the data stream. After a minimal patch:

diff --git a/lib/MySQL_Thread.cpp b/lib/MySQL_Thread.cpp
index b37ec3ab..4bb97ef2 100644
--- a/lib/MySQL_Thread.cpp
+++ b/lib/MySQL_Thread.cpp
@@ -5245,6 +5245,12 @@ void MySQL_Thread::listener_handle_new_connection(MySQL_Data_Stream *myds, unsig
                ioctl_FIONBIO(sess->client_myds->fd, 1);
                mypolls.add(POLLIN|POLLOUT, sess->client_myds->fd, sess->client_myds, curtime);
                proxy_debug(PROXY_DEBUG_NET,1,"Session=%p -- Adding client FD %d\n", sess, sess->client_myds->fd);
+
+               sess->writeout();
        } else {
                free(addr);
                // if we arrive here, accept() failed

We can observe how the traffic being capture has changed:

1300	17.642014001	127.0.0.1	127.0.0.1	TCP	121	46996 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1508672998 TSecr=1508672997
1302	17.642130120	127.0.0.1	127.0.0.1	TCP	144	6033 → 46996 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1508672998 TSecr=1508672998
1304	17.642179639	127.0.0.1	127.0.0.1	TCP	77	6033 → 46996 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1508672998 TSecr=1508672998
1532	19.643891974	127.0.0.1	127.0.0.1	TCP	121	47018 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1508674999 TSecr=1508674999
1534	19.643987351	127.0.0.1	127.0.0.1	TCP	144	6033 → 47018 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1508674999 TSecr=1508674999
1536	19.644050236	127.0.0.1	127.0.0.1	TCP	77	6033 → 47018 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1508675000 TSecr=1508675000
1653	21.644972886	127.0.0.1	127.0.0.1	TCP	121	47030 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1508677000 TSecr=1508677000
1655	21.645084782	127.0.0.1	127.0.0.1	TCP	144	6033 → 47030 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1508677001 TSecr=1508677000
1657	21.645138118	127.0.0.1	127.0.0.1	TCP	77	6033 → 47030 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1508677001 TSecr=1508677001
1774	23.645653165	127.0.0.1	127.0.0.1	TCP	121	47044 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1508679001 TSecr=1508679001
1776	23.645736253	127.0.0.1	127.0.0.1	TCP	144	6033 → 47044 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1508679001 TSecr=1508679001
1778	23.645789782	127.0.0.1	127.0.0.1	TCP	77	6033 → 47044 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1508679001 TSecr=1508679001
2006	25.646402390	127.0.0.1	127.0.0.1	TCP	121	47066 → 6033 [PSH, ACK] Seq=1 Ack=1 Win=65536 Len=55 TSval=1508681002 TSecr=1508681002
2008	25.646481857	127.0.0.1	127.0.0.1	TCP	144	6033 → 47066 [PSH, ACK] Seq=1 Ack=56 Win=65536 Len=78 TSval=1508681002 TSecr=1508681002
2010	25.646533666	127.0.0.1	127.0.0.1	TCP	77	6033 → 47066 [PSH, ACK] Seq=79 Ack=56 Win=65536 Len=11 TSval=1508681002 TSecr=1508681002

Now the pattern is exactly the same as connecting directly to MySQL, no further messages are displayed in HAProxy, and no messages are displayed in ProxySQL informing about the termination of a connection.

JavierJF added a commit that referenced this issue Apr 6, 2021
renecannao added a commit that referenced this issue Apr 6, 2021
Closes #3342: ProxySQL reset connection if it starts with COM_QUIT
renecannao added a commit that referenced this issue Apr 6, 2021
Closes #3342: ProxySQL reset connection if it starts with COM_QUIT
@spumer
Copy link
Author

spumer commented Apr 10, 2021

Test on 2.0.18

Same problem on HAProxy side, but now no errors on ProxySQL side

изображение

@JavierJF
Copy link
Collaborator

Hi @spumer ,

I tested again in v2.0.18, and HAProxy version:

HA-Proxy version 2.3.8-e572195 2021/03/25 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2022.
Known bugs: http://www.haproxy.org/bugs/bugs-2.3.8.html
Running on: Linux 5.9.14-arch1-1 #1 SMP PREEMPT Sat, 12 Dec 2020 14:37:12 +0000 x86_64

And I don't see any issues either in ProxySQL side, or in HAProxy side, HAProxy log stays clear, with no signs of healthchecks failing. Besides, my output captures are also clean as the ones I previously shared (I'm adding a wireshark output too for clarity):

screenshot-2021-04-13-17:44:51

As you can see, the packets with length 77 correspond to the OK packets being set by ProxySQL, here you have a detailed of one of those packets:

screenshot-2021-04-13-17:46:28

Since I can't reproduce the issue you are experiencing, could you please either supply further information about:

  1. HAProxy logs (I assume there should be failures in HAProxy side regarding the failing mysql-check).
  2. Version of HAProxy being used.
  3. Configuration of HAProxy being used.

Or alternatively a reproducible test case showing your issue.

Thank you.

@spumer
Copy link
Author

spumer commented Apr 15, 2021

Yea, sorry for that. I had to provide it from the beggining :)

# lsb_release -r
Release:	18.04

# haproxy
HA-Proxy version 1.8.8-1ubuntu0.11 2020/06/22

# uname -r
4.15.0-140-generic
# cat /etc/haproxy/haproxy.cfg
global
	user haproxy
	group haproxy
	daemon

defaults
	log	global
	retries	2
        timeout connect 5000
        timeout client  50000
        timeout server  50000

listen mysql-cluster
    bind 127.0.0.1:4040
    mode tcp
    option mysql-check user haproxy_check
    option log-health-checks
    balance roundrobin
    server mysql-1 xx.xx.xx.xx:3306 check
    server mysql-2 xx.xx.xx.xx:3306 check
    server mysql-3 188.127.xx.xx:6033 check

No error logs, this messages occured only on haproxy restart

# cat /var/log/haproxy.log.1
Apr 11 07:35:56 localhost haproxy[103512]: [WARNING] 099/202615 (103512) : Reexecuting Master process
Apr 11 07:37:38 localhost haproxy[103512]: [WARNING] 100/073556 (103512) : Exiting Master process...
Apr 11 07:37:38 localhost haproxy[103512]: [ALERT] 100/073556 (103512) : Current worker 49380 exited with code 143
Apr 11 07:37:38 localhost haproxy[103512]: [WARNING] 100/073556 (103512) : Former worker 103514 exited with code 143
Apr 11 07:37:38 localhost haproxy[103512]: [WARNING] 100/073556 (103512) : All workers exited. Exiting... (143)

@JavierJF
Copy link
Collaborator

Hi @spumer,

testing your setup I have found that the version of HAProxy that you use, uses the old Protocol::HandshakeResponse320, this response is extremely old, and unsupported by ProxySQL, looks like HAProxy defaulted to post-41 as the default mode for MySQL checks in version 2.2-dev8 (https://www.haproxy.org/download/2.3/src/CHANGELOG):

MEDIUM: checks: Make post-41 the default mode for mysql checks

Reason why I didn't experienced your issue. If you replace this line of you config file:

    option mysql-check user haproxy_check

in favor of this one:

    option mysql-check user haproxy_check post-41

The problem should go away. Please confirm if this solves your issue. Thank you.

@spumer
Copy link
Author

spumer commented May 3, 2021

The problem should go away. Please confirm if this solves your issue.

Problem is gone! Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants