forked from percona/percona-server
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
PS-8592: XCom connection stalled forever in read() syscall over network
https://jira.percona.com/browse/PS-8592

Description
-----------
Group Replication (GR) suffered from problems caused by security probes and network scanner processes connecting to the group replication communication port. This is usually harmless, but it becomes a serious problem when another member tries to join the cluster by initiating a connection to a member whose group communication port is being held by such an external process for a long time. In that situation, the SSL-enabled server stalled forever in the SSL_accept() call, waiting for handshake data. Below is the stack trace:

Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
#0 in read ()
#1 in sock_read ()
#2 in BIO_read ()
#3 in ssl23_read_bytes ()
#4 in ssl23_get_client_hello ()
#5 in ssl23_accept ()
#6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

While the server was stalled in the above path, it prevented other members from joining the cluster, resulting in the following messages in the joiner server's log:

[ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables:

1. group_replication_xcom_ssl_socket_timeout

   A file-descriptor-level timeout, in seconds, for both the accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value is 0 (wait indefinitely) for backward compatibility. This variable is effective only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   The number of retries to be performed before closing the socket. Once the connection has been accepted by the first accept() call, the server thread calls SSL_accept() for the SSL handshake on each retry, with the timeout defined by group_replication_xcom_ssl_socket_timeout. The default value is 10. This variable is effective only when GR is configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but become effective only on START GROUP_REPLICATION.
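The interaction between the two variables can be illustrated with a small sketch. It is not the patch's C++ code: `accept_with_retries`, `HandshakeTimeout`, and `silent_peer` are hypothetical names, and the counting rule (one initial attempt plus the configured number of retries) is an assumption. That rule is at least consistent with the test in this commit, which sets both variables to 3 and expects the trace message "despite waiting for 12 seconds in total". The sketch does not model a timeout of 0 (wait indefinitely).

```python
from typing import Callable, Tuple


class HandshakeTimeout(Exception):
    """Raised by the (hypothetical) handshake step when no data arrives in time."""


def accept_with_retries(try_handshake: Callable[[float], bool],
                        socket_timeout: float,
                        accept_retries: int) -> Tuple[bool, float]:
    """Return (handshake_ok, seconds_spent_waiting).

    Assumption: one initial attempt plus `accept_retries` retries,
    each bounded by `socket_timeout` seconds.
    """
    waited = 0.0
    for _attempt in range(accept_retries + 1):
        try:
            if try_handshake(socket_timeout):
                return True, waited
        except HandshakeTimeout:
            # The peer sent nothing within the timeout; account for the wait.
            waited += socket_timeout
    # All attempts timed out: the caller would now close the socket.
    return False, waited


def silent_peer(timeout: float) -> bool:
    """Stand-in for a scanner that connects and never sends handshake data."""
    raise HandshakeTimeout()


if __name__ == "__main__":
    print(accept_with_retries(silent_peer, socket_timeout=3, accept_retries=3))
```

With the default of 10 retries, the same rule would bound a stalled connection at 11 times the configured timeout, provided the timeout is non-zero.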
1 parent 8a7708d, commit 257f4d2
Showing 19 changed files with 390 additions and 16 deletions.
50 changes: 50 additions & 0 deletions
mysql-test/suite/group_replication/r/gr_ssl_socket_timeout.result
@@ -0,0 +1,50 @@
include/group_replication.inc
Warnings:
Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
Note #### Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
[connection server1]

############################################################
# 1. Start one member with GCS SSL enabled.
[connection server1]
SET @group_replication_ssl_mode_save= @@GLOBAL.group_replication_ssl_mode;
SET GLOBAL group_replication_ssl_mode= REQUIRED;
SET @group_replication_xcom_ssl_socket_timeout_save= @@GLOBAL.group_replication_xcom_ssl_socket_timeout;
SET @group_replication_xcom_ssl_accept_retries_save= @@GLOBAL.group_replication_xcom_ssl_accept_retries;
SET GLOBAL group_replication_xcom_ssl_socket_timeout= 3;
SET GLOBAL group_replication_xcom_ssl_accept_retries= 3;
include/start_and_bootstrap_group_replication.inc
Occurrences of 'Group communication SSL configuration: group_replication_ssl_mode: "REQUIRED"' in the input file: 1

############################################################
# 2. Start the second member with GCS SSL enabled, the member
# will be able to join the group.
[connection server2]
SET @group_replication_ssl_mode_save= @@GLOBAL.group_replication_ssl_mode;
SET GLOBAL group_replication_ssl_mode= REQUIRED;
include/start_group_replication.inc
include/rpl_gr_wait_for_number_of_members.inc
Occurrences of 'Group communication SSL configuration: group_replication_ssl_mode: "REQUIRED"' in the input file: 1

############################################################
# 3. Verify that any connection on group_replication
# communication port is aborted by the server after the
# timout configured by the group_replication_xcom_ssl_socket_timeout.
include/stop_group_replication.inc
SET @group_replication_communication_debug_options_save = @@GLOBAL.group_replication_communication_debug_options;
SET GLOBAL group_replication_communication_debug_options= "XCOM_DEBUG_BASIC";
START GROUP_REPLICATION;
SET @@GLOBAL.group_replication_communication_debug_options= @group_replication_communication_debug_options_save;
include/assert_grep.inc [Assert that the mysql connection has been ended by the server]
include/assert_grep.inc [Assert that message about aborting the connection has been logged to GCS_DEBUG_TRACE file]
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2

############################################################
# 4. Clean up.
[connection server1]
SET GLOBAL group_replication_ssl_mode= @group_replication_ssl_mode_save;
SET GLOBAL group_replication_xcom_ssl_socket_timeout= @group_replication_xcom_ssl_socket_timeout_save;
SET GLOBAL group_replication_xcom_ssl_accept_retries= @group_replication_xcom_ssl_accept_retries_save;
[connection server2]
SET GLOBAL group_replication_ssl_mode= @group_replication_ssl_mode_save;
include/group_replication_end.inc
130 changes: 130 additions & 0 deletions
mysql-test/suite/group_replication/t/gr_ssl_socket_timeout.test
@@ -0,0 +1,130 @@
################################################################################
# This test verifies that any unintended connection on group_replication
# communication port is aborted by the server after the timeout configured by
# the group_replication_xcom_ssl_socket_timeout.
#
# Test:
# 0. The test requires two servers: M1 and M2.
# 1. Enable group_replication_ssl_mode = REQUIRED on both members and start GR.
# 2. With both members ONLINE, stop GR on M2.
# 3. Initiate a connection on the GR communication port of M1 as a background
# process.
# 4. Start GR on M2.
# 5. Verify that START GR succeeds after the server aborts the
# connection.
# 6. Cleanup
################################################################################

--source include/have_group_replication_xcom_communication_stack.inc
--source include/have_group_replication_plugin.inc
--let $rpl_skip_group_replication_start= 1
--source include/group_replication.inc


--echo
--echo ############################################################
--echo # 1. Start one member with GCS SSL enabled.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
SET @group_replication_ssl_mode_save= @@GLOBAL.group_replication_ssl_mode;
SET GLOBAL group_replication_ssl_mode= REQUIRED;

# Set the group_replication_xcom_ssl_socket_timeout and group_replication_xcom_ssl_accept_retries
SET @group_replication_xcom_ssl_socket_timeout_save= @@GLOBAL.group_replication_xcom_ssl_socket_timeout;
SET @group_replication_xcom_ssl_accept_retries_save= @@GLOBAL.group_replication_xcom_ssl_accept_retries;

SET GLOBAL group_replication_xcom_ssl_socket_timeout= 3;
SET GLOBAL group_replication_xcom_ssl_accept_retries= 3;

# Bootstrap and start group replication
--source include/start_and_bootstrap_group_replication.inc

# Verify that GR was started with group_replication_ssl_mode = REQUIRED
--let $grep_file= $MYSQLTEST_VARDIR/log/mysqld.1.err
--let $grep_pattern= Group communication SSL configuration: group_replication_ssl_mode: "REQUIRED"
--let $grep_output= print_count
--source include/grep_pattern.inc

--echo
--echo ############################################################
--echo # 2. Start the second member with GCS SSL enabled, the member
--echo # will be able to join the group.
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
--disable_query_log
--eval SET GLOBAL group_replication_group_name= '$group_replication_group_name'
--enable_query_log

SET @group_replication_ssl_mode_save= @@GLOBAL.group_replication_ssl_mode;
SET GLOBAL group_replication_ssl_mode= REQUIRED;
--source include/start_group_replication.inc

--let $group_replication_number_of_members= 2
--source include/gr_wait_for_number_of_members.inc

--let $grep_file= $MYSQLTEST_VARDIR/log/mysqld.2.err
--let $grep_pattern= Group communication SSL configuration: group_replication_ssl_mode: "REQUIRED"
--let $grep_output= print_count
--source include/grep_pattern.inc

--echo
--echo ############################################################
--echo # 3. Verify that any connection on group_replication
--echo # communication port is aborted by the server after the
--echo # timout configured by the group_replication_xcom_ssl_socket_timeout.

# STOP GR on server2
--source include/stop_group_replication.inc

# Connect to GR communication port on server1. For the purpose of testing, we
# use mysql client here.
--connection server1
SET @group_replication_communication_debug_options_save = @@GLOBAL.group_replication_communication_debug_options;
SET GLOBAL group_replication_communication_debug_options= "XCOM_DEBUG_BASIC";
--let $gr_port= `SELECT SUBSTRING(@@group_replication_local_address, LOCATE(':',@@group_replication_local_address) + 1)`
--let $command= $MYSQL
--let $command_opt= --user=root --host=127.0.0.1 --port=$gr_port
--let $output_file= $MYSQLTEST_VARDIR/tmp/mysql_output
--let $pid_file= $MYSQLTEST_VARDIR/tmp/mysql_pid
--let $redirect_stderr= 1
--source include/start_proc_in_background.inc

--connection server2
START GROUP_REPLICATION;

--connection server1
SET @@GLOBAL.group_replication_communication_debug_options= @group_replication_communication_debug_options_save;
--source include/wait_proc_to_finish.inc

# Assert that mysql command has failed
--let $assert_text= Assert that the mysql connection has been ended by the server
--let $assert_select= Lost connection to MySQL server at \'reading initial communication packet\'
--let $assert_file= $output_file
--let $assert_count= 1
--source include/assert_grep.inc

# Assert that message about aborting the connection has been logged to GCS_DEBUG_TRACE file
--let $assert_text= Assert that message about aborting the connection has been logged to GCS_DEBUG_TRACE file
--let $assert_select= SSL_accept did receive any data on fd .* despite waiting for 12 seconds in total, aborting the connection.
--let $assert_file= $MYSQLTEST_VARDIR/mysqld.1/data/GCS_DEBUG_TRACE
--let $assert_count= 1
--source include/assert_grep.inc
--exec cat $output_file

--echo
--echo ############################################################
--echo # 4. Clean up.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
SET GLOBAL group_replication_ssl_mode= @group_replication_ssl_mode_save;
SET GLOBAL group_replication_xcom_ssl_socket_timeout= @group_replication_xcom_ssl_socket_timeout_save;
SET GLOBAL group_replication_xcom_ssl_accept_retries= @group_replication_xcom_ssl_accept_retries_save;

--let $rpl_connection_name= server2
--source include/rpl_connection.inc
SET GLOBAL group_replication_ssl_mode= @group_replication_ssl_mode_save;

--remove_file $pid_file
--remove_file $output_file
--remove_file $MYSQLTEST_VARDIR/mysqld.1/data/GCS_DEBUG_TRACE
--source include/group_replication_end.inc
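
The scenario exercised by step 3 of the test (a client that connects to the listening port and then stays silent) can be reproduced in miniature on a loopback socket. The sketch below is only an analogy under stated assumptions: it uses a plain TCP `recv()` with a socket timeout as a stand-in for SSL_accept(), performs no TLS handshake, and `serve_once` and `run_demo` are hypothetical names that do not appear in the patch. The attempt count (initial attempt plus retries) is likewise an assumption.

```python
import socket
import threading


def serve_once(listener: socket.socket, timeout: float, retries: int, log: list) -> None:
    """Accept one connection and drop it if the peer never sends anything."""
    conn, _addr = listener.accept()
    conn.settimeout(timeout)  # fd-level timeout, analogous to the new socket timeout
    try:
        for _ in range(retries + 1):  # assumption: initial attempt plus retries
            try:
                data = conn.recv(1)  # stand-in for waiting on handshake data
            except socket.timeout:
                continue  # nothing arrived in time; try again
            log.append("got data" if data else "peer closed")
            return
        log.append("aborted silent peer")
    finally:
        conn.close()


def run_demo(timeout: float = 0.05, retries: int = 3) -> tuple:
    """Connect a silent probe to a local listener and report what each side saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    log: list = []
    server = threading.Thread(target=serve_once, args=(listener, timeout, retries, log))
    server.start()

    probe = socket.create_connection(listener.getsockname())
    probe.settimeout(5)  # safety net so the demo itself cannot hang
    server.join()  # the probe sends nothing, so the server gives up
    received = probe.recv(1)  # empty bytes: the server closed the connection
    probe.close()
    listener.close()
    return log, received


if __name__ == "__main__":
    print(run_demo())
```

Because the server side gives up after a bounded number of timed-out reads, the listener is free to serve the next connection, which is the behaviour the test checks when START GROUP_REPLICATION on the second member succeeds.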