Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #3310: Improve error reporting for 'MySQL_Monitor' and 'MySQL_Session' backend connections via counters #3311

Merged
merged 7 commits into from
Mar 2, 2021

Conversation

JavierJF
Copy link
Collaborator

Closes #3310.

@@ -872,10 +874,12 @@ void * monitor_connect_thread(void *arg) {
(strncmp(mmsd->mysql_error_msg,"ProxySQL Error: Access denied for user",strlen("ProxySQL Error: Access denied for user"))==0)
) {
proxy_error("Server %s:%d is returning \"Access denied\" for monitoring user\n", mmsd->hostname, mmsd->port);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_ACCESS_DENIED_ERROR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we hardcoding ER_ACCESS_DENIED_ERROR instead of using mysql_errno(mmsd->mysql) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for logging error is being performed on mmsd->mysql_error_msg, I assumed that if mysql_errno(mmsd->mysql) was guaranteed to be ER_ACCESS_DENIED_ERROR this check would much more simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because those are the only messages we want to write in the error log.
I don't see a valid reason to hardcode an error code, even if we know already what is.

}
else if (strncmp(mmsd->mysql_error_msg,"Your password has expired.",strlen("Your password has expired."))==0)
{
proxy_error("Server %s:%d is returning \"Your password has expired.\" for monitoring user\n", mmsd->hostname, mmsd->port);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_MUST_CHANGE_PASSWORD_LOGIN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we hardcoding ER_MUST_CHANGE_PASSWORD_LOGIN instead of using mysql_errno(mmsd->mysql) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as in comment #3311 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reply

@@ -1138,6 +1145,7 @@ void * monitor_read_only_thread(void *arg) {
free(mmsd->mysql_error_msg);
mmsd->mysql_error_msg = new_error;
proxy_error("Timeout on read_only check for %s:%d after %lldms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: %s.\n", mmsd->hostname, mmsd->port, (now-mmsd->t1)/1000, new_error);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_PROXYSQL_READ_ONLY_CHECK_CONN_TIMEOUT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a connection is not created (for example, because of a timeout) the counter is already increased in create_new_connection().
This line can go away

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter in create_new_connection is incrementing p_mysql_error_type::mysql counter, this is incrementing p_mysql_error_type::proxysql counter, the first one reflects the error in the backend connection, this one the internal ProxySQL error generated because it was a READ_ONLY_CHECK_CONN_TIMEOUT. This second one offers information missing in the first one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using ER_PROXYSQL_READ_ONLY_CHECK_TIMEOUT ?
Too many variables seems an over-engineering, especially because it means creating a double number of time series. A timeout happened, this is all we need to know.
Same comment applies to all the others.

@@ -1192,6 +1201,7 @@ void * monitor_read_only_thread(void *arg) {
if (now > mmsd->t1 + mysql_thread___monitor_read_only_timeout * 1000) {
mmsd->mysql_error_msg=strdup("timeout check");
proxy_error("Timeout on read_only check for %s:%d after %lldms. If the server is overload, increase mysql-monitor_read_only_timeout.\n", mmsd->hostname, mmsd->port, (now-mmsd->t1)/1000);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_PROXYSQL_READ_ONLY_CHECK_TIMEOUT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a connection is not created (for example, because of a timeout) the counter is already increased in create_new_connection().
This line can go away

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as for #3311 (comment)

@@ -1297,20 +1307,23 @@ VALGRIND_ENABLE_ERROR_REPORTING;
if (resultset->rows_count) {
// disable host
proxy_error("Server %s:%d missed %d read_only checks. Assuming read_only=1\n", mmsd->hostname, mmsd->port, max_failures);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_PROXYSQL_ASSUME_SRV_READ_ONLY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This itself it is not an error.
We can have a counter for when ProxySQL assumes that read_only=1 , but this is not an error.
The error is when the check fails, not when as action is performed.
This line must go away.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't tracked as a MySQL error, but as an ProxySQL generated error p_mysql_error_type::proxysql, the line is outdated, new code is ER_PROXYSQL_READ_ONLY_CHECKS_MISSED

@@ -2851,7 +2848,6 @@ bool MySQL_Session::handler_again___status_CHANGING_AUTOCOMMIT(int *_rc) {
myds->DSS = STATE_MARIADB_GENERIC;
NEXT_IMMEDIATE_NEW(st);
} else {
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::mysql, myconn->parent->myhgc->hid, myconn->parent->address, myconn->parent->port, mysql_errno(myconn->mysql));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of replacing this with 3 calls, we can use this:

MyHGM->p_update_mysql_error_counter(p_mysql_error_type::mysql, myconn->parent->myhgc->hid, myconn->parent->address, myconn->parent->port, ( mysql_errno(myconn->mysql) ? mysql_errno(myconn->mysql) : ER_PROXYSQL_OFFLINE_SRV ) );

} else {
proxy_error(
"Detected a broken connection during SET AUTOCOMMIT on %s , %d : %d, %s\n",
myconn->parent->address,
myconn->parent->port,
myerr,
ER_PROXYSQL_OFFLINE_SRV,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can stay myerr .
For the same replace didn't happen in othe functions, like handler_again___status_SETTING_INIT_CONNECT

* Enum holding the different connection errors that could be reached for
* the backend connections oppened by 'MySQL_Monitor' module.
*/
enum PROXYSQL_MONITOR_ERR {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we have PROXYSQL_MONITOR_ERR and not just extend PROXYSQL_MYSQL_ERR ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it could be handy to have a clear separation between general errors and monitoring errors, easier to remember error code ranges that particular values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes the maintenance more difficult.
Especially in cases in which they overlap.
For example, a ping can timeout no matter if in MySQL Session or on Monitor.
On the same topic, I think I don't see a counter for ping timeout in Monitor

* the backend connections oppened by 'MySQL_Monitor' module.
*/
enum PROXYSQL_MONITOR_ERR {
ER_PROXYSQL_ASSUME_SRV_READ_ONLY = 10000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my other comments, I think 7 enums should go away, and 5 should stay

* Enum holding the different MySQL connection errors that are used to report
* invalid states in the backend connections.
*/
enum PROXYSQL_MYSQL_ERR {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While here, we could also have added error codes 9001 to 9003

* Enum holding the different connection errors that could be reached for
* the backend connections oppened by 'MySQL_Monitor' module.
*/
enum PROXYSQL_MONITOR_ERR {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes the maintenance more difficult.
Especially in cases in which they overlap.
For example, a ping can timeout no matter if in MySQL Session or on Monitor.
On the same topic, I think I don't see a counter for ping timeout in Monitor

@@ -872,10 +874,12 @@ void * monitor_connect_thread(void *arg) {
(strncmp(mmsd->mysql_error_msg,"ProxySQL Error: Access denied for user",strlen("ProxySQL Error: Access denied for user"))==0)
) {
proxy_error("Server %s:%d is returning \"Access denied\" for monitoring user\n", mmsd->hostname, mmsd->port);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_ACCESS_DENIED_ERROR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because those are the only messages we want to write in the error log.
I don't see a valid reason to hardcode an error code, even if we know already what is.

}
else if (strncmp(mmsd->mysql_error_msg,"Your password has expired.",strlen("Your password has expired."))==0)
{
proxy_error("Server %s:%d is returning \"Your password has expired.\" for monitoring user\n", mmsd->hostname, mmsd->port);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_MUST_CHANGE_PASSWORD_LOGIN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reply

@@ -1138,6 +1145,7 @@ void * monitor_read_only_thread(void *arg) {
free(mmsd->mysql_error_msg);
mmsd->mysql_error_msg = new_error;
proxy_error("Timeout on read_only check for %s:%d after %lldms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: %s.\n", mmsd->hostname, mmsd->port, (now-mmsd->t1)/1000, new_error);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_PROXYSQL_READ_ONLY_CHECK_CONN_TIMEOUT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using ER_PROXYSQL_READ_ONLY_CHECK_TIMEOUT ?
Too many variables seems an over-engineering, especially because it means creating a double number of time series. A timeout happened, this is all we need to know.
Same comment applies to all the others.

…_MYSQL_ERR' holding all ProxySQL errors for backend connections
…tor and replaced zero error codes with ER_PROXYSQL_OFFLINE_SRV
…ssion_status_flags-t' to avoid replication lag issues in the CI
@renecannao renecannao merged commit 1dd24cc into v2.1.1 Mar 2, 2021
@renecannao renecannao deleted the v2.1.1-3310 branch June 4, 2021 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error reporting for 'MySQL_Monitor' and 'MySQL_Session' backend connections via counters
2 participants