[Feature] Implement a mechanism to automatically unshun nodes even if there is no traffic hitting that node #971

Open
krzysztof-ksiazek opened this issue Mar 28, 2017 · 9 comments

@krzysztof-ksiazek (Contributor)

[28.03.2017, 11:00:54] René Cannaò: I get your point, but the current implementation shows how the node is seen by HGM
[28.03.2017, 11:01:44] René Cannaò: a possible implementation that may make sense and doesn't create extra load is to apply the same "resume from shunned" algorithm when querying the runtime_mysql_servers table
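
For what it's worth, a minimal sketch of what that proposal could look like, assuming a time-based "resume from shunned" check along the lines of mysql-shun_recovery_time_sec. Every name below is illustrative and does not mirror ProxySQL's actual internals.

    // Illustrative sketch only: MySrvC, maybe_resume_from_shunned and the
    // dump hook are hypothetical names, not ProxySQL's real internals.
    #include <cstddef>
    #include <ctime>

    enum server_status { ONLINE, SHUNNED, OFFLINE_SOFT, OFFLINE_HARD };

    struct MySrvC {                        // one backend server entry
        server_status status;
        time_t time_last_detected_error;   // when the shun was triggered
    };

    // assumed stand-in for the mysql-shun_recovery_time_sec variable
    static const int shun_recovery_time_sec = 10;

    // Today a check like this runs only when traffic selects the server;
    // the proposal is to run it as well while the runtime_mysql_servers
    // table is being queried, so idle servers are also re-evaluated.
    static void maybe_resume_from_shunned(MySrvC* srv, time_t now) {
        if (srv->status == SHUNNED &&
            now - srv->time_last_detected_error > shun_recovery_time_sec) {
            srv->status = ONLINE;          // resume from shunned
        }
    }

    // Hypothetical hook, called once per row while dumping the table.
    void on_dump_runtime_mysql_servers(MySrvC* servers, size_t n) {
        time_t now = time(nullptr);
        for (size_t i = 0; i < n; i++) {
            maybe_resume_from_shunned(&servers[i], now);
        }
    }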

@utdrmac commented Jul 4, 2017

Based on my testing, I would expect the status to change back after X successful ping attempts. When X pings fail, the status changes to SHUNNED, so why not revert the status after X successful pings?
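
(A rough sketch of the idea in this comment, assuming a hypothetical per-server counter; ProxySQL does not work this way, as the maintainer's reply below explains.)

    // Hypothetical sketch of the "X successful pings" suggestion above;
    // names and the threshold are invented for illustration only.
    enum server_status { ONLINE, SHUNNED };

    struct PingTracker {
        server_status status = ONLINE;
        int consecutive_failures = 0;
        int consecutive_successes = 0;
    };

    static const int X = 3;  // same threshold for shunning and unshunning

    void on_ping_result(PingTracker& t, bool ok) {
        if (ok) {
            t.consecutive_failures = 0;
            t.consecutive_successes++;
            if (t.status == SHUNNED && t.consecutive_successes >= X) {
                t.status = ONLINE;        // revert after X successful pings
                t.consecutive_successes = 0;
            }
        } else {
            t.consecutive_successes = 0;
            t.consecutive_failures++;
            if (t.status == ONLINE && t.consecutive_failures >= X) {
                t.status = SHUNNED;       // shun after X failed pings
                t.consecutive_failures = 0;
            }
        }
    }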

@renecannao (Contributor)

@utdrmac: a server not responding to ping is a server that has some failure, while a server responding to ping isn't necessarily able to process traffic. To mention some random examples: a server that constantly generates "table is full", "server is in read only", or "Unknown command" (Galera) errors is online as far as the Monitor module is concerned, but a faulty node as far as the HostGroups Manager is concerned.
I think it is important to highlight that Monitor and HostGroups Manager are two distinct modules.
HostGroups Manager relies on Monitor only in circumstances where it cannot tell whether a backend isn't replying because it is still processing requests or because there is a network issue.
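
To make the distinction concrete, a sketch of traffic-driven shunning, loosely modeled on the documented mysql-shun_on_failures behavior (shun a backend after that many connection errors within one second); the structures here are illustrative, not the real code.

    // Illustrative only: the HostGroups Manager can shun a backend purely
    // from connection errors seen on live traffic; Monitor is not involved.
    #include <ctime>

    enum server_status { ONLINE, SHUNNED };

    struct BackendErrors {
        server_status status = ONLINE;
        time_t last_error_at = 0;       // start of the current 1s window
        int errors_in_window = 0;
    };

    static const int shun_on_failures = 5;  // assumed threshold

    void on_connect_error(BackendErrors& b, time_t now) {
        if (b.last_error_at != now) {        // a new one-second window
            b.last_error_at = now;
            b.errors_in_window = 1;
        } else if (++b.errors_in_window >= shun_on_failures) {
            b.status = SHUNNED;              // shunned by HGM, Monitor unaware
        }
    }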

@renecannao (Contributor)

That also means that the "healing" algorithm used by the HostGroups Manager should not depend on the Monitor module.
This is especially true for all the users who have the Monitor module disabled.

@utdrmac commented Jul 4, 2017

That makes sense. It's just "odd" to have a node come back online and the status not reflect that it is online. With most monitoring tools, when a node comes back online, the status changes.

As an admin, when the status doesn't change, it falsely leads me to believe that there is either A) a problem with the node itself, or B) a problem with ProxySQL not recognizing that the node is back.

@renecannao (Contributor)

"Most monitoring tools, when a node comes back online, the status changes."

I agree; in fact, the server is online in the monitor tables, right?

I think it is important to understand which status "doesn't change".
As pointed out in #984, the fact that the node is currently reported as SHUNNED just means that it was SHUNNED the last time it was used.
That means the HostGroups Manager didn't change the status of the node because no traffic was sent to that node, so the report is correct: the last known status for the HostGroups Manager hasn't changed.

To make an example excluding the proxy: an application connects directly to the DB. If the application gets an error after the DB goes down, and the application doesn't retry, the last known status is not online (even if the backend is online again).

So if the status doesn't change, it also means that no traffic is passing through the proxy; otherwise the status would change again.
The patch suggested in this issue is meant to avoid confusing admins in low-traffic environments (for example, while building a PoC), because on busy systems this confusion shouldn't arise.

@alangong114

I agree with @utdrmac's suggestion; we have the same problem in production.
At first I thought there was a problem with my database environment, but then I checked both my database configuration and my ProxySQL configuration.
We tried setting up mysql_replication_hostgroups to bring the master database from SHUNNED back to ONLINE status.
So a SHUNNED node needs traffic to return to ONLINE status, yet a SHUNNED node doesn't receive SQL statements.
This design is very misleading and hard to understand.

@jurim76 commented Mar 6, 2019

I'm using the https://github.com/MaxFedotov/proxysql-zabbix/ template with a backend status check.
If a backend has no activity, its status remains SHUNNED and Zabbix sends an alert on every check:
"Proxysql backend MySQL server 10.0.0.1:3306 is not ONLINE"

This happens even when it is perfectly fine for the ProxySQL node to have no activity (as a backup node, for example).

@srikiraju

Would love for it to get UNSHUNNED when the DB is healthy again.

@gunnicom commented Jan 7, 2020

I have seen the same today. A host that is in three hostgroups became "ONLINE" again in two hostgroups, but not in the third.
Maybe some logic like this (pseudocode):

if (host.statechange(SHUNNED => ONLINE)) {
    // find the hostgroups where this host is still marked SHUNNED
    SELECT hostgroup FROM runtime_mysql_servers
        WHERE hostname = host AND status = 'SHUNNED';
    for each resulting hostgroup:
        checkIfBackOnline(host, hostgroup);
}
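
For completeness, a self-contained sketch of that idea; ServerEntry and propagate_unshun are hypothetical names, not existing ProxySQL code.

    // Hypothetical sketch: once a host proves reachable in one hostgroup,
    // clear the SHUNNED state for the same host in every other hostgroup.
    #include <string>
    #include <vector>

    enum server_status { ONLINE, SHUNNED };

    struct ServerEntry {
        std::string hostname;
        int port;
        int hostgroup;
        server_status status;
    };

    void propagate_unshun(std::vector<ServerEntry>& servers,
                          const ServerEntry& recovered) {
        for (ServerEntry& s : servers) {
            if (s.hostname == recovered.hostname &&
                s.port == recovered.port &&
                s.status == SHUNNED) {
                s.status = ONLINE;  // reachable in one hostgroup, unshun in all
            }
        }
    }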
