[Feature] Implement a mechanism to automatically unshun nodes even if there is no traffic hitting that node #971

Open
krzysztof-ksiazek opened this issue Mar 28, 2017 · 9 comments

@krzysztof-ksiazek (Contributor)

[28.03.2017, 11:00:54] René Cannaò: I get your point, but the current implementation shows how the node is seen by HGM
[28.03.2017, 11:01:44] René Cannaò: a possible implementation that may make sense and doesn't create extra load is to apply the same "resume from shunned" algorithm when querying the runtime_mysql_servers table
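
For what it's worth, a minimal sketch of what that proposal could look like, assuming a time-based "resume from shunned" check along the lines of mysql-shun_recovery_time_sec. Every name below is illustrative and does not mirror ProxySQL's actual internals.

    // Illustrative sketch only: MySrvC, maybe_resume_from_shunned and the
    // dump hook are hypothetical names, not ProxySQL's real internals.
    #include <cstddef>
    #include <ctime>

    enum server_status { ONLINE, SHUNNED, OFFLINE_SOFT, OFFLINE_HARD };

    struct MySrvC {                        // one backend server entry
        server_status status;
        time_t time_last_detected_error;   // when the shun was triggered
    };

    // assumed stand-in for the mysql-shun_recovery_time_sec variable
    static const int shun_recovery_time_sec = 10;

    // Today a check like this runs only when traffic selects the server;
    // the proposal is to run it as well while the runtime_mysql_servers
    // table is being queried, so idle servers are also re-evaluated.
    static void maybe_resume_from_shunned(MySrvC* srv, time_t now) {
        if (srv->status == SHUNNED &&
            now - srv->time_last_detected_error > shun_recovery_time_sec) {
            srv->status = ONLINE;          // resume from shunned
        }
    }

    // Hypothetical hook, called once per row while dumping the table.
    void on_dump_runtime_mysql_servers(MySrvC* servers, size_t n) {
        time_t now = time(nullptr);
        for (size_t i = 0; i < n; i++) {
            maybe_resume_from_shunned(&servers[i], now);
        }
    }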

@utdrmac commented Jul 4, 2017

Based on my testing, I would expect the status to change back after X successful ping attempts. When X pings fail, the status changes to SHUNNED, so why not revert the status after X successful pings?
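
(A rough sketch of the idea in this comment, assuming a hypothetical per-server counter; ProxySQL does not work this way, as the maintainer's reply below explains.)

    // Hypothetical sketch of the "X successful pings" suggestion above;
    // names and the threshold are invented for illustration only.
    enum server_status { ONLINE, SHUNNED };

    struct PingTracker {
        server_status status = ONLINE;
        int consecutive_failures = 0;
        int consecutive_successes = 0;
    };

    static const int X = 3;  // same threshold for shunning and unshunning

    void on_ping_result(PingTracker& t, bool ok) {
        if (ok) {
            t.consecutive_failures = 0;
            t.consecutive_successes++;
            if (t.status == SHUNNED && t.consecutive_successes >= X) {
                t.status = ONLINE;        // revert after X successful pings
                t.consecutive_successes = 0;
            }
        } else {
            t.consecutive_successes = 0;
            t.consecutive_failures++;
            if (t.status == ONLINE && t.consecutive_failures >= X) {
                t.status = SHUNNED;       // shun after X failed pings
                t.consecutive_failures = 0;
            }
        }
    }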

@renecannao (Contributor)

@utdrmac: a server not responding to ping is a server that has some failure, while a server responding to ping isn't necessarily able to process traffic. To mention some random examples: a server that constantly generates "table is full", "server is in read only", or "Unknown command" (Galera) errors is online as far as the Monitor module is concerned, but a faulty node as far as the HostGroups Manager is concerned.
I think it is important to highlight that Monitor and HostGroups Manager are two distinct modules.
HostGroups Manager relies on Monitor only in circumstances where it cannot tell whether a backend isn't replying because it is still processing requests or because there is a network issue.
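
To make the distinction concrete, a sketch of traffic-driven shunning, loosely modeled on the documented mysql-shun_on_failures behavior (shun a backend after that many connection errors within one second); the structures here are illustrative, not the real code.

    // Illustrative only: the HostGroups Manager can shun a backend purely
    // from connection errors seen on live traffic; Monitor is not involved.
    #include <ctime>

    enum server_status { ONLINE, SHUNNED };

    struct BackendErrors {
        server_status status = ONLINE;
        time_t last_error_at = 0;       // start of the current 1s window
        int errors_in_window = 0;
    };

    static const int shun_on_failures = 5;  // assumed threshold

    void on_connect_error(BackendErrors& b, time_t now) {
        if (b.last_error_at != now) {        // a new one-second window
            b.last_error_at = now;
            b.errors_in_window = 1;
        } else if (++b.errors_in_window >= shun_on_failures) {
            b.status = SHUNNED;              // shunned by HGM, Monitor unaware
        }
    }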

@renecannao (Contributor)

That also means that the "healing" algorithm used by the HostGroups Manager should not depend on the Monitor module.
This is especially true for all the users who have the Monitor module disabled.

@utdrmac commented Jul 4, 2017

That makes sense. It's just "odd" to have a node come back online and the status not reflect that it is online. With most monitoring tools, when a node comes back online, the status changes.

As an admin, when the status doesn't change, it falsely leads me to believe that there is either A) a problem with the node itself, or B) a problem with ProxySQL not recognizing that the node is back.

@renecannao (Contributor)

"Most monitoring tools, when a node comes back online, the status changes."

I agree; in fact, the server is online in the monitor tables, right?

I think it is important to understand which status "doesn't change".
As pointed out in #984, the fact that the node is currently reported as SHUNNED just means that it was SHUNNED the last time it was used.
That means the HostGroups Manager didn't change the status of the node because no traffic was sent to that node, so the report is correct: the last known status for the HostGroups Manager hasn't changed.

To make an example excluding the proxy: an application connects directly to the DB. If the application gets an error after the DB goes down, and the application doesn't retry, the last known status is not online (even if the backend is online again).

So if the status doesn't change, it also means that no traffic is passing through the proxy; otherwise the status would change again.
The patch suggested in this issue is meant to avoid confusing admins in low-traffic environments (for example, while building a PoC), because on busy systems this confusion shouldn't arise.

@alangong114

I agree with @utdrmac's suggestion; we have the same problem in production.
At first I thought there was a problem with my database environment, but then I checked both my database configuration and my ProxySQL configuration.
We tried setting up mysql_replication_hostgroups to bring the master database from SHUNNED back to ONLINE status.
So a SHUNNED node needs traffic to return to ONLINE status, yet a SHUNNED node doesn't receive SQL statements.
This design is very misleading and hard to understand.

@jurim76 commented Mar 6, 2019

I'm using the https://github.com/MaxFedotov/proxysql-zabbix/ template with a backend status check.
If a backend has no activity, its status remains SHUNNED and Zabbix sends an alert on every check:
"Proxysql backend MySQL server 10.0.0.1:3306 is not ONLINE"

This happens even when it is perfectly fine for the ProxySQL node to have no activity (as a backup node, for example).

@srikiraju

Would love for it to get UNSHUNNED when the DB is healthy again.

@gunnicom commented Jan 7, 2020

I have seen the same today. A host that is in three hostgroups became "ONLINE" again in two hostgroups, but not in the third.
Maybe some logic like this (pseudocode):

if (host.statechange(SHUNNED => ONLINE)) {
    // find the hostgroups where this host is still marked SHUNNED
    SELECT hostgroup FROM runtime_mysql_servers
        WHERE hostname = host AND status = 'SHUNNED';
    for each resulting hostgroup:
        checkIfBackOnline(host, hostgroup);
}
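
For completeness, a self-contained sketch of that idea; ServerEntry and propagate_unshun are hypothetical names, not existing ProxySQL code.

    // Hypothetical sketch: once a host proves reachable in one hostgroup,
    // clear the SHUNNED state for the same host in every other hostgroup.
    #include <string>
    #include <vector>

    enum server_status { ONLINE, SHUNNED };

    struct ServerEntry {
        std::string hostname;
        int port;
        int hostgroup;
        server_status status;
    };

    void propagate_unshun(std::vector<ServerEntry>& servers,
                          const ServerEntry& recovered) {
        for (ServerEntry& s : servers) {
            if (s.hostname == recovered.hostname &&
                s.port == recovered.port &&
                s.status == SHUNNED) {
                s.status = ONLINE;  // reachable in one hostgroup, unshun in all
            }
        }
    }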
