[Feature] Implement a mechanism to automatically unshun nodes even if there is no traffic hitting that node #971
Comments
Based on my testing, I would expect the status to change after X successful ping attempts. When X pings fail, the status changes to SHUNNED, so why not revert the status after X successful pings?
@utdrmac: a server that does not respond to ping is a server that has some failure, while a server that responds to ping is not necessarily able to process traffic. To mention some random examples: a server that constantly generates "table is full", "server is in read only" or "Unknown command" (Galera) is online as far as the Monitor module is concerned, while for the HostGroups Manager it is a faulty node.
This also means that the "healing" algorithm used by the HostGroups Manager should not depend on the Monitor module.
That makes sense. It's just "odd" to have a node come back online and the status not reflect that it is online. In most monitoring tools, when a node comes back online, the status changes. As an admin, when the status doesn't change, it makes me, falsely, believe that there is either A) a problem with the node itself, or B) a problem with proxysql not recognizing that the node is back.
I agree, in fact the server is online in [...]. I think it is important to understand which status "doesn't change". To make an example that excludes the proxy: the application connects directly to the DB. If the application gets an error after the DB goes down and doesn't retry, the last known status is not online (even if the backend is online). So if the status doesn't change, it also means that no traffic is passing through the proxy; otherwise the status would change again.
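To illustrate the point above, here is a minimal sketch of a status that is only re-evaluated on the traffic path. It is a toy model, not ProxySQL's code: the `Backend` class, its methods and the `recovery_sec` parameter are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

ONLINE = "ONLINE"
SHUNNED = "SHUNNED"


@dataclass
class Backend:
    """Toy backend as seen by a hostgroup manager (hypothetical model)."""
    status: str = ONLINE
    shunned_at: float = 0.0

    def record_error(self, now: float) -> None:
        # An error seen on the traffic path marks the backend as shunned.
        self.status = SHUNNED
        self.shunned_at = now

    def pick_for_traffic(self, now: float, recovery_sec: float) -> bool:
        # The status is re-evaluated only here, when traffic asks for a backend.
        if self.status == SHUNNED and now - self.shunned_at >= recovery_sec:
            self.status = ONLINE
        return self.status == ONLINE


backend = Backend()
backend.record_error(now=0.0)

# Time passes and the database recovers, but no traffic arrives,
# so nothing re-evaluates the status.
status_without_traffic = backend.status

# The first request after the recovery window flips the status back.
usable = backend.pick_for_traffic(now=60.0, recovery_sec=10.0)
status_after_traffic = backend.status
```

In this model the last known status stays SHUNNED for as long as nothing calls `pick_for_traffic`, which is the behaviour being reported in this issue.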
I agree with what @utdrmac suggests. We have the same problem in production.
I'm using the https://github.com/MaxFedotov/proxysql-zabbix/ template with the backend status check. It's OK if the proxysql node has no activity (as a backup node, for example).
Would love for it to get UNSHUNNED when the db is healthy again.
I have seen the same today. A host that is in three hostgroups got "ONLINE" again in two hostgroups, but not the third.
[28.03.2017, 11:00:54] René Cannaò: I get your point, but the current implementation shows how the node is seen by HGM
[28.03.2017, 11:01:44] René Cannaò: a possible implementation that may make sense and doesn't create overload is to apply the same "resume from shunned" algorithm when querying the runtime_mysql_servers table
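The idea in the last message could look roughly like the sketch below: the same recovery check is applied at the moment the status table is read, so no background work is needed. This is only an illustration under assumptions. `Server`, `resume_if_due`, `read_runtime_servers` and `recovery_sec` are hypothetical names, and the real "resume from shunned" algorithm may use different criteria.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """One row of a toy status table (hypothetical, not ProxySQL's schema)."""
    hostgroup_id: int
    hostname: str
    status: str = "ONLINE"
    shunned_at: float = 0.0


def resume_if_due(server: Server, now: float, recovery_sec: float) -> None:
    # Hypothetical "resume from shunned" check: flip SHUNNED back to ONLINE
    # once the recovery window has elapsed.
    if server.status == "SHUNNED" and now - server.shunned_at >= recovery_sec:
        server.status = "ONLINE"


def read_runtime_servers(servers, now: float, recovery_sec: float):
    # Apply the check on read, then report (hostgroup_id, hostname, status).
    rows = []
    for server in servers:
        resume_if_due(server, now, recovery_sec)
        rows.append((server.hostgroup_id, server.hostname, server.status))
    return rows


servers = [
    Server(1, "db1", "SHUNNED", shunned_at=0.0),   # shunned long ago
    Server(2, "db1", "SHUNNED", shunned_at=55.0),  # shunned very recently
]
rows = read_runtime_servers(servers, now=60.0, recovery_sec=10.0)
```

The cost is paid only when an admin or a monitoring tool reads the table, which fits the "doesn't create overload" requirement. A host shunned too recently still shows as SHUNNED.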