Discussion: picking the writable host based on `read_only` value, failover scenario #789

Closed
shlomi-noach opened this Issue Nov 9, 2016 · 2 comments


shlomi-noach commented Nov 9, 2016

I'd like to open a discussion about the heuristic by which ProxySQL picks the writable host out of the hostgroup.

As I understand it, ProxySQL currently expects all hosts in the hostgroup but one to be read-only (i.e. `@@read_only=1`), and one host to be writable (i.e. `@@read_only=0`).

In a master failover scenario we would either manage to set `read_only=1` on the failed master, manage to shut MySQL down, or nuke the machine, and then promote a replica and set its `read_only` to `0`.
We go as far as having `read_only=1` in `my.cnf` on all our servers, masters included, so any server that restarts is bound to come up read-only.
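For illustration, the failover flow above boils down to statements along these lines; which tool runs them, and on which host, is of course deployment-specific:

```sql
-- On the failed master, if it is still reachable:
SET GLOBAL read_only = 1;

-- On the replica chosen for promotion:
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = 0;
```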

However, there's one scenario that will not play nice with the above: network partitioning. Suppose the master gets partitioned away from the rest of the network. To the world, it seems dead. We promote a new master and set `read_only=0`. We are unable to modify settings on the "dead" master because it is unreachable. A few seconds later the "dead" master recovers. It knows nothing about what happened, and it still has `read_only=0`. We now have two writable servers.

Is this scenario considered by ProxySQL?

A failover mechanism such as orchestrator would be able to tell ProxySQL that the "dead" master should be ignored henceforth. Can ProxySQL accept such an instruction? What would that instruction be?
If this can be done, then at least with orchestrator the problem seems solved: orchestrator would notify ProxySQL to ignore/remove/forget the dead master before promoting the new master, which avoids a split-brain scenario in most cases.
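To make the question concrete, here is a sketch of what such an instruction could look like against ProxySQL's admin interface; the hostname and hostgroup id are made up for illustration:

```sql
-- Issued on the ProxySQL admin interface (typically port 6032):
-- mark the dead master as hard-offline so no traffic reaches it.
UPDATE mysql_servers
SET status = 'OFFLINE_HARD'
WHERE hostname = 'old-master.example.com' AND hostgroup_id = 0;

-- Apply the change to the running configuration:
LOAD MYSQL SERVERS TO RUNTIME;
```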

Worth noting that the "dead" master can come back to life even as the promotion takes place, or even before it: just as the decision to promote a new server is being made.

Otherwise, without orchestrator, is there anything in ProxySQL today to tackle such a case?

renecannao (Contributor) commented Nov 9, 2016
In ProxySQL there is no built-in concept of what is a write and what is a read: you define that in `mysql_query_rules`, where you specify to which hostgroup each type of query is routed.
If a destination hostgroup is not defined in `mysql_query_rules`, the default for that specific user applies, based on `mysql_users.default_hostgroup`.
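As a concrete illustration (a minimal sketch, not taken from the comment; the hostgroup ids, 0 for the writer and 1 for the readers, are assumptions):

```sql
-- Route SELECT ... FOR UPDATE to the writer hostgroup, other SELECTs
-- to the reader hostgroup; anything unmatched falls back to
-- mysql_users.default_hostgroup for that user.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 0, 1),
       (2, 1, '^SELECT', 1, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```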

A bit over a year ago, I wrote how it is possible to perform a very quick switchover in less than one second.
In that example there is absolutely no reference to `read_only`. What I did there using mysqlrpladmin (or mysqlfailover, in case of failover) can be done with Orchestrator (surely better than with mysqlfailover).

At Percona Live Europe 2015 in Amsterdam I described this failover mechanism, in which a manager (like Orchestrator) notifies all the proxies that a failover is happening, and again when the failover has completed.
Someone objected to this approach: what if the network partition affects the proxies themselves? In other words, if some of the proxies are not reachable, the manager won't be able to notify them, and when they come back online they will have an outdated view of the world.

Following this valid concern, it was suggested to use Consul, and I still believe it is the best way to propagate the information:
Orchestrator notifies Consul, which notifies all the proxies, which will converge sooner or later.

Some other mechanism can be introduced to improve fencing: any sort of STONITH to stop the old master. In EC2 this is relatively easy: you use the AWS API to stop the instance.
Or a local process on the old master can detect that it has been left alone and set it to `read_only=1`, enable a firewall rule on port 3306, kill mysqld, etc.
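A minimal sketch of what such a local fencing agent could execute once it decides the host has been isolated (the `super_read_only` step assumes MySQL 5.7+):

```sql
-- Stop accepting writes from regular users:
SET GLOBAL read_only = 1;

-- MySQL 5.7+: also block writes from users with the SUPER privilege:
SET GLOBAL super_read_only = 1;
```

The more drastic options mentioned above (firewalling port 3306, killing mysqld) fence harder, at the cost of also cutting off legitimate administrative access.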

Note that with everything I wrote so far, ProxySQL does not check `read_only`: it only forwards (or tries to forward) traffic to the configured hosts, based on the defined routing.

So where do the `read_only` check and `mysql_replication_hostgroups` come from?
Many (really many) small and medium shops do not have any sort of replication manager. Surprisingly, many shops are still running circular replication!
The `read_only` check and `mysql_replication_hostgroups` therefore allow the adoption of ProxySQL with automatic failover without the need for Orchestrator, and without the need for Orchestrator to know the list of all the available proxies at any given time.
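For reference, a minimal sketch of enabling this mode, reusing the assumed writer/reader hostgroup ids from the earlier example:

```sql
-- Tell ProxySQL's monitor to place servers into hostgroup 0 (writer) or
-- hostgroup 1 (readers) according to their @@read_only value:
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (0, 1, 'main cluster');

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```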

Orchestrator notifying ProxySQL or ProxySQL automatically detecting the status of the backends: both solutions have pros and cons.


shlomi-noach commented Nov 9, 2016

Ahh, thank you for clarifying this!
