When I turn on SemiSyncEnforced, should it be automatically configured when the instance is discovered? #1360

Closed
cndoit18 opened this issue May 27, 2021 · 15 comments

Comments

@cndoit18
Contributor

When I enable semi-sync enforced replication and then import the database nodes via discovery, Orchestrator does not configure the semi-sync options for me.
This means I need additional manual steps when I initialize a cluster.

@binwiederhier
Contributor

This is actually a good point. Orchestrator also does not react when SemiSyncEnforced changes through the DetectSemiSync... query. We're currently thinking about implementing something outside of Orchestrator that does the toggling, but it'd be nicer to have it happen within Orchestrator.

@binwiederhier
Contributor

I think this is possibly a duplicate of #1270.

@binwiederhier
Contributor

@shlomi-noach if you think this is a good idea, I'd be more than happy to implement it.

@shlomi-noach
Collaborator

Sorry for the delay.

History: SemiSyncEnforced was developed a few years back by the Vitess authors to handle a failover scenario, such that after failover a new replica is enabled as semi-sync. It was about enforcing an operation after failover, never about enforcing a state.

Later on I began work on enforcing a state. I went as far as analyzing the topology and generating a structure analysis: NotEnoughValidSemiSyncReplicasStructureWarning. See analysis.go and analysis_dao.go.

Let's first discuss how we might want to go about it. Will SemiSyncEnforced imply that exactly SemiSyncMasterWaitForReplicaCount replicas have to be semi-sync? Is more OK? Which replicas do we pick if there are fewer?

@binwiederhier
Contributor

I just wanted to let you know that I am very much thinking about a solution for this. My team has collectively spent hours and hours on a solution outside of Orchestrator, so I am sure we can come up with something elegant within it. I didn't have a ton of time today, but I should be able to respond tomorrow. I want to make sure that everything I post is concise and doesn't waste your time.

@shlomi-noach
Collaborator

Awesome. Please take your time.

@binwiederhier
Contributor

OK, so I don't have a solution yet, but I feel like we can start the conversation (I'm happy to chat on Slack or even Zoom if that makes it easier for you). My focus is obviously on our own topology, but we can generalize later. It's just much easier to think and talk about a concrete example.

Example topology

We have dozens (soon hundreds) of 3-host clusters that look like this:

         ---> 2 (prefer, semi-sync replica)
       /
1 (source) 
       \
         ---> 3 (must_not, async replica)

1 is the source, 2 and 3 are replicas. We use 3 exclusively as a backup host, so we don't ever want to fail over to it, even when 1 and 2 are down.

A few scenarios

Requirements:

  • Consistency over availability (a short blocked semi-sync source is fine)
  • We want to have only 1 semi-sync replica active at a time

MySQL settings (in all scenarios):

  • rpl_semi_sync_master_wait_for_slave_count = 1 (wait for 1 semi-sync replica)
  • rpl_semi_sync_master_timeout = 3600000 (1h)

a. Failure of 1 (source)
If 1 dies, Orchestrator should fail over to 2 (prefer) and make 3 a semi-sync replica (despite the must_not, see #1369).

b. Re-adding 1 as a replica
When 1 is re-introduced as a replica of 2 (after 1 failed earlier), Orchestrator should make 2 the semi-sync replica again and disable semi-sync on 3, restoring the original topology/settings.

c. Failure of 2 (current semi-sync replica)
If 2 dies (the current semi-sync replica), all transactions on 1 will block "forever" (= 1h). Orchestrator should downtime 2 and enable semi-sync on 3.

d. Recovery of 2
When 2 is re-added or recovers, it remains an async replica until its downtime is over. When the downtime is over, Orchestrator disables semi-sync on 3 and enables semi-sync on 2.

(... There are more, but I'll leave it at that for now ...)

Current implementation of DetectSemiSyncEnforcedQuery & SemiSyncEnforced

To my understanding, DetectSemiSyncEnforcedQuery is currently only evaluated at the end of a failover (during the postponed functions), when the replicas are re-attached/changed (rpl_semi_sync_slave_enabled = ..; stop slave; change master; start slave).
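
For reference, here is a minimal sketch of that re-attach sequence, assuming a plain database/sql connection to the replica; it is illustrative only, and orchestrator's actual code path (the EnableSemiSync() logic plus the change-master step) differs in detail:

    // Illustrative only: the per-replica statements described above,
    // with the CHANGE MASTER step omitted for brevity.
    package semisync

    import "database/sql"

    func setSemiSyncReplica(db *sql.DB, enabled bool) error {
        value := "0"
        if enabled {
            value = "1"
        }
        stmts := []string{
            "SET GLOBAL rpl_semi_sync_slave_enabled = " + value,
            "STOP SLAVE",
            // a CHANGE MASTER TO ... statement would go here during re-attachment
            "START SLAVE",
        }
        for _, stmt := range stmts {
            if _, err := db.Exec(stmt); err != nil {
                return err
            }
        }
        return nil
    }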

It is never proactively enforced in other topology changes (such as a replica failing). For instance, if a replica dies (like in scenario c above), semi-sync is not changed.

Possible implementations

Let me spitball a little -- I don't know if this covers it all!

We introduce EnforceSemiSyncReplicaCount = true, which triggers the new logic. If it is false, everything behaves as it does today.

We introduce one background process per master (and probably also per intermediate master, not sure here though) that monitors the semi-sync state (Rpl_...) and compares it to the desired state. If it differs, it does the EnableSemiSync() magic accordingly.

The desired state could be defined through a priority returned in the second column of the DetectSemiSyncEnforcedQuery query, e.g. DetectSemiSyncEnforcedQuery = SELECT enforced, priority FROM ....

Assuming the above topology (1 is master, 2 + 3 are replicas), the queries could return this:

  • On 1 (master): enforced = 1, priority = 10 --> Priority doesn't matter since it's a master
  • On 2 (desired semi-sync replica): enforced = 1, priority = 10
  • On 3 (desired async replica): enforced = 1, priority = 20 (lower priority)

Because rpl_semi_sync_master_wait_for_slave_count = 1 and EnforceSemiSyncReplicaCount = true, we know that we only want one semi-sync replica, and 2's priority is "higher" than 3's, so we only enable it on 2, not 3.
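
To make that selection rule concrete, here is a small sketch under the assumptions above (the type and function names are made up for illustration): sort the candidates by the priority returned by DetectSemiSyncEnforcedQuery and keep the first rpl_semi_sync_master_wait_for_slave_count of them as semi-sync replicas.

    // Hypothetical selection logic for the proposal above; a lower priority value
    // means "prefer as semi-sync replica", matching the example (10 beats 20).
    package semisync

    import "sort"

    type replicaCandidate struct {
        Host     string
        Priority int // second column of DetectSemiSyncEnforcedQuery
    }

    // pickSemiSyncReplicas returns the replicas that should run with
    // rpl_semi_sync_slave_enabled = 1, given the master's wait_for_slave_count.
    func pickSemiSyncReplicas(candidates []replicaCandidate, waitCount int) []replicaCandidate {
        sort.Slice(candidates, func(i, j int) bool {
            return candidates[i].Priority < candidates[j].Priority
        })
        if waitCount > len(candidates) {
            waitCount = len(candidates)
        }
        return candidates[:waitCount]
    }

With the example values (2: priority 10, 3: priority 20) and a wait count of 1, only 2 gets semi-sync; if 2 drops out of the candidate list, 3 takes its place, which is exactly scenario c below.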

Scenarios:
Let's apply this solution to some scenarios:

c. Failure of 2 (semi-sync replica):
The background process will not include 2 in the list anymore and 3 will become a semi-sync replica.

a. Failure of 1 (source):
I am not 100% sure on this one. Best guess: When the failover process starts, the background process is paused (if (failover ongoing) continue). Instead, the desired state is enforced once at the end of the failover in a postponed function.

What do you think?

@binwiederhier
Contributor

binwiederhier commented Jun 18, 2021

Okay, after looking at the code for a bit, I think the "background process" idea is not how Orchestrator currently works, and it seems unnecessary anyway. It looks like I could "simply" check for "too many" or "too few" semi-sync replicas in GetReplicationAnalysis and then react to it in CheckAndRecover (aka executeCheckAndRecoverFunction, aka getCheckAndRecoverFunction).

I believe the "too few semi-sync replicas" analysis already exists as LockedSemiSyncMaster. So we could "simply" behave differently if EnforceSemiSyncReplicaCount is set.
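
Sketching what that could look like (all names here are hypothetical; these analysis codes and recovery functions do not exist in orchestrator today):

    // Hypothetical analysis codes and recovery dispatch; illustrative only.
    package semisync

    type AnalysisCode string

    const (
        MasterWithTooFewSemiSyncReplicas  AnalysisCode = "MasterWithTooFewSemiSyncReplicas"
        MasterWithTooManySemiSyncReplicas AnalysisCode = "MasterWithTooManySemiSyncReplicas"
    )

    func recoverTooFewSemiSyncReplicas() error {
        // enable semi-sync on the highest-priority async replica
        // (see the pickSemiSyncReplicas sketch above)
        return nil
    }

    func recoverTooManySemiSyncReplicas() error {
        // disable semi-sync on the lowest-priority semi-sync replica
        return nil
    }

    // getCheckAndRecoverFunction maps an analysis to its recovery, mirroring how
    // orchestrator dispatches recoveries from GetReplicationAnalysis results.
    func getCheckAndRecoverFunction(code AnalysisCode) func() error {
        switch code {
        case MasterWithTooFewSemiSyncReplicas:
            return recoverTooFewSemiSyncReplicas
        case MasterWithTooManySemiSyncReplicas:
            return recoverTooManySemiSyncReplicas
        default:
            return nil
        }
    }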

I will give this a go I think. Possibly this weekend, though very likely early next week. Any input appreciated.

@binwiederhier
Contributor

Here is a WIP: #1373

@shlomi-noach
Collaborator

Still thinking this over. Just one quick note:

"too few semi sync replicas" analysis already exists as LockedSemiSyncMaster

Not exactly. The primary can be unlocked and still have too few semi-sync replicas. The primary is only locked if the semi-sync wait timeout is long; otherwise it eventually reverts to async replication and becomes unlocked.
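
A minimal sketch of that distinction, assuming the standard semisync plugin counters Rpl_semi_sync_master_status and Rpl_semi_sync_master_clients (how orchestrator actually surfaces these may differ):

    // Illustrative only: "locked" vs. merely "too few" semi-sync replicas.
    package semisync

    type masterSemiSyncState struct {
        WaitForSlaveCount int  // @@rpl_semi_sync_master_wait_for_slave_count
        Clients           int  // Rpl_semi_sync_master_clients
        SemiSyncActive    bool // Rpl_semi_sync_master_status = ON (not fallen back to async)
    }

    // tooFewSemiSyncReplicas is true whether or not the primary is locked: with a
    // short rpl_semi_sync_master_timeout the primary falls back to async
    // (SemiSyncActive = false) while still having fewer acking replicas than configured.
    func tooFewSemiSyncReplicas(s masterSemiSyncState) bool {
        return s.Clients < s.WaitForSlaveCount
    }

    // lockedSemiSyncMaster roughly corresponds to the existing LockedSemiSyncMaster
    // analysis: semi-sync is still active but there are not enough replicas to ack,
    // so writes block until the (long) timeout expires.
    func lockedSemiSyncMaster(s masterSemiSyncState) bool {
        return s.SemiSyncActive && s.Clients < s.WaitForSlaveCount
    }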

@binwiederhier
Contributor

Still thinking this over.

Take your time. I will hold off on any work for this issue (as well as #1369, since it's so closely related) until you've had time to think about it. I also joined the Vitess Slack if you want to talk less asynchronously.

Not exactly. The primary can be unlocked and still have too few semi-sync replicas. The primary is only locked if the semi-sync wait timeout is long; otherwise it eventually reverts to async replication and becomes unlocked.

Ah yes of course. We've actually been bitten pretty bad by this fallback, so I'm not sure why I didn't think about that. Good catch.

Later on I began work on enforcing a state. I went as far as analyzing the topology and generating a structure analysis: NotEnoughValidSemiSyncReplicasStructureWarning. See analysis.go and analysis_dao.go. (from: #1360 (comment))

I am not quite sure how I missed this part of your comment before, and that NotEnoughValidSemiSyncReplicasStructureWarning already existed. I'll look at the code a bit while you think it over.

@binwiederhier
Contributor

I know it's only been a couple of days, but did you have some time to think about this?

Just wanted to mention that I am more than happy to implement the solution that we come up with. However, I'm sure you understand that I don't really want to put a ton of work in unless I can be somewhat sure that the code will end up getting merged. That's why the PR is only a proof-of-concept and doesn't do everything the right way yet. I don't want to be stuck maintaining a fork at my company. That's a recipe for disaster :-D

At my company, we currently have automatic failovers turned off entirely on all our clusters because we "lost" a ton of transactions during a catastrophic failover in which semi-sync had failed, so we're really eager to get something within or outside of Orchestrator going soon. I realize that you are doing this in your spare time and I'd like to thank you again for your fantastic work on this project. It's really awesome!!

@shlomi-noach
Collaborator

Hi @binwiederhier and thank you again. Sorry for the delays as I jump in and out of context between various tasks. I too appreciate your time. It looks like you're on a good path and I'll be happy to incorporate such changes into orchestrator.

Okay after looking at the code for a little bit I think that the "background process" thing is not how Orchestrator currently works,

Correct

It looks like I could "simply" check for "too many" or "too few" semi sync replicas in GetReplicationAnalysis and then react to it in CheckAndRecover

Again, correct. Let's use this path. Basically, orchestrator already collects all necessary information and persists it in database_instance. Most of orchestrator's work is stateless: one part probes instances and writes metrics to the backend database, another part periodically checks those metrics.
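
A rough sketch of that split, with illustrative names only (not orchestrator's actual identifiers): one path probes an instance and persists a snapshot, another path periodically re-reads the persisted snapshots and derives analyses, keeping no state of its own.

    // Illustrative sketch of the stateless probe/analyze split described above.
    package semisync

    type instanceSnapshot struct {
        Key                  string
        IsMaster             bool
        SemiSyncMasterOn     bool
        SemiSyncReplicaCount int // connected semi-sync replicas
        WaitForReplicaCount  int // rpl_semi_sync_master_wait_for_slave_count
    }

    // probe records what was observed on one instance; in orchestrator this ends
    // up in the database_instance backend table.
    func probe(backend map[string]instanceSnapshot, s instanceSnapshot) {
        backend[s.Key] = s
    }

    // analyze re-reads the backend and derives findings from it, for example
    // masters with fewer semi-sync replicas than configured; it keeps no state.
    func analyze(backend map[string]instanceSnapshot) (tooFew []string) {
        for key, s := range backend {
            if s.IsMaster && s.SemiSyncMasterOn && s.SemiSyncReplicaCount < s.WaitForReplicaCount {
                tooFew = append(tooFew, key)
            }
        }
        return tooFew
    }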

Let's completely ignore the existing SemiSyncEnforced logic, please. The SemiSyncEnforced mechanism does not follow orchestrator's design. Assume it does not exist and build a new one; we will later think about how to handle the duality, or better yet, reimplement SemiSyncEnforced based on your solution.

More thoughts/comments on the PR.

@binwiederhier
Contributor

This is great. Thank you for responding. I will take a few days this week to work on it. I'm actually looking forward to it. :-)

@cndoit18
Contributor Author

cndoit18 commented Aug 4, 2021

Can I assume the issue has been resolved and close it?
