
Meaning of ERROR Mismatching entries #25

Closed
mysqldesu opened this issue Dec 29, 2016 · 7 comments

mysqldesu commented Dec 29, 2016

Hello,

I have a MySQL database backend with many servers that all have the same schema, but orchestrator usually fails when I try relocating slaves. I always get this error about mismatching entries, and it isn't clear to me what it means. The data (as far as I can tell) and tables are the same on the master and all the slaves, and I don't understand what causes this. Can you provide any additional information on what might cause this "Mismatching entries" error?

2016-12-06 22:12:12 ERROR Mismatching entries, aborting: table_id: ### (mydb.priorities) <-> table_id: ### (mydb.queue)
2016-12-06 22:12:12 INFO Started slave on prddba100:3306
2016-12-06 22:12:13 ERROR Unexpected: 0 events processed while iterating logs. Something went wrong; aborting. nextBinlogCoordinatesToMatch: <nil>
2016-12-06 22:12:13 DEBUG auditType:end-maintenance instance:prddba100:3306 cluster:prddba101:3306 message:maintenanceToken: 1384
2016-12-06 22:12:13 FATAL Unexpected: 0 events processed while iterating logs. Something went wrong; aborting. nextBinlogCoordinatesToMatch: <nil>
@shlomi-noach (Collaborator) commented:

You are using Pseudo-GTID to move replicas around. The logic behind this is to align binlog coordinates by inspecting binlog contents. orchestrator looks for matching Pseudo-GTID entries in the binary logs of both involved servers, and then proceeds to iterate the binary logs, event by event.

It expects the binary logs to be identical in content. However, in your case it found a mismatch: the binary logs on one server had an event not found on the other server (or, more likely, not in the same order).

This might happen if you write directly to your replicas, or if you have active-active master-master replication. Do you have such a case?
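
For reference, one way to spot such local writes is to inspect a replica's own binary log events; events that originated on the replica itself carry the replica's server_id rather than the master's. This is only a sketch; substitute an actual log file name taken from SHOW BINARY LOGS.

```sql
-- List the replica's binary logs, then inspect one of them.
SHOW BINARY LOGS;

-- The Server_id column shows where each event originated; entries bearing
-- the replica's own server_id indicate writes that did not come from the master.
SHOW BINLOG EVENTS IN 'mysql-bin.000123' LIMIT 200;
```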

@mysqldesu (Author) commented:

Thank you for the useful information. Yes, I am using Pseudo-GTID to move replicas around; Oracle GTID is not enabled. Previously we ran master-master and the application could feasibly write to both masters. However, we have stopped doing that as part of using Orchestrator and now only have master-slave, but I still get this error even though all the slaves are read-only. We have many other MySQL servers that work great with Pseudo-GTID and Orchestrator, and I don't have this problem with any of the other stacks, so it appears to be isolated to this particular application. In order to work around this error, I've had to write a script that tries to simulate what Orchestrator is doing by capturing the binlog coordinates and executing a CHANGE MASTER command. However, I'd really like to figure out what is causing this to happen, because Orchestrator does a better job of relocating slaves.
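
Roughly, the script does something like the following on the replica being repointed (the host name and coordinates below are placeholders; the real values are captured from the binary logs at the moment of the move):

```sql
-- Stop replication, point the replica at the new master using the captured
-- coordinates, then resume. Options not specified here (credentials, etc.)
-- are carried over from the previous CHANGE MASTER settings.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',
  MASTER_LOG_FILE = 'mysql-bin.000456',
  MASTER_LOG_POS = 107;
START SLAVE;
```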

@shlomi-noach (Collaborator) commented:

Is there perhaps a local cronjob, or an event_scheduler event, that has SUPER privileges and can override the read_only configuration?

Are you using ROW-based replication everywhere? Perhaps you have ROW on one server and MIXED on another?
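
A quick way to check both on every server (just a sketch):

```sql
-- Local write protection, binlog format and scheduler state on this server.
SELECT @@global.read_only, @@global.binlog_format, @@global.event_scheduler;

-- Scheduled events and their definers; a definer with SUPER can write
-- to the replica even when read_only is enabled.
SELECT EVENT_SCHEMA, EVENT_NAME, DEFINER, STATUS
FROM information_schema.EVENTS;
```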

theTibi commented Jan 11, 2017

Hi,

I had the same issue, and I am a bit confused now. When we are using Pseudo-GTID, does the event which inserts the unique entries into the binary log have to run on all the servers, or only on the master? And if it has to run only on the master, then when the master goes down, does some job have to start this event on the new master again?

Thanks.

@shlomi-noach (Collaborator) commented:

@theTibi the pseudo-GTID insertion must only run on the master. If it runs on multiple nodes, then nothing makes sense to orchestrator, because each node generates its own unique entries.

This matches the second part of your question: when you fail over, something needs to stop writing to the old master and start writing to the new master.

I've done this in a few different ways:

  • an event_scheduler event, which is implicitly disabled on replicas; it needs to be enabled on failover (see the sketch after this list)
  • a pt-heartbeat running on the master: you need to service stop it on the old master and service start it on the new master
  • a pt-heartbeat or similar, running from somewhere else entirely, connecting via the master's VIP; when the VIP changes, the heartbeat writes implicitly go to the new master, and nothing needs to change in the heartbeat mechanism
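
For the event_scheduler route, here is a minimal sketch of an injector event. The `meta` schema name and the 5-second interval are assumptions, and orchestrator's PseudoGTIDPattern must be configured to match the entries this writes; the DROP VIEW form is used because DDL is written to the binary log as a statement even under ROW-based replication.

```sql
-- Runs on the master only. Each invocation writes a uniquely named,
-- searchable entry into the binary log.
CREATE DATABASE IF NOT EXISTS meta;

DELIMITER $$
CREATE EVENT IF NOT EXISTS meta.create_pseudo_gtid_event
  ON SCHEDULE EVERY 5 SECOND STARTS CURRENT_TIMESTAMP
  ON COMPLETION PRESERVE
  ENABLE
  DO
  BEGIN
    SET @pseudo_gtid_hint := UUID();
    SET @_sql := CONCAT('DROP VIEW IF EXISTS `meta`.`_pseudo_gtid_hint__', @pseudo_gtid_hint, '`');
    PREPARE st FROM @_sql;
    EXECUTE st;
    DEALLOCATE PREPARE st;
  END $$
DELIMITER ;
```

The CREATE EVENT statement replicates, so the event also exists on the replicas but in a disabled state there; on failover, enabling it on the promoted server is a single ALTER EVENT meta.create_pseudo_gtid_event ENABLE.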

theTibi commented Jan 12, 2017

Thank you @shlomi-noach for making this clear for me.

@mysqldesu (Author) commented:

Thank you @shlomi-noach for the information. I didn't find any cronjobs running locally. The event_scheduler is on for the replicas, but the only event I have is the one for Pseudo-GTID. There are a number of non-application users that have SUPER, so it is possible that something is writing to the slave. I will continue to investigate. We are using ROW-based replication everywhere, so that doesn't seem to be the problem either.
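
For anyone following along, the checks for stray SUPER writers boil down to something like this (super_read_only requires MySQL 5.7+):

```sql
-- Accounts holding SUPER, which can write even when read_only is on.
SELECT GRANTEE
FROM information_schema.USER_PRIVILEGES
WHERE PRIVILEGE_TYPE = 'SUPER';

-- On MySQL 5.7+ this also blocks SUPER accounts from writing to the replica.
SET GLOBAL super_read_only = ON;
```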
