orchestrator api graceful-master-takeover does not always return a code #949
Comments
Interesting timing, seeing that just today I proposed #947 and #948, which address some of these problems. The SQL error is an unfortunate bug, fixed in #931 and also in #948. Let me look into formalizing the output for that, though I'd suggest using the API for most automated tasks. Also, what's a "looooong pause"? A number would be great. Please see if #948 reduces that number.
You're working faster than we can open issues :P I'll rerun the tests next week (starting August 12) and add some numbers for the looong pause too.
Sheer coincidence, I assure you 😄
Sorry for the delay, I had to learn how to compile orchestrator.
Perfect! The SQL syntax errors are gone!
I should have mentioned the problem with the API not always returning a code. It generates this error:
@Honiix thank you for looking into; in this case, I'm interested in what the JSON does provide. Since your output only permits If you could possibly repeat the two short tests and paste the
That's the thing. The message "GracefulMasterTakeover: indicated designated instance mysql-c1-t1:33005 must be directly replicating from the master mysql-c1-t1:33005" is a plain string, not JSON. When the master no longer has the binlog, the message is also a plain string:
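Until the API replies with JSON in every case, a caller can normalize both shapes on its own side. The following is only a minimal sketch of that idea in Python: `parse_api_reply` is a hypothetical helper name, not part of orchestrator, and treating any non-JSON body as an error is an assumption based on the two plain-string messages described above.

```python
import json


def parse_api_reply(body: str) -> dict:
    """Normalize an API reply into a dict with Code, Message and Details.

    Hypothetical client-side helper, not part of orchestrator.
    """
    try:
        reply = json.loads(body)
    except ValueError:
        # The body is not valid JSON at all (e.g. a bare error string).
        reply = None

    if isinstance(reply, dict) and "Code" in reply:
        return {
            "Code": reply["Code"],
            "Message": reply.get("Message", ""),
            "Details": reply.get("Details"),
        }

    # Assumption: a reply without a Code field is an error message.
    return {"Code": "ERROR", "Message": body.strip(), "Details": None}
```

With this in place, a script can always branch on `reply["Code"]`, whether the server sent a JSON object or a bare message.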
@Honiix ohhhh! OK cool, thanks for this info; I'll look into it.
Fixed in #1166
As a concrete example of how #1166 addresses this issue:
$ orchestrator-client -c api -path graceful-master-takeover-auto/ci/127.0.0.1/10114 | jq .
{
"Code": "ERROR",
"Message": "GracefulMasterTakeover: Recovery attempted yet no replica promoted; err=RecoverDeadMaster: failed 127.0.0.1:10114 promotion; PreventCrossRegionMasterFailover: will not promote server in rgn-west when failed server in rgn-east",
"Details": null
}
$ orchestrator-client -c api -path graceful-master-takeover-auto/ci/127.0.0.1/10112 | jq .
{
"Code": "ERROR",
"Message": "GracefulMasterTakeover: indicated designated instance 127.0.0.1:10112 must be directly replicating from the master 127.0.0.1:10111",
"Details": null
}
$ orchestrator-client -c api -path graceful-master-takeover-auto/ci/127.0.0.1/10113 | jq .
{
"Code": "OK",
"Message": "graceful-master-takeover: successor promoted",
"Details": {
"Id": 192,
"UID": "1589784607961527000:b4a441fbf7f276fc841c0b01d9f1985c0bfe0c754605f4d4bfdf33bde5b5ada7",
"AnalysisEntry": {
"AnalyzedInstanceKey": {
"Hostname": "127.0.0.1",
"Port": 10111
},
"AnalyzedInstanceMasterKey": {
"Hostname": "",
"Port": 0
},
"ClusterDetails": {
"ClusterName": "127.0.0.1:10111",
"ClusterAlias": "ci",
"ClusterDomain": "",
"CountInstances": 4,
"HeuristicLag": 0,
"HasAutomatedMasterRecovery": true,
"HasAutomatedIntermediateMasterRecovery": true
},
"AnalyzedInstanceDataCenter": "dc-east-1",
"AnalyzedInstanceRegion": "rgn-east",
"AnalyzedInstancePhysicalEnvironment": "prod",
"IsMaster": true,
"IsCoMaster": false,
"LastCheckValid": true,
"LastCheckPartialSuccess": true,
"CountReplicas": 1,
"CountValidReplicas": 1,
"CountValidReplicatingReplicas": 1,
"CountReplicasFailingToConnectToMaster": 0,
"CountDowntimedReplicas": 0,
"ReplicationDepth": 0,
"SlaveHosts": [
{
"Hostname": "127.0.0.1",
"Port": 10113
}
],
"IsFailingToConnectToMaster": false,
"Analysis": "DeadMaster",
"Description": "",
"StructureAnalysis": null,
"IsDowntimed": false,
"IsReplicasDowntimed": false,
"DowntimeEndTimestamp": "",
"DowntimeRemainingSeconds": 0,
"IsBinlogServer": false,
"PseudoGTIDImmediateTopology": false,
"OracleGTIDImmediateTopology": true,
"MariaDBGTIDImmediateTopology": false,
"BinlogServerImmediateTopology": false,
"CountLoggingReplicas": 1,
"CountStatementBasedLoggingReplicas": 0,
"CountMixedBasedLoggingReplicas": 0,
"CountRowBasedLoggingReplicas": 1,
"CountDistinctMajorVersionsLoggingReplicas": 1,
"CountDelayedReplicas": 0,
"CountLaggingReplicas": 0,
"IsActionableRecovery": true,
"ProcessingNodeHostname": "shlomi-mbp",
"ProcessingNodeToken": "302337ab8c792a5e8fc71c3919fa35c7da04e275c9b915b2457a6ce0b00b354c",
"CountAdditionalAgreeingNodes": 0,
"StartActivePeriod": "",
"SkippableDueToDowntime": false,
"GTIDMode": "ON",
"MinReplicaGTIDMode": "ON",
"MaxReplicaGTIDMode": "ON",
"MaxReplicaGTIDErrant": "",
"CommandHint": "graceful-master-takeover",
"IsReadOnly": false
},
"SuccessorKey": {
"Hostname": "127.0.0.1",
"Port": 10113
},
"SuccessorAlias": "",
"IsActive": false,
"IsSuccessful": true,
"LostReplicas": [],
"ParticipatingInstanceKeys": [],
"AllErrors": [],
"RecoveryStartTimestamp": "",
"RecoveryEndTimestamp": "",
"ProcessingNodeHostname": "",
"ProcessingNodeToken": "",
"Acknowledged": false,
"AcknowledgedAt": "",
"AcknowledgedBy": "",
"AcknowledgedComment": "",
"LastDetectionId": 0,
"RelatedRecoveryId": 0,
"Type": "MasterRecovery",
"RecoveryType": "MasterRecoveryGTID"
}
}
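With replies shaped like the three above, an automated caller could map the reply to an exit status and the promoted host. This is a rough sketch only: `summarize_takeover` is a hypothetical name, the sample body is abridged from the output above, and treating a missing `SuccessorKey` or a false `IsSuccessful` as a failure is an assumption rather than documented orchestrator behavior.

```python
import json
from typing import Tuple


def summarize_takeover(body: str) -> Tuple[int, str]:
    """Return (exit_status, text) for a JSON takeover reply.

    exit_status is 0 on success, 1 otherwise; text is the successor as
    "host:port" on success, or the reply's Message on failure.
    """
    reply = json.loads(body)
    message = reply.get("Message", "")
    if reply.get("Code") != "OK":
        return 1, message

    details = reply.get("Details") or {}
    key = details.get("SuccessorKey") or {}
    # Assumption: a reply without a usable successor counts as a failure.
    if not details.get("IsSuccessful") or not key.get("Hostname"):
        return 1, message
    return 0, "{}:{}".format(key["Hostname"], key["Port"])


# Abridged from the successful reply shown above.
sample = """
{
  "Code": "OK",
  "Message": "graceful-master-takeover: successor promoted",
  "Details": {
    "SuccessorKey": {"Hostname": "127.0.0.1", "Port": 10113},
    "IsSuccessful": true
  }
}
"""

status, text = summarize_takeover(sample)
print(status, text)
```

A wrapper script could then pass `status` to `sys.exit()` so that shell callers get a meaningful return code.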
I'm trying to catch the return code of the graceful-master-takeover command: either a success, a failure, or a refusal (for example, if the destination is already the master).
Here's what I got so far (orchestrator 3.1.0):
I was hoping orchestrator-client would always return a code. That's not the case.