Fix cant initiate failover of applicationSet based apps after hub recovery #861
Conversation
/hold
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: GowthamShanmugam. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
Force-pushed from 3fb47ac to 3ce32ef (compare)
/hold cancel
Review comment on:

```ts
    // While failing over failoverCluster cluster is the target cluster
    // Find deployment cluster using dr cluster list
  } else if (currStatus === DRPC_STATUS.FailingOver) {
```

nits

Suggested change:

```ts
  } else if (currStatus === DRPC_STATUS.FailingOver) {
    // While failing over failoverCluster cluster is the target cluster
    // Find deployment cluster using dr cluster list
```
Review comment on:

```ts
const cluster = findCluster(
  drClusters,
  drpc?.spec?.preferredCluster,
  false
);
```

this is confusing:
- the "matchCluster" argument of the "findCluster" function is false by default, so there is no need to pass "false" again.
- the second argument of "findCluster" is named "deploymentClusterName", but in reality you want to figure out the deployment cluster; the value you are passing is the target cluster, so the naming convention is wrong/confusing.
Correct, the name is wrong.
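The two review points could be addressed with a signature along these lines. This is only a hypothetical sketch: the real `findCluster` lives in the odf-console utils, and its exact fields and matching behavior are assumptions (here, `matchCluster = false` returns the *opposite* cluster, consistent with the "opposite cluster" logic discussed later in this thread).

```typescript
// Hypothetical sketch of a clearer findCluster signature; not the merged code.
type DRCluster = { metadata?: { name?: string } };

const findCluster = (
  clusters: DRCluster[],
  // Neutral name: callers may pass either the deployment or the target cluster.
  clusterName: string,
  // Defaults to false, so call sites only pass it when they need an exact match.
  matchCluster = false
): DRCluster | undefined =>
  clusters.find((c) =>
    matchCluster
      ? c.metadata?.name === clusterName
      : c.metadata?.name !== clusterName
  );

const drClusters = [{ metadata: { name: 'c1' } }, { metadata: { name: 'c2' } }];
const opposite = findCluster(drClusters, 'c1'); // the cluster that is NOT c1
const exact = findCluster(drClusters, 'c1', true); // the cluster named c1
```

With a default in place, the trailing `false` argument at the call sites above simply disappears.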
Review comment on:

```ts
const cluster = findCluster(
  drClusters,
  drpc?.spec?.failoverCluster,
  false
);
```

same 2 points as above...
Review comment on:

```diff
@@ -488,3 +489,36 @@ export const filterPVCDataUsingAppsets = (

+export const filterDRAlerts = (alert: Alert) =>
+  alert?.annotations?.alert_type === 'DisasterRecovery';
+
+export const findDeploymentClusterName = (
+  plsDecision: ACMPlacementDecisionKind,
```
No issue with the current approach, but if "drpc" is the best way to determine the deployment cluster, do we even need the check for "plsDecision"? Can't we always check based on "drpc" only?
- PlacementDecision is our source of truth for finding the deployment cluster.
- Finding the deployment cluster name from the DRPC won't always work: the user may accidentally change the "Preferred cluster" / "Failover cluster" in the DRPC spec without changing the "Action" spec.
- Also, if we derive it from the DRPC, we face the following challenges:
  - If the current DRPC status is Deployed, the deployment cluster is the preferred cluster spec.
  - If the current DRPC status is Initiating, then:
    - If the Action is Failover, the deployment cluster is the opposite cluster from the failover cluster.
    - If the Action is Relocate, the deployment cluster is the opposite cluster from the preferred cluster.
  - If the current DRPC status is FailingOver, the deployment cluster is the opposite cluster from the failover cluster.
  - If the current DRPC status is Relocating, the deployment cluster is the opposite cluster from the preferred cluster.
  - If the current DRPC status is FailedOver, the deployment cluster is the preferred cluster.
  - If the current DRPC status is Relocated, the deployment cluster is the preferred cluster.
  - If any new state is added we would need to handle that too, and if the user changes the preferred cluster / failover cluster / action spec, it's a problem.
- Instead of checking this many conditions, I decided to check the PlacementRule, which always gives the correct deployment cluster name.
- But one corner case remains: ramen removes the cluster names from the PlacementRule, so I decided to fall back to the DRPC only for this corner case (i.e. Relocating / FailingOver). I am assuming that users won't change the DRPC spec while Relocating / FailingOver.
- Another alternative is to ask the user to input the cluster name from the UI, but if we introduce bulk failover and relocate in the future, this will be a problem.
- Again, it is not a solid solution; we need to discuss this with the DR team.
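The status-to-cluster mapping enumerated above can be sketched as a single switch. Everything here (the enum values, the spec shape, the helper name) follows the discussion in this thread, not the merged code, so treat it as an illustration of why DRPC-only derivation is fragile rather than as the actual implementation:

```typescript
// Illustrative sketch of the DRPC-status mapping described in the comment above.
enum DRPC_STATUS {
  Deployed = 'Deployed',
  Initiating = 'Initiating',
  FailingOver = 'FailingOver',
  FailedOver = 'FailedOver',
  Relocating = 'Relocating',
  Relocated = 'Relocated',
}

type DRPCSpecSketch = {
  preferredCluster?: string;
  failoverCluster?: string;
  action?: 'Failover' | 'Relocate';
};

// Returns the cluster that is NOT "name", assuming exactly two DR clusters.
const oppositeCluster = (clusters: string[], name?: string): string | undefined =>
  clusters.find((c) => c !== name);

const deploymentClusterFromDRPC = (
  status: DRPC_STATUS,
  spec: DRPCSpecSketch,
  drClusterNames: string[]
): string | undefined => {
  switch (status) {
    case DRPC_STATUS.Deployed:
    case DRPC_STATUS.FailedOver:
    case DRPC_STATUS.Relocated:
      // Per the comment above, these states resolve to the preferred cluster.
      return spec.preferredCluster;
    case DRPC_STATUS.Initiating:
      return spec.action === 'Failover'
        ? oppositeCluster(drClusterNames, spec.failoverCluster)
        : oppositeCluster(drClusterNames, spec.preferredCluster);
    case DRPC_STATUS.FailingOver:
      return oppositeCluster(drClusterNames, spec.failoverCluster);
    case DRPC_STATUS.Relocating:
      return oppositeCluster(drClusterNames, spec.preferredCluster);
    default:
      // Any newly introduced DRPC state would need explicit handling here,
      // which is exactly the maintenance burden the comment warns about.
      return undefined;
  }
};
```

The switch also makes the other failure mode concrete: every branch reads the spec, so a user editing `preferredCluster`/`failoverCluster` without changing `action` silently changes the answer.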
This fix will at least save us from blocker BZs, like failover/relocation being blocked, or the UI not displaying the target cluster name.
Review comment on:

```diff
@@ -53,7 +53,7 @@ export const FailoverRelocateModal: React.FC<FailoverRelocateModalProps> = (
     ModalFooterStatus.INITIAL
   );
   const [placement, setPlacement] = React.useState<PlacementProps>({});
-  const [canInitiate, setCanInitiate] = React.useState(false);
+  const [canInitiate, setCanInitiate] = React.useState<Boolean>(undefined);
```
nits: if we are defining the type as Boolean, shouldn't we initialise it with false?
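If the `undefined` initial value is intentional, the diff is arguably modelling three states, not two: "not evaluated yet", "can initiate", and "cannot initiate". A standalone sketch of that idea (no React, names hypothetical; note also that TypeScript convention prefers the primitive `boolean` over the `Boolean` wrapper type):

```typescript
// Hypothetical tri-state modelling of canInitiate, made explicit in the type:
// undefined = readiness not yet computed, then true/false once known.
type CanInitiate = boolean | undefined;

const describeReadiness = (canInitiate: CanInitiate): string =>
  canInitiate === undefined
    ? 'Checking readiness...'
    : canInitiate
    ? 'Ready'
    : 'Not Ready';
```

Spelling the union out (`React.useState<boolean | undefined>(undefined)`) would answer the reviewer's question in the type itself: `false` is a real answer here, not a placeholder.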
Force-pushed from a594bde to 1f0ea80 (compare)
/test images
…overy BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2209288 Signed-off-by: Gowtham Shanmugasundaram <gshanmug@redhat.com>
Force-pushed from 1f0ea80 to e68ed2d (compare)
/lgtm
/test odf-console-e2e-aws
1 similar comment
/test odf-console-e2e-aws
Merged commit 11f4070 into red-hat-storage:master
/cherry-pick release-4.13

/cherry-pick release-4.13-compatibility
@SanjalKatiyar: #861 failed to apply on top of branch "release-4.13". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SanjalKatiyar: #861 failed to apply on top of branch "release-4.13-compatibility". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2209288
Fix explanation:
Other errors observed from the QE setup after fixing the above one:
2. The Failover/Relocation readiness text in the UI should not rely only on the DRPC peer-ready and available conditions. This is common to all errors, so if any error occurs, the readiness should display "Not Ready".
3. While failing over / relocating, ramen removes the cluster name from the placement rule and adds it back once the failover/relocation operation completes. If the cluster goes down in between and a user wants to trigger a failover, the UI is unable to find the target cluster and the deployment cluster.
The fix: find the deployment cluster name from the DRPC, which is only needed while a failover or relocation operation is ongoing.
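The combined strategy described in this PR (PlacementDecision as the source of truth, DRPC as the fallback for the FailingOver/Relocating window) could be sketched as follows. The field paths and type shapes here are assumptions for illustration, not the merged odf-console code:

```typescript
// Hedged sketch of the PlacementDecision-first, DRPC-fallback lookup.
type PlacementDecisionSketch = {
  status?: { decisions?: { clusterName?: string }[] };
};

type DRPCSketch = {
  spec?: { preferredCluster?: string; failoverCluster?: string };
  status?: { phase?: string };
};

const findDeploymentClusterNameSketch = (
  plsDecision: PlacementDecisionSketch,
  drpc: DRPCSketch,
  drClusterNames: string[]
): string | undefined => {
  // PlacementDecision is the source of truth whenever it is populated.
  const decided = plsDecision?.status?.decisions?.[0]?.clusterName;
  if (decided) return decided;
  // Corner case: ramen clears the placement decision while FailingOver /
  // Relocating, so fall back to the DRPC spec for just those phases,
  // taking the opposite of the target cluster in each case.
  const phase = drpc?.status?.phase;
  if (phase === 'FailingOver')
    return drClusterNames.find((c) => c !== drpc?.spec?.failoverCluster);
  if (phase === 'Relocating')
    return drClusterNames.find((c) => c !== drpc?.spec?.preferredCluster);
  return undefined;
};
```

Keeping the DRPC fallback confined to those two phases limits the exposure to the spec-editing problem discussed in the review thread.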