
Add check for existing Argo Workflows #955

Merged

Conversation

VaishnaviHire
Member

@VaishnaviHire VaishnaviHire commented Apr 4, 2024

Description

Jira Issue: https://issues.redhat.com/browse/RHOAIENG-4570
This PR introduces changes to handle existing Argo Workflows on new installs and upgrades. The expected workflow is as follows:

For version n

1. New Install
    - Argo Workflow exists
        - All components in n deployed successfully except DSP
        - New Dashboard(n) shows warning banner

2. Upgrade from n-1 --> n
   - Argo workflow exists and DSP is enabled
      - Do not upgrade any component. All components in (n-1)
      - Dashboard will not show any warning
      - DSC(Operator) will trigger events stating remove Argo or disable DSP to upgrade to new version
      - Note, (n-1) manifests are no longer managed here
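The install/upgrade decision described above can be sketched as a small pure-Go function. This is an illustrative sketch of the described behavior, not the operator's actual code; the function name and return strings are hypothetical:

```go
package main

import "fmt"

// decideAction mirrors the behavior described above for version n.
// argoExists — the Argo Workflow CRD is already on the cluster,
// dspEnabled — DataSciencePipelines is enabled in the DSC,
// isUpgrade  — this reconcile is an n-1 -> n upgrade.
func decideAction(argoExists, dspEnabled, isUpgrade bool) string {
	switch {
	case !argoExists:
		// No conflict: normal install/upgrade path.
		return "deploy all components"
	case isUpgrade && dspEnabled:
		// Hold everything at n-1 until Argo is removed or DSP is disabled.
		return "hold all components at n-1; set Upgradeable=False"
	default:
		// New install with a pre-existing Argo Workflow CRD.
		return "deploy all components except DSP; set CapabilityDSPv2Argo=False"
	}
}

func main() {
	fmt.Println(decideAction(true, true, true))
	fmt.Println(decideAction(true, true, false))
	fmt.Println(decideAction(false, true, false))
}
```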

How Has This Been Tested?

New Install

  1. Deploy the Workflow CRD.
  2. Deploy the operator using catalog source: quay.io/vhire/opendatahub-operator-catalog:v2.10.1
  3. Create DSCI and DSC with DataSciencePipelines enabled
  4. Verify all components except DSP are updated
  5. Verify the condition type CapabilityDSPv2Argo is triggered
  6. Verify DSP status is False

Upgrade

  1. Install an ODH version with v1 DSP pipelines (e.g. quay.io/vhire/opendatahub-operator-catalog:v2.7.0-711)
  2. Deploy the Workflow CRD
  3. Create DSCI and DSC with DataSciencePipelines enabled
  4. Replace the catalog source image with the new version quay.io/vhire/opendatahub-operator-catalog:v2.10.1
  5. Verify no components are upgraded to the new version
  6. Verify the condition type CapabilityDSPv2Argo is triggered
  7. Verify the condition type Upgradeable is set to False

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Member

@zdtsw zdtsw left a comment


Some nitty-gritty comments, but I am fine with the change.

One more thing: what will happen on the Dashboard side? If we keep throwing the error "CRD already there, go remove it", will the user still be able to see the option to enable DSP v2 from the dashboard?

@zdtsw
Member

zdtsw commented Apr 5, 2024

/test opendatahub-operator-e2e

@VaishnaviHire
Member Author

Some nitty-gritty comments, but I am fine with the change.

One more thing: what will happen on the Dashboard side? If we keep throwing the error "CRD already there, go remove it", will the user still be able to see the option to enable DSP v2 from the dashboard?

We still will have v1 manifests deployed. @lucferbux Do you know what will be the dashboard behavior in this case?

@kywalker-rh

Can you provide a screenshot of where/when this will be shown? Also, it would be good to provide the user with a link to further documentation about why exactly they need to remove the CRD/operator.

@zdtsw
Member

zdtsw commented Apr 5, 2024

/onhold
Just to get more review comments; feel free to remove the label @VaishnaviHire

@VaishnaviHire
Member Author

VaishnaviHire commented Apr 5, 2024

Can you provide a screenshot of where/when this will be shown? Also, it would be good to provide the user with a link to further documentation about why exactly they need to remove the CRD/operator.

This is shown under Details for the DataScienceCluster resource. I have attached a screen grab:
CheckArgoWorkflow.mov.zip

To resolve this error, users will need to delete the Workflow CRD by running:

oc delete crd workflows.argoproj.io

@kywalker-rh

Thanks for sharing that @VaishnaviHire. I'm worried that's pretty buried, though. Will an admin typically check those statuses after starting an upgrade?

With the status set as Ready, I wouldn't expect to need to dig into it further to find out something failed. I want to prevent users from being frustrated and confused as to why Pipelines won't work. Would it make sense to add a banner for admins to see in the dashboard?

@zdtsw
Member

zdtsw commented Apr 6, 2024

Thanks for sharing that @VaishnaviHire. I'm worried that's pretty buried, though. Will an admin typically check those statuses after starting an upgrade?

With the status set as Ready, I wouldn't expect to need to dig into it further to find out something failed. I want to prevent users from being frustrated and confused as to why Pipelines won't work. Would it make sense to add a banner for admins to see in the dashboard?

Should we move this discussion into the original Jira to get more eyes on it?
Changes on the Dashboard side will need the UI team's work.

@@ -281,12 +296,15 @@ func (r *DataScienceClusterReconciler) reconcileSubComponent(ctx context.Context
instance = r.reportError(err, instance, "failed to reconcile "+componentName+" on DataScienceCluster")
instance, _ = status.UpdateWithRetry(ctx, r.Client, instance, func(saved *dsc.DataScienceCluster) {
if enabled {
status.SetComponentCondition(&saved.Status.Conditions, componentName, status.ReconcileFailed, fmt.Sprintf("Component reconciliation failed: %v", err), corev1.ConditionFalse)
if componentName == datasciencepipelines.ComponentName && strings.Contains(err.Error(), datasciencepipelines.ArgoWorkflowCRD) {
Member


We probably don't need to check the component name here:

Suggested change
if componentName == datasciencepipelines.ComponentName && strings.Contains(err.Error(), datasciencepipelines.ArgoWorkflowCRD) {
if strings.Contains(err.Error(), datasciencepipelines.ArgoWorkflowCRD) {

@zdtsw zdtsw removed the approved label Apr 12, 2024
@Gkrumbach07
Member

This condition looks good from my side; we are testing against:

condition.type === 'CapabilityDSPv2Argo' && condition.status === 'False',
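The dashboard-side check above can be mirrored in a small Go sketch that scans a DSC's status conditions. The `condition` struct here is a simplified stand-in for the Kubernetes condition type, and the helper name is hypothetical:

```go
package main

import "fmt"

// condition is a minimal stand-in for a Kubernetes status condition.
type condition struct {
	Type   string
	Status string
}

// argoConflict reports whether the status carries the condition the
// dashboard tests for: CapabilityDSPv2Argo with status False.
func argoConflict(conds []condition) bool {
	for _, c := range conds {
		if c.Type == "CapabilityDSPv2Argo" && c.Status == "False" {
			return true
		}
	}
	return false
}

func main() {
	conds := []condition{
		{Type: "Ready", Status: "True"},
		{Type: "CapabilityDSPv2Argo", Status: "False"},
	}
	fmt.Println(argoConflict(conds)) // true
}
```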

@zdtsw
Member

zdtsw commented Apr 12, 2024

Tested again with the latest code in the negative case:
Screenshot from 2024-04-12 16-55-48

@openshift-ci openshift-ci bot added the lgtm label Apr 12, 2024

openshift-ci bot commented Apr 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asanzgom, zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the lgtm label Apr 12, 2024

openshift-ci bot commented Apr 12, 2024

New changes are detected. LGTM label has been removed.

@VaishnaviHire
Member Author

@etirelli Added the error phase
Screenshot 2024-04-12 at 2 43 32 PM


@VaishnaviHire VaishnaviHire merged commit 7389fcb into opendatahub-io:incubation Apr 12, 2024
6 of 8 checks passed
VaishnaviHire added a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Apr 12, 2024
* Add check for existing Argo Workflows

* Move resource cleanup before DSC creation

* Add separate workflow for upgrades and installs

* Remove conditions that are not required

* Remove check for DSP component

* Add more specific error comparison

* Fix linter

* Add Error phase

* Fix linter

(cherry picked from commit 7389fcb)
DeleteFunc: func(e event.DeleteEvent) bool {
if e.Object.GetName() == datasciencepipelines.ArgoWorkflowCRD {
labels := e.Object.GetLabels()
if _, ok := labels["app.opendatahub.io/"+datasciencepipelines.ComponentName]; !ok {

We could use labels.ODH.Component(datasciencepipelines.ComponentName)
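The label check in the predicate above can be sketched as a standalone helper. This assumes `labels.ODH.Component` produces the same `app.opendatahub.io/<component>` key as the string concatenation in the diff; the component name and helper names here are illustrative assumptions, not confirmed values from the repository:

```go
package main

import "fmt"

// odhComponentLabel builds the label key checked in the DeleteFunc
// predicate. The review suggestion's labels.ODH.Component is assumed
// to produce this same "app.opendatahub.io/<component>" key.
func odhComponentLabel(component string) string {
	return "app.opendatahub.io/" + component
}

// managedByComponent reports whether an object's labels carry the
// given component's label, mirroring the predicate's lookup.
func managedByComponent(objLabels map[string]string, component string) bool {
	_, ok := objLabels[odhComponentLabel(component)]
	return ok
}

func main() {
	// "data-science-pipelines-operator" is a hypothetical component name.
	objLabels := map[string]string{
		"app.opendatahub.io/data-science-pipelines-operator": "true",
	}
	fmt.Println(managedByComponent(objLabels, "data-science-pipelines-operator")) // true
}
```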
