Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): Enable setting OwnerReference on ResourceOps. Fixes #1779 #4831

Merged

Conversation

elikatsis
Copy link
Member

@elikatsis elikatsis commented Nov 26, 2020

Argo supports a field in the ResourceTemplate that makes the controller
add an owner reference of the workflow to the created resource since
v2.4.0 [1].
With the upgrade of Argo client [2] and deployment [3] we are now able
to exploit it.

We set it to 'false' by default on all ResourceOps (actually, leave it
empty) and 'true' for VolumeOps. This allows the garbage collection of
PVCs upon workflow cleanup without any further change [4].

[1] https://github.com/argoproj/argo/blob/v2.4.0/pkg/apis/workflow/v1alpha1/workflow_types.go#L1044-L1045
[2] #4498
[3] #3537
[4] #1779

Signed-off-by: Ilias Katsakioris elikatsis@arrikto.com

Closes #1779

Checklist:

@elikatsis
Copy link
Member Author

/assign @Ark-kun

@elikatsis
Copy link
Member Author

cc @PatrickXYS
Sorry, forgot to cc you when I created the PR!

Also, @Ark-kun friendly ping

@PatrickXYS
Copy link
Member

Thank @elikatsis it lgtm to me

/cc @Bobgy

To see if we can move forward

@Bobgy Bobgy added this to Needs triage in KFP SDK Triage via automation Dec 15, 2020
@Bobgy
Copy link
Contributor

Bobgy commented Dec 15, 2020

/assign @chensun

@Bobgy
Copy link
Contributor

Bobgy commented Dec 15, 2020

This seems like a breaking change for volumeOp though, are you sure users always expect their volumes GCed after workflow finishes?

@elikatsis
Copy link
Member Author

@Bobgy after giving it some thought, you are right. Let's not change the current behavior, and since this is something easily configurable we can just instruct users to use it.

Thanks!

I'm changing this, rebasing on top of current master, and pushing.

@elikatsis
Copy link
Member Author

@Bobgy I rebased on top of master and resolved conflicts

@Bobgy
Copy link
Contributor

Bobgy commented Jan 14, 2021

/assign @chensun @numerology

@numerology
Copy link

Thanks!

Do you mind adding a testcase or modifying one of the existing cases to cover this?

@elikatsis
Copy link
Member Author

@numerology

Do you mind adding a testcase or modifying one of the existing cases to cover this?

You are right. With the first iteration the VolumeOp-related tests were affected, so we were covered. But the second iteration reverted this and I forgot to extend them.

Thanks!

@elikatsis
Copy link
Member Author

@numerology friendly ping

@numerology
Copy link

/lgtm
/approve

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elikatsis, numerology

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elikatsis, numerology

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elikatsis
Copy link
Member Author

/retest

@elikatsis
Copy link
Member Author

It seems like the test is failing because of the new pip dependency resolver. This means that some we have a version range for a dependency X which is incompatible with the range of versions another dependency Y requires for X.

This is triggered by

python3 -m pip install . --upgrade --use-feature=2020-resolver --extra-index-url https://pypi-nightly.tensorflow.org/simple

@elikatsis
Copy link
Member Author

/retest

…ow#1779

Argo supports a field in the ResourceTemplate that makes the controller
add an owner reference of the workflow to the created resource since
v2.4.0 [1].
With the upgrade of Argo client [2] and deployment [3] we are now able
to exploit it.

We set it to 'false' by default on all ResourceOps (actually, leave it
empty).

Setting the field to 'true' for VolumeOps allows the garbage collection
of PVCs upon workflow cleanup [4].

[1] https://github.com/argoproj/argo/blob/v2.4.0/pkg/apis/workflow/v1alpha1/workflow_types.go#L1044-L1045
[2] kubeflow#4498
[3] kubeflow#3537
[4] kubeflow#1779

Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>
@elikatsis
Copy link
Member Author

@Bobgy or @numerology could you /lgtm again?

Tests were failing for irrelevant reasons and then there were conflicts :/ I've rebased and everything seems ok

@Bobgy
Copy link
Contributor

Bobgy commented Mar 12, 2021

/lgtm

@elikatsis
Copy link
Member Author

/retest

@google-oss-robot google-oss-robot merged commit a12e88d into kubeflow:master Mar 12, 2021
KFP SDK Triage automation moved this from Needs triage to Closed Mar 12, 2021
@elikatsis elikatsis deleted the feature-sdk-resource-ownerref branch March 12, 2021 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

VolumeOp doesn't support GC after workflow deletion
8 participants