This repository has been archived by the owner on Sep 19, 2022. It is now read-only.

feat: Support running although it is useless #194

Merged
merged 1 commit into kubeflow:master from gaocegege:running on Aug 6, 2019

Conversation

gaocegege
Member

We are now using common.CleanPodPolicy in PyTorchJob, and if users set the policy to Running, we delete all pods and services. I think this behaviour is not consistent with TFJob. To be user-friendly, maybe PyTorchJob could support Running by not deleting anything, just like None.
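
For reference, here is a small standalone Go sketch (illustrative only; the type and constants below are local stand-ins, not the definitions from the common package) of what each CleanPodPolicy value is generally expected to do after a job finishes:

package main

import "fmt"

// CleanPodPolicy mirrors the policy values discussed above; this is an
// illustrative local type, not the operator's own definition.
type CleanPodPolicy string

const (
	CleanPodPolicyNone    CleanPodPolicy = "None"
	CleanPodPolicyRunning CleanPodPolicy = "Running"
	CleanPodPolicyAll     CleanPodPolicy = "All"
)

// describe summarizes the expected cleanup behaviour for each policy.
func describe(p CleanPodPolicy) string {
	switch p {
	case CleanPodPolicyNone:
		return "keep all pods and services"
	case CleanPodPolicyRunning:
		return "delete only pods that are still running (tf-operator behaviour)"
	case CleanPodPolicyAll:
		return "delete all pods and services"
	default:
		return "unknown policy"
	}
}

func main() {
	for _, p := range []CleanPodPolicy{CleanPodPolicyNone, CleanPodPolicyRunning, CleanPodPolicyAll} {
		fmt.Println(p, "=>", describe(p))
	}
}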

WDYT @johnugeorge

One of our users did not know that we do not support Running and used it. He then found that the pods were deleted even though they were not running.

Signed-off-by: Ce Gao <gaoce@caicloud.io>

@coveralls

coveralls commented Jul 31, 2019

Coverage Status

Coverage remained the same at 85.281% when pulling 520a360 on gaocegege:running into e98eb3c on kubeflow:master.

@gaocegege
Member Author

/cc @johnugeorge

@johnugeorge
Copy link
Member

@gaocegege Sorry, I missed it.
What does the "Running" policy mean in the case of PyTorch, which has master + worker? I think it only makes sense with PS pods, which can be kept running or not.
The PyTorch behavior itself was a little inconsistent when I verified CleanPodPolicy set to Running in the earlier implementation phase (I didn't try with PyTorch 1.0). At the instant the job completed, I saw that some pods were Running while others were Completed, so some of these pods would get deleted while others remained. I felt this would be confusing to users. WDYT? Can you verify it?

As you say, it makes sense to have consistent behavior since CleanPodPolicy is shared now. It doesn't hurt to add it anyway. Please share your thoughts.

@gaocegege
Member Author

gaocegege commented Aug 6, 2019

@johnugeorge

Thanks for your comments. I also think the Running policy should be avoided in PyTorchJob. But we have a case like this:

Users want to stop running pods in pytorch-operator the way tf-operator does, so they set the policy to Running. But all pods are deleted, since we treat Running as All.

Then how about:

func (pc *PyTorchController) deletePodsAndServices(job *pyv1.PyTorchJob, pods []*v1.Pod) error {
	if len(pods) == 0 {
		return nil
	}
	// Delete nothing when the cleanPodPolicy is None or Running.
	if *job.Spec.CleanPodPolicy == common.CleanPodPolicyNone || *job.Spec.CleanPodPolicy == common.CleanPodPolicyRunning {
		return nil
	}

We do not delete all pods; Running follows the same behavior as CleanPodPolicyNone.
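
To make the proposed behaviour concrete, here is a small standalone sketch; the shouldSkipCleanup helper is hypothetical and only mirrors the condition in the snippet above (both None and Running skip cleanup, while All still deletes everything):

package main

import "fmt"

// shouldSkipCleanup mirrors the condition in the snippet above: cleanup is
// skipped when the cleanPodPolicy is either None or Running.
func shouldSkipCleanup(policy string) bool {
	return policy == "None" || policy == "Running"
}

func main() {
	for _, p := range []string{"None", "Running", "All"} {
		fmt.Printf("cleanPodPolicy=%s -> skip cleanup: %v\n", p, shouldSkipCleanup(p))
	}
}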

@johnugeorge
Member

Looks good to me. It would be better than deleting the pods, which adds more confusion.

@gaocegege
Member Author

Yeah, I think so. I will update it.

Signed-off-by: Ce Gao <gaoce@caicloud.io>
@gaocegege
Member Author

/assign @johnugeorge

@johnugeorge
Member

/lgtm

@johnugeorge
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ca2c9c4 into kubeflow:master Aug 6, 2019
@gaocegege gaocegege deleted the running branch October 30, 2019 03:32