TFJob 1.0 #968

Closed
4 tasks done
richardsliu opened this issue Mar 27, 2019 · 13 comments

Comments

@richardsliu
Contributor

richardsliu commented Mar 27, 2019

Tracking items needed for TFJob 1.0:

/cc @gaocegege /cc @johnugeorge

@johnugeorge
Member

Sounds good

@gaocegege
Member

SGTM

@johnugeorge
Member

Performance issues: #965

@k82cn
Collaborator

k82cn commented Mar 30, 2019

I can help with the "Scheduling policy" item by integrating it with the related scheduler :)

BTW, should "Scheduling policy" also be part of the common operator?

@richardsliu
Contributor Author

I've included SchedulingPolicy in the common API here: https://github.com/kubeflow/common/blob/master/operator/v1/types.go#L184

richardsliu added this to To Do in TFJob-PyTorch 1.0 on Apr 8, 2019
richardsliu moved this from To Do to In Progress in TFJob-PyTorch 1.0 on Apr 22, 2019
@richardsliu
Contributor Author

richardsliu commented May 9, 2019

After discussion with community members: for TFJob 1.0, we will not migrate to the new common API until it has been stabilized and tested with a new operator.

Instead, the goal is to release v1 from the v1beta2 implementation, with smaller improvements such as enhanced metrics.

/cc @johnugeorge
/cc @gaocegege

@jlewi
Contributor

jlewi commented May 13, 2019

Update on TFJob 1.0:
The TFJob 1.0 API is done, but the PyTorch one is not.

@johnugeorge
Member

@jlewi I will complete it in 2 days

@richardsliu
Contributor Author

We can close this after #965 is fixed.

@jlewi
Contributor

jlewi commented Jun 10, 2019

Regarding the UI bug (#1019): should we eventually get rid of the TFJobs dashboard or keep it? Would it be better to have a more general UI that isn't specific to TFJob? How far can we get with existing monitoring tools like the K8s dashboard?

Should we open up a separate issue to figure out the future of the TFJobs dashboard?

@johnugeorge
Member

Related slack conversation: https://kubeflow.slack.com/archives/C985VJN9F/p1560184440012500

@richardsliu
Contributor Author

/close

@k8s-ci-robot

@richardsliu: Closing this issue.

In response to this:

/close


TFJob-PyTorch 1.0 automation moved this from In Progress to Done Jun 20, 2019