[SDK] Get Job Pods Events #1863
Comments
Good idea. +1
It sounds good! At the same time, showing the events of the FrameworkJob (e.g., TFJob) at the top might also be helpful.
Should this be a different API, to provide more clarity?
It would be helpful! Maybe we should add a new API.
@tenzen-y That sounds good. The question is how to identify which Job the user created?
The same labels could match multiple jobs (e.g. PyTorchJob, XGBoostJob). That ties to my other question, if we are going to introduce mandatory We can do the same for all other APIs: After refactoring our SDK: #1719, I noticed that it is very confusing for the user that we have some CRUD operations that are job-specific (e.g.
I am not sure whether users who are not familiar with Kubernetes should have to know the difference between events and logs.
I like the idea @kuizhiqing, what would be easier to understand
+1 Agree with you @andreyvelich
@andreyvelich Thanks for the clarification.
@andreyvelich Well, you're right, it would be better to use
/assign @andreyvelich
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/assign @andreyvelich
Our SDK should give users better visibility to debug and monitor Training Operator Jobs. Users should not need `kubectl` to get information about their Training Jobs.

For example, a Pod might be stuck in `Pending/ContainerCreating` status for a few minutes (especially if the image is huge or the Pod can't be scheduled to a Node), and the user has to use `kubectl` to understand why.

Therefore, the SDK should provide an API that exposes the Kubernetes Events of a Training Operator Job's Pods.

I propose to extend the `get_job_logs()` API to return Pod Events in addition to Pod logs.

For example, the return might look as follows:
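As a rough illustration of the idea (not the SDK's actual implementation), the events could be collected per Pod along the following lines. The helper name `get_job_pod_events`, its return shape, and the `training.kubeflow.org/job-name` label key are assumptions made for this sketch. `core_api` is expected to behave like `kubernetes.client.CoreV1Api`, whose `list_namespaced_pod` and `list_namespaced_event` methods are used here.

```python
from typing import Any, Dict, List

# Assumption: the Training Operator labels each Pod with the owning Job's
# name under this key. Adjust it if your deployment uses a different label.
JOB_NAME_LABEL = "training.kubeflow.org/job-name"


def get_job_pod_events(
    core_api: Any, name: str, namespace: str = "default"
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each Pod of a Job to its formatted Events.

    `core_api` should behave like `kubernetes.client.CoreV1Api`.
    """
    # Find the Pods that belong to the Job via the (assumed) job-name label.
    pods = core_api.list_namespaced_pod(
        namespace, label_selector=f"{JOB_NAME_LABEL}={name}"
    )

    events: Dict[str, List[str]] = {}
    for pod in pods.items:
        pod_name = pod.metadata.name
        # Ask the API server only for Events whose involved object is this Pod.
        pod_events = core_api.list_namespaced_event(
            namespace,
            field_selector=(
                f"involvedObject.kind=Pod,involvedObject.name={pod_name}"
            ),
        )
        events[pod_name] = [
            f"{e.last_timestamp} {e.type} {e.reason}: {e.message}"
            for e in pod_events.items
        ]
    return events
```

With a real cluster, one would load a kubeconfig (for example with `kubernetes.config.load_kube_config()`) and pass in a `kubernetes.client.CoreV1Api()` instance. Pods without any Events map to an empty list, so a Pod stuck in `Pending` with no scheduler message is still visible in the result.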
What do you think @kubeflow/wg-training-leads @tenzen-y @kuizhiqing ?