openmpi: master/workers sync mechanism is replaced with k8s api from redis #696
Force-pushed from de9a84a to 62cb90d.
/ok-to-test
Force-pushed from 62cb90d to 41aa2d1.
@pdmack I'm so sorry. I force-pushed my branch because I had a typo... Could you
It's fine. The test is running. Stay tuned.
/approve Thanks @everpeace!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: pdmack
The full list of commands accepted by this bot can be found here. The pull request process is described here.
@everpeace @pdmack The worker reaches "running" state once init.sh starts running, but at that point the worker is not ready yet - it still has a bunch of initialization to finish.
But I do agree that introducing redis for synchronization wasn't an optimal solution.
Ah, you're right!! Then, the master can wait until ... I created a PR about this issue. Could you review it?
…redis (kubeflow#696)
* openmpi: master/workers sync mechanism is replaced with k8s api from redis.
* openmpi: add everpeace to contributors.
…f8 (kubeflow#696)
* image
* Image built from kubeflow/kubeflow@
Currently, the openmpi package uses redis to sync the master and workers. I think that is a bit of overkill, since we can watch master/worker status via the k8s API.
Inside a pod, the service account's token and the Kubernetes API server endpoint are already available, so we can check pod status with a simple curl call to the API server.
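As a rough sketch of that idea (not the command from this PR), the same lookup could be written in Python. It assumes the standard in-cluster service account mount at `/var/run/secrets/kubernetes.io/serviceaccount` and the `KUBERNETES_SERVICE_HOST` / `KUBERNETES_SERVICE_PORT` environment variables. The helper names `pod_url`, `pod_phase`, and `fetch_pod_phase` are hypothetical.

```python
import json
import os
import ssl
import urllib.request

# Standard in-cluster service account mount (assumed to be present in the pod).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def pod_url(namespace, pod_name, env=None):
    """Build the API server URL for a single pod (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    return f"https://{host}:{port}/api/v1/namespaces/{namespace}/pods/{pod_name}"


def pod_phase(pod_json):
    """Extract status.phase (e.g. "Pending", "Running") from a Pod object."""
    return json.loads(pod_json)["status"]["phase"]


def fetch_pod_phase(namespace, pod_name):
    """Query the API server with the service account token, as curl would."""
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()
    # Verify the API server certificate against the mounted cluster CA.
    ctx = ssl.create_default_context(cafile=os.path.join(SA_DIR, "ca.crt"))
    req = urllib.request.Request(
        pod_url(namespace, pod_name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return pod_phase(resp.read())
```

On clusters with RBAC enabled, the pod's service account would also need permission to read pods for this request to succeed.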
So I replaced the redis-based master/worker sync mechanism with one based on the k8s API.
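The synchronization itself can be pictured as a polling loop. The sketch below is only an illustration under stated assumptions, not the logic from this PR: `wait_for_phase` and its parameters are hypothetical, `get_phase` stands for any callable that returns a pod's phase by name, and which state to wait for (here `"Running"`) is itself an assumption.

```python
import time


def wait_for_phase(get_phase, pod_names, target="Running",
                   interval=1.0, timeout=60.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until every pod reports the target phase or the timeout expires.

    Returns True if all pods reached the target phase, False on timeout.
    """
    deadline = clock() + timeout
    pending = set(pod_names)
    while pending:
        # Keep only the pods that have not reached the target phase yet.
        pending = {name for name in pending if get_phase(name) != target}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

A master could call this with the worker pod names before launching the job; a worker could use the same loop to watch the master's phase.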
What do you think? I'd be very happy to get some feedback.