[openmpi] Introduce a sidecar container for inter-pod synchronization #704
/assign @jlewi
/retest
@jiezhang Thank you for the solution 👍 And my apologies for having introduced insufficient syncs... Please don't get me wrong; no offence was meant 🙇. I just wanted to propose how to modify this package by opening PRs.
@evergreen Thanks for your contribution. The package is still in its early stages, and changes are welcome. I think it's hard to maintain init.sh in the long term without making changes to the Docker image. I'm introducing a separate container where we have complete control, which should make future changes easier. We should be able to run the script locally to validate its functionality. I'm planning to add more features to it, e.g. backing up the logs and the trained model to persistent storage. It's much easier to implement more advanced features using Python. And it should be compatible with older versions of k8s according to the documentation: https://github.com/kubernetes-client/python/blob/master/README.md
I agree with it.
That's nice. I would be very happy if I could contribute to it. Is there any space for it?
I misunderstood the meaning of the '+' mark in the doc. Right, it should work. I'm not one of the reviewers, but this PR looks good to me 😀
51f1d31 to 3abe08d
/retest
@jiezhang @everpeace thumbs up? We good?
3d9a59a to a61a13a
* openmpi-controller monitors the master pod's status and creates a semaphore file "term.sig" to signal openmpi-job to terminate
* openmpi-job is now decoupled from kubernetes
* openmpi-controller and openmpi-job share a volume for inter-container communication
* openmpi-controller can be extended in the future to support data snapshot
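The semaphore-file handshake described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names, the polling loop, and the choice of `Succeeded`/`Failed` as the phases that trigger termination are all assumptions. The pod-phase lookup is passed in as a callable so the sketch runs without a cluster; in a real sidecar it might wrap a call from the Kubernetes Python client such as `CoreV1Api.read_namespaced_pod(...).status.phase`.

```python
import os
import tempfile
import time
from typing import Callable, Tuple

# Assumption: these pod phases mean the master is done and workers should stop.
TERMINAL_PHASES = {"Succeeded", "Failed"}
# File name taken from the commit message; the directory layout is hypothetical.
SEMAPHORE_NAME = "term.sig"


def run_controller(get_master_phase: Callable[[], str],
                   shared_dir: str,
                   poll_seconds: float = 1.0) -> str:
    """Poll the master pod's phase; once it is terminal, create the semaphore file."""
    while True:
        phase = get_master_phase()
        if phase in TERMINAL_PHASES:
            path = os.path.join(shared_dir, SEMAPHORE_NAME)
            # An empty file is the whole signal; its content is never read.
            with open(path, "w"):
                pass
            return phase
        time.sleep(poll_seconds)


def job_should_terminate(shared_dir: str) -> bool:
    """Job-side check: needs only the shared volume, not the Kubernetes API."""
    return os.path.exists(os.path.join(shared_dir, SEMAPHORE_NAME))


def demo() -> Tuple[bool, str, bool]:
    """Simulate a master pod that goes Pending -> Running -> Succeeded."""
    phases = iter(["Pending", "Running", "Running", "Succeeded"])
    with tempfile.TemporaryDirectory() as shared_dir:
        before = job_should_terminate(shared_dir)
        final_phase = run_controller(lambda: next(phases), shared_dir,
                                     poll_seconds=0.0)
        after = job_should_terminate(shared_dir)
    return before, final_phase, after
```

Because the job side only checks for a file on the shared volume, it needs no Kubernetes credentials or client library, which is one way to read "openmpi-job is now decoupled from kubernetes".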
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: pdmack
@everpeace I opened #713 to track the future work in the controller. Feel free to provide your feedback there.
…kubeflow#704)
* openmpi-controller monitors the master pod's status and creates a semaphore file "term.sig" to signal openmpi-job to terminate
* openmpi-job is now decoupled from kubernetes
* openmpi-controller and openmpi-job share a volume for inter-container communication
* openmpi-controller can be extended in the future to support data snapshot