Create & label P1 issues needed for an initial release of Horovod support #778
Thanks @jlewi. What's the timeline for the 0.2 release? I think we should rename the area to openmpi. The package was written so that it is not coupled with Horovod. For example, @everpeace is using the package to run distributed machine learning tasks that have nothing to do with Horovod. cc @alsrgv
/area openmpi
@jlewi Looks like the area command is not working for me.
The area/openmpi label doesn't exist. Can we use area/horvod?
@jlewi The package is not limited to Horovod. It can run any MPI job. That's why I propose renaming the label to openmpi.
@jlewi Yeah, naming is a bit complicated. As @jiezhang said, the openmpi package is technically independent of Horovod. And, actually, I'm using this package with Chainer/ChainerMN.
Sure. I'm happy to do it. Here are several new issues which I would like to put in 0.2. "new" means that they have not been created yet. What do you think @jiezhang ??
@jlewi I opened three PRs yesterday. How should I handle them? I think I would create stub issues for them and put the label on those. Is that ok?? Or can I put labels on PRs directly? cc/ @jiezhang off topic here though, I'm now thinking I would make a proposal of
I created the label area/openmpi
@everpeace securityContext (RunAs) does not support group id. How can you specify the gid in the container when your programs read/write data using NFS?
/area openmpi
@pineking However, this To secure NFS data, you will need to strip // I updated my original comment.
@everpeace Great, thanks for the information. I will try the
Thanks @jlewi. We'll create/tag issues needed for initial release.
@jiezhang @everpeace I see two issues labeled area/openmpi. Is the list complete? Does it include items related to Horovod and Open MPI support?
/area 0.4.0
@everpeace What is the status of openmpi support? Can we close this issue and open up appropriate issues tagged 0.4.0?
yes, please 🙇 But as a personal opinion, encouraging users to migrate to MPIJob CRDs would be nicer.
@everpeace Can you explain that last comment? Are you saying that in 0.4 we should remove And tell people to use the
@jlewi sorry for the confusion.
Yes, I was. But over the past several days I have learned that there are some active users of the openmpi package. So I think we don't need to remove this package, at least not in 0.4. However, I would like to let users know that they have the option to use the mpi crd instead of the openmpi package. That is because the openmpi package is based on bare pods and so has no fault tolerance (it can NOT retry on failure), but mpi-operator expands mpi crd to
SGTM. Any suggestions about how we should start pushing users to use the CRD?
I would suggest putting a note in the openmpi package README which recommends using
Hello all, could someone give a status update on this issue? I would love to connect with the driver for this. Is this you, @everpeace and @jiezhang?
I filed kubeflow/website#272 to update the docs and #1859 to remove the existing package.
Thanks for creating issues 🙇
We should create issues for everything that needs to happen for an initial release of Horovod support. All such issues should be priority/p1.
We'd like to include Horovod in our 0.2 release. We need to figure out all the work that needs to happen to support that.
I created the label area/horvod
Anyone in the org should be able to label issues as follows:
/area horvod
/priority p1
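As a rough illustration of how these comment commands correspond to label names (`/area horvod` to `area/horvod`, `/priority p1` to `priority/p1`), here is a minimal Python sketch. The function name `labels_from_comment` and the regular expression are hypothetical and written only for this example; they are not the bot's actual implementation.

```python
import re
from typing import List

# Hypothetical pattern for this sketch: a line consisting of "/area <value>"
# or "/priority <value>". The real bot may accept more commands or other forms.
COMMAND_RE = re.compile(r"^/(area|priority)\s+(\S+)\s*$")


def labels_from_comment(comment: str) -> List[str]:
    """Return the label names implied by slash commands in a comment."""
    labels = []
    for line in comment.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match:
            kind, value = match.groups()
            # "/area horvod" corresponds to the label "area/horvod".
            labels.append(f"{kind}/{value}")
    return labels


print(labels_from_comment("/area horvod\n/priority p1"))
```

Each command sits on its own line; in this sketch, text that merely mentions a command mid-sentence is ignored.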
@jiezhang @everpeace Could you take a stab at coming up with a list of issues and labeling existing issues?