-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JobSet networking - allow setting (shared) name? #136
Comments
Hey @vsoch! This sounds like a pretty useful ask. Are you interested in contributing? API updates can be made here (https://github.com/kubernetes-sigs/jobset/blob/main/api/v1alpha1/jobset_types.go#L124) and I think you can just update this function https://github.com/kubernetes-sigs/jobset/blob/main/pkg/controllers/jobset_controller.go#L677 to use the name in the sub domain. And then some integration tests to round this out. And no worries if you can't do it. Just wanted to see. |
I would love to! ❤️ I'm chewing on some autoscaling stuff right now but I'll make some time this week. |
hey @kannon92 I started on this today! I'm fairly green with testing controllers (terrible I know, but I have to start at some point I suppose!) could you point me to the test file that I should update, and can we talk about the overall strategy for the tests for this? I'm guessing something added to |
I'm mostly on the PTO the next few weeks so I won't be able to help too much. I glanced at your PR and I think you have the right idea. For integration tests you can't really test correctness with things like services but you can make sure that services match subdomains or something like that. The controller-runtime test library does not run all the controllers so you really only get things like creating the objects. I think @ahg-g or @danielvegamyhre will give you a detailed review.. ;) |
Enjoy your PTO! I'll probably do some sanity checks with the Flux Operator soon, because I know how I want/expect it to work there. |
heyo! Just wanted to give quick feedback that I was able to update the hyperqueue operator to use 0.2.0 of JobSet, and the new "use this Subdomain" to get everything on the same network worked great. Thanks again for this! |
What would you like to be added:
It would be nice if two replicated jobs could be put under the same networking space.
Why is this needed:
For cases of a JobSet with two or more ReplicatedJobs that want to interact, the current default (given that networking is enabled) is to create separate headless services (and addresses) for both. This would likely work in many cases, but for our specific case, we have a broker that is expecting an entire cluster (and components from different ReplicatedJobs) to be under the same top level name (just with different beginning pod names). E.g, so this would work:
But this would not:
In #132 I provide an example of trying to get it working with the current design, and needing to fall back to creating my own headless service (just one) that is shared by the two ReplicatedJobs. So my suggestion is to allow this - to allow specifying the name:
to generate the working ips (first block above) and doing this by just applying a selector label to each ReplicatedJob. In my case I used the "job-group" label instead of "job-name" because I think the latter needs to be unique.
After this implementation, it should be possible to eliminate all custom network logic from the Flux Operator PR with JobSet I'm testing, and just specify the name of the network that I want created (and shared between the jobs) and I think uniqueness of hostnames should be OK because the job names are still in there, which are required to be unique from one another.
The text was updated successfully, but these errors were encountered: