What would you like to be added:
We need a mechanism to enable Kueue's jobframework.integrationManager to recognize GVKs that are being managed by an instantiation of jobframework.ReconcilerFactory that is defined/running in an external controller. In particular, methods like jobframework.IsOwnerManagedByKueue and GetEmptyOwnerObject should be extended to consult an additional table of GVKs that are known to be Kueue enabled. This table would be populated from information added to the Integrations sub-structure of Kueue's Configuration Resource.
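To make the request concrete, here is a minimal sketch of the lookup being asked for. It is not Kueue's actual code: `GVK` is a simplified stand-in for `schema.GroupVersionKind`, and `integrationManager`, `newIntegrationManager`, and `isOwnerManaged` are hypothetical reductions of the real jobframework types, written only to show the "consult a second table" idea. The GVK values in `main` are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// GVK is a simplified stand-in for schema.GroupVersionKind from
// k8s.io/apimachinery, kept local so the sketch has no dependencies.
type GVK struct{ Group, Version, Kind string }

// integrationManager is a hypothetical, reduced model of
// jobframework.integrationManager; the real type has different fields.
type integrationManager struct {
	builtIn  map[GVK]struct{} // GVKs handled by integrations compiled into Kueue
	external map[GVK]struct{} // GVKs declared in configuration as externally managed
}

func newIntegrationManager(builtIn, external []GVK) *integrationManager {
	m := &integrationManager{
		builtIn:  make(map[GVK]struct{}, len(builtIn)),
		external: make(map[GVK]struct{}, len(external)),
	}
	for _, g := range builtIn {
		m.builtIn[g] = struct{}{}
	}
	for _, g := range external {
		m.external[g] = struct{}{}
	}
	return m
}

// isOwnerManaged mirrors the intent of IsOwnerManagedByKueue: it takes the
// apiVersion ("group/version") and kind found in an owner reference and
// reports whether that GVK is known, checking the built-in table first and
// then the proposed table of externally managed GVKs.
func (m *integrationManager) isOwnerManaged(apiVersion, kind string) bool {
	group, version, found := strings.Cut(apiVersion, "/")
	if !found {
		// Core-group resources carry only a version, e.g. "v1".
		group, version = "", apiVersion
	}
	gvk := GVK{Group: group, Version: version, Kind: kind}
	if _, ok := m.builtIn[gvk]; ok {
		return true
	}
	_, ok := m.external[gvk]
	return ok
}

func main() {
	m := newIntegrationManager(
		[]GVK{{Group: "kubeflow.org", Version: "v1", Kind: "PyTorchJob"}},
		[]GVK{{Group: "workload.codeflare.dev", Version: "v1beta2", Kind: "AppWrapper"}},
	)
	fmt.Println(m.isOwnerManaged("workload.codeflare.dev/v1beta2", "AppWrapper"))
	fmt.Println(m.isOwnerManaged("apps/v1", "Deployment"))
}
```

With a second table like this, a child job whose owner reference points at an externally managed kind could be recognized the same way as one owned by a built-in integration.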
Why is this needed:
Being able to cleanly extend Kueue with the ability to manage additional GVKs without needing to modify Kueue itself would make it easier to grow the Kueue ecosystem.
In the AppWrapper project (https://github.com/project-codeflare/appwrapper), we have a working example of such an external controller that extends Kueue to manage a new GVK. As described in more detail in the Working Group call of 4/11 and this presentation, the inability to inform the integrationManager of the new Kueue-managed type results in a failure to correctly recognize child jobs, which then requires a fragile workaround (our child admission controller).
Completion requirements:
With this enhancement, when an AppWrapper containing another Kueue-managed GVK (for example a PyTorch Job) is admitted by Kueue, the wrapped PyTorch Job should be properly recognized by Kueue as a child job of an already admitted Job and be admitted. To test, we would disable our child admission controller in the AppWrapper operator and verify that the child was admitted as expected.
Design doc
API change
Docs update
The artifacts should be linked in subsequent comments.
I intend to work on an implementation once there is agreement on a design. I think an extension to the Configuration.Integrations to add a new array of strings listing externally-managed GVKs should work, but I'm happy to do something else if there is a preferred alternative.
Another option could be to add all the frameworks to the same frameworks list and have Kueue identify which of them are built in. But then users could not "override" a built-in framework if they had such a need.