[proposal] Provide an evolvable End to End Solution for Koordinator Device Management #2181

@ZiMengSheng

What is your proposal:

Provide an evolvable End to End Solution for Koordinator Device Management

Why is this needed:

Koordinator already supports two device-related functions in the scheduler: GPU shared scheduling and GPU & RDMA joint allocation. Users can request GPU or RDMA resources through Kubernetes extended resources and hints defined in Pod annotations. Extended resources were originally introduced into Kubernetes mainly to describe discrete, countable node resources, and the Kubelet Device Plugin interface is the main way the Kubernetes community supports reporting and allocating such resources.

However, the allocation logic of the Kubelet Device Manager does not support fine-grained joint allocation of multiple resources according to device topology, for example when a GPU and an RDMA device must be allocated under the same PCIe switch. The only topology-aware allocation the Kubelet supports is allocation by NUMA node. Even when NUMA-based allocation is all that is required, the Kubelet intervenes too late: it acts only after the pod has been scheduled to a node, so users may face performance degradation caused by a topology mismatch.
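
To make the PCIe switch scenario concrete, here is a minimal sketch of topology-aware joint allocation. It is not Koordinator's implementation: the `Device` record, its field names, and the `joint_allocate` helper are all hypothetical and exist only to illustrate the idea of picking GPUs and an RDMA NIC that sit under the same switch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical device record; field names are illustrative only."""
    kind: str         # "gpu" or "rdma"
    minor: int        # device index on the node
    numa_node: int    # NUMA node the device is attached to
    pcie_switch: str  # identifier of the PCIe switch above the device


def joint_allocate(devices: list[Device], gpus_wanted: int) -> Optional[dict]:
    """Pick `gpus_wanted` GPUs and one RDMA NIC under the same PCIe switch.

    Returns None when no single switch can satisfy the request.
    """
    by_switch = defaultdict(lambda: {"gpu": [], "rdma": []})
    for dev in devices:
        group = by_switch[dev.pcie_switch]
        if dev.kind in group:  # ignore device kinds this sketch does not model
            group[dev.kind].append(dev)

    # Visit switches in sorted order so the result is deterministic.
    for switch in sorted(by_switch):
        group = by_switch[switch]
        if len(group["gpu"]) >= gpus_wanted and group["rdma"]:
            gpus = sorted(group["gpu"], key=lambda d: d.minor)[:gpus_wanted]
            nic = min(group["rdma"], key=lambda d: d.minor)
            return {"switch": switch, "gpus": gpus, "rdma": nic}
    return None


# Made-up node inventory: all three GPUs share NUMA node 0,
# but only GPUs 1 and 2 sit under the same switch as an RDMA NIC.
node_devices = [
    Device("gpu", 0, 0, "sw0"),
    Device("gpu", 1, 0, "sw1"),
    Device("gpu", 2, 0, "sw1"),
    Device("rdma", 0, 0, "sw1"),
    Device("rdma", 1, 1, "sw2"),
]
result = joint_allocate(node_devices, gpus_wanted=2)
```

With this made-up inventory, a NUMA-only view treats all three GPUs as equivalent, while the switch-aware selection returns GPUs 1 and 2 together with RDMA NIC 0 under `sw1`. A request for three GPUs returns `None`, because no single switch hosts that many GPUs alongside a NIC.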

To solve this problem, Koordinator moved the device allocation logic from the Kubelet to the scheduler and used cri-runtime-proxy on the node side to set up device isolation and visibility. However, the cri-runtime-proxy approach is heavyweight and inconvenient to install. In addition, although the Koordinator scheduler provides GPU and RDMA joint allocation, there is no complete end-to-end solution; on the node side in particular, it has not yet been connected to the community-standard RDMA logic. This proposal attempts to solve these problems and provide a feasible end-to-end solution for Koordinator.

Finally, in the field of device management, the community proposed Dynamic Resource Allocation (DRA) after the Device Plugin interface to overcome the limitations of the current Device Plugin solution. This proposal will also show how Koordinator's GPU sharing and GPU & RDMA joint allocation are implemented in DRA mode, and how the current solution evolves toward DRA.

Key Results:

  • Provide a convenient end-to-end solution for Koordinator device management, in particular a replacement for runtime-proxy. This will be validated on the GPU Share use case.
  • Collaborate with DCGM to provide a GPU monitoring solution.
  • Collaborate with RDMA and SR-IOV solution providers, such as sriov-device-plugin and multus, to provide an end-to-end solution for GPU & RDMA joint allocation.
  • Collaborate with device sharing and isolation solution providers, such as HAMI, cGPU, MIG and vGPU, to enhance Koordinator's GPU Share solution. This will be validated on the GPU Share use case with a strict GPU usage limit.
  • Provide an end-to-end solution with DRA structured parameters on Kubernetes 1.31. This will be validated on the GPU Share use case and the GDR use case.
