Unlabeled Node Roles #234

@doronkg

The node roles doc states the following:

## Dedicated GPU & CPU Nodes

Separate nodes into those that:

* Run GPU workloads
* Run CPU workloads
* Do not run Run:ai at all. These jobs will not be monitored using the Run:ai Administration User Interface.

This is not accurate: all nodes in the cluster are displayed under the Nodes tab in the Administration UI.
That includes Run:ai worker nodes, Run:ai system nodes, regular workers, and cluster masters.

Every node that contains GPUs and has DCGM exporting metrics on it is counted as a "GPU node" in the Overview dashboard.
That includes nodes that don't have the runai-container-toolkit and runai-container-toolkit-exporter DaemonSets running on them. No Run:ai pod will be scheduled on such nodes, but they are still counted.

Review node names using `kubectl get nodes`. For each such node run:

```
runai-adm set node-role --gpu-worker <node-name>
```

or 

```
runai-adm set node-role --cpu-worker <node-name>
```

Nodes not marked as GPU worker or CPU worker will not run Run:ai at all.

That's also not accurate: a node that is marked as neither a GPU worker nor a CPU worker will run any kind of Run:ai workload.
The same behavior occurs if both roles are assigned to a node.
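
To make the observed behavior concrete, here is a minimal sketch of it as a truth table in shell. The function name `allowed_workloads` is hypothetical and this is only a model of what I see, not Run:ai scheduler code. The two single-role cases assume the doc is right that a GPU worker runs GPU workloads and a CPU worker runs CPU workloads.

```shell
#!/bin/sh
# Hypothetical model of the behavior described in this issue.
# $1: "true" if the node is marked as a GPU worker, otherwise "false"
# $2: "true" if the node is marked as a CPU worker, otherwise "false"
allowed_workloads() {
  gpu_worker=$1
  cpu_worker=$2
  if [ "$gpu_worker" = "$cpu_worker" ]; then
    # Neither role or both roles: the node accepts any Run:ai workload.
    echo "gpu cpu"
  elif [ "$gpu_worker" = "true" ]; then
    # GPU worker only (assumed from the doc's description).
    echo "gpu"
  else
    # CPU worker only (assumed from the doc's description).
    echo "cpu"
  fi
}

allowed_workloads false false   # unlabeled node
allowed_workloads true true     # both roles assigned
```

Under this model an unlabeled node and a node with both roles are indistinguishable, which is why the doc's statement that unmarked nodes "will not run Run:ai at all" is misleading.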
