
Flag to not advertise NUMA information #320

Open
blackgold opened this issue Jan 29, 2021 · 11 comments
Labels
enhancement New feature or request

Comments

@blackgold

What would you like to be added?

Flag to not advertise NUMA information

What is the use case for this feature / enhancement?

The logic to generate placement hints in the topology manager is exponential in the number of NUMA nodes.
When we have 8 NUMA nodes it takes a very long time.
We are not using NUMA information for RDMA, so it would be helpful if the device plugin did not send it (configurable by a flag).
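The blow-up is easy to see: the Topology Manager merges hints by walking the cross-product of every provider's hint list, and with n NUMA nodes a provider can advertise up to 2^n - 1 distinct affinity masks. A minimal sketch of the combinatorics (illustrative names, not kubelet code):

```go
package main

import "fmt"

// combinations returns how many hint permutations the merge step must
// evaluate: the product of each provider's hint-list length.
func combinations(hintCounts []int) int {
	total := 1
	for _, c := range hintCounts {
		total *= c
	}
	return total
}

func main() {
	// 2 NUMA nodes, 2 providers: at most 3 masks each -> 9 permutations.
	fmt.Println(combinations([]int{3, 3}))
	// 8 NUMA nodes, 2 providers: up to 255 masks each -> 65025 permutations.
	fmt.Println(combinations([]int{255, 255}))
}
```

Every extra provider that advertises NUMA hints multiplies the count again, which is why suppressing hints from plugins that don't need alignment helps.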

@killianmuldoon
Collaborator

@blackgold This is really interesting - have you got numbers for how long the TM calculation is taking? Does it impact your container startup time? It would be really helpful to understand the impact of the Topology calculation.

@zshi-redhat
Collaborator

@blackgold I assume you also have other device plugin instances running in the same cluster that require NUMA advertising, correct? So disabling the NUMA policy in kubelet is not an option here.

@blackgold
Author

@blackgold This is really interesting - have you got numbers for how long the TM calculation is taking? Does it impact your container startup time? It would be really helpful to understand the impact of the Topology calculation.

It takes more than 20 minutes. The job controller kills jobs that stay in the pending state for more than 20 minutes after binding to a node.
I will try to add some logs in kubelet to time it.

@blackgold
Author

@blackgold I assume you also have other device plugin instances running in the same cluster that require NUMA advertising, correct? So disabling the NUMA policy in kubelet is not an option here.

Ack. We have a GPU device plugin advertising topology information, so we cannot disable it in kubelet.
Jobs requiring fewer than 8 GPUs don't request RDMA resources, so we need it enabled in kubelet for that case.

@killianmuldoon
Collaborator

@blackgold Is this an 8 NUMA zone node? I didn't realize the Topology Manager calculation could take so long - any extra information on the setup and config would be great.

@zshi-redhat this seems like a must-have for these sorts of situations. Do you think it would work as a cmd flag, i.e. daemonset-wide (but not necessarily cluster-wide), or would it be better to have it as a per-pool config (which would allow TM active for SR-IOV on some pools but not on others)?

@blackgold
Author

Yup, it's an 8 NUMA zone node: 8 GPUs, 8 RDMA devices and 255 CPUs.

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/topologymanager/policy.go#L142
Here the size of allProviderHints is 10x239.

When I timed it, it took 220 seconds to generate the permutations from [8,0] to [8,96], roughly 22944 function calls. 220 seconds seems like a lot for that many function calls. Need to debug more.
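For reference, the linked code recursively walks the cross-product of all providers' hint lists, invoking a callback once per complete permutation. A rough sketch of that pattern (names here are illustrative, not the actual kubelet identifiers):

```go
package main

import "fmt"

// iterate recursively builds every permutation that takes one hint from
// each provider's list, calling cb once per complete permutation.
func iterate(hints [][]string, current []string, cb func([]string)) {
	if len(current) == len(hints) {
		cb(current)
		return
	}
	// Branch once per hint of the next provider.
	for _, h := range hints[len(current)] {
		iterate(hints, append(current, h), cb)
	}
}

func main() {
	calls := 0
	// Two providers with 2 and 3 hints -> 2*3 = 6 permutations.
	iterate([][]string{{"a", "b"}, {"x", "y", "z"}}, nil, func([]string) { calls++ })
	fmt.Println(calls)
}
```

The recursion itself is cheap per call, so if 22944 calls take 220 seconds the cost is likely in the per-permutation merge work rather than the traversal.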

@zshi-redhat
Collaborator

@zshi-redhat this seems like a must-have for these sorts of situations. Do you think it would work as a cmd flag, i.e. daemonset-wide (but not necessarily cluster-wide), or would it be better to have it as a per-pool config (which would allow TM active for SR-IOV on some pools but not on others)?

I think having a per-pool config would allow more flexibility and ultimately solve any related issues. For example, one device plugin instance could serve several resource pools, with NUMA enabled for some pools but not the others.
If we only have a CLI option, then the user would need to run multiple instances of the device plugin, each using a different NUMA CLI config.

For this particular case, my understanding is that the GPU is advertised by a different device plugin (which may not be SR-IOV), so having a CLI option would be enough.
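If the per-pool route were taken, one way it could look is an opt-out field alongside each pool in the plugin's resource list. Note that `numaAware` below is hypothetical, not an existing sriov-network-device-plugin option; it is only a sketch of the shape such a config might take:

```json
{
  "resourceList": [
    {
      "resourceName": "rdma_pool",
      "selectors": { "vendors": ["15b3"] },
      "numaAware": false
    },
    {
      "resourceName": "sriov_pool",
      "selectors": { "vendors": ["8086"] }
    }
  ]
}
```

With a shape like this, pools default to advertising NUMA topology and only explicitly opted-out pools would omit it.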

@adrianchiris
Contributor

adrianchiris commented Feb 2, 2021

Was an issue filed against the topology manager? Maybe the algorithm can be improved.

@blackgold
Author

Was an issue filed against the topology manager? Maybe the algorithm can be improved.
Not yet. @klueska

If you think it's reasonable to control this using a CLI option, I can send out an MR.

@zshi-redhat
Collaborator

Was an issue filed against the topology manager? Maybe the algorithm can be improved.
Not yet. @klueska

If you think it's reasonable to control this using a CLI option, I can send out an MR.

I'm fine with using a CLI option; this is aligned with the discussion we had in #320 and the resource mgmt meeting - to have a featureGate for features that may need to be enabled/disabled. I think NUMA could be one example of this.

/cc @killianmuldoon @ahalim-intel @adrianchiris @martinkennelly

@killianmuldoon
Collaborator

@zshi-redhat I think a feature gate is a good idea here for sure, but we should think about implementing per-pool NUMA awareness (default on, opt out for a specific pool) for advanced cases where SR-IOV topology may not be important (one NIC per node, multi-resource NUMA constraints).
