What steps did you take and what happened:
When running KFS (KFServing) Triton TorchScript inference on Kubernetes nodes with a high number of CPU cores, inference requests are heavily throttled, leading to poor performance. By default, the TorchScript runtime spawns a number of intra-op threads equal to the number of cores on the node the pod is scheduled to, not the container's CPU limit. If a node has 40 cores and the CPU limit is 4, the CFS quota is 4 × 100 ms (the CFS period) = 400 ms of CPU time per period. 40 busy threads use that up in 400 / 40 = 10 ms, and the whole container is then throttled for the remaining 90 ms of the period (stop the world).
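The throttling arithmetic above can be checked with a small helper. This is a sketch under simplifying assumptions: every thread is CPU-bound for the whole period, the node has enough cores to run all threads at once, and the period defaults to the CFS default of 100 ms. The function name is hypothetical.

```python
from __future__ import annotations


def cfs_run_and_throttle_ms(
    cpu_limit: float, busy_threads: int, period_ms: float = 100.0
) -> tuple[float, float]:
    """Return (run_ms, throttled_ms) of wall-clock time per CFS period.

    The container's quota is cpu_limit * period_ms of CPU time per period.
    If busy_threads run in parallel, they exhaust the quota after
    quota / busy_threads ms (capped at the period length); for the rest of
    the period every thread in the cgroup is throttled.
    """
    if busy_threads < 1:
        raise ValueError("busy_threads must be >= 1")
    quota_ms = cpu_limit * period_ms
    run_ms = min(period_ms, quota_ms / busy_threads)
    return run_ms, period_ms - run_ms


if __name__ == "__main__":
    # 40 threads under a 4-CPU limit: 10 ms running, 90 ms throttled.
    print(cfs_run_and_throttle_ms(4, 40))
```

With busy_threads equal to the CPU limit (4), run_ms is the full 100 ms and nothing is throttled, which is why sizing the thread pools to the limit rather than to the node's core count avoids the stall.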
What did you expect to happen:
According to the PyTorch documentation, we can set OMP_NUM_THREADS (OpenMP) and MKL_NUM_THREADS (MKL) to control the number of threads. KFS could set these environment variables by default based on the CPU limit.
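A minimal sketch of the proposed default, assuming the controller reads the predictor container's resources.limits.cpu string and injects the result into that container's env. The function names are hypothetical (not part of KFServing), and only plain decimal and millicore ("m") quantities are parsed.

```python
from __future__ import annotations

import math


def parse_cpu_quantity(quantity: str) -> float:
    """Parse a Kubernetes CPU quantity such as "4", "2.5" or "4000m" into cores.

    Assumption: only plain decimals and the millicore suffix "m" are handled.
    """
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1]) / 1000.0
    return float(q)


def thread_env_for_cpu_limit(cpu_limit: str) -> dict[str, str]:
    """Build OpenMP/MKL thread-count env vars sized to the CPU limit.

    Rounds down (minimum 1) so that fully busy threads stay within the
    CFS quota instead of exhausting it early in each period.
    """
    cores = parse_cpu_quantity(cpu_limit)
    threads = max(1, math.floor(cores))
    return {
        "OMP_NUM_THREADS": str(threads),
        "MKL_NUM_THREADS": str(threads),
    }


if __name__ == "__main__":
    print(thread_env_for_cpu_limit("4"))     # limit of 4 cores
    print(thread_env_for_cpu_limit("500m"))  # fractional limit -> 1 thread
```

Rounding down is a conservative choice: a 2.5-CPU limit gets 2 threads and leaves some quota unused, whereas rounding up to 3 would let the threads exhaust the quota before the period ends. Either way, the variables only take effect if they are in the environment before the framework creates its thread pool.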
I think this is not just an issue for Triton; the sklearn, XGBoost, and TensorFlow servers might need a similar fix.
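Since several model servers may need the same treatment, one framework-agnostic option is for the server process to size the variables from its own cgroup quota at startup. The sketch below assumes cgroup v2, where /sys/fs/cgroup/cpu.max holds "<quota> <period>" (or "max <period>" when unlimited). OpenMP-based libraries generally honor OMP_NUM_THREADS, but TensorFlow has its own intra-/inter-op thread settings that this does not cover, and for PyTorch torch.set_num_threads is an in-process alternative. The helper names are hypothetical.

```python
from __future__ import annotations

import math
import os
from typing import Optional


def cores_from_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max content ("<quota> <period>") into a core count.

    Returns None when the quota is "max", i.e. the cgroup has no CPU limit.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def apply_thread_limits(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> Optional[int]:
    """Set OMP_NUM_THREADS / MKL_NUM_THREADS from the cgroup CPU quota.

    Must run before the ML framework creates its thread pool (for example,
    before importing torch). Values already set by the user are kept.
    Returns the chosen thread count, or None if no limit was found.
    """
    try:
        with open(cpu_max_path) as f:
            cores = cores_from_cpu_max(f.read())
    except (OSError, ValueError):
        return None  # not cgroup v2 or unreadable: keep framework defaults
    if cores is None:
        return None
    threads = max(1, math.floor(cores))
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ.setdefault(var, str(threads))
    return threads


if __name__ == "__main__":
    print(apply_thread_limits())
```

Using os.environ.setdefault means an explicit value in the pod spec still wins over the detected one.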
Anything else you would like to add:
[Figure: CPU throttle metrics]
/kind bug
[Figure: Latency (ms) based on input length]
Environment:
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):