High CPU throttling when running torchscript inference with triton on high number cores node #1031

yuzisun · 2020-08-19T13:54:14Z

/kind bug

What steps did you take and what happened:
When running KFS Triton TorchScript inference on Kubernetes nodes with a high number of CPU cores, inference requests get heavily throttled, leading to poor performance. This happens because the TorchScript library by default spawns as many intra-op threads as there are cores on the node the pod is scheduled to. If a node has 40 cores and the CPU limit is set to 4, the pod's CFS quota is 4 × 100ms (the CFS CPU period) = 400ms of CPU time per period. Spread across 40 threads, each thread gets only 400/40 = 10ms to run in a given period and is then throttled for the remaining 90ms (stop the world).
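The arithmetic above can be checked with a small sketch. It assumes a simplified model in which all threads are runnable at the same time on separate cores and share the quota evenly; the function name `cfs_throttle_estimate` is made up for illustration and is not part of any library.

```python
def cfs_throttle_estimate(cpu_limit, busy_threads, period_ms=100.0):
    """Estimate per-period run time and throttled time for one thread.

    Simplified model (an assumption): every thread is busy on its own core,
    so the pod burns its CFS quota `busy_threads` times faster than wall time.
    """
    # Total CPU time the pod may consume in one CFS period.
    quota_ms = cpu_limit * period_ms
    # Wall-clock time each thread runs before the quota is exhausted.
    run_ms = min(period_ms, quota_ms / busy_threads)
    # Remainder of the period during which the pod is throttled.
    return run_ms, period_ms - run_ms


if __name__ == "__main__":
    # The case from this issue: 40 threads, CPU limit of 4.
    print(cfs_throttle_estimate(4, 40))
    # Thread count matched to the CPU limit.
    print(cfs_throttle_estimate(4, 4))
```

With the thread count matched to the CPU limit, the same model predicts no throttling, which is the motivation for capping the intra-op threads.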

What did you expect to happen:

According to the torch doc, OMP_NUM_THREADS (OpenMP) and MKL_NUM_THREADS (MKL) control the number of threads, so KFS could set these environment variables by default based on the CPU limit.
I think this is not just a Triton issue; sklearn, xgboost, and tensorflow might need a similar fix.
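A minimal sketch of how such a default could be derived from the CPU limit is shown below. It is not the KFS implementation: the helper names are hypothetical, rounding up fractional limits is an assumption, and only plain ("4") and millicore ("500m") Kubernetes CPU quantities are handled.

```python
import math


def parse_cpu_quantity(quantity):
    """Convert a Kubernetes CPU quantity such as "4" or "500m" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000.0
    return float(quantity)


def thread_env_for_cpu_limit(cpu_limit):
    """Return thread-count env vars sized to the container CPU limit.

    Assumption: round fractional limits up and never go below one thread.
    """
    threads = max(1, math.ceil(parse_cpu_quantity(cpu_limit)))
    return {
        "OMP_NUM_THREADS": str(threads),
        "MKL_NUM_THREADS": str(threads),
    }


if __name__ == "__main__":
    print(thread_env_for_cpu_limit("4"))
    print(thread_env_for_cpu_limit("500m"))
```

These variables would need to be present in the container environment before the model server process starts, since the threading libraries typically read them at initialization.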

Anything else you would like to add:
CPU throttle metrics

Latency(ms) based on input length

Environment:

Istio Version: 1.6.2
Knative Version: 0.12.1
KFServing Version: 0.4
Kubeflow version:
Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
Minikube version:
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):

issue-label-bot · 2020-08-19T13:54:22Z

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

yuzisun · 2021-01-31T16:56:42Z

Now, with KFS 0.5, you can set the OMP_NUM_THREADS environment variable.
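As an illustration only, the sketch below builds an InferenceService manifest that sets the variable on the Triton predictor. It assumes the v1beta1 predictor accepts container-style `env` and `resources` fields and an integer CPU limit; the service name and storage URI are hypothetical placeholders, and the layout should be checked against the KFServing 0.5 API reference before use.

```python
import json


def triton_isvc_with_thread_env(name, storage_uri, cpu_limit):
    """Build an InferenceService dict with OMP_NUM_THREADS tied to the CPU limit.

    Assumptions: `cpu_limit` is a whole number of cores, and the v1beta1
    Triton predictor spec accepts `env` and `resources` like a container.
    """
    return {
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "triton": {
                    "storageUri": storage_uri,
                    "resources": {"limits": {"cpu": str(cpu_limit)}},
                    # Keep the intra-op thread count equal to the CPU limit.
                    "env": [
                        {"name": "OMP_NUM_THREADS", "value": str(cpu_limit)},
                    ],
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical name and storage location, for illustration only.
    manifest = triton_isvc_with_thread_env(
        "torchscript-example", "gs://example-bucket/models", 4
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with `kubectl apply -f`, which accepts JSON as well as YAML.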

k8s-ci-robot added the kind/bug label Aug 19, 2020

yuzisun added the kfserving/v1beta1 label Aug 19, 2020

yuzisun added this to To do in KFServing 0.5 Sep 19, 2020

yuzisun moved this from To do to In progress in KFServing 0.5 Oct 28, 2020

yuzisun closed this as completed Jan 31, 2021

yuzisun moved this from In progress to Done in KFServing 0.5 Jan 31, 2021

langong347 mentioned this issue Mar 6, 2024

CPU Throttling when Deploying Triton with ONNX Backend on Kubernetes triton-inference-server/onnxruntime_backend#245

Open
