High CPU throttling when running torchscript inference with triton on high number cores node #1031

Closed
yuzisun opened this issue Aug 19, 2020 · 2 comments

Comments

@yuzisun
Member

yuzisun commented Aug 19, 2020

/kind bug

What steps did you take and what happened:
When running KFS Triton TorchScript inference on Kubernetes nodes with a high number of CPU cores, inference requests get heavily throttled, leading to poor performance. By default the TorchScript library spawns as many intra-op threads as there are cores on the node the pod is scheduled to, regardless of the pod's CPU limit. If there are 40 cores on a node and the CPU limit is set to 4, the pod's quota is 4*100ms (CFS CPU period) = 400ms of CPU time per period. With 40 threads running in parallel, each thread only gets 400/40 = 10ms to run in a given period and is then throttled for the remaining 90ms (stop the world).
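The arithmetic above can be sketched as a small calculation. This is only an illustrative model, assuming every thread is CPU-bound and runs on its own core for the whole period; the function name `cfs_throttle_window` is hypothetical and not part of KFServing, Triton, or PyTorch.

```python
def cfs_throttle_window(cpu_limit: float, runnable_threads: int,
                        period_ms: float = 100.0) -> tuple:
    """Return (run_ms, throttled_ms) of wall-clock time per CFS period.

    Simplified model: all threads are busy and run in parallel, so the
    pod burns `runnable_threads` ms of CPU time per ms of wall-clock time.
    """
    if runnable_threads <= 0:
        raise ValueError("runnable_threads must be positive")
    quota_ms = cpu_limit * period_ms             # CPU time allowed per period
    run_ms = min(period_ms, quota_ms / runnable_threads)
    return run_ms, period_ms - run_ms


# 40-core node, CPU limit of 4, one intra-op thread per core
print(cfs_throttle_window(4, 40))   # (10.0, 90.0)
# Same limit, but the thread count matches the limit: no throttling
print(cfs_throttle_window(4, 4))    # (100.0, 0.0)
```

Under this model, capping the thread count at the CPU limit is what removes the 90ms stall.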

What did you expect to happen:

  • According to the torch doc, we can set OMP_NUM_THREADS (OpenMP) and MKL_NUM_THREADS (MKL) to control the number of threads. KFS can set these environment variables by default based on the CPU limit.
  • I think this is not just for Triton; sklearn, xgboost, and tensorflow might need a similar fix.
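One way the proposed default could look is sketched below. It is not how KFServing implements it; the helper names `read_cgroup_cpu_limit` and `thread_env_for_limit` are hypothetical, rounding the limit up to a whole thread count is an assumption, and the cgroup file paths are the standard v2 and v1 locations, which may differ on some setups. The variables need to be in the environment before the serving process starts.

```python
import math
from pathlib import Path
from typing import Optional


def read_cgroup_cpu_limit() -> Optional[float]:
    """Return the container CPU limit in cores, or None if unlimited/unknown."""
    v2 = Path("/sys/fs/cgroup/cpu.max")               # cgroup v2: "<quota> <period>"
    if v2.exists():
        quota, period = v2.read_text().split()
        return None if quota == "max" else int(quota) / int(period)
    quota_f = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")    # cgroup v1
    period_f = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    if quota_f.exists() and period_f.exists():
        quota = int(quota_f.read_text())
        period = int(period_f.read_text())
        return None if quota <= 0 else quota / period        # -1 means no limit
    return None


def thread_env_for_limit(cpu_limit: Optional[float]) -> dict:
    """Build thread-count env vars from a CPU limit (assumption: round up, min 1)."""
    if cpu_limit is None:
        return {}                                    # no limit: keep library defaults
    n = str(max(1, math.ceil(cpu_limit)))
    return {"OMP_NUM_THREADS": n, "MKL_NUM_THREADS": n}


print(thread_env_for_limit(4.0))
# {'OMP_NUM_THREADS': '4', 'MKL_NUM_THREADS': '4'}
```

The resulting dict could be merged into the container's environment, for example by a controller that already knows the pod's CPU limit, so the cgroup read is only needed when the limit is not known up front.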

Anything else you would like to add:
CPU throttle metrics

image

Latency(ms) based on input length

image

Environment:

  • Istio Version: 1.6.2
  • Knative Version: 0.12.1
  • KFServing Version: 0.4
  • Kubeflow version:
  • Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
  • Minikube version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@issue-label-bot

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@yuzisun yuzisun added this to To do in KFServing 0.5 Sep 19, 2020
@yuzisun yuzisun moved this from To do to In progress in KFServing 0.5 Oct 28, 2020
@yuzisun
Member Author

yuzisun commented Jan 31, 2021

Now with KFS 0.5 you can set the OMP_NUM_THREADS environment variable.
