Error deploying vLLM with Cluster Autoscaling #22

strangiato · 2024-05-23T22:52:59Z

Currently the vLLM example fails to deploy: Knative Serving times out before the model server is ready because of the time it takes to scale up a GPU node.

A bug report has been filed here to track the issue:

https://issues.redhat.com/browse/RHOAIRFE-193

As a workaround, once the GPU node has scaled, the Deployment object created by the Knative service can be deleted to reset the timeout.

Alternatively, if you create and scale a GPU node before the inference service is deployed, the model server will be able to deploy before the timeout is reached.

As another temporary workaround, we may be able to increase the timeout as per the instructions here:

https://knative.dev/docs/serving/configuration/deployment/
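For reference, a minimal sketch of what that change might look like, assuming the relevant setting is the `progress-deadline` key in the `config-deployment` ConfigMap in the `knative-serving` namespace, as described on the page linked above. The helper name and the 1800-second value are hypothetical and untested against this example. On clusters where Knative is managed by an operator, a direct ConfigMap edit may be reverted, so the value may need to go through the operator's own configuration instead.

```python
import json


def progress_deadline_patch(seconds: int) -> dict:
    """Build a merge patch that sets Knative's progress-deadline.

    Hypothetical helper: it only constructs the patch body and does not
    talk to the cluster.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Knative expects a duration string such as "600s".
    return {"data": {"progress-deadline": f"{seconds}s"}}


if __name__ == "__main__":
    # 1800s is an assumed value chosen to outlast GPU node scale-up.
    print(json.dumps(progress_deadline_patch(1800)))
```

The printed JSON could then be applied with something like `kubectl patch configmap config-deployment -n knative-serving --type merge -p '<json>'`. The right deadline depends on how long the GPU node actually takes to become ready.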
