Possible to change http-usermetric Port? #15223

Open
robertgshaw2-neuralmagic opened this issue May 18, 2024 · 2 comments
Labels
kind/question Further information is requested

Comments

robertgshaw2-neuralmagic commented May 18, 2024

Ask your question here:

Hello! I am working on an integration between KServe/Knative and vLLM for deploying LLMs. vLLM is a production inference server for LLMs, and I have instrumented it with Prometheus metrics that are specific to LLM serving. The key ones include TTFT (time-to-first-token) and TPOT (time-per-output-token). I want to use these metrics in addition to the generic metrics exposed by the queue-proxy container.

KServe has a feature called qpext, which aggregates the queue-proxy container metrics with the vLLM container metrics. qpext exposes the aggregated metrics on port 9088 and the queue-proxy metrics on port 9091. The issue I am running into is that when I create my InferenceService (which uses Knative Serving), port 9091 (named http-usermetric) is exposed but port 9088 is not:

NAME                                      TYPE           CLUSTER-IP       EXTERNAL-IP                                            PORT(S)                                              AGE
tinyllama                                 ExternalName   <none>           knative-local-gateway.istio-system.svc.cluster.local   <none>                                               8m34s
tinyllama-predictor                       ExternalName   <none>           knative-local-gateway.istio-system.svc.cluster.local   80/TCP                                               8m35s
tinyllama-predictor-00001                 ClusterIP      10.125.177.215   <none>                                                 80/TCP,443/TCP                                       8m55s
tinyllama-predictor-00001-private         ClusterIP      10.125.180.69    <none>                                                 80/TCP,443/TCP,9090/TCP,9091/TCP,8022/TCP,8012/TCP   8m56s

As a result, when I create a ServiceMonitor to monitor my InferenceService, I am unable to query port 9088 where the vLLM metrics are aggregated with the queue-proxy metrics.

I am going to proceed by using PodMonitor for the time being, but I would prefer to use a ServiceMonitor as this seems like best practice after my review of the Prometheus Operator documentation.

So my questions are:

  • Is there any way to change the http-usermetric port that is exposed by the Knative services?
  • If not, is using a PodMonitor the best practice for monitoring user-defined metrics from applications inside Knative?

Apologies if this is the wrong place to ask this. I was not quite sure whether it made more sense to ask in the KServe or Knative forums.

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic added the kind/question Further information is requested label May 18, 2024
@robertgshaw2-neuralmagic robertgshaw2-neuralmagic changed the title Possible to change http-usermetrics Port? Possible to change http-usermetric Port? May 18, 2024

skonto commented May 20, 2024

Hi @robertgshaw2-neuralmagic, I think the right place to ask this is the KServe community. In the meantime, here is my understanding. When qpext gets a request on 9088, it combines the metrics from 9091 with those from the app port (the vLLM runtime in this case) and returns the aggregated metrics.
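As a rough illustration of what "combines metrics" means here, the sketch below merges two Prometheus text expositions into one response body. It is not qpext's actual implementation. The function name `merge_expositions` and the sample metric names are hypothetical, and a real aggregator would also have to handle metric families that appear on both sides.

```python
def merge_expositions(*texts):
    """Concatenate Prometheus text expositions into a single body.

    Repeated "# HELP" / "# TYPE" lines for the same metric name are kept only
    once. This is a conceptual sketch, not how qpext is implemented.
    """
    seen_meta = set()
    merged = []
    for text in texts:
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            if line.startswith(("# HELP ", "# TYPE ")):
                # Key on (comment kind, metric name) so each appears once.
                parts = line.split(maxsplit=3)
                key = (parts[1], parts[2])
                if key in seen_meta:
                    continue
                seen_meta.add(key)
            merged.append(line)
    return "\n".join(merged) + "\n"


# Hypothetical sample payloads; the metric names are made up for illustration.
QUEUE_PROXY_SAMPLE = """\
# HELP example_requests_total Hypothetical queue-proxy counter.
# TYPE example_requests_total counter
example_requests_total 12
"""

APP_SAMPLE = """\
# HELP example_ttft_seconds Hypothetical app-level latency gauge.
# TYPE example_ttft_seconds gauge
example_ttft_seconds 0.25
"""

if __name__ == "__main__":
    print(merge_expositions(QUEUE_PROXY_SAMPLE, APP_SAMPLE), end="")
```

A scraper that hits the aggregated endpoint therefore sees both sets of series in one response, which is why a single scrape target on 9088 would be enough.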

Now, you could create your own Kubernetes Service and point it at the aggregated port. The Service you are referring to above is an internal Knative Service that exposes port 9091 but not 9088. That does not stop you from exposing metrics on another port and scraping it independently with a ServiceMonitor. You could do the same with port 9088.
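A minimal sketch of that suggestion follows, using Python only to build the two manifests and print them as JSON (which `kubectl apply -f -` accepts). The namespace, the resource names, the port name `agg-metrics`, the `/metrics` path, the scrape interval, and the pod label used as the selector are all assumptions that do not come from this thread. Check the labels on your predictor pod with `kubectl get pod --show-labels` before using any of them.

```python
import json

# Assumptions (not from the thread): namespace, names, selector label, path.
NAMESPACE = "default"
ISVC_NAME = "tinyllama"
POD_LABELS = {"serving.kserve.io/inferenceservice": ISVC_NAME}  # verify on your pod
AGGREGATE_PORT = 9088  # qpext aggregation port mentioned in the thread
PORT_NAME = "agg-metrics"  # hypothetical port name
SERVICE_LABELS = {"app": f"{ISVC_NAME}-metrics"}  # hypothetical label


def metrics_service():
    """A plain ClusterIP Service that exposes only the aggregated metrics port."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": f"{ISVC_NAME}-metrics",
            "namespace": NAMESPACE,
            "labels": SERVICE_LABELS,
        },
        "spec": {
            "selector": POD_LABELS,
            "ports": [
                {
                    "name": PORT_NAME,
                    "port": AGGREGATE_PORT,
                    "targetPort": AGGREGATE_PORT,
                }
            ],
        },
    }


def service_monitor():
    """A Prometheus Operator ServiceMonitor that scrapes the Service above."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": f"{ISVC_NAME}-metrics", "namespace": NAMESPACE},
        "spec": {
            "selector": {"matchLabels": SERVICE_LABELS},
            "endpoints": [
                # The path and interval are assumptions; adjust to your setup.
                {"port": PORT_NAME, "path": "/metrics", "interval": "15s"}
            ],
        },
    }


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [metrics_service(), service_monitor()],
    }
    print(json.dumps(manifest, indent=2))
```

If saved under a hypothetical name such as `metrics_manifests.py`, the output could be piped into `kubectl apply -f -`. The ServiceMonitor refers to the Service port by name, so the two manifests must agree on `agg-metrics`.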

Btw, 9088 is the qpext aggregation port. What is the vLLM runtime port from which qpext will get the metrics (is it the default 8080)? Have you tested whether those ports work within the container, and do you get any metrics back?
Another question is whether you are using Istio, since Istio provides its own metrics aggregation and that affects the setup.


robertgshaw2-neuralmagic commented May 20, 2024

Thanks @skonto - this is very helpful. I am somewhat new to Knative/KServe, so I am trying to learn the best practices for creating additional services versus updating the Knative/KServe configs.

The vLLM runtime uses port 8000 for both metrics and the user-facing API. I am going to change this.

Right now I have Istio set up for client connections from outside the cluster. Since the Prometheus server is running inside my cluster, I was not going through Istio for metrics aggregation.

Would you suggest I use Istio for scraping the prom metrics as well?
