Possible to change `http-usermetric` Port?
#15223
Labels
kind/question
Further information is requested
Ask your question here:
Hello! I am working on an integration between KServe/Knative and vLLM for deploying LLMs. vLLM is a production inference server for LLMs, and I have instrumented it with Prometheus metrics that are specific to LLM serving. For instance, the key metrics include TTFT (time-to-first-token) and TPOT (time-per-output-token). I want to use these metrics in addition to the generic metrics exposed by the `queue-proxy` container.

KServe has a feature called `qpext`, which enables aggregation of the `queue-proxy` container metrics with the `vllm` container metrics. `qpext` exposes the aggregated metrics on port 9088 and exposes the `queue-proxy` metrics on port 9091. The issue I am running into is that when I create my `InferenceService` (which uses Knative Serving), only port 9091 is exposed (this port is named `http-usermetric`).

As a result, when I create a `ServiceMonitor` to monitor my `InferenceService`, I am unable to query port `9088`, where the vLLM metrics are aggregated with the `queue-proxy` metrics.

I am going to proceed by using a `PodMonitor` for the time being, but I would prefer to use a `ServiceMonitor`, as this seems to be best practice based on my review of the Prometheus Operator documentation.

So my questions are:

- Is it possible to change the `http-usermetric` port that is exposed by the Knative services?
- Is `PodMonitor` the best practice for monitoring user-defined metrics from applications inside Knative?

Apologies if this is the wrong place to ask this. I was not quite sure whether this made more sense to ask in the KServe or Knative forums.
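
For reference, the `PodMonitor` workaround I mentioned could look roughly like the sketch below. It is only a sketch under assumptions, not a confirmed configuration: the resource name, namespace, and `InferenceService` name are hypothetical, the pod label `serving.kserve.io/inferenceservice` and the `/metrics` path are assumed, and `targetPort` is used only because port 9088 may not be a named container port (newer Prometheus Operator releases mark `targetPort` as deprecated). The script just builds the manifest and prints it as JSON, which `kubectl apply -f -` accepts.

```python
import json


def pod_monitor(name, namespace, isvc_name, port=9088, path="/metrics", interval="30s"):
    """Build a PodMonitor manifest (as a dict) that scrapes the aggregated metrics port.

    All argument values are illustrative; adjust them to the actual deployment.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the InferenceService pods carry this KServe label.
            "selector": {
                "matchLabels": {"serving.kserve.io/inferenceservice": isvc_name}
            },
            "podMetricsEndpoints": [
                {
                    # Scrape by port number, since 9088 may not be a named port.
                    "targetPort": port,
                    # Assumption: aggregated metrics are served at /metrics.
                    "path": path,
                    "interval": interval,
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    manifest = pod_monitor("vllm-aggregated-metrics", "default", "my-vllm")
    print(json.dumps(manifest, indent=2))
```

Saved under a hypothetical name such as `podmonitor.py`, the output could be piped straight into the cluster with `python podmonitor.py | kubectl apply -f -`.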