Using v1, I'm interested in measuring the conversation latency as defined in the v0 docs.
While the various TTFTs and durations are exposed in the v1 metrics models, I'm unsure how to identify each "turn" of a conversation so that the latencies belonging to the same turn can be correlated. Is there an example that demonstrates how this might be implemented?
To add onto this, and assuming there is a way to identify a specific conversation turn: a cascading pipeline may involve tool calls, in which case the conversation latency needs to include the full duration of the tool-calling LLM inference step. Is there a way to easily identify the type of LLM call through the metrics framework, or would we have to rely on heuristics (i.e. count only the ttft of the last LLM call in the turn, and count the full duration of every prior LLM call)?
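To make the heuristic concrete, here is a rough sketch of what I have in mind. It assumes the LLM calls belonging to one turn have already been grouped and ordered somehow, which is exactly the part I don't know how to do. `LLMCall` and `llm_latency_for_turn` are hypothetical names for illustration, not types or functions from the metrics framework.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LLMCall:
    """Hypothetical record of one LLM inference within a turn (not a framework type)."""
    ttft: float      # seconds until the first token
    duration: float  # seconds for the whole inference


def llm_latency_for_turn(calls: Sequence[LLMCall]) -> float:
    """Heuristic: full duration of every call except the last, plus the ttft of the last."""
    if not calls:
        return 0.0
    # Assumption: every call before the final one is a tool-calling step
    # whose complete duration delays the spoken response.
    *tool_calls, final_call = calls
    return sum(c.duration for c in tool_calls) + final_call.ttft


# One tool-calling inference followed by the inference that produces the reply.
turn = [LLMCall(ttft=0.5, duration=1.5), LLMCall(ttft=0.25, duration=2.0)]
print(llm_latency_for_turn(turn))  # 1.75
```

This only covers the LLM portion of the latency and ignores the time spent executing the tool itself, so it would likely still undercount a turn that involves slow tools.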