Capture clickhouse profile events in query summaries. #10172
ClickHouse includes a collection of profile events by default when using the native TCP client. This patch captures those events, aggregates them by type, and includes the aggregated profile events in the optional query profile section. We also use these profile summaries in the OxQL benchmark, adding a new benchmark type that measures query CPU usage rather than latency.
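To illustrate what "aggregating by type" could look like, here is a minimal sketch. The `ProfileEvent` struct, the `aggregate_profile_events` function, and the `u64` value type are assumptions made for illustration and are not taken from the patch; it also assumes the events are increment-style counters, so summing them is meaningful.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one profile event received from the server.
/// The real type in the patch may carry more fields (host, thread, kind).
#[derive(Debug, Clone)]
pub struct ProfileEvent {
    pub name: String,
    pub value: u64,
}

/// Sum event values by event name, so repeated events of the same type
/// collapse into a single total in the summary.
pub fn aggregate_profile_events(events: &[ProfileEvent]) -> BTreeMap<String, u64> {
    let mut summary = BTreeMap::new();
    for event in events {
        let total: &mut u64 = summary.entry(event.name.clone()).or_insert(0);
        // Saturate rather than overflow on pathological inputs.
        *total = total.saturating_add(event.value);
    }
    summary
}
```

A `BTreeMap` keeps the summary ordered by event name, which makes printed summaries stable between runs; a `HashMap` would work equally well if ordering does not matter.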
bnaecker
left a comment
This is great! I completely forgot that we already received and parsed these event messages.
I have a few suggestions, but they're fairly minor. Thanks!
    fn bench_metric() -> BenchMetric {
        match std::env::var("BENCH_METRIC").as_deref() {
            Ok("cpu") => BenchMetric::CpuTime,
            _ => BenchMetric::Latency,
I'd probably match explicitly here, rather than the _ wildcard, and fail if the benchmark isn't a supported one.
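One way the explicit match could look is sketched below. The `parse_bench_metric` helper, the `"latency"` string, and the choice to default to latency when the variable is unset are assumptions for illustration; only `"cpu"` and the `BENCH_METRIC` variable name come from the diff.

```rust
use std::env::VarError;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchMetric {
    Latency,
    CpuTime,
}

/// Map the result of reading `BENCH_METRIC` to a metric, rejecting
/// anything that is not explicitly supported.
pub fn parse_bench_metric(value: Result<&str, &VarError>) -> Result<BenchMetric, String> {
    match value {
        Ok("cpu") => Ok(BenchMetric::CpuTime),
        // Assumption: an unset variable keeps the previous default.
        Ok("latency") | Err(VarError::NotPresent) => Ok(BenchMetric::Latency),
        Ok(other) => Err(format!("unsupported BENCH_METRIC: {other:?}")),
        Err(VarError::NotUnicode(raw)) => {
            Err(format!("BENCH_METRIC is not valid unicode: {raw:?}"))
        }
    }
}

/// Read the environment and fail loudly on an unsupported value.
pub fn bench_metric() -> BenchMetric {
    parse_bench_metric(std::env::var("BENCH_METRIC").as_deref())
        .unwrap_or_else(|msg| panic!("{msg}"))
}
```

Splitting the parsing from the environment read keeps the match testable without mutating process-wide state.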
    fn bench_metric() -> BenchMetric {
        match std::env::var("BENCH_METRIC").as_deref() {
            Ok("cpu") => BenchMetric::CpuTime,
Should the string match the variant name? E.g. "cpu_time"?
    s.profile_summary
        .get("UserTimeMicroseconds")
        .copied()
        .unwrap_or(0)
I'm a little suspicious of the unwrap_or(0). If the key isn't there, e.g., because of a change when we upgrade ClickHouse, I would want to know. Maybe at least an eprintln!() would help? Same note below too.
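A small helper along these lines would surface a missing key without aborting the benchmark. This is a sketch only: `profile_event_or_warn` is a hypothetical name, and the map type is assumed to be `BTreeMap<String, u64>`, which may not match the actual `profile_summary` field.

```rust
use std::collections::BTreeMap;

/// Look up a profile event by name. If the key is absent, print a warning
/// to stderr and fall back to 0 instead of silently hiding the gap.
pub fn profile_event_or_warn(summary: &BTreeMap<String, u64>, key: &str) -> u64 {
    match summary.get(key) {
        Some(value) => *value,
        None => {
            eprintln!("warning: profile event {key:?} missing from query summary; using 0");
            0
        }
    }
}
```

The call site would then read `profile_event_or_warn(&s.profile_summary, "UserTimeMicroseconds")`, and the same helper covers any other lookups that currently use `unwrap_or(0)`.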
    /// Wall clock latency.
    Latency,
    /// Total cpu time.
    CpuTime,
Let's make a note that this is user + system time as reported by the DB.
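As a sketch of how that note and the corresponding computation might read: the doc comment wording and the `cpu_time_micros` helper are illustrative, and the use of `SystemTimeMicroseconds` alongside `UserTimeMicroseconds` is an assumption about the patch (both are ClickHouse profile event names, but only the user-time lookup appears in the excerpt above).

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchMetric {
    /// Wall clock latency.
    Latency,
    /// Total CPU time: user + system time, as reported by the database
    /// in its profile events (not measured by the benchmark process).
    CpuTime,
}

/// Total CPU time in microseconds, or `None` if either component is
/// missing from the summary.
pub fn cpu_time_micros(summary: &BTreeMap<String, u64>) -> Option<u64> {
    let user = summary.get("UserTimeMicroseconds").copied()?;
    let system = summary.get("SystemTimeMicroseconds").copied()?;
    Some(user.saturating_add(system))
}
```

Returning `Option` leaves the decision about a missing event to the caller, which fits the concern raised about silent zero defaults.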
Context: I wanted to evaluate #10110 more rigorously, and Claude noticed that we already had access to ClickHouse profile events. Looking at CPU profiles for that patch showed that its latency improvements came at the cost of higher CPU use, which is annoying but useful to know.