Support customizable telemetry data retention policy#10366
bnaecker
commented
May 4, 2026
- Add clickhouse-admin types for representing the telemetry retention policy and the disk usage of the database tables or oximeter timeseries specifically.
- Add clickhouse-admin-server APIs for manipulating the policy and listing the database and oximeter-timeseries usage.
This still needs a good bit of work, but I'd like to get early eyes on it. This is supposed to resolve #10357 by letting us set a customizable retention policy (in days) on the tables in the oximeter database. It does that through new endpoints on the

Like I said, this needs work, but I really want some kind of guardrails and procedure around manipulating the retention policy at our customer sites, rather than manually writing SQL against the database. Hopefully this can be made good enough by the time we want to release R19.3.
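For illustration only, here is a minimal sketch of how a retention policy expressed in days could be turned into a ClickHouse table-level TTL statement. This is not the code from this PR: `RetentionPolicy`, `modify_ttl_statement`, the table name, and the timestamp column name are all hypothetical, and the sketch assumes the identifiers are trusted and that the column is a `Date` or `DateTime` (a `DateTime64` column would likely need a conversion in the TTL expression).

```rust
/// Hypothetical retention policy: keep data for `days` days.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RetentionPolicy {
    pub days: u32,
}

/// Build a ClickHouse statement that sets a table-level TTL.
///
/// Assumption: `database`, `table`, and `timestamp_column` are trusted
/// identifiers. This sketch does no quoting or validation.
pub fn modify_ttl_statement(
    database: &str,
    table: &str,
    timestamp_column: &str,
    policy: RetentionPolicy,
) -> String {
    format!(
        "ALTER TABLE {database}.{table} MODIFY TTL {timestamp_column} + INTERVAL {days} DAY",
        days = policy.days,
    )
}
```

Generating the statement from a typed policy, rather than typing SQL by hand, is the kind of guardrail described above: the only free parameter an operator controls is the number of days.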
e9d9ca7 to
78ba9ed
- Add clickhouse-admin types for representing the telemetry retention policy and the disk usage of the database tables or oximeter timeseries specifically.
- Add clickhouse-admin-server APIs for manipulating the policy and listing the database and oximeter-timeseries usage.
- Add `omdb` subcommands for exercising the new APIs
78ba9ed to
e09811f
jmcarp
left a comment
Not approving yet since this is a draft and still in progress, but the approach LGTM. We can write up follow-ups at lower urgency related to migrations (low priority, since we're not currently running ch migrations on update) and per-timeseries stats.
I got this installed on

Updating the retention policy also works, and is reflected on the actual tables in ClickHouse:

Although I'm going to remove it in this PR, the per-timeseries data is also there. We might want to resurrect it, assuming we can find a reasonably accurate way to compute it:
- Remove oximeter usage computation, types, and API
- Add -w flag to `omdb` subcommand for setting retention
Alright, I've removed all the code for computing the usage by timeseries. We'll flesh that out later if we need it. This is ready for proper review.
jgallagher
left a comment
Just a bunch of nitpicks and questions from me; feel free to take or leave as you see fit. I know there's urgency here.
);

// Jump forward until we actually do compute the usage again.
tokio::time::pause();
How does pausing time in this tokio runtime interact with the ClickHouseDeployment we spawned above? Is it also affected by this pause?
It is affected by the pause, since they're using the same test runtime. As for what actually happens, I'm not sure. I could avoid all this shittiness by defining the update interval to be smaller during tests. That would make them take a few seconds, but never have to worry about this particular wart.
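A minimal sketch of the alternative mentioned above, assuming the update interval is passed in at construction rather than hard-coded, so tests can use a short real interval instead of pausing the runtime clock. `UsageCache` and its methods are hypothetical names, not types from this PR, and the counter stands in for the real usage query.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache that recomputes a value at most once per `interval`.
pub struct UsageCache {
    interval: Duration,
    last_computed: Option<Instant>,
    computations: u32,
}

impl UsageCache {
    /// Production code would pass a long interval; tests can pass a short one.
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_computed: None, computations: 0 }
    }

    /// Recompute if no value exists yet or the interval has elapsed.
    /// Returns true when a recomputation happened.
    pub fn refresh_if_stale(&mut self) -> bool {
        let stale = match self.last_computed {
            None => true,
            Some(at) => at.elapsed() >= self.interval,
        };
        if stale {
            // Stand-in for the real (expensive) usage computation.
            self.computations += 1;
            self.last_computed = Some(Instant::now());
        }
        stale
    }

    /// Number of times the value has been recomputed so far.
    pub fn computations(&self) -> u32 {
        self.computations
    }
}
```

The trade-off is the one already noted: the test really waits for the short interval, but nothing else sharing the runtime sees a paused clock.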
- wait_for_condition over sleep
- record errors setting retention policy and keep trying other tables
- expectorate tests for brittle string matching
- rename oximeter client for clarity
- allow replicated for omdb tooling
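Two of the changes in this commit can be sketched in a few lines. The code below is a std-only illustration under stated assumptions, not the code in this PR: `wait_for` is a blocking stand-in for an async polling helper, and `apply_to_all_tables` shows one way to record per-table errors while still attempting the remaining tables. Both function names and signatures are hypothetical.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` every `poll_interval` until it returns `Some(value)` or
/// `timeout` elapses. Blocking stand-in for an async test helper.
pub fn wait_for<T>(
    mut check: impl FnMut() -> Option<T>,
    poll_interval: Duration,
    timeout: Duration,
) -> Result<T, String> {
    let start = Instant::now();
    loop {
        if let Some(value) = check() {
            return Ok(value);
        }
        if start.elapsed() >= timeout {
            return Err(format!("condition not met within {timeout:?}"));
        }
        sleep(poll_interval);
    }
}

/// Apply `set_policy` to every table, recording failures instead of
/// stopping at the first one. Returns the (table, error) pairs that failed.
pub fn apply_to_all_tables<E>(
    tables: &[&str],
    mut set_policy: impl FnMut(&str) -> Result<(), E>,
) -> Vec<(String, E)> {
    let mut failures = Vec::new();
    for &table in tables {
        if let Err(err) = set_policy(table) {
            failures.push((table.to_string(), err));
        }
    }
    failures
}
```

Polling returns as soon as the condition holds, so a test waits only as long as it needs to, and returning the list of failures lets the caller report every table that could not be updated in one pass.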
I ran this one more time on