Support customizable telemetry data retention policy #10366

Merged
bnaecker merged 7 commits into main from ben/add-clickhouse-retention-policy
May 5, 2026

bnaecker commented May 4, 2026

  • Add clickhouse-admin types for representing the telemetry retention policy and the disk usage of the database tables, or of the oximeter timeseries specifically.
  • Add clickhouse-admin-server APIs for manipulating the policy and listing the database and oximeter-timeseries usage.

bnaecker marked this pull request as draft May 4, 2026 06:00

bnaecker commented May 4, 2026

This still needs a good bit of work, but I'd like to get early eyes on it. This is supposed to resolve #10357 by letting us set a customizable retention policy (in days) on the tables in the oximeter database. It does that through new endpoints on the clickhouse-admin server(s), and follow-up commits will add omdb subcommands to exercise those endpoints.

Like I said, this needs work, but I really want some kind of guardrails and procedure around manipulating the retention policy at our customer sites, rather than manually writing SQL against the database. Hopefully this can be made good enough by the time we want to release R19.3.
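
One form the guardrails could take is a policy type that refuses out-of-range values before any SQL is issued. The sketch below is illustrative only: the names `RetentionPolicy` and `InvalidRetention` and the 1 to 365 day bounds are assumptions, not the types or limits this PR adds.

```rust
use std::fmt;

// Hypothetical bounds; the thread does not state the real limits.
const MIN_DAYS: u32 = 1;
const MAX_DAYS: u32 = 365;

/// A retention policy expressed in whole days (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RetentionPolicy {
    days: u32,
}

/// Error returned when the requested number of days is out of range.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidRetention(u32);

impl fmt::Display for InvalidRetention {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "retention of {} days is outside {}..={}",
            self.0, MIN_DAYS, MAX_DAYS
        )
    }
}

impl std::error::Error for InvalidRetention {}

impl RetentionPolicy {
    /// Construct a policy, rejecting values outside the assumed bounds.
    pub fn new(days: u32) -> Result<Self, InvalidRetention> {
        if (MIN_DAYS..=MAX_DAYS).contains(&days) {
            Ok(Self { days })
        } else {
            Err(InvalidRetention(days))
        }
    }

    /// The number of days data is retained.
    pub fn days(&self) -> u32 {
        self.days
    }
}
```

Keeping the field private means a `RetentionPolicy` can only be obtained through `new`, so any code that receives one can assume it was already validated.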

bnaecker requested review from jgallagher and jmcarp May 4, 2026 06:14
bnaecker force-pushed the ben/add-clickhouse-retention-policy branch from e9d9ca7 to 78ba9ed Compare May 4, 2026 06:15
- Add clickhouse-admin types for representing the telemetry retention
  policy and the disk usage of the database tables or oximeter
  timeseries specifically.
- Add clickhouse-admin-server APIs for manipulating the policy and
  listing the database and oximeter-timeseries usage.
- Add `omdb` subcommands for exercising the new APIs
bnaecker force-pushed the ben/add-clickhouse-retention-policy branch from 78ba9ed to e09811f Compare May 4, 2026 06:23
Comment thread oximeter/db/src/client/mod.rs Outdated
Comment thread oximeter/db/src/client/mod.rs Outdated
Comment thread oximeter/db/src/client/mod.rs
Comment thread oximeter/db/src/client/mod.rs

jmcarp left a comment

Not approving yet since this is a draft and still in progress, but the approach LGTM. We can write up lower-urgency follow-ups for migrations (low priority, since we're not currently running ClickHouse migrations on update) and per-timeseries stats.

bnaecker commented May 4, 2026

I got this installed on dublin this morning to kick the tires. It does what it says on the tin. We can list the retention policy and usage:

ben@castle ~ $ pilot -r dublin tp login any
The illumos Project     helios-2.0.23966        May 2026
root@oxz_switch1:~# omdb clickhouse-admin
Operate on a single-node ClickHouse admin server

Usage: omdb clickhouse-admin [OPTIONS] <COMMAND>

Commands:
  retention-policy      Fetch the current database retention policy
  set-retention-policy  Set the current database retention policy
  database-usage        Fetch the current database table usage
  oximeter-usage        Fetch the current usage for each Oximeter timeseries
  help                  Print this message or the help of the given subcommand(s)

Options:
      --log-level <LOG_LEVEL>  log level filter [env: LOG_LEVEL=] [default: warn]
      --color <COLOR>          Color output [default: auto] [possible values: auto, always, never]
  -h, --help                   Print help

Connection Options:
      --clickhouse-admin-url <CLICKHOUSE_ADMIN_URL>
          URL of the ClickHouse admin server to query [env: OMDB_CLICKHOUSE_ADMIN_URL=]
      --dns-server <DNS_SERVER>
          [env: OMDB_DNS_SERVER=]

Safety Options:
  -w, --destructive  Allow potentially-destructive subcommands
root@oxz_switch1:~# omdb clickhouse-admin retention-policy
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd65:fbff:d89e:102::5]:8888
Retention: 30 days
root@oxz_switch1:~# omdb clickhouse-admin database-usage
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd65:fbff:d89e:102::5]:8888
  Last usage
     Started: 2026-05-04T19:10:35.865Z
   Completed: 2026-05-04T19:10:38.660Z

TABLE_NAME                          N_BYTES     N_ROWS
oximeter.fields_bool                0 B         0
oximeter.fields_i16                 20.21 KiB   4308
oximeter.fields_i32                 0 B         0
oximeter.fields_i64                 0 B         0
oximeter.fields_i8                  0 B         0
oximeter.fields_ipaddr              1.43 KiB    48
oximeter.fields_string              212.93 KiB  52444
oximeter.fields_u16                 23.81 KiB   2747
oximeter.fields_u32                 85.58 KiB   12650
oximeter.fields_u64                 1.01 KiB    9
oximeter.fields_u8                  19.91 KiB   4024
oximeter.fields_uuid                91.08 KiB   16598
oximeter.measurements_bool          119.27 KiB  17952
oximeter.measurements_bytes         0 B         0
oximeter.measurements_cumulativef32 0 B         0
oximeter.measurements_cumulativef64 0 B         0
oximeter.measurements_cumulativei64 0 B         0
oximeter.measurements_cumulativeu64 14.48 MiB   2165367
oximeter.measurements_f32           21.05 MiB   2681714
oximeter.measurements_f64           2.50 KiB    132
oximeter.measurements_histogramf32  0 B         0
oximeter.measurements_histogramf64  0 B         0
oximeter.measurements_histogrami16  0 B         0
oximeter.measurements_histogrami32  0 B         0
oximeter.measurements_histogrami64  0 B         0
oximeter.measurements_histogrami8   0 B         0
oximeter.measurements_histogramu16  0 B         0
oximeter.measurements_histogramu32  0 B         0
oximeter.measurements_histogramu64  1007.85 KiB 10552
oximeter.measurements_histogramu8   0 B         0
oximeter.measurements_i16           0 B         0
oximeter.measurements_i32           0 B         0
oximeter.measurements_i64           1.31 KiB    30
oximeter.measurements_i8            0 B         0
oximeter.measurements_string        0 B         0
oximeter.measurements_u16           0 B         0
oximeter.measurements_u32           0 B         0
oximeter.measurements_u64           1.85 MiB    301862
oximeter.measurements_u8            0 B         0
oximeter.timeseries_schema          3.49 KiB    86
oximeter.version                    187 B       1
system.asynchronous_metric_log      76.10 KiB   128408
system.metric_log                   551.85 KiB  2012
system.query_log                    2.86 MiB    27914
  Last error
    None
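
The N_BYTES column above uses binary units (B, KiB, MiB) with two decimal places once a value reaches 1 KiB. A minimal sketch of a formatter that produces strings of that shape follows; the function name `format_bytes` is hypothetical, and the formatting code omdb actually uses is not shown in this thread.

```rust
/// Render a byte count using binary units, e.g. "187 B" or "1.50 KiB".
/// Hypothetical helper; not the formatter omdb uses.
pub fn format_bytes(n_bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];

    // Divide by 1024 until the value fits the current unit, stopping
    // at the largest unit we know about.
    let mut value = n_bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }

    if unit == 0 {
        // Plain byte counts are printed as integers, with no decimals.
        format!("{} B", n_bytes)
    } else {
        format!("{:.2} {}", value, UNITS[unit])
    }
}
```

Values just under a unit boundary stay in the smaller unit, which is consistent with the `1007.85 KiB` entry for `oximeter.measurements_histogramu64` in the listing.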

Updating the retention policy also works, and is reflected on the actual tables in ClickHouse:

root@oxz_switch1:~# omdb clickhouse-admin set-retention-policy --days 3
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd65:fbff:d89e:102::5]:8888
root@oxz_switch1:~# omdb clickhouse-admin retention-policy
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd65:fbff:d89e:102::5]:8888
Retention:  3 days
root@oxz_switch1:~# pilot host exec -c 'zoneadm list | grep click' 0-31
14  2F8JEXDK           failure: exit code 1
15  BRM27230037        failure: exit code 1
16  BRM23230018        ok: oxz_clickhouse_ed1d627f-dc77-4bf9-8466-f5e71373b51e
17  BRM23230010        failure: exit code 1
Error: some operations failed
root@oxz_switch1:~# pilot host login 16

    #####
   ##   ##
  ##   # ##  ##   ##
  ##  #  ##   ## ##     Oxide Computer Company
  ## #   ##    ###      Engineering
   ##   ##    ## ##
    #####    ##   ##    Compute Sled

BRM23230018 # zlogin oxz_clickhouse_ed1d627f-dc77-4bf9-8466-f5e71373b51e
[Connected to zone 'oxz_clickhouse_ed1d627f-dc77-4bf9-8466-f5e71373b51e' pts/5]
The illumos Project     helios-2.0.23966        May 2026
root@oxz_clickhouse_ed1d627f:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128
oxControlService13/ll addrconf ok       fe80::8:20ff:fe29:411d%oxControlService13/10
oxControlService13/omicron6 static ok   fd65:fbff:d89e:102::5/64
root@oxz_clickhouse_ed1d627f:~# clickhouse client --host fd65:fbff:d89e:102::5
ClickHouse client version 23.8.7.1.
Connecting to fd65:fbff:d89e:102::5:9000 as user default.
Connected to ClickHouse server version 23.8.7 revision 54465.

oxz_clickhouse_ed1d627f-dc77-4bf9-8466-f5e71373b51e.local :) select create_table_query from system.tables where database = 'oximeter';

SELECT create_table_query
FROM system.tables
WHERE database = 'oximeter'

Query id: c4e96a54-0417-4f2c-9593-d64089d495e8

┌─create_table_query─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE oximeter.fields_bool (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` Bool, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_i16 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` Int16, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_i32 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` Int32, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_i64 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` Int64, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_i8 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` Int8, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_ipaddr (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` IPv6, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_string (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` String, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_u16 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` UInt16, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_u32 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` UInt32, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_u64 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` UInt64, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_u8 (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` UInt8, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.fields_uuid (`timeseries_name` String, `timeseries_key` UInt64, `field_name` String, `field_value` UUID, `last_updated_at` DateTime MATERIALIZED now()) ENGINE = ReplacingMergeTree ORDER BY (timeseries_name, field_name, field_value, timeseries_key) TTL last_updated_at + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_bool (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Bool)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_bytes (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Array(UInt8)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_cumulativef32 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Float32)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_cumulativef64 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_cumulativei64 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Int64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_cumulativeu64 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(UInt64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_f32 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Float32)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_f64 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogramf32 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(Float32), `counts` Array(UInt64), `min` Float32, `max` Float32, `sum_of_samples` Float64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogramf64 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(Float64), `counts` Array(UInt64), `min` Float64, `max` Float64, `sum_of_samples` Float64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogrami16 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(Int16), `counts` Array(UInt64), `min` Int16, `max` Int16, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogrami32 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(Int32), `counts` Array(UInt64), `min` Int32, `max` Int32, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogrami64 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(Int64), `counts` Array(UInt64), `min` Int64, `max` Int64, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogrami8 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(Int8), `counts` Array(UInt64), `min` Int8, `max` Int8, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogramu16 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(UInt16), `counts` Array(UInt64), `min` UInt16, `max` UInt16, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogramu32 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(UInt32), `counts` Array(UInt64), `min` UInt32, `max` UInt32, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogramu64 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(UInt64), `counts` Array(UInt64), `min` UInt64, `max` UInt64, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_histogramu8 (`timeseries_name` String, `timeseries_key` UInt64, `start_time` DateTime64(9, 'UTC'), `timestamp` DateTime64(9, 'UTC'), `bins` Array(UInt8), `counts` Array(UInt64), `min` UInt8, `max` UInt8, `sum_of_samples` Int64, `squared_mean` Float64, `p50_marker_heights` Array(Float64), `p50_marker_positions` Array(UInt64), `p50_desired_marker_positions` Array(Float64), `p90_marker_heights` Array(Float64), `p90_marker_positions` Array(UInt64), `p90_desired_marker_positions` Array(Float64), `p99_marker_heights` Array(Float64), `p99_marker_positions` Array(UInt64), `p99_desired_marker_positions` Array(Float64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, start_time, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_i16 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Int16)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_i32 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Int32)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_i64 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Int64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_i8 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(Int8)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_string (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(String)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_u16 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(UInt16)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_u32 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(UInt32)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_u64 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(UInt64)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.measurements_u8 (`timeseries_name` String, `timeseries_key` UInt64, `timestamp` DateTime64(9, 'UTC'), `datum` Nullable(UInt8)) ENGINE = MergeTree ORDER BY (timeseries_name, timeseries_key, timestamp) TTL toDateTime(timestamp) + toIntervalDay(3) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.timeseries_schema (`timeseries_name` String, `fields.name` Array(String), `fields.type` Array(Enum8('Bool' = 1, 'I64' = 2, 'IpAddr' = 3, 'String' = 4, 'Uuid' = 6, 'I8' = 7, 'U8' = 8, 'I16' = 9, 'U16' = 10, 'I32' = 11, 'U32' = 12, 'U64' = 13)), `fields.source` Array(Enum8('Target' = 1, 'Metric' = 2)), `datum_type` Enum8('Bool' = 1, 'I64' = 2, 'F64' = 3, 'String' = 4, 'Bytes' = 5, 'CumulativeI64' = 6, 'CumulativeF64' = 7, 'HistogramI64' = 8, 'HistogramF64' = 9, 'I8' = 10, 'U8' = 11, 'I16' = 12, 'U16' = 13, 'I32' = 14, 'U32' = 15, 'U64' = 16, 'F32' = 17, 'CumulativeU64' = 18, 'CumulativeF32' = 19, 'HistogramI8' = 20, 'HistogramU8' = 21, 'HistogramI16' = 22, 'HistogramU16' = 23, 'HistogramI32' = 24, 'HistogramU32' = 25, 'HistogramU64' = 26, 'HistogramF32' = 27), `created` DateTime64(9, 'UTC')) ENGINE = MergeTree ORDER BY (timeseries_name, fields.name) SETTINGS index_granularity = 8192 │
│ CREATE TABLE oximeter.version (`value` UInt64, `timestamp` DateTime64(9, 'UTC')) ENGINE = MergeTree ORDER BY (value, timestamp) SETTINGS index_granularity = 8192                                                                                          │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

41 rows in set. Elapsed: 0.007 sec.
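
The `create_table_query` output shows two TTL shapes: the `fields_*` tables expire on `last_updated_at`, the `measurements_*` tables expire on `toDateTime(timestamp)`, and `timeseries_schema` and `version` carry no TTL. A sketch of how per-table statements could be generated from a number of days, using ClickHouse's `ALTER TABLE ... MODIFY TTL`, is below. The function names are hypothetical, and the SQL this PR actually issues is not shown in the thread.

```rust
/// Pick the TTL base expression for a table in the `oximeter` database,
/// following the two shapes visible in the `create_table_query` output.
/// Tables without a TTL (e.g. `version`) yield `None`.
fn ttl_base_expr(table: &str) -> Option<&'static str> {
    if table.starts_with("fields_") {
        Some("last_updated_at")
    } else if table.starts_with("measurements_") {
        Some("toDateTime(timestamp)")
    } else {
        None
    }
}

/// Build an `ALTER TABLE ... MODIFY TTL` statement for one table.
///
/// `table` is expected to come from a fixed, trusted list of table
/// names; this sketch does no escaping of the identifier.
pub fn modify_ttl_statement(table: &str, days: u32) -> Option<String> {
    let expr = ttl_base_expr(table)?;
    Some(format!(
        "ALTER TABLE oximeter.{table} MODIFY TTL {expr} + toIntervalDay({days})"
    ))
}
```

For example, `modify_ttl_statement("fields_bool", 3)` yields a statement whose TTL clause matches the `TTL last_updated_at + toIntervalDay(3)` seen on `oximeter.fields_bool` above.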

Although I'm going to remove it in this PR, the per-timeseries data is also there. We might want to resurrect it, assuming we can find a reasonably accurate way to compute it:

root@oxz_switch1:~# omdb clickhouse-admin oximeter-usage
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd65:fbff:d89e:102::5]:8888
  Last usage
     Started: 2026-05-04T19:18:35.895Z
   Completed: 2026-05-04T19:18:36.552Z

TIMESERIES_NAME                                  N_BYTES    N_SAMPLES
collection_target:cpus_provisioned               440 B      10
collection_target:ram_provisioned                440 B      10
collection_target:virtual_disk_space_provisioned 440 B      10
database_transaction:retry_data                  21 B       1
ddm_router:originated_tunnel_endpoints           78.52 KiB  13400
ddm_router:originated_underlay_prefixes          78.52 KiB  13400
ddm_session:advertisements_received              1.08 MiB   161800
ddm_session:advertisements_sent                  1.08 MiB   161800
ddm_session:imported_tunnel_endpoints            948.05 KiB 161800
ddm_session:imported_underlay_prefixes           948.05 KiB 161800
ddm_session:peer_address_changes                 1.08 MiB   161800
ddm_session:peer_expirations                     1.08 MiB   161800
ddm_session:peer_sessions_established            1.08 MiB   161800
ddm_session:solicitations_received               1.08 MiB   161800
ddm_session:solicitations_sent                   1.08 MiB   161800
ddm_session:update_send_fail                     1.08 MiB   161800
ddm_session:updates_received                     1.08 MiB   161800
ddm_session:updates_sent                         1.08 MiB   161800
dendrite:sample_collection_duration              3.51 KiB   171
hardware_component:amd_cpu_tctl                  155.87 KiB 19951
hardware_component:current                       7.32 MiB   958917
hardware_component:fan_speed                     1.53 MiB   200026
hardware_component:power                         857.78 KiB 109796
hardware_component:sensor_error_count            1.47 MiB   220175
hardware_component:temperature                   8.65 MiB   1133891
hardware_component:voltage                       7.32 MiB   958918
http_service:request_latency_histogram           1.22 MiB   13087
kstat_sampler:expired_targets                    63 B       9
kstat_sampler:samples_dropped                    18.38 KiB  2688
management_network_data_link:bytes_received      143.43 KiB 20982
management_network_data_link:bytes_sent          143.43 KiB 20982
management_network_data_link:errors_received     143.43 KiB 20982
management_network_data_link:errors_sent         143.43 KiB 20982
management_network_data_link:packets_received    143.43 KiB 20982
management_network_data_link:packets_sent        143.43 KiB 20982
mg_lower:routes_blocked_by_link_state            25.84 KiB  4410
oximeter_collector:collections                   5.00 KiB   732
oximeter_collector:failed_collections            42 B       6
sled_cpu:cpu_nsec                                1.01 MiB   151000
sled_data_link:bytes_received                    135.06 KiB 19758
sled_data_link:bytes_sent                        135.06 KiB 19758
sled_data_link:errors_received                   135.06 KiB 19758
sled_data_link:errors_sent                       135.06 KiB 19758
sled_data_link:packets_received                  135.06 KiB 19758
sled_data_link:packets_sent                      135.06 KiB 19758
static_routing_config:static_nexthops            30.15 KiB  4410
static_routing_config:static_routes              30.15 KiB  4410
switch_data_link:bytes_received                  39.74 KiB  5814
switch_data_link:bytes_sent                      39.74 KiB  5814
switch_data_link:errors_received                 39.74 KiB  5814
switch_data_link:errors_sent                     39.74 KiB  5814
switch_data_link:fec_corrected_blocks            39.74 KiB  5814
switch_data_link:fec_high_symbol_errors          34.07 KiB  5814
switch_data_link:fec_symbol_errors               317.95 KiB 46512
switch_data_link:fec_sync_aligned                34.07 KiB  5814
switch_data_link:fec_uncorrected_blocks          39.74 KiB  5814
switch_data_link:link_enabled                    34.07 KiB  5814
switch_data_link:link_fsm                        675.65 KiB 98838
switch_data_link:link_up                         34.07 KiB  5814
switch_data_link:packets_received                39.74 KiB  5814
switch_data_link:packets_sent                    39.74 KiB  5814
switch_data_link:pcs_bad_sync_headers            39.74 KiB  5814
switch_data_link:pcs_block_lock_loss             39.74 KiB  5814
switch_data_link:pcs_errored_blocks              39.74 KiB  5814
switch_data_link:pcs_high_ber                    39.74 KiB  5814
switch_data_link:pcs_invalid_errors              39.74 KiB  5814
switch_data_link:pcs_sync_loss                   39.74 KiB  5814
switch_data_link:pcs_unknown_errors              39.74 KiB  5814
switch_data_link:pcs_valid_errors                39.74 KiB  5814
switch_data_link:receive_buffer_full_drops       39.74 KiB  5814
switch_data_link:receive_crc_error_drops         39.74 KiB  5814
switch_port_control_data_link:bytes_received     128.72 KiB 18830
switch_port_control_data_link:bytes_sent         128.72 KiB 18830
switch_port_control_data_link:errors_received    128.72 KiB 18830
switch_port_control_data_link:errors_sent        128.72 KiB 18830
switch_port_control_data_link:packets_received   128.72 KiB 18830
switch_port_control_data_link:packets_sent       128.72 KiB 18830
switch_rib:active_routes                         25.84 KiB  4410
switch_table:capacity                            15.03 KiB  2565
switch_table:collisions                          17.53 KiB  2565
switch_table:delete_misses                       17.53 KiB  2565
switch_table:deletes                             17.53 KiB  2565
switch_table:exhaustion                          17.53 KiB  2565
switch_table:inserts                             17.53 KiB  2565
switch_table:occupancy                           15.03 KiB  2565
switch_table:update_misses                       17.53 KiB  2565
switch_table:updates                             17.53 KiB  2565
zone:cpu_nsec                                    318.58 KiB 46604
  Last error
    None
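
The byte counts in the listing above use binary units with two decimal places (`440 B`, `78.52 KiB`, `1.08 MiB`). A minimal sketch of that style of formatting is below. The helper name `format_bytes` is hypothetical, and this is not necessarily how `omdb` produces the column.

```rust
/// Format a byte count with 1024-based units and two decimals,
/// in the style of the N_BYTES column. Hypothetical helper for
/// illustration only; not taken from omdb.
fn format_bytes(n_bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    // Plain byte counts are printed as integers, e.g. "440 B".
    if n_bytes < 1024 {
        return format!("{} B", n_bytes);
    }
    let mut value = n_bytes as f64;
    let mut unit = 0;
    // Step up one unit at a time, stopping at the largest unit we know.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{:.2} {}", value, UNITS[unit])
}
```

For example, `format_bytes(5 * 1024)` gives `5.00 KiB`, the same shape as the `oximeter_collector:collections` row.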

- Remove oximeter usage computation, types, and API
- Add -w flag to `omdb` subcommand for setting retention
@bnaecker bnaecker marked this pull request as ready for review May 4, 2026 19:55
@bnaecker
Collaborator Author

bnaecker commented May 4, 2026

Alright, I've removed all the code for computing the usage by timeseries. We'll flesh that out later if we need it. This is ready for proper review.

Contributor

@jgallagher jgallagher left a comment

Just a bunch of nitpicks and questions from me; feel free to take or leave as you see fit. I know there's urgency here.

Comment thread clickhouse-admin/types/versions/src/lib.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/clickhouse_admin.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/clickhouse_admin.rs Outdated
Comment thread dev-tools/omdb/src/bin/omdb/clickhouse_admin.rs
Comment thread clickhouse-admin/src/context.rs Outdated
);

// Jump forward until we actually do compute the usage again.
tokio::time::pause();
Contributor

How does pausing time in this tokio runtime interact with the ClickHouseDeployment we spawned above? Is it also affected by this pause?

Collaborator Author

It is affected by the pause, since they're using the same test runtime. As for what actually happens, I'm not sure. I could avoid all this shittiness by defining the update interval to be smaller during tests. That would make the tests take a few seconds, but we'd never have to worry about this particular wart.
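
The alternative described here (a short update interval in tests, then waiting in real time instead of pausing the clock) could look roughly like the sketch below. It is a synchronous, std-only illustration; `poll_until` is a hypothetical name, and the real tests are async and would use the repository's own helpers.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `check` every `interval` until it returns `Some(value)`, or give up
/// once `timeout` has elapsed. Hypothetical helper for illustration only.
fn poll_until<T>(
    mut check: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Result<T, Duration> {
    let start = Instant::now();
    loop {
        // Success: the condition we were waiting for now holds.
        if let Some(value) = check() {
            return Ok(value);
        }
        // Failure: report how long we waited before giving up.
        let elapsed = start.elapsed();
        if elapsed >= timeout {
            return Err(elapsed);
        }
        thread::sleep(interval);
    }
}
```

With a test-only update interval of a few hundred milliseconds, a test would poll for "usage has been recomputed" with a timeout of a few seconds. Nothing else running on the same runtime would see time jump, which is the concern raised in the question above.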

Comment thread oximeter/db/src/client/mod.rs Outdated
Comment thread oximeter/db/src/client/mod.rs Outdated
Comment thread oximeter/db/src/client/mod.rs
Comment thread oximeter/db/src/client/mod.rs Outdated
- wait_for_condition over sleep
- record errors setting retention policy and keep trying other tables
- expectorate tests for brittle string matching
- rename oximeter client for clarity
- allow replicated for omdb tooling
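
The "record errors setting retention policy and keep trying other tables" item can be sketched as below. This is only an illustration of the pattern: `apply_to_all_tables` and the `set_ttl` callback are hypothetical names, not the functions in `oximeter/db`.

```rust
/// Apply `set_ttl` to every table, recording each failure instead of
/// returning at the first one. Hypothetical sketch; the names here are
/// assumptions, not the PR's actual API.
fn apply_to_all_tables<E>(
    tables: &[&str],
    mut set_ttl: impl FnMut(&str) -> Result<(), E>,
) -> Vec<(String, E)> {
    let mut failures = Vec::new();
    for &table in tables {
        // Keep going on error so one bad table does not block the rest.
        if let Err(e) = set_ttl(table) {
            failures.push((table.to_string(), e));
        }
    }
    failures
}
```

An empty result means every table was updated. Otherwise the caller can report exactly which tables failed and why, while the remaining tables still have the new policy applied.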
@bnaecker
Collaborator Author

bnaecker commented May 5, 2026

I ran this one more time on dublin to test out the small changes I made to the API validation and formatting. Things look good to me:

root@oxz_switch0:~# omdb -w clickhouse-admin set-retention-policy --days 10
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd84:8213:9724:102::5]:8888
root@oxz_switch0:~# omdb clickhouse-admin retention-policy
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd84:8213:9724:102::5]:8888
Retention: 10 days
root@oxz_switch0:~# omdb -w clickhouse-admin set-retention-policy --days 100
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd84:8213:9724:102::5]:8888
Error: setting retention policy

Caused by:
    Error Response: status: 400 Bad Request; headers: {"content-type": "application/json", "x-request-id": "8340825b-c1f5-49d7-ac02-7d3605664deb", "content-length": "161", "date": "Tue, 05 May 2026 01:07:49 GMT"}; value: Error { error_code: None, message: "unable to parse JSON body: days: days must be in the range [1, 30] at line 1 column 12", request_id: "8340825b-c1f5-49d7-ac02-7d3605664deb" }
root@oxz_switch0:~# omdb clickhouse-admin database-usage
note: clickhouse-admin URL not specified.  Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using clickhouse-admin URL http://[fd84:8213:9724:102::5]:8888

Last usage
   Started: 2026-05-05T01:07:59.599Z
 Completed: 2026-05-05T01:07:59.602Z

TABLE_NAME                          N_BYTES    N_ROWS
oximeter.fields_bool                0 B        0
oximeter.fields_i16                 50.84 KiB  8196
oximeter.fields_i32                 0 B        0
oximeter.fields_i64                 0 B        0
oximeter.fields_i8                  0 B        0
oximeter.fields_ipaddr              1.51 KiB   52
oximeter.fields_string              284.55 KiB 63957
oximeter.fields_u16                 38.65 KiB  4496
oximeter.fields_u32                 113.19 KiB 15984
oximeter.fields_u64                 1.99 KiB   12
oximeter.fields_u8                  51.28 KiB  7838
oximeter.fields_uuid                134.13 KiB 23855
oximeter.measurements_bool          85.62 KiB  12240
oximeter.measurements_bytes         0 B        0
oximeter.measurements_cumulativef32 0 B        0
oximeter.measurements_cumulativef64 0 B        0
oximeter.measurements_cumulativei64 0 B        0
oximeter.measurements_cumulativeu64 6.33 MiB   962199
oximeter.measurements_f32           8.39 MiB   1076829
oximeter.measurements_f64           1.95 KiB   92
oximeter.measurements_histogramf32  0 B        0
oximeter.measurements_histogramf64  0 B        0
oximeter.measurements_histogrami16  0 B        0
oximeter.measurements_histogrami32  0 B        0
oximeter.measurements_histogrami64  0 B        0
oximeter.measurements_histogrami8   0 B        0
oximeter.measurements_histogramu16  0 B        0
oximeter.measurements_histogramu32  0 B        0
oximeter.measurements_histogramu64  386.00 KiB 4017
oximeter.measurements_histogramu8   0 B        0
oximeter.measurements_i16           0 B        0
oximeter.measurements_i32           0 B        0
oximeter.measurements_i64           1.04 KiB   45
oximeter.measurements_i8            0 B        0
oximeter.measurements_string        0 B        0
oximeter.measurements_u16           0 B        0
oximeter.measurements_u32           0 B        0
oximeter.measurements_u64           783.84 KiB 124978
oximeter.measurements_u8            0 B        0
oximeter.timeseries_schema          4.74 KiB   88
oximeter.version                    187 B      1
system.asynchronous_metric_log      33.04 KiB  53022
system.metric_log                   216.93 KiB 836
system.query_log                    1.23 MiB   12426

Last error
  None
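
The `400 Bad Request` above shows the server rejecting `--days 100` because the value falls outside `[1, 30]`. A minimal sketch of a type that enforces that range at construction is below. The names `RetentionDays` and `OutOfRange` are hypothetical. The error text suggests the real check runs while the JSON body is parsed, whereas this sketch uses a plain `TryFrom` conversion.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Retention period in days, limited to the range reported by the server
/// error above. Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RetentionDays(u8);

/// Error carrying the rejected value.
#[derive(Debug, PartialEq, Eq)]
struct OutOfRange(u32);

impl RetentionDays {
    const MIN: u32 = 1;
    const MAX: u32 = 30;

    fn days(self) -> u8 {
        self.0
    }
}

impl fmt::Display for OutOfRange {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "days must be in the range [{}, {}], got {}",
            RetentionDays::MIN,
            RetentionDays::MAX,
            self.0
        )
    }
}

impl TryFrom<u32> for RetentionDays {
    type Error = OutOfRange;

    fn try_from(days: u32) -> Result<Self, Self::Error> {
        if (Self::MIN..=Self::MAX).contains(&days) {
            // The cast cannot truncate: `days` is at most 30 here.
            Ok(Self(days as u8))
        } else {
            Err(OutOfRange(days))
        }
    }
}
```

Because the range check lives in the only conversion into the type, code that holds a `RetentionDays` never needs to re-validate it before building a query.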

@bnaecker bnaecker merged commit 61a6d60 into main May 5, 2026
17 checks passed
@bnaecker bnaecker deleted the ben/add-clickhouse-retention-policy branch May 5, 2026 01:15
iliana pushed a commit that referenced this pull request May 5, 2026
4 participants