feat: compute deterministic timeseries_id column at ingest#6286
Conversation
Force-pushed 1fba6db to 9ac8674
Force-pushed 9ac8674 to 60d859c
Force-pushed 60d859c to 9522326
Force-pushed 9e5c6ef to cc4492e
Force-pushed cc4492e to 946c229
Force-pushed 9522326 to b0344ba
alanfgates left a comment: One question on metadata changes, other than that lgtm.
WHERE
    index_uid = $1
    AND split_id = ANY($2)
FOR UPDATE
Adding FOR UPDATE means you're locking these rows. Is that what you want? I'm not clear why this change requires the addition of this lock.
FWIW, Claude likes this addition, saying:
The addition of FOR UPDATE in the CTE subquery correctly prevents a TOCTOU race where concurrent mark_metrics_splits_for_deletion could change split state between the CTE read and the DELETE. This matches the established pattern in the non-metrics delete_splits at line 1076-1083 of the same file. Good fix.
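The shape under discussion can be sketched schematically; the table and column names here are abbreviated placeholders, not the PR's exact query:

```sql
-- Schematic only: lock the candidate rows in the CTE so their state cannot
-- change between the read and the DELETE (the TOCTOU window described above).
WITH input_splits AS (
    SELECT split_id, split_state
    FROM splits
    WHERE index_uid = $1
      AND split_id = ANY($2)
    FOR UPDATE
),
deleted_splits AS (
    DELETE FROM splits
    WHERE index_uid = $1
      AND split_id IN (SELECT split_id FROM input_splits)
    RETURNING split_id
)
SELECT split_id, split_state FROM input_splits;
```

Without the FOR UPDATE, a concurrent transaction can change a row's state after the CTE reads it but before the DELETE runs.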
//! implementation for `str` writes `bytes ++ [0xFF]` and for `u8` writes
//! `[byte]`; this has been stable since Rust 1.0. A pinned stability test
//! (`test_hash_stability_pinned`) will catch any regression.
Claude comments:
The stability contract relies on Rust's Hash trait implementation for str (writes bytes ++ [0xFF]) and u8 (writes [byte]). While this has been stable since Rust 1.0 and the pinned tests would catch a regression, this is not a guaranteed API contract — it's an implementation detail of the standard library. A Rust toolchain update could theoretically change it.
Recommendation: Consider hashing raw bytes directly (e.g., hasher.write(metric_name.as_bytes()); hasher.write(&[0xFF]);) rather than going through the Hash trait. This would make the stability explicit in the code rather than relying on an undocumented trait implementation. The pinned tests are a good safety net but catch the problem only after a build — explicit byte feeding prevents it.
It's not clear to me if the added safety here is worth the effort, especially considering a change to the hash implementation for str would probably break a lot of things.
I'm going to take Claude's comment here.
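As a sketch of what the recommendation would look like (`DefaultHasher` stands in for the PR's fixed-key SipHash-2-4, and the function name is illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Feed raw bytes plus an explicit 0xFF separator, rather than relying on the
// standard library's Hash impl for `str` to append the separator for us.
fn hash_fields_explicit(fields: &[&str]) -> u64 {
    // DefaultHasher is a deterministic stand-in for fixed-key SipHash-2-4.
    let mut hasher = DefaultHasher::new();
    for field in fields {
        hasher.write(field.as_bytes());
        hasher.write(&[0xFF]); // separator prevents boundary ambiguity
    }
    hasher.finish()
}

fn main() {
    // Without a separator, ("ab", "c") and ("a", "bc") would feed identical
    // byte streams; the 0xFF byte keeps them distinct.
    assert_ne!(
        hash_fields_explicit(&["ab", "c"]),
        hash_fields_explicit(&["a", "bc"])
    );
}
```

This makes the stability contract explicit in the code instead of depending on the `str` Hash impl, at the cost of a few extra lines.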
metric_type_builder.append_value(dp.metric_type as u8);
timestamp_secs_builder.append_value(dp.timestamp_secs);
value_builder.append_value(dp.value);
timeseries_id_builder.append_value(compute_timeseries_id(
Comment from Claude:
compute_timeseries_id is called per-row and sorts the tags each time. For batches where most rows share the same tag key set, this is redundant work. For typical batch sizes this is unlikely to be a bottleneck, but worth being aware of for large batches.
This seems valid, especially since we're storing one value per row; as compaction proceeds we'll be getting identical tag sets next to each other. Maybe it's too early to optimize this, but leaving a TODO noting we could do it later seems good.
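A sketch of the kind of optimization the TODO could point at: reuse the computed id while consecutive rows carry an identical tag set. All names here are illustrative, not the PR's actual API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

// Stand-in for compute_timeseries_id: a BTreeMap iterates in sorted key
// order, so no per-call sort is needed in this sketch.
fn compute_id(tags: &BTreeMap<String, String>) -> i64 {
    let mut hasher = DefaultHasher::new();
    for (key, value) in tags {
        hasher.write(key.as_bytes());
        hasher.write(&[0xFF]);
        hasher.write(value.as_bytes());
        hasher.write(&[0xFF]);
    }
    hasher.finish() as i64
}

// Recompute only when the tag set changes; runs of identical tag sets (as
// compaction places a series' rows next to each other) hit a cheap equality
// check instead of the hash.
fn ids_with_run_cache(rows: &[BTreeMap<String, String>]) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    let mut last: Option<(&BTreeMap<String, String>, i64)> = None;
    for tags in rows {
        let id = match last {
            Some((prev, id)) if prev == tags => id,
            _ => compute_id(tags),
        };
        last = Some((tags, id));
        out.push(id);
    }
    out
}

fn main() {
    let base: BTreeMap<String, String> =
        [("host".to_string(), "a".to_string())].into_iter().collect();
    let ids = ids_with_run_cache(&[base.clone(), base.clone()]);
    assert_eq!(ids[0], ids[1]);
}
```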
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0344baed2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
fields.push(Field::new("timeseries_id", DataType::Int64, false));

for &tag_key in &sorted_tag_keys {
    fields.push(Field::new(
Filter reserved `timeseries_id` tag before schema expansion

Adding the fixed timeseries_id field here without reserving that key in tag ingestion allows an input attribute named timeseries_id to be added again in the dynamic tag loop, producing duplicate column names in one RecordBatch. This path is reachable because create_number_data_point only strips REQUIRED_FIELDS (metric_name, metric_type, timestamp_secs, value), not timeseries_id. With duplicate names, downstream schema lookups resolve only one match, so one of the two timeseries_id columns is effectively shadowed/ambiguous and query behavior becomes incorrect for those inputs.
It's not impossible that a user would want to submit a timeseries_id tag; we'll probably want to deal with this at some point. I also don't think we want to explicitly reserve the timeseries_id tag. Adding a TODO.
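If the TODO were resolved by filtering rather than reserving, it could look something like this sketch, which drops a colliding user tag during the dynamic-field loop (names are illustrative, not the PR's code):

```rust
// Hypothetical guard: skip user tags whose name collides with a fixed column,
// so the dynamic tag loop cannot emit a duplicate field name.
const RESERVED_COLUMN_NAMES: &[&str] = &["timeseries_id"];

fn dynamic_tag_fields(sorted_tag_keys: &[&str]) -> Vec<String> {
    sorted_tag_keys
        .iter()
        .filter(|key| !RESERVED_COLUMN_NAMES.contains(*key))
        .map(|key| key.to_string())
        .collect()
}

fn main() {
    let fields = dynamic_tag_fields(&["host", "timeseries_id", "zone"]);
    assert_eq!(fields, vec!["host".to_string(), "zone".to_string()]);
}
```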
Add a timeseries_id column (Int64) to the metrics Arrow batch, computed as a SipHash-2-4 of the series identity columns (metric_name, metric_type, and all tags excluding temporal/value columns). The hash uses fixed keys for cross-process determinism. The column is already declared in the metrics default sort schema (between host and timestamp_secs), so the parquet writer now automatically sorts by it and places it in the correct physical position.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The timeseries_id hash is persisted to Parquet files — any change
silently corrupts compaction and queries. Add:
- 3 pinned stability tests with hardcoded expected hash values
- 3 proptest properties (order independence, excluded tag immunity,
extra-tag discrimination) each running 256 random cases
- Boundary ambiguity test ({"ab":"c"} vs {"a":"bc"})
- Same-series-different-timestamp invariant test
- All-excluded-tags coverage (every EXCLUDED_TAGS entry verified)
- Edge cases: empty strings, unicode, 100-tag cardinality
- Module-level doc explaining the stability contract
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirror the CTE + FOR UPDATE pattern from delete_splits to prevent stale-state races. Without row locking, a concurrent mark_metrics_splits_for_deletion can commit between the state read and the DELETE, causing spurious FailedPrecondition errors and retry churn. The new query locks the target rows before reading their state, reports not-deletable (Staged/Published) and not-found splits separately, and only deletes when all requested splits are in MarkedForDeletion state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed b0344ba to f846e6c
Commit history was messed up; I cherry-picked the 4 relevant commits on top of main and force pushed. Just in case, I pushed the original history to the https://github.com/quickwit-oss/quickwit/tree/matthew.kim/gtt/sorted-series-column remote branch.
Force-pushed 43f35da to 0a0adb5
Force-pushed 0a0adb5 to 720f498
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 720f498323
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c8c0c100c6
Summary
- Adds a `timeseries_id` column (Int64) to the metrics Arrow batch, computed as a deterministic SipHash-2-4 of the series identity columns: `metric_name`, `metric_type`, and all tags — excluding temporal columns (`timestamp_secs`, `start_timestamp_secs`, `timestamp`) and value columns (`value`, plus the DDSketch components from "[metrics] Support DDSketch in the parquet pipeline" #6257: `count`, `sum`, `min`, `max`, `flags`, `keys`, `counts`)
- The column sits in the default sort schema (`metric_name | service | env | datacenter | region | host | timeseries_id | timestamp_secs` in V2), so the writer automatically sorts by it and places it in the correct physical position
- Adds a `TimeseriesId` variant to the `ParquetField` enum and updates `SORT_ORDER`

Design reference
Sorted Series Column for QW Parquet Pipeline — this PR implements the Timeseries ID component; the full Sorted Series composite key is a follow-up.
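A minimal sketch of the series-identity hash described above, assuming only the exclusion list from this summary; `DefaultHasher` stands in for the fixed-key SipHash-2-4, and the function name is illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Temporal and value columns listed in the summary; they never contribute
// to the series identity.
const EXCLUDED_TAGS: &[&str] = &[
    "timestamp_secs", "start_timestamp_secs", "timestamp",            // temporal
    "value", "count", "sum", "min", "max", "flags", "keys", "counts", // value / DDSketch
];

fn timeseries_id_sketch(metric_name: &str, metric_type: u8, tags: &[(&str, &str)]) -> i64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(metric_name.as_bytes());
    hasher.write(&[0xFF]); // separator avoids boundary ambiguity
    hasher.write(&[metric_type]);
    // Sort the identity tags so ingestion order cannot change the id.
    let mut identity: Vec<&(&str, &str)> = tags
        .iter()
        .filter(|(key, _)| !EXCLUDED_TAGS.contains(key))
        .collect();
    identity.sort();
    for (key, value) in identity {
        hasher.write(key.as_bytes());
        hasher.write(&[0xFF]);
        hasher.write(value.as_bytes());
        hasher.write(&[0xFF]);
    }
    hasher.finish() as i64
}

fn main() {
    // Order independence and temporal-column exclusion.
    let a = timeseries_id_sketch("cpu", 1, &[("host", "h1"), ("env", "prod")]);
    let b = timeseries_id_sketch("cpu", 1, &[("env", "prod"), ("host", "h1")]);
    let c = timeseries_id_sketch("cpu", 1, &[("host", "h1"), ("env", "prod"), ("timestamp", "9")]);
    assert_eq!(a, b);
    assert_eq!(a, c);
}
```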
Test plan
- Unit tests for `compute_timeseries_id` (determinism, exclusions, order independence, key/value non-interchangeability)
- `quickwit-parquet-engine` and `quickwit-opentelemetry` pass with updated column counts
- `reorder_columns` (not introduced by this PR)

🤖 Generated with Claude Code