feat(core): add _ts system column #2067

Merged
leoyvens merged 3 commits into main from leo/ts-system-column-rebased
Apr 2, 2026

@leoyvens (Collaborator) commented Apr 1, 2026

Addresses #2048.

This PR implements _ts, a built-in, auto-propagated timestamp column, analogous to _block_num. Both columns are abstracted by WatermarkColumn.

The largest concern with this PR is backwards compatibility: existing deployments must not hit immediate schema mismatch errors. So we make sure the _ts column is only included when it is available at the source and requested by the consumer of the query.

The remaining piece of #2048 is a temporary hack to recognize the EVM timestamp column as equivalent to _ts, so we don't have to resync raw datasets to use the feature.
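
For readers skimming the thread, the inclusion rule described above can be sketched roughly as follows. This is not the PR's code: the `WatermarkColumn` enum shape, the `columns_to_propagate` helper, and the use of plain string lists in place of Arrow schemas are all assumptions made for illustration. Only the column names `_block_num` and `_ts` and the `column_name()` accessor appear in the PR.

```rust
/// Hypothetical stand-in for the PR's `WatermarkColumn`. Only the two column
/// names are taken from the PR description; the enum shape is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WatermarkColumn {
    BlockNum,
    Ts,
}

impl WatermarkColumn {
    fn column_name(self) -> &'static str {
        match self {
            WatermarkColumn::BlockNum => "_block_num",
            WatermarkColumn::Ts => "_ts",
        }
    }
}

/// Hypothetical helper: keep only the watermark columns that the consumer
/// requested AND that every source schema actually contains.
/// Schemas are modelled as lists of column names to stay dependency-free.
fn columns_to_propagate(
    requested: &[WatermarkColumn],
    source_schemas: &[Vec<&str>],
) -> Vec<WatermarkColumn> {
    requested
        .iter()
        .filter(|wm| {
            let name = wm.column_name();
            source_schemas.iter().all(|s| s.contains(&name))
        })
        .copied()
        .collect()
}

fn main() {
    // An existing dataset synced before `_ts` existed.
    let old = vec![vec!["_block_num", "hash"]];
    // A freshly generated dataset that carries `_ts`.
    let new = vec![vec!["_block_num", "_ts", "hash"]];
    let both = [WatermarkColumn::BlockNum, WatermarkColumn::Ts];

    println!("{:?}", columns_to_propagate(&both, &old));
    println!("{:?}", columns_to_propagate(&both, &new));
    // A consumer that never asks for `_ts` does not get it, even if available.
    println!("{:?}", columns_to_propagate(&[WatermarkColumn::BlockNum], &new));
}
```

Under this sketch an old dataset without `_ts` keeps producing the same output schema as before, which is the backwards-compatibility property the description is concerned with.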

@leoyvens leoyvens requested a review from Theodus April 1, 2026 16:51
@leoyvens leoyvens force-pushed the leo/ts-system-column-rebased branch 6 times, most recently from aaaa611 to cb46136 Compare April 1, 2026 18:50
@leoyvens leoyvens marked this pull request as ready for review April 1, 2026 18:55
Member

@Theodus Theodus left a comment

LGTM, just some suggestions.

Comment on lines +159 to +173
if rows
    .schema()
    .column_with_name(RESERVED_TS_COLUMN_NAME)
    .is_some()
{
    let ts_col = rows
        .column_by_name(RESERVED_TS_COLUMN_NAME)
        .ok_or(CheckInvariantsError::MissingTsColumn)?;
    if !matches!(
        ts_col.data_type(),
        DataType::Timestamp(TimeUnit::Nanosecond, Some(tz)) if tz.as_ref() == "+00:00"
    ) {
        return Err(CheckInvariantsError::InvalidTsColumnType);
    }
}
Member

Suggested change
-if rows
-    .schema()
-    .column_with_name(RESERVED_TS_COLUMN_NAME)
-    .is_some()
-{
-    let ts_col = rows
-        .column_by_name(RESERVED_TS_COLUMN_NAME)
-        .ok_or(CheckInvariantsError::MissingTsColumn)?;
-    if !matches!(
-        ts_col.data_type(),
-        DataType::Timestamp(TimeUnit::Nanosecond, Some(tz)) if tz.as_ref() == "+00:00"
-    ) {
-        return Err(CheckInvariantsError::InvalidTsColumnType);
-    }
-}
+if let Some(ts_col) = rows.column_by_name(RESERVED_TS_COLUMN_NAME)
+    && !matches!(
+        ts_col.data_type(),
+        DataType::Timestamp(TimeUnit::Nanosecond, Some(tz)) if tz.as_ref() == "+00:00"
+    )
+{
+    return Err(CheckInvariantsError::InvalidTsColumnType);
+}
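
A note on the suggestion above: `if let ... && ...` is a let chain, which as far as I know is only accepted on the Rust 2024 edition. Below is a minimal sketch of the same check written with a nested `if`, which behaves identically on older editions. The `DataType`, `TimeUnit`, and `check_ts` definitions here are simplified stand-ins, not Arrow's real types (Arrow stores the timezone as `Option<Arc<str>>`, for instance), and the function signature is assumed for illustration.

```rust
// Simplified, hypothetical mirrors of Arrow's `TimeUnit` and `DataType`.
#[derive(Debug, PartialEq)]
enum TimeUnit {
    Microsecond,
    Nanosecond,
}

#[derive(Debug, PartialEq)]
enum DataType {
    Int64,
    Timestamp(TimeUnit, Option<String>),
}

#[derive(Debug, PartialEq)]
enum CheckInvariantsError {
    InvalidTsColumnType,
}

/// `ts_type` is `None` when the batch has no `_ts` column, which is allowed.
/// If the column is present, it must be a nanosecond timestamp in "+00:00".
fn check_ts(ts_type: Option<&DataType>) -> Result<(), CheckInvariantsError> {
    if let Some(dt) = ts_type {
        let is_utc_nanos = matches!(
            dt,
            DataType::Timestamp(TimeUnit::Nanosecond, Some(tz)) if tz.as_str() == "+00:00"
        );
        if !is_utc_nanos {
            return Err(CheckInvariantsError::InvalidTsColumnType);
        }
    }
    Ok(())
}

fn main() {
    let utc_nanos = DataType::Timestamp(TimeUnit::Nanosecond, Some("+00:00".to_string()));
    let utc_micros = DataType::Timestamp(TimeUnit::Microsecond, Some("+00:00".to_string()));

    println!("{:?}", check_ts(None));
    println!("{:?}", check_ts(Some(&utc_nanos)));
    println!("{:?}", check_ts(Some(&utc_micros)));
    println!("{:?}", check_ts(Some(&DataType::Int64)));
}
```

Either form makes the "column absent" case a plain success, so no "missing column" error is needed.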

Comment on lines +213 to +216
/// Required `_ts` column is missing from the record batch
#[error("missing _ts column")]
MissingTsColumn,

Member

This error variant is dead code if the above suggestion is applied.

Suggested change
-/// Required `_ts` column is missing from the record batch
-#[error("missing _ts column")]
-MissingTsColumn,

 .parent()
 .unwrap()
-.join("tests/config/manifests/eth_rpc.json");
+.join("tests/config/manifests/eth_rpc_generated.json");
Member

Looks like we've added these _generated.json files without removing the old ones. Is this intentional?

Collaborator Author

The situation is:

  • We have tests for the output of ampctl gen and we want new manifests generated with the _ts column.
  • I still wanted to keep all tests on old manifests to be sure this would not break existing datasets that do not have _ts.

So we needed two manifests, at least for now.

};

/// A planning-time sentinel UDF that gets replaced with the appropriate `_ts`
/// expression during `SystemColumnPropagator::f_up`. Panics if it reaches execution.
Member

@Theodus Theodus Apr 2, 2026

There is an inconsistency here between the name SystemColumnPropagator and WatermarkColumnPropagator. Might be worth doing a pass to consolidate on one term (system column vs watermark column).
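
For context on the doc comment under review, here is a rough sketch of the "planning-time sentinel" idea: a placeholder node is swapped for the real `_ts` expression during a bottom-up rewrite and must never survive to execution. The `Expr` enum, `replace_sentinel`, and `contains_sentinel` below are hypothetical and much simpler than a real query-plan rewriter; they only illustrate the pattern, not the PR's implementation.

```rust
/// Hypothetical, minimal expression tree for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// Placeholder that stands for "the `_ts` value, to be resolved later".
    TsSentinel,
    Call(String, Vec<Expr>),
}

/// Bottom-up rewrite: children are rewritten first, then the node itself.
/// Every sentinel is replaced by a reference to `ts_column`.
fn replace_sentinel(expr: Expr, ts_column: &str) -> Expr {
    match expr {
        Expr::Call(name, args) => {
            let args: Vec<Expr> = args
                .into_iter()
                .map(|arg| replace_sentinel(arg, ts_column))
                .collect();
            Expr::Call(name, args)
        }
        Expr::TsSentinel => Expr::Column(ts_column.to_string()),
        other => other,
    }
}

/// True if a sentinel is still present, i.e. the plan is not safe to execute.
fn contains_sentinel(expr: &Expr) -> bool {
    match expr {
        Expr::TsSentinel => true,
        Expr::Call(_, args) => args.iter().any(contains_sentinel),
        Expr::Column(_) => false,
    }
}

fn main() {
    let plan = Expr::Call(
        "max".to_string(),
        vec![Expr::TsSentinel, Expr::Column("x".to_string())],
    );
    assert!(contains_sentinel(&plan));

    let rewritten = replace_sentinel(plan, "_ts");
    assert!(!contains_sentinel(&rewritten));
    println!("{rewritten:?}");
}
```

In this sketch an executor would panic (or return an error) whenever `contains_sentinel` is still true, matching the "panics if it reaches execution" contract in the comment.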

Comment on lines +56 to +64
.iter()
.copied()
.filter(|wm| {
    let name = wm.column_name();
    schemas
        .iter()
        .all(|s| s.fields().iter().any(|f| f.name() == name))
})
.collect()
Member

We should probably copy after the filter

Suggested change
-.iter()
-.copied()
-.filter(|wm| {
-    let name = wm.column_name();
-    schemas
-        .iter()
-        .all(|s| s.fields().iter().any(|f| f.name() == name))
-})
-.collect()
+.iter()
+.filter(|wm| {
+    let name = wm.column_name();
+    schemas
+        .iter()
+        .all(|s| s.fields().iter().any(|f| f.name() == name))
+})
+.copied()
+.collect()
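
To make the effect of this reordering concrete, here is a small sketch using a hypothetical `Col` type in place of the PR's watermark type. Both orderings return the same elements. The differences are that `.copied()` before `.filter()` copies every element, including those that are then discarded, and that the closure argument changes from `&Col` to `&&Col` (method calls still work through auto-deref).

```rust
/// Hypothetical stand-in for a small `Copy` watermark type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Col(&'static str);

impl Col {
    fn column_name(&self) -> &'static str {
        self.0
    }
}

/// Copies every element first, then filters the owned values.
fn copy_then_filter(cols: &[Col], present: &[&str]) -> Vec<Col> {
    cols.iter()
        .copied()
        .filter(|c: &Col| present.contains(&c.column_name()))
        .collect()
}

/// Filters the borrowed elements first, then copies only the survivors.
fn filter_then_copy(cols: &[Col], present: &[&str]) -> Vec<Col> {
    cols.iter()
        .filter(|c: &&Col| present.contains(&c.column_name()))
        .copied()
        .collect()
}

fn main() {
    let cols = [Col("_block_num"), Col("_ts")];
    let present = ["_block_num", "hash"];

    let a = copy_then_filter(&cols, &present);
    let b = filter_then_copy(&cols, &present);
    assert_eq!(a, b);
    println!("{a:?}");
}
```

For a tiny `Copy` type the cost difference is negligible, so this is mostly a matter of idiom.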

@leoyvens leoyvens force-pushed the leo/ts-system-column-rebased branch from cb46136 to 0dbe2f3 Compare April 2, 2026 19:07
@leoyvens (Collaborator, Author) commented Apr 2, 2026

@Theodus thanks for the review, I've addressed your comments

@leoyvens leoyvens merged commit 6183218 into main Apr 2, 2026
8 checks passed
@leoyvens leoyvens deleted the leo/ts-system-column-rebased branch April 2, 2026 19:58