diff --git a/website/docs/features/data-acceleration/data-refresh.md b/website/docs/features/data-acceleration/data-refresh.md
index db44c2066..1ad92aefc 100644
--- a/website/docs/features/data-acceleration/data-refresh.md
+++ b/website/docs/features/data-acceleration/data-refresh.md
@@ -58,6 +58,20 @@ datasets:
 
 If late arriving data or clock-skew needs to be accounted for, an optional overlap can also be specified. See [`acceleration.refresh_append_overlap`](/docs/reference/spicepod/datasets#accelerationrefresh_append_overlap).
 
+Datasets that are physically partitioned by a coarser time column (e.g. day, month, year) can set the `time_partition_column` parameter in addition to the `time_column` parameter, so that refreshes can prune partitions efficiently.
+
+Example:
+
+```yaml
+datasets:
+  - from: databricks:my_dataset
+    name: accelerated_dataset
+    time_column: created_at
+    time_format: iso8601
+    time_partition_column: created_at_day
+    time_partition_format: date
+```
+
 ### Changes (CDC)
 
 Datasets configured with acceleration `refresh_mode: changes` requires a [Change Data Capture (CDC)](/docs/features/cdc/index.md) supported data connector. Initial CDC support in Spice is supported by the [Debezium data connector](/docs/components/data-connectors/debezium.md).
diff --git a/website/docs/reference/spicepod/datasets.md b/website/docs/reference/spicepod/datasets.md
index f129f4624..10d3cece5 100644
--- a/website/docs/reference/spicepod/datasets.md
+++ b/website/docs/reference/spicepod/datasets.md
@@ -150,6 +150,7 @@ Optional. The format of the `time_column`. The following values are supported:
 - `unix_seconds` - Unix timestamp in seconds. E.g. `1718756687`.
 - `unix_millis` - Unix timestamp in milliseconds. E.g. `1718756687000`.
 - `ISO8601` - [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format.
+- `date` - Date in YYYY-MM-DD format. E.g. `2024-01-01`.
 
 Spice emits a warning if the `time_column` from the data source is incompatible with the `time_format` config.
 
@@ -159,6 +160,14 @@ Spice emits a warning if the `time_column` from the data source is incompatible
 
 :::
 
+## `time_partition_column`
+
+Optional. The column that represents the physical partitioning of the dataset when using append-based acceleration. When `time_column` is a fine-grained timestamp and the dataset is physically partitioned at a coarser granularity (for example, by date), setting `time_partition_column` to the partition column (e.g. `date_col`) lets refreshes prune partitions that cannot contain new data, which reduces the amount of data scanned.
+
+## `time_partition_format`
+
+Optional. The format of the `time_partition_column`. For instance, if the physical partitions follow a date format (YYYY-MM-DD), set this value to `date`. `time_partition_format` supports the same values as `time_format`.
+
 ## `unsupported_type_action`
 
 Optional. Specifies the action to take when a data type that is not supported by the data connector is encountered.
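
Note on the `time_partition_column` hunks above: the pruning idea can be illustrated with a small conceptual sketch. This is not Spice's implementation. The function name `append_refresh_filter` and the generated SQL shape are hypothetical, and the sketch assumes `created_at_day` holds the calendar day of `created_at` in the same time zone. It shows why a coarse predicate on the partition column can accompany the fine-grained predicate on the time column during an append refresh.

```python
from datetime import datetime, timezone


def append_refresh_filter(
    last_seen: datetime,
    time_column: str = "created_at",
    time_partition_column: str = "created_at_day",
) -> str:
    """Build a hypothetical WHERE clause for an append-style refresh.

    Illustrative only: the column names mirror the YAML example in the diff,
    and the SQL text is an assumption, not what Spice actually generates.
    """
    # Fine-grained predicate: only rows newer than the latest accelerated row.
    fine = f"{time_column} > '{last_seen.isoformat()}'"
    # Coarse predicate: partitions for days before last_seen cannot hold newer
    # rows, so a source that partitions by day can skip them entirely.
    coarse = f"{time_partition_column} >= '{last_seen.date().isoformat()}'"
    return f"{coarse} AND {fine}"


if __name__ == "__main__":
    last_seen = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
    print(append_refresh_filter(last_seen))
```

Without the coarse predicate, a source that is partitioned by day may have to open every partition to evaluate the timestamp filter. With it, only the partition for the day of `last_seen` and later ones need to be read.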