Skip to content

Latest commit

History

History
101 lines (85 loc) 路 8.31 KB

README.md

File metadata and controls

101 lines (85 loc) 路 8.31 KB

Accelerated Dataset Retention Policy Quickstart

Step 1. Initialize and start Spice

spice init retention-quickstart
cd retention-quickstart

Step 2. Add a dataset with a retention policy by editing spicepod.yaml

version: v1beta1
kind: Spicepod
name: retention-quickstart
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    time_column: tpep_pickup_datetime
    params:
      file_format: parquet
    acceleration:
      enabled: true
      refresh_check_interval: 10m
      retention_check_enabled: true
      retention_check_interval: 60s
      retention_period: 35040h # 4 years, this will evict 5 rows of data from the dataset

Step 3. Run spice and see the retention policy in action

When dataset is being refreshed, the retention policy won't evict any data as 0 rows are loaded.

2024-04-22T04:17:24.374905Z  INFO runtime: Loaded dataset: taxi_trips
2024-04-22T04:17:24.375694Z  INFO runtime::accelerated_table: [retention] Evicting data for taxi_trips where tpep_pickup_datetime < 2020-04-23T04:17:24+00:00...
2024-04-22T04:17:24.376073Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
2024-04-22T04:17:24.376110Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-04-22T04:17:24.376393Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2024-04-22T04:17:24.377709Z  INFO runtime::accelerated_table: Refreshing data for taxi_trips
2024-04-22T04:17:24.382154Z  INFO runtime::accelerated_table: [retention] Evicted 0 records for taxi_trips

Step 4. Run queries against the dataset using the Spice SQL REPL after the dataset is loaded and before the next retention check interval

spice sql

Welcome to the Spice.ai SQL REPL! Type 'help' for help.

show tables; -- list available tables
sql> select count(1) from taxi_trips;
+-----------------+
| COUNT(Int64(1)) |
+-----------------+
| 2964624         |
+-----------------+

Time: 0.012826375 seconds. 1 rows.
sql> select * from taxi_trips order by tpep_pickup_datetime limit 5;
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| 2        | 2002-12-31T22:59:39  | 2002-12-31T23:05:41   | 1               | 0.63          | 1          | N                  | 170          | 170          | 3            | 6.5         | 0.0   | 0.5     | 0.0        | 0.0          | 1.0                   | 10.5         | 2.5                  | 0.0         |
| 2        | 2002-12-31T22:59:39  | 2002-12-31T23:05:41   | 1               | 0.63          | 1          | N                  | 170          | 170          | 3            | -6.5        | 0.0   | -0.5    | 0.0        | 0.0          | -1.0                  | -10.5        | -2.5                 | 0.0         |
| 2        | 2009-01-01T00:24:09  | 2009-01-01T01:13:00   | 2               | 10.88         | 1          | N                  | 138          | 264          | 2            | 50.6        | 9.25  | 0.5     | 0.0        | 6.94         | 1.0                   | 68.29        | 0.0                  | 0.0         |
| 2        | 2009-01-01T23:30:39  | 2009-01-02T00:01:39   | 1               | 10.99         | 1          | N                  | 237          | 264          | 2            | 45.0        | 3.5   | 0.5     | 0.0        | 0.0          | 1.0                   | 50.0         | 0.0                  | 0.0         |
| 2        | 2009-01-01T23:58:40  | 2009-01-02T00:01:40   | 1               | 0.46          | 1          | N                  | 137          | 264          | 2            | 4.4         | 3.5   | 0.5     | 0.0        | 0.0          | 1.0                   | 9.4          | 0.0                  | 0.0         |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+

Time: 0.053698917 seconds. 5 rows.

Step 5. Wait for the next retention check interval and see the retention policy evict data

2024-04-22T04:18:24.378312Z  INFO runtime::accelerated_table: [retention] Evicting data for taxi_trips where tpep_pickup_datetime < 2020-04-23T04:18:24+00:00...
2024-04-22T04:18:24.395165Z  INFO runtime::accelerated_table: [retention] Evicted 5 records for taxi_trips

Step 6. Run queries against the dataset using the Spice SQL REPL again to check the outdated data has been evicted

sql> select count(1) from taxi_trips;
+-----------------+
| COUNT(Int64(1)) |
+-----------------+
| 2964619         |
+-----------------+

Time: 0.008739667 seconds. 1 rows.
sql> select * from taxi_trips order by tpep_pickup_datetime limit 5;
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| 2        | 2023-12-31T23:39:17  | 2023-12-31T23:42:00   | 2               | 0.47          | 1          | N                  | 90           | 68           | 1            | 5.1         | 1.0   | 0.5     | 0.0        | 0.0          | 1.0                   | 10.1         | 2.5                  | 0.0         |
| 2        | 2023-12-31T23:41:02  | 2023-12-31T23:48:03   | 1               | 0.4           | 1          | N                  | 246          | 246          | 2            | 7.2         | 1.0   | 0.5     | 0.0        | 0.0          | 1.0                   | 12.2         | 2.5                  | 0.0         |
| 2        | 2023-12-31T23:47:28  | 2023-12-31T23:57:07   | 2               | 1.44          | 1          | N                  | 68           | 137          | 1            | 10.7        | 1.0   | 0.5     | 3.14       | 0.0          | 1.0                   | 18.84        | 2.5                  | 0.0         |
| 2        | 2023-12-31T23:49:12  | 2024-01-01T00:04:32   | 1               | 3.14          | 1          | N                  | 234          | 237          | 1            | 17.0        | 1.0   | 0.5     | 6.6        | 0.0          | 1.0                   | 28.6         | 2.5                  | 0.0         |
| 2        | 2023-12-31T23:54:27  | 2024-01-01T00:13:12   | 1               | 7.7           | 1          | N                  | 229          | 244          | 1            | 33.1        | 1.0   | 0.5     | 7.62       | 0.0          | 1.0                   | 45.72        | 2.5                  | 0.0         |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+

Time: 0.050874208 seconds. 5 rows.