From a649cb3d9c7ad2029d3f4ed09a55a244c2a6081e Mon Sep 17 00:00:00 2001 From: jeadie Date: Thu, 18 Apr 2024 13:50:39 +1000 Subject: [PATCH 1/5] add data type compatibility docs --- spiceaidocs/docs/data-accelerators/index.md | 3 ++ spiceaidocs/docs/reference/datatypes.md | 55 +++++++++++++++++++++ 2 files changed, 58 insertions(+) create mode 100644 spiceaidocs/docs/reference/datatypes.md diff --git a/spiceaidocs/docs/data-accelerators/index.md b/spiceaidocs/docs/data-accelerators/index.md index b47f72ee..95303af9 100644 --- a/spiceaidocs/docs/data-accelerators/index.md +++ b/spiceaidocs/docs/data-accelerators/index.md @@ -33,6 +33,9 @@ Currently supported Data Accelerators include: | [`sqlite`](./sqlite.md) | Embedded SQLite | Alpha | `memory`, `file` | | [`postgres`](./postgres/index.md) | Attached PostgreSQL | Alpha | | +## Data types +Data accelerators may not support all possible Apache Arrow data types. For complete compatibility, see [specifications](../reference/datatypes.md). + ## Refresh SQL For datasets configured with a `full` refresh mode, this is an optional setting that filters the locally accelerated data to a smaller working set. This can be useful if your application/dashboard only ever uses a subset of the data stored in the federated table. diff --git a/spiceaidocs/docs/reference/datatypes.md b/spiceaidocs/docs/reference/datatypes.md new file mode 100644 index 00000000..373344d5 --- /dev/null +++ b/spiceaidocs/docs/reference/datatypes.md @@ -0,0 +1,55 @@ +--- +title: "Data Types" +sidebar_label: "Data Types" +pagination_prev: 'reference/index' +pagination_next: null +--- + +Spice adheres to Apache Arrow data [types](https://docs.rs/arrow/latest/arrow/datatypes/index.html). Data accelerators do no support all Arrow data types. The table below outlines the data type compatibility for each accelerator, and datatype used within the accelerator. 
+ +| Datatype | Description | Arrow | DuckDB | SQLite | Postgres| +| --- | --- | --- | --- | --- | ---| +| na | A NULL type having no physical storage. | na | | | | +| bool | Boolean as 1 bit, LSB bit-packed ordering. | bool | | | | +| uint8 | Unsigned 8-bit little-endian integer. | uint8 | | | | +| int8 | Signed 8-bit little-endian integer. | int8 | | | | +| uint16 | Unsigned 16-bit little-endian integer. | uint16 | | | | +| int16 | Signed 16-bit little-endian integer. | int16 | | | | +| uint32 | Unsigned 32-bit little-endian integer. | uint32 | | | | +| int32 | Signed 32-bit little-endian integer. | int32 | | | | +| uint64 | Unsigned 64-bit little-endian integer. | uint64 | | | | +| int64 | Signed 64-bit little-endian integer. | int64 | | | | +| half_float | 2-byte floating point value | half_float | | | | +| float | 4-byte floating point value | float | | | | +| double | 8-byte floating point value | double | | | | +| string | UTF8 variable-length string as List | string | | | | +| binary | Variable-length bytes (no guarantee of UTF8-ness) | binary | | | | +| fixed_size_binary | Fixed-size binary. Each value occupies the same number of bytes. | fixed_size_binary | | | | +| date32 | int32_t days since the UNIX epoch | date32 | | | | +| date64 | int64_t milliseconds since the UNIX epoch | date64 | | | | +| timestamp | Exact timestamp encoded with int64 since UNIX epoch Default unit millisecond. | timestamp | | | | +| time32 | Time as signed 32-bit integer, representing either seconds or milliseconds since midnight. | time32 | | | | +| time64 | Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight. | time64 | | | | +| interval_months | YEAR_MONTH interval in SQL style. | interval_months | | | | +| interval_day_time | DAY_TIME interval in SQL style. | interval_day_time | | | | +| decimal128 | Precision- and scale-based decimal type with 128 bits. | decimal128 | | | | +| decimal | Defined for backward-compatibility. 
| decimal | | | | +| decimal256 | Precision- and scale-based decimal type with 256 bits. | decimal256 | | | | +| list | A list of some logical data type. | list | | | | +| struct | Struct of logical types. | struct | | | | +| sparse_union | Sparse unions of logical types. | sparse_union | | | | +| dense_union | Dense unions of logical types. | dense_union | | | | +| dictionary | Dictionary-encoded type, also called "categorical" or "factor" in other programming languages. | dictionary | | | | +| map | Map, a repeated struct logical type. | map | | | | +| extension | Custom data type, implemented by user. | extension | | | | +| fixed_size_list | Fixed size list of some logical type. | fixed_size_list | | | | +| duration | Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds. | duration | | | | +| large_string | Like STRING, but with 64-bit offsets. | large_string | | | | +| large_binary | Like BINARY, but with 64-bit offsets. | large_binary | | | | +| large_list | Like LIST, but with 64-bit offsets. | large_list | | | | +| interval_month_day_nano | Calendar interval type with three fields. | interval_month_day_nano | | | | +| run_end_encoded | Run-end encoded data. | run_end_encoded | | | | +| string_view | String (UTF8) view type with 4-byte prefix and inline small string optimization. | string_view | | | | +| binary_view | Bytes view type with 4-byte prefix and inline small string optimization. | binary_view | | | | +| list_view | A list of some logical data type represented by offset and size. | list_view | | | | +| large_list_view | Like LIST_VIEW, but with 64-bit offsets and sizes. 
| large_list_view | | | | \ No newline at end of file From c36370825c42c5adc7414d77612f78edcb7ebb10 Mon Sep 17 00:00:00 2001 From: jeadie Date: Thu, 18 Apr 2024 13:57:27 +1000 Subject: [PATCH 2/5] escape --- spiceaidocs/docs/reference/datatypes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spiceaidocs/docs/reference/datatypes.md b/spiceaidocs/docs/reference/datatypes.md index 373344d5..829395dc 100644 --- a/spiceaidocs/docs/reference/datatypes.md +++ b/spiceaidocs/docs/reference/datatypes.md @@ -22,7 +22,7 @@ Spice adheres to Apache Arrow data [types](https://docs.rs/arrow/latest/arrow/da | half_float | 2-byte floating point value | half_float | | | | | float | 4-byte floating point value | float | | | | | double | 8-byte floating point value | double | | | | -| string | UTF8 variable-length string as List | string | | | | +| string | UTF8 variable-length string as List\ | string | | | | | binary | Variable-length bytes (no guarantee of UTF8-ness) | binary | | | | | fixed_size_binary | Fixed-size binary. Each value occupies the same number of bytes. | fixed_size_binary | | | | | date32 | int32_t days since the UNIX epoch | date32 | | | | From f8646c28a1bc46bfb4017d9c45ff1f6037dcaef1 Mon Sep 17 00:00:00 2001 From: jeadie Date: Thu, 18 Apr 2024 14:04:04 +1000 Subject: [PATCH 3/5] remove separate arrow type --- spiceaidocs/docs/reference/datatypes.md | 92 ++++++++++++------------- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/spiceaidocs/docs/reference/datatypes.md b/spiceaidocs/docs/reference/datatypes.md index 829395dc..49af7c91 100644 --- a/spiceaidocs/docs/reference/datatypes.md +++ b/spiceaidocs/docs/reference/datatypes.md @@ -7,49 +7,49 @@ pagination_next: null Spice adheres to Apache Arrow data [types](https://docs.rs/arrow/latest/arrow/datatypes/index.html). Data accelerators do no support all Arrow data types. 
The table below outlines the data type compatibility for each accelerator, and datatype used within the accelerator. -| Datatype | Description | Arrow | DuckDB | SQLite | Postgres| -| --- | --- | --- | --- | --- | ---| -| na | A NULL type having no physical storage. | na | | | | -| bool | Boolean as 1 bit, LSB bit-packed ordering. | bool | | | | -| uint8 | Unsigned 8-bit little-endian integer. | uint8 | | | | -| int8 | Signed 8-bit little-endian integer. | int8 | | | | -| uint16 | Unsigned 16-bit little-endian integer. | uint16 | | | | -| int16 | Signed 16-bit little-endian integer. | int16 | | | | -| uint32 | Unsigned 32-bit little-endian integer. | uint32 | | | | -| int32 | Signed 32-bit little-endian integer. | int32 | | | | -| uint64 | Unsigned 64-bit little-endian integer. | uint64 | | | | -| int64 | Signed 64-bit little-endian integer. | int64 | | | | -| half_float | 2-byte floating point value | half_float | | | | -| float | 4-byte floating point value | float | | | | -| double | 8-byte floating point value | double | | | | -| string | UTF8 variable-length string as List\ | string | | | | -| binary | Variable-length bytes (no guarantee of UTF8-ness) | binary | | | | -| fixed_size_binary | Fixed-size binary. Each value occupies the same number of bytes. | fixed_size_binary | | | | -| date32 | int32_t days since the UNIX epoch | date32 | | | | -| date64 | int64_t milliseconds since the UNIX epoch | date64 | | | | -| timestamp | Exact timestamp encoded with int64 since UNIX epoch Default unit millisecond. | timestamp | | | | -| time32 | Time as signed 32-bit integer, representing either seconds or milliseconds since midnight. | time32 | | | | -| time64 | Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight. | time64 | | | | -| interval_months | YEAR_MONTH interval in SQL style. | interval_months | | | | -| interval_day_time | DAY_TIME interval in SQL style. 
| interval_day_time | | | | -| decimal128 | Precision- and scale-based decimal type with 128 bits. | decimal128 | | | | -| decimal | Defined for backward-compatibility. | decimal | | | | -| decimal256 | Precision- and scale-based decimal type with 256 bits. | decimal256 | | | | -| list | A list of some logical data type. | list | | | | -| struct | Struct of logical types. | struct | | | | -| sparse_union | Sparse unions of logical types. | sparse_union | | | | -| dense_union | Dense unions of logical types. | dense_union | | | | -| dictionary | Dictionary-encoded type, also called "categorical" or "factor" in other programming languages. | dictionary | | | | -| map | Map, a repeated struct logical type. | map | | | | -| extension | Custom data type, implemented by user. | extension | | | | -| fixed_size_list | Fixed size list of some logical type. | fixed_size_list | | | | -| duration | Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds. | duration | | | | -| large_string | Like STRING, but with 64-bit offsets. | large_string | | | | -| large_binary | Like BINARY, but with 64-bit offsets. | large_binary | | | | -| large_list | Like LIST, but with 64-bit offsets. | large_list | | | | -| interval_month_day_nano | Calendar interval type with three fields. | interval_month_day_nano | | | | -| run_end_encoded | Run-end encoded data. | run_end_encoded | | | | -| string_view | String (UTF8) view type with 4-byte prefix and inline small string optimization. | string_view | | | | -| binary_view | Bytes view type with 4-byte prefix and inline small string optimization. | binary_view | | | | -| list_view | A list of some logical data type represented by offset and size. | list_view | | | | -| large_list_view | Like LIST_VIEW, but with 64-bit offsets and sizes. 
| large_list_view | | | | \ No newline at end of file +| Arrow Type | Description | DuckDB | SQLite | Postgres | +| ------------------- | ----------------------------------------------------- | ------ | ------ | -------- | +| na | A NULL type having no physical storage. | | | | +| bool | Boolean as 1 bit, LSB bit-packed ordering. | | | | +| uint8 | Unsigned 8-bit little-endian integer. | | | | +| int8 | Signed 8-bit little-endian integer. | | | | +| uint16 | Unsigned 16-bit little-endian integer. | | | | +| int16 | Signed 16-bit little-endian integer. | | | | +| uint32 | Unsigned 32-bit little-endian integer. | | | | +| int32 | Signed 32-bit little-endian integer. | | | | +| uint64 | Unsigned 64-bit little-endian integer. | | | | +| int64 | Signed 64-bit little-endian integer. | | | | +| half_float | 2-byte floating point value | | | | +| float | 4-byte floating point value | | | | +| double | 8-byte floating point value | | | | +| string | UTF8 variable-length string as List\ | | | | +| binary | Variable-length bytes (no guarantee of UTF8-ness) | | | | +| fixed_size_binary | Fixed-size binary. Each value occupies the same number of bytes. | | | | +| date32 | int32_t days since the UNIX epoch | | | | +| date64 | int64_t milliseconds since the UNIX epoch | | | | +| timestamp | Exact timestamp encoded with int64 since UNIX epoch Default unit millisecond. | | | | +| time32 | Time as signed 32-bit integer, representing either seconds or milliseconds since midnight. | | | | +| time64 | Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight. | | | | +| interval_months | YEAR_MONTH interval in SQL style. | | | | +| interval_day_time | DAY_TIME interval in SQL style. | | | | +| decimal128 | Precision- and scale-based decimal type with 128 bits. | | | | +| decimal | Defined for backward-compatibility. | | | | +| decimal256 | Precision- and scale-based decimal type with 256 bits. | | | | +| list | A list of some logical data type. 
| | | | +| struct | Struct of logical types. | | | | +| sparse_union | Sparse unions of logical types. | | | | +| dense_union | Dense unions of logical types. | | | | +| dictionary | Dictionary-encoded type, also called "categorical" or "factor" in other programming languages. | | | | +| map | Map, a repeated struct logical type. | | | | +| extension | Custom data type, implemented by user. | | | | +| fixed_size_list | Fixed size list of some logical type. | | | | +| duration | Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds. | | | | +| large_string | Like STRING, but with 64-bit offsets. | | | | +| large_binary | Like BINARY, but with 64-bit offsets. | | | | +| large_list | Like LIST, but with 64-bit offsets. | | | | +| interval_month_day_nano | Calendar interval type with three fields. | | | | +| run_end_encoded | Run-end encoded data. | | | | +| string_view | String (UTF8) view type with 4-byte prefix and inline small string optimization. | | | | +| binary_view | Bytes view type with 4-byte prefix and inline small string optimization. | | | | +| list_view | A list of some logical data type represented by offset and size. | | | | +| large_list_view | Like LIST_VIEW, but with 64-bit offsets and sizes. | | | | From 72b02967670faf99cebfbd15efe58cdd59a21cda Mon Sep 17 00:00:00 2001 From: jeadie Date: Mon, 22 Apr 2024 10:36:13 +1000 Subject: [PATCH 4/5] add sqlite, postgres, mysql to data types --- spiceaidocs/docs/reference/datatypes.md | 95 +++++++++++++------------ 1 file changed, 49 insertions(+), 46 deletions(-) diff --git a/spiceaidocs/docs/reference/datatypes.md b/spiceaidocs/docs/reference/datatypes.md index 49af7c91..2118aa70 100644 --- a/spiceaidocs/docs/reference/datatypes.md +++ b/spiceaidocs/docs/reference/datatypes.md @@ -7,49 +7,52 @@ pagination_next: null Spice adheres to Apache Arrow data [types](https://docs.rs/arrow/latest/arrow/datatypes/index.html). Data accelerators do no support all Arrow data types. 
The table below outlines the data type compatibility for each accelerator, and datatype used within the accelerator. -| Arrow Type | Description | DuckDB | SQLite | Postgres | -| ------------------- | ----------------------------------------------------- | ------ | ------ | -------- | -| na | A NULL type having no physical storage. | | | | -| bool | Boolean as 1 bit, LSB bit-packed ordering. | | | | -| uint8 | Unsigned 8-bit little-endian integer. | | | | -| int8 | Signed 8-bit little-endian integer. | | | | -| uint16 | Unsigned 16-bit little-endian integer. | | | | -| int16 | Signed 16-bit little-endian integer. | | | | -| uint32 | Unsigned 32-bit little-endian integer. | | | | -| int32 | Signed 32-bit little-endian integer. | | | | -| uint64 | Unsigned 64-bit little-endian integer. | | | | -| int64 | Signed 64-bit little-endian integer. | | | | -| half_float | 2-byte floating point value | | | | -| float | 4-byte floating point value | | | | -| double | 8-byte floating point value | | | | -| string | UTF8 variable-length string as List\ | | | | -| binary | Variable-length bytes (no guarantee of UTF8-ness) | | | | -| fixed_size_binary | Fixed-size binary. Each value occupies the same number of bytes. | | | | -| date32 | int32_t days since the UNIX epoch | | | | -| date64 | int64_t milliseconds since the UNIX epoch | | | | -| timestamp | Exact timestamp encoded with int64 since UNIX epoch Default unit millisecond. | | | | -| time32 | Time as signed 32-bit integer, representing either seconds or milliseconds since midnight. | | | | -| time64 | Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight. | | | | -| interval_months | YEAR_MONTH interval in SQL style. | | | | -| interval_day_time | DAY_TIME interval in SQL style. | | | | -| decimal128 | Precision- and scale-based decimal type with 128 bits. | | | | -| decimal | Defined for backward-compatibility. 
| | | | -| decimal256 | Precision- and scale-based decimal type with 256 bits. | | | | -| list | A list of some logical data type. | | | | -| struct | Struct of logical types. | | | | -| sparse_union | Sparse unions of logical types. | | | | -| dense_union | Dense unions of logical types. | | | | -| dictionary | Dictionary-encoded type, also called "categorical" or "factor" in other programming languages. | | | | -| map | Map, a repeated struct logical type. | | | | -| extension | Custom data type, implemented by user. | | | | -| fixed_size_list | Fixed size list of some logical type. | | | | -| duration | Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds. | | | | -| large_string | Like STRING, but with 64-bit offsets. | | | | -| large_binary | Like BINARY, but with 64-bit offsets. | | | | -| large_list | Like LIST, but with 64-bit offsets. | | | | -| interval_month_day_nano | Calendar interval type with three fields. | | | | -| run_end_encoded | Run-end encoded data. | | | | -| string_view | String (UTF8) view type with 4-byte prefix and inline small string optimization. | | | | -| binary_view | Bytes view type with 4-byte prefix and inline small string optimization. | | | | -| list_view | A list of some logical data type represented by offset and size. | | | | -| large_list_view | Like LIST_VIEW, but with 64-bit offsets and sizes. | | | | +| Arrow Type | Description | [DuckDB](https://duckdb.org/docs/sql/data_types/overview) | [SQLite](https://sqlite.org/datatype3.html) | [Postgres](https://www.postgresql.org/docs/current/datatype.html#DATATYPE-TABLE) | +|--------------------------|------------------------------------------------------------------------------|-------------------------------|-------------------|--------------------| +| na | A NULL type having no physical storage. | | | | +| bool | Boolean as 1 bit, LSB bit-packed ordering. | `BOOLEAN` | `BOOL` | `BOOL` | +| uint8 | Unsigned 8-bit little-endian integer. 
| `TINYINT` | `TINYINT` | `SMALLINT` | +| int8 | Signed 8-bit little-endian integer. | `TINYINT` | `TINYINT` | `SMALLINT` | +| uint16 | Unsigned 16-bit little-endian integer. | `SMALLINT` | `SMALLINT` | `SMALLINT` | +| int16 | Signed 16-bit little-endian integer. | `SMALLINT` | `SMALLINT` | `SMALLINT` | +| uint32 | Unsigned 32-bit little-endian integer. | `INTEGER` | `INT` | `INTEGER` | +| int32 | Signed 32-bit little-endian integer. | `INTEGER` | `INT` | `INTEGER` | +| uint64 | Unsigned 64-bit little-endian integer. | `BIGINT` | `BIGINT` | `BIGINT` | +| int64 | Signed 64-bit little-endian integer. | `BIGINT` | `BIGINT` | `BIGINT` | +| half_float | 2-byte floating point value | | | | +| float | 4-byte floating point value | `FLOAT` | `FLOAT` | `REAL` | +| double | 8-byte floating point value | `DOUBLE` | `DOUBLE` | `DOUBLE PRECISION` | +| string | UTF8 variable-length string as List\ | `VARCHAR` | `TEXT` | `TEXT` | +| binary | Variable-length bytes (no guarantee of UTF8-ness) | | | | +| fixed_size_binary | Each value has equal bytes of binary. | | | | +| date32 | int32_t days since the UNIX epoch | `DATE` | `DATE` | | +| date64 | int64_t milliseconds since the UNIX epoch | `DATE` | `TIMESTAMP` | | +| timestamp | Exact timestamp encoded with int64 since UNIX epoch, seconds or milliseconds | `TIMESTAMP_S`, `TIMESTAMP_MS` | `TIMESTAMP` | | +| time32 | Time as signed 32-bit integer, seconds or milliseconds since midnight. | `TIME` | `TIME` | | +| time64 | Time as signed 64-bit integer, microseconds or nanoseconds since midnight. | `TIME` | `TIME` | | +| interval_months | YEAR_MONTH interval in SQL style. | | | | +| interval_day_time | DAY_TIME interval in SQL style. | | | | +| decimal128 | Precision- and scale-based decimal type with 128 bits. | `DOUBLE` | `DECIMAL(38, 10)` | `DECIMAL(38, 10)` | +| decimal | Defined for backward-compatibility. | | | | +| decimal256 | Precision- and scale-based decimal type with 256 bits. 
| | | | +| list | A list of some logical data type. | | | `TYPE[]` | +| struct | Struct of logical types. | | | | +| sparse_union | Sparse unions of logical types. | | | | +| dense_union | Dense unions of logical types. | | | | +| dictionary | Dictionary-encoded type. | | | | +| map | Map, a repeated struct logical type. | | | | +| extension | Custom data type, implemented by user. | | | | +| fixed_size_list | Fixed size list of some logical type. | | | | +| duration | Elapsed time in seconds, milliseconds, microseconds or nanoseconds. | | | | +| large_string | Like STRING, but with 64-bit offsets. | | | | +| large_binary | Like BINARY, but with 64-bit offsets. | | | | +| large_list | Like LIST, but with 64-bit offsets. | | | | +| interval_month_day_nano | Calendar interval type with three fields. | | | | +| run_end_encoded | Run-end encoded data. | | | | +| string_view | UTF8 view type with 4-byte prefix & inline small string optimization. | | | | +| binary_view | Bytes view type with 4-byte prefix and inline small string optimization. | | | | +| list_view | A list of some logical data type represented by offset and size. | | | | +| large_list_view | Like LIST_VIEW, but with 64-bit offsets and sizes. | | | | + +Note: Where `TYPE` is used (e.g. `TYPE[]`), it refers to an established supported type for the specific data accelerator (e.g. `INTEGER[]`). 
+ From 54462cad53c334a9adad81a7a9008e416b4c6698 Mon Sep 17 00:00:00 2001 From: Jack Eadie Date: Mon, 22 Apr 2024 19:51:10 +1000 Subject: [PATCH 5/5] Update datatypes.md --- spiceaidocs/docs/reference/datatypes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spiceaidocs/docs/reference/datatypes.md b/spiceaidocs/docs/reference/datatypes.md index 2118aa70..fefeb7ff 100644 --- a/spiceaidocs/docs/reference/datatypes.md +++ b/spiceaidocs/docs/reference/datatypes.md @@ -27,7 +27,7 @@ Spice adheres to Apache Arrow data [types](https://docs.rs/arrow/latest/arrow/da | fixed_size_binary | Each value has equal bytes of binary. | | | | | date32 | int32_t days since the UNIX epoch | `DATE` | `DATE` | | | date64 | int64_t milliseconds since the UNIX epoch | `DATE` | `TIMESTAMP` | | -| timestamp | Exact timestamp encoded with int64 since UNIX epoch, seconds or milliseconds | `TIMESTAMP_S`, `TIMESTAMP_MS` | `TIMESTAMP` | | +| timestamp | Exact timestamp encoded with int64 since UNIX epoch, seconds or milliseconds | `TIMESTAMP_S`, `TIMESTAMP_MS` | `TIMESTAMP` | `TIMESTAMP` | | time32 | Time as signed 32-bit integer, seconds or milliseconds since midnight. | `TIME` | `TIME` | | | time64 | Time as signed 64-bit integer, microseconds or nanoseconds since midnight. | `TIME` | `TIME` | | | interval_months | YEAR_MONTH interval in SQL style. | | | |
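Reviewer note: the mapping this series documents can be read as a lookup from an Arrow type and an accelerator to the type used inside that accelerator. The sketch below is illustrative only and is not part of Spice. The names `ACCELERATOR_TYPES` and `accelerator_type` are hypothetical, the dictionary copies only a few rows from the final state of the table, and an empty cell is assumed to mean "no mapping documented".

```python
from typing import Optional

# Hypothetical subset of the table, copied from its state after PATCH 5/5.
# A missing accelerator key stands for an empty cell in the table.
ACCELERATOR_TYPES = {
    "bool": {"duckdb": "BOOLEAN", "sqlite": "BOOL", "postgres": "BOOL"},
    "int32": {"duckdb": "INTEGER", "sqlite": "INT", "postgres": "INTEGER"},
    "double": {"duckdb": "DOUBLE", "sqlite": "DOUBLE", "postgres": "DOUBLE PRECISION"},
    "string": {"duckdb": "VARCHAR", "sqlite": "TEXT", "postgres": "TEXT"},
    "date32": {"duckdb": "DATE", "sqlite": "DATE"},
    "list": {"postgres": "TYPE[]"},
}


def accelerator_type(
    arrow_type: str, accelerator: str, element: Optional[str] = None
) -> Optional[str]:
    """Return the documented accelerator type, or None if the cell is empty."""
    native = ACCELERATOR_TYPES.get(arrow_type, {}).get(accelerator)
    if native is None:
        return None
    if "TYPE" in native:
        # `TYPE` is the placeholder from the table's note: substitute the
        # accelerator type of the element (e.g. int32 -> INTEGER -> INTEGER[]).
        if element is None:
            raise ValueError(f"{arrow_type} needs an element type")
        element_native = accelerator_type(element, accelerator)
        if element_native is None:
            return None
        return native.replace("TYPE", element_native)
    return native
```

For example, `accelerator_type("list", "postgres", element="int32")` resolves the `TYPE[]` placeholder to `INTEGER[]`, while `accelerator_type("date32", "postgres")` returns `None` because that cell is empty in the table.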