
Commit e829336

Merge branch 'main' into ia_pgwire_oidc

2 parents c4adab4 + 305a3a3

10 files changed: +554 −134 lines

documentation/configuration-utils/_cairo.config.json

Lines changed: 24 additions & 0 deletions

@@ -466,5 +466,29 @@
     "cairo.partition.encoder.parquet.raw.array.encoding.enabled": {
       "default": "false",
       "description": "Determines whether to export arrays in QuestDB-native binary format (true, less compatible) or Parquet-native format (false, more compatible)."
+    },
+    "cairo.partition.encoder.parquet.version": {
+      "default": "1",
+      "description": "Output Parquet version to use for parquet-encoded partitions. Can be 1 or 2."
+    },
+    "cairo.partition.encoder.parquet.statistics.enabled": {
+      "default": "true",
+      "description": "Controls whether statistics are included in parquet-encoded partitions."
+    },
+    "cairo.partition.encoder.parquet.compression.codec": {
+      "default": "ZSTD",
+      "description": "Sets the default compression codec for parquet-encoded partitions. Alternatives include `LZ4_RAW` and `SNAPPY`."
+    },
+    "cairo.partition.encoder.parquet.compression.level": {
+      "default": "9 (ZSTD), 0 (otherwise)",
+      "description": "Sets the default compression level for parquet-encoded partitions. Dependent on the underlying compression codec."
+    },
+    "cairo.partition.encoder.parquet.row.group.size": {
+      "default": "100000",
+      "description": "Sets the default row-group size for parquet-encoded partitions."
+    },
+    "cairo.partition.encoder.parquet.data.page.size": {
+      "default": "1048576",
+      "description": "Sets the default data-page size, in bytes, for parquet-encoded partitions."
     }
   }
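
These keys live in QuestDB's `server.conf`. As a minimal sketch of overriding the new defaults (the values below are illustrative assumptions, not recommendations; only the key names come from the diff above):

```shell
# Append illustrative overrides to server.conf (values are assumptions).
cat >> conf/server.conf <<'EOF'
# Trade compression ratio for export speed
cairo.partition.encoder.parquet.compression.codec=LZ4_RAW
cairo.partition.encoder.parquet.compression.level=0
# Larger row groups favour scan-heavy downstream readers
cairo.partition.encoder.parquet.row.group.size=1000000
EOF
```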
documentation/configuration-utils/_parquet-export.config.json

Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+{
+  "cairo.sql.copy.export.root": {
+    "default": "export",
+    "description": "Root directory for parquet exports via `COPY-TO` SQL. This path must not overlap with other directories (e.g. db, conf) of the running instance; otherwise an export may delete or overwrite existing files. Relative paths are resolved against the server root directory."
+  }
+}
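
Since relative paths resolve against the server root, a safe setup points the export root at a dedicated directory; a sketch, where the directory path is a hypothetical example:

```shell
# Point COPY-TO exports at a dedicated directory that does not overlap
# with the instance's db/ or conf/ directories (hypothetical path).
mkdir -p /var/lib/questdb/parquet_exports
echo 'cairo.sql.copy.export.root=/var/lib/questdb/parquet_exports' >> conf/server.conf
```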

documentation/configuration.md

Lines changed: 29 additions & 4 deletions

@@ -10,6 +10,7 @@ import cairoConfig from "./configuration-utils/_cairo.config.json"
 import parallelSqlConfig from "./configuration-utils/_parallel-sql.config.json"
 import walConfig from "./configuration-utils/_wal.config.json"
 import csvImportConfig from "./configuration-utils/_csv-import.config.json"
+import parquetExportConfig from "./configuration-utils/_parquet-export.config.json"
 import postgresConfig from "./configuration-utils/_postgres.config.json"
 import tcpConfig from "./configuration-utils/_tcp.config.json"
 import udpConfig from "./configuration-utils/_udp.config.json"

@@ -168,12 +169,14 @@ applying WAL data to the table storage:
 
 <ConfigTable rows={walConfig} />
 
-### CSV import
+### COPY settings
+
+#### Import
 
 This section describes configuration settings for using `COPY` to import large
-CSV files.
+CSV files or to export parquet files.
 
-Settings for `COPY`:
+Settings for `COPY FROM` (import):
 
 <ConfigTable
   rows={csvImportConfig}

@@ -188,7 +191,7 @@ Settings for `COPY`:
 ]}
 />
 
-#### CSV import configuration for Docker
+**CSV import configuration for Docker**
 
 For QuestDB instances using Docker:

@@ -222,6 +225,28 @@ Where:
 It is important that the two paths are identical
 (`/var/lib/questdb/questdb_import` in the example).
 
+
+#### Export
+
+<ConfigTable rows={parquetExportConfig} />
+
+Parquet export is also affected by the general query execution and parquet conversion parameters.
+
+If not overridden, the following default settings will be used.
+
+<ConfigTable
+  rows={cairoConfig}
+  pick={[
+    "cairo.partition.encoder.parquet.raw.array.encoding.enabled",
+    "cairo.partition.encoder.parquet.version",
+    "cairo.partition.encoder.parquet.statistics.enabled",
+    "cairo.partition.encoder.parquet.compression.codec",
+    "cairo.partition.encoder.parquet.compression.level",
+    "cairo.partition.encoder.parquet.row.group.size",
+    "cairo.partition.encoder.parquet.data.page.size"
+  ]}
+/>
+
 ### Parallel SQL execution
 
 This section describes settings that can affect the level of parallelism during
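
For Docker deployments, the new export key can presumably be supplied the same way as other QuestDB settings, via `QDB_`-prefixed environment variables (dots become underscores, uppercased), mirroring the CSV-import Docker instructions in the diff above; a sketch, with illustrative host paths:

```shell
# Sketch: set the parquet export root via an environment variable in Docker,
# assuming the QDB_ prefix convention (dots -> underscores, uppercased).
docker run -p 9000:9000 \
  -e QDB_CAIRO_SQL_COPY_EXPORT_ROOT=/var/lib/questdb/questdb_export \
  -v "$(pwd)/questdb_export:/var/lib/questdb/questdb_export" \
  questdb/questdb
```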

documentation/guides/export-parquet.md

Lines changed: 16 additions & 14 deletions

@@ -33,20 +33,17 @@ You can override these defaults when [exporting via COPY](#export-query-as-files
 
 ## Export queries as files
 
-:::warning
-Exporting as files is right now available on a development branch: [https://github.com/questdb/questdb/pull/6008](https://github.com/questdb/questdb/pull/6008).
-If you want to test this feature, you need to clone and compile the branch.
-
-The code is functional, but it is just lacking fuzzy tests and documentation. We should be able to include this in a
-release soon enough, but for exporting it is safe to just checkout the development branch, compile, and start QuestDB
-pointing to the target jar.
-:::
-
 To export a query as a file, you can use either the `/exp` REST API endpoint or the `COPY` command.
 
 
 ### Export query as file via REST
 
+:::tip
+
+See also the [/exp documentation](/docs/reference/api/rest/#exp---export-data).
+
+:::
+
 You can use the same parameters as when doing a [CSV export](/docs/reference/api/rest/#exp---export-data), only passing `parquet` as the `fmt` parameter value.

@@ -67,12 +64,18 @@ to point DuckDB to the example file exported in the previous example, you could
 start DuckDB and execute:
 
 ```
-select * from read_parquet('~/tmp/exp.parquet');
+select * from read_parquet('~/tmp/exp.parquet');
 ```
 
-
 ### Export query as files via COPY
 
+
+:::tip
+
+See also the [COPY-TO documentation](/docs/reference/sql/copy).
+
+:::
+
 If you prefer to export data via SQL, or if you want to export asynchronously, you
 can use the `COPY` command from the web console, from any pgwire-compliant client,
 or using the [`exec` endpoint](/docs/reference/api/rest/#exec---execute-queries) of the REST API.

@@ -81,13 +84,13 @@ or using the [`exec` endpoint](/docs/reference/api/rest/#exec---execute-queries)
 
 You can export a query:
 
 ```
-COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET;
+COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET;
 ```
 
 Or you can export a whole table:
 
 ```
-COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;
+COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;
 ```

@@ -106,7 +109,6 @@ If you want to monitor the export process, you can issue a call like this:
 SELECT * FROM 'sys.copy_export_log' WHERE id = '45ba24e5ba338099';
 ```
 
-
 While it is running, the export can be cancelled with:
 
 ```
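
As this guide notes, `COPY` can also be driven over REST via the `/exec` endpoint, which suits asynchronous exports; a sketch, reusing the table name and log query from the diff above (the export id shown is the example value from the guide, not a real one):

```shell
# Kick off an async parquet export via the /exec endpoint (illustrative table name).
curl -G \
  --data-urlencode "query=COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;" \
  http://localhost:9000/exec

# The response carries an export id; use it to poll the export log
# (the id below is the example value from the guide).
curl -G \
  --data-urlencode "query=SELECT * FROM 'sys.copy_export_log' WHERE id = '45ba24e5ba338099';" \
  http://localhost:9000/exec
```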

documentation/guides/import-csv.md

Lines changed: 1 addition & 1 deletion

@@ -127,7 +127,7 @@ csvstack *.csv > singleFile.csv
 
 #### Configure `COPY`
 
-- Enable `COPY` and [configure](/docs/configuration/#csv-import) the `COPY`
+- Enable `COPY` and [configure](/docs/configuration/#copy-settings) the `COPY`
   directories to suit your server.
 - `cairo.sql.copy.root` must be set for `COPY` to work.
documentation/reference/api/rest.md

Lines changed: 66 additions & 3 deletions

@@ -20,8 +20,7 @@ off-the-shelf HTTP clients. It provides a simple way to interact with QuestDB
 and is compatible with most programming languages. API functions are fully keyed
 on the URL and they use query parameters as their arguments.
 
-The Web Console[Web Console](/docs/web-console/) is the official Web client
-relying on the REST API.
+The [Web Console](/docs/web-console/) is the official Web client for QuestDB and relies on the REST API.
 
 **Available methods**

@@ -591,15 +590,41 @@ returned in a tabular form to be saved and reused as opposed to JSON.
 
 `/exp` is expecting an HTTP GET request with the following parameters:
 
 | Parameter | Required | Description |
 |:----------|:---------|:------------|
 | `query`   | Yes      | URL encoded query text. It can be multi-line. |
 | `limit`   | No       | Paging parameter. For example, `limit=10,20` will return row numbers 10 through 20 inclusive, and `limit=20` will return the first 20 rows, which is equivalent to `limit=0,20`. `limit=-20` will return the last 20 rows. |
 | `nm`      | No       | `true` or `false`. Skips the metadata section of the response when set to `true`. |
+| `fmt`     | No       | Export format. Valid values: `parquet`, `csv`. When set to `parquet`, exports data in Parquet format instead of CSV. |
+
+#### Parquet Export Parameters
+
+:::warning
+
+Parquet exports currently require writing interim data to disk, and therefore must be run on **read-write instances only**.
+
+This limitation will be removed in the future.
+
+:::
+
+When `fmt=parquet`, the following additional parameters are supported:
+
+| Parameter            | Required | Default   | Description |
+|:---------------------|:---------|:----------|:------------|
+| `partition_by`       | No       | `NONE`    | Partition unit: `NONE`, `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`. |
+| `compression_codec`  | No       | `ZSTD`    | Compression algorithm: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZ4`, `ZSTD`, `LZ4_RAW`, `BROTLI`, `LZO`. |
+| `compression_level`  | No       | `9`       | Compression level (codec-specific). Higher values mean better compression but slower export. |
+| `row_group_size`     | No       | `100000`  | Number of rows per Parquet row group. |
+| `data_page_size`     | No       | `1048576` | Size of data pages in bytes (default 1 MB). |
+| `statistics_enabled` | No       | `true`    | Enable Parquet column statistics: `true` or `false`. |
+| `parquet_version`    | No       | `2`       | Parquet format version: `1` (v1.0) or `2` (v2.0). |
+| `raw_array_encoding` | No       | `false`   | Use raw encoding for arrays: `true` (lighter-weight, less compatible) or `false` (heavier-weight, more compatible). |
 
 The parameters must be URL encoded.
 
 ### Examples
 
+#### CSV Export (default)
+
 Considering the query:
 
 ```shell

@@ -620,6 +645,44 @@ An HTTP status code of `200` is returned with the following response body:
 200501BS00005,"2005-01-10T00:00:00.000Z",21:13
 ```
 
+#### Parquet Export
+
+Export query results to Parquet format:
+
+```shell
+curl -G \
+  --data-urlencode "query=SELECT * FROM trades WHERE timestamp IN today()" \
+  --data-urlencode "fmt=parquet" \
+  http://localhost:9000/exp > trades_today.parquet
+```
+
+#### Parquet Export with Custom Options
+
+Export with custom compression and partitioning:
+
+```shell
+curl -G \
+  --data-urlencode "query=SELECT * FROM trades" \
+  --data-urlencode "fmt=parquet" \
+  --data-urlencode "partition_by=DAY" \
+  --data-urlencode "compression_codec=ZSTD" \
+  --data-urlencode "compression_level=9" \
+  --data-urlencode "row_group_size=1000000" \
+  http://localhost:9000/exp > trades.parquet
+```
+
+#### Parquet Export with LZ4 Compression
+
+Export with `LZ4_RAW` compression for faster export:
+
+```shell
+curl -G \
+  --data-urlencode "query=SELECT symbol, price, amount FROM trades WHERE timestamp > dateadd('h', -1, now())" \
+  --data-urlencode "fmt=parquet" \
+  --data-urlencode "compression_codec=LZ4_RAW" \
+  http://localhost:9000/exp > recent_trades.parquet
+```
+
 ## Error responses
 
 ### Malformed queries
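
To sanity-check a `/exp` parquet export end to end, the file can be read back with DuckDB, as the export guide above suggests; a sketch, assuming the DuckDB CLI and its `-c` flag are available:

```shell
# Export one day of data, then verify the row count with DuckDB
# (assumes the DuckDB CLI is installed and supports -c).
curl -G \
  --data-urlencode "query=SELECT * FROM trades WHERE timestamp IN today()" \
  --data-urlencode "fmt=parquet" \
  http://localhost:9000/exp > trades_today.parquet

duckdb -c "SELECT count(*) FROM read_parquet('trades_today.parquet');"
```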

documentation/reference/function/aggregation.md

Lines changed: 6 additions & 4 deletions

@@ -22,17 +22,19 @@ Running it will result in the following error:
 
 You can work around this limitation by using CTEs or subqueries:
 
-```questdb-sql title="aggregates as function args workaround" demo
+```questdb-sql title="CTE workaround"
 -- CTE
 WITH minmax AS (
-  SELECT min(timestamp) as min_date, max(timestamp) as max_date FROM trades
+  SELECT min(timestamp) AS min_date, max(timestamp) AS max_date FROM trades
 )
 SELECT datediff('d', min_date, max_date) FROM minmax;
 
 -- Subquery
-SELECT datediff('d', min_date, max_date) FROM (
-  SELECT min(timestamp) as min_date, max(timestamp) as max_date FROM trades
+SELECT datediff('d', min_date, max_date)
+FROM (
+  SELECT min(timestamp) AS min_date, max(timestamp) AS max_date FROM trades
 );
+
 ```
 
 :::
