"description": "determines whether to export arrays in QuestDB-native binary format (true, less compatible) or Parquet-native format (false, more compatible)."
+},
+"cairo.partition.encoder.parquet.version": {
+"default": "1",
+"description": "Output parquet version to use for parquet-encoded partitions. Can be 1 or 2."
"description": "Root directory for parquet exports via `COPY-TO` SQL. This path must not overlap with any other directory (e.g. db, conf) of the running instance; otherwise the export may delete or overwrite existing files. Relative paths are resolved against the server root directory."
documentation/guides/export-parquet.md (16 additions & 14 deletions)
@@ -33,20 +33,17 @@ You can override these defaults when [exporting via COPY](#export-query-as-files

## Export queries as files

-:::warning
-Exporting as files is right now available on a development branch: [https://github.com/questdb/questdb/pull/6008](https://github.com/questdb/questdb/pull/6008).
-If you want to test this feature, you need to clone and compile the branch.
-
-The code is functional, but it is just lacking fuzzy tests and documentation. We should be able to include this in a
-release soon enough, but for exporting it is safe to just checkout the development branch, compile, and start QuestDB
-pointing to the target jar.
-:::
-
To export a query as a file, you can use either the `/exp` REST API endpoint or the `COPY` command.


### Export query as file via REST

+:::tip
+
+See also the [/exp documentation](/docs/reference/api/rest/#exp---export-data).
+
+:::
+
You can use the same parameters as when doing a [CSV export](/docs/reference/api/rest/#exp---export-data), only passing `parquet` as the `fmt` parameter value.

```
@@ -67,12 +64,18 @@ to point DuckDB to the example file exported in the previous example, you could
start DuckDB and execute:

```
-select * from read_parquet('~/tmp/exp.parquet');
+select * from read_parquet('~/tmp/exp.parquet');
```

-
### Export query as files via COPY

+
+:::tip
+
+See also the [COPY-TO documentation](/docs/reference/sql/copy).
+
+:::
+
If you prefer to export data via SQL, or if you want to export asynchronously, you
can use the `COPY` command from the web console, from any pgwire-compliant client,
or using the [`exec` endpoint](/docs/reference/api/rest/#exec---execute-queries) of the REST API.
@@ -81,13 +84,13 @@ or using the [`exec` endpoint](/docs/reference/api/rest/#exec---execute-queries)
You can export a query:

```
-COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET;
+COPY (select * from market_data limit 3) TO 'market_data_parquet_table' WITH FORMAT PARQUET;
```

Or you can export a whole table:

```
-COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;
+COPY market_data TO 'market_data_parquet_table' WITH FORMAT PARQUET;
```

@@ -106,7 +109,6 @@ If you want to monitor the export process, you can issue a call like this:
SELECT * FROM 'sys.copy_export_log' WHERE id = '45ba24e5ba338099';
```
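
If the monitoring query is sent through the `exec` endpoint rather than the web console, the result comes back as JSON. The following is a minimal Python sketch, assuming the response carries column metadata under `columns` and row values under `dataset`. `rows_as_dicts` is a hypothetical helper, and the column names in the sample payload are invented for illustration; they are not the actual `sys.copy_export_log` schema.

```python
import json


def rows_as_dicts(payload):
    """Pair each row in `dataset` with the names listed in `columns`."""
    names = [column["name"] for column in payload["columns"]]
    return [dict(zip(names, row)) for row in payload["dataset"]]


# Illustrative payload only: the column names below are made up.
sample = json.loads("""
{
  "columns": [{"name": "id", "type": "VARCHAR"}, {"name": "status", "type": "SYMBOL"}],
  "dataset": [["45ba24e5ba338099", "finished"]],
  "count": 1
}
""")

for row in rows_as_dicts(sample):
    print(row["id"], row["status"])
```

Keeping the parsing in a pure function makes it easy to check against a saved response before wiring it to a real HTTP call.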

-
While it is running, export can be cancelled with:
|`query`| Yes | URL encoded query text. It can be multi-line. |
|`limit`| No | Paging parameter. For example, `limit=10,20` returns row numbers 10 through 20 inclusive, and `limit=20` returns the first 20 rows, which is equivalent to `limit=0,20`. `limit=-20` returns the last 20 rows. |
|`nm`| No |`true` or `false`. Skips the metadata section of the response when set to `true`. |
+|`fmt`| No | Export format. Valid values: `parquet`, `csv`. When set to `parquet`, exports data in Parquet format instead of CSV. |
+
+#### Parquet Export Parameters
+
+:::warning
+
+Parquet exports currently require writing interim data to disk, and therefore must be run on **read-write instances only**.
+
+This limitation will be removed in future.
+
+:::
+
+When `fmt=parquet`, the following additional parameters are supported:
+|`compression_level`| No |`9`| Compression level (codec-specific). Higher values = better compression but slower. |
+|`row_group_size`| No |`100000`| Number of rows per Parquet row group. |
+|`data_page_size`| No |`1048576`| Size of data pages in bytes (default 1MB). |
+|`statistics_enabled`| No |`true`| Enable Parquet column statistics: `true` or `false`. |
+|`parquet_version`| No |`2`| Parquet format version: `1` (v1.0) or `2` (v2.0). |
+|`raw_array_encoding`| No |`false`| Use raw encoding for arrays: `true` (lighter-weight, less compatible) or `false` (heavier-weight, more compatible) |

The parameters must be URL encoded.
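
The encoding requirement can be delegated to a standard URL library. The following is a minimal Python sketch, not taken from the QuestDB documentation: `build_exp_url` is a hypothetical helper name, the host is the example host used on this page, and the option names are the ones listed in the parameter table.

```python
from urllib.parse import quote, urlencode


def build_exp_url(base_url, query, **options):
    """Build an /exp URL that requests `query` as Parquet (hypothetical helper)."""
    params = {"query": query, "fmt": "parquet"}
    for name, value in options.items():
        # The parameter table writes booleans as `true`/`false`, so lowercase them.
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return f"{base_url.rstrip('/')}/exp?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_exp_url("http://localhost:9000", "SELECT * FROM trades",
                        row_group_size=100000, statistics_enabled=True))
```

The helper only assembles the URL; fetching it and writing the response body to a `.parquet` file is left to whichever HTTP client is in use.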

### Examples

+#### CSV Export (default)
+
Considering the query:

```shell
@@ -620,6 +645,44 @@ A HTTP status code of `200` is returned with the following response body:
200501BS00005,"2005-01-10T00:00:00.000Z",21:13
```

+#### Parquet Export
+
+Export query results to Parquet format:
+
+```shell
+curl -G \
+--data-urlencode "query=SELECT * FROM trades WHERE timestamp IN today()" \
+--data-urlencode "fmt=parquet" \
+http://localhost:9000/exp > trades_today.parquet
+```
+
+#### Parquet Export with Custom Options
+
+Export with custom compression and partitioning:
+
+```shell
+curl -G \
+--data-urlencode "query=SELECT * FROM trades" \
+--data-urlencode "fmt=parquet" \
+--data-urlencode "partition_by=DAY" \
+--data-urlencode "compression_codec=ZSTD" \
+--data-urlencode "compression_level=9" \
+--data-urlencode "row_group_size=1000000" \
+http://localhost:9000/exp > trades.parquet
+```
+
+#### Parquet Export with LZ4 Compression
+
+Export with LZ4_RAW compression for faster export:
+
+```shell
+curl -G \
+--data-urlencode "query=SELECT symbol, price, amount FROM trades WHERE timestamp > dateadd('h', -1, now())" \