
Add export to parquet guide #232

@javier

Description


Even though Parquet is not yet production-ready, add a guide on its current status. The basic skeleton can be this:


Parquet export is not yet documented, but there are essentially two ways to export to Parquet: convert one or more partitions in place, or export the results of any query as a Parquet file.

In the first case, in-place conversion, the partitions remain under QuestDB's control and can still be queried. In the second case, the export is just an external file that QuestDB stops tracking once downloaded.

# In-place conversion

For in-place conversion, the code is fully functional (I recommend QuestDB 9.0.1, as it also supports exporting arrays to Parquet, which previous versions did not), but it has a lot of caveats:

  • we don't recommend it for production yet
  • I have personally tested it for months with no issues, but we don't guarantee data will not be corrupted, so we advise backing up first
  • while data is being converted, writes to the affected partitions are blocked
  • after a partition has been converted to Parquet, it will not register any changes you send to it unless you convert it back to native
  • schema changes are not supported
  • the database might crash when reading Parquet partitions if anything unexpected happens (such as a schema mismatch)
  • some parallel queries are not yet optimized for Parquet
  • there is no compression by default (it can be enabled via config values; see below)

All of these caveats should disappear in the next few months, when we will announce it is ready for production. In the meantime it is fully functional, and conversion can be triggered via:

```
alter table market_data convert partition to parquet where timestamp < '2025-08-31';
```

This will convert all partitions earlier than 2025-08-31 to Parquet format. Conversion is asynchronous, so you may want to check the status by running:

```
table_partitions('market_data')
```
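For example, to list each partition and whether it has been converted, recent versions expose an isParquet column (a sketch; column names may vary by version):

```
select name, numRows, isParquet
from table_partitions('market_data');
```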

If you want to go back to native, you can run:

```
alter table market_data convert partition to native where timestamp < '2025-08-31';
```
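Conversion in either direction should also work for specific partitions; a sketch, assuming the LIST variant matches the one used by other partition operations (check the ALTER TABLE docs for your version):

```
alter table market_data convert partition to parquet list '2025-08-01', '2025-08-02';
```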


By default, Parquet files will be uncompressed, which is not ideal. You can configure your server.conf with these variables to add compression:

```
# 6 = ZSTD
cairo.partition.encoder.parquet.compression.codec=6
# level ranges from 1 to 22; 1 is fastest, higher compresses better
cairo.partition.encoder.parquet.compression.level=10
```

# Export query as file

Exporting as a file is currently only available on a development branch: https://github.com/questdb/questdb/pull/6008

The code is functional, but it still lacks fuzz tests and documentation. We should be able to include this in a release soon enough, but for exporting it is safe to check out the development branch, compile it, and then use it (you can always go back to the master branch after the export).
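A minimal sketch of that workflow, assuming you have the GitHub CLI and a standard Maven toolchain (adjust the build flags to your setup):

```
# inside a clone of https://github.com/questdb/questdb,
# check out the branch behind PR #6008
gh pr checkout 6008
# build the server, skipping the test suite
mvn clean package -DskipTests
```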

To export a query as a file, you can use either the COPY command or the `/exp` REST API endpoint, as in:

```
curl -G \
  --data-urlencode "query=select * from market_data limit 3;" \
  'http://localhost:9000/exp?fmt=parquet' > ~/tmp/exp.parquet
```


Again, by default the Parquet file will not be compressed, but this can be controlled with the `server.conf` variables above.

Once exported, you can use the file from anywhere, including DuckDB, for example:

```
select * from read_parquet('~/tmp/exp.parquet');
```

You can also run COPY from the web console, the PostgreSQL protocol, or the `/exec` REST API endpoint (anywhere you can run a SQL statement):

```
copy market_data to 'market_data_parquet_test_table' with format parquet;
```

The output files (one per partition) will be written under `$QUESTDB_ROOT_FOLDER/export/$TO_TABLE_NAME/`.
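For instance, the same COPY statement can be issued over REST via the `/exec` endpoint (a sketch assuming the default HTTP port):

```
curl -G \
  --data-urlencode "query=copy market_data to 'market_data_parquet_test_table' with format parquet;" \
  'http://localhost:9000/exec'
```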
