feat(core): add local parquet metadata sidecar file for optimized query planning by RaphDal · Pull Request #6913 · questdb/questdb

RaphDal · 2026-03-30T13:57:27Z

Closes questdb/roadmap#101
Tandem https://github.com/questdb/questdb-enterprise/pull/978

Summary

Introduces a compact binary _pm sidecar file that accompanies each data.parquet partition file. The _pm file stores all metadata the query engine needs: column descriptors, QuestDB column types, per-row-group column chunk byte ranges, compression codecs, encodings, and min/max statistics.
Replaces the JSON metadata blob previously embedded in the parquet footer's key-value section with a purpose-built binary format. The _pm file is the single authoritative source of partition metadata for parquet partitions.
Enables row group pruning via min/max statistics and bloom filter offsets stored locally in _pm, without reading the parquet file itself.
Lays the groundwork for cold storage: when a parquet file lives in object storage (S3, GCS, Azure Blob), the _pm file stays local and provides everything the query planner needs to decide which column chunks to fetch by byte range, eliminating metadata round-trips.
Adds migration Mig940 to generate _pm files for all existing parquet partitions on engine upgrade.

Motivation

QuestDB's parquet partition support previously stored QuestDB-specific metadata (column types, column tops, symbol flags) as a JSON blob inside the parquet footer under the "questdb" key. This approach has several limitations:

No column chunk byte ranges. The JSON metadata does not include per-column byte offsets and lengths. To decode a specific column chunk, the engine must parse the full parquet footer (thrift-encoded) to locate byte ranges. For cold storage, this requires a network round-trip to read the footer from the remote object store before any data fetch.
No row group statistics. Min/max values per column per row group are available in the parquet footer but not extracted into any local structure. The query engine cannot prune row groups without first downloading and parsing the footer.
No bloom filter references. Bloom filter offsets and lengths live in the parquet footer. Without local access to these references, the engine cannot skip row groups for equality predicates without reading the remote file.
No incremental updates. When an O3 merge appends a new row group to an existing parquet partition, the entire parquet footer must be re-read and re-parsed to access the updated metadata.

The _pm file addresses all four limitations by keeping a local, binary-encoded copy of the metadata the query engine needs.

The `_pm` binary format

All multi-byte integers are little-endian.

File layout

 Offset   Section
 ───────  ──────────────────────────────────────────────
 0        HEADER (32B fixed)
            footer_offset           u64   (byte offset of the current footer;
                                          mutable on update, not covered by CRC)
            feature_flags           u64
            designated_timestamp    i32   (-1 if none)
            sorting_column_count    u32
            column_count            u32
            _reserved               u32   (must be 0; alignment padding)

          COLUMN DESCRIPTORS (32B each x column_count)
            name_offset             u64   (offset into name strings region)
            id                      i32   (QuestDB column writer index, -1 if N/A)
            col_type                i32   (QuestDB column type code)
            flags                   i32   (see Column Flags below)
            fixed_byte_len          i32   (for FIXED_LEN_BYTE_ARRAY columns)
            name_length             u32   (UTF-8 byte length)
            physical_type           u8    (parquet physical type 0-7)
            max_rep_level           u8
            max_def_level           u8
            _reserved               u8    (must be 0)

          SORTING COLUMNS (4B each x sorting_column_count)
            column_index            i32   (index into column descriptors)

          NAME STRINGS
            column names stored as raw UTF-8 bytes, referenced by
            (name_offset, name_length) from each column descriptor.
            Each name is padded to 4-byte alignment.

          HEADER FEATURE SECTIONS
            present only when specific feature flag bits are set.
            Navigated sequentially in bit order.
            Bit 0 (BLOOM_FILTERS): bloom column index table.

 ─ ─ ─ ─  ROW GROUP BLOCKS (one per row group, 8-byte aligned)

          Each block:
            num_rows                u64
            column_chunks           64B each x column_count (see below)
            [out-of-line stats]     variable, for stats > 8 bytes
            [bloom filter bitsets]  if BLOOM_FILTERS flag set and not external

 ─ ─ ─ ─  FOOTER (located via header footer_offset)

            parquet_footer_offset   u64   (offset in data.parquet)
            parquet_footer_length   u32   (parquet footer size in bytes)
            row_group_count         u32
            unused_bytes            u64   (dead space from old footers)
            prev_footer_offset      u64   (byte offset of previous footer,
                                          0 if this is the first version)
            row_group_entries       u32 each x row_group_count
                                          (block offset >> 3)
            [footer feature sections]
            crc32                   u32   (covers bytes [8, this field) —
                                          everything after footer_offset)
          TRAILER (last 4 bytes of file)
            footer_length           u32   (bytes from footer start through
                                          CRC, inclusive; used for
                                          standalone validation only)

A reader locates the footer by reading footer_offset from the header at a fixed position. The trailer at the end of the file is retained for standalone validation but is not used on the hot read path.
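As a rough illustration of the fixed-position read described above, the following Python sketch unpacks the 32-byte header and resolves column names through the column descriptors. The names `PmHeader`, `parse_header`, and `column_names` are hypothetical and do not come from the PR. The sketch also assumes that the name strings region starts immediately after the sorting columns, which is how the layout diagram reads.

```python
import struct
from typing import NamedTuple

# Little-endian, no padding: footer_offset, feature_flags, designated_timestamp,
# sorting_column_count, column_count, _reserved
HEADER_FORMAT = "<QQiIII"
# name_offset, id, col_type, flags, fixed_byte_len, name_length, then four u8 fields
DESCRIPTOR_FORMAT = "<QiiiiIBBBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)          # 32
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)  # 32


class PmHeader(NamedTuple):
    footer_offset: int
    feature_flags: int
    designated_timestamp: int
    sorting_column_count: int
    column_count: int


def parse_header(buf) -> PmHeader:
    """Unpack the fixed header at offset 0 (hypothetical helper)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the 32-byte header")
    *fields, reserved = struct.unpack_from(HEADER_FORMAT, buf, 0)
    if reserved != 0:
        raise ValueError("reserved header field must be 0")
    return PmHeader(*fields)


def column_names(buf, header: PmHeader) -> list:
    """Resolve each descriptor's (name_offset, name_length) to a string."""
    # Assumption: the name strings region follows the sorting columns directly.
    names_start = (HEADER_SIZE
                   + DESCRIPTOR_SIZE * header.column_count
                   + 4 * header.sorting_column_count)
    names = []
    for i in range(header.column_count):
        fields = struct.unpack_from(
            DESCRIPTOR_FORMAT, buf, HEADER_SIZE + i * DESCRIPTOR_SIZE)
        name_offset, name_length = fields[0], fields[5]
        start = names_start + name_offset
        names.append(bytes(buf[start:start + name_length]).decode("utf-8"))
    return names
```

Both helpers work on any bytes-like object, so the same code could be pointed at an `mmap` of the file instead of an in-memory buffer.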

Column chunk (64 bytes)

Each column chunk in a row group block stores the information needed to locate and decode a column's data in data.parquet:

 Offset  Field               Type   Description
 ──────  ──────────────────  ─────  ──────────────────────────────────
 0       codec               u8     Compression codec (0-7)
 1       encodings           u8     Bitmask of parquet encodings used
 2       stat_flags          u8     Which statistics are present/inlined
 3       stat_sizes          u8     Min/max stat byte sizes (nibble-encoded)
 4       _reserved           u32    Must be 0
 8       num_values          u64    Total values (may differ from num_rows)
 16      byte_range_start    u64    Offset of column chunk in data.parquet
 24      total_compressed    u64    Compressed byte length
 32      null_count          u64
 40      distinct_count      u64
 48      min_stat            u64    Inlined min or out-of-line reference
 56      max_stat            u64    Inlined max or out-of-line reference

Statistics encoding. For fixed-width types that fit in 8 bytes (BOOLEAN, BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP), stats are always inlined directly in the min_stat/max_stat fields. For variable-length or wider types (VARCHAR, STRING, UUID, LONG256), stats are stored out-of-line in the region after the column chunks. In that case, the min_stat/max_stat fields encode [offset: u32 (>> 3), length: u32] pointing to the out-of-line data.

Stat flags (u8):

Bit	Name
0	MIN_PRESENT
1	MIN_INLINED
2	MIN_EXACT
3	MAX_PRESENT
4	MAX_INLINED
5	MAX_EXACT
6	DISTINCT_COUNT_PRESENT
7	NULL_COUNT_PRESENT

Encoding bitmask (u8):

Bit	Encoding
0	PLAIN
1	RLE_DICTIONARY
2	DELTA_BINARY_PACKED
3	DELTA_LENGTH_BYTE_ARRAY
4	DELTA_BYTE_ARRAY
5	BYTE_STREAM_SPLIT
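The 64-byte column chunk and its two bitmasks can be decoded along the following lines. This is a minimal Python sketch, not the PR's reader: `parse_column_chunk`, `bit_names`, and `decode_out_of_line_ref` are hypothetical names. The packing order of the out-of-line reference (offset in the low 32 bits, length in the high 32 bits) is an assumption; the description only states that the field holds `[offset: u32 (>> 3), length: u32]`.

```python
import struct

# codec, encodings, stat_flags, stat_sizes (u8 each), _reserved (u32), then seven u64 fields
CHUNK_FORMAT = "<BBBBIQQQQQQQ"
CHUNK_SIZE = struct.calcsize(CHUNK_FORMAT)  # 64
CHUNK_FIELDS = (
    "codec", "encodings", "stat_flags", "stat_sizes", "_reserved",
    "num_values", "byte_range_start", "total_compressed",
    "null_count", "distinct_count", "min_stat", "max_stat",
)
STAT_FLAGS = (
    "MIN_PRESENT", "MIN_INLINED", "MIN_EXACT",
    "MAX_PRESENT", "MAX_INLINED", "MAX_EXACT",
    "DISTINCT_COUNT_PRESENT", "NULL_COUNT_PRESENT",
)
ENCODINGS = (
    "PLAIN", "RLE_DICTIONARY", "DELTA_BINARY_PACKED",
    "DELTA_LENGTH_BYTE_ARRAY", "DELTA_BYTE_ARRAY", "BYTE_STREAM_SPLIT",
)


def bit_names(mask: int, names) -> list:
    """Return the names whose bit position is set in mask."""
    return [name for bit, name in enumerate(names) if mask & (1 << bit)]


def parse_column_chunk(buf, offset: int = 0) -> dict:
    """Unpack one column chunk and expand its bitmasks into name lists."""
    chunk = dict(zip(CHUNK_FIELDS, struct.unpack_from(CHUNK_FORMAT, buf, offset)))
    if chunk.pop("_reserved") != 0:
        raise ValueError("reserved column chunk field must be 0")
    chunk["encodings"] = bit_names(chunk["encodings"], ENCODINGS)
    chunk["stat_flags"] = bit_names(chunk["stat_flags"], STAT_FLAGS)
    return chunk


def decode_out_of_line_ref(stat: int) -> tuple:
    """Split an out-of-line min/max reference into (byte offset, length).

    ASSUMPTION: offset occupies the low 32 bits and length the high 32 bits.
    The stored offset is shifted right by 3, so it is shifted back here.
    """
    return (stat & 0xFFFFFFFF) << 3, stat >> 32
```

The nibble layout of `stat_sizes` is not spelled out in the description, so the sketch leaves that byte undecoded.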

Column flags (i32)

Bit(s)	Name	Description
0	LOCAL_KEY_IS_GLOBAL	Symbol column: local dict equals global
1	IS_ASCII	Varchar column: all values are ASCII
2-3	FIELD_REPETITION	0=Required, 1=Optional, 2=Repeated
4	DESCENDING	Sorting column is in descending order
5-31	Reserved	Must be 0
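A short Python sketch of decoding the column flags table above. `decode_column_flags` is a hypothetical helper; the bit positions are taken from the table, and rejecting the undefined repetition value 3 is an assumption about how strict a reader would be.

```python
REPETITION = {0: "Required", 1: "Optional", 2: "Repeated"}


def decode_column_flags(flags: int) -> dict:
    """Decode the i32 column flags field into named values."""
    if flags & ~0x1F:
        raise ValueError("reserved column flag bits 5-31 must be 0")
    repetition = (flags >> 2) & 0b11  # bits 2-3
    if repetition not in REPETITION:
        # Assumption: the value 3 is undefined and treated as invalid.
        raise ValueError("unknown FIELD_REPETITION value")
    return {
        "local_key_is_global": bool(flags & (1 << 0)),
        "is_ascii": bool(flags & (1 << 1)),
        "field_repetition": REPETITION[repetition],
        "descending": bool(flags & (1 << 4)),
    }
```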

Feature flags (u64)

The header carries a single feature_flags field that gates optional sections in both the header and footer. This allows additive extensions without bumping the format version.

Bits 0-31 (optional): A reader ignores unknown optional bits and skips their sections.
Bits 32-63 (required): A reader rejects the file if any unknown required bit is set.

Current feature bits:

Bit	Name	Header section	Footer section
0	BLOOM_FILTERS	Bloom column index table	Per-row-group bloom offset table
1	BLOOM_FILTERS_EXTERNAL	None	Offsets point into data.parquet
2	SORTING_IS_DTS_ASC	None	Implicit sort by DTS ascending

Bit 1 requires bit 0. Bit 2 requires designated_timestamp >= 0. When bit 2 is
set, sorting_column_count on disk is 0 and readers infer sorting by the
designated timestamp column.
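The optional/required split and the two dependency rules can be expressed as a small validation routine. The Python sketch below is illustrative only: `validate_feature_flags` is a hypothetical name, and returning the known subset of bits is one possible way to model "ignoring" unknown optional bits.

```python
BLOOM_FILTERS = 1 << 0
BLOOM_FILTERS_EXTERNAL = 1 << 1
SORTING_IS_DTS_ASC = 1 << 2
KNOWN_BITS = BLOOM_FILTERS | BLOOM_FILTERS_EXTERNAL | SORTING_IS_DTS_ASC
REQUIRED_MASK = 0xFFFFFFFF00000000  # bits 32-63


def validate_feature_flags(flags: int, designated_timestamp: int) -> int:
    """Reject invalid combinations; return the flags this reader understands."""
    if flags & REQUIRED_MASK & ~KNOWN_BITS:
        raise ValueError("unknown required feature bit set")
    if (flags & BLOOM_FILTERS_EXTERNAL) and not (flags & BLOOM_FILTERS):
        raise ValueError("BLOOM_FILTERS_EXTERNAL requires BLOOM_FILTERS")
    if (flags & SORTING_IS_DTS_ASC) and designated_timestamp < 0:
        raise ValueError("SORTING_IS_DTS_ASC requires a designated timestamp")
    # Unknown optional bits (0-31) are dropped rather than rejected.
    return flags & KNOWN_BITS
```

A real reader would still have to skip the sections that belong to unknown optional bits; how their lengths are discovered is not covered by this sketch.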

`_txn` integration

Each partition entry in _txn consists of 4 longs (LONGS_PER_TX_ATTACHED_PARTITION = 4):

Field	Offset	Name	Description
0	+0	PARTITION_TS	Partition timestamp
1	+1	PARTITION_MASKED_SIZE	Row count + flags (bit 61: parquet format)
2	+2	PARTITION_NAME_TX	Partition name txn
3	+3	PARTITION_PARQUET_FILE_SIZE	Parquet file size in bytes, or -1 for native

Field 3 retains its original meaning: the byte size of data.parquet. This preserves backward compatibility — older versions of QuestDB read field 3 to memory-map the parquet file, so changing its meaning would break rollback.

The _pm file size is not stored in _txn. Instead, readers stat() the _pm file and locate the correct footer version via the header's footer_offset field, using the parquet file size from field 3 as the MVCC version token (see Read path below).

TxWriter.setPartitionParquetFormat() writes both the parquet format bit in field 1 and the parquet file size in field 3 within a single transaction commit.
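To make the four-long entry concrete, here is a small Python sketch that reads one partition entry from an already-extracted list of longs. How the entries are located inside the _txn file is outside the scope of this description, so the input list and the helper name `read_partition_entry` are assumptions. Only bit 61 of field 1 is interpreted, because the mask for the row count is not given here.

```python
LONGS_PER_TX_ATTACHED_PARTITION = 4
PARQUET_FORMAT_BIT = 61


def read_partition_entry(longs, partition_index: int) -> dict:
    """Slice out one 4-long partition entry and interpret the parquet fields."""
    base = partition_index * LONGS_PER_TX_ATTACHED_PARTITION
    timestamp, masked_size, name_txn, parquet_file_size = longs[
        base:base + LONGS_PER_TX_ATTACHED_PARTITION]
    return {
        "partition_ts": timestamp,
        "is_parquet": bool((masked_size >> PARQUET_FORMAT_BIT) & 1),
        "name_txn": name_txn,
        # -1 marks a native (non-parquet) partition.
        "parquet_file_size": parquet_file_size,
    }
```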

Write path

Initial creation (convert to parquet)

When the engine converts a native partition to parquet, the Rust parquet encoder generates both data.parquet and _pm in a single pass, after which the Java layer records the result in _txn:

Writes column data as parquet row groups to data.parquet.
Builds the _pm file in memory via ParquetMetaWriter: column descriptors from the table schema, one row group block per parquet row group with column chunk metadata (byte ranges, codecs, statistics).
Writes the _pm footer with a CRC32, sets footer_offset in the header to point to it, and sets prev_footer_offset to 0 (first version).
The Java layer calls TxWriter.setPartitionParquetFormat() to record the parquet file size in _txn field 3.

Incremental update (O3 merge)

When an out-of-order insert merges data into an existing parquet partition, O3PartitionJob uses PartitionUpdater to append a new row group:

PartitionUpdater.of() opens the existing data.parquet (read + write) and _pm file.
The new row group is appended to data.parquet.
A new row group block is appended to _pm after the old footer (the old footer becomes dead space tracked in the unused_bytes footer field).
A new footer is written with updated row_group_count, CRC32, and prev_footer_offset pointing to the old footer.
Last: footer_offset in the header is updated to point to the new footer. This is an 8-byte aligned atomic write and is the commit point for the _pm file.
O3PartitionJob commits the new parquet file size to _txn field 3.
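The ordering of the append steps can be simulated on an in-memory buffer. The Python sketch below is a simplified model, not the Rust updater: `append_row_group` is a hypothetical name, `zlib.crc32` stands in for whatever CRC32 variant the format uses, a header `footer_offset` of 0 is treated as "no footer yet", footers are placed at 8-byte-aligned offsets, and the dead space added to `unused_bytes` is taken to be the old footer plus its trailer. All of these are assumptions. Footer feature sections are omitted.

```python
import struct
import zlib

# parquet_footer_offset, parquet_footer_length, row_group_count,
# unused_bytes, prev_footer_offset
FOOTER_FIXED = "<QIIQQ"
FOOTER_FIXED_SIZE = struct.calcsize(FOOTER_FIXED)  # 32


def _pad8(pm: bytearray) -> None:
    """Zero-pad the buffer to the next 8-byte boundary."""
    pm.extend(b"\x00" * (-len(pm) % 8))


def append_row_group(pm: bytearray, block: bytes,
                     parquet_footer_offset: int,
                     parquet_footer_length: int) -> None:
    old_offset = struct.unpack_from("<Q", pm, 0)[0]
    entries, unused = [], 0
    if old_offset:  # assumption: 0 means the file has no footer yet
        _, _, count, unused, _ = struct.unpack_from(FOOTER_FIXED, pm, old_offset)
        entries = list(struct.unpack_from(
            f"<{count}I", pm, old_offset + FOOTER_FIXED_SIZE))
        unused += len(pm) - old_offset  # old footer and trailer become dead space

    # 1. Append the new row group block at an 8-byte-aligned offset.
    _pad8(pm)
    entries.append(len(pm) >> 3)
    pm += block

    # 2. Append the new footer, its CRC, and the trailer.
    _pad8(pm)
    new_offset = len(pm)
    pm += struct.pack(FOOTER_FIXED, parquet_footer_offset, parquet_footer_length,
                      len(entries), unused, old_offset)
    pm += struct.pack(f"<{len(entries)}I", *entries)
    pm += struct.pack("<I", zlib.crc32(bytes(pm[8:])))  # covers [8, crc field)
    pm += struct.pack("<I", len(pm) - new_offset)       # footer_length trailer

    # 3. Last: flip the header pointer. This is the commit point.
    struct.pack_into("<Q", pm, 0, new_offset)
```

Until step 3 runs, a reader that starts from the header still sees the old footer, which is why the pointer update is kept as the final write.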

Readers holding an older _txn snapshot see a parquet file size in field 3 that differs from the size implied by the latest _pm footer. They follow the prev_footer_offset chain, starting at the header's footer_offset, until they reach the footer whose parquet_footer_offset + parquet_footer_length + 8 equals their _txn parquet file size. This provides lock-free concurrent read/write access without storing the _pm file size in _txn.
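The chain walk can be sketched in a few lines of Python. `find_footer` is a hypothetical helper, not the PR's `ParquetMetaFileReader.of()`. The constant 8 presumably accounts for the 4-byte footer length and 4-byte `PAR1` magic that end every parquet file. The check that each `prev_footer_offset` is strictly smaller than the current offset is an added safeguard against loops and is an assumption.

```python
import struct

FOOTER_FIXED = "<QIIQQ"  # parquet_footer_offset, parquet_footer_length,
                         # row_group_count, unused_bytes, prev_footer_offset
FOOTER_FIXED_SIZE = struct.calcsize(FOOTER_FIXED)  # 32
PARQUET_TAIL = 8  # 4-byte footer length + 4-byte "PAR1" magic


def find_footer(pm, txn_parquet_size: int) -> int:
    """Return the offset of the footer matching the _txn parquet file size."""
    offset = struct.unpack_from("<Q", pm, 0)[0]
    while True:
        if offset + FOOTER_FIXED_SIZE > len(pm):
            raise ValueError("footer lies outside the mapped region")
        pfo, pfl, _count, _unused, prev = struct.unpack_from(
            FOOTER_FIXED, pm, offset)
        if pfo + pfl + PARQUET_TAIL == txn_parquet_size:
            return offset
        if prev == 0 or prev >= offset:
            # End of chain (or a malformed back-pointer): the _pm file is stale.
            raise LookupError("no _pm footer matches the parquet file size")
        offset = prev
```

For example, with two footers describing parquet sizes 128 and 358, a reader pinned to size 128 skips the newest footer and lands on the first one.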

Schema evolution

When the table schema changes via ALTER TABLE ADD COLUMN or ALTER TABLE DROP COLUMN, PartitionUpdater.setTargetSchema() receives the new schema. The Rust updater copies existing row groups, inserting null column chunks for added columns via copyRowGroupWithNullColumns() and dropping chunks for removed columns. The _pm file reflects the target schema: its column descriptors match the new schema, and column IDs link each descriptor to the corresponding writer index.

Read path

Memory mapping

TableReader stat()s the _pm file and memory-maps its full size. The reader then locates the correct footer version by reading footer_offset from the header and walking the prev_footer_offset chain until it finds the footer whose derived parquet file size matches _txn field 3. This avoids storing the _pm file size in _txn, preserving backward compatibility for rollback.

If footer_offset exceeds the mapped size (rare race with a concurrent O3 merge extending the file between stat() and mmap), the reader remaps with a fresh stat().

The parquet file size is read from _txn field 3, so the engine does not need to stat() the parquet file.

`ParquetMetaFileReader`

A zero-allocation reader that operates directly over mmapped memory via Unsafe offset arithmetic. It provides:

Column metadata: getColumnCount(), getColumnId(), getColumnType(), getColumnName().
Row group metadata: getRowGroupCount(), getRowGroupSize(), getTotalRowCount().
Statistics access: getChunkMinStat(), getChunkMaxStat(), getChunkStatFlags().
ParquetRowGroupSkipper implementation for filter pushdown (see below).

of(addr, fileSize, parquetFileSize) initializes the reader: it reads footer_offset from the header, then walks the prev_footer_offset chain to find the footer whose parquet file size matches the parquetFileSize argument (the MVCC version token from _txn field 3).

The first call to canSkipRowGroup() lazily allocates a native handle that caches the parsed _pm header and footer. The handle is reused across subsequent calls and freed by close().
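The row group accessors listed above reduce to offset arithmetic over the mapped bytes. The following Python sketch mirrors that idea using the layout from the format section; the function names are hypothetical analogues, not the Java API, and the sketch ignores footer feature sections.

```python
import struct

FOOTER_FIXED = "<QIIQQ"
FOOTER_FIXED_SIZE = struct.calcsize(FOOTER_FIXED)  # 32
CHUNK_SIZE = 64


def row_group_offsets(pm, footer_offset: int) -> list:
    """Decode row_group_entries: each u32 stores (block offset >> 3)."""
    _, _, count, _, _ = struct.unpack_from(FOOTER_FIXED, pm, footer_offset)
    entries = struct.unpack_from(
        f"<{count}I", pm, footer_offset + FOOTER_FIXED_SIZE)
    return [entry << 3 for entry in entries]


def total_row_count(pm, footer_offset: int) -> int:
    """Sum num_rows (the first u64 of each row group block)."""
    return sum(struct.unpack_from("<Q", pm, block)[0]
               for block in row_group_offsets(pm, footer_offset))


def column_chunk_offset(block_offset: int, column_index: int) -> int:
    """Locate a 64-byte column chunk: it follows the 8-byte num_rows field."""
    return block_offset + 8 + CHUNK_SIZE * column_index
```

Because block offsets are stored shifted right by 3, a u32 entry can address blocks up to 32 GiB into the file, which is the reason blocks are 8-byte aligned.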

`ParquetMetaPartitionDecoder`

Replaces footer-based PartitionDecoder for table partitions. Java owns all metadata via ParquetMetaFileReader; Rust acts as a stateless decode engine that receives explicit parameters per decode call:

decodeRowGroup(): decodes a full row group given column indices, row range, and output buffers.
decodeRowGroupWithRowFilter(): decodes with a row-level filter predicate.

Column type overrides (Symbol to Varchar, Varchar to VarcharSlice) come from the Java-side column type, while base types come from the _pm file. For cold storage, the same decode API works: the engine downloads column chunks by byte range using offsets from _pm and passes the data to the same Rust decoder.

Row group pruning

ParquetRowGroupFilter.prepareFilterList() accepts a ParquetMetaFileReader and builds a filter list from SQL WHERE clause conditions. The filter list references column indices resolved from _pm column names and min/max statistics.

At scan time, ParquetMetaFileReader.canSkipRowGroup() evaluates the filter list against the row group's inlined statistics and returns true if the row group can be skipped entirely. This happens on the Java side without any JNI call for the common case of inlined fixed-width statistics; the native handle is only needed for out-of-line or complex comparisons.

Bloom filter offsets stored in _pm footer feature sections (when BLOOM_FILTERS flag is set) enable equality-predicate pruning for columns with bloom filters.
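The min/max part of this pruning follows the usual interval test: a row group can be skipped when the predicate cannot match any value in [min, max]. The Python sketch below shows that test for a single comparison on an inlined numeric statistic. It is a generic illustration, not the logic of `canSkipRowGroup()`: the function name, the operator strings, and the decision to ignore `MIN_EXACT`/`MAX_EXACT` and null handling are all assumptions.

```python
MIN_PRESENT = 1 << 0
MAX_PRESENT = 1 << 3


def can_skip_row_group(op: str, value, min_stat, max_stat, stat_flags: int) -> bool:
    """Return True only when no row in the group can satisfy `column op value`."""
    has_min = bool(stat_flags & MIN_PRESENT)
    has_max = bool(stat_flags & MAX_PRESENT)
    if op == "=":
        return (has_min and value < min_stat) or (has_max and value > max_stat)
    if op == "<":
        return has_min and min_stat >= value
    if op == "<=":
        return has_min and min_stat > value
    if op == ">":
        return has_max and max_stat <= value
    if op == ">=":
        return has_max and max_stat < value
    return False  # unknown operator: never skip
```

When a statistic is absent the function falls back to not skipping, so a missing min or max can only cost performance, never correctness.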

Migration (`Mig940`)

Mig940 runs during engine upgrade on tables with parquet partitions:

Reads _meta to determine partitionBy and timestamp column type.
Opens _txn for read-only access.
Iterates over all partition entries. For each partition with the parquet format bit set (bit 61 of field 1):
- Reads the parquet file size from _txn field 3.
- Opens data.parquet read-only.
- Creates the _pm file.
- Calls ParquetMetadataWriter.generate(), which reads the parquet footer, extracts QuestDB metadata from the "questdb" JSON key-value pair, and writes the complete _pm file. The parquet file size from _txn field 3 is used as the authoritative parquet file size (not stat() on the parquet file).

Mig940 does not modify _txn. Field 3 retains its original meaning (parquet file size), preserving backward compatibility for rollback. The migration only generates _pm sidecar files.

The migration is non-destructive: original data.parquet files and _txn remain unchanged. It operates only on local partitions (no cold storage access required).

Rollback safety

_txn field 3 stores the parquet file size (not the _pm file size). Older versions of QuestDB use field 3 to memory-map data.parquet, so keeping its original meaning preserves backward compatibility for rollback. Readers locate the _pm footer via the header's footer_offset field and walk the prev_footer_offset chain, using the parquet file size as an exact-match MVCC token.

If a user rolls back to an older version, writes data that modifies data.parquet (O3 merge, new partitions, etc.), and then re-upgrades, the _pm files become stale: the footer chain contains no footer matching the new parquet file size. ParquetMetaFileReader.of() detects this via exact matching and throws.

Escape hatch: set cairo.repeat.migration.from.version in server.conf to force Mig940 to re-run on the next restart:

cairo.repeat.migration.from.version=427

This causes the migration framework to reset the migration version and re-run Mig940, which regenerates _pm files for all parquet partitions. Remove the property after the restart.

`SHOW PARTITIONS` integration

ShowPartitionsRecordCursorFactory uses ParquetMetaFileReader to extract min/max timestamps from parquet partitions without parsing the parquet footer. It reads the designated timestamp column index from the _pm header, then fetches inlined min/max statistics from the first and last row group blocks.

Test plan

Round-trip _pm creation: convert native partition to parquet, verify _pm file is generated alongside data.parquet and matches expected metadata
Multiple row groups: create a partition with multiple row groups, verify each row group block contains correct column chunk metadata
O3 incremental update: insert out-of-order data into an existing parquet partition, verify _pm is updated with the new row group and old data remains accessible
Multiple consecutive O3 merges: verify _pm handles repeated appends with growing unused_bytes from discarded footers
Column type coverage: verify all supported QuestDB types (BOOLEAN, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, STRING, VARCHAR, SYMBOL, UUID, LONG256, IPv4, BINARY) produce correct _pm descriptors
Schema evolution: ADD COLUMN and DROP COLUMN between O3 merges, verify _pm reflects the new schema with null chunks for added columns
Migration (Mig940): existing parquet partitions without _pm files, verify migration generates correct _pm without modifying _txn
ParquetMetaFileReader edge cases: zero row groups, large row group counts, format version validation, corrupted trailer detection
ParquetMetaPartitionDecoder lifecycle: verify of() releases pre-existing native handles, destroy() clears reader unconditionally
Row group pruning: verify canSkipRowGroup() correctly prunes row groups based on min/max statistics for various column types and filter conditions
SHOW PARTITIONS: verify min/max timestamps from _pm match expected values
CRC32 integrity: verify reader rejects files with corrupted checksums
Parquet file size derivation: verify parquet_footer_offset + parquet_footer_length + 8 matches _txn field 3 parquet file size
Footer chain MVCC: after multiple O3 merges, verify reader with older _txn snapshot walks prev_footer_offset chain to find the correct footer version
Header footer_offset concurrency: verify reader correctly handles footer_offset exceeding mapped size (remap path)
Detach/reattach: verify _pm files survive partition detach and reattach
Rust unit tests: cargo test --lib for format serialization, deserialization, CRC computation, stat encoding, feature flag validation
End-to-end Rust test (decode_pm_e2e): write parquet, convert to _pm, extract fields, decode column chunks, compare with footer-based decode

coderabbitai · 2026-03-30T13:57:34Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



- Renamed test methods for clarity and consistency.
- Added tests to ensure migration handles corrupt and empty parquet files correctly.
- Implemented checks for parquet metadata generation and validation after migration.
- Enhanced error handling for corrupt parquet files during migration.
- Introduced a method to patch parquet file size in transaction files.
- Ensured migration respects the committed parquet file size over actual file length.
- Added tests to verify behavior when encountering stale or missing metadata.

TableReader.openParquetMetadata previously opened the _pm file twice per partition: ParquetMetaFileReader.readParquetMetaFileSize opened the fd to read the 8-byte size header and closed it, then MemoryCMRDetachedImpl reopened the same path to mmap the file.

Add MemoryCMRImpl.ofWithSizeFromHeader, which opens once, reads the mapping size from offset 0 through the just-opened fd, validates against ff.length(fd) to prevent SIGBUS on an over-large mapping, and maps. The detached subclass overrides the method to close the fd after mapping (the mmap survives), matching the existing of() override.

TableReader.openParquetMetadata now constructs (or reuses) a MemoryCMRDetachedImpl slot and calls ofWithSizeFromHeader, halving the per-partition openRO count on the parquet-partition open path. The slot reuse pattern via parquetMetadataPartitions is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
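The open-once pattern from this commit message can be approximated in Python as follows. This is only a sketch of the idea, not the Java implementation: `map_with_size_from_header` is a hypothetical name, and the sketch assumes the first 8 bytes hold the mapping size as a little-endian u64, as the commit message describes.

```python
import mmap
import os
import struct


def map_with_size_from_header(path: str) -> mmap.mmap:
    """Open the file once, read the mapping size from offset 0, validate, and map."""
    with open(path, "rb") as f:
        head = f.read(8)
        if len(head) < 8:
            raise ValueError("file is too short to hold the 8-byte size header")
        (size,) = struct.unpack("<Q", head)
        actual = os.fstat(f.fileno()).st_size
        # Guard against mapping past the end of the file.
        if size == 0 or size > actual:
            raise ValueError("header size does not fit the file length")
        # mmap keeps its own duplicate of the descriptor, so the mapping
        # stays valid after the file object is closed on leaving the block.
        return mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
```

The caller owns the returned mapping and should `close()` it when done.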

mtopolnik · 2026-05-05T20:37:47Z

[PR Coverage check]

😍 pass : 11111 / 12398 (89.62%)

file detail

	path	covered line	new line	coverage
🔵	qdbr/src/bin/pm_generate.rs	0	76	00.00%
🔵	io/questdb/cairo/sql/PageFrame.java	0	1	00.00%
🔵	io/questdb/cairo/sql/PartitionFrame.java	0	1	00.00%
🔵	io/questdb/griffin/engine/table/parquet/ParquetMetadataWriter.java	0	1	00.00%
🔵	io/questdb/griffin/engine/table/parquet/PartitionUpdater.java	2	5	40.00%
🔵	io/questdb/cairo/vm/MemoryCMRImpl.java	10	18	55.56%
🔵	io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java	3	5	60.00%
🔵	io/questdb/cairo/TableWriter.java	145	218	66.51%
🔵	io/questdb/cairo/ParquetMetaFileWriter.java	2	3	66.67%
🔵	qdbr/src/parquet_read/jni/file_decoder.rs	39	56	69.64%
🔵	io/questdb/cairo/O3PartitionJob.java	64	87	73.56%
🔵	io/questdb/cairo/ParquetMetaFileReader.java	190	248	76.61%
🔵	qdbr/src/parquet_metadata/jni/writer.rs	171	224	76.34%
🔵	qdbr/src/parquet_metadata/jni/converter.rs	122	157	77.71%
🔵	qdb-parquet-meta/src/reader.rs	528	657	80.37%
🔵	io/questdb/cairo/TableSnapshotRestore.java	144	180	80.00%
🔵	qdbr/src/parquet_write/jni.rs	214	266	80.45%
🔵	qdbr/src/bin/pm_inspect.rs	320	394	81.22%
🔵	io/questdb/griffin/engine/table/ShowPartitionsRecordCursorFactory.java	29	35	82.86%
🔵	io/questdb/cairo/mig/Mig940.java	104	126	82.54%
🔵	qdbr/src/parquet_read/mod.rs	33	39	84.62%
🔵	qdbr/src/parquet_read/meta.rs	38	45	84.44%
🔵	io/questdb/cairo/sql/PageFrameAddressCache.java	6	7	85.71%
🔵	qdbr/src/parquet_read/jni/partition_decoder.rs	410	467	87.79%
🔵	qdbr/src/parquet_write/update.rs	164	186	88.17%
🔵	io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java	8	9	88.89%
🔵	qdbr/src/parquet_read/decode.rs	8	9	88.89%
🔵	qdbr/src/parquet_read/decode_column.rs	599	669	89.54%
🔵	parquet2/src/write/file.rs	49	55	89.09%
🔵	qdbr/src/parquet_metadata/jni/reader.rs	346	387	89.41%
🔵	io/questdb/griffin/engine/table/parquet/ParquetPartitionDecoder.java	73	81	90.12%
🔵	qdb-parquet-meta/src/writer.rs	513	561	91.44%
🔵	qdb-parquet-meta/src/footer.rs	259	280	92.50%
🔵	qdbr/src/parquet_metadata/skip.rs	379	404	93.81%
🔵	qdbr/src/parquet_metadata/convert.rs	2058	2209	93.16%
🔵	qdbr/src/parquet_read/parquet_meta_decode.rs	538	577	93.24%
🔵	io/questdb/cairo/TableReader.java	72	76	94.74%
🔵	qdb-parquet-meta/src/row_group.rs	362	378	95.77%
🔵	parquet2/src/read/page/slice_reader.rs	60	63	95.24%
🔵	qdbr/tests/pm_inspect_e2e.rs	474	495	95.76%
🔵	qdb-parquet-meta/src/header.rs	768	801	95.88%
🔵	qdbr/src/parquet_read/jni/buffers.rs	58	60	96.67%
🔵	qdbr/tests/decode_pm_e2e.rs	889	922	96.42%
🔵	io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java	3	3	100.00%
🔵	io/questdb/cairo/IntervalBwdPartitionFrameCursor.java	3	3	100.00%
🔵	io/questdb/griffin/engine/table/parquet/ParquetFileDecoder.java	4	4	100.00%
🔵	qdb-parquet-meta/src/error.rs	68	68	100.00%
🔵	io/questdb/griffin/engine/functions/table/ReadParquetFunctionFactory.java	1	1	100.00%
🔵	qdbr/src/parquet/error.rs	3	3	100.00%
🔵	io/questdb/cairo/O3ParquetMergeContext.java	7	7	100.00%
🔵	io/questdb/cairo/AttachDetachStatus.java	4	4	100.00%
🔵	io/questdb/cairo/IntervalFwdPartitionFrameCursor.java	3	3	100.00%
🔵	parquet2/src/write/column_chunk.rs	6	6	100.00%
🔵	qdbr/src/parquet_write/file.rs	12	12	100.00%
🔵	io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/cairo/ParquetTimestampFinder.java	7	7	100.00%
🔵	io/questdb/std/MemoryTag.java	1	1	100.00%
🔵	io/questdb/griffin/engine/table/parquet/PartitionEncoder.java	1	1	100.00%
🔵	parquet2/src/write/row_group.rs	22	22	100.00%
🔵	io/questdb/griffin/engine/table/ParquetRowGroupFilter.java	6	6	100.00%
🔵	qdbr/src/parquet_read/jni/mod.rs	10	10	100.00%
🔵	io/questdb/cutlass/parquet/HybridColumnMaterializer.java	3	3	100.00%
🔵	io/questdb/cairo/mig/EngineMigration.java	1	1	100.00%
🔵	io/questdb/griffin/engine/table/ExtraNullColumnCursorFactory.java	1	1	100.00%
🔵	io/questdb/cairo/TableUtils.java	20	20	100.00%
🔵	io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java	1	1	100.00%
🔵	qdb-parquet-meta/src/column_chunk.rs	73	73	100.00%
🔵	io/questdb/cutlass/parquet/CopyExportRequestTask.java	2	2	100.00%
🔵	qdbr/src/parquet_read/decoders/converters.rs	13	13	100.00%
🔵	qdb-parquet-meta/src/types.rs	490	490	100.00%
🔵	io/questdb/cairo/vm/MemoryCMRDetachedImpl.java	7	7	100.00%
🔵	qdbr/src/parquet_read/row_groups.rs	5	5	100.00%
🔵	qdb-core/src/col_type.rs	5	5	100.00%
🔵	io/questdb/cairo/sql/PageFrameMemoryPool.java	62	62	100.00%
🔵	parquet2/src/bloom_filter/read.rs	6	6	100.00%
🔵	io/questdb/cairo/FullBwdPartitionFrameCursor.java	3	3	100.00%
🔵	io/questdb/cairo/AbstractFullPartitionFrameCursor.java	1	1	100.00%
🔵	io/questdb/cairo/FullFwdPartitionFrameCursor.java	3	3	100.00%
🔵	io/questdb/cairo/AbstractIntervalPartitionFrameCursor.java	2	2	100.00%

Brings in 4 upstream commits (c0c5638..8735521):

* feat(core): add local parquet metadata sidecar file for optimized query planning (#6913). Introduces the `_pm` sidecar produced by `ParquetMetadataWriter`, the qdb-parquet-meta Rust crate, the pm_generate / pm_inspect binaries, Mig940 to backfill existing partitions, and a refactor of attachPartition / O3 parquet paths to derive parquet state from the sidecar.
* fix(sql): EMA, VWEMA and KSUM failures in combined window queries (#7030).
* chore(core): another posting index fix avoiding corrupt results (#7062).
* docs(core): bump Java requirement to 25 in core/README (#7036).

Conflict 1: core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so is a prebuilt native library; took master's version (same resolution as prior 046c874 / 4302e10 merges). Other native libs updated cleanly.

Conflict 2: MemoryTag.java tag-name registration. Took both - the errand-side NATIVE_MEMFD_STORAGE entry and the new master-side MMAP_PARQUET_METADATA_READER entry are independent.

Semantic merge: TableWriter.attachPartition. Master refactored the method to detect parquet partitions from the post-rename `_pm` file (`parquetFileSize > -1`) and dropped the local `boolean isParquet`. Auto-merge silently kept errand's three `isParquet` references (skip attachValidateMetadata, upsert column tops instead of iterateDir, skip configureAppendPosition) without their declaration. Restored `boolean isParquet` and the early `PARQUET_PARTITION_NAME` probe so errand's parquet-attach optimisations stay intact alongside master's new sidecar generation block. Path is reset via trimTo after the detection probe.

Also relocated the parquet-testing nested submodule from core/rust/qdbr/parquet2/testing/parquet-testing to core/rust/parquet2/testing/parquet-testing per the upstream rename; removed the now-empty leftover directory under core/rust/qdbr/parquet2.

Build: mvn package -DskipTests -pl core -am succeeds. Tests not run in this commit.

RaphDal added the "DO NOT MERGE" label (These changes should not be merged to main branch) Apr 1, 2026

RaphDal force-pushed the rd_parquet_metadata branch 4 times, most recently from 739982c to 85bedbf April 2, 2026 14:42

RaphDal marked this pull request as ready for review April 2, 2026 20:59

RaphDal force-pushed the rd_parquet_metadata branch from b6f00e6 to f117804 April 7, 2026 09:49

RaphDal added 22 commits April 9, 2026 08:21

Add QDBP metadata types and writer implementation

1f0f0c7

docs: add guidelines for building and validating Rust code

b7a53e7

integrate into database

6d4f118

test: add Parquet metadata generation and migration tests

7bcbf32

integrate _pm file generation in parquet write path

a883d15

feat: add CLI tools for generating and inspecting _pm metadata files

ca13625

fix: update file descriptor types for parquet file handling in TableWriter

c0947bd

feat: add 'top' field to column metadata and update related structures and documentation

1c7faf8

fix: update column descriptor size and add new fields for metadata structure

5c8ad3e

feat: add support for read path

dd659f9

adding feature flags

7da8c61

more tests

2c95492

refactor: split jni.rs

ef6cf82

fix: update cargo build command to include --lib flag for library builds

ce83e5d

formatting

4fbf075

more tests

b5ce2cf

fix: trim path to partition length before reading native size min/max timestamps

1dd07a6

fix: enhance parquet partition metadata handling and memory mapping

03f4a9f

inlining bloom filters

71e24ee

Added unused_bytes to parquet metadata footers

b2a811c

fix: update column descriptor documentation and adjust name string handling

0a40dda

fix: remove unused RowGroupStatBuffers references in O3ParquetMergeContext and O3PartitionJob

c87e4e5

RaphDal and others added 8 commits April 29, 2026 09:22

remove unnecessary panic settings from Cargo.toml

342a71f

revert ffi guard

f9df084

Rebuild Rust libraries

4ee1e57

revert column top changes in parquet

cd1a169

refactor: reorder Parquet metadata reading logic and add test for migration after corrupt metadata

2e822e6

Merge branch 'master' into rd_parquet_metadata

bcd5d76

cleanup references to ParquetMetaPartitionDecoder

4c42004

formatting

6f11db0

ideoma previously approved these changes Apr 29, 2026


RaphDal added 2 commits April 29, 2026 17:14

fix: update memory checks in Parquet metadata methods to use isOpen()

a7abca1

Improve comments

7c50bc3

RaphDal dismissed ideoma’s stale review via 7c50bc3 April 29, 2026 15:14

RaphDal force-pushed the rd_parquet_metadata branch from bb24c23 to 7c50bc3 April 29, 2026 15:45

ideoma previously approved these changes Apr 29, 2026


RaphDal dismissed ideoma’s stale review via 9baf27b April 30, 2026 07:44

RaphDal and others added 9 commits April 30, 2026 10:32

Merge branch 'master' into rd_parquet_metadata

92be4db

Merge branch 'master' into rd_parquet_metadata

4193ae5

Rebuild Rust libraries

955ce7b

ci: trigger

294be31

Merge branch 'master' into rd_parquet_metadata

774af07

Rebuild Rust libraries

13edf36

Merge branch 'master' into rd_parquet_metadata

9629ef2

Merge branch 'master' into rd_parquet_metadata

318b4d9

bluestreak01 approved these changes May 5, 2026


bluestreak01 merged commit 6593577 into master May 5, 2026
55 checks passed

bluestreak01 deleted the rd_parquet_metadata branch May 5, 2026 20:43

bluestreak01 merged 196 commits into master from rd_parquet_metadata
