feat(core): add local parquet metadata sidecar file for optimized query planning #6913

Merged
bluestreak01 merged 196 commits into master from rd_parquet_metadata
May 5, 2026

@RaphDal RaphDal commented Mar 30, 2026

Closes questdb/roadmap#101
Tandem https://github.com/questdb/questdb-enterprise/pull/978

Summary

  • Introduces a compact binary _pm sidecar file that accompanies each data.parquet partition file. The _pm file stores all metadata the query engine needs: column descriptors, QuestDB column types, per-row-group column chunk byte ranges, compression codecs, encodings, and min/max statistics.

  • Replaces the JSON metadata blob previously embedded in the parquet footer's key-value section with a purpose-built binary format. The _pm file is the single authoritative source of partition metadata for parquet partitions.

  • Enables row group pruning via min/max statistics and bloom filter offsets stored locally in _pm, without reading the parquet file itself.

  • Lays the groundwork for cold storage: when a parquet file lives in object storage (S3, GCS, Azure Blob), the _pm file stays local and provides everything the query planner needs to decide which column chunks to fetch by byte range, eliminating metadata round-trips.

  • Adds migration Mig940 to generate _pm files for all existing parquet partitions on engine upgrade.

Motivation

QuestDB's parquet partition support previously stored QuestDB-specific metadata (column types, column tops, symbol flags) as a JSON blob inside the parquet footer under the "questdb" key. This approach has several limitations:

  1. No column chunk byte ranges. The JSON metadata does not include per-column byte offsets and lengths. To decode a specific column chunk, the engine must parse the full parquet footer (thrift-encoded) to locate byte ranges. For cold storage, this requires a network round-trip to read the footer from the remote object store before any data fetch.

  2. No row group statistics. Min/max values per column per row group are available in the parquet footer but not extracted into any local structure. The query engine cannot prune row groups without first downloading and parsing the footer.

  3. No bloom filter references. Bloom filter offsets and lengths live in the parquet footer. Without local access to these references, the engine cannot skip row groups for equality predicates without reading the remote file.

  4. No incremental updates. When an O3 merge appends a new row group to an existing parquet partition, the entire parquet footer must be re-read and re-parsed to access the updated metadata.

The _pm file addresses all of these by storing a complete, locally accessible, binary-encoded copy of all metadata the query engine needs.

The _pm binary format

All multi-byte integers are little-endian.

File layout

 Offset   Section
 ───────  ──────────────────────────────────────────────
 0        HEADER (32B fixed)
            footer_offset           u64   (byte offset of the current footer;
                                          mutable on update, not covered by CRC)
            feature_flags           u64
            designated_timestamp    i32   (-1 if none)
            sorting_column_count    u32
            column_count            u32
            _reserved               u32   (must be 0; alignment padding)

          COLUMN DESCRIPTORS (32B each x column_count)
            name_offset             u64   (offset into name strings region)
            id                      i32   (QuestDB column writer index, -1 if N/A)
            col_type                i32   (QuestDB column type code)
            flags                   i32   (see Column Flags below)
            fixed_byte_len          i32   (for FIXED_LEN_BYTE_ARRAY columns)
            name_length             u32   (UTF-8 byte length)
            physical_type           u8    (parquet physical type 0-7)
            max_rep_level           u8
            max_def_level           u8
            _reserved               u8    (must be 0)

          SORTING COLUMNS (4B each x sorting_column_count)
            column_index            i32   (index into column descriptors)

          NAME STRINGS
            column names stored as raw UTF-8 bytes, referenced by
            (name_offset, name_length) from each column descriptor.
            Each name is padded to 4-byte alignment.

          HEADER FEATURE SECTIONS
            present only when specific feature flag bits are set.
            Navigated sequentially in bit order.
            Bit 0 (BLOOM_FILTERS): bloom column index table.

 ─ ─ ─ ─  ROW GROUP BLOCKS (one per row group, 8-byte aligned)

          Each block:
            num_rows                u64
            column_chunks           64B each x column_count (see below)
            [out-of-line stats]     variable, for stats > 8 bytes
            [bloom filter bitsets]  if BLOOM_FILTERS flag set and not external

 ─ ─ ─ ─  FOOTER (located via header footer_offset)

            parquet_footer_offset   u64   (offset in data.parquet)
            parquet_footer_length   u32   (parquet footer size in bytes)
            row_group_count         u32
            unused_bytes            u64   (dead space from old footers)
            prev_footer_offset      u64   (byte offset of previous footer,
                                          0 if this is the first version)
            row_group_entries       u32 each x row_group_count
                                          (block offset >> 3)
            [footer feature sections]
            crc32                   u32   (covers bytes [8, this field) —
                                          everything after footer_offset)
          TRAILER (last 4 bytes of file)
            footer_length           u32   (bytes from footer start through
                                          CRC, inclusive; used for
                                          standalone validation only)

A reader locates the footer by reading footer_offset from the header at a fixed position. The trailer at the end of the file is retained for standalone validation but is not used on the hot read path.
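As an illustration of the fixed header, the following is a minimal Python sketch that unpacks the six header fields from the layout above. The function name `parse_header` and the sample values are hypothetical; the actual readers in this PR are the Java `ParquetMetaFileReader` and the Rust `qdb-parquet-meta` crate.

```python
import struct

# Header layout from the table above, little-endian:
# footer_offset u64, feature_flags u64, designated_timestamp i32,
# sorting_column_count u32, column_count u32, _reserved u32.
HEADER = struct.Struct("<QQiIII")

def parse_header(buf):
    """Unpack the 32-byte _pm header (hypothetical helper)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer is shorter than the 32-byte header")
    footer_offset, feature_flags, dts, sorting_count, column_count, reserved = \
        HEADER.unpack_from(buf, 0)
    if reserved != 0:
        raise ValueError("_reserved must be 0")
    return {
        "footer_offset": footer_offset,
        "feature_flags": feature_flags,
        "designated_timestamp": dts,  # -1 means no designated timestamp
        "sorting_column_count": sorting_count,
        "column_count": column_count,
    }

# Made-up sample: footer at byte 4096, BLOOM_FILTERS set, no designated timestamp.
sample = HEADER.pack(4096, 0b1, -1, 0, 3, 0)
header = parse_header(sample)
```

Because footer_offset sits at byte 0, a reader needs only this one fixed-position read before jumping to the current footer.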

Column chunk (64 bytes)

Each column chunk in a row group block stores the information needed to locate and decode a column's data in data.parquet:

 Offset  Field               Type   Description
 ──────  ──────────────────  ─────  ──────────────────────────────────
 0       codec               u8     Compression codec (0-7)
 1       encodings           u8     Bitmask of parquet encodings used
 2       stat_flags          u8     Which statistics are present/inlined
 3       stat_sizes          u8     Min/max stat byte sizes (nibble-encoded)
 4       _reserved           u32    Must be 0
 8       num_values          u64    Total values (may differ from num_rows)
 16      byte_range_start    u64    Offset of column chunk in data.parquet
 24      total_compressed    u64    Compressed byte length
 32      null_count          u64
 40      distinct_count      u64
 48      min_stat            u64    Inlined min or out-of-line reference
 56      max_stat            u64    Inlined max or out-of-line reference

Statistics encoding. For fixed-width types that fit in 8 bytes (BOOLEAN, BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP), stats are always inlined directly in the min_stat/max_stat fields. For variable-length or wider types (VARCHAR, STRING, UUID, LONG256), stats are stored out-of-line in the region after the column chunks. In that case, the min_stat/max_stat fields encode [offset: u32 (>> 3), length: u32] pointing to the out-of-line data.
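A rough Python sketch of the 64-byte column chunk and its two stat encodings follows. The helper names are hypothetical. The description above does not say which half of the u64 holds the offset of an out-of-line reference, so the sketch assumes the low 32 bits carry `offset >> 3` and the high 32 bits carry the length.

```python
import struct

# One column chunk, little-endian: 4 x u8, u32 reserved, then 7 x u64.
CHUNK = struct.Struct("<BBBBIQQQQQQQ")

FIELDS = ("codec", "encodings", "stat_flags", "stat_sizes", "reserved",
          "num_values", "byte_range_start", "total_compressed",
          "null_count", "distinct_count", "min_stat", "max_stat")

def unpack_chunk(buf, offset=0):
    """Unpack one column chunk into a dict keyed by field name."""
    chunk = dict(zip(FIELDS, CHUNK.unpack_from(buf, offset)))
    if chunk["reserved"] != 0:
        raise ValueError("_reserved must be 0")
    return chunk

def inline_long(stat):
    """Reinterpret an inlined u64 stat as a signed 64-bit value (LONG, TIMESTAMP)."""
    return struct.unpack("<q", struct.pack("<Q", stat))[0]

def out_of_line_ref(stat):
    """Split an out-of-line reference into (byte offset, length).

    ASSUMPTION: low 32 bits = offset >> 3, high 32 bits = length.
    """
    return (stat & 0xFFFFFFFF) << 3, stat >> 32

# Made-up chunk: inlined min = -5 and max = 42 for a LONG column.
min_bits = struct.unpack("<Q", struct.pack("<q", -5))[0]
raw = CHUNK.pack(1, 0b11, 0b011011, 0, 0, 1000, 4, 8192, 0, 0, min_bits, 42)
chunk = unpack_chunk(raw)
```

Storing the offset shifted right by 3 works because out-of-line data is 8-byte aligned, so a u32 can address up to 32 GiB.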

Stat flags (u8):

 Bit  Name
 ───  ──────────────────────
 0    MIN_PRESENT
 1    MIN_INLINED
 2    MIN_EXACT
 3    MAX_PRESENT
 4    MAX_INLINED
 5    MAX_EXACT
 6    DISTINCT_COUNT_PRESENT
 7    NULL_COUNT_PRESENT
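The stat flag bits map naturally onto a flag enum. The Python sketch below is illustrative only: `StatFlags` and `has_inline_range` are hypothetical names, and the check simply tests whether both bounds are present and inlined, which is the case that can be compared without following an out-of-line reference.

```python
from enum import IntFlag

class StatFlags(IntFlag):
    """Bit positions taken from the stat flags table above."""
    MIN_PRESENT = 1 << 0
    MIN_INLINED = 1 << 1
    MIN_EXACT = 1 << 2
    MAX_PRESENT = 1 << 3
    MAX_INLINED = 1 << 4
    MAX_EXACT = 1 << 5
    DISTINCT_COUNT_PRESENT = 1 << 6
    NULL_COUNT_PRESENT = 1 << 7

INLINE_RANGE = (StatFlags.MIN_PRESENT | StatFlags.MIN_INLINED |
                StatFlags.MAX_PRESENT | StatFlags.MAX_INLINED)

def has_inline_range(stat_flags):
    """True when both min and max are present and stored inline."""
    return (StatFlags(stat_flags) & INLINE_RANGE) == INLINE_RANGE

both_inline = has_inline_range(0b00011011)  # min and max present + inlined
min_only = has_inline_range(0b00000011)     # max is missing
```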

Encoding bitmask (u8):

 Bit  Encoding
 ───  ───────────────────────
 0    PLAIN
 1    RLE_DICTIONARY
 2    DELTA_BINARY_PACKED
 3    DELTA_LENGTH_BYTE_ARRAY
 4    DELTA_BYTE_ARRAY
 5    BYTE_STREAM_SPLIT
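For the writer side, a small Python sketch of how a set of encodings could be folded into the u8 bitmask. The helper names are hypothetical; only the bit positions come from the table above.

```python
# Bit position of each encoding, from the table above.
ENCODING_BITS = {
    "PLAIN": 0,
    "RLE_DICTIONARY": 1,
    "DELTA_BINARY_PACKED": 2,
    "DELTA_LENGTH_BYTE_ARRAY": 3,
    "DELTA_BYTE_ARRAY": 4,
    "BYTE_STREAM_SPLIT": 5,
}

def encoding_mask(names):
    """Fold encoding names into the u8 bitmask stored in a column chunk."""
    mask = 0
    for name in names:
        mask |= 1 << ENCODING_BITS[name]
    return mask

def uses_encoding(mask, name):
    """Test a single encoding bit."""
    return bool(mask & (1 << ENCODING_BITS[name]))

# A dictionary-encoded chunk typically mixes PLAIN and RLE_DICTIONARY pages.
dict_mask = encoding_mask(["PLAIN", "RLE_DICTIONARY"])
```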

Column flags (i32)

 Bit(s)  Name                 Description
 ──────  ───────────────────  ───────────────────────────────────────
 0       LOCAL_KEY_IS_GLOBAL  Symbol column: local dict equals global
 1       IS_ASCII             Varchar column: all values are ASCII
 2-3     FIELD_REPETITION     0=Required, 1=Optional, 2=Repeated
 4       DESCENDING           Sorting column is in descending order
 5-31    Reserved             Must be 0
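Unlike the other flag fields, column flags contain a two-bit value (FIELD_REPETITION) next to single-bit flags. A minimal Python sketch of decoding it, with hypothetical names; rejecting the unlisted repetition value 3 is an assumption of the sketch, not something the description states.

```python
LOCAL_KEY_IS_GLOBAL = 1 << 0
IS_ASCII = 1 << 1
DESCENDING = 1 << 4
REPETITION_NAMES = ("Required", "Optional", "Repeated")

def decode_column_flags(flags):
    """Decode the i32 column flags field into a dict."""
    flags &= 0xFFFFFFFF  # treat the i32 as raw bits
    if flags >> 5:
        raise ValueError("reserved bits 5-31 must be 0")
    repetition = (flags >> 2) & 0b11  # bits 2-3
    if repetition >= len(REPETITION_NAMES):
        # ASSUMPTION: value 3 is not listed in the table, so treat it as invalid.
        raise ValueError("unknown FIELD_REPETITION value")
    return {
        "local_key_is_global": bool(flags & LOCAL_KEY_IS_GLOBAL),
        "is_ascii": bool(flags & IS_ASCII),
        "repetition": REPETITION_NAMES[repetition],
        "descending": bool(flags & DESCENDING),
    }

# IS_ASCII (bit 1) + Optional (bits 2-3 = 01) + DESCENDING (bit 4).
decoded = decode_column_flags(0b10110)
```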

Feature flags (u64)

The header carries a single feature_flags field that gates optional sections in both the header and footer. This allows additive extensions without bumping the format version.

  • Bits 0-31 (optional): A reader ignores unknown optional bits and skips their sections.
  • Bits 32-63 (required): A reader rejects the file if any unknown required bit is set.

Current feature bits:

 Bit  Name                    Header section            Footer section
 ───  ──────────────────────  ────────────────────────  ────────────────────────────────
 0    BLOOM_FILTERS           Bloom column index table  Per-row-group bloom offset table
 1    BLOOM_FILTERS_EXTERNAL  None                      Offsets point into data.parquet
 2    SORTING_IS_DTS_ASC      None                      Implicit sort by DTS ascending

Bit 1 requires bit 0. Bit 2 requires designated_timestamp >= 0. When bit 2 is
set, sorting_column_count on disk is 0 and readers infer sorting by the
designated timestamp column.
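The optional/required split and the two dependency rules can be sketched as a single validation function. This is a Python illustration with hypothetical names, not the validation code from the PR; it returns the subset of bits the reader understands so that unknown optional bits are ignored.

```python
BLOOM_FILTERS = 1 << 0
BLOOM_FILTERS_EXTERNAL = 1 << 1
SORTING_IS_DTS_ASC = 1 << 2
KNOWN_FLAGS = BLOOM_FILTERS | BLOOM_FILTERS_EXTERNAL | SORTING_IS_DTS_ASC
REQUIRED_MASK = 0xFFFFFFFF << 32  # bits 32-63

def validate_feature_flags(flags, designated_timestamp):
    """Apply the rules above; return the known bits or raise ValueError."""
    if flags & REQUIRED_MASK & ~KNOWN_FLAGS:
        raise ValueError("unknown required feature bit set")
    if flags & BLOOM_FILTERS_EXTERNAL and not flags & BLOOM_FILTERS:
        raise ValueError("bit 1 requires bit 0")
    if flags & SORTING_IS_DTS_ASC and designated_timestamp < 0:
        raise ValueError("bit 2 requires a designated timestamp")
    return flags & KNOWN_FLAGS  # unknown optional bits are dropped

def accepts(flags, designated_timestamp):
    """Boolean wrapper around validate_feature_flags."""
    try:
        validate_feature_flags(flags, designated_timestamp)
        return True
    except ValueError:
        return False

# Unknown optional bit 7 is ignored; the known bits 0 and 1 survive.
known = validate_feature_flags(0b11 | (1 << 7), 0)
```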

_txn integration

Each partition entry in _txn consists of 4 longs (LONGS_PER_TX_ATTACHED_PARTITION = 4):

 Field  Offset  Name                         Description
 ─────  ──────  ───────────────────────────  ───────────────────────────────────────────
 0      +0      PARTITION_TS                 Partition timestamp
 1      +1      PARTITION_MASKED_SIZE        Row count + flags (bit 61: parquet format)
 2      +2      PARTITION_NAME_TX            Partition name txn
 3      +3      PARTITION_PARQUET_FILE_SIZE  Parquet file size in bytes, or -1 for native

Field 3 retains its original meaning: the byte size of data.parquet. This preserves backward compatibility — older versions of QuestDB read field 3 to memory-map the parquet file, so changing its meaning would break rollback.

The _pm file size is not stored in _txn. Instead, readers stat() the _pm file and locate the correct footer version via the header's footer_offset field, using the parquet file size from field 3 as the MVCC version token (see Read path below).

TxWriter.setPartitionParquetFormat() writes both the parquet format bit in field 1 and the parquet file size in field 3 within a single transaction commit.
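To make the field arithmetic concrete, here is a small Python sketch that reads the parquet file size for one partition. It models the attached-partition area of _txn as a flat list of longs, which is a simplification; the function name and sample values are hypothetical.

```python
LONGS_PER_TX_ATTACHED_PARTITION = 4
PARTITION_MASKED_SIZE = 1        # field index within a partition entry
PARTITION_PARQUET_FILE_SIZE = 3  # field index within a partition entry
PARQUET_FORMAT_BIT = 1 << 61

def parquet_file_size(partition_longs, partition_index):
    """Return the data.parquet size for a partition, or -1 if it is native."""
    base = partition_index * LONGS_PER_TX_ATTACHED_PARTITION
    masked_size = partition_longs[base + PARTITION_MASKED_SIZE]
    if not masked_size & PARQUET_FORMAT_BIT:
        return -1
    return partition_longs[base + PARTITION_PARQUET_FILE_SIZE]

# Two made-up entries: a native partition followed by a parquet partition.
entries = [
    1_700_000_000_000_000, 1000, 5, -1,
    1_700_086_400_000_000, PARQUET_FORMAT_BIT | 2000, 7, 123_456,
]
native_size = parquet_file_size(entries, 0)
parquet_size = parquet_file_size(entries, 1)
```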

Write path

Initial creation (convert to parquet)

When the engine converts a native partition to parquet, the Rust parquet encoder generates both data.parquet and _pm in a single pass. The encoder:

  1. Writes column data as parquet row groups to data.parquet.
  2. Builds the _pm file in memory via ParquetMetaWriter: column descriptors from the table schema, one row group block per parquet row group with column chunk metadata (byte ranges, codecs, statistics).
  3. Writes the _pm footer with a CRC32, sets footer_offset in the header to point to it, and sets prev_footer_offset to 0 (first version).
  4. The Java layer calls TxWriter.setPartitionParquetFormat() to record the parquet file size in _txn field 3.

Incremental update (O3 merge)

When an out-of-order insert merges data into an existing parquet partition, O3PartitionJob uses PartitionUpdater to append a new row group:

  1. PartitionUpdater.of() opens the existing data.parquet (read + write) and _pm file.
  2. The new row group is appended to data.parquet.
  3. A new row group block is appended to _pm after the old footer (the old footer becomes dead space tracked in the unused_bytes footer field).
  4. A new footer is written with updated row_group_count, CRC32, and prev_footer_offset pointing to the old footer.
  5. Last: footer_offset in the header is updated to point to the new footer. This is an 8-byte aligned atomic write and is the commit point for the _pm file.
  6. O3PartitionJob commits the new parquet file size to _txn field 3.
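The ordering of steps 3 to 5 can be sketched on an in-memory buffer. The Python below is a simplified model, not the `PartitionUpdater` implementation: it omits footer feature sections, takes `unused_bytes` from the caller, and assumes `zlib.crc32` as the checksum variant. All function names are hypothetical.

```python
import struct
import zlib

# Fixed part of the footer: parquet_footer_offset u64, parquet_footer_length u32,
# row_group_count u32, unused_bytes u64, prev_footer_offset u64.
FOOTER_FIXED = struct.Struct("<QIIQQ")

def pad_to_8(pm):
    pm.extend(b"\0" * (-len(pm) % 8))

def append_version(pm, row_group_block, block_offsets,
                   parquet_footer_offset, parquet_footer_length, unused_bytes=0):
    """Append one row group block and a new footer, then publish the footer."""
    old_footer_offset = struct.unpack_from("<Q", pm, 0)[0]
    # Step 3: new row group block after all existing bytes, 8-byte aligned.
    pad_to_8(pm)
    block_offset = len(pm)
    pm.extend(row_group_block)
    pad_to_8(pm)
    offsets = list(block_offsets) + [block_offset]
    # Step 4: new footer chained to the previous one.
    footer_offset = len(pm)
    pm.extend(FOOTER_FIXED.pack(parquet_footer_offset, parquet_footer_length,
                                len(offsets), unused_bytes, old_footer_offset))
    for off in offsets:
        pm.extend(struct.pack("<I", off >> 3))  # row_group_entries
    # CRC over bytes [8, crc field). ASSUMPTION: standard zlib CRC-32.
    pm.extend(struct.pack("<I", zlib.crc32(bytes(pm[8:]))))
    pm.extend(struct.pack("<I", len(pm) - footer_offset))  # trailer footer_length
    # Step 5, last: publish the new footer through the header.
    struct.pack_into("<Q", pm, 0, footer_offset)
    return footer_offset, offsets

pm = bytearray(32)  # zeroed header, no footer yet
first, offsets = append_version(pm, b"\x01" * 72, [], 1000, 200)
second, offsets = append_version(pm, b"\x02" * 72, offsets, 2000, 260)
```

Until the final `pack_into`, a concurrent reader still sees the old footer_offset and therefore a fully written older version. In the real file that last store is the 8-byte aligned atomic write described in step 5.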

Readers holding an older _txn snapshot see a parquet file size in field 3 that differs from the one recorded by the latest _pm footer. They start at the header's footer_offset and follow the prev_footer_offset chain until they reach the footer whose parquet_footer_offset + parquet_footer_length + 8 equals their _txn parquet file size (the 8 trailing bytes are the parquet footer-length field and the PAR1 magic). This provides lock-free concurrent read/write access without storing the _pm file size in _txn.
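A minimal Python sketch of the chain walk, using only the fixed 32-byte part of the footer. `find_footer` is a hypothetical name, and the backwards-only guard against a corrupt chain is an assumption of the sketch rather than documented behaviour.

```python
import struct

FOOTER_FIXED = struct.Struct("<QIIQQ")  # fixed footer fields, see File layout
PARQUET_TAIL = 8  # 4-byte footer length + 4-byte "PAR1" magic

def find_footer(pm, parquet_file_size):
    """Return the offset of the footer matching the given parquet file size."""
    offset = struct.unpack_from("<Q", pm, 0)[0]
    while offset != 0:
        pq_offset, pq_length, _rg_count, _unused, prev = \
            FOOTER_FIXED.unpack_from(pm, offset)
        if pq_offset + pq_length + PARQUET_TAIL == parquet_file_size:
            return offset
        if prev >= offset:
            # ASSUMPTION: older footers always sit at lower offsets.
            raise ValueError("corrupt footer chain")
        offset = prev
    raise ValueError("no footer matches the parquet file size")

# Synthetic file: 32-byte header, then two bare footers at offsets 32 and 64.
pm = bytearray(96)
FOOTER_FIXED.pack_into(pm, 32, 1000, 200, 1, 0, 0)    # parquet size 1208
FOOTER_FIXED.pack_into(pm, 64, 2000, 260, 2, 32, 32)  # parquet size 2268
struct.pack_into("<Q", pm, 0, 64)                     # header -> latest footer

latest = find_footer(pm, 2268)
older = find_footer(pm, 1208)
try:
    find_footer(pm, 9999)
    stale = False
except ValueError:
    stale = True
```

The failing lookup at the end corresponds to the stale-_pm case: no footer in the chain derives the requested parquet file size, so the reader refuses to proceed.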

Schema evolution

When the table schema changes via ALTER TABLE ADD COLUMN or ALTER TABLE DROP COLUMN, PartitionUpdater.setTargetSchema() receives the new schema. The Rust updater copies existing row groups, inserting null column chunks for added columns via copyRowGroupWithNullColumns() and dropping chunks for removed columns. The _pm file reflects the target schema: its column descriptors match the new schema, and column IDs link each descriptor to the corresponding writer index.

Read path

Memory mapping

TableReader stat()s the _pm file and memory-maps its full size. The reader then locates the correct footer version by reading footer_offset from the header and walking the prev_footer_offset chain until finding the footer whose parquet file size matches _txn field 3. This avoids storing _pm file size in _txn, preserving backward compatibility for rollback.

If footer_offset exceeds the mapped size (rare race with a concurrent O3 merge extending the file between stat() and mmap), the reader remaps with a fresh stat().

The parquet file size is read from _txn field 3, so the engine does not need to stat() the parquet file.

ParquetMetaFileReader

A zero-allocation reader that operates directly over mmapped memory via Unsafe offset arithmetic. It provides:

  • Column metadata: getColumnCount(), getColumnId(), getColumnType(), getColumnName().
  • Row group metadata: getRowGroupCount(), getRowGroupSize(), getTotalRowCount().
  • Statistics access: getChunkMinStat(), getChunkMaxStat(), getChunkStatFlags().
  • ParquetRowGroupSkipper implementation for filter pushdown (see below).

of(addr, fileSize, parquetFileSize) initializes the reader: it reads footer_offset from the header, then walks the prev_footer_offset chain to find the footer whose parquet file size matches the parquetFileSize argument (the MVCC version token from _txn field 3).

The first call to canSkipRowGroup() lazily allocates a native handle that caches the parsed _pm header and footer. The handle is reused across subsequent calls and freed by close().

ParquetMetaPartitionDecoder

Replaces footer-based PartitionDecoder for table partitions. Java owns all metadata via ParquetMetaFileReader; Rust acts as a stateless decode engine that receives explicit parameters per decode call:

  • decodeRowGroup(): decodes a full row group given column indices, row range, and output buffers.
  • decodeRowGroupWithRowFilter(): decodes with a row-level filter predicate.

Column type overrides (Symbol to Varchar, Varchar to VarcharSlice) come from the Java-side column type, while base types come from the _pm file. For cold storage, the same decode API works: the engine downloads column chunks by byte range using offsets from _pm and passes the data to the same Rust decoder.

Row group pruning

ParquetRowGroupFilter.prepareFilterList() accepts a ParquetMetaFileReader and builds a filter list from SQL WHERE clause conditions. The filter list references column indices resolved from _pm column names and min/max statistics.

At scan time, ParquetMetaFileReader.canSkipRowGroup() evaluates the filter list against the row group's inlined statistics and returns true if the row group can be skipped entirely. This happens on the Java side without any JNI call for the common case of inlined fixed-width statistics; the native handle is only needed for out-of-line or complex comparisons.
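The min/max check itself is simple interval logic. The Python sketch below illustrates the idea for integer-like columns and an AND-ed filter list; it is not the `canSkipRowGroup()` implementation, ignores nulls, and uses hypothetical names throughout.

```python
def can_skip(min_value, max_value, op, value):
    """True if no row with min_value <= x <= max_value can satisfy `x op value`."""
    if op == "=":
        return value < min_value or value > max_value
    if op == "<":
        return min_value >= value
    if op == "<=":
        return min_value > value
    if op == ">":
        return max_value <= value
    if op == ">=":
        return max_value < value
    return False  # unknown operator: never skip

def surviving_row_groups(stats, filters):
    """stats: one (min, max) pair per row group; filters: (op, value) pairs, AND-ed."""
    return [
        index
        for index, (lo, hi) in enumerate(stats)
        if not any(can_skip(lo, hi, op, value) for op, value in filters)
    ]

# Three row groups with disjoint ranges; filter: x >= 150 AND x < 250.
stats = [(0, 99), (100, 199), (200, 299)]
kept = surviving_row_groups(stats, [(">=", 150), ("<", 250)])
none_kept = surviving_row_groups(stats, [("=", 300)])
```

The check stays safe when a bound is inexact (MIN_EXACT or MAX_EXACT unset), provided the stored min is a lower bound and the stored max an upper bound: a wider range can only cause fewer row groups to be skipped.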

Bloom filter offsets stored in _pm footer feature sections (when BLOOM_FILTERS flag is set) enable equality-predicate pruning for columns with bloom filters.

Migration (Mig940)

Mig940 runs during engine upgrade on tables with parquet partitions:

  1. Reads _meta to determine partitionBy and timestamp column type.
  2. Opens _txn for read-only access.
  3. Iterates over all partition entries. For each partition with the parquet format bit set (bit 61 of field 1):
    • Reads the parquet file size from _txn field 3.
    • Opens data.parquet read-only.
    • Creates the _pm file.
    • Calls ParquetMetadataWriter.generate(), which reads the parquet footer, extracts QuestDB metadata from the "questdb" JSON key-value pair, and writes the complete _pm file. The parquet file size from _txn field 3 is used as the authoritative parquet file size (not stat() on the parquet file).

Mig940 does not modify _txn. Field 3 retains its original meaning (parquet file size), preserving backward compatibility for rollback. The migration only generates _pm sidecar files.

The migration is non-destructive: original data.parquet files and _txn remain unchanged. It operates only on local partitions (no cold storage access required).

Rollback safety

_txn field 3 stores the parquet file size (not the _pm file size). Older versions of QuestDB use field 3 to memory-map data.parquet, so keeping its original meaning preserves backward compatibility for rollback. Readers locate the _pm footer via the header's footer_offset field and walk the prev_footer_offset chain, using the parquet file size as an exact-match MVCC token.

If a user rolls back to an older version, writes data that modifies data.parquet (O3 merge, new partitions, etc.), and then re-upgrades, the _pm files become stale: the footer chain contains no footer matching the new parquet file size. ParquetMetaFileReader.of() detects this via exact matching and throws.

Escape hatch: set cairo.repeat.migration.from.version in server.conf to force Mig940 to re-run on the next restart:

cairo.repeat.migration.from.version=427

This causes the migration framework to reset the migration version and re-run Mig940, which regenerates _pm files for all parquet partitions. Remove the property after the restart.

SHOW PARTITIONS integration

ShowPartitionsRecordCursorFactory uses ParquetMetaFileReader to extract min/max timestamps from parquet partitions without parsing the parquet footer. It reads the designated timestamp column index from the _pm header, then fetches inlined min/max statistics from the first and last row group blocks.

Test plan

  • Round-trip _pm creation: convert native partition to parquet, verify _pm file is generated alongside data.parquet and matches expected metadata
  • Multiple row groups: create a partition with multiple row groups, verify each row group block contains correct column chunk metadata
  • O3 incremental update: insert out-of-order data into an existing parquet partition, verify _pm is updated with the new row group and old data remains accessible
  • Multiple consecutive O3 merges: verify _pm handles repeated appends with growing unused_bytes from discarded footers
  • Column type coverage: verify all supported QuestDB types (BOOLEAN, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, STRING, VARCHAR, SYMBOL, UUID, LONG256, IPv4, BINARY) produce correct _pm descriptors
  • Schema evolution: ADD COLUMN and DROP COLUMN between O3 merges, verify _pm reflects the new schema with null chunks for added columns
  • Migration (Mig940): existing parquet partitions without _pm files, verify migration generates correct _pm without modifying _txn
  • ParquetMetaFileReader edge cases: zero row groups, large row group counts, format version validation, corrupted trailer detection
  • ParquetMetaPartitionDecoder lifecycle: verify of() releases pre-existing native handles, destroy() clears reader unconditionally
  • Row group pruning: verify canSkipRowGroup() correctly prunes row groups based on min/max statistics for various column types and filter conditions
  • SHOW PARTITIONS: verify min/max timestamps from _pm match expected values
  • CRC32 integrity: verify reader rejects files with corrupted checksums
  • Parquet file size derivation: verify parquet_footer_offset + parquet_footer_length + 8 matches _txn field 3 parquet file size
  • Footer chain MVCC: after multiple O3 merges, verify reader with older _txn snapshot walks prev_footer_offset chain to find the correct footer version
  • Header footer_offset concurrency: verify reader correctly handles footer_offset exceeding mapped size (remap path)
  • Detach/reattach: verify _pm files survive partition detach and reattach
  • Rust unit tests: cargo test --lib for format serialization, deserialization, CRC computation, stat encoding, feature flag validation
  • End-to-end Rust test (decode_pm_e2e): write parquet, convert to _pm, extract fields, decode column chunks, compare with footer-based decode


coderabbitai Bot commented Mar 30, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@RaphDal RaphDal added the "DO NOT MERGE" label Apr 1, 2026
@RaphDal RaphDal force-pushed the rd_parquet_metadata branch 4 times, most recently from 739982c to 85bedbf Compare April 2, 2026 14:42
@RaphDal RaphDal marked this pull request as ready for review April 2, 2026 20:59
@RaphDal RaphDal force-pushed the rd_parquet_metadata branch from b6f00e6 to f117804 Compare April 7, 2026 09:49
ideoma previously approved these changes Apr 29, 2026
- Renamed test methods for clarity and consistency.
- Added tests to ensure migration handles corrupt and empty parquet files correctly.
- Implemented checks for parquet metadata generation and validation after migration.
- Enhanced error handling for corrupt parquet files during migration.
- Introduced a method to patch parquet file size in transaction files.
- Ensured migration respects the committed parquet file size over actual file length.
- Added tests to verify behavior when encountering stale or missing metadata.
RaphDal and others added 9 commits April 30, 2026 10:32
TableReader.openParquetMetadata previously opened the _pm file twice
per partition: ParquetMetaFileReader.readParquetMetaFileSize opened
the fd to read the 8-byte size header and closed it, then
MemoryCMRDetachedImpl reopened the same path to mmap the file.

Add MemoryCMRImpl.ofWithSizeFromHeader, which opens once, reads the
mapping size from offset 0 through the just-opened fd, validates
against ff.length(fd) to prevent SIGBUS on an over-large mapping, and
maps. The detached subclass overrides the method to close the fd
after mapping (the mmap survives), matching the existing of()
override.

TableReader.openParquetMetadata now constructs (or reuses) a
MemoryCMRDetachedImpl slot and calls ofWithSizeFromHeader, halving the
per-partition openRO count on the parquet-partition open path. The
slot reuse pattern via parquetMetadataPartitions is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mtopolnik

[PR Coverage check]

😍 pass : 11111 / 12398 (89.62%)

file detail

path  covered lines  new lines  coverage
🔵 qdbr/src/bin/pm_generate.rs 0 76 00.00%
🔵 io/questdb/cairo/sql/PageFrame.java 0 1 00.00%
🔵 io/questdb/cairo/sql/PartitionFrame.java 0 1 00.00%
🔵 io/questdb/griffin/engine/table/parquet/ParquetMetadataWriter.java 0 1 00.00%
🔵 io/questdb/griffin/engine/table/parquet/PartitionUpdater.java 2 5 40.00%
🔵 io/questdb/cairo/vm/MemoryCMRImpl.java 10 18 55.56%
🔵 io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java 3 5 60.00%
🔵 io/questdb/cairo/TableWriter.java 145 218 66.51%
🔵 io/questdb/cairo/ParquetMetaFileWriter.java 2 3 66.67%
🔵 qdbr/src/parquet_read/jni/file_decoder.rs 39 56 69.64%
🔵 io/questdb/cairo/O3PartitionJob.java 64 87 73.56%
🔵 io/questdb/cairo/ParquetMetaFileReader.java 190 248 76.61%
🔵 qdbr/src/parquet_metadata/jni/writer.rs 171 224 76.34%
🔵 qdbr/src/parquet_metadata/jni/converter.rs 122 157 77.71%
🔵 qdb-parquet-meta/src/reader.rs 528 657 80.37%
🔵 io/questdb/cairo/TableSnapshotRestore.java 144 180 80.00%
🔵 qdbr/src/parquet_write/jni.rs 214 266 80.45%
🔵 qdbr/src/bin/pm_inspect.rs 320 394 81.22%
🔵 io/questdb/griffin/engine/table/ShowPartitionsRecordCursorFactory.java 29 35 82.86%
🔵 io/questdb/cairo/mig/Mig940.java 104 126 82.54%
🔵 qdbr/src/parquet_read/mod.rs 33 39 84.62%
🔵 qdbr/src/parquet_read/meta.rs 38 45 84.44%
🔵 io/questdb/cairo/sql/PageFrameAddressCache.java 6 7 85.71%
🔵 qdbr/src/parquet_read/jni/partition_decoder.rs 410 467 87.79%
🔵 qdbr/src/parquet_write/update.rs 164 186 88.17%
🔵 io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java 8 9 88.89%
🔵 qdbr/src/parquet_read/decode.rs 8 9 88.89%
🔵 qdbr/src/parquet_read/decode_column.rs 599 669 89.54%
🔵 parquet2/src/write/file.rs 49 55 89.09%
🔵 qdbr/src/parquet_metadata/jni/reader.rs 346 387 89.41%
🔵 io/questdb/griffin/engine/table/parquet/ParquetPartitionDecoder.java 73 81 90.12%
🔵 qdb-parquet-meta/src/writer.rs 513 561 91.44%
🔵 qdb-parquet-meta/src/footer.rs 259 280 92.50%
🔵 qdbr/src/parquet_metadata/skip.rs 379 404 93.81%
🔵 qdbr/src/parquet_metadata/convert.rs 2058 2209 93.16%
🔵 qdbr/src/parquet_read/parquet_meta_decode.rs 538 577 93.24%
🔵 io/questdb/cairo/TableReader.java 72 76 94.74%
🔵 qdb-parquet-meta/src/row_group.rs 362 378 95.77%
🔵 parquet2/src/read/page/slice_reader.rs 60 63 95.24%
🔵 qdbr/tests/pm_inspect_e2e.rs 474 495 95.76%
🔵 qdb-parquet-meta/src/header.rs 768 801 95.88%
🔵 qdbr/src/parquet_read/jni/buffers.rs 58 60 96.67%
🔵 qdbr/tests/decode_pm_e2e.rs 889 922 96.42%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetRecordCursor.java 3 3 100.00%
🔵 io/questdb/cairo/IntervalBwdPartitionFrameCursor.java 3 3 100.00%
🔵 io/questdb/griffin/engine/table/parquet/ParquetFileDecoder.java 4 4 100.00%
🔵 qdb-parquet-meta/src/error.rs 68 68 100.00%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetFunctionFactory.java 1 1 100.00%
🔵 qdbr/src/parquet/error.rs 3 3 100.00%
🔵 io/questdb/cairo/O3ParquetMergeContext.java 7 7 100.00%
🔵 io/questdb/cairo/AttachDetachStatus.java 4 4 100.00%
🔵 io/questdb/cairo/IntervalFwdPartitionFrameCursor.java 3 3 100.00%
🔵 parquet2/src/write/column_chunk.rs 6 6 100.00%
🔵 qdbr/src/parquet_write/file.rs 12 12 100.00%
🔵 io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/cairo/ParquetTimestampFinder.java 7 7 100.00%
🔵 io/questdb/std/MemoryTag.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/parquet/PartitionEncoder.java 1 1 100.00%
🔵 parquet2/src/write/row_group.rs 22 22 100.00%
🔵 io/questdb/griffin/engine/table/ParquetRowGroupFilter.java 6 6 100.00%
🔵 qdbr/src/parquet_read/jni/mod.rs 10 10 100.00%
🔵 io/questdb/cutlass/parquet/HybridColumnMaterializer.java 3 3 100.00%
🔵 io/questdb/cairo/mig/EngineMigration.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/ExtraNullColumnCursorFactory.java 1 1 100.00%
🔵 io/questdb/cairo/TableUtils.java 20 20 100.00%
🔵 io/questdb/griffin/engine/functions/table/ReadParquetPageFrameCursor.java 1 1 100.00%
🔵 qdb-parquet-meta/src/column_chunk.rs 73 73 100.00%
🔵 io/questdb/cutlass/parquet/CopyExportRequestTask.java 2 2 100.00%
🔵 qdbr/src/parquet_read/decoders/converters.rs 13 13 100.00%
🔵 qdb-parquet-meta/src/types.rs 490 490 100.00%
🔵 io/questdb/cairo/vm/MemoryCMRDetachedImpl.java 7 7 100.00%
🔵 qdbr/src/parquet_read/row_groups.rs 5 5 100.00%
🔵 qdb-core/src/col_type.rs 5 5 100.00%
🔵 io/questdb/cairo/sql/PageFrameMemoryPool.java 62 62 100.00%
🔵 parquet2/src/bloom_filter/read.rs 6 6 100.00%
🔵 io/questdb/cairo/FullBwdPartitionFrameCursor.java 3 3 100.00%
🔵 io/questdb/cairo/AbstractFullPartitionFrameCursor.java 1 1 100.00%
🔵 io/questdb/cairo/FullFwdPartitionFrameCursor.java 3 3 100.00%
🔵 io/questdb/cairo/AbstractIntervalPartitionFrameCursor.java 2 2 100.00%

@bluestreak01 bluestreak01 merged commit 6593577 into master May 5, 2026
55 checks passed
@bluestreak01 bluestreak01 deleted the rd_parquet_metadata branch May 5, 2026 20:43
nwoolmer added a commit that referenced this pull request May 6, 2026
Brings in 4 upstream commits (c0c5638..8735521):

* feat(core): add local parquet metadata sidecar file for optimized query
  planning (#6913). Introduces the `_pm` sidecar produced by
  `ParquetMetadataWriter`, the qdb-parquet-meta Rust crate, the
  pm_generate / pm_inspect binaries, Mig940 to backfill existing
  partitions, and a refactor of attachPartition / O3 parquet paths to
  derive parquet state from the sidecar.
* fix(sql): EMA, VWEMA and KSUM failures in combined window queries
  (#7030).
* chore(core): another posting index fix avoiding corrupt results (#7062).
* docs(core): bump Java requirement to 25 in core/README (#7036).

Conflict 1: core/src/main/resources/io/questdb/bin/linux-x86-64/libquestdbr.so
is a prebuilt native library; took master's version (same resolution as
prior 046c874 / 4302e10 merges). Other native libs updated
cleanly.

Conflict 2: MemoryTag.java tag-name registration. Took both - the
errand-side NATIVE_MEMFD_STORAGE entry and the new master-side
MMAP_PARQUET_METADATA_READER entry are independent.

Semantic merge: TableWriter.attachPartition. Master refactored the
method to detect parquet partitions from the post-rename `_pm` file
(`parquetFileSize > -1`) and dropped the local `boolean isParquet`.
Auto-merge silently kept errand's three `isParquet` references (skip
attachValidateMetadata, upsert column tops instead of iterateDir, skip
configureAppendPosition) without their declaration. Restored
`boolean isParquet` and the early `PARQUET_PARTITION_NAME` probe so
errand's parquet-attach optimisations stay intact alongside master's
new sidecar generation block. Path is reset via trimTo after the
detection probe.

Also relocated the parquet-testing nested submodule from
core/rust/qdbr/parquet2/testing/parquet-testing to
core/rust/parquet2/testing/parquet-testing per the upstream rename;
removed the now-empty leftover directory under core/rust/qdbr/parquet2.

Build: mvn package -DskipTests -pl core -am succeeds. Tests not run
in this commit.

Labels

Core, New feature, Performance, rust, storage

Development

Successfully merging this pull request may close these issues.

Local Parquet metadata for optimized remote storage queries

7 participants