Add Parquet footer metadata cache for metrics scans #6388

Open
alexanderbianchi wants to merge 1 commit into quickwit-oss:main from alexanderbianchi:bianchi/parquet-footer-cache

alexanderbianchi (Collaborator) commented May 6, 2026

Summary

  • Wire metrics ParquetSource through DataFusion's CachedParquetFileReaderFactory so Parquet footer and page-index metadata can be reused across scans.
  • Increase the shared Quickwit DataFusion runtime file metadata cache budget to 4 GiB per node.
  • Add a session-builder override for the metadata cache limit so we can raise or tune it independently later.
  • Add tests covering both the cached reader factory attachment and the 4 GiB runtime cache default.
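The default-plus-override behavior described in the list above can be sketched as follows. This is only an illustration under stated assumptions: `SessionBuilderSketch`, `with_metadata_cache_limit`, and `effective_metadata_cache_limit` are hypothetical names chosen for this example, not the actual Quickwit or DataFusion API; only the 4 GiB default comes from the summary.

```rust
/// One gibibyte, in bytes.
const GIB: u64 = 1024 * 1024 * 1024;

/// Default file-metadata cache budget: 4 GiB per node, as stated in the summary.
const DEFAULT_METADATA_CACHE_LIMIT_BYTES: u64 = 4 * GIB;

/// Hypothetical stand-in for the session builder (not the real Quickwit type).
#[derive(Debug, Default, Clone)]
struct SessionBuilderSketch {
    /// `None` means "use the default budget".
    metadata_cache_limit_bytes: Option<u64>,
}

impl SessionBuilderSketch {
    /// Overrides the metadata cache budget for sessions built from this builder.
    fn with_metadata_cache_limit(mut self, limit_bytes: u64) -> Self {
        self.metadata_cache_limit_bytes = Some(limit_bytes);
        self
    }

    /// Returns the override if one was set, otherwise the 4 GiB default.
    fn effective_metadata_cache_limit(&self) -> u64 {
        self.metadata_cache_limit_bytes
            .unwrap_or(DEFAULT_METADATA_CACHE_LIMIT_BYTES)
    }
}

fn main() {
    let default_builder = SessionBuilderSketch::default();
    let tuned_builder = SessionBuilderSketch::default().with_metadata_cache_limit(8 * GIB);

    println!(
        "default limit: {} bytes",
        default_builder.effective_metadata_cache_limit()
    );
    println!(
        "tuned limit: {} bytes",
        tuned_builder.effective_metadata_cache_limit()
    );
}
```

Keeping the override as an `Option` lets a later change to the default apply to every caller that never set an explicit limit.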

Expected impact

This targets the current metrics IO bottleneck: repeated scans spend a large share of their time loading Parquet metadata and opening files before returning any rows. On warm runs on the same worker, metadata_load_time, time_elapsed_opening, and time_elapsed_scanning_until_data should drop for repeated access to the same files.

This does not cache column data or object-store byte ranges. bytes_scanned should remain roughly unchanged until we add a full blob/range cache.
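To make the distinction concrete, here is a toy model of a byte-bounded footer cache. It is a sketch, not DataFusion's implementation: `FooterMetadata`, `FooterCache`, `ScanStats`, and `scan` are hypothetical names, the sizes are made up, and eviction is a simple LRU. It shows that a warm scan skips the metadata load while the data bytes read stay the same.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical stand-in for decoded footer and page-index metadata.
struct FooterMetadata {
    size_bytes: u64, // memory charged against the cache budget
    data_bytes: u64, // column data a scan of this file would read
}

/// Byte-bounded LRU cache keyed by file path (illustrative only).
struct FooterCache {
    limit_bytes: u64,
    used_bytes: u64,
    entries: HashMap<String, Arc<FooterMetadata>>,
    lru: VecDeque<String>, // front = least recently used
}

/// Counters loosely modeled on the scan metrics named above.
#[derive(Default)]
struct ScanStats {
    metadata_loads: u64,
    bytes_scanned: u64,
}

impl FooterCache {
    fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, used_bytes: 0, entries: HashMap::new(), lru: VecDeque::new() }
    }

    /// Returns the cached entry, if any, and marks it most recently used.
    fn get(&mut self, path: &str) -> Option<Arc<FooterMetadata>> {
        let meta = self.entries.get(path).cloned()?;
        if let Some(pos) = self.lru.iter().position(|p| p == path) {
            if let Some(key) = self.lru.remove(pos) {
                self.lru.push_back(key);
            }
        }
        Some(meta)
    }

    /// Inserts an entry, evicting least recently used entries to stay in budget.
    fn insert(&mut self, path: &str, meta: Arc<FooterMetadata>) {
        // Skip entries that can never fit, and paths that are already cached.
        if meta.size_bytes > self.limit_bytes || self.entries.contains_key(path) {
            return;
        }
        while self.used_bytes + meta.size_bytes > self.limit_bytes {
            let Some(victim) = self.lru.pop_front() else { break };
            if let Some(old) = self.entries.remove(&victim) {
                self.used_bytes -= old.size_bytes;
            }
        }
        self.used_bytes += meta.size_bytes;
        self.entries.insert(path.to_string(), meta);
        self.lru.push_back(path.to_string());
    }
}

/// Simulates one scan: the footer is loaded only on a cache miss,
/// but the data bytes are read on every scan.
fn scan(cache: &mut FooterCache, path: &str, stats: &mut ScanStats) {
    let meta = match cache.get(path) {
        Some(meta) => meta,
        None => {
            stats.metadata_loads += 1;
            // Made-up sizes: 64 KiB of metadata, 8 MiB of column data.
            let loaded = Arc::new(FooterMetadata {
                size_bytes: 64 * 1024,
                data_bytes: 8 * 1024 * 1024,
            });
            cache.insert(path, Arc::clone(&loaded));
            loaded
        }
    };
    stats.bytes_scanned += meta.data_bytes;
}

fn main() {
    let mut cache = FooterCache::new(4 * 1024 * 1024 * 1024);
    let mut stats = ScanStats::default();
    scan(&mut cache, "split-a.parquet", &mut stats); // cold: loads the footer
    scan(&mut cache, "split-a.parquet", &mut stats); // warm: footer comes from the cache
    println!(
        "metadata loads: {}, bytes scanned: {}",
        stats.metadata_loads, stats.bytes_scanned
    );
}
```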

Follow-up

For a full file/range cache, the open-source DataFusion and object_store crates give us range APIs and range coalescing, but no packaged, production-ready byte cache. The likely Quickwit integration point is either the QuickwitObjectStore adapter or the existing quickwit-storage StorageWithCache/StorageCache path, backed by Foyer or a similar bounded cache.
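As a rough picture of what such a follow-up could look like, the sketch below wraps a range reader in a bounded byte cache. Everything here is an assumption made for illustration: `RangeReader` is a hypothetical synchronous trait (the real object_store interface is async and richer), `CountingBackend` is a fake store, and the cache matches exact ranges only and evicts in insertion order. It does not use Foyer or any Quickwit type.

```rust
use std::collections::{HashMap, VecDeque};
use std::ops::Range;

/// Hypothetical minimal range-read interface, used only for this sketch.
trait RangeReader {
    fn get_range(&mut self, path: &str, range: Range<u64>) -> Vec<u8>;
}

/// Fake backend that counts how many bytes were fetched from "storage".
struct CountingBackend {
    bytes_fetched: u64,
}

impl RangeReader for CountingBackend {
    fn get_range(&mut self, _path: &str, range: Range<u64>) -> Vec<u8> {
        let len = range.end.saturating_sub(range.start);
        self.bytes_fetched += len;
        vec![0u8; len as usize]
    }
}

/// Cache key: (path, range start, range end). Exact-match lookups only.
type Key = (String, u64, u64);

/// Wraps a reader with a byte-bounded cache that evicts the oldest entry first.
struct CachedRangeReader<R: RangeReader> {
    inner: R,
    limit_bytes: usize,
    used_bytes: usize,
    entries: HashMap<Key, Vec<u8>>,
    order: VecDeque<Key>, // insertion order, oldest first
}

impl<R: RangeReader> CachedRangeReader<R> {
    fn new(inner: R, limit_bytes: usize) -> Self {
        Self {
            inner,
            limit_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }
}

impl<R: RangeReader> RangeReader for CachedRangeReader<R> {
    fn get_range(&mut self, path: &str, range: Range<u64>) -> Vec<u8> {
        let key = (path.to_string(), range.start, range.end);
        if let Some(bytes) = self.entries.get(&key) {
            return bytes.clone(); // hit: no backend read
        }
        let bytes = self.inner.get_range(path, range);
        // Ranges larger than the whole budget are returned but not cached.
        if bytes.len() <= self.limit_bytes {
            while self.used_bytes + bytes.len() > self.limit_bytes {
                let Some(victim) = self.order.pop_front() else { break };
                if let Some(old) = self.entries.remove(&victim) {
                    self.used_bytes -= old.len();
                }
            }
            self.used_bytes += bytes.len();
            self.entries.insert(key.clone(), bytes.clone());
            self.order.push_back(key);
        }
        bytes
    }
}

fn main() {
    let backend = CountingBackend { bytes_fetched: 0 };
    let mut reader = CachedRangeReader::new(backend, 1024 * 1024);
    let _ = reader.get_range("split-a.parquet", 0..4096); // miss: fetched from the backend
    let _ = reader.get_range("split-a.parquet", 0..4096); // hit: served from the cache
    println!("bytes fetched from backend: {}", reader.inner.bytes_fetched);
}
```

A production cache would also need to handle overlapping or coalesced ranges, concurrency, and invalidation, none of which this sketch attempts.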

Validation

  • cargo fmt -p quickwit-df-core
  • cargo test -p quickwit-df-core
  • cargo test -p quickwit-datafusion

alexanderbianchi changed the title from "add parquet footer cache" to "Add Parquet footer metadata cache for metrics scans" on May 6, 2026