Local Parquet metadata for optimized remote storage queries #101

@bluestreak01

Summary

When partitions are converted to Parquet and uploaded to object storage, QuestDB extracts and stores metadata locally. This enables the query engine to plan and optimize queries without round-trips to remote storage.

How it works

  1. A partition is converted to Parquet (via storage policy)
  2. The Parquet file is uploaded to object storage (S3, GCS, Azure Blob)
  3. Its metadata (row group info, column statistics, schema) stays local
  4. The query engine uses the local metadata for planning and pruning
  5. Only the required data is fetched from remote storage
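
The flow above can be illustrated with a minimal Python sketch. This is not QuestDB's implementation: the `RowGroupMeta` fields, the JSON sidecar file, the object key, and the function names are all hypothetical, chosen only to show how metadata kept on local disk can yield the byte ranges to fetch without contacting remote storage.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RowGroupMeta:
    # Hypothetical fields; a real Parquet footer carries far more detail.
    offset: int   # byte offset of the row group inside the remote object
    length: int   # byte length of the row group
    min_ts: int   # smallest timestamp stored in the row group
    max_ts: int   # largest timestamp stored in the row group


def save_local_metadata(path: Path, object_key: str, groups: list[RowGroupMeta]) -> None:
    """Step 3: keep the metadata on local disk after the upload."""
    doc = {"object_key": object_key, "row_groups": [asdict(g) for g in groups]}
    path.write_text(json.dumps(doc))


def plan_ranges(path: Path, lo: int, hi: int) -> list[tuple[int, int]]:
    """Steps 4-5: read local metadata only and return (offset, length) ranges to fetch."""
    doc = json.loads(path.read_text())
    groups = [RowGroupMeta(**g) for g in doc["row_groups"]]
    # Keep a row group only if its timestamp range overlaps [lo, hi].
    return [(g.offset, g.length) for g in groups if g.max_ts >= lo and g.min_ts <= hi]


def demo() -> list[tuple[int, int]]:
    groups = [
        RowGroupMeta(offset=4, length=1000, min_ts=0, max_ts=99),
        RowGroupMeta(offset=1004, length=1200, min_ts=100, max_ts=199),
        RowGroupMeta(offset=2204, length=900, min_ts=200, max_ts=299),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / "2024-01-01.parquet.meta.json"
        save_local_metadata(meta_path, "tables/trades/2024-01-01.parquet", groups)
        return plan_ranges(meta_path, lo=150, hi=220)
```

Here `demo()` plans a query over timestamps 150 to 220 and returns only the ranges of the second and third row groups. In a real deployment those ranges would presumably become ranged reads against the object store.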

Benefits

  • Faster query planning — No network latency for metadata access
  • Intelligent pruning — Skip row groups based on local min/max statistics
  • Reduced costs — Fewer object storage API calls
  • Lower latency — Metadata lookups are local disk reads, not remote fetches
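
As an illustration of the pruning benefit, the sketch below shows how min/max statistics can rule out row groups for simple comparison predicates. The `ColumnStats` type, the `ROW_GROUPS` sample data, and the set of supported operators are assumptions made for the example, not QuestDB's actual planner logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    min_value: float
    max_value: float


def may_match(stats: ColumnStats, op: str, value: float) -> bool:
    """Return False only when the statistics prove no row can satisfy the predicate."""
    if op == "=":
        return stats.min_value <= value <= stats.max_value
    if op == ">":
        return stats.max_value > value
    if op == ">=":
        return stats.max_value >= value
    if op == "<":
        return stats.min_value < value
    if op == "<=":
        return stats.min_value <= value
    return True  # unknown operator: stay conservative and keep the row group


def prune(row_groups: dict[int, dict[str, ColumnStats]],
          column: str, op: str, value: float) -> list[int]:
    """Return the ids of row groups that still need to be fetched."""
    kept = []
    for rg_id, cols in sorted(row_groups.items()):
        stats = cols.get(column)
        # Missing statistics cannot prove anything, so the row group is kept.
        if stats is None or may_match(stats, op, value):
            kept.append(rg_id)
    return kept


# Hypothetical per-row-group statistics, as they might be held locally.
ROW_GROUPS = {
    0: {"price": ColumnStats(10.0, 20.0)},
    1: {"price": ColumnStats(18.0, 35.0)},
    2: {"price": ColumnStats(40.0, 55.0)},
}
```

For example, `prune(ROW_GROUPS, "price", ">", 30.0)` keeps row groups 1 and 2 and skips row group 0, whose maximum of 20.0 cannot satisfy the predicate. Pruning of this kind is conservative: a kept row group may still contain no matching rows, but a skipped one never does.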

Use case

Ideal for large-scale deployments where hot data stays local and cold data tiers to object storage. Queries spanning both hot and cold data remain performant because metadata is always local.

Parent feature

Sub-issue of #62 (Native Apache Parquet partition format)

Metadata

Labels

enterprise: Features specific to QuestDB Enterprise

Status

Done
