
Compute min/max values for columns to support file-level and chunk-level pruning. #11

@eddyxu


Problem Statement

Using min/max values to skip (prune) chunks is a common practice in columnar storage formats. Let's add it as well to support coarse-grained filtering.
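Here is a minimal sketch of the idea. It assumes an int64 column split into fixed-size chunks. `ChunkStats`, `ComputeChunkStats`, `ChunksToScan`, and `chunk_size` are hypothetical names, not an existing API. For a range predicate `lo <= x <= hi`, a chunk can be skipped only when its `[min, max]` interval does not overlap `[lo, hi]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Min/max statistics for one chunk of an int64 column (hypothetical layout).
struct ChunkStats {
  int64_t min;
  int64_t max;
};

// Compute min/max for each fixed-size chunk; the last chunk may be shorter.
// chunk_size is an illustrative parameter, not an existing setting.
std::vector<ChunkStats> ComputeChunkStats(const std::vector<int64_t>& column,
                                          std::size_t chunk_size) {
  assert(chunk_size > 0);
  std::vector<ChunkStats> stats;
  for (std::size_t begin = 0; begin < column.size(); begin += chunk_size) {
    const std::size_t end = std::min(begin + chunk_size, column.size());
    const auto first = column.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = column.begin() + static_cast<std::ptrdiff_t>(end);
    const auto mm = std::minmax_element(first, last);
    stats.push_back({*mm.first, *mm.second});
  }
  return stats;
}

// Return the indices of chunks that may contain a value in [lo, hi].
// A chunk is pruned only when its [min, max] range cannot overlap [lo, hi].
std::vector<std::size_t> ChunksToScan(const std::vector<ChunkStats>& stats,
                                      int64_t lo, int64_t hi) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i].max >= lo && stats[i].min <= hi) {
      result.push_back(i);
    }
  }
  return result;
}
```

Consider chunks whose ranges are [1, 3], [10, 12], and [5, 7]. The predicate `4 <= x <= 6` only needs to read the third chunk. The other two are skipped without touching their data.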

Open questions:

  • Should we use a different bit width for each data type (e.g., 8 bits for int8/uint8, 64 bits for int64/uint64/double), or just one implementation for all types?
  • How do we support string values efficiently? Do we only want to support min/max for dictionary-encoded string values, or for string values in general?
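The sketch below shows one possible answer to both questions, under stated assumptions. A single C++ template (`MinMaxAccumulator`, a hypothetical name) is one source implementation, and the compiler still generates a type-specific version for int8, int64, double, or `std::string`. NaN values are skipped because they have no ordering.

For long strings, one option is to store truncated bounds to keep the metadata small. A plain prefix is always a valid lower bound. An upper bound needs the last non-0xFF byte of the prefix incremented. `TruncatedLowerBound` and `TruncatedUpperBound` are also hypothetical helpers.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>

// One templated accumulator instead of one hand-written version per bit width.
template <typename T>
class MinMaxAccumulator {
 public:
  void Update(const T& value) {
    if constexpr (std::is_floating_point_v<T>) {
      if (std::isnan(value)) return;  // NaN has no ordering; skip it.
    }
    if (!min_ || value < *min_) min_ = value;
    if (!max_ || *max_ < value) max_ = value;
  }
  // Empty optionals mean no (non-NaN) values were seen.
  const std::optional<T>& Min() const { return min_; }
  const std::optional<T>& Max() const { return max_; }

 private:
  std::optional<T> min_;
  std::optional<T> max_;
};

// A prefix of at most max_len bytes is always a valid lower bound.
std::string TruncatedLowerBound(const std::string& value, std::size_t max_len) {
  return value.substr(0, max_len);
}

// Upper bound of at most max_len bytes, or std::nullopt if none exists
// (the whole prefix is 0xFF bytes). std::string compares bytes as unsigned.
std::optional<std::string> TruncatedUpperBound(const std::string& value,
                                               std::size_t max_len) {
  if (value.size() <= max_len) return value;
  std::string prefix = value.substr(0, max_len);
  // Increment the last byte that is not 0xFF and drop everything after it.
  for (std::size_t i = prefix.size(); i-- > 0;) {
    const auto byte = static_cast<unsigned char>(prefix[i]);
    if (byte != 0xFF) {
      prefix[i] = static_cast<char>(byte + 1);
      prefix.resize(i + 1);
      return prefix;
    }
  }
  return std::nullopt;
}
```

For dictionary-encoded strings, min/max over the integer codes could reuse the numeric path, but only if the dictionary is sorted. With an unsorted dictionary, code order says nothing about string order.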

Desired behavior

Compute min/max values and store them in a way that hurts neither full scans nor point queries. Ideally, these indices are loaded only when necessary, while still taking advantage of the optimal I/O size on S3/GCS and of vectorization.
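One way to approach this is a struct-of-arrays layout, sketched here under assumptions. `ColumnStatsSoA`, `PruneMask`, and the `[num_chunks][mins...][maxs...]` byte layout are hypothetical, and the layout uses host byte order.

Keeping the statistics in one contiguous block means a reader could fetch them with a single ranged read, and only when a query has a filter. The branch-free mask loop is a shape that compilers can often auto-vectorize, though that is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical struct-of-arrays layout: all chunk minimums, then all maximums.
struct ColumnStatsSoA {
  std::vector<int64_t> mins;
  std::vector<int64_t> maxs;

  // Serialize as [num_chunks: uint64][mins...][maxs...] in host byte order.
  // A real format would also need a fixed endianness and versioning.
  std::vector<uint8_t> Serialize() const {
    assert(mins.size() == maxs.size());
    const uint64_t n = mins.size();
    const std::size_t bytes = n * sizeof(int64_t);
    std::vector<uint8_t> buf(sizeof(n) + 2 * bytes);
    std::memcpy(buf.data(), &n, sizeof(n));
    if (n > 0) {
      std::memcpy(buf.data() + sizeof(n), mins.data(), bytes);
      std::memcpy(buf.data() + sizeof(n) + bytes, maxs.data(), bytes);
    }
    return buf;
  }

  // Rebuild the stats from one contiguous block (e.g. a single ranged read).
  static ColumnStatsSoA Deserialize(const std::vector<uint8_t>& buf) {
    uint64_t n = 0;
    assert(buf.size() >= sizeof(n));
    std::memcpy(&n, buf.data(), sizeof(n));
    const std::size_t bytes = n * sizeof(int64_t);
    assert(buf.size() == sizeof(n) + 2 * bytes);
    ColumnStatsSoA stats;
    stats.mins.resize(n);
    stats.maxs.resize(n);
    if (n > 0) {
      std::memcpy(stats.mins.data(), buf.data() + sizeof(n), bytes);
      std::memcpy(stats.maxs.data(), buf.data() + sizeof(n) + bytes, bytes);
    }
    return stats;
  }
};

// Branch-free overlap test for the predicate lo <= x <= hi.
// mask[i] == 1 means chunk i may contain matches and must be read.
std::vector<uint8_t> PruneMask(const ColumnStatsSoA& stats, int64_t lo,
                               int64_t hi) {
  const std::size_t n = stats.mins.size();
  std::vector<uint8_t> mask(n);
  for (std::size_t i = 0; i < n; ++i) {
    mask[i] = static_cast<uint8_t>((stats.maxs[i] >= lo) & (stats.mins[i] <= hi));
  }
  return mask;
}
```

Take `mins = {1, 10, 5}` and `maxs = {3, 12, 7}`. `PruneMask(stats, 4, 6)` yields `{0, 0, 1}`, so only the third chunk would be read.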

Labels: c++ (C++ issues), enhancement (New feature or request)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions