Problem Statement
Using min/max values to skip (prune) chunks is a common practice in columnar storage formats. Let's add it as well to support coarse-grained filtering.
Open questions:
- Should we use different bit widths to support different data types (i.e., 8 bits for int8/uint8, 64 bits for int64/uint64/double), or just one implementation for all?
- How do we support string values efficiently? Do we only want to support min/max for dictionary string values, or for string values in general?
Desired behavior
Compute min/max values and store them in a fashion that does not hurt either full scans or point queries. Ideally, such indices are loaded only when necessary, while also taking advantage of the optimal I/O size on S3/GCS and of vectorization.
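
For illustration, the chunk skipping described in the problem statement could look roughly like the sketch below. It is not a proposed implementation: `ChunkStats`, `compute_stats`, and `chunks_to_scan` are hypothetical names, the chunks are plain in-memory lists of integers, and every chunk is assumed to be non-empty. Storage layout, lazy loading, and string handling are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ChunkStats:
    """Hypothetical per-chunk statistics: smallest and largest value."""
    min_value: int
    max_value: int


def compute_stats(chunks: Sequence[Sequence[int]]) -> List[ChunkStats]:
    # Assumes every chunk is non-empty; min()/max() raise on empty input.
    return [ChunkStats(min(chunk), max(chunk)) for chunk in chunks]


def chunks_to_scan(stats: Sequence[ChunkStats], lo: int, hi: int) -> List[int]:
    """Return indices of chunks whose [min, max] range overlaps [lo, hi].

    Chunks outside the range cannot contain a matching value and are skipped.
    Chunks inside the range may still contain no match, so the filter is
    coarse-grained: the caller still has to scan the returned chunks.
    """
    return [
        i
        for i, s in enumerate(stats)
        if s.max_value >= lo and s.min_value <= hi
    ]


# Three example chunks of one integer column.
chunks = [[1, 5, 9], [10, 12, 18], [20, 25, 31]]
stats = compute_stats(chunks)

# Predicate: 11 <= value <= 21. Only the last two chunks can match.
selected = chunks_to_scan(stats, 11, 21)
```

Only the small `stats` list is consulted to decide which chunks to read, which is why it matters that such an index can be loaded separately from the data it describes.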