Conversation
Force-pushed 088437a to 05bd98e
}

#[tracing::instrument(skip_all, err, fields(table = %self.table_ref(), files = %self.resolved_files.len()))]
fn supports_filters_pushdown(
Is this used somewhere? Quick search doesn't find any other instances of it here
It's part of the TableProvider interface. DataFusion will call this to decide whether a filter should even be passed into scan.
Looking at it again, `is_block_num_filter` should allow more general cases; I'll adjust it.
/// Does not import `Segment`, `Chain`, or `canonical_chain` — works entirely with
/// [`ResolvedFile`] values produced by `physical_table::TableSnapshot`.
/// Holds [`Segment`] values from the canonical chain, giving the execution layer
/// access to block ranges for file-level pruning.
Makes sense that we now need to leak `Segment` values from the canonical chain, but is it worth discussing this change explicitly rather than letting it be a side effect of this PR, given the past intention to keep some separation by using `ResolvedFile`?
fordN left a comment:
Code looks great and well tested! I tested this branch locally and it worked well (the count of matched file ranges is effectively reduced by the `_block_num` predicate!).
I left a few comments and questions that don't need to block merging.
This PR implements logical pushdown for filters involving `_block_num` (e.g. `where _block_num > N`). This filters based on segment metadata, avoiding the more expensive processing of Parquet metadata. It should be a significant optimization for derived datasets at chain head, which repeatedly load all Parquet metadata only to prune every file except the most recent one.