logical pushdown for _block_num #1998

Merged
leoyvens merged 7 commits into main from leo/logical-pushdown
Mar 20, 2026

@leoyvens
Collaborator

This PR implements logical pushdown for filters involving _block_num (e.g. where _block_num > N). This filters based on segment metadata, avoiding the more expensive processing of Parquet metadata.

This should be a significant optimization for derived datasets at chain head, which currently load all Parquet metadata repeatedly only to prune every file except the most recent one.
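
For readers unfamiliar with the idea, here is a minimal sketch of segment-level pruning for `_block_num` predicates. It is not the code from this PR: `Segment`, `BlockNumFilter`, and `prune_segments` are hypothetical names, and the sketch assumes each segment covers an inclusive block range.

```rust
/// Hypothetical stand-in for segment metadata: an inclusive block range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment {
    pub first_block: u64,
    pub last_block: u64,
}

/// Hypothetical normalized form of a `_block_num <op> N` predicate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockNumFilter {
    Gt(u64),
    GtEq(u64),
    Lt(u64),
    LtEq(u64),
    Eq(u64),
}

impl BlockNumFilter {
    /// Returns true if at least one block in `seg` could satisfy the filter.
    pub fn may_match(&self, seg: &Segment) -> bool {
        match *self {
            BlockNumFilter::Gt(n) => seg.last_block > n,
            BlockNumFilter::GtEq(n) => seg.last_block >= n,
            BlockNumFilter::Lt(n) => seg.first_block < n,
            BlockNumFilter::LtEq(n) => seg.first_block <= n,
            BlockNumFilter::Eq(n) => seg.first_block <= n && n <= seg.last_block,
        }
    }
}

/// Keeps only the segments that may contain matching rows for every filter.
/// No Parquet metadata is consulted; only the block ranges are compared.
pub fn prune_segments(segments: &[Segment], filters: &[BlockNumFilter]) -> Vec<Segment> {
    segments
        .iter()
        .copied()
        .filter(|seg| filters.iter().all(|f| f.may_match(seg)))
        .collect()
}
```

With segments covering blocks 0-99, 100-199, and 200-299, a filter equivalent to `_block_num > 250` keeps only the last segment, so only that file's Parquet metadata would need to be read.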

@leoyvens force-pushed the leo/logical-pushdown branch from 088437a to 05bd98e on March 20, 2026 11:11
@leoyvens marked this pull request as ready for review on March 20, 2026 11:38
}

#[tracing::instrument(skip_all, err, fields(table = %self.table_ref(), files = %self.resolved_files.len()))]
fn supports_filters_pushdown(
Contributor

Is this used somewhere? A quick search doesn't find any other instances of it here.

Collaborator Author

It's part of the TableProvider interface. DataFusion will call this to decide if a filter should even be passed into scan.

Looking at it again, is_block_num_filter should allow more general cases. I'll adjust it.
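
As a rough illustration of the decision described above, the sketch below classifies each filter as pushable or not. It is deliberately self-contained: `Expr`, `Op`, and `PushDown` are simplified local stand-ins (not DataFusion's `Expr` or `TableProviderFilterPushDown`), and the body of `is_block_num_filter` is an assumption about what "more general cases" might cover, namely the column on either side of the comparison and `AND` conjunctions. Returning `Inexact` is also an assumption, chosen because file-level pruning alone cannot guarantee that every surviving row matches.

```rust
/// Local stand-in mirroring the variants of DataFusion's
/// `TableProviderFilterPushDown` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Hypothetical comparison operators.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Gt,
    GtEq,
    Lt,
    LtEq,
    Eq,
}

/// Hypothetical, heavily simplified expression tree.
#[derive(Debug, Clone)]
pub enum Expr {
    Column(String),
    Literal(u64),
    Compare { left: Box<Expr>, op: Op, right: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
}

/// True if `expr` only compares `_block_num` against literals,
/// with the column on either side, possibly joined by AND.
pub fn is_block_num_filter(expr: &Expr) -> bool {
    match expr {
        Expr::Compare { left, right, .. } => match (left.as_ref(), right.as_ref()) {
            (Expr::Column(c), Expr::Literal(_)) | (Expr::Literal(_), Expr::Column(c)) => {
                c == "_block_num"
            }
            _ => false,
        },
        Expr::And(a, b) => is_block_num_filter(a) && is_block_num_filter(b),
        _ => false,
    }
}

/// One answer per filter, in the same order as the input.
pub fn supports_filters_pushdown(filters: &[&Expr]) -> Vec<PushDown> {
    filters
        .iter()
        .map(|f| {
            if is_block_num_filter(f) {
                // The engine still re-applies the filter to the returned rows.
                PushDown::Inexact
            } else {
                PushDown::Unsupported
            }
        })
        .collect()
}
```

In this sketch a conjunction that mixes `_block_num` with another column is rejected as a whole; splitting it into a pushable part and a remainder would be a separate design choice.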

/// Does not import `Segment`, `Chain`, or `canonical_chain` — works entirely with
/// [`ResolvedFile`] values produced by `physical_table::TableSnapshot`.
/// Holds [`Segment`] values from the canonical chain, giving the execution layer
/// access to block ranges for file-level pruning.
Contributor

It makes sense that we now need to leak Segment values from the canonical chain, but is it worth discussing this change explicitly rather than having it land as a side effect of this PR? ResolvedFile was used here in the past with the intention of keeping some separation.

Contributor

@fordN left a comment

Code looks great and is well tested! I tested this branch locally and it worked well (the count of matched file ranges is effectively reduced by the _block_num predicate!).

I left a few comments and questions that don't need to block merging.

@leoyvens merged commit f2e7d07 into main on Mar 20, 2026
8 checks passed
@leoyvens deleted the leo/logical-pushdown branch on March 20, 2026 16:37

2 participants