✨ Selection pushdown #204
Implement hint.
Readers should have a property (`can_select`) so the optimizer can work out whether a selection can be pushed down. Initially, the optimizer should only push a selection down when the query references a single table (which removes the problem of not knowing which table a field belongs to) and the reader can actually apply the selection at read time. When a metastore is available, it may be able to work out which table each field belongs to. This should first be implemented for the blob reader (Parquet can filter at read time; other formats may need Python to filter after the read) and for the Firestore reader.
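A minimal sketch of the gating rule described above, assuming readers expose `can_select` as a class attribute. `BaseReader`, `ParquetBlobReader`, `CsvBlobReader`, `Query` and `can_push_selection` are hypothetical names for illustration, not the project's actual classes.

```python
from dataclasses import dataclass
from typing import List, Optional


class BaseReader:
    """Hypothetical reader base class; `can_select` advertises read-time filtering."""

    can_select: bool = False


class ParquetBlobReader(BaseReader):
    # Parquet can apply filters while reading, so selections can be pushed here.
    can_select = True


class CsvBlobReader(BaseReader):
    # Filtering would have to happen in Python after the read, so don't push.
    can_select = False


@dataclass
class Query:
    tables: List[str]
    selection: Optional[object] = None  # hypothetical placeholder for the WHERE tree


def can_push_selection(query: Query, reader: BaseReader) -> bool:
    """Push down only when there is one table and the reader filters at read."""
    if query.selection is None:
        return False
    # With a single table there is no ambiguity about which table a field is in.
    if len(query.tables) != 1:
        return False
    return reader.can_select
```

Keeping the check this conservative means a metastore could later relax the single-table rule without changing the reader contract.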
Need to convert the ExpressionTree to DNF so PyArrow can filter. Blob readers that don't go through PyArrow also need their own DNF filtering (can post-read filtering use the same mechanism?).
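A sketch of the DNF conversion plus a pure-Python post-read filter that consumes the same structure. The output is a list of conjunctions, each a list of `(column, op, value)` tuples, the same shape that `pyarrow.parquet.read_table(filters=...)` accepts; this sketch does not import PyArrow. The node classes (`Comparison`, `And`, `Or`) and the functions `to_dnf` and `filter_rows` are hypothetical stand-ins, not the project's ExpressionTree. It assumes the tree has no `NOT` nodes (those would need pushing inward with De Morgan's laws first). Note that distributing AND over OR can grow the DNF exponentially.

```python
import operator
from dataclasses import dataclass
from itertools import product
from typing import Any, Dict, List, Tuple, Union

Predicate = Tuple[str, str, Any]  # (column, op, value), e.g. ("id", ">", 3)
DNF = List[List[Predicate]]  # an OR of ANDs


@dataclass
class Comparison:
    column: str
    op: str
    value: Any


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


Node = Union[Comparison, And, Or]


def to_dnf(node: Node) -> DNF:
    """Rewrite a tree of AND/OR/comparison nodes into a list of conjunctions."""
    if isinstance(node, Comparison):
        return [[(node.column, node.op, node.value)]]
    if isinstance(node, Or):
        # OR concatenates the disjuncts from both sides.
        return to_dnf(node.left) + to_dnf(node.right)
    if isinstance(node, And):
        # AND distributes over OR: each left conjunction pairs with each right one.
        return [l + r for l, r in product(to_dnf(node.left), to_dnf(node.right))]
    raise TypeError(f"cannot convert {type(node).__name__} to DNF")


_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def filter_rows(rows: List[Dict[str, Any]], dnf: DNF) -> List[Dict[str, Any]]:
    """Post-read filtering for readers that cannot filter at read time."""

    def matches(row: Dict[str, Any]) -> bool:
        # A row passes if it satisfies every predicate in at least one conjunction.
        return any(
            all(_OPS[op](row[col], val) for col, op, val in conjunction)
            for conjunction in dnf
        )

    return [row for row in rows if matches(row)]
```

For example, `planet = 'Earth' AND (id > 3 OR id < 1)` becomes `[[("planet", "=", "Earth"), ("id", ">", 3)], [("planet", "=", "Earth"), ("id", "<", 1)]]`, which could be handed to PyArrow for Parquet or to `filter_rows` for other blob formats.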
This will be required before an SQL database can be used as a data source; it isn't practical to download an entire table from a database and only then apply selections and projections.
Pushdown is unlikely to be possible when functions are applied to the data as part of a selection, but raw or simple selections should be pushable.
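One way the "raw or simple" test might look, assuming "simple" means comparisons between a bare column and a literal, combined with AND/OR; anything that wraps a column in a function is rejected. Every class and function here (`Column`, `Literal`, `Function`, `Compare`, `BoolOp`, `is_pushable`) is hypothetical and only illustrates the rule.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: Any


@dataclass
class Function:
    name: str
    args: List["Expr"] = field(default_factory=list)


@dataclass
class Compare:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class BoolOp:
    op: str  # "AND" or "OR"
    operands: List["Expr"]


Expr = Union[Column, Literal, Function, Compare, BoolOp]

PUSHABLE_OPS = {"=", "!=", "<", "<=", ">", ">="}


def is_column_vs_literal(node: Compare) -> bool:
    # Only a bare column compared with a literal maps onto (column, op, value).
    sides = (type(node.left), type(node.right))
    return sides in {(Column, Literal), (Literal, Column)}


def is_pushable(node: Expr) -> bool:
    """True if the selection only compares raw columns with literals."""
    if isinstance(node, BoolOp):
        return node.op in {"AND", "OR"} and all(is_pushable(o) for o in node.operands)
    if isinstance(node, Compare):
        return node.op in PUSHABLE_OPS and is_column_vs_literal(node)
    # Functions, bare columns and bare literals are not pushable selections.
    return False
```

A selection such as `UPPER(planet) = 'EARTH'` fails the check because one side is a function call, so it would stay in the engine and be evaluated after the read.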