✨ Selection pushdown #204

Closed
joocer opened this issue Jun 18, 2022 · 3 comments
Labels
Performance 🏃‍♀️ Improve performance ⚙️ Query Optimizer Component Label - Optimizer Structural 🏗️ Changes to how the Engine works

Comments

@joocer
Contributor

joocer commented Jun 18, 2022

This will be required before a SQL database can be used as a data source; it won't be practical to download an entire table from a database and only then apply selections and projections.

Pushdown is unlikely to be available when functions are applied to the data as part of a selection, but raw or simple selections should be possible.
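As a rough illustration of the idea, here is a minimal sketch of pushing simple selections into the SQL sent to the database while leaving anything else to be applied after the read. Everything in it is an assumption made for illustration: `build_query`, `PUSHABLE_OPERATORS`, and the `(column, operator, value)` predicate shape are hypothetical and not part of the engine.

```python
import sqlite3

# Hypothetical: the comparison operators treated as simple enough to push down.
PUSHABLE_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_query(table, columns, predicates):
    """Build a parameterised SELECT that applies simple predicates in the database.

    `predicates` is a list of (column, operator, value) tuples which are ANDed together.
    Predicates with an operator outside PUSHABLE_OPERATORS are returned in
    `remaining` so they can be applied after the read.
    Table and column names are assumed to be trusted; only values are parameterised.
    """
    pushed, remaining = [], []
    for predicate in predicates:
        target = pushed if predicate[1] in PUSHABLE_OPERATORS else remaining
        target.append(predicate)

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if pushed:
        sql += " WHERE " + " AND ".join(f"{col} {op} ?" for col, op, _ in pushed)
    params = [value for _, _, value in pushed]
    return sql, params, remaining


def read_rows(connection, table, columns, predicates):
    """Run the pushed-down query; the caller still has to apply `remaining`."""
    sql, params, remaining = build_query(table, columns, predicates)
    return connection.execute(sql, params).fetchall(), remaining
```

With this shape, a predicate such as `("moons", ">", 0)` is evaluated by the database, while a predicate that relies on a function would come back in `remaining` and be handled by the engine as it is today.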

@joocer joocer assigned joocer and unassigned joocer Jun 18, 2022
@joocer joocer added the Performance 🏃‍♀️ Improve performance label Jun 19, 2022
@joocer
Contributor Author

joocer commented Jun 20, 2022

Implement the NO_PUSH_SELECTION hint at the same time.

@joocer joocer changed the title [FEATURE] Selection pushdown ✨ Selection pushdown Aug 20, 2022
@joocer joocer added the Structural 🏗️ Changes to how the Engine works label Aug 22, 2022
joocer added a commit that referenced this issue Sep 9, 2022
@joocer joocer added the ⚙️ Query Optimizer Component Label - Optimizer label Sep 15, 2022
@gva-jjoyce
Contributor

Readers should have a property (`can_select`) so the optimizer can work out whether the selection can be pushed.

Initially, the optimizer should only push down a selection when the query reads from a single table (which avoids having to work out which table a field belongs to) and the Reader can actually apply the selection at read time. Once there is a metastore, the optimizer may be able to work out which table each field is in.

Initially this should be implemented for the blob reader (Parquet can filter at read time; other formats may need Python to filter after the read) and for the Firestore reader.
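A minimal sketch of how that check might look, assuming a hypothetical reader hierarchy and a hypothetical `Query` object; only the `can_select` property, the single-table rule, and the NO_PUSH_SELECTION hint come from this thread. The class names and the way hints are passed are made up for illustration.

```python
from dataclasses import dataclass


class BaseReader:
    # Hypothetical base class: readers cannot filter at read time unless they say so.
    can_select = False


class ParquetBlobReader(BaseReader):
    # Parquet can apply a selection while reading.
    can_select = True


class CsvBlobReader(BaseReader):
    # Inherits can_select = False; filtering would happen after the read.
    pass


@dataclass
class Query:
    readers: list              # one reader per table referenced by the query
    selection: object = None   # the selection expression, or None if there isn't one


def should_push_selection(query, hints=()):
    """Decide whether the optimizer should push the selection into the reader."""
    if "NO_PUSH_SELECTION" in hints:
        return False  # the user asked for pushdown to be disabled
    if query.selection is None:
        return False  # nothing to push
    if len(query.readers) != 1:
        return False  # more than one table: we can't tell which table a field is in
    return bool(getattr(query.readers[0], "can_select", False))
```

Defaulting `can_select` to `False` on the base class means a new reader never receives a selection it cannot handle unless it opts in explicitly.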

@joocer
Contributor Author

joocer commented Dec 7, 2022

The ExpressionTree needs to be converted to DNF (disjunctive normal form) for PyArrow to filter on it. Blob readers that don't go through PyArrow will also need their own DNF filtering (can I do post-read filtering using the same mechanism?).
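A rough sketch of the conversion, under stated assumptions: the `Comparison`, `And`, and `Or` node classes below are stand-ins invented for illustration and are not the engine's ExpressionTree, and NOT is not handled. The output is a list of conjunctions, each a list of `(column, op, value)` tuples, which is the shape the `filters` argument of PyArrow's Parquet readers accepts. The same structure can be evaluated row by row in plain Python, which suggests post-read filtering could reuse it.

```python
import operator
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Comparison:
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def to_dnf(node):
    """Return a list of conjunctions; each is a list of (column, op, value) tuples."""
    if isinstance(node, Comparison):
        return [[(node.column, node.op, node.value)]]
    if isinstance(node, Or):
        # OR of two DNFs is just the concatenation of their conjunctions.
        return to_dnf(node.left) + to_dnf(node.right)
    if isinstance(node, And):
        # Distribute AND over OR: pair every left conjunction with every right one.
        return [l + r for l, r in product(to_dnf(node.left), to_dnf(node.right))]
    raise TypeError(f"cannot convert {type(node).__name__} to DNF")


OPERATORS = {
    "=": operator.eq, "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}


def matches(row, dnf):
    """Post-read filter: True if the row (a dict) satisfies any one conjunction."""
    return any(
        all(OPERATORS[op](row[col], val) for col, op, val in conjunction)
        for conjunction in dnf
    )
```

For example, `a > 1 AND (b = 'x' OR c < 5)` becomes two conjunctions, `[a > 1, b = 'x']` and `[a > 1, c < 5]`. Distribution can grow the filter exponentially for deeply nested expressions, so a size limit before pushing may be worth considering.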

joocer added a commit that referenced this issue Dec 7, 2022
joocer added a commit that referenced this issue Dec 7, 2022
joocer added a commit that referenced this issue Dec 7, 2022
joocer added a commit that referenced this issue Dec 7, 2022
joocer added a commit that referenced this issue Dec 7, 2022
joocer added a commit that referenced this issue Dec 7, 2022
@joocer joocer closed this as completed Dec 8, 2022
joocer added a commit that referenced this issue Dec 8, 2022
joocer added a commit that referenced this issue Dec 9, 2022
joocer added a commit that referenced this issue Dec 9, 2022
joocer added a commit that referenced this issue Dec 9, 2022
joocer added a commit that referenced this issue Dec 10, 2022

2 participants