perf(drop): speed up performance of drop #9440
Benchmarks:
New benchmark results (note these aren't comparing to
Construction: Still scaling with the number of columns, but overall doing fewer full scans of all columns.
Compilation: This now scales with
Here are the comparison benchmarks for benchmarks that passed:
Compilation
Construction
For construction, things are better across the board. For compilation, it's kind of a wash, except for the important fact that some cases of drop simply didn't work before because constructing the expression overflowed the Python stack. So I think this is a net improvement overall.
Clouds are passing:
lovely, looks like a win to me


This PR speeds up drop construction and compilation in cases where the number
of dropped columns is small relative to the total number of parent-table
columns.
There are two ways this is done:
1. `drop` construction is sped up by reducing the number of iterations over the full set of columns when constructing an output schema. This is where the bulk of the improvement is.
2. Compilation of the `drop` operation is also a bit faster for smaller sets of dropped columns on some backends due to use of `* EXCLUDE` syntax.

Since the optimization is done in the `schema` property, adding a new `DropColumns` relation IR seemed like the lightest-weight approach, given that it also enables compilers to use `EXCLUDE` syntax, which will produce a far smaller query than the project-without-the-dropped-columns approach.
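
To make the schema-side idea concrete, here is a minimal sketch of building the output schema in a single pass over the parent columns. This is not the code in this PR: `drop_schema` is a hypothetical helper, and a plain `dict` of column name to type string stands in for a real schema object.

```python
def drop_schema(parent_schema, to_drop):
    """Return the parent schema minus the dropped columns.

    Hypothetical helper for illustration only; a plain dict stands in
    for a real schema object.
    """
    # Build the lookup set once so each membership test is O(1).
    drop_set = frozenset(to_drop)

    # Validate the dropped names without scanning every parent column.
    missing = drop_set.difference(parent_schema)
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")

    # Exactly one pass over the full set of parent columns; dict
    # insertion order keeps the surviving columns in their original order.
    return {
        name: dtype
        for name, dtype in parent_schema.items()
        if name not in drop_set
    }


parent = {"a": "int64", "b": "string", "c": "float64"}
result = drop_schema(parent, ["b"])
```

The work still scales with the number of parent columns, but the full column set is walked only once per `drop`, regardless of how many columns are dropped.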
Partially addresses the `drop` performance seen in #9111.

To address this for all backends, they either need to all support `SELECT * EXCLUDE(col1, ..., colN)` syntax or we need to implement column pruning.
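
For a sense of the difference in query size, the sketch below builds both forms as plain strings. It is illustrative only and does not reflect how the compilers in this PR generate SQL: `select_excluding` and `select_projecting` are hypothetical helpers, the identifier quoting is naive, and the `EXCLUDE` form only runs on engines that support that syntax.

```python
def select_excluding(table, dropped):
    """SELECT * EXCLUDE form: query size grows with the dropped columns."""
    cols = ", ".join(f'"{c}"' for c in dropped)
    return f'SELECT * EXCLUDE ({cols}) FROM "{table}"'


def select_projecting(table, all_columns, dropped):
    """Explicit projection: query size grows with the kept columns."""
    drop_set = set(dropped)
    kept = ", ".join(f'"{c}"' for c in all_columns if c not in drop_set)
    return f'SELECT {kept} FROM "{table}"'


# A wide table with one dropped column (made-up names).
wide_columns = [f"col{i}" for i in range(1000)]
excluded_sql = select_excluding("t", ["col7"])
projected_sql = select_projecting("t", wide_columns, ["col7"])
```

With a wide table and a small drop list, `excluded_sql` stays a few dozen characters long while `projected_sql` has to name every remaining column.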

Follow-ups could include applying a similar approach to `rename` (using `REPLACE` syntax for compilation).

It might be possible to reduce the overhead of `relocate` as well, but I haven't explored that.