perf: Ibis slows down considerably for wider tables #9111
Comments
---
I've benchmarked Ibis 8 vs. 9 on 55 Ibis expressions which are part of a Data Warehouse built using https://github.com/binste/dbt-ibis. I've measured the execution of the Ibis code itself as well as its compilation to SQL. Here's some pseudocode to illustrate:

```python
import ibis
t = ibis.table(...)
# --------------------------
# START: Execution of Ibis code
# --------------------------
t = t.group_by(...).select(...)
# --------------------------
# END: Execution of Ibis code
# --------------------------
# --------------------------
# START: Convert Ibis expression to SQL
# --------------------------
ibis.to_sql(t)
# --------------------------
# END: Convert Ibis expression to SQL
# --------------------------
```

The great news is that the compilation to SQL got significantly faster with the move to SQLGlot, which is super nice! :) The execution of the Ibis code, on the other hand, got a bit slower, with one expression taking significantly longer at 11 seconds. I've profiled that expression and most time is spent in the following statements:
Hope this helps!

---
@binste Thanks for digging into this. Interesting results! Can you share the query that's now taking 11 seconds with Ibis 9.0?

---
Regarding `drop` and `relocate`: they're both implemented using a somewhat naive approach.
I suspect that for
We may be able to take a similar approach for `relocate` by using a data structure more optimized for the operation. I think something that has fast ordered inserts (perhaps
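To make the scaling concrete, here's a hypothetical sketch of the kind of naive approach being described (illustrative only, not Ibis's actual implementation): if each `drop` rebuilds the output by walking every remaining column, one drop costs O(n) in the total column count, and chaining k drops costs O(k · n).

```python
# Hypothetical illustration of the naive approach (not Ibis internals):
# dropping one column by re-projecting all the others touches every
# column name each time.
def naive_drop(columns: list[str], to_drop: str) -> list[str]:
    return [c for c in columns if c != to_drop]  # O(n) per drop

cols = [f"a{i}" for i in range(1000)]
for c in ("a1", "a2", "a3"):  # chaining k drops this way costs O(k * n)
    cols = naive_drop(cols, c)
```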
---
Unfortunately, I can't, as I'd need to mask column names and code logic for IP reasons. It's 2 input tables, each with around 50 columns, plus 1 table with ~10 columns, and then various operations on top of them. But I'm happy to test out any PRs if there's a wheel file available!

---
Naive benchmark here, but for a quick test:

```python
# drop_test.py
import ibis
import time
from contextlib import contextmanager


@contextmanager
def tictoc(num_cols):
    # Time the body of the with-block and print a result row
    tic = time.time()
    yield
    toc = time.time()
    print(f"| {num_cols} | {toc - tic} seconds")


print(f"{ibis.__version__=}")
for num_cols in [10, 20, 50, 100, 200, 500, 1000]:
    t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(num_cols)])
    with tictoc(num_cols):
        t.drop("a8")
```

```
🐚 python drop_test.py
ibis.__version__='8.0.0'
| 10 | 0.18016910552978516 seconds
| 20 | 0.0016529560089111328 seconds
| 50 | 0.0037081241607666016 seconds
| 100 | 0.007429361343383789 seconds
| 200 | 0.013902902603149414 seconds
| 500 | 0.03390693664550781 seconds
| 1000 | 0.06650948524475098 seconds

🐚 python drop_test.py
ibis.__version__='9.0.0'
| 10 | 0.002298593521118164 seconds
| 20 | 0.005956888198852539 seconds
| 50 | 0.027918338775634766 seconds
| 100 | 0.09690213203430176 seconds
| 200 | 0.36721324920654297 seconds
| 500 | 2.301510810852051 seconds
| 1000 | 9.317416191101074 seconds
```

---
@binste Any chance you can try with a pre-release wheel? Benchmarking locally (I'll PR this code in a bit), I do still see linear scaling, but the constant factor isn't nearly as bad as @gforsyth's benchmark shows.

---
Thanks for working on this! Yes, I can test it with a pre-release wheel. Let me know when it's ready and I'll give it a try.

---
@binste The pre-release wheels are published every Sunday, so give it a go as soon as you're able, using something like this to install a pre-release wheel:

---
This PR speeds up `drop` construction and compilation in cases where the number of dropped columns is small relative to the total number of parent-table columns. There are two ways this is done:

- `drop` construction is sped up by reducing the number of iterations over the full set of columns when constructing an output schema. This is where the bulk of the improvement is.
- Compilation of the `drop` operation is also a bit faster for smaller sets of dropped columns on some backends due to use of `* EXCLUDE` syntax.

Since the optimization is done in the `schema` property, adding a new `DropColumns` relation IR seemed like the lightest-weight approach, given that it also enables compilers to use `EXCLUDE` syntax, which produces a far smaller query than the project-without-the-dropped-columns approach.

Partially addresses the `drop` performance seen in #9111. To address this for all backends, they either all need to support `SELECT * EXCLUDE(col1, ..., colN)` syntax or we need to implement column pruning.

Follow-ups could include applying a similar approach to `rename` (using `REPLACE` syntax for compilation). It might be possible to reduce the overhead of `relocate` as well, but I haven't explored that.
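One way to see which form a backend compiles to is to print the generated SQL (a quick check rather than anything authoritative; the exact output depends on the Ibis version and dialect):

```python
import ibis

t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(5)])

# On backends with EXCLUDE support this can compile to something like
# SELECT * EXCLUDE (a2) FROM t, rather than projecting the other four
# columns explicitly.
print(ibis.to_sql(t.drop("a2"), dialect="duckdb"))
```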
---
Tested it with

---
@binste Following up here: would you be able to profile your code again and determine whether `relocate` and `rename` are still performing undesirably?

---
I did a quick benchmark on `rename`:

```python
# rename_test.py — reuses the tictoc helper from the drop benchmark above
for num_cols in [10, 20, 50, 100, 200, 500, 1000]:
    t = ibis.table(name="t", schema=[(f"MyCol{i}", "int") for i in range(num_cols)])
    with tictoc(num_cols):
        t.rename("snake_case")
```

and here are the results, which look ok to me:

```
➜ python rename_test.py
ibis.__version__='8.0.0'
| 10 | 0.24251699447631836 seconds
| 20 | 0.0025930404663085938 seconds
| 50 | 0.009926080703735352 seconds
| 100 | 0.007010936737060547 seconds
| 200 | 0.021643877029418945 seconds
| 500 | 0.04538917541503906 seconds
| 1000 | 0.06621170043945312 seconds

➜ python rename_test.py
ibis.__version__='9.1.0'
| 10 | 0.0009710788726806641 seconds
| 20 | 0.0013387203216552734 seconds
| 50 | 0.0033159255981445312 seconds
| 100 | 0.006360054016113281 seconds
| 200 | 0.012788057327270508 seconds
| 500 | 0.031948089599609375 seconds
| 1000 | 0.06496000289916992 seconds
```

---
@ncclementi Thanks! Yeah, I see slightly different results for

---
Using this script:

```python
from __future__ import annotations

import ibis

num_cols = 1000
t = ibis.table(name="t", schema=[(f"MyCol{i}", "int") for i in range(num_cols)])
method = "snake_case"

from pyinstrument import Profiler

# Profile just the rename call
profiler = Profiler()
profiler.start()
t.rename(method)
profiler.stop()
profiler.print()
```

---
I think there are a couple of things left to explore here.

For example, in a past version of a performance problem related to wide tables, we were doing some unnecessary validation of columns that were already part of a table and thus had already been validated. There might be something like that here. I would start by running the smallest possible bit of slow code through pyinstrument and then work out whether the per-column cost can be reduced.
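As an abstract illustration of that pattern (hypothetical code, not Ibis internals): validate names only when they come from user input, and skip the check for columns already known to be part of the table's schema.

```python
# Hypothetical sketch, not Ibis internals: avoid re-validating
# column names that were already validated when the table was built.
def build_projection(schema: set[str], names: list[str], *, trusted: bool = False):
    if not trusted:  # only user-supplied names need checking
        missing = [n for n in names if n not in schema]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
    return list(names)  # stand-in for constructing the projection node
```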
---
For example, for

---

Same with the

---

I'll throw up a PR to remove those and see what breaks.

---

For

---
@binste I've put up #9641:

---
Great progress!! Happy to see these speedups; that's really going to help us with our dbt-ibis setup. Once there is a dev wheel file (I guess once #9644 is merged and a release is published on a Sunday), I'm happy to try it out and report back! Thanks again.

---
@binste We'll cut 9.2.0 ~today, so you should be able to give this a go shortly 🚀

---
Importing, executing, and compiling the 56 Ibis expressions in our data warehouse to SQL now takes 6-7 seconds 🚀🚀🚀 That's much better, thank you very much for digging into this and fixing it!! Maybe I'll get to do more detailed benchmarking at a later stage to see if I can identify anything else, but this definitely puts the issue to rest regarding wide tables :)

---
@binste Nice! So, just to clarify: when you opened this issue you had a single expression that took ~30 seconds to compile, and now every expression out of the 56 compiles in 6-7 seconds? Want to make sure I'm capturing the improvement accurately!

---
I don't have the exact figures anymore, but it's roughly this:

---
What happened?
In a web application, I'm creating rather complicated Ibis expressions in the backend. Executing the final expression takes < 1 second, but creating the expression itself took up to 30 seconds. It took me a while to figure out what was going on, but I think it's because Ibis gets considerably slower for wider tables. I don't need all the columns in my app, so I was able to improve performance by pruning away what I don't need right from the beginning (see the sketch below).
However, in case you see any potential for improvement, below is some example code to demonstrate the behavior. If this is just inherent to Ibis, what do you think about adding a note to the documentation? I only found #6832, which is somewhat related.
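For reference, the workaround amounts to selecting only the needed columns once, up front (a minimal sketch with placeholder column names):

```python
import ibis

t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(1000)])

# Prune to the needed columns once, so every later operation
# (drop, relocate, rename, ...) works against a much narrower table.
needed = ["a0", "a1", "a2"]  # placeholder column names
t = t.select(*needed)
```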
Setup
Drop and relocate
In more complex expressions, having multiple of these `drop` statements can quickly add up:
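The original snippet isn't preserved in this thread, but the pattern is chained drops on a wide table, roughly like this (a hedged sketch with placeholder columns):

```python
import ibis

t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(500)])

# Each drop is cheap in isolation, but on a wide table the per-call
# cost adds up as expressions chain several of them.
t = t.drop("a1").drop("a2").drop("a3").drop("a4")
```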
Same for `relocate`:
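Again the original snippet isn't preserved; a comparable sketch for `relocate` (placeholder columns):

```python
import ibis

t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(500)])

# relocate rebuilds the column ordering, so it shows the same
# wide-table scaling as drop; the named columns move to the front.
t = t.relocate("a400", "a401")
```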
`select` and `mutate` do not have this issue:

Selectors
I've also noticed that Ibis selectors can be much slower than using a pure Python implementation:
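The comparison being described is roughly the following (a minimal sketch; `startswith` stands in for whichever selector was actually used):

```python
import ibis
import ibis.selectors as s

t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(500)])

# Selector-based column selection
expr_selector = t.select(s.startswith("a1"))

# Pure-Python equivalent over the column names
expr_python = t.select(*[c for c in t.columns if c.startswith("a1")])
```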
What version of ibis are you using?
9.0.0
What backend(s) are you using, if any?
No response
Relevant log output
No response
Code of Conduct