
perf: Ibis slows down considerably for wider tables #9111

Closed
1 task done
Tracked by #9828
binste opened this issue May 3, 2024 · 24 comments · Fixed by #9644
Labels
bug Incorrect behavior inside of ibis
Milestone

Comments

@binste
Contributor

binste commented May 3, 2024

What happened?

In a web application, I'm creating rather complicated Ibis expressions in the backend. Executing the final expression takes < 1 second, but creating the expression itself took up to 30 seconds. It took me a while to figure out what was going on, but I think it's because Ibis gets considerably slower for wider tables. I don't need all of the columns in my app, so I was able to improve performance by pruning away the columns I don't need right from the beginning.

However, in case you see any potential for improvement, below is some example code to demonstrate it. If this is just inherent to Ibis, what do you think about adding a note to the documentation? I only found #6832, which is somewhat related.

Setup

import ibis
import ibis.selectors as s
from ibis import deferred as d_

t = ibis.table(name="base1", schema={f"col{i}": "int64" for i in range(200)})

Drop and relocate

In more complex expressions, having several of these drop statements can quickly add up:

t1 = t.drop("col0", "col1", "col2")  # 500ms

Same for relocate:

t1 = t.relocate("col100", before="col2")  # 1,700ms -> 1.7 seconds

select and mutate do not have this issue:

t1 = t.select("col0", "col1", "col2")  # 3ms
t1 = t.mutate(colnew=ibis.literal(1))  # 20ms
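
The pruning workaround mentioned above amounts to an early projection; a minimal sketch, assuming the application only needs a small subset of columns (the subset here is arbitrary):

# Keep only the columns the application actually uses; subsequent
# drop/relocate calls then operate on a much narrower table.
needed = [f"col{i}" for i in range(10)]
t_small = t.select(*needed)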

Selectors

I've also noticed that Ibis selectors can be much slower than achieving the same result with a plain Python dict comprehension:

t.mutate(s.across(s.c(*[f"col{i}" for i in range(20)]), d_.cast("str")))  # 580ms
t.mutate(**{f"col{i}": d_[f"col{i}"] for i in range(20)})  # 70ms

What version of ibis are you using?

9.0.0

What backend(s) are you using, if any?

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@binste binste added the bug Incorrect behavior inside of ibis label May 3, 2024
@binste
Contributor Author

binste commented May 3, 2024

I've benchmarked Ibis 8 vs. 9 on 55 Ibis expressions that are part of a data warehouse built using https://github.com/binste/dbt-ibis. I've measured:

  • Execution of Ibis code: The time it takes to execute the Ibis code to get to the final expression
  • Convert Ibis expression to SQL: The time it takes to compile the final Ibis expression to SQL

Here is some pseudocode to illustrate:

import ibis

t = ibis.table(...)

# --------------------------
# START: Execution of Ibis code
# --------------------------
t = t.group_by(...).select(...)

# --------------------------
# END: Execution of Ibis code
# --------------------------

# --------------------------
# START: Convert Ibis expression to SQL
# --------------------------
ibis.to_sql(t)
# --------------------------
# END: Convert Ibis expression to SQL
# --------------------------
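
A concrete version of this split, using time.perf_counter (the grouped expression here is just a placeholder, not one of the actual warehouse models):

import time

import ibis

t = ibis.table(name="base", schema={f"col{i}": "int64" for i in range(200)})

# Execution of Ibis code: building the expression
start = time.perf_counter()
expr = t.group_by("col0").aggregate(total=t.col1.sum())
build_seconds = time.perf_counter() - start

# Convert Ibis expression to SQL: compiling the expression
start = time.perf_counter()
sql = ibis.to_sql(expr)
compile_seconds = time.perf_counter() - start

print(f"build: {build_seconds:.3f}s, compile: {compile_seconds:.3f}s")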

[two benchmark charts comparing Ibis 8 vs. 9: time to execute the Ibis code and time to compile the expressions to SQL]

The great news is that compilation to SQL got significantly faster with the move to SQLGlot, which is super nice! :) The execution of the Ibis code, on the other hand, got a bit slower, with one expression taking significantly longer at 11 seconds. I've profiled that expression and most of the time is spent in the following statements:

  • Table.relocate ibis/expr/types/relations.py:4279
    • Code here looks like this: .relocate("valid_to", after="valid_from")
  • Join.wrapper ibis/expr/types/joins.py:188
  • Table.drop ibis/expr/types/relations.py:2329
  • Table.rename ibis/expr/types/relations.py:2140

Hope this helps!

@cpcloud
Member

cpcloud commented May 3, 2024

@binste Thanks for digging into this. Interesting results!

Can you share the query that's now taking 11 seconds with Ibis 9.0?

@cpcloud
Member

cpcloud commented May 3, 2024

Regarding drop and relocate, they're both implemented using a somewhat naive approach:

  • drop: project every column except the input columns
  • relocate: basically a reprojection that loops through every column

I suspect that for drop we can turn it into a special operation so that it scales with the number of dropped columns instead of the total number of columns. A place to start might be:

  1. create a special Drop relational operation
  2. implement the column drop in select merging by looping over the drop list and removing each element from the Select fields. Select fields are dicts, so that should get us to O(droplist) instead of O(allcolumns).

We may be able to take a similar approach for relocate by using a data structure more optimized for the operation. I think something that has fast ordered inserts (perhaps sortedcontainers has a sorted map container?) might be a good place to start.
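
A minimal sketch of the select-merging idea in step 2, using a plain dict to stand in for the Select fields (names and structure are illustrative, not the actual Ibis internals):

# Select fields map column name -> bound expression; deleting each dropped
# name is O(len(dropped)) rather than rebuilding a projection over every
# column of the parent table.
def merge_drop_into_select(select_fields: dict, dropped: list[str]) -> dict:
    for name in dropped:
        del select_fields[name]
    return select_fields

fields = {f"col{i}": f"expr_for_col{i}" for i in range(200)}
fields = merge_drop_into_select(fields, ["col0", "col1", "col2"])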

@binste
Contributor Author

binste commented May 3, 2024

Unfortunately, I can't, as I'd need to mask column names and code logic for IP reasons. It's two input tables with around 50 columns each, one table with ~10 columns, and then various operations on top of them. But I'm happy to test any PRs if a wheel file is available!

@gforsyth
Member

gforsyth commented May 3, 2024

Naive benchmark here, but for a quick test:

# drop_test.py
import ibis
import time
from contextlib import contextmanager


@contextmanager
def tictoc(num_cols):
    tic = time.time()
    yield
    toc = time.time()
    print(f"| {num_cols} | {toc - tic} seconds")


print(f"{ibis.__version__=}")


for num_cols in [10, 20, 50, 100, 200, 500, 1000]:
    t = ibis.table(name="t", schema=[(f"a{i}", "int") for i in range(num_cols)])

    with tictoc(num_cols):
        t.drop("a8")
🐚 python drop_test.py
ibis.__version__='8.0.0'
| 10 | 0.18016910552978516 seconds
| 20 | 0.0016529560089111328 seconds
| 50 | 0.0037081241607666016 seconds
| 100 | 0.007429361343383789 seconds
| 200 | 0.013902902603149414 seconds
| 500 | 0.03390693664550781 seconds
| 1000 | 0.06650948524475098 seconds
🐚 python drop_test.py
ibis.__version__='9.0.0'
| 10 | 0.002298593521118164 seconds
| 20 | 0.005956888198852539 seconds
| 50 | 0.027918338775634766 seconds
| 100 | 0.09690213203430176 seconds
| 200 | 0.36721324920654297 seconds
| 500 | 2.301510810852051 seconds
| 1000 | 9.317416191101074 seconds

@cpcloud
Member

cpcloud commented Jun 25, 2024

@binste Any chance you can try with a pre-release wheel?

Benchmarking locally (I'll PR this code in a bit), I do still see linear scaling, but the constant factor isn't nearly as bad as @gforsyth's benchmark shows.

ibis/tests/benchmarks/test_benchmarks.py .......                                                                                                                                                        [100%]


----------------------------------------------------------------------------- benchmark 'test_wide_drop': 7 tests -----------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean             StdDev              Median                IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_wide_drop[10]         1.5730 (1.0)        2.1650 (1.0)        1.6427 (1.0)       0.0519 (1.0)        1.6336 (1.0)       0.0436 (1.22)        71;24  608.7550 (1.0)         404           1
test_wide_drop[20]         2.8863 (1.83)       3.5940 (1.66)       2.9740 (1.81)      0.0554 (1.07)       2.9666 (1.82)      0.0357 (1.0)         19;14  336.2452 (0.55)        317           1
test_wide_drop[50]         6.8432 (4.35)       7.4251 (3.43)       7.0100 (4.27)      0.0750 (1.45)       7.0007 (4.29)      0.0611 (1.71)         28;6  142.6526 (0.23)        138           1
test_wide_drop[100]       13.6984 (8.71)      14.2588 (6.59)      13.9413 (8.49)      0.1101 (2.12)      13.9258 (8.52)      0.1516 (4.25)         20;1   71.7293 (0.12)         70           1
test_wide_drop[200]       26.8306 (17.06)     28.1245 (12.99)     27.5633 (16.78)     0.4098 (7.90)      27.6801 (16.94)     0.7497 (21.03)        14;0   36.2801 (0.06)         36           1
test_wide_drop[500]       67.1504 (42.69)    112.0973 (51.78)     70.7434 (43.07)    11.4473 (220.75)    67.6779 (41.43)     0.7385 (20.72)         1;1   14.1356 (0.02)         15           1
test_wide_drop[1000]     135.7117 (86.27)    185.6611 (85.76)    147.9776 (90.08)    20.9796 (404.56)   137.2259 (84.00)    21.4762 (602.41)        2;0    6.7578 (0.01)          8           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
============================================================================================== 7 passed in 8.82s ==============================================================================================

@binste
Contributor Author

binste commented Jun 26, 2024

Thanks for working on this! Yes, I can test it with a pre-release wheel. Let me know when it's ready and I'll give it a try.

@cpcloud
Member

cpcloud commented Jun 26, 2024

@binste The pre-release wheels are published every Sunday, so give it a go as soon as you're able, using something like this to install a pre-release wheel with ibis-framework plus the duckdb backend dependencies:

pip install --pre -U 'ibis-framework[duckdb]'
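
Once installed, a quick way to confirm the pre-release was actually picked up (the exact dev version number will differ):

import ibis

print(ibis.__version__)  # e.g. 10.0.0.devNNN for a pre-release build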

gforsyth pushed a commit that referenced this issue Jun 26, 2024
This PR speeds up drop construction and compilation in cases where the number of dropped columns is small relative to the total number of parent-table columns.

There are two ways this is done:

- `drop` construction is sped up by reducing the number of iterations over the full set of columns when constructing an output schema. This is where the bulk of the improvement is.
- Compilation of the `drop` operation is also a bit faster for smaller sets of dropped columns on some backends due to use of `* EXCLUDE` syntax.

Since the optimization is done in the `schema` property, adding a new `DropColumns` relation IR seemed like the lightest-weight approach, given that it also enables compilers to use `EXCLUDE` syntax, which produces a far smaller query than the project-without-the-dropped-columns approach.

Partially addresses the `drop` performance seen in #9111.

To address this for all backends, they either all need to support `SELECT * EXCLUDE(col1, ..., colN)` syntax or we need to implement column pruning.

Follow-ups could include applying a similar approach to `rename` (using `REPLACE` syntax for compilation).

It might be possible to reduce the overhead of `relocate` as well, but I haven't explored that.
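
For illustration, on a backend whose dialect supports `* EXCLUDE` (DuckDB, for example), the compiled query for a drop can stay short instead of listing every retained column; a hedged sketch (the exact SQL emitted depends on the Ibis version and backend):

import ibis

t = ibis.table(name="t", schema={f"col{i}": "int64" for i in range(200)})
# With the DropColumns relation this can compile to something close to
#   SELECT * EXCLUDE (col0, col1, col2) FROM "t"
print(ibis.to_sql(t.drop("col0", "col1", "col2"), dialect="duckdb"))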
@jcrist jcrist added this to the Q3 2024 milestone Jun 26, 2024
@binste
Contributor Author

binste commented Jul 1, 2024

Tested it with ibis-framework==10.0.0.dev120 and it's a big improvement! Executing all of our internal dbt-ibis models now takes 8 seconds instead of 41. :) If the same could be done for relocate, rename, etc., I'd assume that number goes down further, but thanks already for this! Looking forward to version 10.

@lostmygithubaccount lostmygithubaccount removed this from the Q3 2024 milestone Jul 17, 2024
@cpcloud
Member

cpcloud commented Jul 19, 2024

@binste Following up here, would you be able to profile your code again and determine whether relocate and rename are still performing undesirably?

@ncclementi
Contributor

I did a quick benchmark on rename(). I reused Gil's example, but now doing:

for num_cols in [10, 20, 50, 100, 200, 500, 1000]:
    t = ibis.table(name="t", schema=[(f"MyCol{i}", "int") for i in range(num_cols)])

    with tictoc(num_cols):
        t.rename("snake_case")

and here are the results; this looks OK to me:

 ➜ python rename_test.py
ibis.__version__='8.0.0'
| 10 | 0.24251699447631836 seconds
| 20 | 0.0025930404663085938 seconds
| 50 | 0.009926080703735352 seconds
| 100 | 0.007010936737060547 seconds
| 200 | 0.021643877029418945 seconds
| 500 | 0.04538917541503906 seconds
| 1000 | 0.06621170043945312 seconds
 ➜ python rename_test.py
ibis.__version__='9.1.0'
| 10 | 0.0009710788726806641 seconds
| 20 | 0.0013387203216552734 seconds
| 50 | 0.0033159255981445312 seconds
| 100 | 0.006360054016113281 seconds
| 200 | 0.012788057327270508 seconds
| 500 | 0.031948089599609375 seconds
| 1000 | 0.06496000289916992 seconds

@cpcloud
Member

cpcloud commented Jul 19, 2024

@ncclementi Thanks! Yeah, I see slightly different results for main:

| 10 | 0.0028498172760009766 seconds
| 20 | 0.0045757293701171875 seconds
| 50 | 0.011339426040649414 seconds
| 100 | 0.02110433578491211 seconds
| 200 | 0.04119515419006348 seconds
| 500 | 0.10248041152954102 seconds
| 1000 | 0.2085111141204834 seconds

@cpcloud
Member

cpcloud commented Jul 19, 2024

Using pyinstrument, I see about a 2x speedup in rename from 8.0.0 to 9.1.0:

script:

from __future__ import annotations

import ibis
from pyinstrument import Profiler

num_cols = 1000
t = ibis.table(name="t", schema=[(f"MyCol{i}", "int") for i in range(num_cols)])

method = "snake_case"

profiler = Profiler()
profiler.start()

t.rename(method)

profiler.stop()

profiler.print()

@cpcloud
Member

cpcloud commented Jul 19, 2024

I think there are a couple of things left to explore here.

rename and relocate in the worst case need to look at every column, and that's currently the bottleneck AFAICT, though it's worth exploring with pyinstrument to see what is so expensive per column. That way we wouldn't have to gut anything in the top-level method if we can figure out how to improve the per-column cost.

For example, in a past version of a performance problem related to wide tables, we were doing some unnecessary validation of columns that were already part of a table and thus had already been validated. There might be something like that here.

I would start by running the smallest slow bit of code through pyinstrument and then trying to understand whether the per-column cost can be reduced.

@cpcloud
Member

cpcloud commented Jul 19, 2024

For example, for rename, we probably don't need any of the DerefMap machinery that's happening in bind. DerefMap is for cases where there might otherwise be ambiguous parent tables, but in a rename we already know that everything is bound.

@cpcloud
Member

cpcloud commented Jul 19, 2024

Same with the rewrite_project_input calls; we don't need them.

@cpcloud
Member

cpcloud commented Jul 19, 2024

I'll throw up a PR to remove those and see what breaks.

@cpcloud
Member

cpcloud commented Jul 19, 2024

For relocate the problem is a bit trickier, but since it's currently using the full binding machinery as well, we can probably shave a decent constant factor off the top and follow up with algorithmic improvements later.

@cpcloud
Member

cpcloud commented Jul 20, 2024

@binste I've put up three PRs that should hopefully put this issue to bed for now 😄.

#9641: rename performance

Avoids some redundant validation overhead that becomes expensive once there are enough columns.

Total improvement: 6-12x according to benchmarks.

#9644: relocate performance

This PR includes the changes from #9641 and #9646 and fixes an issue with unnecessary repeated traversal of every column.

Total improvement: 7-12x according to benchmarks.

#9646: drop performance

We tackled a big chunk of this in a previous PR (#9440), but there was a bit left over after poking around using the fast(er) column binding I discovered in #9641.

Total improvement: 2-3x according to benchmarks.

If you can give #9644 a try and report back that'd be great!

@binste
Contributor Author

binste commented Jul 20, 2024

Great progress!! Happy to see these speedups; that's really going to help us with our dbt-ibis setup. Once there is a dev wheel file (I guess once #9644 is merged and a pre-release is published on a Sunday), I'm happy to try it out and report back! Thanks again.

@cpcloud
Member

cpcloud commented Jul 22, 2024

@binste We'll cut 9.2.0 ~today, so you should be able to give this a go shortly 🚀

@binste
Contributor Author

binste commented Jul 23, 2024

Importing, executing, and compiling the 56 Ibis expressions in our data warehouse to SQL now takes 6-7 seconds 🚀🚀🚀 That's much better, thank you very much for digging into this and fixing it!!

Maybe I'll get to do more detailed benchmarking at a later stage to see if I can identify anything else, but this definitely puts the issue regarding wide tables to rest :)

@cpcloud
Member

cpcloud commented Jul 23, 2024

@binste Nice!

So, just to clarify, when you opened this issue you had a single expression that took ~30 seconds to compile and now every expression out of 56 compiles in 6-7 seconds?

Want to make sure I'm capturing the improvement accurately!

@binste
Contributor Author

binste commented Jul 24, 2024

I don't have the exact figures anymore but it's roughly this:

  • At one point, I think in Ibis 8 or 9.0, a single expression on a wide table took ~30 seconds to compile.
  • Compiling all 56 expressions took between 30 and 50 seconds in Ibis 8/9.0, with improvements between 8 and 9.
  • Now we're down to a total compilation time of ~6 seconds for all 56 expressions.

@cpcloud cpcloud added this to the 9.2 milestone Aug 5, 2024