Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable aggregation expr optimization due to N squared performance #2830

Merged

Conversation

emilyreff7
Copy link
Contributor

Currently, on instantiation of the Aggregation node, ibis calls into this expr optimization logic, where this line will attempt to simplify a projection by lifting the root. This optimization method is O(n) as it iterates over all the root.selections (here and here), and in practice when called inside the Aggregation constructor, the outcome is O(n^2) as this logic runs for each table column - the result can be very slow expression creation on wide tables when there is a Selection followed by an aggregation.

# create wide table
N = 3000
df = pd.DataFrame({'a': [1, 2, 3]})
cols = {f'col_{i}': [1] * 3 for i in range(N)}
df = df.assign(**cols)
con = Backend().connect({'df': df})
table = con.table('df')

@udf.reduction(input_type=['double'] * N, output_type='double')
def my_reduction_fn(*args, **kwargs):
    return 3.0

# define expr with selection followed by aggregation with all input columns
expr = table.mutate(a=table['a'].cast('float')
all_cols = tuple(expr[col] for col in expr.columns if col != 'a')

%%time
expr = expr.groupby(expr.a).aggregate(result=my_reduction_fn(*all_args))
Wall time: 3.13s

Performance from here is quadratic:

6,000 cols -> Wall time 12.9s  (x2 cols, x2^2 performance)
9,000 cols -> Wall time 28.9 s (x3 cols, x2^3 performance)

For now it is likely better to disable this optimization inside the Aggregation construction until we can improve the performance.

@jreback jreback added this to the Next release milestone Jun 21, 2021
@jreback jreback added the expressions Issues or PRs related to the expression API label Jun 21, 2021
@jreback
Copy link
Contributor

jreback commented Jun 21, 2021

looks fine, can you add a release note & make a tracking issue about this? (e.g. include a ref to the PR summary)

@jreback jreback merged commit dcb4dac into ibis-project:master Jun 22, 2021
@cpcloud cpcloud removed this from the Next release milestone Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants