Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Rewrite pandas execution to use topological sort instead of recursion #1758

Merged
merged 13 commits into from
May 16, 2019

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Apr 13, 2019

cc @toryhaavik for review

@cpcloud cpcloud added this to the Next Feature Release milestone Apr 13, 2019
ibis/expr/window.py Outdated Show resolved Hide resolved
ibis/pandas/client.py Outdated Show resolved Hide resolved
ibis/file/parquet.py Outdated Show resolved Hide resolved
ibis/file/hdf5.py Outdated Show resolved Hide resolved
ibis/file/csv.py Outdated Show resolved Hide resolved
ibis/pandas/client.py Outdated Show resolved Hide resolved
ibis/pandas/execution/generic.py Show resolved Hide resolved
@cpcloud cpcloud force-pushed the pandas-toposort branch 3 times, most recently from a72c22c to a341862 Compare April 14, 2019 14:19
@cpcloud cpcloud added feature Features or general enhancements pandas The pandas backend refactor Issues or PRs related to refactoring the codebase labels Apr 14, 2019
ibis/pandas/core.py Outdated Show resolved Hide resolved
ibis/pandas/core.py Outdated Show resolved Hide resolved
# compute the nodes that are not currently in scope
for node in nodes:
# allow clients to pre compute nodes as they like
pre_executed_scope = pre_execute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should talk about this a bit. based on the naming, i would expect pre_execute to happen before execute_first. and i think executing pre_execute in a loop over each node is a little different than how it happens now. we may not have good test cases that exercise the nuances of pre_execute here.

type(query).__name__
)
)
return execute_last(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wouldn't this be part of the main execute? each client needs to call execute_last, and if i call the execution function rather than this method, i won't get the execute_last behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because execute_last will always reset the index because of the default implementation. Window function execution requires the index to be preserved, and it makes recursive calls to execute which means we cannot reset the index.

@cpcloud cpcloud force-pushed the pandas-toposort branch 4 times, most recently from 5ba02f0 to c7df2d1 Compare April 22, 2019 21:58
@cpcloud cpcloud force-pushed the pandas-toposort branch 2 times, most recently from 97b3de5 to 9f552a5 Compare May 14, 2019 20:23
@cpcloud cpcloud self-assigned this May 14, 2019
@cpcloud cpcloud merged commit 063035c into ibis-project:master May 16, 2019
@cpcloud cpcloud deleted the pandas-toposort branch May 16, 2019 00:22
cpcloud added a commit that referenced this pull request Jun 19, 2019
This caused a few failures with downstream code, so I've reverted the
changes to ibis/pandas/core.py.  I've left other fixes from #1758
unchanged.  I'll revisit the reimplementation in the near future.
Author: Phillip Cloud <cpcloud@gmail.com>

Closes #1837 from cpcloud/revert-topo and squashes the following commits:

c9662ac [Phillip Cloud] Just pass kwargs through
5d8ec82 [Phillip Cloud] Make zero argument UDFs work again
9fa4481 [Phillip Cloud] Revert toposort change
d9c00ef [Phillip Cloud] Remove unnecessary code
deed6aa [Phillip Cloud] Revert select parts
cpcloud added a commit that referenced this pull request Jun 20, 2019
Example run:    ```  $ dev/genrelease.py 1.2.0  * :release:`1.2.0
2019-06-19`  * 🐛`1842 major` Add max_lookback to window replace
and combine functions  * 🐛`1837 major` Partially revert #1758  *
:support:`1839` Tighter version spec for pytest  * :support:`1838`
allow pandas timedelta in rows_with_max_lookback  * :support:`1825`
Accept rows-with-max-lookback as preceding parameter  *
:support:`1786` PostGIS support  * :support:`1826` Allow passing a
branch to ci/feedstock.py  ```
Author: Phillip Cloud <cpcloud@gmail.com>

Closes #1845 from cpcloud/release-note-generator and squashes the following commits:

4e7eae1 [Phillip Cloud] Add rls/release to roles
f260477 [Phillip Cloud] SUPP: Generate release notes from commits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements pandas The pandas backend refactor Issues or PRs related to refactoring the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants