Unnecessary ._copartition() on binary operations #6979

Closed
dchigarev opened this issue Feb 28, 2024 · 0 comments · Fixed by #6980
Labels: P1 (Important tasks that we should complete soon), Performance 🚀 (Performance related issues and pull requests)

Comments

@dchigarev
Collaborator

quote from this comment:

This block of binary operations unnecessarily triggers lazy execution because of the ._copartition() call, which ensures that the indices and partitioning of both arguments are equal before a binary operation is performed:

df['a'] = (df.l_extendedprice) * (1 - (df.l_discount))
df['b'] = (((df.l_extendedprice) * (1 - (df.l_discount))) * (1 + (df.l_tax)))

The current implementation of ._copartition() simply triggers computation of the actual index and the actual row lengths for both dataframes and then compares them. In this case, however, we know that all arguments are views of the same dataframe, so their indices and partitioning are identical and the check can potentially be skipped. A mechanism for detecting sibling frames when comparing indices/partitioning was already implemented in #6491. What we can do here is simply enable it for the ._copartition() method as well.
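To make the idea concrete, here is a minimal sketch of a sibling check that short-circuits an index comparison. It is not Modin's implementation: `LazyFrame`, `root`, `is_sibling`, and `needs_copartition` are hypothetical names invented for illustration, and the sketch assumes that a view keeps all rows of its parent (as column selection does).

```python
class LazyFrame:
    """Toy lazily evaluated frame (hypothetical; not Modin's actual class)."""

    def __init__(self, compute_index, root=None):
        self._compute_index = compute_index  # deferred, possibly expensive
        self._index = None                   # cache, filled on first access
        # Views inherit the root of the frame they were taken from.
        self.root = self if root is None else root

    @property
    def index(self):
        # Accessing the index is what forces the deferred computation.
        if self._index is None:
            self._index = list(self._compute_index())
        return self._index

    def view(self):
        """Return a column-style view that keeps the parent's rows."""
        return LazyFrame(self._compute_index, root=self.root)


def is_sibling(left, right):
    """Two frames are siblings if they derive from the same root frame."""
    return left.root is right.root


def needs_copartition(left, right):
    """Return True if the operands must be realigned before a binary op."""
    if is_sibling(left, right):
        # Same origin, same rows: skip the check without materializing.
        return False
    return left.index != right.index  # forces computation of both indices


calls = []  # records every time the expensive index is actually computed


def expensive_index():
    calls.append(1)
    return range(4)


df = LazyFrame(expensive_index)
price, discount = df.view(), df.view()

# Siblings: no index is computed.
sibling_result = needs_copartition(price, discount)
calls_after_sibling_check = len(calls)

# Unrelated frame: the indices have to be computed and compared.
other = LazyFrame(lambda: range(3))
foreign_result = needs_copartition(price, other)
calls_after_foreign_check = len(calls)
```

In this sketch the sibling comparison never touches `index`, so the deferred computation stays deferred. Only the comparison against an unrelated frame pays the materialization cost.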

@dchigarev dchigarev added Performance 🚀 Performance related issues and pull requests. P1 Important tasks that we should complete soon labels Feb 28, 2024
@dchigarev dchigarev self-assigned this Feb 28, 2024
dchigarev added a commit to dchigarev/modin that referenced this issue Feb 28, 2024
…al indices on binary operations

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
anmyachev pushed a commit that referenced this issue Feb 29, 2024
… binary operations (#6980)

Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>