Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT-#5809: New implementation of the Ray lazy execution queue #6731

Merged
merged 19 commits into from Feb 5, 2024

Conversation

AndreyPavlenko
Copy link
Collaborator

For details, please read the description of the _DeferredExecution class.

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Evaluate the pros and cons of lazy functions submission (via partition.add_to_apply_calls) #5809
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@dchigarev
Copy link
Collaborator

Looks promising! I tried to test it on a simple example and the results are quite good. The second case is the worst one - it's when we need to return a temporary result for each node. It's a bit slower than the current master, but I think that it shouldn't occur that often.

image

script to measure
import modin.pandas as pd
import numpy as np
from timeit import default_timer as timer
import modin.config as cfg

N = 10_000_000
M = 10

# initialize workers and trigger import
pd.DataFrame(np.arange(cfg.NPartitions.get() * cfg.MinPartitionSize.get())).to_numpy()
df = pd.DataFrame({f"col{i}": np.arange(N) for i in range(M)})._query_compiler._modin_frame
df.wait_computations()

# CASE 1, completely lazy:
t1 = timer()

df1 = df.map(lambda df: df + 10)
df2 = df1.map(lambda df: df.astype(float))
df3 = df2.map(lambda df: df * 3)
df4 = df3.map(lambda df: df.fillna(0))
df5 = df4.map(lambda df: df / 10)
del df
del df1
del df2
del df3
del df4

df5.wait_computations()
print(timer() - t1)

# CASE 2, return temporary result for each step
t1 = timer()

df1 = df.map(lambda df: df + 10)
df2 = df1.map(lambda df: df.astype(float))
df3 = df2.map(lambda df: df * 3)
df4 = df3.map(lambda df: df.fillna(0))
df5 = df4.map(lambda df: df / 10)
# del df
# del df1
# del df2
# del df3
# del df4

df5.wait_computations()
print(timer() - t1)

# CASE 3, return temporary result for some steps
t1 = timer()

df1 = df.map(lambda df: df + 10)
df2 = df1.map(lambda df: df.astype(float))
df3 = df2.map(lambda df: df * 3)
df4 = df3.map(lambda df: df.fillna(0))
df5 = df4.map(lambda df: df / 10)
del df
# del df1
del df2
# del df3
del df4

df5.wait_computations()
print(timer() - t1)

I'll take a deeper dive into the code later, but as a first sketch it looks pretty good!

@Garra1980
Copy link
Collaborator

Has problem mentioned here gone away with this PR?

@AndreyPavlenko
Copy link
Collaborator Author

Has problem mentioned here gone away with this PR?

Yes, it has. #6695 (comment)

@anmyachev
Copy link
Collaborator

@AndreyPavlenko could you rebase on master in order to be able to see performance with the latest changes?

@AndreyPavlenko
Copy link
Collaborator Author

@AndreyPavlenko could you rebase on master in order to be able to see performance with the latest changes?

Done.

@anmyachev
Copy link
Collaborator

@AndreyPavlenko could you rebase on master in order to be able to see performance with the latest changes?

Done.

I see a slowdown on our workload of about 20-25 seconds.

@AndreyPavlenko AndreyPavlenko force-pushed the issue-5809 branch 4 times, most recently from 8024762 to 8b66ab9 Compare November 29, 2023 22:04
Comment on lines 140 to 141
# If there are no subscribers, we still need the result here.
self.subscribe()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get it, we don't call .unsubscribe() at the end of the function, aren't we just adding a dummy reference here that will never be removed?

Also, shouldn't the subscribe call be under this condition to match the comment?

Suggested change
# If there are no subscribers, we still need the result here.
self.subscribe()
if self.subscribers == 0:
# If there are no subscribers, we still need the result here.
self.subscribe()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the execution, this node will never be executed again and this counter has no impact.

Anyway, this new implementation is broken now. Working on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now. There were an issue after #6883 introduced a new remote function, that returns multiple values.

@AndreyPavlenko AndreyPavlenko force-pushed the issue-5809 branch 3 times, most recently from cf96119 to 38417a3 Compare February 2, 2024 17:51
Comment on lines +129 to +130
if LazyExecution.get():
return self.__constructor__(de)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreyPavlenko when and in what form do you plan to add tests when LazyExecution is enabled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we shouldn't overload the unit testing, but instead schedule weekly runs with this option enabled.

Copy link
Collaborator

@anmyachev anmyachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreyPavlenko thank you for responding to all my comments, there were a lot of them.

LGTM!

Copy link
Collaborator

@dchigarev dchigarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreyPavlenko thank you for the PR, great work!

@YarShev YarShev merged commit 382bf9d into modin-project:master Feb 5, 2024
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evaluate the pros and cons of lazy functions submission (via partition.add_to_apply_calls)
5 participants