Use pure multiprocessing for Dask #6922

Closed
YarShev opened this issue Feb 6, 2024 · 0 comments · Fixed by #6923
Labels
Dask ⚡: Issues related to the Dask engine
new feature/request 💬: Requests and pull requests for new features

Comments

Collaborator

YarShev commented Feb 6, 2024

On a customer benchmark I am measuring, read_csv is significantly slower with the default client than with an explicitly configured one.

read_csv: 13.72040319442749  # Client(): workers use both threads and processes
read_csv: 3.934723377227783  # Client(n_workers=16, threads_per_worker=1): pure multiprocessing

We can set DaskThreadsPerWorker to 1 to get pure multiprocessing in Dask.
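
For reference, here is a minimal sketch of what "pure multiprocessing" means at the Dask level. It is not Modin's implementation. The helper names `pure_multiprocessing_kwargs` and `start_pure_multiprocessing_client` are hypothetical, and defaulting the worker count to the number of logical CPUs is an assumption. Only `Client(n_workers=..., threads_per_worker=1)` comes from the measurement above.

```python
import os


def pure_multiprocessing_kwargs(n_workers=None):
    """Return Client keyword arguments for single-threaded worker processes."""
    if n_workers is None:
        # Assumption: one worker process per logical CPU is a sensible default.
        n_workers = os.cpu_count() or 1
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # threads_per_worker=1 is what makes the setup "pure multiprocessing".
    return {"n_workers": n_workers, "threads_per_worker": 1}


def start_pure_multiprocessing_client(n_workers=None):
    """Start a local Dask client whose workers each run a single thread."""
    # Imported lazily so the helper above can be used without dask installed.
    from distributed import Client

    return Client(**pure_multiprocessing_kwargs(n_workers))
```

Calling `start_pure_multiprocessing_client(16)` should correspond to the faster configuration measured above. Because this starts worker processes, it is safest to call it under an `if __name__ == "__main__":` guard.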

YarShev added the "new feature/request 💬" and "Dask ⚡" labels Feb 6, 2024
YarShev added a commit to YarShev/modin that referenced this issue Feb 6, 2024
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
anmyachev pushed a commit that referenced this issue Feb 6, 2024
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>