Multiprocessed feature extraction is not supported on Windows #51

Open
jvdd opened this issue Dec 10, 2021 · 2 comments
Labels
wontfix This will not be worked on

Comments

jvdd commented Dec 10, 2021

Currently we do not support multiprocessed feature extraction on Windows, and we do not plan (in the near future) to work on this.

What do we use for multiprocessing?
For our multiprocessing functionality we use multiprocess, a fork of multiprocessing that uses dill under the hood for serialization. In contrast to multiprocessing, this package can send lambdas, locally defined functions, and imported functions to worker processes, and it also supports sharing (simple) data in memory.
We believe that users should be able to conveniently define their functions as imports from packages, inline-defined functions, or even lambdas. Therefore we will not switch to multiprocessing.
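To illustrate the limitation that motivates this choice, here is a minimal sketch (not tsflex code) that checks what the standard-library pickle, which multiprocessing relies on, can serialize. The helper names `can_pickle`, `make_local_function`, and `square` are made up for this example; the dill round trip only runs if dill happens to be installed.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_local_function():
    # A locally defined (nested) function, similar to what users may pass in.
    def local_mean(values):
        return sum(values) / len(values)

    return local_mean


square = lambda x: x * x  # hypothetical user-defined feature function

results = {
    "builtin len": can_pickle(len),
    "lambda": can_pickle(square),
    "local function": can_pickle(make_local_function()),
}

# Optional: dill serializes the lambda by value, so it survives a round trip.
try:
    import dill
except ImportError:
    dill = None

if dill is not None:
    restored = dill.loads(dill.dumps(square))
    results["lambda via dill"] = restored(3) == 9

for name, ok in results.items():
    print(f"{name}: {ok}")
```

Plain pickle stores functions by reference (module plus qualified name), which is why lambdas and nested functions cannot be looked up again on the receiving side.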

What is the problem with Windows?
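Serialization works differently on Windows than on Linux (and MacOS): Windows cannot fork, so worker processes start as fresh interpreters instead of as copies of the parent.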
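As background, and as an assumption about where the difference comes from rather than something stated in this issue, the standard-library multiprocessing module reports which start methods a platform offers. On Windows only spawn is available. The variable names below are made up for this sketch, and it does not start any processes.

```python
import multiprocessing as mp
import sys

# Start methods offered by the running platform; "spawn" exists everywhere.
available = mp.get_all_start_methods()

# With "fork" the child starts as a copy of the parent process.
# With "spawn" a fresh interpreter starts, and the process object (including
# its target and arguments) must be serialized and sent to the child.
needs_full_serialization = "fork" not in available

print(f"platform={sys.platform} start methods={available}")
print(f"only spawn-style start methods: {needs_full_serialization}")
```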

So why will we not set dill.settings["recurse"] = True if the OS is Windows?
Although this works in most cases, we do not believe that the outcome is what we want:

  1. This tends to be slower.
  2. This fails when multiprocessing a feature collection that contains nested functions (see: closured function fails with RecursionError uqfoundation/dill#211). For example, the feature functions of tsfresh contain such nested functions.

Some related issues (I think);

NOTE: multiprocessing is still supported on Linux & MacOS 😄

@jvdd jvdd added the wontfix This will not be worked on label Dec 10, 2021
@jvdd jvdd added help wanted Extra attention is needed and removed wontfix This will not be worked on labels Mar 28, 2022

jvdd commented Mar 28, 2022

If the community has a proper fix for this, feel free to create a PR! 😃

(This is currently a low-priority feature for the core devs, as we are all working on Linux.)

@jvdd jvdd added wontfix This will not be worked on and removed help wanted Extra attention is needed labels Jan 29, 2023

jvdd commented Jan 29, 2023

After revisiting this open issue with the latest version of dill supporting nested function pickling, we can now support multiprocessing on Windows. However, I noticed in PR #91 that multiprocessing on Windows can be quite slow compared to sequential feature extraction, at best showing only a 5% improvement. As a result, I've decided to close PR #91 and leave the implementation as is.
