Commit
Use multiprocessing pool instead of dask
This change is informed by several issues:

The previous version stacked `delayed` calls and put a delayed call around a generator. Starting with dask 2023.9.2, this led to the traceback attached at the end of this message.

I opted to remove dask completely because, even though I managed to get it working with the newer dask version, the task apparently isn't gaining much from multi-threading. Even with only two workers, CPU cores aren't at 100% on my machine. Using a multi-processing scheduler resulted in errors ("worker unexpectedly finished" or something like that).

So instead I opted to use Python's default multiprocessing pool, which speeds up the example nicely (from 33s to 9s on my machine).

Traceback (most recent call last):
  File "/home/lg/Res/scikit-image/doc/examples/applications/plot_haar_extraction_selection_classification.py", line 81, in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=150,
  [... shortened ...]
  File "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/sklearn/utils/validation.py", line 347, in _num_samples
    raise TypeError(
TypeError: Singleton array array(<generator object <genexpr> at 0x7f5cee64d2a0>, dtype=object) cannot be considered a valid collection.
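For illustration only, the general pattern of mapping a per-image function over a multiprocessing pool might look like the sketch below. It is not the code from this commit: `extract_features` and `compute_features` are hypothetical names, and the row-sum "feature" is a placeholder for the real Haar-like feature computation. The point it shows is that `Pool.map` returns a fully materialized list, so downstream code never receives a bare generator like the one in the traceback.

```python
from multiprocessing import Pool


def extract_features(image):
    """Hypothetical stand-in for an expensive per-image feature extractor."""
    # Sum each row; the real example computes Haar-like features instead.
    return [sum(row) for row in image]


def compute_features(images, n_workers=None):
    """Return one feature vector per image as a list (never a generator)."""
    if n_workers == 1:
        # Serial path: build the list eagerly so callers get a real list.
        return [extract_features(img) for img in images]
    with Pool(processes=n_workers) as pool:
        # Pool.map blocks until all results are ready and returns a list.
        return pool.map(extract_features, images)


if __name__ == "__main__":
    # Tiny synthetic 2x2 "images"; assumed data, for demonstration only.
    images = [[[i, i + 1], [i + 2, i + 3]] for i in range(8)]
    features = compute_features(images, n_workers=2)
    print(len(features), features[0])
```

The worker function is defined at module level so it can be pickled and sent to the worker processes, and the `if __name__ == "__main__":` guard keeps the pool from being re-created when child processes import the module.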