`asPointwise` with `n_jobs>1` does not progress #19
Comments
Thanks, I really appreciate you bringing up these issues and a possible fix. Would you have an example to reproduce this? For example this completes with no problem for me:
Edit: I managed to reproduce the error using asPointwise directly rather than fit_pw. I don't know why the latter works while the former gets stuck like this. If I don't find a solution with multiprocessing, I'll try to implement your suggestion with joblib.
Oh I was about to explain more but I see your edit. I was specifically using the ESS estimator which does not have a built-in
This code completes quickly for LPCA with n_jobs=1 or 2, and for [ESS, MADA, MLE, MOM, TLE] with n_jobs=1.
Thanks for looking into this, and for the great library! I had a question about terminology. Estimators like LPCA inherit from Based on these descriptions, it seems like the class names of
You can use ESS pointwise like so:
So you can do this in your code:
For lPCA I made an exception: most people are used to running PCA on the entire dataset to estimate ID, so this is the default when using However, the paper references associated with this class instead apply PCA locally and aggregate the estimates to obtain a global ID. Indeed, ID estimation using PCA on an entire dataset will fail for non-linear data. So arguably PCA is more of a local ID estimator that tries to obtain the ID of the manifold's tangent space. Hence the class name lPCA (local PCA), which points this out while keeping the default use global. Overall, I am trying to keep users from using estimators inadequately. I agree the terminology is confusing, so suggestions are very welcome.
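To illustrate the point about non-linear data, here is a minimal sketch using plain NumPy rather than skdim. The helper names (`make_helix`, `pca_dimension`, `local_pca_dimensions`) and the 95% explained-variance rule are assumptions made for this illustration; they are not the lPCA implementation. The data is a helix in 3-D, a curve whose intrinsic dimension is 1.

```python
import numpy as np


def make_helix(n_points):
    """Hypothetical test data: two turns of a helix in 3-D (a 1-D curve)."""
    t = np.linspace(0.0, 4.0 * np.pi, n_points)
    return np.column_stack([np.cos(t), np.sin(t), t / (2.0 * np.pi)])


def pca_dimension(points, var_threshold=0.95):
    """Number of principal components needed to reach `var_threshold`
    of the total variance (an assumed, simple thresholding rule)."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(explained, var_threshold) + 1)


def local_pca_dimensions(X, n_neighbors=10):
    """Run `pca_dimension` on the nearest-neighbor patch around each point."""
    # Brute-force squared distances; acceptable for a small example.
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    neighbor_idx = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    return np.array([pca_dimension(X[idx]) for idx in neighbor_idx])


if __name__ == "__main__":
    X = make_helix(1000)
    print("global PCA estimate:", pca_dimension(X))
    print("mean of local PCA estimates:", local_pca_dimensions(X).mean())
```

Global PCA sees the helix spread across all three axes, while each small patch is close to a straight segment, which is why aggregating local estimates recovers the dimension of the curve.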
I see, that makes much more sense and is simpler. I understand why Here's a suggestion then: in the README you use DANCo as an example of a global estimator (fine), and lPCA as an example of a local estimator. Then when I look up their implementation I see they both inherit from
I'll put this on my to-do list: the lPCA exception should probably be removed, and the docs should provide a clear example with an lPCA.fit_once method that applies PCA "as usual" a single time on the whole dataset.
Can you give me access rights to make branches and push? I will make a PR switching this function to joblib, and then another for adding parallelism internally for estimators like ESS.
Maybe I need to request that you be added to the scikit-learn-contrib org. I don't see an option to give you access rights myself.
Actually, you should be able to open a PR even without being a member.
No, I have no option to make a PR or even a new branch on this repo. I tried pushing a branch directly but I get a permissions error.
Could you try again following the steps at https://github.com/firstcontributions/first-contributions?
Closed by PR #23
(This was tested after fixing another issue locally #17)
scikit-dimension/skdim/_commonfuncs.py
Line 105 in bde433a
`asPointwise` works as intended when `n_jobs=1`. However, in my experience, when `n_jobs>1` the async call does not complete.
I can recommend using the `joblib` library instead of `multiprocessing`, for example:
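A minimal sketch of what a joblib-based pointwise loop could look like. This is not the snippet from the issue and not skdim's code: `pointwise_estimates` and `toy_local_dimension` are hypothetical names, and the toy estimator only stands in for a real one such as ESS. The only joblib calls used are `Parallel` and `delayed`.

```python
import numpy as np
from joblib import Parallel, delayed


def toy_local_dimension(neighborhood, var_threshold=0.99):
    """Hypothetical stand-in estimator: number of principal components
    needed to explain `var_threshold` of the variance in one neighborhood."""
    centered = neighborhood - neighborhood.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(explained, var_threshold) + 1)


def pointwise_estimates(X, estimate_fn, n_neighbors=20, n_jobs=1):
    """Apply `estimate_fn` to the nearest-neighbor patch around every point."""
    # Brute-force squared distances; acceptable for a small example.
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    neighbor_idx = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    # One task per point; joblib distributes the tasks over n_jobs workers.
    results = Parallel(n_jobs=n_jobs)(
        delayed(estimate_fn)(X[idx]) for idx in neighbor_idx
    )
    return np.asarray(results)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 300 points on a flat 2-D patch, padded with zeros to live in 5-D.
    X = np.hstack([rng.uniform(size=(300, 2)), np.zeros((300, 3))])
    serial = pointwise_estimates(X, toy_local_dimension, n_jobs=1)
    parallel = pointwise_estimates(X, toy_local_dimension, n_jobs=2)
    print("serial and parallel agree:", np.array_equal(serial, parallel))
```

Because `Parallel` returns results in task order, the output lines up with the rows of `X` regardless of `n_jobs`, so the serial and parallel paths can be compared directly.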