ValueError: assignment destination is read-only, when parallelizing with n_jobs > 1 #5956
Comments
I am taking a look at this
Is it not related to #5481, which seems more generic?
It is, but
Not that it matters, but SparseCoder is an estimator:

```python
from sklearn.base import BaseEstimator
from sklearn.decomposition import SparseCoder

issubclass(SparseCoder, BaseEstimator)  # True
```
I guess the error wasn't detected in #4807 as it is raised only when using
Was there a resolution to this bug? I've run into something similar while doing
Someone ran into the exact same problem on StackOverflow -
@alichaudry I just commented on a similar issue here.
I confirm that there is an error, and it is intermittent in nature. `sklearn.decomposition.SparseCoder(D, transform_algorithm='omp', n_jobs=64).transform(X)` fails with `ValueError: assignment destination is read-only` when `X.shape[0] > 4000`. OS: Linux 3.10.0-327.13.1.el7.x86_64
Hi there, OS: OSX
I am currently still dealing with this issue, nearly a year on. This is still an open issue.
If you have a solution, please contribute it, @williamdjones
#4807 is probably the more advanced effort to address this.
@williamdjones I was not suggesting that it's solved, but that the same issue is reported in a different place, and having multiple issues for the same problem makes it harder to keep track.
Not sure where to report this, or if it's related, but I get the
@JGH1000 NOT A SOLUTION, but I would try using a random forest for feature selection instead, since it is stable and has working joblib functionality. See the sketch below.
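A minimal sketch of the random-forest alternative suggested here, assuming a small-sample, high-dimensional dataset like the ones discussed below (all shapes and names are illustrative, not from the thread):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.RandomState(0)
X = rng.randn(50, 5000)         # small sample size, high dimensionality
y = rng.randint(0, 2, size=50)

# Rank features by impurity-based importance and keep those above the median;
# n_jobs=-1 parallelizes tree fitting without the read-only memmap problem.
forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
selector = SelectFromModel(forest, threshold='median').fit(X, y)
X_selected = selector.transform(X)
```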
Thanks @williamdjones, I used several different methods but found that RandomizedLasso works best for a couple of particular datasets. In any case, it works, just a bit slowly. Not a deal breaker.
@JGH1000 No problem. If you don't mind, I'm curious about the dimensionality of the datasets for which RLasso was useful versus those for which it was not.
@williamdjones it was a small-sample (40-50), high-dimensional (40,000-50,000 features) dataset. I would not say that other methods were bad, but RLasso provided results/rankings that were much more consistent with several univariate tests plus domain knowledge. These might not be the 'right' features, but I had more trust in this method. Shame to hear it will be removed from scikit.
The problem still seems to exist on a 24-core Ubuntu machine for RLasso with n_jobs=-1 and sklearn 0.19.1.
@coldfog @garciaev I don't know if it is still relevant for you, but I ran into the same problem using joblib without scikit-learn. The reason is the max_nbytes parameter in the Parallel invocation of the joblib library when you set n_jobs>1; it is 1M by default. The definition of this parameter is: "Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder". So, once the arrays pass the size of 1M, joblib memory-maps them read-only, and any in-place write in a worker throws "ValueError: assignment destination is read-only". To overcome this, the parameter has to be set higher, e.g. max_nbytes='50M'. If you want a quick fix, you can add max_nbytes='50M' to the file "sklearn/decomposition/dict_learning.py" at line 297, where the Parallel class is instantiated, to increase the allowed size of temporary files.
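A minimal standalone sketch of this behaviour with joblib alone (the array sizes and the worker function are illustrative, not from the thread):

```python
import numpy as np
from joblib import Parallel, delayed

def write_inplace(block):
    # With the default max_nbytes='1M', joblib memory-maps arrays larger than
    # 1 MB read-only, so this in-place write raises
    # "ValueError: assignment destination is read-only".
    block[:] = 0.0
    return block.sum()

arrays = [np.ones(500000) for _ in range(4)]  # ~4 MB each, above the 1M default

# Raising the threshold keeps these arrays out of the memmapping path.
results = Parallel(n_jobs=2, max_nbytes='50M')(
    delayed(write_inplace)(a) for a in arrays
)
```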
Just to complement @lvermue's answer: I did what he suggested, but instead inside
And you can find |
When I run `SparseCoder` with `n_jobs > 1`, there is a chance of raising the exception `ValueError: assignment destination is read-only`. The code is shown below. The bigger `data_dims` is, the higher the chance. When `data_dims` is small (lower than 2000, I verified), everything works fine. Once `data_dims` is bigger than 2000, there is a chance of getting the exception. When `data_dims` is bigger than 5000, it is raised 100% of the time.

My version info:
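The original snippet was not preserved here; a minimal reconstruction, assuming `data_dims` is the number of samples and the data is random (all shapes and values are illustrative):

```python
import numpy as np
from sklearn.decomposition import SparseCoder

data_dims = 5000                  # reported to fail every time above 5000
rng = np.random.RandomState(0)
D = rng.randn(128, 64)            # dictionary: n_atoms x n_features
X = rng.randn(data_dims, 64)      # data:       n_samples x n_features

coder = SparseCoder(D, transform_algorithm='omp', n_jobs=4)
code = coder.transform(X)         # can raise: assignment destination is read-only
```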
OS: OS X 10.11.1
python: Python 2.7.10 |Anaconda 2.2.0
numpy: 1.10.1
sklearn: 0.17
The full error information is shown below.