Parallelism in GridSearchCV is ending up with a permission error #12546
can you please try with current master? |
@amueller Thanks for the reply, do you mean to update the version? I don't know what is meant by "current master". |
Yes, update the version to the development version that's on github right now. |
I actually remembered that this has been happening since I upgraded to 0.20, so I downgraded it and it's working now. Thanks for the help |
There are issues in 0.20.0 with joblib. Downgrading is a work-around, but hopefully this will also be solved in 0.20.1 (to be released later this week) or the current development version. |
Using Parallel with |
@tolikkansk it would be great if you could provide a small example reproducing the PermissionError? Also do you know if it's a random failure or if it happens every time you run your code? |
@albertcthomas This failure now happens every time I run the code. I first ran into this problem yesterday; before that, the script worked correctly. I also added the part of the pipeline which I run from main.py. Windows-10 HOME v.1709 16299.847 |
Thanks! Instead of sending pictures, readability and reusability of your code can be greatly improved if you format your code snippets and complete error messages appropriately. For example, wrapping a snippet in triple backticks:

```python
print(something)
```

And the complete traceback:

```pytb
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'hello'
```

Also, if the code is very long you can link to it like this. You can edit your comments at any time to improve readability. This helps maintainers a lot. |
Is there a solution for this issue? @amueller @albertcthomas I am working with the downgraded version 0.19.2, as mentioned by @Sai-Macharla, and so far so good. But this issue wasn't resolved in either 0.20.1 or 0.20.2, right? |
I don't think it's resolved in either 0.20.1 or 0.20.2. |
Also, I don't know what makes it work in 0.19.2 but not in 0.20.2. The default joblib backend is now loky, but this seems to be an issue related to memmapping. |
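Since the warning points at joblib's memmapping of large inputs, one way to check whether memmapping is involved is to disable it with the `max_nbytes` parameter of `joblib.Parallel`. A minimal sketch follows; note this is plain joblib, not a `GridSearchCV` option, since the scikit-learn estimator API does not expose this knob directly.

```python
# Sketch: disabling joblib's automatic memmapping of large inputs.
# With max_nbytes=None, arrays are pickled to the workers instead of being
# dumped to a temporary memmap folder, so the Windows folder-deletion
# problem discussed in this thread cannot be triggered. Illustration only.
import numpy as np
from joblib import Parallel, delayed

X = np.random.rand(1_000_000)  # large enough to trigger memmapping by default

# Default behaviour memmaps arrays over ~1M bytes; max_nbytes=None turns
# that off entirely, at the cost of copying the data to each worker.
sums = Parallel(n_jobs=2, max_nbytes=None)(
    delayed(np.sum)(X) for _ in range(4)
)
print(len(sums))
```

The trade-off is higher memory use per worker, since each process gets its own copy of the data.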
Currently that code in disk.py catches |
Actually
From what I experimented you sometimes need to wait a few minutes before being able to delete the folder, see this comment in the related joblib issue joblib/joblib#806 |
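The joblib warning in this thread comes from a deletion loop that gives up after 5 attempts; the observation that the folder becomes deletable after a delay suggests retrying with a backoff, roughly like the sketch below. This is an illustration of the pattern, not joblib's actual implementation, and the timings are arbitrary.

```python
# Sketch: retry-with-delay folder deletion. On Windows, a folder cannot be
# removed while another process still holds an open handle on a memmapped
# file inside it, so waiting and retrying sometimes succeeds where an
# immediate delete fails.
import os
import shutil
import tempfile
import time

def delete_folder_with_retries(path, retries=5, delay=0.2):
    """Try to remove `path`, retrying on PermissionError."""
    for attempt in range(retries):
        try:
            shutil.rmtree(path)
            return True
        except PermissionError:
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    return False

folder = tempfile.mkdtemp()
ok = delete_folder_with_retries(folder)
print(ok)
```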
Sorry, I missed the warning. |
I am working with the APS SCANIA dataset and sklearn 0.20.2:

```python
import numpy as np
import pandas as pd
import time
import sklearn
print(sklearn.__version__)

df_train = pd.read_csv("./aps_failure_training_set.csv", na_values=["na"])
df_test = pd.read_csv("./aps_failure_test_set.csv", na_values=["na"])

df_train['class'] = (df_train["class"] == "pos").astype("int")
df_test['class'] = (df_test["class"] == "pos").astype("int")

y_train = df_train['class']
y_test = df_test['class']
X_train = df_train.drop('class', axis=1)
X_test = df_test.drop('class', axis=1)

X_train.fillna(X_train.mean(), inplace=True)
X_test.fillna(X_test.mean(), inplace=True)

def undersample(df_X, df_y):
    # Number of positive-class samples
    num_pos = len(df_y[df_y == 1])
    # Indices of rows with negative values
    indices_neg = df_y[df_y == 0].index
    # Randomly choose that many indices from the negative list
    num_draws = num_pos
    random_indices = np.random.choice(indices_neg, num_draws, replace=False)
    # Indices of rows with positive values
    indices_pos = df_y[df_y == 1].index
    # Indices of the undersampled set
    under_sample_indices = np.concatenate([indices_pos, random_indices])
    # Extract undersampled values from the dataframe
    X_undersample = df_X.loc[under_sample_indices]
    print(X_undersample.shape)
    y_undersample = df_y[under_sample_indices]
    print(y_undersample.shape)
    return (X_undersample, y_undersample)

df_XX = pd.DataFrame(X_train)
X_train_under, y_train_under = undersample(df_XX, y_train)

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

print("First training, model with n_jobs=1")
clf = RandomForestClassifier()
gs = RandomizedSearchCV(
    clf,
    param_distributions={"n_estimators": np.arange(5, 50, 5),
                         "max_depth": np.arange(5, 8, 1)},
    n_iter=10, cv=5, scoring="accuracy", verbose=1, n_jobs=-1)
start_1 = time.time()
gs.fit(X_train_under, y_train_under)
print("Best params", gs.best_params_)
best_clf = gs.best_estimator_
print("results at each iteration:", gs.cv_results_['mean_test_score'])
print("Took %s seconds" % (time.time() - start_1))
```

This returns me the following error:

```
C:\Program Files\Python36\lib\site-packages\sklearn\externals\joblib\disk.py:122: UserWarning: Unable to delete folder C:\Users\kjn-lc\AppData\Local\Temp\joblib_memmapping_folder_7712_1434446855 after 5 tentatives.
  .format(folder_path, RM_SUBDIRS_N_RETRY))
```
|
Thanks a lot @lucascolz! |
Interestingly,

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

for _ in range(10):
    X_train = np.random.rand(int(2e6)).reshape((int(1e6), 2))
    y_train = np.random.randint(0, 2, int(1e6))
    X_train = pd.DataFrame(X_train)
    clf = RandomForestClassifier()
    gs = RandomizedSearchCV(
        clf,
        param_distributions={"n_estimators": np.array([1]),
                             "max_depth": np.array([2])},
        n_iter=1,
        cv=2,
        scoring="accuracy",
        verbose=1,
        n_jobs=2
    )
    gs.fit(X_train, y_train)
```

always fails (never at the first iteration of the for loop). Note the use of a pandas dataframe for X_train. However, the same loop where X_train stays a numpy array:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

for _ in range(10):
    X_train = np.random.rand(int(2e6)).reshape((int(1e6), 2))
    y_train = np.random.randint(0, 2, int(1e6))
    clf = RandomForestClassifier()
    gs = RandomizedSearchCV(
        clf,
        param_distributions={"n_estimators": np.array([1]),
                             "max_depth": np.array([2])},
        n_iter=1,
        cv=2,
        scoring="accuracy",
        verbose=1,
        n_jobs=2
    )
    gs.fit(X_train, y_train)
```

does not fail. |
@lucascolz do you see the error every time you run your code? Can you try by passing numpy arrays instead of pandas dataframes?

```python
X_train_under = X_train_under.values
y_train_under = y_train_under.values
```
|
@albertcthomas When I am able to use the same computer again, I will tell you whether the error persists. |
thanks @lucascolz. |
@albertcthomas Thanks for providing the snippets. With a dataframe as X_train:

```
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.5s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.5s finished
C:\Users\lucas\Anaconda3\lib\site-packages\sklearn\externals\joblib\disk.py:122: UserWarning: Unable to delete folder C:\Users\lucas\AppData\Local\Temp\joblib_memmapping_folder_113764_4112142399 after 5 tentatives.
  .format(folder_path, RM_SUBDIRS_N_RETRY))
```

The run with numpy arrays:

```
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.4s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.4s finished
C:\Users\lucas\Anaconda3\lib\site-packages\sklearn\externals\joblib\disk.py:122: UserWarning: Unable to delete folder C:\Users\lucas\AppData\Local\Temp\joblib_memmapping_folder_113764_4112142399 after 5 tentatives.
  .format(folder_path, RM_SUBDIRS_N_RETRY))
```

In scikit-learn 0.19.2, I tried running the code you provided, so:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
import sklearn
print(sklearn.__version__)

for _ in range(10):
    X_train = np.random.rand(int(2e6)).reshape((int(1e6), 2))
    y_train = np.random.randint(0, 2, int(1e6))
    X_train = pd.DataFrame(X_train)
    clf = RandomForestClassifier()
    gs = RandomizedSearchCV(
        clf,
        param_distributions={"n_estimators": np.array([1]),
                             "max_depth": np.array([2])},
        n_iter=1,
        cv=2,
        scoring="accuracy",
        verbose=1,
        n_jobs=-1
    )
    gs.fit(X_train, y_train)
```

It runs the random search without error. I have to check the library dependencies, as on another computer it ran smoothly with 0.20. If you have any idea what I can still try, I am open to suggestions. In the meanwhile, I will continue with 0.19. |
Thanks for the report. |
@lucascolz just to confirm: on your other computer you are also using Windows? |
@albertcthomas Yes, I am using Windows 10 64 bits in all machines. |
Actually, when in the same ipython session I first run the snippet with |
This tends to happen to me also. I think I will stick to the numpy array structure for as long as I can, or use 0.19. I wasn't able to debug why exactly this problem happens. |
So you are saying that when you use numpy arrays you don't have the permission error? Thanks @lucascolz for helping us investigate this issue. |
Yes, in my last tests it happened the same as you mentioned. Numpy arrays work (at least in the implementation I am working on), and if I try a dataframe, it tends to raise an error after the fifth iteration. |
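The workaround that emerges from this exchange — converting pandas objects to plain numpy arrays before fitting — can be sketched as follows. This is a minimal illustration with synthetic data; the tiny parameter grid is only there to keep it fast.

```python
# Sketch of the workaround discussed above: pass numpy arrays rather than
# pandas objects to the search, which avoids the memmap-cleanup failure
# reported with DataFrames on Windows. Synthetic data, illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

df = pd.DataFrame(np.random.rand(200, 2), columns=["f0", "f1"])
y = pd.Series(np.random.randint(0, 2, 200))

# Convert to numpy before fitting (.values, or .to_numpy() on recent pandas).
X_arr, y_arr = df.values, y.values

gs = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={"n_estimators": np.array([5]),
                         "max_depth": np.array([2])},
    n_iter=1, cv=2, scoring="accuracy", n_jobs=2)
gs.fit(X_arr, y_arr)
print(type(X_arr).__name__)
```

The column names are lost in the conversion, so keep a reference to `df.columns` if feature names are needed afterwards.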
@Sai-Macharla could you let us know if you were working with pandas dataframes or numpy arrays (for |
Actually even if |
I also have this problem. My workaround was to comment out the folder deletion:

```python
def terminate(self):
    if self._workers is not None:
        # Terminate does not shutdown the workers as we want to reuse them
        # in latter calls but we free as much memory as we can by deleting
        # the shared memory
        # delete_folder(self._workers._temp_folder)
        self._workers = None
    self.reset_batch_stats()
```
|
Please offer your comment on the joblib issue tracker. |
I have this error on Python 3.6.8, scikit-learn 0.21.1 and joblib 0.13.2, Windows 64-bit. |
Downgrading joblib to 0.11 might be the simplest fix. |
This issue is tracked upstream in joblib/joblib#806 please add any additional comments there instead. |
Thank you very much. It works with joblib 0.11. |
Changing the backend to 'threading' worked for me:
|
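For reference, switching the backend as described above can be done with joblib's `parallel_backend` context manager. A minimal sketch with synthetic data follows; note the GIL caveat raised in the next comment.

```python
# Sketch: forcing the 'threading' joblib backend around a GridSearchCV fit.
# Threads share memory, so no memmap temp folder is created -- but
# CPU-bound estimators are then limited by the GIL. Synthetic data,
# illustration only.
import numpy as np
from joblib import parallel_backend
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.rand(200, 2)
y = np.random.randint(0, 2, 200)

gs = GridSearchCV(RandomForestClassifier(),
                  param_grid={"n_estimators": [5]},
                  cv=2, n_jobs=2)
with parallel_backend("threading"):
    gs.fit(X, y)
print(gs.best_params_)
```

Tree ensembles release the GIL in their Cython inner loops, which is why this backend is not always as slow as the GIL objection suggests, but pure-Python estimators will serialize.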
@dhbrand 'threading' suffers from the Python Global Interpreter Lock. |
My intuition is that pandas is generating cyclic references to the memmapped large numpy array which is therefore collected with a delay. Let me try to write a minimal reproduction case. |
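The cyclic-reference hypothesis can be illustrated in plain Python: an object caught in a reference cycle is not freed when its last name is dropped, only when the cyclic garbage collector runs. The sketch below is a generic illustration, unrelated to pandas internals.

```python
# Sketch: why a reference cycle delays cleanup. The finalizer of an object
# in a cycle only runs when gc.collect() breaks the cycle, not when the
# last name referring to it is dropped -- analogous to a memmap being kept
# alive past the temp-folder deletion attempt.
import gc

freed = []

class Holder:
    def __init__(self):
        self.cycle = self  # self-reference: creates a cycle
    def __del__(self):
        freed.append(True)

h = Holder()
del h              # refcount never reaches zero: the cycle keeps it alive
print(len(freed))  # prints 0: nothing freed yet
gc.collect()       # the cycle detector finds and frees it
print(len(freed))  # prints 1
```

If a memmap-backed array is trapped in such a cycle, its file handle stays open until the collector happens to run, which on Windows blocks deletion of the containing folder.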
@ogrisel This comment above contains a reproduction case, but we might be able to have a smaller one, only involving joblib. |
Yes this is what I am looking for. |
I cannot reproduce the error in the reproduction case above with scikit-learn 0.21.3 and joblib 0.14.0... |
I have set up a Windows VM to debug this, but unfortunately I cannot get #12546 (comment) to fail on this machine, which is going to make things harder to debug and fix on my end. I will still try to come up with a blind minimal reproducing example based on the reference cycle hypothesis. |
Yes I cannot reproduce the error either. I also tried with previous scikit-learn versions (0.20 and 0.19). |
The fact that I cannot reproduce with a VM might be caused by the fact that memory mapped files might behave differently in a VM. I will try to reproduce with a CI worker in this PR: joblib/joblib#942 |
I think we can close this issue as I cannot reproduce on Windows (with scikit-learn 0.21.3, pandas 0.25.1 and joblib 0.14.0) and the associated test was not failing on Appveyor in the related joblib PR (see joblib/joblib#942). |
Indeed it seems that it's no longer possible to reproduce the original issue with the latest versions of pandas / scikit-learn / joblib. We will still work on a redesign of how the temporary folder cleanup happens in joblib with the loky backend so as to fix the two minimal reproducing examples reported as joblib/joblib#942 and joblib/joblib#944 but those specific cases do not seem to be triggered when using the scikit-learn estimator API. |
I was able to replicate this issue with the following code. The issue appears to happen somehow when loading the saved arrays (Python 3.7.5). Here is the code:
Now load and pass to model:
Output:
Note, this does not appear to cause an error if you just use the arrays as-is and don't save/load them. In my case, it is pretty expensive to create those arrays each time, which is why I was saving and loading them. |
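One hedged workaround for the save/load path described above is to force the loaded data into a fresh in-memory array before fitting, so nothing tied to the file on disk is handed to the estimator. This is a sketch with synthetic data; the file name is made up.

```python
# Sketch: np.save / np.load round-trip, then copy into a fresh array so the
# object passed to the estimator has no ties to the file on disk. The
# default np.load (mmap_mode=None) already reads fully into memory; the
# explicit copy is belt-and-braces for the case described above.
import os
import tempfile
import numpy as np

X = np.random.rand(100, 3)
path = os.path.join(tempfile.mkdtemp(), "features.npy")
np.save(path, X)

loaded = np.load(path)                  # fully in memory by default
X_fresh = np.ascontiguousarray(loaded)  # independent in-memory copy

print(np.array_equal(X, X_fresh))
```

If `mmap_mode="r"` was used when loading, this copy is what detaches the data from the open file handle.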
Please try with scikit-learn 0.22. |
Same issue on Windows 10 (GridSearchCV and XGBoost) while using n_jobs=-1 in GridSearchCV: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Username\AppData\Local\Temp\joblib_memmapping_folder_3836_5434669858\3836-1959279810320-4711347bf8754e4c9388340d5d0b4491.pkl' (Python 3.6.6) |
This error is likely to be fixed by the work currently being done on the joblib/loky side. In the meantime, @mannyfin, can you please edit your comment to wrap the code and error message in triple backticks? That will generate something easier to read:

```python
print(something)
```
|
I am running with scikit-learn==0.21.2 |
Hi, the same problem still continues today. Also, the problem is not specific to GridSearchCV: MultiOutputClassifier has the same problem. I use both GridSearchCV and MultiOutputClassifier and I tried all combinations of n_jobs for them (e.g., both with n_jobs=-1, or only one of them with n_jobs=-1). For instance, if I set n_jobs=-1 only on MultiOutputClassifier, I receive the error immediately. As I make n_jobs smaller, e.g., -11, the error pops up later. |
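The nesting described above — a grid search over a multi-output estimator, each layer with its own `n_jobs` — can be sketched like this. Synthetic two-target data and small values are used purely to keep the illustration cheap; the commenter reports the error with `n_jobs=-1` on either layer.

```python
# Sketch of the GridSearchCV + MultiOutputClassifier nesting described
# above. Each layer takes its own n_jobs, so two pools of joblib workers
# can be active at once. Illustration only; not a reproduction of the bug.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier

X = np.random.rand(100, 4)
Y = np.random.randint(0, 2, (100, 2))   # two binary targets

est = MultiOutputClassifier(RandomForestClassifier(n_estimators=5), n_jobs=2)
gs = GridSearchCV(est,
                  param_grid={"estimator__max_depth": [2, 3]},
                  cv=2, n_jobs=2)
gs.fit(X, Y)
print(gs.best_params_)
```

The `estimator__` prefix routes the grid parameter to the wrapped classifier, which is the standard nested-parameter convention of the scikit-learn API.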
joblib 0.15.0 should fix the remaining |
Description: Parallelism (n_jobs=-1) in GridSearchCV is stopping with a permission error.
Steps/Code to Reproduce:
Expected Results: No error is expected.
Actual Results
Versions
Windows-10-10.0.17134-SP0
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]
NumPy 1.15.2
SciPy 1.1.0
Scikit-Learn 0.20.0