Parallelism in GridSearchCV is ending up with a permission error #12546
can you please try with current master? |
@amueller Thanks for the reply, do you mean to update the version? I don't know what is meant by "current master". |
Yes, update the version to the development version that's on github right now. |
I actually remembered that this has been happening since I upgraded to 0.20, so I downgraded it and it's working now. Thanks for the help |
There are issues in 0.20.0 with joblib. Downgrading is a work-around, but hopefully this will also be solved in 0.20.1 (to be released later this week) or the current development version. |
Using Parallel with |
@tolikkansk it would be great if you could provide a small example reproducing the PermissionError? Also do you know if it's a random failure or if it happens every time you run your code? |
@albertcthomas This failure now happens every time I run the code. I first ran into this problem yesterday; before that, the script worked correctly. I also added the part of the pipeline which I run from main.py. Windows-10 HOME v.1709 16299.847 |
Thanks! Instead of sending pictures, readability and reusability of your code can be greatly improved if you format your code snippets and complete error messages appropriately. For example, wrapping a snippet in triple backticks:

```python
print(something)
```

And the complete traceback:

```pytb
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'hello'
```

Also, if the code is very long you can link to it like this. You can edit your comments at any time to improve readability. This helps maintainers a lot. |
Is there a solution for this issue? @amueller @albertcthomas I am working with the downgraded version 0.19.2, as mentioned by @Sai-Macharla, and so far so good. But this issue wasn't resolved in either 0.20.1 or 0.20.2, right? |
I don't think it's resolved in either 0.20.1 or 0.20.2. |
Also, I don't know what makes it work in 0.19.2 but not in 0.20.2. The default joblib backend is now loky, but this seems to be an issue related to memmapping. |
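Since the warning points at joblib's memmapping of large inputs, one way to check whether memmapping is involved is to disable it with the `max_nbytes` parameter of `joblib.Parallel`. A minimal sketch follows; note this is plain joblib, not a `GridSearchCV` option, since the scikit-learn estimator API does not expose this knob directly.

```python
# Sketch: disabling joblib's automatic memmapping of large inputs.
# With max_nbytes=None, arrays are pickled to the workers instead of being
# dumped to a temporary memmap folder, so the Windows folder-deletion
# problem discussed in this thread cannot be triggered. Illustration only.
import numpy as np
from joblib import Parallel, delayed

X = np.random.rand(1_000_000)  # large enough to trigger memmapping by default

# Default behaviour memmaps arrays over ~1M bytes; max_nbytes=None turns
# that off entirely, at the cost of copying the data to each worker.
sums = Parallel(n_jobs=2, max_nbytes=None)(
    delayed(np.sum)(X) for _ in range(4)
)
print(len(sums))
```

The trade-off is higher memory use per worker, since each process gets its own copy of the data.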
Currently that code in disk.py catches |
Actually
From what I experimented you sometimes need to wait a few minutes before being able to delete the folder, see this comment in the related joblib issue joblib/joblib#806 |
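The joblib warning in this thread comes from a deletion loop that gives up after 5 attempts; the observation that the folder becomes deletable after a delay suggests retrying with a backoff, roughly like the sketch below. This is an illustration of the pattern, not joblib's actual implementation, and the timings are arbitrary.

```python
# Sketch: retry-with-delay folder deletion. On Windows, a folder cannot be
# removed while another process still holds an open handle on a memmapped
# file inside it, so waiting and retrying sometimes succeeds where an
# immediate delete fails.
import os
import shutil
import tempfile
import time

def delete_folder_with_retries(path, retries=5, delay=0.2):
    """Try to remove `path`, retrying on PermissionError."""
    for attempt in range(retries):
        try:
            shutil.rmtree(path)
            return True
        except PermissionError:
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    return False

folder = tempfile.mkdtemp()
ok = delete_folder_with_retries(folder)
print(ok)
```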
Sorry, I missed the warning. |
I am working with the APS SCANIA dataset and sklearn 0.20.2:

```python
import numpy as np
import pandas as pd
import time
import sklearn
print(sklearn.__version__)

df_train = pd.read_csv("./aps_failure_training_set.csv", na_values=["na"])
df_test = pd.read_csv("./aps_failure_test_set.csv", na_values=["na"])

df_train['class'] = (df_train["class"] == "pos").astype("int")
df_test['class'] = (df_test["class"] == "pos").astype("int")

y_train = df_train['class']
y_test = df_test['class']
X_train = df_train.drop('class', axis=1)
X_test = df_test.drop('class', axis=1)

X_train.fillna(X_train.mean(), inplace=True)
X_test.fillna(X_test.mean(), inplace=True)

def undersample(df_X, df_y):
    # Number of positive-class samples
    num_pos = len(df_y[df_y == 1])
    # Indices of rows with negative values
    indices_neg = df_y[df_y == 0].index
    # Randomly choose that many indices from the negative list
    num_draws = num_pos
    random_indices = np.random.choice(indices_neg, num_draws, replace=False)
    # Indices of rows with positive values
    indices_pos = df_y[df_y == 1].index
    # Indices of the undersampled set
    under_sample_indices = np.concatenate([indices_pos, random_indices])
    # Extract undersampled values from the dataframe
    X_undersample = df_X.loc[under_sample_indices]
    print(X_undersample.shape)
    y_undersample = df_y[under_sample_indices]
    print(y_undersample.shape)
    return (X_undersample, y_undersample)

df_XX = pd.DataFrame(X_train)
X_train_under, y_train_under = undersample(df_XX, y_train)

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

print("First training, model with n_jobs=1")
clf = RandomForestClassifier()
gs = RandomizedSearchCV(
    clf,
    param_distributions={"n_estimators": np.arange(5, 50, 5),
                         "max_depth": np.arange(5, 8, 1)},
    n_iter=10, cv=5, scoring="accuracy", verbose=1, n_jobs=-1)
start_1 = time.time()
gs.fit(X_train_under, y_train_under)
print("Best params", gs.best_params_)
best_clf = gs.best_estimator_
print("results at each iteration:", gs.cv_results_['mean_test_score'])
print("Took %s seconds" % (time.time() - start_1))
```

This returns me the following error:

```
C:\Program Files\Python36\lib\site-packages\sklearn\externals\joblib\disk.py:122: UserWarning: Unable to delete folder C:\Users\kjn-lc\AppData\Local\Temp\joblib_memmapping_folder_7712_1434446855 after 5 tentatives.
  .format(folder_path, RM_SUBDIRS_N_RETRY))
```
|
Thanks a lot @lucascolz! |
Interestingly,

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

for _ in range(10):
    X_train = np.random.rand(int(2e6)).reshape((int(1e6), 2))
    y_train = np.random.randint(0, 2, int(1e6))
    X_train = pd.DataFrame(X_train)
    clf = RandomForestClassifier()
    gs = RandomizedSearchCV(
        clf,
        param_distributions={"n_estimators": np.array([1]),
                             "max_depth": np.array([2])},
        n_iter=1,
        cv=2,
        scoring="accuracy",
        verbose=1,
        n_jobs=2
    )
    gs.fit(X_train, y_train)
```

always fails (never at the first iteration of the for loop). Note the use of a pandas dataframe for X_train. However, the same loop where X_train stays a numpy array:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

for _ in range(10):
    X_train = np.random.rand(int(2e6)).reshape((int(1e6), 2))
    y_train = np.random.randint(0, 2, int(1e6))
    clf = RandomForestClassifier()
    gs = RandomizedSearchCV(
        clf,
        param_distributions={"n_estimators": np.array([1]),
                             "max_depth": np.array([2])},
        n_iter=1,
        cv=2,
        scoring="accuracy",
        verbose=1,
        n_jobs=2
    )
    gs.fit(X_train, y_train)
```

does not fail. |
@lucascolz do you see the error every time you run your code? Can you try by passing numpy arrays instead of pandas dataframes?

```python
X_train_under = X_train_under.values
y_train_under = y_train_under.values
```
|
@albertcthomas When I am able to use the same computer again, I will tell you whether the error persists. |
thanks @lucascolz. |
@albertcthomas Thanks for providing the snippets. With a dataframe as X_train:

```
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.5s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.5s finished
C:\Users\lucas\Anaconda3\lib\site-packages\sklearn\externals\joblib\disk.py:122: UserWarning: Unable to delete folder C:\Users\lucas\AppData\Local\Temp\joblib_memmapping_folder_113764_4112142399 after 5 tentatives.
  .format(folder_path, RM_SUBDIRS_N_RETRY))
```

The run with numpy arrays:

```
Fitting 2 folds for each of 1 candidates, totalling 2 fits
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.4s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.4s finished
C:\Users\lucas\Anaconda3\lib\site-packages\sklearn\externals\joblib\disk.py:122: UserWarning: Unable to delete folder C:\Users\lucas\AppData\Local\Temp\joblib_memmapping_folder_113764_4112142399 after 5 tentatives.
  .format(folder_path, RM_SUBDIRS_N_RETRY))
```

In scikit-learn 0.19.2, I tried running the code you provided, so:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
import sklearn
print(sklearn.__version__)

for _ in range(10):
    X_train = np.random.rand(int(2e6)).reshape((int(1e6), 2))
    y_train = np.random.randint(0, 2, int(1e6))
    X_train = pd.DataFrame(X_train)
    clf = RandomForestClassifier()
    gs = RandomizedSearchCV(
        clf,
        param_distributions={"n_estimators": np.array([1]),
                             "max_depth": np.array([2])},
        n_iter=1,
        cv=2,
        scoring="accuracy",
        verbose=1,
        n_jobs=-1
    )
    gs.fit(X_train, y_train)
```

It runs the random search without error. I have to check the library dependencies, as on another computer it ran smoothly with 0.20. If you have any idea what I can still try, I am open to suggestions. In the meanwhile, I will continue with 0.19. |
Thanks for the report. |
@lucascolz just to confirm: on your other computer you are also using Windows? |
@albertcthomas Yes, I am using Windows 10 64 bits in all machines. |
Actually, when in the same ipython session I first run the snippet with |
This tends to happen to me also. I think I will stick to the numpy array structure for as long as I can, or use 0.19. I wasn't able to debug why exactly this problem happens. |
So you are saying that when you use numpy arrays you don't have the permission error? Thanks @lucascolz for helping us investigate this issue. |
Yes, in my last tests it happened the same as you mentioned. Numpy arrays work (at least in the implementation I am working on), and if I try a dataframe, it tends to raise an error after the fifth iteration. |
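The workaround that emerges from this exchange — converting pandas objects to plain numpy arrays before fitting — can be sketched as follows. This is a minimal illustration with synthetic data; the tiny parameter grid is only there to keep it fast.

```python
# Sketch of the workaround discussed above: pass numpy arrays rather than
# pandas objects to the search, which avoids the memmap-cleanup failure
# reported with DataFrames on Windows. Synthetic data, illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

df = pd.DataFrame(np.random.rand(200, 2), columns=["f0", "f1"])
y = pd.Series(np.random.randint(0, 2, 200))

# Convert to numpy before fitting (.values, or .to_numpy() on recent pandas).
X_arr, y_arr = df.values, y.values

gs = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={"n_estimators": np.array([5]),
                         "max_depth": np.array([2])},
    n_iter=1, cv=2, scoring="accuracy", n_jobs=2)
gs.fit(X_arr, y_arr)
print(type(X_arr).__name__)
```

The column names are lost in the conversion, so keep a reference to `df.columns` if feature names are needed afterwards.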
@Sai-Macharla could you let us know if you were working with pandas dataframes or numpy arrays (for |
Actually even if |
I also have this problem. My workaround was to comment out the folder deletion:

```python
def terminate(self):
    if self._workers is not None:
        # Terminate does not shutdown the workers as we want to reuse them
        # in latter calls but we free as much memory as we can by deleting
        # the shared memory
        # delete_folder(self._workers._temp_folder)
        self._workers = None
    self.reset_batch_stats()
```
|
Please offer your comment on the joblib issue tracker. |
I have this error on Python 3.6.8, scikit-learn 0.21.1 and joblib 0.13.2, Windows 64-bit. |
Downgrading joblib to 0.11 might be the simplest fix. |
This issue is tracked upstream in joblib/joblib#806 please add any additional comments there instead. |
Thank you very much. It works with joblib 0.11. |
Changing the backend to 'threading' worked for me:
|
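For reference, switching the backend as described above can be done with joblib's `parallel_backend` context manager. A minimal sketch with synthetic data follows; note the GIL caveat raised in the next comment.

```python
# Sketch: forcing the 'threading' joblib backend around a GridSearchCV fit.
# Threads share memory, so no memmap temp folder is created -- but
# CPU-bound estimators are then limited by the GIL. Synthetic data,
# illustration only.
import numpy as np
from joblib import parallel_backend
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.rand(200, 2)
y = np.random.randint(0, 2, 200)

gs = GridSearchCV(RandomForestClassifier(),
                  param_grid={"n_estimators": [5]},
                  cv=2, n_jobs=2)
with parallel_backend("threading"):
    gs.fit(X, y)
print(gs.best_params_)
```

Tree ensembles release the GIL in their Cython inner loops, which is why this backend is not always as slow as the GIL objection suggests, but pure-Python estimators will serialize.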
@dhbrand 'threading' suffers from the Python Global Interpreter Lock. |
My intuition is that pandas is generating cyclic references to the memmapped large numpy array which is therefore collected with a delay. Let me try to write a minimal reproduction case. |
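The cyclic-reference hypothesis can be illustrated in plain Python: an object caught in a reference cycle is not freed when its last name is dropped, only when the cyclic garbage collector runs. The sketch below is a generic illustration, unrelated to pandas internals.

```python
# Sketch: why a reference cycle delays cleanup. The finalizer of an object
# in a cycle only runs when gc.collect() breaks the cycle, not when the
# last name referring to it is dropped -- analogous to a memmap being kept
# alive past the temp-folder deletion attempt.
import gc

freed = []

class Holder:
    def __init__(self):
        self.cycle = self  # self-reference: creates a cycle
    def __del__(self):
        freed.append(True)

h = Holder()
del h              # refcount never reaches zero: the cycle keeps it alive
print(len(freed))  # prints 0: nothing freed yet
gc.collect()       # the cycle detector finds and frees it
print(len(freed))  # prints 1
```

If a memmap-backed array is trapped in such a cycle, its file handle stays open until the collector happens to run, which on Windows blocks deletion of the containing folder.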
@ogrisel This comment above contains a reproduction case, but we might be able to have a smaller one, only involving joblib. |
Yes this is what I am looking for. |
I cannot reproduce the error in the reproduction case above with scikit-learn 0.21.3 and joblib 0.14.0... |
I have set up a Windows VM to debug this, but unfortunately I cannot get #12546 (comment) to fail on this machine, which is going to make things harder to debug and fix on my end. I will still try to come up with a blind minimal reproducing example based on the reference cycle hypothesis. |
Yes I cannot reproduce the error either. I also tried with previous scikit-learn versions (0.20 and 0.19). |
The fact that I cannot reproduce with a VM might be caused by the fact that memory mapped files might behave differently in a VM. I will try to reproduce with a CI worker in this PR: joblib/joblib#942 |
I think we can close this issue as I cannot reproduce on Windows (with scikit-learn 0.21.3, pandas 0.25.1 and joblib 0.14.0) and the associated test was not failing on Appveyor in the related joblib PR (see joblib/joblib#942). |
Indeed it seems that it's no longer possible to reproduce the original issue with the latest versions of pandas / scikit-learn / joblib. We will still work on a redesign of how the temporary folder cleanup happens in joblib with the loky backend so as to fix the two minimal reproducing examples reported as joblib/joblib#942 and joblib/joblib#944 but those specific cases do not seem to be triggered when using the scikit-learn estimator API. |
I was able to replicate this issue with the following code. The issue appears to happen somehow when loading the saved arrays (Python 3.7.5). Here is the code:
Now load and pass to model:
Output:
Note, this does not appear to cause an error if you just use the arrays as-is and don't save/load them. In my case, it is pretty expensive to create those arrays each time, which is why I was saving and loading them. |
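One hedged workaround for the save/load path described above is to force the loaded data into a fresh in-memory array before fitting, so nothing tied to the file on disk is handed to the estimator. This is a sketch with synthetic data; the file name is made up.

```python
# Sketch: np.save / np.load round-trip, then copy into a fresh array so the
# object passed to the estimator has no ties to the file on disk. The
# default np.load (mmap_mode=None) already reads fully into memory; the
# explicit copy is belt-and-braces for the case described above.
import os
import tempfile
import numpy as np

X = np.random.rand(100, 3)
path = os.path.join(tempfile.mkdtemp(), "features.npy")
np.save(path, X)

loaded = np.load(path)                  # fully in memory by default
X_fresh = np.ascontiguousarray(loaded)  # independent in-memory copy

print(np.array_equal(X, X_fresh))
```

If `mmap_mode="r"` was used when loading, this copy is what detaches the data from the open file handle.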
Please try with scikit-learn 0.22. |
Same issue on Windows 10 (GridSearchCV and XGBoost) while using n_jobs=-1 in GridSearchCV: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Username\AppData\Local\Temp\joblib_memmapping_folder_3836_5434669858\3836-1959279810320-4711347bf8754e4c9388340d5d0b4491.pkl' (Python 3.6.6) |
This error is likely to be fixed by the work currently being done on the joblib/loky side. In the meantime, @mannyfin, can you please edit your comment to wrap the code and error message in triple backticks? That will generate something easier to read:

```python
print(something)
```
|
I am running with scikit-learn==0.21.2 |
Hi, the same problem still continues today. Also, the problem is not specific to GridSearchCV: MultiOutputClassifier has the same problem. I use both GridSearchCV and MultiOutputClassifier and I tried all combinations of n_jobs for them (e.g., both with n_jobs=-1, or only one of them with n_jobs=-1). For instance, if I set n_jobs=-1 only on MultiOutputClassifier, I receive the error immediately. As I make n_jobs smaller, e.g., -11, the error pops up later. |
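The nesting described above — a grid search over a multi-output estimator, each layer with its own `n_jobs` — can be sketched like this. Synthetic two-target data and small values are used purely to keep the illustration cheap; the commenter reports the error with `n_jobs=-1` on either layer.

```python
# Sketch of the GridSearchCV + MultiOutputClassifier nesting described
# above. Each layer takes its own n_jobs, so two pools of joblib workers
# can be active at once. Illustration only; not a reproduction of the bug.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier

X = np.random.rand(100, 4)
Y = np.random.randint(0, 2, (100, 2))   # two binary targets

est = MultiOutputClassifier(RandomForestClassifier(n_estimators=5), n_jobs=2)
gs = GridSearchCV(est,
                  param_grid={"estimator__max_depth": [2, 3]},
                  cv=2, n_jobs=2)
gs.fit(X, Y)
print(gs.best_params_)
```

The `estimator__` prefix routes the grid parameter to the wrapped classifier, which is the standard nested-parameter convention of the scikit-learn API.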
joblib 0.15.0 should fix the remaining |
Description: Parallelism (n_jobs=-1) in GridSearchCV is stopping with a permission error.
Steps/Code to Reproduce:
Expected Results: No error is expected.
Actual Results
Versions
Windows-10-10.0.17134-SP0
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]
NumPy 1.15.2
SciPy 1.1.0
Scikit-Learn 0.20.0