
Is it possible to run Ridge() without memmap? #57

Closed
renierts opened this issue Jan 21, 2022 · 2 comments · Fixed by #144
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@renierts

Hello,

Many thanks for providing this great library. I tried to develop an adapter class for ReservoirPy so that it fits the scikit-learn API. It seems to work for everything except the Ridge() node, where I receive the following error:

```
  File ".virtualenv2\lib\site-packages\reservoirpy\node.py", line 555, in create_buffer
    self._buffers[name] = memmap_buffer(self, data=data,
  File ".virtualenv2\lib\site-packages\reservoirpy\utils\parallel.py", line 75, in memmap_buffer
    memmap = np.memmap(temp, shape=shape, mode=mode, dtype=dtype)
  File ".virtualenv2\lib\site-packages\numpy\core\memmap.py", line 228, in __new__
    f_ctx = open(os_fspath(filename), ('r' if mode == 'c' else mode)+'b')
OSError: [Errno 22] Invalid argument: 'C:\\Users\\UserName\\AppData\\Local\\Temp\\reservoirpy-temp\\Ridge-0XXT'
```

I think the problem arises because scikit-learn's model selection tools copy the estimator. Is it possible to deactivate memmap for the Ridge node?

@nTrouvain
Collaborator

Hello Peter!

For now, memory-mapped arrays are required to use the Ridge node, as it was designed to allow parallel computation of the linear regression. This parallel computation relies on arrays shared between processes, and the only safe and easy way to do this is to use memory-mapped objects. This behavior will probably change in the future, as memory-mapped arrays are not well supported on all platforms.
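To illustrate why this regression parallelizes at all: ridge regression only needs two sums over the training data, the Gram matrix X^T X of the states and the cross term Y^T X with the targets, and each sequence contributes an independent term to both. The following is a minimal NumPy sketch of that idea, not ReservoirPy's actual code; `partial_stats` and `parallel_ridge` are hypothetical names. It uses threads that hand their partial sums back to the parent, so no shared buffer is needed. With separate processes, as described above for the Ridge node, the partial sums have to be written to memory shared between processes instead, which is where memory mapping comes in.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def partial_stats(X, Y):
    """Return one sequence's contribution to the ridge sums (hypothetical helper).

    X: (timesteps, n_features) reservoir states, Y: (timesteps, n_outputs) targets.
    """
    return X.T @ X, Y.T @ X


def parallel_ridge(sequences, ridge=1e-6, workers=4):
    """Fit Wout (n_outputs, n_features) from a list of (X, Y) pairs."""
    # Each thread returns its own partial sums and the parent adds them up,
    # so this thread-based variant needs no shared (memory-mapped) buffer.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = list(pool.map(lambda xy: partial_stats(*xy), sequences))
    XTX = sum(s[0] for s in stats)
    YTX = sum(s[1] for s in stats)
    # Wout = Y^T X (X^T X + ridge * I)^{-1}. The bracket is symmetric, so we
    # solve (X^T X + ridge * I) Wout^T = X^T Y instead of forming the inverse.
    A = XTX + ridge * np.eye(XTX.shape[0])
    return np.linalg.solve(A, YTX.T).T


rng = np.random.default_rng(0)
sequences = [(rng.normal(size=(50, 5)), rng.normal(size=(50, 2))) for _ in range(8)]
Wout = parallel_ridge(sequences)
print(Wout.shape)  # (2, 5)
```

Because the final solve only depends on the summed statistics, the result is identical to fitting on all sequences concatenated; only the bookkeeping of where the partial sums live changes between the thread and process variants.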

Could you provide a more explicit example of what you are trying to do?

Also, thank you very much for your interest in ReservoirPy. Since a scikit-learn adapter is part of the library's plan for future features, do not hesitate to ask for help, submit your code through a pull request, and suggest any changes to the current code. We would be really happy to count you among the contributors!

nTrouvain added the enhancement (New feature or request) and question (Further information is requested) labels on Jan 24, 2022
nTrouvain added this to Issues in v0.3 via automation on Jan 24, 2022
nTrouvain self-assigned this on Jan 24, 2022
@renierts
Author

renierts commented Jan 24, 2022

Hi Nathan!

Thanks for your explanation. I had already suspected that this was the reason for using memory mapping.

In PyRCN, we use scikit-learn base objects (BaseEstimator, etc.) to define our building blocks. The advantage is that we can then use, e.g., RandomizedSearchCV for hyperparameter tuning.

A very simple example of what I want to do is the following code snippet (taken from the RandomizedSearchCV documentation):

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

iris = load_iris()
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                              random_state=0)
# Hyperparameter distributions that the search samples from
distributions = dict(C=uniform(loc=0, scale=4),
                     penalty=['l2', 'l1'])
clf = RandomizedSearchCV(logistic, distributions, random_state=0)  # TODO: replace logistic with an ESN
search = clf.fit(iris.data, iris.target)
print(search.best_params_)
```

Now I need an adapter so that I can replace logistic with an ESN from ReservoirPy. This seems to work fine for everything except the memory mapping. The problem is that RandomizedSearchCV copies the estimator being optimized. I assume that multiple instances then try to access the memmapped files at the same time.
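For reference, here is a minimal, dependency-free sketch of what such an adapter could look like when it follows scikit-learn's estimator conventions (hyperparameters stored unchanged in `__init__`, `get_params`/`set_params`, and `fit` returning `self`). `ESNRegressor` is hypothetical and does not use ReservoirPy at all: it builds a small leaky echo state network and a ridge readout directly in NumPy, and it assumes that the rows of X are consecutive time steps of a single sequence. All fitted state is held in ordinary in-memory arrays, so every copy made during a search owns its own data.

```python
import numpy as np


class ESNRegressor:
    """Hypothetical scikit-learn-style echo state network (not ReservoirPy's API).

    Rows of X are treated as consecutive time steps of one sequence.
    Fitted state lives in plain NumPy arrays; nothing is memory-mapped.
    """

    def __init__(self, units=100, sr=0.9, lr=0.3, ridge=1e-6, seed=None):
        # scikit-learn convention: only store hyperparameters here.
        self.units = units
        self.sr = sr          # spectral radius of the recurrent weights
        self.lr = lr          # leak rate
        self.ridge = ridge    # ridge (Tikhonov) regularization
        self.seed = seed

    def get_params(self, deep=True):
        return {"units": self.units, "sr": self.sr, "lr": self.lr,
                "ridge": self.ridge, "seed": self.seed}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            setattr(self, name, value)
        return self

    def _states(self, X):
        # Run the reservoir from a zero state over all rows of X.
        x = np.zeros(self.units)
        states = np.empty((X.shape[0], self.units))
        for t, u in enumerate(X):
            # Leaky update: x <- (1 - lr) x + lr tanh(Win u + W x)
            x = (1 - self.lr) * x + self.lr * np.tanh(self.Win_ @ u + self.W_ @ x)
            states[t] = x
        return states

    def fit(self, X, y):
        self.single_output_ = np.ndim(y) == 1
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(len(X), -1)
        rng = np.random.default_rng(self.seed)
        self.Win_ = rng.uniform(-1, 1, (self.units, X.shape[1]))
        W = rng.normal(size=(self.units, self.units))
        # Rescale recurrent weights to the requested spectral radius.
        self.W_ = W * (self.sr / np.max(np.abs(np.linalg.eigvals(W))))
        S = self._states(X)
        # Closed-form ridge readout, solved in memory.
        A = S.T @ S + self.ridge * np.eye(self.units)
        self.Wout_ = np.linalg.solve(A, S.T @ y).T
        return self

    def predict(self, X):
        out = self._states(np.asarray(X, dtype=float)) @ self.Wout_.T
        return out.ravel() if self.single_output_ else out


# Usage: one-step-ahead prediction of a sine wave.
t = np.arange(600)
u = np.sin(0.1 * t)[:, None]
esn = ESNRegressor(units=50, seed=42).fit(u[:400], u[1:401, 0])
prediction = esn.predict(u[400:-1])
```

In practice, subclassing `sklearn.base.BaseEstimator` and `RegressorMixin` would supply `get_params`, `set_params`, and a `score` method automatically; they are written out here only to keep the sketch self-contained. `sklearn.base.clone` builds a new, unfitted instance from `get_params()`, which the hand-written methods above support.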

Do you have an offline regression node that can fit the linear regression sequentially? That alone would solve my problem.
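A sequential offline readout of the kind asked for here can be sketched by accumulating X^T X and Y^T X one sequence at a time in ordinary NumPy arrays and solving only once at the end. `SequentialRidge`, `partial_fit`, and `finalize` are hypothetical names chosen for this sketch, not ReservoirPy's API. No intercept is fitted (append a constant column to X if one is needed), and copying the object with `copy.deepcopy` duplicates its accumulators instead of sharing files on disk.

```python
import copy

import numpy as np


class SequentialRidge:
    """Hypothetical offline ridge readout fitted one sequence at a time.

    X: (timesteps, n_features) states, Y: (timesteps, n_outputs) targets.
    The accumulators are ordinary in-memory arrays, not memory-mapped files.
    """

    def __init__(self, ridge=1e-6):
        self.ridge = ridge

    def partial_fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        if not hasattr(self, "XTX_"):
            # Allocate the accumulators on the first call.
            self.XTX_ = np.zeros((X.shape[1], X.shape[1]))
            self.YTX_ = np.zeros((Y.shape[1], X.shape[1]))
        self.XTX_ += X.T @ X
        self.YTX_ += Y.T @ X
        return self

    def finalize(self):
        # Wout = Y^T X (X^T X + ridge * I)^{-1}, computed with a linear solve.
        A = self.XTX_ + self.ridge * np.eye(self.XTX_.shape[0])
        self.Wout_ = np.linalg.solve(A, self.YTX_.T).T
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.Wout_.T


rng = np.random.default_rng(1)
sequences = [(rng.normal(size=(30, 4)), rng.normal(size=(30, 3))) for _ in range(5)]

model = SequentialRidge(ridge=1e-3)
for X, Y in sequences:
    model.partial_fit(X, Y)  # one sequence at a time
model.finalize()

# A deep copy owns independent accumulators, so fitting it further
# leaves the original model untouched.
snapshot = copy.deepcopy(model).partial_fit(*sequences[0])
```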

Regarding your plan to add an adapter between ReservoirPy and scikit-learn: I will definitely share the adapter with you as soon as it works.
