
dill load and sklearn clone result in error #1026

Open · DCoupry opened this issue Oct 23, 2023 · 7 comments

Comments

DCoupry (Author) commented Oct 23, 2023

Dumping a skorch model with dill and then reloading it (it does not matter whether with dill or pickle) makes it incompatible with sklearn.base.clone, apparently because some attributes come back empty (the optimizers, I think, but I had no time to investigate further). This behaviour occurs with neither pickle nor joblib.

This makes functions such as cross_val_predict unusable after loading a previously dumped model.

To reproduce (Python 3.10; tried with a bunch of versions of dill / torch / skorch / sklearn, and all of them exhibit the bug):

from sklearn.datasets import make_regression
from sklearn.base import clone
import torch
import skorch
import dill

X, y = make_regression()
base_model = skorch.NeuralNetRegressor(torch.nn.Linear(100, 1))
cloned_model = clone(base_model)
dumped_model = dill.loads(dill.dumps(base_model))
cloned_dumped_model = clone(dumped_model)

base_model.fit(X, y)  # works
cloned_model.fit(X, y)  # works
dumped_model.fit(X, y)  # works
cloned_dumped_model.fit(X, y)  # does not work
BenjaminBossan (Collaborator) commented:

I could not reproduce the issue 100%, so I had to make some small changes:

from sklearn.datasets import make_regression
from sklearn.base import clone
import numpy as np
import torch
import skorch
import dill

dill.__version__  # 0.3.6

X, y = make_regression()
X, y = X.astype(np.float32), y.astype(np.float32).reshape(-1, 1)  # added
base_model = skorch.NeuralNetRegressor(torch.nn.Linear(100, 1))
cloned_model = clone(base_model)
dumped_model = dill.loads(dill.dumps(base_model))
cloned_dumped_model = clone(dumped_model)

base_model.fit(X, y) # works
cloned_model.fit(X, y) # works
dumped_model.fit(X, y) # THIS ALREADY FAILS FOR ME
cloned_dumped_model.fit(X, y) # fails with same error

First, could you please confirm that my snippet produces the same error for you?

Second, is the error you get also:

...

1225 self.notify("on_batch_begin", batch=batch, training=training)
1226 step = step_fn(batch, **fit_params)
-> 1227 self.history.record_batch(prefix + "_loss", step["loss"].item())
1228 batch_size = (get_len(batch[0]) if isinstance(batch, (tuple, list))
1229 else get_len(batch))
1230 self.history.record_batch(prefix + "_batch_size", batch_size)

TypeError: 'NoneType' object is not subscriptable

DCoupry (Author) commented Oct 23, 2023

I can reproduce it, yes, and indeed the dumped version dies as well. I am confused, because the dumped model did work for me at one point while the cloned one did not. Trying to refine this.

What does work is:

import pickle

dumped_model = dill.loads(pickle.dumps(base_model))  # dump with plain pickle, load with dill
dumped_model.fit(X, y)
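Note that dill extends the standard pickle protocol, so a stream produced by plain pickle.dumps can be read back with dill.loads; that this combination works suggests the problem lies on dill's dump side rather than its load side.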

The error is the same; the loss is None here. The process looks okay to me and goes through all the initializations, and I have tracked it to the train_step function, where printing the optimizers gives an empty list. But when you take the models themselves and print the pre-fit attributes, everything looks good! Quite frustrating.
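A quick diagnostic sketch along these lines makes the asymmetry visible (the attribute names below are skorch internals and may differ between versions):

# Sketch: compare skorch's internal attribute lists on the original net
# and on the dill round-tripped net (names are skorch internals).
for attr in ("_modules", "_criteria", "_optimizers"):
    print(attr,
          getattr(base_model, attr, "<missing>"),
          getattr(dumped_model, attr, "<missing>"))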

DCoupry (Author) commented Oct 23, 2023

Okay, after some checks:

from sklearn.datasets import make_regression
from sklearn.base import clone
import numpy as np
import torch
import skorch
import dill
import pickle

dill.__version__  # 0.3.6

X, y = make_regression()
X, y = X.astype(np.float32), y.astype(np.float32).reshape(-1, 1)  # added
base_model = skorch.NeuralNetRegressor(torch.nn.Linear(100, 1))
cloned_model = clone(base_model)
dumped_model = dill.loads(dill.dumps(base_model))
dumped_fitted_model = dill.loads(dill.dumps(base_model.fit(X, y)))
cloned_dumped_model = clone(dumped_model)
cloned_dumped_fitted_model = clone(cloned_dumped_model)

base_model.fit(X, y) # works
cloned_model.fit(X, y) # works
dumped_model.fit(X, y) # fails
dumped_fitted_model.fit(X, y) # works
cloned_dumped_fitted_model.fit(X, y) # fails
cloned_dumped_model.fit(X, y) # fails with same error

BenjaminBossan (Collaborator) commented Oct 24, 2023

Thanks for investigating further. This is super strange IMO, because the _optimizers attribute is empty but _modules and _criteria are not, even though these three attributes are treated exactly the same. Do you know whether dill uses __getstate__ and __setstate__, or whether it has equivalent methods? Maybe we can salvage something there.

Edit: Just checked it; dill does call __getstate__ and __setstate__, which makes this even more confusing.
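For reference, a minimal standalone sketch (unrelated to skorch) confirms this:

import dill

class Probe:
    def __init__(self):
        self.x = 1

    def __getstate__(self):
        print("__getstate__ called")
        return self.__dict__

    def __setstate__(self, state):
        print("__setstate__ called")
        self.__dict__.update(state)

# prints "__getstate__ called" on dump and "__setstate__ called" on load
dill.loads(dill.dumps(Probe()))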

DCoupry (Author) commented Oct 31, 2023

We could print a trace of the execution up to the final fit and diff across the two, maybe?

BenjaminBossan (Collaborator) commented:

Sorry, I don't understand. How can this be done?

DCoupry (Author) commented Oct 31, 2023

I was thinking pdb might be of some help here; I will report if I manage anything. In the meantime, I have found that dumping byref with dill fixes the failure:

# works
dill.loads(dill.dumps(base_model, byref=True)).fit(X, y)
clone(dill.loads(dill.dumps(base_model.fit(X, y), byref=True))).fit(X, y)
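(For context: dill's byref=True setting pickles certain objects, such as classes and modules, by reference instead of by value, which makes its behaviour much closer to plain pickle. That is consistent with the plain-pickle dump working earlier in this thread.)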
