
Add support for saving and loading models #377

Closed
wants to merge 16 commits

Conversation

NegatioN

Is this something you would want in the project @maciejkula ?
This is just a rough outline of some code I have lying around to do something like this, so it might need to be fixed up a bit, but I'd rather do that after I know your opinion :)

And if it already exists, it's a bit embarrassing but I didn't find it! :)

@maciejkula
Collaborator

This is not strictly correct, as it doesn't save and restore model hyperparameters.

Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this. Would you be willing to work on this a little more to whip it into shape?

@NegatioN NegatioN force-pushed the master branch 3 times, most recently from 555d930 to bef356a Compare November 4, 2018 11:29
@NegatioN
Author

NegatioN commented Nov 4, 2018

I'm not sure if it's smart to try handling the user/item mappings found in the default dataset implementation: https://github.com/lyst/lightfm/blob/master/lightfm/data.py#L135

It definitely adds friction that this is handled outside of the save/load functions, but I think they would need to be integrated more tightly to make it easy to include in the numpy object.

Other than that, both vectors and hyperparams should be taken into consideration now @maciejkula :)

The black formatting checks seem to pass on my machine, so I'm not quite sure what to make of the CI failure.

@NegatioN
Author

@maciejkula I've fixed the formatting errors now, would you mind taking a look? Seems like the current error is related to imports in an untouched file.

@DPGrev
Contributor

DPGrev commented Nov 26, 2018

@NegatioN It is because your branch is behind the lyst/lightfm master branch.

In 9090456 the sklearn dependency was updated, which means that RandomizedSearchCV is now imported from sklearn.model_selection. Your pull request doesn't include that change because your branch is behind the current master.

@maciejkula
Collaborator

Looks good, but I have two more comments.

Firstly, what do you think about making load a class method? I think it would be clearer not to have to instantiate a model first and then completely overwrite it. This way, instead of

model = LightFM()
model.load(path)

we would have

model = LightFM.load(path)

Secondly, can I ask for one more test please, along the following lines:

  1. Fit a model.
  2. Evaluate it.
  3. Save and load it.
  4. Evaluate again. The metrics should be exactly the same.
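The class-method shape maciejkula suggests could be sketched roughly like this. This is a hypothetical stand-in class, not the real lightfm.LightFM, assuming the model state is just numeric hyperparameters and numpy arrays:

```python
import numpy as np


class LightFM:
    """Hypothetical stand-in for lightfm.LightFM, for illustration only."""

    def __init__(self, no_components=10, learning_rate=0.05):
        self.no_components = no_components
        self.learning_rate = learning_rate
        self.item_embeddings = None

    def save(self, path):
        # Persist hyperparameters and learned arrays in a single .npz file.
        np.savez(
            path,
            no_components=self.no_components,
            learning_rate=self.learning_rate,
            item_embeddings=self.item_embeddings,
        )

    @classmethod
    def load(cls, path):
        # Build and return a fresh instance instead of mutating an existing one.
        data = np.load(path, allow_pickle=False)
        model = cls(
            no_components=int(data["no_components"]),
            learning_rate=float(data["learning_rate"]),
        )
        model.item_embeddings = data["item_embeddings"]
        return model
```

With this shape, `model = LightFM.load(path)` works without constructing a throwaway instance first.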

@NegatioN
Author

NegatioN commented Dec 4, 2018

The build is now failing because, as far as I can tell, something is wrong with how the caching is done in CircleCI. I can fix the commit history after this issue is resolved.

It fails because the cache holds a stale definition of .load() which it thinks takes two inputs (as either a classmethod or a staticmethod it only takes one), and if I rename the function to load_uncached to force a cache miss, I get the error message: AttributeError: type object 'LightFM' has no attribute 'load_uncached'.

Please correct me if I'm wrong, but I don't think I can do anything to fix this on my end.
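The arity mismatch described above can be illustrated with two hypothetical classes (names are illustrative, not from the PR): a stale cached build holding load as a plain method versus the new classmethod definition:

```python
class Stale:
    # What a stale cached build might still contain: a plain method,
    # which when called through the class needs two arguments (self and path).
    def load(self, path):
        return path


class Fresh:
    # The new definition: a classmethod, called with a single argument
    # because cls is bound implicitly.
    @classmethod
    def load(cls, path):
        return (cls.__name__, path)
```

Calling Stale.load("model.npz") raises a TypeError (missing the second argument), while Fresh.load("model.npz") works, which is consistent with CI complaining about argument counts when old and new definitions get mixed.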

@maciejkula
Collaborator

I cleared the caches and restarted the build. Let's see what happens.

if not callable(ob) and not x.startswith("__"):
    assert x in saved_model_params

_cleanup()
@maciejkula
Collaborator
When the test fails, this line is never executed, so the saved file is never cleaned up. Could you use pytest fixtures for setup/teardown?
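A sketch of what such a fixture could look like (names and the temporary path are illustrative; the real test would save and reload an actual model):

```python
import os
import tempfile

import pytest


@pytest.fixture
def model_path():
    # Setup: a temporary location for the saved model file.
    path = os.path.join(tempfile.mkdtemp(), "model.npz")
    yield path
    # Teardown: runs even when the test body fails, so the file is
    # removed without relying on an explicit _cleanup() call at the end.
    if os.path.exists(path):
        os.remove(path)


def test_save_load_roundtrip(model_path):
    # Illustrative body only; the real test would fit, save, load,
    # and re-evaluate a LightFM model here.
    open(model_path, "wb").close()
    assert os.path.exists(model_path)
```

Because the teardown code after `yield` runs regardless of whether the test body passed, the cleanup is guaranteed.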

@NegatioN
Author

NegatioN commented Dec 15, 2018

Should be done now. Needs another cache-clean. Also, if you do end up merging this, just squash everything. I don't think it makes sense to keep any of the history except the initial commit.

@NegatioN
Author

NegatioN commented Jan 7, 2019

I understand that this has been long in the making, but I believe I have addressed the changes you wanted, @maciejkula? It just needs another cache refresh.

@maciejkula
Collaborator

Thanks for your patience. I poked Circle and will have another look.

@rth

rth commented Mar 11, 2019

Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this.

Loading pickled models from unknown sources should never happen in practice, I think? Also note that numpy.load (used in this PR) has allow_pickle=True by default, so it is not really safer than regular pickling. The inefficiency argument doesn't hold if one uses joblib.dump / joblib.load (as used in scikit-learn).

Overall, I'm not saying that this PR is not useful (thanks for working on it @NegatioN), just wanted to add more context. In the long term, ONNX export (https://github.com/onnx/onnx) might be a more production-ready and language-agnostic solution.
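To illustrate rth's point about allow_pickle (paths and array names here are illustrative, not from the PR):

```python
import os
import tempfile

import numpy as np

# Persist only plain numeric arrays -- nothing needs pickling.
path = os.path.join(tempfile.mkdtemp(), "model.npz")
np.savez(path, embeddings=np.zeros((3, 2)), learning_rate=np.float64(0.05))

# At the time of this discussion numpy.load defaulted to allow_pickle=True
# (the default later changed to False, in NumPy 1.16.3). Passing False
# explicitly refuses to deserialize arbitrary Python objects, so loading
# a file from an untrusted source cannot trigger code execution.
data = np.load(path, allow_pickle=False)
print(data["embeddings"].shape)  # prints (3, 2)
```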

@NegatioN
Author

NegatioN commented Mar 11, 2019

@rth ONNX is definitely better. I guess it should be possible to translate the whole model structure as well, so even if the lyst model definitions change, they are still compatible. And exporting models to different frameworks for serving or training could be interesting too.

I do believe the numpy save/load functionality here would work with allow_pickle=False too though, as we're only really persisting floats and strings.

I'm kinda considering this PR dead, as there is little traction, and it is more of a "nice-to-have" and not really important for the usage of the library. Just figured I'd contribute my local utils since no save/load functionality existed.

@rth

rth commented Mar 11, 2019

I do believe the numpy save/load functionality here would work with allow_pickle=False too though, as we're only really persisting floats and strings.

Definitely.

ONNX doesn't support sparse arrays yet, I think, so it's more of a long-term solution that could be investigated in the future.

@CireS31

CireS31 commented May 24, 2019

Hello,
Have save/load functionalities finally been implemented in LightFM? If not, what is the best way to save a model?
Thank you in advance.

@impaktor

Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this.

Loading pickled models from unknown sources should never happen in practice I think?

When is it unsafe to pickle a LightFM model? Provided one doesn't change the version of any of the underlying software, my understanding is that the "normal" pickle in Python is safe to use, also with LightFM objects. Please correct me if I'm wrong.

@CireS31 since this PR isn't merged: no.

@alecokas

Hi, is this issue still being actively looked at?

@NegatioN
Author

@alecokas Nope.
I'm closing for now, until @maciejkula decides to re-open it.

@NegatioN NegatioN closed this Feb 12, 2020
@ahmadalli

@impaktor pickle is not safe for loading objects from unknown sources: https://checkoway.net/musings/pickle/
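The linked write-up boils down to the fact that unpickling can execute arbitrary code. A minimal, harmless demonstration:

```python
import pickle


class Payload:
    def __reduce__(self):
        # pickle.loads will call the returned callable with these arguments.
        # Here it is a harmless print, but it could just as well be
        # os.system("...") -- which is why unpickling untrusted data is unsafe.
        return (print, ("arbitrary code ran during unpickling",))


blob = pickle.dumps(Payload())
pickle.loads(blob)  # executes print() during deserialization
```

The danger is independent of what the pickled object "is": the bytes themselves decide what runs, so only pickles from trusted sources (e.g. files you wrote yourself, unchanged) are safe to load.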

8 participants