
Add support for saving and loading models #377

Closed
wants to merge 16 commits

Conversation

NegatioN

Is this something you would want in the project @maciejkula ?
This is just a rough outline of some code I have lying around to do something like this, so it might need to be fixed up a bit, but I'd rather do that after I know your opinion :)

And if it already exists, it's a bit embarrassing but I didn't find it! :)

@maciejkula
Collaborator

This is not strictly correct, as it doesn't save and restore model hyperparameters.

Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this. Would you be willing to work on this a little more to whip it into shape?

@NegatioN NegatioN force-pushed the master branch 3 times, most recently from 555d930 to bef356a Compare November 4, 2018 11:29
@NegatioN
Author

NegatioN commented Nov 4, 2018

I'm not sure if it's smart to try handling the user/item mappings found in the default dataset implementation: https://github.com/lyst/lightfm/blob/master/lightfm/data.py#L135

It definitely adds friction that this is handled outside of the save/load functions, but I think they would need to be integrated more tightly to make it easy to include in the numpy object.

Other than that, both vectors and hyperparams should be taken into consideration now @maciejkula :)

The black formatting checks seem to pass on my machine, so I'm not quite sure what to make of the CI failure.

@NegatioN
Author

@maciejkula I've fixed the formatting errors now, would you mind taking a look? Seems like the current error is related to imports in an untouched file.

@DPGrev
Contributor

DPGrev commented Nov 26, 2018

@NegatioN It is because your branch is behind the lyst/lightfm master branch.

In 9090456 the sklearn dependency was updated, which means that RandomizedSearchCV is now imported from sklearn.model_selection. Your pull request doesn't include that change because your branch is behind the current master.

@maciejkula
Collaborator

Looks good, but I have two more comments.

Firstly, what do you think about making load a class method? I think it would be clearer not to have to instantiate a model first and then completely overwrite it. This way, instead of

model = LightFM()
model.load(path)

we would have

model = LightFM.load(path)

Secondly, can I ask for one more test please, along the following lines:

  1. Fit a model.
  2. Evaluate it.
  3. Save and load it.
  4. Evaluate again. The metrics should be exactly the same.
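The class-method shape maciejkula suggests could be sketched roughly like this. This is a hypothetical stand-in class, not the real lightfm.LightFM, assuming the model state is just numeric hyperparameters and numpy arrays:

```python
import numpy as np


class LightFM:
    """Hypothetical stand-in for lightfm.LightFM, for illustration only."""

    def __init__(self, no_components=10, learning_rate=0.05):
        self.no_components = no_components
        self.learning_rate = learning_rate
        self.item_embeddings = None

    def save(self, path):
        # Persist hyperparameters and learned arrays in a single .npz file.
        np.savez(
            path,
            no_components=self.no_components,
            learning_rate=self.learning_rate,
            item_embeddings=self.item_embeddings,
        )

    @classmethod
    def load(cls, path):
        # Build and return a fresh instance instead of mutating an existing one.
        data = np.load(path, allow_pickle=False)
        model = cls(
            no_components=int(data["no_components"]),
            learning_rate=float(data["learning_rate"]),
        )
        model.item_embeddings = data["item_embeddings"]
        return model
```

With this shape, `model = LightFM.load(path)` works without constructing a throwaway instance first.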

@NegatioN
Author

NegatioN commented Dec 4, 2018

The build is now failing because, as far as I can tell, something is wrong with how the caching is done in CircleCI. I can fix the commit history after this issue is resolved.

It fails because the cache holds a stale definition of .load() which it thinks takes two inputs (as either a classmethod or a staticmethod it only takes one), and if I rename the function to load_uncached to force a cache miss, I get the error message: AttributeError: type object 'LightFM' has no attribute 'load_uncached'.

Please correct me if I'm wrong, but I don't think I can do anything to fix this on my end.
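The arity mismatch described above can be illustrated with two hypothetical classes (names are illustrative, not from the PR): a stale cached build holding load as a plain method versus the new classmethod definition:

```python
class Stale:
    # What a stale cached build might still contain: a plain method,
    # which when called through the class needs two arguments (self and path).
    def load(self, path):
        return path


class Fresh:
    # The new definition: a classmethod, called with a single argument
    # because cls is bound implicitly.
    @classmethod
    def load(cls, path):
        return (cls.__name__, path)
```

Calling Stale.load("model.npz") raises a TypeError (missing the second argument), while Fresh.load("model.npz") works, which is consistent with CI complaining about argument counts when old and new definitions get mixed.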

@maciejkula
Collaborator

I cleared the caches and restarted the build. Let's see what happens.

if not callable(ob) and not x.startswith("__"):
    assert x in saved_model_params

_cleanup()
@maciejkula
Collaborator
When the test fails, this line is never executed, so the saved file is never cleaned up. Could you use pytest fixtures for setup/teardown?
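A sketch of what such a fixture could look like (names and the temporary path are illustrative; the real test would save and reload an actual model):

```python
import os
import tempfile

import pytest


@pytest.fixture
def model_path():
    # Setup: a temporary location for the saved model file.
    path = os.path.join(tempfile.mkdtemp(), "model.npz")
    yield path
    # Teardown: runs even when the test body fails, so the file is
    # removed without relying on an explicit _cleanup() call at the end.
    if os.path.exists(path):
        os.remove(path)


def test_save_load_roundtrip(model_path):
    # Illustrative body only; the real test would fit, save, load,
    # and re-evaluate a LightFM model here.
    open(model_path, "wb").close()
    assert os.path.exists(model_path)
```

Because the teardown code after `yield` runs regardless of whether the test body passed, the cleanup is guaranteed.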

@NegatioN
Author

NegatioN commented Dec 15, 2018

Should be done now. Needs another cache-clean. Also, if you do end up merging this, just squash everything. I don't think it makes sense to keep any of the history except the initial commit.

@NegatioN
Author

NegatioN commented Jan 7, 2019

I understand that this has been long in the making, but I believe I have addressed the changes you wanted, @maciejkula? It just needs another cache refresh.

@maciejkula
Collaborator

Thanks for your patience. I poked Circle and will have another look.

@rth

rth commented Mar 11, 2019

Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this.

Loading pickled models from unknown sources should never happen in practice, I think? Also note that numpy.load (used in this PR) has allow_pickle=True by default, so it is not really safer than regular pickling. The inefficiency argument doesn't hold if one uses joblib.dump / joblib.load (as used in scikit-learn).

Overall, I'm not saying that this PR is not useful (thanks for working on it @NegatioN), just wanted to add more context. In the long term, ONNX export (https://github.com/onnx/onnx) might be a more production-ready and language-agnostic solution.
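To illustrate rth's point about allow_pickle (paths and array names here are illustrative, not from the PR):

```python
import os
import tempfile

import numpy as np

# Persist only plain numeric arrays -- nothing needs pickling.
path = os.path.join(tempfile.mkdtemp(), "model.npz")
np.savez(path, embeddings=np.zeros((3, 2)), learning_rate=np.float64(0.05))

# At the time of this discussion numpy.load defaulted to allow_pickle=True
# (the default later changed to False, in NumPy 1.16.3). Passing False
# explicitly refuses to deserialize arbitrary Python objects, so loading
# a file from an untrusted source cannot trigger code execution.
data = np.load(path, allow_pickle=False)
print(data["embeddings"].shape)  # prints (3, 2)
```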

@NegatioN
Author

NegatioN commented Mar 11, 2019

@rth ONNX is definitely better. I guess it should be possible to translate the whole model structure as well, so even if the lyst model definitions change, they are still compatible. And exporting models to different frameworks for serving or training could be interesting too.

I do believe the numpy save/load functionality here would work with allow_pickle=False too though, as we're only really persisting floats and strings.

I'm kinda considering this PR dead, as there is little traction, and it is more of a "nice-to-have" and not really important for the usage of the library. Just figured I'd contribute my local utils since no save/load functionality existed.

@rth

rth commented Mar 11, 2019

I do believe the numpy save/load functionality here would work with allow_pickle=False too though, as we're only really persisting floats and strings.

Definitely.

ONNX doesn't support sparse arrays yet, I think, so it's more of a long-term solution that could be investigated in the future.

@CireS31

CireS31 commented May 24, 2019

Hello,
Have save/load functionalities finally been implemented in LightFM? If not, what is the best way to save a model?
Thank you in advance.

@impaktor

Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this.

Loading pickled models from unknown sources should never happen in practice I think?

When is it unsafe to pickle a LightFM model? Provided one doesn't change the version of any of the underlying software, my understanding is that the "normal" pickle in Python is safe to use, also with LightFM objects. Please correct me if I'm wrong.

@CireS31 since this PR isn't merged: no.

@alecokas

Hi, is this issue still being actively looked at?

@NegatioN
Author

@alecokas Nope.
I'm closing for now, until @maciejkula decides to re-open it.

@NegatioN NegatioN closed this Feb 12, 2020
@ahmadalli

@impaktor pickle is not safe for loading objects from unknown sources: https://checkoway.net/musings/pickle/
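The linked write-up boils down to the fact that unpickling can execute arbitrary code. A minimal, harmless demonstration:

```python
import pickle


class Payload:
    def __reduce__(self):
        # pickle.loads will call the returned callable with these arguments.
        # Here it is a harmless print, but it could just as well be
        # os.system("...") -- which is why unpickling untrusted data is unsafe.
        return (print, ("arbitrary code ran during unpickling",))


blob = pickle.dumps(Payload())
pickle.loads(blob)  # executes print() during deserialization
```

The danger is independent of what the pickled object "is": the bytes themselves decide what runs, so only pickles from trusted sources (e.g. files you wrote yourself, unchanged) are safe to load.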

8 participants