New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for saving and loading models #377
Conversation
This is not strictly correct, as it doesn't save and restore model hyperparameters. Based on the principle that pickling is dangerous and inefficient, it may be useful to have something like this. Would you be willing to work on this a little more to whip it into shape? |
555d930
to
bef356a
Compare
I'm not sure if it's smart to try handling the user/item mappings found in the default dataset implementation: https://github.com/lyst/lightfm/blob/master/lightfm/data.py#L135 It definitely adds friction that it's handled outside of the save/load functions, but I think they need to be integrated tighter to make it easy to include in the numpy-object. Other than that, both vectors and hyperparams should be taken into consideration now @maciejkula :) Seems like black-tests are passing on my machine, so I'm not quite sure what to make of that. |
@maciejkula I've fixed the formatting errors now, would you mind taking a look? Seems like the current error is related to imports in an untouched file. |
Looks good, but I have two more comments. Firstly, what do you think about making model = LightFM()
model.load(path) we would have model = LightFM.load(path) Secondly, can I ask for one more test please, along the following lines:
|
Build is now failing because (of what I can only assume), there is something wrong with the how the caching is done in CircleCI. I can fix the commit history after this issue is resolved. It either fails because it has a definition of Please correct me if I'm wrong, but I don't think I can help fixing this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleared the caches and restarted the build. Let's see what happens.
tests/test_persist.py
Outdated
if not callable(ob) and not x.startswith("__"): | ||
assert x in saved_model_params | ||
|
||
_cleanup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the test fails, this is never executed, and is never cleaned up. Could you use pytest fixtures for setup/teardown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done now. Needs another cache-clean. Also, if you do end up merging this, just squash everything. I don't think it makes sense to keep any of the history except the initial commit.
I understand that this has been long in the making, but I believe I have addressed the changes you wanted @maciejkula ? It just needs another cache refresh |
Thanks for your patience. I poked Circle and will have another look. |
Loading pickled models from unknown sources should never happen in practice I think? Also note that Overall, I'm not saying that is PR is not useful (thanks for working on it @NegatioN), just wanted to add more context. In the long term, ONNX export (https://github.com/onnx/onnx) might be a more production ready and language agnostic solution. |
@rth ONNX is definitely better. I guess it should be possible to translate the whole model structure as well, so even if the lyst model definitions change, they are still compatible. And exporting models to different frameworks for serving or training could be interesting too. I do believe the numpy save/load functionality here would work with I'm kinda considering this PR dead, as there is little traction, and it is more of a "nice-to-have" and not really important for the usage of the library. Just figured I'd contribute my local utils since no save/load functionality existed. |
Definitely. For ONNX they don't support sparse arrays yet, I think, so it's more a long term solution that could be investigated in the future.. |
Hello, |
When is it unsafe to pickle a LightFM model? Provided one doesn't change version of any of the underlying software, my understanding is the "normal" pickle that's in python is safe to use, also with LightFM objects. Please correct me if I'm wrong. @CireS31 since this PR isn't merged: no. |
Hi, is this issue still being actively looked at? |
@alecokas Nope. |
@impaktor pickle is not safe for loading objects from unkown sources: https://checkoway.net/musings/pickle/ |
Is this something you would want in the project @maciejkula ?
This is just a rough outline of some code I have laying around to do something like this, so it might need to be fixed up a bit and whatnot, but I'd rather do that after I know your opinion :)
And if it already exists, it's a bit embarrassing but I didn't find it! :)