Serializing UMAP #273

Open
martinobertoni opened this issue Aug 9, 2019 · 7 comments
Labels
Good Reads Issues that discuss important topics regarding UMAP, that provide useful code or nice visualizations


@martinobertoni

Hi! Thanks for implementing UMAP, it's very handy!
When serializing a trained UMAP object via pickle, I got the following error:

pickle.dump(myumap, open('pickle.pkl', 'wb'))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

A workaround I've found is:

pickle.dump(myumap, open('pickle.pkl', 'wb'), protocol=-1)

Do you think it is safe? Is there a better/recommended serialization approach?
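
A minimal sketch of the workaround above, written so the file is opened in binary mode and closed reliably. `protocol=-1` is equivalent to `pickle.HIGHEST_PROTOCOL`. The helper names `save_model` and `load_model` and the stand-in object are hypothetical and not part of UMAP; a fitted UMAP object would take the place of `fake_model`.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_model(model, path):
    """Pickle `model` to `path` using the newest available protocol."""
    # pickle writes bytes, so the file must be opened with "wb"
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)  # same as protocol=-1


def load_model(path):
    """Load a pickled object. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-in for a fitted model (assumption for illustration only).
fake_model = SimpleNamespace(n_neighbors=15, embedding_=[[0.1, 0.2], [0.3, 0.4]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_model(fake_model, path)
    restored = load_model(path)
```

As with any pickle, the file is generally only reliable when loaded with the same library versions that wrote it.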

@lmcinnes
Owner

Sadly I think that is the recommended approach at this point. You could also look at using joblib for serialization instead of pickle.

@sleighsoft sleighsoft added the Good Reads Issues that discuss important topics regarding UMAP, that provide useful code or nice visualizations label Sep 13, 2019
@wmayner
Contributor

wmayner commented Jan 3, 2020

I'm getting a different error when I try to pickle a fitted UMAP object, even when I use protocol=-1:

TypeError: can't pickle _nrt_python._MemInfo objects

I also tried using wrap_non_picklable_objects from joblib, and that didn't work either.

I'd like to do this because:

  • the wonderful new plotting interface in 0.4 requires a UMAP object, rather than the transformed data
  • the parameters that I used are stored on that object, which is convenient for reproduction

Is there a workaround that you know of? For now I'll just save the transformed data and parameters, and plot things myself, but I think it would be nice to be able to simply serialize the object.
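
The interim approach described above (saving the transformed data and the parameters rather than the object) could be sketched as follows. The helper names and the example values are hypothetical. With a fitted model the inputs might come from `model.embedding_.tolist()` and `model.get_params()`, which is an assumption and not something stated in this thread.

```python
import json
import os
import tempfile


def save_embedding(path, embedding, params):
    """Write the embedding coordinates and the parameters used to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"params": params, "embedding": embedding}, fh)


def load_embedding(path):
    """Return (embedding, params) as stored by save_embedding."""
    with open(path, encoding="utf-8") as fh:
        record = json.load(fh)
    return record["embedding"], record["params"]


# Hypothetical values standing in for a real embedding and its parameters.
embedding = [[0.12, -1.5], [2.0, 0.33]]
params = {"n_neighbors": 15, "min_dist": 0.1, "metric": "euclidean"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "umap_run.json")
    save_embedding(path, embedding, params)
    loaded_embedding, loaded_params = load_embedding(path)
```

Parameters that are not plain numbers or strings (for example a callable metric) would need to be converted before they can be written as JSON.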

@wmayner
Contributor

wmayner commented Jan 24, 2020

It seems that the _nrt_python._MemInfo error is caused by the attempt to serialize pynndescent._rp_trees.FlatTree objects.

@lmcinnes
Owner

I think the simplest thing is to delete the _rp_trees attribute. You'll only need it if you want to transform new data, and for the use cases you describe that should be fine. I'll try to figure out a longer-term fix when I get some time.

@wmayner
Contributor

wmayner commented Jan 30, 2020

For Googlers: the relevant attribute seems to be _rp_forest. Deleting this allowed the UMAP objects to be pickled.
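
A minimal sketch of that workaround, assuming the only unpicklable part of the object is a single attribute. It works on a shallow copy so the original object keeps the attribute. The helper `dumps_without` is hypothetical, and a `threading.Lock` stands in for the unpicklable forest so the example runs without UMAP installed.

```python
import copy
import pickle
import threading
from types import SimpleNamespace


def dumps_without(obj, attr_names):
    """Pickle a shallow copy of `obj` with the named attributes removed."""
    clone = copy.copy(obj)  # leave the caller's object untouched
    for name in attr_names:
        if hasattr(clone, name):
            delattr(clone, name)
    return pickle.dumps(clone, protocol=pickle.HIGHEST_PROTOCOL)


# Hypothetical stand-in for a fitted model: the lock plays the role of the
# attribute that cannot be pickled (assumption for illustration only).
fake_model = SimpleNamespace(embedding_=[[0.0, 1.0]], _rp_forest=threading.Lock())

try:
    pickle.dumps(fake_model)
    pickled_as_is = True
except TypeError:
    pickled_as_is = False  # the lock cannot be pickled

restored = pickle.loads(dumps_without(fake_model, ["_rp_forest"]))
```

Per the maintainer's comment above, an object restored this way should not be expected to transform new data, since the removed attribute is what that step relies on.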

@lefnire

lefnire commented Aug 1, 2020

Bump. I need to use a loaded UMAP object (via pickle, joblib, etc.) for inference ("You'll only need [._rp_trees] if you want to transform new data"). Incidentally, I think future inference may be one of the main reasons people want to serialize in the first place. Thanks for the project!

@aakash0017

aakash0017 commented Jun 21, 2024

Was a solution to this ever found? I'm not able to serialize AlignedUMAP. The error I get is TypeError: cannot pickle '_nrt_python._MemInfo' object. I want to store this object so that I can add new data points in the future.

6 participants