Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model loading is incompatible with newer RcppAnnoy #31

Closed
jlmelville opened this issue Sep 22, 2019 · 14 comments
Closed

Model loading is incompatible with newer RcppAnnoy #31

jlmelville opened this issue Sep 22, 2019 · 14 comments
Labels
bug

Comments

@jlmelville
Copy link
Owner

@jlmelville jlmelville commented Sep 22, 2019

Dirk Eddelbuettel, RcppAnnoy author, has reached out to inform me of that a new version of RcppAnnoy is coming, backed with an updated version of Annoy.

Unfortunately, uwot's ability to load previously saved models is broken by these changes. It used to be possible to specify an arbitrary dimensionality when creating an index, and then loading a serialized Annoy model would overwrite that dimensionality to whatever the serialized index was supposed to have, e.g.:

ann <- methods::new(RcppAnnoy::AnnoyEuclidean, 1)
ann$load(index_path)

Annoy author Erik Bernhardsson further pointed out that this should never have worked, which adds a little extra urgency to making a fix.

Fortunately, the information needed is readily to hand at load time, so this isn't a difficult (or backwards-compatibility breaking) fix.

jlmelville added a commit that referenced this issue Sep 22, 2019
jlmelville added a commit that referenced this issue Sep 23, 2019
@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Sep 23, 2019

0.1.4 is now on CRAN with this fix. Thanks to Dirk and Erik for their assistance on this one.

@jlmelville jlmelville closed this Sep 23, 2019
@OuNao
Copy link

@OuNao OuNao commented Feb 13, 2020

Hi!

Maybe this issue is not completly solved.

Using R 3.6.0 and uwot 0.1.5 I can save a model but get a error on load the same model:
Error: index size 76853092 is not a multiple of vector size 16

If I try the uwot version 0.1.3 I can load the model with no problems (the model created with 0.1.5 uwot version)

What I´m doing wrong?

Thanks,

EDIT:
Module saved with: R 3.6.0, uwot 0.1.5, RcppAnnoy 0.0.14
Load try 1: R 3.6.0, uwot 0.1.5, RcppAnnoy 0.0.14 -> ERROR
Load try 2: R 3.5.3, uwot 0.1.3, RcppAnnoy 0.0.11 -> SUCESS!

@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Feb 14, 2020

Can you provide the commands you ran?

@jlmelville jlmelville reopened this Feb 14, 2020
@OuNao
Copy link

@OuNao OuNao commented Feb 15, 2020

Hi,

Create the model:
model<-uwot::umap(X = ff, scale = TRUE, n_threads=4, init = "spectral", metric="euclidean", ret_model = TRUE)
Save the model:
uwot::save_uwot(model, file = "mymodel")
Load the model:
model2<-uwot::load_uwot(file="mymodel")

Thanks,

@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Feb 15, 2020

Does it make a difference if you do:

# Create the model:
model <- uwot::umap(X = ff, scale = TRUE, n_threads = 4, init = "spectral", 
                    metric = "euclidean", ret_model = TRUE)
model_file <- tempfile("mymodel")
# Save the model:
uwot::save_uwot(model, file = model_file)
# Load the model:
model2 <- uwot::load_uwot(file = model_file)

I have discovered there are lots of problems with save_uwot and load_uwot, although they are not connected with RcppAnnoy directly, so this might be a different bug.

@OuNao
Copy link

@OuNao OuNao commented Feb 15, 2020

Hi,
The actual code already use tempfile. I simplified the above code.
Nope, there's no difference with or without intermediary tempfile.
Thanks,

@OuNao
Copy link

@OuNao OuNao commented Feb 15, 2020

So,

Test load uwot in R 3.6.0 + RcppAnnoy 0.0.14 + uwot 0.1.13 -> error!
Test load uwot in R 3.6.0 + RcppAnnoy 0.0.11 + uwot 0.1.13 -> error!

I don't know why R 3.5.3 + RcppAnnoy 0.0.11 + uwot 0.1.13 works...
The problem is R 3.6.0?

@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Feb 15, 2020

Can you try with the latest R 3.6.x release, which is currently R 3.6.2?

@OuNao
Copy link

@OuNao OuNao commented Feb 15, 2020

I will try R-3.6.2. Before this, some debuging:
I can isolate the problem to ann$load(nn_fname):
Error: index size 76853092 is not a multiple of vector size 16
ann is defined before this in ann <- create_ann(metric, ndim = length(model$metric[[i]])), but model$metric is NULL so length(model$metric[[i]]) is 0 (zero). That's ok?

@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Feb 15, 2020

No that's probably not ok. This is a bug, sorry. Unfortunately this bug isn't something I have been able to reproduce so I missed it (weirdly the lack of a correct dimensionality still isn't biting me).

model$metric[[i]] should be set to the number of features used in the NN index. Because you have scale = TRUE, this should be fixable by doing:

model$metric[[1]] <- length(model$scale_info$`scaled:nzvcols`)

Hopefully that will work for your current case for now.

@OuNao
Copy link

@OuNao OuNao commented Feb 15, 2020

Hi,

Yes this solve the problem! But ann must be created with ann <- create_ann(metric, ndim = model$metric[[i]]) and not ann <- create_ann(metric, ndim = length(model$metric[[i]])).

Any ETA for this changes appear on CRAN?

I can use model$metric[[1]] <- length(model$scale_info$`scaled:nzvcols`) before save the model but load_uwot still throws error without the above change.

Thanks.

@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Feb 15, 2020

Yes this solve the problem! But ann must be created with ann <- create_ann(metric, ndim = model$metric[[i]]) and not ann <- create_ann(metric, ndim = length(model$metric[[i]])).

Yes, you are again correct.

Any ETA for this changes appear on CRAN?

I don't know. There are several more problems with save_uwot and load_uwot that need fixing, plus some other outstanding issues. Updates to packages on CRAN are not supposed to happen more than once every 1-2 months, so this requires submitting fixes in batches.

@OuNao
Copy link

@OuNao OuNao commented Feb 16, 2020

No problem.

I can still use R 3.5.3.
I will wait for the changes appear on Github and move from cran. I could Aldo use the manual solition from issue #19.

So. Many alternatives.

Thanks a lot for this wonderfull package.

Regards,

@jlmelville
Copy link
Owner Author

@jlmelville jlmelville commented Mar 16, 2020

Version 0.1.8 has made it to CRAN and this should be fixed now. Please re-open if problems persist.

@jlmelville jlmelville closed this Mar 16, 2020
yuhanH pushed a commit to yuhanH/uwot that referenced this issue Jul 20, 2020
yuhanH pushed a commit to yuhanH/uwot that referenced this issue Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.