Is there a preferred method of saving and loading h2o word2vec models in python? #144

Closed
geoffkip opened this issue Sep 16, 2020 · 0 comments

geoffkip commented Sep 16, 2020

I have trained a word2vec model with the Python h2o package. Is there a simple way to save that model and load it back later for use?

I have tried h2o.save_model() and h2o.load_model() with no luck. That approach fails with an error like:


water.exceptions.H2OIllegalArgumentException
[1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: dir of function: importModel:
I am using the same version of h2o to train the model and to load it back in, so the issue outlined in the question "Can't import binay h2o model with h2o.loadModel() function: 412 Precondition Failed" does not apply.
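
For context, a save/load round trip with these two functions is usually written along the lines below. This is only a sketch: the helper names `save_w2v_model` and `load_w2v_model` are made up for illustration, and it assumes the directory is visible to the H2O server (not just the Python client), which may matter when connecting to a remote `h2o_ip`. The one detail it highlights is that `h2o.load_model()` is given the full path returned by `h2o.save_model()`, not the directory that was passed in.

```python
def save_w2v_model(model, directory):
    """Save a trained H2O model and return the full path H2O wrote it to."""
    # Imported lazily so this helper can be defined before h2o is set up.
    import h2o

    # save_model returns the path of the saved model file, which includes
    # the model id appended to the directory.
    return h2o.save_model(model=model, path=directory, force=True)


def load_w2v_model(model_path):
    """Load a model back from the path returned by save_w2v_model."""
    import h2o

    # Assumption: model_path is the full file path, not just the directory.
    return h2o.load_model(model_path)


# Hypothetical usage, assuming w2v_model is an already trained estimator:
# saved_path = save_w2v_model(w2v_model, '/mnt/results/models')
# restored = load_w2v_model(saved_path)
```

If the error persists with the returned path, checking whether that path exists on the machine running the H2O server would be a reasonable next step.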

Does anyone have any insight into how to save and load an h2o word2vec model?

I realize that, more than saving the model itself, what matters is saving the word vector embeddings so they can be used later as a pre-trained model.

Is doing something like this best practice?

import h2o
from h2o.estimators import H2OWord2vecEstimator

# Start (or connect to) H2O before touching any H2OFrame.
# h2o_ip, h2o_port and h2o_min_memory are defined elsewhere.
print('Initializing h2o.')
h2o.init(ip=h2o_ip, port=h2o_port, min_mem_size=h2o_min_memory)

# df is an existing H2OFrame with a 'text' column
df['text'] = df['text'].ascharacter()

# Break text into a sequence of words (tokenize is a helper defined elsewhere)
words = tokenize(df['text'])

# Build word2vec model
w2v_model = H2OWord2vecEstimator(sent_sample_rate=0.0, epochs=10)
w2v_model.train(training_frame=words)

# Create word vector embedding H2O frame
w2v_frame = w2v_model.to_frame()

# Export word embeddings to file for later use
h2o.export_file(w2v_frame, '/mnt/results/words_embeddings.csv', force=True)

# Import word embeddings later for pretrained model
w2v_frame = h2o.import_file('/mnt/results/words_embeddings.csv')

# Define pretrained word2vec model
w2v_model2 = H2OWord2vecEstimator(pre_trained=w2v_frame, vec_size=100)

# Train on words
w2v_model2.train(training_frame=words)
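
If the exported CSV is only meant to be turned back into a model, one possible shortcut is `H2OWord2vecEstimator.from_external`, which builds an estimator from a frame of word vectors without a second training pass over the text. The sketch below is hedged: `load_pretrained_w2v` is a hypothetical helper name, and the string conversion of the first column assumes that the parser may read the words as a categorical and that the pre-trained frame needs a string word column. It is worth confirming that `from_external` is available in the h2o version in use.

```python
def load_pretrained_w2v(csv_path):
    """Rebuild a word2vec model from embeddings written by h2o.export_file."""
    # Imported lazily so this helper can be defined before h2o is set up.
    import h2o
    from h2o.estimators import H2OWord2vecEstimator

    frame = h2o.import_file(csv_path)

    # Assumption: the first column holds the words and should be a string
    # column; the remaining columns hold the vector components.
    word_col = frame.names[0]
    frame[word_col] = frame[word_col].ascharacter()

    # Wrap the external embeddings in an estimator.
    return H2OWord2vecEstimator.from_external(external=frame)


# Hypothetical usage with the path from the snippet above:
# w2v_model2 = load_pretrained_w2v('/mnt/results/words_embeddings.csv')
```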