Best way to save a fine-tuned vectorizer object for later use #71

Closed
adkinsty opened this issue Oct 12, 2022 · 1 comment


adkinsty commented Oct 12, 2022

Thanks for creating this package! I just have one quick question.

After fine-tuning the vectorizer on my text:

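```python
from fse import Average, IndexedList, Vectors

vecs = Vectors.from_pretrained(model_name)  # e.g. "fasttext-crawl-subwords-300"
vectorizer = Average(vecs)
vectorizer.train(IndexedList(sent_train))  # sent_train: list of tokenized sentences
```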

What is the best way to save the vectorizer object for later use? Currently, I am trying to use pickle, like so:

```python
import pickle

with open(f'{path}/vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)
```

The resulting pickle file is 6.9 GB.

Thanks for your time.
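If what needs to persist is the sentence embeddings rather than the vectorizer itself, one option is to save only that array. A pickle of the vectorizer presumably also carries the full pretrained word-vector table it references, which would account for a multi-gigabyte file. The sketch below assumes the embeddings are available as a NumPy array; `sentence_vectors` is a hypothetical random placeholder (not data read from fse), and the file goes to a temporary directory rather than the `path` used above.

```python
import os
import tempfile

import numpy as np

# Hypothetical placeholder for the sentence embeddings: one 300-dimensional
# float32 row per sentence (random values, not real fse output).
rng = np.random.default_rng(0)
sentence_vectors = rng.standard_normal((1000, 300)).astype(np.float32)

# Write only the array, not the vectorizer and its word vectors.
out_path = os.path.join(tempfile.mkdtemp(), "sentence_vectors.npy")
np.save(out_path, sentence_vectors)

# Later: reload the embeddings without rebuilding the vectorizer.
restored = np.load(out_path)
print(restored.shape, os.path.getsize(out_path), "bytes")
```

Here the payload is 1000 × 300 × 4 bytes, about 1.2 MB plus a small header. For large arrays, `np.load(out_path, mmap_mode="r")` maps the file instead of reading it fully into memory.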

adkinsty changed the title from "Best way to save a fine-tuned Vectorizer object for later use" to "Best way to save a fine-tuned vectorizer object for later use" on Oct 12, 2022
adkinsty (author) commented Oct 12, 2022

Ah, actually, perhaps I was confused. I had assumed that the `.train()` method does some sort of fitting/fine-tuning on the text, whereas `.infer()` merely transforms it. If that is not the case, there is no need to save the vectorizer for reuse: I can simply initialize a new vectorizer and use it to transform new text data.

P.S. The pretrained model I'm using here is `fasttext-crawl-subwords-300`.
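The reasoning in this comment can be illustrated with a plain averaging model: if a sentence vector is just the mean of fixed pretrained word vectors, nothing is learned from the training text, so a freshly constructed vectorizer reproduces the same embeddings. This is a minimal sketch assuming a toy word-vector dictionary; `WORD_VECTORS` and `ToyAverage` are hypothetical names, not part of fse, and the sketch does not establish whether fse's `Average` (or any other model in the package) keeps state computed during `.train()`. A real fastText subword model would also build vectors for unknown words from character n-grams instead of skipping them.

```python
import numpy as np

# Hypothetical toy word vectors (3 dimensions instead of 300), standing in
# for a pretrained table such as fasttext-crawl-subwords-300.
WORD_VECTORS = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
    "say": np.array([0.0, 0.0, 1.0]),
    "meow": np.array([1.0, 1.0, 0.0]),
}


class ToyAverage:
    """Hypothetical mean-of-word-vectors embedder; it learns nothing from text."""

    def __init__(self, word_vectors):
        self.word_vectors = word_vectors
        self.dim = len(next(iter(word_vectors.values())))

    def infer(self, sentences):
        rows = []
        for tokens in sentences:
            # Unknown tokens are simply skipped in this toy version.
            vecs = [self.word_vectors[t] for t in tokens if t in self.word_vectors]
            rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(self.dim))
        return np.vstack(rows)


new_text = [["cat", "say", "meow"], ["dog", "barks"]]

# Two independently constructed vectorizers give identical embeddings,
# so there is no fitted state that would need to be saved.
first = ToyAverage(WORD_VECTORS).infer(new_text)
second = ToyAverage(WORD_VECTORS).infer(new_text)
print(first)
print(np.array_equal(first, second))
```

A model whose `.train()` did compute something from the corpus (for example, statistics reused at inference time) would break this equivalence, and in that case only that fitted state, not the word vectors, would be worth persisting.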
