save sif_model.sv.vectors.npy file is very large? #14

Closed
RyanHuangNLP opened this issue Oct 12, 2019 · 1 comment

RyanHuangNLP commented Oct 12, 2019

I found that my sif_model.sv.vectors.npy file holds only a (758194, 100) matrix, yet the file is 15G. When I save an (800000, 100) matrix to an .npy file, it is only about 600 MB. Is this normal? I trained the SIF model on 30 million sentences.

-rw-r--r-- 1 ke ke  43M oct 11 19:09 sif_model
-rw-r--r-- 1 ke ke  15G oct 11 19:09 sif_model.sv.vectors.npy  <<----- this file is very large
-rw-r--r-- 1 ke ke 290M oct 11 19:07 sif_model.wv.vectors.npy

oborchers (Owner) commented

Hi! If you train the model on 30 million sentences, you should end up with an array of shape (30*10^6, 100).

The approximate size of the array in bytes is:
sentences * vector_size * np.dtype(np.float32).itemsize

For your case, converted to GiB, that is:
30e6*100*np.dtype(np.float32).itemsize / 1024**3
which is about 11.2 GiB (12 GB in decimal units). So 15G is a bit higher than expected.
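
The arithmetic above can be checked with a short sketch. The helper names array_bytes and rows_for_bytes are hypothetical and not part of the library. The sketch assumes float32 storage (4 bytes per value, as in the formula), ignores the small .npy header, and assumes that the "G" and "M" suffixes in the listing are 1024-based units.

```python
# Assumption: 4 bytes per value, which equals np.dtype(np.float32).itemsize.
FLOAT32_ITEMSIZE = 4
MIB = 1024 ** 2
GIB = 1024 ** 3


def array_bytes(rows, vector_size, itemsize=FLOAT32_ITEMSIZE):
    """Approximate payload size in bytes of a dense (rows, vector_size) array."""
    return rows * vector_size * itemsize


def rows_for_bytes(n_bytes, vector_size, itemsize=FLOAT32_ITEMSIZE):
    """Number of whole rows that fit in n_bytes (the inverse of array_bytes)."""
    return n_bytes // (vector_size * itemsize)


# Expected size for 30 million sentence vectors of dimension 100.
sv_expected = array_bytes(30_000_000, 100)
print(f"{sv_expected / GIB:.1f} GiB")  # 11.2 GiB

# Size of a (758194, 100) float32 array, the shape quoted in the question.
wv_sized = array_bytes(758_194, 100)
print(f"{wv_sized / MIB:.0f} MiB")  # 289 MiB

# Roughly how many 100-dimensional float32 rows a 15 GiB file would hold.
rows_15g = rows_for_bytes(15 * GIB, 100)
print(rows_15g)  # 40265318
```

Under these assumptions, a (758194, 100) float32 array comes to about 289 MiB, which is close to the 290M shown for sif_model.wv.vectors.npy, and a 15 GiB file would correspond to roughly 40 million rows. Whether either observation explains this particular model is not confirmed in the thread.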
