MemoryError for calculating cosine similarity scores #3

Closed
sumuzhao opened this issue Oct 15, 2019 · 5 comments

@sumuzhao

Hi,

I tried to pre-calculate the cosine similarity scores from the counter-fitting word vectors, but ran into a MemoryError. The word vectors have shape (65713, 300), so the resulting similarity matrix has shape (65713, 65713). The computation involves some dot-product and element-wise division operations. I have 8 GB of RAM. Any suggestions?

Thanks a lot!

@jind11
Owner

jind11 commented Oct 15, 2019

Hi, the cosine similarity matrix consumes about 30 GB of RAM, which caused your out-of-memory problem. Do you have a machine with more RAM? Alternatively, you can convert the float precision from 64 bits to a lower one, say 32 or 16 bits.
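The memory estimate can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes a dense square matrix over the 65713-word vocabulary from the question, and the helper name `dense_matrix_gib` is made up for illustration.

```python
VOCAB_SIZE = 65713  # number of word vectors reported in the question

# Bytes per element for the float precisions mentioned in this thread.
BYTES_PER_ELEMENT = {"float64": 8, "float32": 4, "float16": 2}


def dense_matrix_gib(n_rows, bytes_per_element):
    """Return the size in GiB of a dense (n_rows, n_rows) matrix."""
    return n_rows * n_rows * bytes_per_element / 2**30


for name, size in BYTES_PER_ELEMENT.items():
    print(f"{name}: {dense_matrix_gib(VOCAB_SIZE, size):.1f} GiB")
```

Under these assumptions the full matrix takes roughly 32 GiB at 64-bit, 16 GiB at 32-bit, and 8 GiB at 16-bit precision, so lowering the precision alone is unlikely to be enough on a machine with 8 GB of RAM.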

@sumuzhao
Author

Well... I'll try reducing the float precision, but I don't think it will work given my low RAM. I'll see if there are any alternatives, such as reducing the vocabulary size.
Anyway, thanks for your suggestion.

@jind11
Owner

jind11 commented Oct 15, 2019

Yes, you can also shrink the vocab size.
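Shrinking the vocabulary and lowering the precision can be combined. The following is a minimal sketch, not the repository's own script: the function name `cosine_similarity_matrix` and the `max_vocab` parameter are hypothetical, and it assumes that keeping the first rows of the embedding matrix is an acceptable way to choose the smaller vocabulary.

```python
import numpy as np


def cosine_similarity_matrix(embeddings, max_vocab=None, dtype=np.float32):
    """Cosine similarity between the first `max_vocab` rows of `embeddings`."""
    # Keep only the first `max_vocab` rows (all rows when max_vocab is None)
    # and cast to the lower precision before any large allocation happens.
    vectors = np.asarray(embeddings[:max_vocab], dtype=dtype)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1  # guard against division by zero for all-zero rows
    unit = vectors / norms
    # The dot product of unit-length rows is their cosine similarity.
    return unit @ unit.T


# Random stand-in data; real word vectors would be loaded from disk instead.
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 300))
sim = cosine_similarity_matrix(demo, max_vocab=200)
print(sim.shape, sim.dtype)  # (200, 200) float32
```

For a sense of scale, a 20000-word vocabulary at 32-bit precision gives a similarity matrix of about 1.5 GiB, which is far more manageable on an 8 GB machine than the full 65713-word matrix.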

@jind11 closed this as completed on Mar 21, 2020

@SatyapragyanDas

> Well... I'll try reducing the float precision, but I don't think it will work given my low RAM. I'll see if there are any alternatives, such as reducing the vocabulary size.
> Anyway, thanks for your suggestion.

I tried reducing the precision with the following line:
df = df.astype(np.float32)
But I get the following error:
ValueError: could not convert string to float: 'tt0000574'
What should be done?

@jind11
Owner

jind11 commented Jan 11, 2021

May I know where this line is used? I am not sure what "df" here refers to. Thanks!
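Without knowing what `df` is, one plausible reading of the error is that `df` is a pandas DataFrame containing at least one string column (the value 'tt0000574' cannot be parsed as a float), so casting the whole frame fails. A possible workaround under that assumption is to cast only the numeric columns. The column names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one string ID column plus two numeric feature columns.
df = pd.DataFrame({
    "id": ["tt0000574", "tt0000591"],
    "f1": [0.1, 0.2],
    "f2": [1.5, 2.5],
})

# Cast only the numeric columns; string columns are left untouched.
numeric_cols = df.select_dtypes(include="number").columns
df = df.astype({col: np.float32 for col in numeric_cols})

print(df.dtypes)
```

If the string column is only an identifier, another option is to move it into the index with `df.set_index("id")` before casting, so that the remaining columns are all numeric.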
