Demo bitext mining fails in faiss population #5

Closed
oneraghavan opened this issue Jul 16, 2022 · 6 comments

@oneraghavan
Contributor

2022-07-16 18:21 ERROR 2593602:stopes.populate_faiss_index - Error in index population with embeddings: /nlsasfs/home/ai4bharat/trial_user/indicmining/repos/stopes/outputs/2022-07-15/20-34-12/embed.V32m/laser3_encoder/encf.000.fuv, & index: /nlsasfs/home/ai4bharat/trial_user/indicmining/repos/stopes/outputs/2022-07-16/09-39-18/index.V32m/fuv/demo_wmt22.OPQ64,IVF8192,PQ64.fuv.train.idx
Traceback (most recent call last):
  File "/nlsasfs/home/trial_acc/trial_user/anaconda3/envs/stopes2/lib/python3.8/site-packages/stopes/modules/bitext/indexing/populate_faiss_index.py", line 162, in run
    add_embedding_to_index(
  File "/nlsasfs/home/trial_acc/trial_user/anaconda3/envs/stopes2/lib/python3.8/site-packages/stopes/modules/bitext/indexing/populate_faiss_index.py", line 292, in add_embedding_to_index
    faiss.write_index(
  File "/nlsasfs/home/trial_acc/trial_user/anaconda3/envs/stopes2/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 9843, in write_index
    return _swigfaiss_avx2.write_index(*args)
RuntimeError: Error in void faiss::write_index(const faiss::Index*, faiss::IOWriter*) at /home/conda/feedstock_root/build_artifacts/faiss-split_1644327811086/work/faiss/impl/index_write.cpp:590: don't know how to serialize this type of index

@Mortimerp9
Contributor

Interesting, I ran this yesterday end to end without issues. What version of faiss/faiss-gpu do you have?

@oneraghavan
Contributor Author

@Mortimerp9 The faiss-gpu version is 1.7.2. How did you install faiss-gpu?

@Mortimerp9
Contributor

I installed it with pip, which doesn't give the official version. You might want to try to install the conda version instead to be sure.

Anyway, searching online suggests this error can happen when the index becomes too large for your GPU. It's odd that it can't serialize this type, given that it did so in the previous train step.

fuv is a language in the dataset with one of the biggest sets of data. Maybe try a smaller language (you can just pick another language code in the list of downloaded files).

@oneraghavan
Contributor Author

@Mortimerp9 The issue happens when the index is on the GPU: during partial saving of a GPU index, we do not convert it to a CPU index. Fixed in PR #6.
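
For anyone hitting the same error, here is a minimal sketch of the idea, not the actual code from the PR: copy the index to host memory before calling `faiss.write_index`. The helper name `save_index_checkpoint`, the `on_gpu` flag, and the `faiss_module` parameter are hypothetical and exist only for illustration; `faiss.index_gpu_to_cpu` and `faiss.write_index` are the real faiss calls (the former is only available in GPU builds).

```python
def save_index_checkpoint(index, path, on_gpu, faiss_module=None):
    """Write `index` to `path`, copying it off the GPU first if needed.

    Hypothetical helper: `on_gpu` is supplied by the caller, and
    `faiss_module` only exists so a stub can be injected for testing.
    """
    if faiss_module is None:
        # Imported lazily so the helper can be exercised without faiss installed.
        import faiss as faiss_module

    if on_gpu:
        # write_index cannot serialize GPU index types, so clone the
        # index into host memory before writing it out.
        index = faiss_module.index_gpu_to_cpu(index)

    faiss_module.write_index(index, str(path))
    return index
```

In a pipeline that saves the index periodically while populating it, the call would look like `save_index_checkpoint(idx, checkpoint_path, on_gpu=use_gpu)`, where `idx`, `checkpoint_path` and `use_gpu` are whatever your own code already has.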

@Mortimerp9
Contributor

@oneraghavan, your fix in #6 has been merged, so I'll close this issue. If you see it again, let me know.

@oneraghavan
Contributor Author

@Mortimerp9 Sorry, I was away from work in a remote place and couldn't get back to this. Thanks for merging it :)
