This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

dense_retriever -- MemoryError: std::bad_alloc #33

Closed
aarzchan opened this issue Jul 24, 2020 · 11 comments

Comments

@aarzchan

Hi! It seems that no matter what value I set index_buffer to, I get the following error when running dense_retriever.py:

Traceback (most recent call last):
  File "dense_retriever.py", line 331, in <module>
    main(args)
  File "dense_retriever.py", line 268, in main
    retriever.index_encoded_data(input_paths, buffer_size=index_buffer_sz)
  File "dense_retriever.py", line 100, in index_encoded_data
    self.index.index_data(buffer)
  File "/home/aarchan/qa-aug/qa-aug/dpr/indexer/faiss_indexers.py", line 93, in index_data
    self.index.add(vectors)
  File "/home/aarchan/anaconda2/envs/qa-aug/lib/python3.8/site-packages/faiss/__init__.py", line 138, in replacement_add
    self.add_c(n, swig_ptr(x))
  File "/home/aarchan/anaconda2/envs/qa-aug/lib/python3.8/site-packages/faiss/swigfaiss.py", line 1454, in add
    return _swigfaiss.IndexFlat_add(self, n, x)
MemoryError: std::bad_alloc

For reference, the machine I'm running this on has 128GB RAM, but it doesn't seem to be enough. Could you please help me with this issue? Thanks!
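For context on why tuning index_buffer may not help here: a flat FAISS index keeps every added vector resident in RAM, so a smaller buffer only shrinks each add() call, not the final footprint. A minimal sketch with made-up, scaled-down sizes (the real DPR setup is on the order of 21M passages × 768-dim vectors):

import numpy as np
import faiss

dim = 768
index = faiss.IndexFlatIP(dim)      # flat inner-product index, as DPR's flat indexer uses

for shard in range(10):             # stand-ins for the encoded passage files
    vectors = np.random.rand(1000, dim).astype(np.float32)
    index.add(vectors)              # each add() keeps its vectors resident in RAM

print(index.ntotal)                 # 10000 -- earlier chunks are never released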

@vlad-karpukhin
Contributor

Hi Aaron,
yes, unfortunately a 128GB server is not enough for the retriever inference-time setup, even with the flat index. The HNSW index requires even more memory (it alone takes ~160GB of RAM).
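For scale, a back-of-envelope estimate, assuming the standard DPR setting of roughly 21 million Wikipedia passages and 768-dimensional float32 embeddings (both numbers are assumptions, not measurements from this thread):

num_passages = 21_000_000   # approximate DPR Wikipedia passage count (assumption)
dim = 768                   # DPR embedding dimensionality
bytes_per_float = 4         # float32

flat_gb = num_passages * dim * bytes_per_float / 1024**3
print(f"raw vectors alone: ~{flat_gb:.0f} GB")   # ~60 GB

# The passage texts/ids held in memory plus FAISS/Python overhead push the total
# towards the ~95 GB peak reported below; HNSW additionally stores graph links
# per vector, which is why it needs considerably more.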

@aarzchan
Author

How much RAM is required for the flat index?

@vlad-karpukhin
Contributor

vlad-karpukhin commented Jul 24, 2020

The flat index setup showed a maximum of 95GB RAM consumption for the entire process (i.e., index + Wikipedia passage data in memory, etc.).

@aarzchan
Author

Hmm, I'm confused. Why is 128GB of RAM not enough for the flat index setup if it only requires 95GB?

@vlad-karpukhin
Contributor

I have no idea why 128GB is not enough; I ran it on a 512GB server and measured the highest RES consumption. The virtual max was 135GB.
Maybe you have some other processes consuming RAM, so you have less than the required ~95GB left for the retriever?

@aarzchan
Author

I've checked that, prior to running dense_retriever.py, my server is using less than 1GB of RAM, so it seems that all of the RAM usage is coming from the retriever script. I also have access to a server with 256GB RAM, so I'll try running on that machine.
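(A quick way to confirm the headroom right before launching; this assumes psutil is installed, and any equivalent of free -h works just as well:)

import psutil

available_gb = psutil.virtual_memory().available / 1024**3
print(f"available RAM before starting the retriever: {available_gb:.1f} GB")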

@soheeyang
Contributor

soheeyang commented Aug 9, 2020

I ran it on a 512GB server and measured the highest RES consumption. The virtual max was 135GB.

How many CPU cores does the server have, and how long does it take to run dense_retriever on it?

@vlad-karpukhin
Contributor

vlad-karpukhin commented Aug 10, 2020 via email

@soheeyang
Contributor

It has 64 cores

Thank you :D

@vlad-karpukhin
Contributor

Seems like this can be closed now.

@luomancs

I have the same MemoryError: std::bad_alloc issue when I use dense_retriever.py. The size of my encoded index data is 50GB and my server has 86GB of memory. I changed the function iterate_encoded_files in faiss_indexers.py as follows:

def iterate_encoded_files(vector_files: list) -> Iterator[Tuple[object, np.array]]:
    doc_vectors = []  # accumulate (db_id, doc_vector) pairs across all files
    for i, file in enumerate(vector_files):
        logger.info('Reading file %s', file)
        with open(file, "rb") as reader:
            doc_vectors.extend(pickle.load(reader))
        # original per-document streaming, now disabled:
        # for doc in doc_vectors:
        #     db_id, doc_vector = doc
        #     yield db_id, doc_vector
        # del doc_vectors
        # gc.collect()
    return doc_vectors  # return everything at once instead of yielding per document

The function loads all of the encoded data at once (since 50GB is less than 86GB, this is fine), so that the DenseIndexer indexes everything in a single index_data call rather than going through a buffer.

Now the MemoryError: std::bad_alloc is gone.
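For anyone reproducing this workaround, a rough sketch of the calling side (method names follow the traceback above; the exact signatures in your checkout may differ):

def index_encoded_data(self, vector_files: list):
    # iterate_encoded_files() now returns the full list of (db_id, doc_vector)
    # pairs instead of yielding them, so everything is indexed in a single call
    # rather than in index_buffer-sized chunks.
    all_docs = iterate_encoded_files(vector_files)
    self.index.index_data(all_docs)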
