Index does not fit to RAM #64

littlewine · 2021-08-18T06:27:33Z

Hi,
I am trying to run inference (retrieve.py) on a fairly large collection and I get the following error:

RuntimeError: [enforce fail at CPUAllocator.cpp:64]. DefaultCPUAllocator: can't allocate memory:you tried to allocate 664833940224 bytes. Error code 12 (Cannot allocate memory)

To my understanding, that means ColBERT needs 600+ GB of memory, which the machine does not have. Is there any way to work around this?
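For reference, the byte count in the error message can be converted to more readable units. This is only arithmetic on the number reported above, not anything ColBERT-specific:

```python
requested_bytes = 664_833_940_224  # value copied from the RuntimeError

gb = requested_bytes / 10**9   # decimal gigabytes
gib = requested_bytes / 2**30  # binary gibibytes

print(f"{gb:.1f} GB, {gib:.1f} GiB")
```

Either way, the allocation is well beyond the RAM of a typical single machine.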

Can I read the file from an SSD and how much slower would the process be?

Some colleagues also suggested using huggingface.co datasets for that but I'm unsure whether this is feasible or where to start!


okhat · 2021-08-18T06:43:38Z

Might want to use batch retrieval and batch re-ranking? Brief instructions are in the README.

It does search in two steps, but doesn't have to load the full index into memory.
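The general idea of searching without loading everything into memory can be sketched as follows. This is not ColBERT's batch retrieval code: the file layout (row-major float32 vectors), the function name `topk_from_disk`, and all parameters are assumptions made for illustration. It memory-maps a file on disk, decodes a small chunk of rows at a time, and keeps only the running top-k scores.

```python
import heapq
import mmap
from array import array


def topk_from_disk(path, dim, query, k=3, chunk_rows=1024):
    """Return the k best (score, row_id) pairs by dot product with `query`.

    Assumes (hypothetically) that `path` holds row-major float32 vectors
    of length `dim`. Only `chunk_rows` rows are decoded at any one time.
    """
    row_bytes = dim * 4  # 4 bytes per float32 value
    best = []  # min-heap of (score, row_id) with at most k entries
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // row_bytes
        for start in range(0, n_rows, chunk_rows):
            stop = min(start + chunk_rows, n_rows)
            # Decode just this slice of the mapped file into floats.
            chunk = array("f")
            chunk.frombytes(mm[start * row_bytes:stop * row_bytes])
            for i in range(stop - start):
                row = chunk[i * dim:(i + 1) * dim]
                score = sum(a * b for a, b in zip(row, query))
                if len(best) < k:
                    heapq.heappush(best, (score, start + i))
                elif score > best[0][0]:
                    # Replace the current worst of the kept candidates.
                    heapq.heapreplace(best, (score, start + i))
    return sorted(best, reverse=True)
```

Reading from an SSD this way trades speed for memory: peak RAM use is bounded by the chunk size, while throughput depends on the disk. In practice a vectorized library would replace the inner Python loop.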

okhat · 2021-08-18T06:44:50Z

Alternatively, you might be interested in our binarization branch. It makes the index much smaller. It's still a work in progress; we'll improve its quality in the next 2-3 weeks. (Until then, you might lose a couple of points of quality if you use the binarization branch.)
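To give a sense of why binarization shrinks an index, here is a minimal sketch of one common scheme: keep only the sign of each dimension and pack eight dimensions into a byte. This is an assumption for illustration and not necessarily what the binarization branch does; `binarize` and `hamming_similarity` are hypothetical helper names.

```python
def binarize(vector):
    """Pack the sign of each value into bits (1 if value > 0), 8 per byte."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, value in enumerate(vector):
        if value > 0:
            # Most significant bit of each byte holds the earliest dimension.
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def hamming_similarity(a, b):
    """Count matching bits between two equally sized packed vectors."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return len(a) * 8 - differing
```

Under this scheme a 128-dimensional vector takes 16 bytes, compared with 256 bytes at 16-bit precision. The loss of magnitude information is one reason quality can drop.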

okhat · 2021-08-20T16:31:23Z

@littlewine Is this resolved?

littlewine · 2021-08-20T16:52:43Z

Yes indeed! Thanks for your help!

Lim-Sung-Jun · 2022-11-10T13:24:09Z

> Might want to use batch retrieval and batch re-ranking? Brief instructions are in the README.
>
> It does search in two steps, but doesn't have to load the full index into memory.

Can you explain why you didn't add 512 in batch_retrieval?

littlewine closed this as completed Aug 20, 2021

shaoyijia mentioned this issue Aug 16, 2023

How to index large corpus which cannot be loaded into the memory? #234

Open
