
Ondisk merge consuming RAM #1262

Closed
int64max opened this issue Jun 23, 2020 · 8 comments
int64max commented Jun 23, 2020

Summary

I'm following the demo_ondisk_ivf.py demo to merge several indexes (partitions) into one. The total combined size of all partitions exceeds my RAM capacity. The documentation says this should be fine, but the RAM consumption keeps growing until the system crashes.

Platform

OS: Ubuntu 18

Running on:

  • [*] CPU

Interface:

  • [*] C++

Reproduction instructions

The partitions (indexes to merge) were created with 'IVF512,Flat'. The code for merging is as follows:


  // Gather all partitions into this list.
  std::vector<faiss::InvertedLists *> invlists;

  // Create inv indexes for all partitions.
  for (const std::string& index_file : indexes) {
    faiss::IndexIVFFlat* index = (faiss::IndexIVFFlat*) faiss::read_index(
      index_file.c_str(), faiss::IO_FLAG_MMAP); // IO_FLAG_ONDISK_SAME_DIR);
    invlists.push_back(index->invlists);

    // Prevent the invlists from being deallocated along with the index,
    // then safely delete the index. This keeps the full indexes from being
    // loaded into memory during the merge.
    index->own_invlists = false;
    delete index;
  }

  // The trained index into which we'll merge all partitions.
  faiss::IndexIVFFlat* index = (faiss::IndexIVFFlat*) faiss::read_index("/tmp/ondisk/trained.fi");
  auto ondisk = new faiss::OnDiskInvertedLists(index->nlist, index->code_size,
      "/tmp/ondisk/merged_index.ivfdata");

  // Okay, let's now merge all inv indexes into a single ivfdata file.
  const faiss::InvertedLists **ils = (const faiss::InvertedLists**) invlists.data();
  auto ondisk_merged_ntotal = ondisk->merge_from(ils, invlists.size(), true /* verbose */);

  std::cout << "Writing merged inverted list to '" << final_index_file << "'" << std::endl;
  faiss::write_index(index, final_index_file.c_str());

When ondisk->merge_from() runs, I can see the merge progress with output lines like:
merged 437 lists in 135.845 s

And I can see the RAM usage growing; it keeps growing until the system runs out of memory and crashes. According to the demo, using faiss::IO_FLAG_MMAP should have prevented the partitioned indexes from being loaded into RAM. What am I missing? I tried many variations (e.g., merging only two partitions at a time) with no luck. Any guidance would be appreciated. Thanks.

mdouze (Contributor) commented Jun 24, 2020

Just a silly question: are you sure /tmp is not a RAM disk?

int64max (Author) commented:

Thanks for the response. It is an M.2 NVMe SSD.

int64max (Author) commented:

To clarify, no, it is not a RAM disk:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             26G     0   26G   0% /dev
tmpfs           5.1G  1.4M  5.1G   1% /run
/dev/sda2       246G  167G   67G  72% /
tmpfs            26G     0   26G   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs            26G     0   26G   0% /sys/fs/cgroup
tmpfs           5.1G   28K  5.1G   1% /run/user/129
tmpfs           5.1G     0  5.1G   0% /run/user/1000

I tried indexing on /tmp, as well as a folder in my home directory.

Not sure if it matters, but to give a complete picture: this is running in a VirtualBox VM.

int64max (Author) commented:

Here are some more interesting observations that may help debugging.

The total combined size of all partitions is ~22.5 GB. I had observed that merging consumes twice that amount of memory, so I gave the VM 53 GB of RAM to try processing in memory. Even with this setup, the system crashed once memory usage reached ~37 GB. I tried the in-memory merge multiple times and observed the crash at the ~37 GB mark every time.

As a separate exercise, I tried merging only some of the partitions, e.g., only the first half or only the second half. The merge works, which means the partitioned indexes themselves are fine.

Am I hitting some bug?

mdouze (Contributor) commented Jun 25, 2020

Merging does consume twice the amount of disk (not RAM).
The RAM consumption is negligible: it contains only the metadata and the quantizer.
So I do believe there is some artifact due to the VM.

Could you try the following code: issue_1262.ipynb? It tests the system-level functions that Faiss uses.
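A quick stdlib-only way to see the disk-versus-RAM distinction described above is sparse-file accounting: a file pre-sized with ftruncate() has a large apparent size but few allocated blocks until data is actually written. This sketch is illustrative only (it assumes a Linux filesystem with sparse-file support; it is not part of Faiss or the linked notebook):

```python
import os
import tempfile

# Illustrative sketch (not Faiss code): a file pre-sized with ftruncate()
# is sparse on most Linux filesystems. Its apparent size (st_size) is large
# while the blocks actually allocated on disk stay small until data is
# written -- which is why on-disk merging costs disk space, not RAM.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "block")
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, 1 << 30)  # 1 GiB apparent size, no data written yet
        st = os.stat(path)
        print("apparent size:", st.st_size)            # 1073741824
        print("allocated bytes:", st.st_blocks * 512)  # typically ~0 while sparse
    finally:
        os.close(fd)
```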

int64max (Author) commented:

The script crashed my VM :)

$ python
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import numpy as np
>>> import os
>>> fd = os.open('/tmp/block', os.O_RDWR|os.O_CREAT)
>>> os.ftruncate(fd, 60 * (1<<30))
>>> m = np.memmap('/tmp/block', mode='r+')
>>> m.size
64424509440
>>> m[:50]
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0], dtype=uint8)
>>> step = 1 << 20
>>> for i in range(0, m.size, step):
...     print(f'{i} / {m.size}', end='\r', flush=True)
...     m[i:i+step] = 123
...
12970885120 / 64424509440

That's the last printout before the crash. Does this imply memmap isn't working correctly on the VM? Some configuration issue?

Thanks for the help so far.
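For reference, the same system-level check can be scaled down and reproduced with only the standard library's mmap module, no numpy required. This is a sketch, not the linked notebook: the 64 MiB size and the temporary path are arbitrary choices so it can run safely anywhere (the original exercise used 60 GiB):

```python
import mmap
import os
import tempfile

# Stdlib-only variant of the system-level test: create a pre-sized file,
# map it into memory, and write through the mapping in fixed-size steps.
# On a healthy Linux host, dirty pages are written back to the file, so
# resident memory stays bounded even for mappings larger than RAM.
SIZE = 64 * (1 << 20)  # 64 MiB (scaled down from the original 60 GiB)
STEP = 1 << 20         # 1 MiB per write, as in the original loop

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "block")
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, SIZE)
        with mmap.mmap(fd, SIZE) as m:
            for i in range(0, SIZE, STEP):
                m[i:i + STEP] = b"\x7b" * STEP  # 123 == 0x7b, as in the original
            m.flush()  # ask the kernel to write dirty pages back to the file
        with open(path, "rb") as f:
            first = f.read(STEP)
        print("all bytes written:", first == b"\x7b" * STEP)  # True
    finally:
        os.close(fd)
```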

mdouze (Contributor) commented Jun 25, 2020

The VM does not emulate the Linux functionality properly.
It is unclear if this is a configuration problem or just a limitation of the VM or the host OS.

int64max (Author) commented:

I spun up a VM on AWS EC2. I then copied the same index partitions and ran the same C++ code for merging the partitions. The VM had only a few GB of RAM. The program successfully merged the indexes! So the issue was that my original VM wasn't emulating Linux functionality correctly.

Thanks again for the prompt responses and help figuring this out. I'm closing the bug.
