Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Indexing "bug" while writing grenad chunks #37

Closed
Kerollmops opened this issue Nov 13, 2020 · 1 comment · Fixed by #47
Closed

Indexing "bug" while writing grenad chunks #37

Kerollmops opened this issue Nov 13, 2020 · 1 comment · Fixed by #47
Labels
bug Something isn't working

Comments

@Kerollmops
Copy link
Member

Kerollmops commented Nov 13, 2020

Grenad is a append only key value store, which means that it can only create key value by sorting the keys first and merging the conflicting ones, these key values are also called sorted-string tables (SSTable).

The library also supports sorters, this is a datastructure that allows the user to give keys in any specific order, grenad will sort the key-values in memory and the user can define the maximum amount of memory to use, once this sorter reach the max memory specified it sort and merge them in memory then dump them on disk.

The grenad library, and more specifically the sorter datastructure, is used in the indexing system, keys and values are given to the sorter and the sorter is limited to the max memory specified at program startup, divided by the number of threads dedicated to the indexing task and then divided by the number of sorters in each of the threads.

This comes to be a problem, sometimes the sorter datastructure "blocks" the indexing system, it tries to dump one (1) entry to disk, this is because we use RoaringBitmap and that the max memory of the sorter can only accept one value in memory. RoaringBitmap sometimes needs to represent u32s with multiple containers that takes much space and therefore this serialized value triggers the chunk disk dumping.

There is maybe multiple solutions to fix that:

  • Increase the amount of memory allocated to the sorters by either:
    • Reducing the number of threads used for indexing, this gives more memory to the individual sorter for each threads.
    • Increasing the max memory allocated to the program itself.
  • Improve and reduce the size of the RoaringBitmap serialization by supporting run containers.
  • Detect in the sorter datastructure the fact that we are trying to merge and write one (1) entry and do something (but what?).
  • Share the same allocator for all of the threads and grenad sorters, combined with an allocator wrapper it would help balance memory usage.
  • We could also use something like a memmapped Vec, like zkp-mmap-vec but with resize features.
  • We could also move the data that is too big into another file, it would work well with an enum that specifies where the data is located and oversized data is moved into another place (like a file).
@Kerollmops Kerollmops added the bug Something isn't working label Nov 13, 2020
@Kerollmops
Copy link
Member Author

Fixed by meilisearch/grenad#2. Note it was not the imaginary bug explained here that was producing the problem, but the problem has been fixed anyway.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant