Modify read directory for files/blocks that are opened for merging. #4493

itiyama · 2022-09-13T04:16:25Z

OpenSearch uses either MMapDirectory or NIOFSDirectory for reading files, depending on the file type. This works well for files opened for search, where typically only part of a file is brought into the page cache, or where the same data is accessed multiple times. Merging, by contrast, scans each file from start to end and touches every block. Blocks, files, or fields that are read only for merging, not for search, do not need to be loaded into the page cache or read through mmap. Keeping them out leaves the page cache for data that search actually reuses.
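Here is a toy simulation that makes the page-cache argument concrete. It is not OpenSearch or Lucene code. It models the OS page cache as a plain LRU map, and `PageCacheSketch`, `hotHitsAfterMerge` and the page counts are all hypothetical. Real kernels use more elaborate replacement policies than strict LRU, so read this as an illustration of the effect and not as a measurement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical toy model of an OS page cache: an LRU map of page ids with a
 * fixed capacity. It compares a merge scan that goes through the cache with
 * one that bypasses it (as direct I/O would).
 */
public class PageCacheSketch {

    /** LRU cache of page ids; evicts the least recently used page when full. */
    static final class LruPageCache extends LinkedHashMap<String, Boolean> {
        private final int capacity;

        LruPageCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > capacity;
        }

        /** Touches a page: returns true on a hit, false on a miss (the page is then cached). */
        boolean read(String page) {
            if (get(page) != null) {
                return true;
            }
            put(page, Boolean.TRUE);
            return false;
        }
    }

    /**
     * Warms the cache with the search working set, runs a merge scan over
     * mergePages pages, then counts how many search pages still hit.
     */
    static int hotHitsAfterMerge(int capacity, int hotPages, int mergePages, boolean mergeBypassesCache) {
        LruPageCache cache = new LruPageCache(capacity);
        for (int i = 0; i < hotPages; i++) {
            cache.read("search-" + i); // search warms its working set
        }
        if (!mergeBypassesCache) {
            for (int i = 0; i < mergePages; i++) {
                cache.read("merge-" + i); // every merged block lands in the cache
            }
        } // otherwise the merge reads bypass the cache and leave it untouched
        int hits = 0;
        for (int i = 0; i < hotPages; i++) {
            if (cache.read("search-" + i)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("merge via page cache:  " + hotHitsAfterMerge(100, 50, 1000, false) + "/50 hot hits");
        System.out.println("merge bypassing cache: " + hotHitsAfterMerge(100, 50, 1000, true) + "/50 hot hits");
    }
}
```

With the numbers in `main`, the scan through the cache evicts the entire search working set (0 of 50 pages still hit), while the bypassing scan leaves all 50 in place.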

This can be done for entire files in the following way:

Lucene exposes two abstractions we can use: DirectIODirectory and IOContext. We will use the IOContext to identify whether a file is being opened for a merge or for a search operation. If it is opened for a merge, DirectIODirectory will be used.
Files that are already open for search can be read directly, without looking at the IOContext.
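A minimal sketch of the dispatch rule above, in plain Java. In Lucene the signal would arrive through the `IOContext` argument of `Directory.openInput(String name, IOContext context)`. Here `OpenPurpose` stands in for that context, and `ReadPath` stands in for the choice between the existing directory and DirectIODirectory. Both enums, the class, and the file names are hypothetical and not OpenSearch or Lucene APIs.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: choose a read path per file based on why it is opened.
 * OpenPurpose stands in for Lucene's IOContext; ReadPath stands in for
 * "existing directory" vs. "DirectIODirectory".
 */
public class MergeAwareDispatchSketch {

    enum OpenPurpose { SEARCH, MERGE }

    enum ReadPath { DEFAULT, DIRECT_IO }

    /** Names of files currently open for search. */
    private final Set<String> openForSearch = new HashSet<>();

    ReadPath open(String fileName, OpenPurpose purpose) {
        // A file already open for search is read through its existing path;
        // the purpose (IOContext) is not consulted at all.
        if (openForSearch.contains(fileName)) {
            return ReadPath.DEFAULT;
        }
        if (purpose == OpenPurpose.MERGE) {
            return ReadPath.DIRECT_IO; // merge-only read: bypass the page cache
        }
        openForSearch.add(fileName); // search read: mmap / NIO as today
        return ReadPath.DEFAULT;
    }

    public static void main(String[] args) {
        MergeAwareDispatchSketch dir = new MergeAwareDispatchSketch();
        System.out.println(dir.open("_0.cfs", OpenPurpose.SEARCH)); // DEFAULT
        System.out.println(dir.open("_0.cfs", OpenPurpose.MERGE));  // DEFAULT (already open for search)
        System.out.println(dir.open("_1.cfs", OpenPurpose.MERGE));  // DIRECT_IO
    }
}
```

Because the search check comes first, direct I/O applies only to files that search has not opened. Merges of files that search is actively using keep today's behavior.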

Some customers index all their data once a day and then take a snapshot. The snapshot is then restored to a read cluster during periods of low read traffic. We should allow such users to configure their read directory to use DirectIO by default. This could be followed by the IOContext-based optimization.
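Here is one way the configurable default and the later IOContext-based step could fit together. This is a hypothetical sketch: no such setting exists in OpenSearch today, and the mode names, `ReadModeSettingSketch`, and `resolve` are illustrative assumptions.

```java
/**
 * Hypothetical sketch of a per-index read mode. The modes and the resolution
 * logic are illustrative assumptions, not an existing OpenSearch setting.
 */
public class ReadModeSettingSketch {

    enum ReadMode {
        DEFAULT,     // current behavior: mmap / NIO for every read
        DIRECT_IO,   // opt-in for index-then-snapshot workloads: always bypass the page cache
        MERGE_AWARE  // follow-up: direct I/O only for reads opened for merge
    }

    enum ReadPath { DEFAULT, DIRECT_IO }

    static ReadPath resolve(ReadMode mode, boolean openedForMerge) {
        switch (mode) {
            case DIRECT_IO:
                return ReadPath.DIRECT_IO;
            case MERGE_AWARE:
                return openedForMerge ? ReadPath.DIRECT_IO : ReadPath.DEFAULT;
            default:
                return ReadPath.DEFAULT;
        }
    }

    public static void main(String[] args) {
        // e.g. a value parsed from a hypothetical per-index setting
        ReadMode mode = ReadMode.valueOf("MERGE_AWARE");
        System.out.println(resolve(mode, true));
        System.out.println(resolve(mode, false));
    }
}
```

Keeping the mode explicit lets the always-DirectIO option ship first. The merge-aware mode can then be added as another value without changing existing configurations.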

For blocks, I need to think a bit more and check whether Lucene supports this natively or through extension points.

itiyama · 2022-09-15T01:34:53Z

@nknize What are your thoughts?

itiyama added the enhancement and untriaged labels on Sep 13, 2022

dreamer-89 added the Indexing & Search label and removed the untriaged label on Sep 13, 2022
