Modify read directory for files/blocks that are opened for merging. #4493

Open
itiyama opened this issue Sep 13, 2022 · 1 comment
Labels
enhancement (Enhancement or improvement to existing feature or request), Indexing & Search

Comments

itiyama commented Sep 13, 2022

OpenSearch reads files through either MMapDirectory or NIOFSDirectory, depending on the file type. This works well for files opened for search, where only part of the file is brought into the page cache or the same data is accessed repeatedly. Merging is different: it scans each file from start to end and touches every block. Blocks, files, or fields that are read only for merging, and never for search, need not be loaded into the page cache or read through mmap, which leaves the page cache for more relevant operations.

This can be done for entire files in the following way:

  1. Lucene exposes two abstractions: DirectIODirectory and IOContext. We will use the IOContext to identify whether a file is being opened for a merge or for a search. If it is opened for a merge, DirectIODirectory will be used.
  2. Files that are already open for search can be read directly, without looking at the IOContext.
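
To make the two rules above concrete, here is a minimal sketch of the routing decision only. It deliberately avoids real Lucene classes so it compiles on its own: `Purpose` is a hypothetical stand-in for the context carried by Lucene's IOContext, `ReadPath` is a hypothetical label for which kind of directory would serve the read, and `alreadyOpenForSearch` is an assumed flag that the caller would have to track. A real implementation would most likely wrap the existing directories rather than return an enum.

```java
/**
 * Sketch of the proposed routing rule. All names here are hypothetical
 * stand-ins for illustration and are not Lucene or OpenSearch APIs.
 */
final class ReadPathSelector {

    /** Stand-in for the purpose that an IOContext would report. */
    enum Purpose { MERGE, READ, FLUSH, DEFAULT }

    /** Which read path would serve the file. */
    enum ReadPath { DIRECT_IO, PAGE_CACHE }

    private ReadPathSelector() {}

    /**
     * Rule 2: a file already open for search keeps its existing
     * (page-cache backed) reader, whatever the purpose is.
     * Rule 1: otherwise, merge reads bypass the page cache.
     */
    static ReadPath select(Purpose purpose, boolean alreadyOpenForSearch) {
        if (alreadyOpenForSearch) {
            return ReadPath.PAGE_CACHE;
        }
        return purpose == Purpose.MERGE ? ReadPath.DIRECT_IO : ReadPath.PAGE_CACHE;
    }
}
```

Under these assumptions, `ReadPathSelector.select(Purpose.MERGE, false)` picks the direct I/O path, while any file that search already has open stays on the page-cache path even when a merge reads it.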

Some customers index all their data once a day and then take a snapshot. The snapshot is restored to a read cluster during periods of low read traffic. We should allow such users to configure their read directory to use DirectIO by default. The IOContext-based optimization could follow.

For blocks, I need to think a bit more and check whether Lucene supports this inherently or through extension points.

itiyama added the enhancement and untriaged labels Sep 13, 2022

itiyama commented Sep 15, 2022

@nknize What are your thoughts?
