OpenSearch uses either MMapDirectory or NIOFSDirectory to read files, depending on the file type. This works well for files opened for search, where the entire file is not brought into the page cache or where the same data is accessed multiple times. A merge, by contrast, scans each file from start to end and touches every block. Blocks, files, or fields that are accessed only for merges, and not for search, need not be loaded into the page cache or read via mmap. Bypassing the page cache for them leaves it available for more relevant operations.
This can be done for entire files in the following way:
Lucene exposes two abstractions: DirectIODirectory and IOContext. We will use the IOContext to identify whether a file is being opened for a merge or for search. If it is opened for a merge, the DirectIODirectory will be used.
Files that are already open for search can be read directly, without looking at the IOContext.
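
The routing rule described above can be sketched as follows. This is only an illustration of the decision, not working OpenSearch or Lucene code: `ReadPathSelector`, `Purpose`, and `ReadPath` are hypothetical stand-ins (for the information carried by IOContext and for the choice between the page-cache-backed directories and DirectIODirectory) so that the example compiles on its own.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the routing decision; not a Lucene or OpenSearch API. */
final class ReadPathSelector {

    /** Stand-in for the purpose that IOContext conveys when a file is opened. */
    enum Purpose { SEARCH, MERGE }

    /** Stand-in for the kind of directory that serves the read. */
    enum ReadPath { PAGE_CACHE, DIRECT_IO }

    // Names of files that currently have a reader open for search.
    private final Set<String> openForSearch = new HashSet<>();

    void markOpenForSearch(String fileName) {
        openForSearch.add(fileName);
    }

    void markClosedForSearch(String fileName) {
        openForSearch.remove(fileName);
    }

    /** Decides how a file should be read for the given purpose. */
    ReadPath choose(String fileName, Purpose purpose) {
        // A file already open for search is read through the existing path,
        // without looking at the purpose of the new open.
        if (openForSearch.contains(fileName)) {
            return ReadPath.PAGE_CACHE;
        }
        // Otherwise, merge reads bypass the page cache; search reads do not.
        return purpose == Purpose.MERGE ? ReadPath.DIRECT_IO : ReadPath.PAGE_CACHE;
    }
}
```

In a real implementation, this decision would presumably live in a directory wrapper that inspects the IOContext passed when an input is opened and delegates to the appropriate directory.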
Some customers index all their data once a day and then take a snapshot. The snapshot is then restored to a read cluster during periods of low read traffic. We should allow such users to configure their read directory to use DirectIO by default. This could be followed by the IOContext-based optimization.
For blocks, I need to think a bit more and check whether Lucene supports this inherently or through extension points.