Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should ReadOnlyEngine always load terms dictionary off-heap ? #51247

Closed
jimczi opened this issue Jan 21, 2020 · 4 comments
Closed

Should ReadOnlyEngine always load terms dictionary off-heap ? #51247

jimczi opened this issue Jan 21, 2020 · 4 comments
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement

Comments

@jimczi
Copy link
Contributor

jimczi commented Jan 21, 2020

Today ReadOnlyEngine loads the terms dictionary for all fields off-heap only if the underlying directory is using mmap to access the file. This means that frozen indices can consume more heap if they are opened with niofs, which defeats one of the main benefit since they could use more memory than a non-frozen index opened with the default directory.
I am opening this issue to discuss whether we should always load terms dictionary off-heap for fields contained in a read-only index. Currently only frozen indices would be concerned but this decision would affect searchable snapshots too since they cannot use mmap to load data from a remote filesystem.

@jimczi jimczi added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. labels Jan 21, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@ywelsch
Copy link
Contributor

ywelsch commented Jan 22, 2020

This was explicitly changed for ReadOnlyEngine here: #43158. It sounds like you want to revert that PR which said:

Given the significant performance impact that NIOFS has when term dicts are
loaded off-heap this change enforces FstLoadMode#AUTO that loads term dicts
off-heap only if the underlying index input indicates a memory map.

Can we quantify the performance impact?

@jimczi
Copy link
Contributor Author

jimczi commented Jan 22, 2020

Can we quantify the performance impact?

The main impact would be on queries that needs to read heavily from the terms dictionary (big terms query for instance) and id lookup for updates/deletes. The latter does not concern read-only engines and the former is not recommended but we could at least always load the _id field off-heap. This pr shows some numbers on large terms query but "normal" queries were not impacted.

Another option would be to deprecate/remove the niofs usage in favor of the hybrid directory. I don't know if niofs is widely used but it seems that hybdrid directory would always be a better choice in newer versions.

I opened this issue with searchable snapshots in mind so even if we decide that we want to keep the current behavior for frozen indices we should add a way to override the default for new engines.

@ywelsch
Copy link
Contributor

ywelsch commented Feb 19, 2020

We discussed this in the distrib sync, and are fine moving forward with this.

tlrx added a commit to tlrx/elasticsearch that referenced this issue Mar 18, 2020
This is a partial restore of elastic#43158, following decision taken in elastic#51247
@tlrx tlrx closed this as completed in e1096b9 Mar 18, 2020
tlrx added a commit that referenced this issue Mar 18, 2020
This is a partial restore of #43158, following decision taken in #51247

Closes #51247
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement
Projects
None yet
Development

No branches or pull requests

4 participants