Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude nested documents in LuceneChangesSnapshot #51279

Merged
merged 5 commits into from
Jan 22, 2020

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jan 22, 2020

LuceneChangesSnapshot can be slow if nested documents are heavily used. Also, it estimates the number of operations to be recovered in peer recoveries inaccurately. With this change, we prefer excluding the nested non-root documents in a Lucene query instead.

Kudos to @DaveCTurner for finding this issue. See: https://discuss.elastic.co/t/es-7-5-translog-recovery-is-extremely-slow/215505/11.

@dnhatn dnhatn added >enhancement :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v8.0.0 v7.6.1 labels Jan 22, 2020
@dnhatn dnhatn requested review from ywelsch and jimczi January 22, 2020 02:40
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

@dnhatn dnhatn added v7.7.0 and removed v7.6.1 labels Jan 22, 2020
@dnhatn dnhatn requested a review from jpountz January 22, 2020 03:18
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me, but I'd like another pair of eyes to look at it too in case there's more traps in this area of which I'm unaware. I left a comment on comments.

Can we do something similar to address the user's issue with not falling back on file-based recovery eagerly enough? It's a known issue:

// NB safeCommitInfo.docCount is a very low-level count of the docs in the index, and in particular if this shard contains nested
// docs then safeCommitInfo.docCount counts every child doc separately from the parent doc. However every part of a nested document
// has the same seqno, so we may be overestimating the cost of a file-based recovery when compared to an ops-based recovery and
// therefore preferring ops-based recoveries inappropriately in this case. Correctly accounting for nested docs seems difficult to
// do cheaply, and the circumstances in which this matters should be relatively rare, so we use this naive calculation regardless.
// TODO improve this measure for when nested docs are in use

It would be good if we could more accurately estimate the "size" of the safe commit to compare it to the operation count in this method.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I adapted the version labels as this is a bugfix

@dnhatn
Copy link
Member Author

dnhatn commented Jan 22, 2020

Can we do something similar to address the user's issue with not falling back on file-based recovery eagerly enough?

Sure, I will work on a follow-up for this.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments but otherwise LGTM.

@dnhatn
Copy link
Member Author

dnhatn commented Jan 22, 2020

Thanks everyone :)

@dnhatn dnhatn merged commit 1511486 into elastic:master Jan 22, 2020
@dnhatn dnhatn deleted the nested-docs branch January 22, 2020 17:13
dnhatn added a commit that referenced this pull request Jan 22, 2020
LuceneChangesSnapshot can be slow if nested documents are heavily used.
Also, it estimates the number of operations to be recovered in peer
recoveries inaccurately. With this change, we prefer excluding the
nested non-root documents in a Lucene query instead.
dnhatn added a commit that referenced this pull request Jan 22, 2020
LuceneChangesSnapshot can be slow if nested documents are heavily used.
Also, it estimates the number of operations to be recovered in peer
recoveries inaccurately. With this change, we prefer excluding the
nested non-root documents in a Lucene query instead.
debadair pushed a commit to debadair/elasticsearch that referenced this pull request Jan 28, 2020
LuceneChangesSnapshot can be slow if nested documents are heavily used. 
Also, it estimates the number of operations to be recovered in peer
recoveries inaccurately. With this change, we prefer excluding the
nested non-root documents in a Lucene query instead.
@polyfractal polyfractal added v7.6.0 and removed v7.6.1 labels Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v7.6.0 v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants