Cache completion stats between refreshes #51991

Merged

DaveCTurner
Contributor

Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.

This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.

Closes #51915
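A minimal sketch of the caching idea follows. It assumes the shard holds one cached value that is computed lazily on the first stats call and discarded whenever a refresh actually opens a new reader. The class name `RefreshInvalidatedCache` and the `Supplier`-based computation are hypothetical and are not the code merged in this PR. Only the `afterRefresh(boolean didRefresh)` hook mirrors the diff excerpt in the review.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: caches one expensive value until the next refresh.
final class RefreshInvalidatedCache<T> {
    private final AtomicReference<CompletableFuture<T>> futureRef = new AtomicReference<>();
    private final Supplier<T> compute;

    RefreshInvalidatedCache(Supplier<T> compute) {
        this.compute = compute;
    }

    T get() {
        while (true) {
            CompletableFuture<T> existing = futureRef.get();
            if (existing != null) {
                // Another caller already computed (or is computing) the value: wait for it.
                return existing.join();
            }
            CompletableFuture<T> fresh = new CompletableFuture<>();
            if (futureRef.compareAndSet(null, fresh)) {
                // This caller won the race, so it does the expensive work exactly once.
                try {
                    T value = compute.get();
                    fresh.complete(value);
                    return value;
                } catch (RuntimeException e) {
                    futureRef.compareAndSet(fresh, null); // do not cache failures
                    fresh.completeExceptionally(e);
                    throw e;
                }
            }
            // Lost the race to another caller: loop round and wait on its future.
        }
    }

    // Called after each refresh; only a refresh that opened a new reader invalidates the cache.
    void afterRefresh(boolean didRefresh) {
        if (didRefresh) {
            futureRef.set(null);
        }
    }
}
```

Caching a future rather than the bare value means that concurrent stats calls arriving while the first computation is still running wait for it instead of repeating the work.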

@DaveCTurner DaveCTurner added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.7.0 labels Feb 6, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@jimczi
Contributor

jimczi commented Feb 6, 2020

I added a way to load the completion suggester off-heap but never had time to check the performance after the change, so we're still loading the entire FST on-heap in ES. Stats shouldn't be costly to retrieve; that should be considered a bug in the completion field. If we cannot always load the completion suggester off-heap, we should probably add a fast path to get the memory consumption without actually loading the FST. That's debatable though, because getting the stats today loads every completion field in memory, which is equivalent to running a query on every completion field. Once the FST is loaded, the stats call should be fast, so I also wonder if the cache is helpful in this case?

@DaveCTurner
Contributor Author

> Once the FST is loaded, the stats call should be fast, so I also wonder if the cache is helpful in this case?

I frequently see `org.apache.lucene.index.FilterLeafReader.terms()` in the hot threads output in cases like this. I interpret this to mean that we mostly spend time simply looking up fields to find out whether they are completion fields, and normally they are not.

For instance:

The fact that I do not often see hot threads actually loading the FSTs does suggest that they are indeed shared between calls.

@DaveCTurner
Contributor Author

@elasticmachine please run elasticsearch-ci/docs

Member

@dnhatn dnhatn left a comment

@DaveCTurner Thank you for working on this. I left a comment on the caching implementation.

As Jim said, I think we can solve the issue by loading the terms of only the completion fields instead of all fields from `FieldInfos`. We can retrieve the list of the completion fields from the MapperService. WDYT?

```java
// Names of all fields mapped as completion fields in this index
List<String> completionFields = StreamSupport.stream(mapperService.fieldTypes().spliterator(), false)
    .filter(field -> field instanceof CompletionFieldMapper.CompletionFieldType)
    .map(MappedFieldType::name).collect(Collectors.toList());
```
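To make the cost argument concrete, here is a stdlib-only model (not Lucene code) in which each segment is a map from field name to the FST size of a completion field, or to `null` for any other kind of field. `statsByWalkingAllFields` mimics checking every field of every segment, as the stats code does today; `statsForMappedFields` looks up only the names that the mapping declares as completion fields, in the spirit of the suggestion above. The class, both method names, and the lookup counter standing in for `LeafReader.terms(field)` are all hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: a segment maps every field name to the FST size of a
// completion field, or to null for any other kind of field.
final class FieldLookupModel {
    static int lookups = 0; // counts per-field lookups, standing in for LeafReader.terms(field)

    static Long lookup(Map<String, Long> segment, String field) {
        lookups++;
        return segment.get(field);
    }

    // Walk every field of every segment, as the stats code does today.
    static Map<String, Long> statsByWalkingAllFields(List<Map<String, Long>> segments) {
        Map<String, Long> perField = new TreeMap<>();
        for (Map<String, Long> segment : segments) {
            for (String field : segment.keySet()) {
                Long size = lookup(segment, field);
                if (size != null) {
                    perField.merge(field, size, Long::sum);
                }
            }
        }
        return perField;
    }

    // Only look up the fields that the mapping declares as completion fields.
    static Map<String, Long> statsForMappedFields(List<Map<String, Long>> segments,
                                                  Collection<String> completionFields) {
        Map<String, Long> perField = new TreeMap<>();
        for (Map<String, Long> segment : segments) {
            for (String field : completionFields) {
                Long size = lookup(segment, field);
                if (size != null) {
                    perField.merge(field, size, Long::sum);
                }
            }
        }
        return perField;
    }
}
```

With three segments of 1,000 fields each, only one of which (`suggest`) is a completion field, the first method performs 3,000 lookups and the second performs 3, yet both return the same per-field totals. If the mapping has no completion fields at all, the second method does no lookups.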

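```java
@Override
public void afterRefresh(boolean didRefresh) {
    if (didRefresh) {
        // The reader changed, so discard the cached stats; they are recomputed on the next stats call
        completionStatsFutureRef.set(null);
    }
}
```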
Member

@dnhatn dnhatn Feb 6, 2020

Instead of invalidating the entire cache, we could mark the current cache as outdated (i.e., needing a refresh) and then reuse the stats of any `LeafReader`s that haven't changed between refreshes.
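A sketch of this per-segment variant, assuming each segment has a stable identity across refreshes. Here that identity is modelled as a `String` segment name; in Lucene one might key on a leaf reader's core cache key, but that is an assumption and not something this PR does. After a refresh, stats for surviving segments are reused, segments that were merged away are dropped, and only new segments are computed (lazily, on the next stats call). The class and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical per-segment cache: segment name -> (completion field -> FST size in bytes).
final class PerSegmentCompletionStats {
    private final Map<String, Map<String, Long>> perSegment = new HashMap<>();
    private final Function<String, Map<String, Long>> computeSegment;
    private List<String> liveSegments = List.of();
    int segmentsComputed = 0; // how many segments needed the expensive computation

    PerSegmentCompletionStats(Function<String, Map<String, Long>> computeSegment) {
        this.computeSegment = computeSegment;
    }

    // Called after a refresh with the names of the segments in the new reader.
    void afterRefresh(List<String> newLiveSegments) {
        perSegment.keySet().retainAll(newLiveSegments); // drop segments that were merged away
        liveSegments = List.copyOf(newLiveSegments);
    }

    // Sum the per-segment, per-field sizes into per-field totals, computing new segments on demand.
    Map<String, Long> totals() {
        Map<String, Long> perField = new TreeMap<>();
        for (String segment : liveSegments) {
            Map<String, Long> fields = perSegment.computeIfAbsent(segment, s -> {
                segmentsComputed++;
                return computeSegment.apply(s);
            });
            fields.forEach((field, size) -> perField.merge(field, size, Long::sum));
        }
        return perField;
    }
}
```

The trade-off is that the cache now holds a per-field map for every segment rather than a single per-field map per shard, and this sketch is not thread-safe as written.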

Contributor Author

If we re-use a LeafReader across a refresh, does it keep its suggester loaded? If so, do we not already avoid most of the work of recomputing stats?

Note that we need to break the stats down by field, because the user can select the fields in the API. If I understand correctly, re-using stats on a per-segment basis as well would require tracking everything per segment and per field, which seems unnecessary.
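The per-field point can be sketched as follows, assuming the cache stores one full per-field map per refresh and each request filters it by the user's field patterns afterwards. The `simpleMatch` helper supports only `*` wildcards and is a hypothetical stand-in for whatever pattern matching the real stats API uses; `CompletionFieldSelection` is likewise an illustrative name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class CompletionFieldSelection {
    // Hypothetical wildcard matcher: '*' matches any run of characters, everything else is literal.
    static boolean simpleMatch(String pattern, String value) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return value.matches(regex.toString());
    }

    // Filter the cached per-field stats down to the fields the request asked for.
    static Map<String, Long> select(Map<String, Long> cachedPerField, List<String> patterns) {
        Map<String, Long> selected = new TreeMap<>();
        cachedPerField.forEach((field, size) -> {
            for (String pattern : patterns) {
                if (simpleMatch(pattern, field)) {
                    selected.put(field, size);
                    break;
                }
            }
        });
        return selected;
    }
}
```

Caching the full per-field map, rather than the result for one particular set of patterns, lets every request share the same cache entry regardless of which fields it selects.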

Contributor

@henningandersen henningandersen left a comment

Great find, looks good.

I wonder whether you considered checking the index mappings to see whether the index can have any completion terms at all. I think the caching you have is good anyway, but this check could speed up the first stats call after a full cluster restart, and maybe we could use it to avoid caching on every shard. I'm not really suggesting doing this in this PR, and it might not be worth doing, but I would like to hear your opinion on it.

@DaveCTurner
Contributor Author

@dnhatn, re:

> We can retrieve the list of the completion fields from the MapperService.

True (and maybe that's the real answer here), but today neither `Engine` implementation has access to a `MapperService`, and that feels right to me. We could plumb it in from the `IndexShard`, of course (yay, more dependencies).

@DaveCTurner
Contributor Author

@elasticmachine update branch

Member

@dnhatn dnhatn left a comment

LGTM, thanks David.

Contributor

@henningandersen henningandersen left a comment

LGTM.

@DaveCTurner DaveCTurner merged commit a3a98c7 into elastic:master Feb 27, 2020
@DaveCTurner DaveCTurner deleted the 2020-02-05-CompletionStatsCache branch February 27, 2020 07:33
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 27, 2020
Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.

This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.

Closes elastic#51915
Backport of elastic#51991
DaveCTurner added a commit that referenced this pull request Feb 27, 2020
Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.

This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.

Closes #51915
Backport of #51991
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CompletionStats need only be recomputed on a refresh
6 participants