Cache completion stats between refreshes #51991

DaveCTurner · 2020-02-06T10:43:40Z

Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.

This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.

Closes #51915
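
As a rough illustration of the idea described above (not the code in this PR), a refresh-invalidated cache can be sketched as follows. The class name `RefreshInvalidatedCache`, the `loader` supplier, and the use of `CompletableFuture` are assumptions made for the sketch. The first caller computes the value, concurrent callers wait for that result, and a refresh that actually changed something drops the cached future. Failed computations are not retained, in the spirit of the "Don't cache exceptional results" commit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical, simplified cache for illustration; not the CompletionStatsCache class from this PR. */
final class RefreshInvalidatedCache<T> {
    private final Supplier<T> loader; // assumed to perform the expensive per-segment walk
    private final AtomicReference<CompletableFuture<T>> futureRef = new AtomicReference<>();

    RefreshInvalidatedCache(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        final CompletableFuture<T> newFuture = new CompletableFuture<>();
        while (true) {
            final CompletableFuture<T> existing = futureRef.get();
            if (existing != null) {
                // Someone else already computed (or is computing) the value: wait for it.
                return existing.join();
            }
            if (futureRef.compareAndSet(null, newFuture)) {
                break; // this caller won the race and does the work
            }
        }
        try {
            final T value = loader.get();
            newFuture.complete(value);
            return value;
        } catch (RuntimeException e) {
            // Do not keep a failed result: clear the slot so the next call retries.
            futureRef.compareAndSet(newFuture, null);
            newFuture.completeExceptionally(e);
            throw e;
        }
    }

    /** Mirrors the shape of a refresh listener callback: invalidate only if the refresh changed something. */
    void afterRefresh(boolean didRefresh) {
        if (didRefresh) {
            futureRef.set(null);
        }
    }
}
```

With this shape, repeated stats calls between refreshes cost one atomic read plus a `join()` on an already-completed future, and only the first call after a refresh pays for the walk.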

elasticmachine · 2020-02-06T10:43:43Z

Pinging @elastic/es-distributed (:Distributed/Engine)

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

server/src/main/java/org/elasticsearch/index/engine/Engine.java

jimczi · 2020-02-06T11:56:05Z

I added a way to load the completion suggester off-heap but never had time to check the performance after the change, so we're still loading the entire FST on-heap in ES. Stats shouldn't be costly to retrieve; that should be considered a bug in the completion field. If we cannot always load the completion suggester off-heap we should probably add a fast path to get the memory consumption without really loading the FST. That's debatable though, because getting the stats today loads every completion field in memory, which is equivalent to running a query on every completion field. Once the FST is loaded, the stats call should be fast, so I also wonder if the cache is helpful in this case?

DaveCTurner · 2020-02-06T12:28:09Z

> Once the FST is loaded, the stats call should be fast so I also wonder if the cache is helpful in this case?

I frequently see `org.apache.lucene.index.FilterLeafReader.terms()` in the hot threads outputs from cases like this. I am interpreting this to mean that we mostly spend time simply looking up fields to find out if they're completion fields or not, and normally they're not.

For instance:

The fact that I do not often see hot threads actually loading the FSTs does suggest that they are indeed shared between calls.

DaveCTurner · 2020-02-06T13:10:04Z

@elasticmachine please run elasticsearch-ci/docs

dnhatn

@DaveCTurner Thank you for working on this. I left a comment on the caching implementation.

As Jim said, I think we can solve the issue by loading terms of the completion fields only instead of all fields from FieldInfos. We can retrieve the list of the completion fields from the MapperService. WDYT?

List<String> completionFields = StreamSupport.stream(mapperService.fieldTypes().spliterator(), false)
    .filter(field -> field instanceof CompletionFieldMapper.CompletionFieldType)
    .map(MappedFieldType::name).collect(Collectors.toList());
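
To make the difference between the two strategies concrete, here is a toy model that counts field lookups. Everything in it is hypothetical: `Segment` stands in for a Lucene leaf reader, `lookup` for a per-field terms lookup, and the byte sizes are made up. It only illustrates that restricting the walk to known completion fields gives the same total with fewer lookups.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model for illustration only; it does not use any Lucene or Elasticsearch classes. */
final class CompletionSizeSketch {

    /** Hypothetical stand-in for one segment; `lookups` counts simulated per-field lookups. */
    static final class Segment {
        private final Set<String> allFields;
        private final Map<String, Long> completionBytesByField; // completion fields only
        int lookups = 0;

        Segment(Set<String> allFields, Map<String, Long> completionBytesByField) {
            this.allFields = allFields;
            this.completionBytesByField = completionBytesByField;
        }

        /** Returns the suggester size for a completion field, or 0 for any other field. */
        long lookup(String field) {
            lookups++;
            return completionBytesByField.getOrDefault(field, 0L);
        }
    }

    /** Visits every field of every segment, which is the costly pattern described in this PR. */
    static long walkAllFields(List<Segment> segments) {
        long total = 0;
        for (Segment segment : segments) {
            for (String field : segment.allFields) {
                total += segment.lookup(field);
            }
        }
        return total;
    }

    /** Visits only the fields already known (for example from the mappings) to be completion fields. */
    static long walkCompletionFieldsOnly(List<Segment> segments, Collection<String> completionFields) {
        long total = 0;
        for (Segment segment : segments) {
            for (String field : completionFields) {
                total += segment.lookup(field);
            }
        }
        return total;
    }
}
```

In this model the saving grows with the number of non-completion fields per segment, which matches the observation that most looked-up fields are not completion fields.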

dnhatn · 2020-02-06T15:34:56Z

server/src/main/java/org/elasticsearch/index/engine/CompletionStatsCache.java

+    @Override
+    public void afterRefresh(boolean didRefresh) {
+        if (didRefresh) {
+            completionStatsFutureRef.set(null);


Instead of invalidating the entire current cache, we can mark the current cache as outdated (i.e., in need of a refresh), then reuse the stats of any LeafReaders that haven't changed between refreshes.

DaveCTurner · 2020-02-10

If we re-use a LeafReader across a refresh, does it keep its suggester loaded? If so, do we not already avoid most of the work of recomputing stats?

Note that we need to break the stats down by field, because the user can select the fields in the API. If I understand correctly I think re-using stats on a per-segment basis too would require tracking everything on a per-segment-per-field basis which seems unnecessary.
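
A small sketch of what a per-field breakdown allows, under assumptions: the cached value is modelled as a plain map from field name to size in bytes, and `matches` is a simplified wildcard matcher written for this example rather than the matching logic Elasticsearch actually uses. Serving a request for a subset of fields then becomes a filter over the cached map, with no per-segment bookkeeping.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; the names and the map-based representation are assumptions. */
final class PerFieldStatsSketch {

    /** Simplified glob match where `*` matches any run of characters. */
    static boolean matches(String glob, String field) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i])); // treat everything except `*` literally
        }
        return Pattern.matches(regex.toString(), field);
    }

    /** Filters the cached per-field sizes down to the fields selected by the request. */
    static Map<String, Long> select(Map<String, Long> cachedBytesByField, String... fieldPatterns) {
        Map<String, Long> selected = new TreeMap<>();
        for (Map.Entry<String, Long> entry : cachedBytesByField.entrySet()) {
            for (String pattern : fieldPatterns) {
                if (matches(pattern, entry.getKey())) {
                    selected.put(entry.getKey(), entry.getValue());
                    break;
                }
            }
        }
        return selected;
    }
}
```

Because the cached map already covers every field, different requests with different field selections can all be answered from the same cached entry.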

henningandersen

Great find, looks good.

I wonder if you considered checking the mappings of the index for whether it can have any completion terms? I think the caching you have is good anyway, but checking this could speed up the first stats call after a full cluster restart and maybe we could utilize it to not cache for every shard. Not really suggesting to do this in this PR and might not be worth doing it, but would like to hear your opinion on that.

server/src/main/java/org/elasticsearch/index/engine/CompletionStatsCache.java

server/src/main/java/org/elasticsearch/index/engine/Engine.java

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

rest-api-spec/src/main/resources/rest-api-spec/test/indices.stats/40_updates_on_refresh.yml

DaveCTurner · 2020-02-10T17:22:47Z

@dnhatn, re:

> We can retrieve the list of the completion fields from the MapperService.

True (and maybe that's the real answer here) but today neither Engine has access to a MapperService and that feels right to me. We could plumb it in from the IndexShard of course (yay more dependencies).

DaveCTurner · 2020-02-21T16:29:50Z

@elasticmachine update branch

dnhatn

LGTM, thanks David.

henningandersen

LGTM.

Computing the stats for completion fields may involve a significant amount of work since it walks every field of every segment looking for completion fields. Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for every shard in the cluster. This repeated work is unnecessary since these stats do not change between refreshes; in many indices they remain constant for a long time. This commit introduces a cache for these stats which is invalidated on a refresh, allowing most stats calls to bypass the work needed to compute them on most shards. Closes #51915 Backport of #51991

DaveCTurner added >enhancement, :Distributed/Engine, v8.0.0, v7.7.0 labels Feb 6, 2020

DaveCTurner requested review from dnhatn and henningandersen February 6, 2020 10:43

DaveCTurner added 2 commits February 6, 2020 10:51

Merge branch 'master' into 2020-02-05-CompletionStatsCache

a590d74

Imports

db5be3a

dnhatn reviewed Feb 6, 2020

henningandersen reviewed Feb 7, 2020

DaveCTurner added 4 commits February 10, 2020 16:33

Merge branch 'master' into 2020-02-05-CompletionStatsCache

2adb2b9

Push down

c3ae272

Don't cache exceptional results

5e9a3ab

wait_for_events

e3262eb

Impors

a5d200e

DaveCTurner mentioned this pull request Feb 12, 2020

Push back on excessive requests for stats #51992

Open

DaveCTurner requested review from dnhatn and henningandersen February 21, 2020 16:29

Merge branch 'master' into 2020-02-05-CompletionStatsCache

f19cd76

dnhatn approved these changes Feb 24, 2020

henningandersen approved these changes Feb 25, 2020

DaveCTurner merged commit a3a98c7 into elastic:master Feb 27, 2020

DaveCTurner deleted the 2020-02-05-CompletionStatsCache branch February 27, 2020 07:33

DaveCTurner mentioned this pull request Feb 27, 2020

Cache completion stats between refreshes #52872

Merged

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 2) elastic/elasticsearch-net#4533

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
