Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RestGetRollupCapsAction and RestGetRollupIndexCapsAction invoke expensive GetRollupCapsAction on transport threads #92179

Closed
original-brownbear opened this issue Dec 7, 2022 · 7 comments · Fixed by #98314
Assignees
Labels
>bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@original-brownbear
Copy link
Member

original-brownbear commented Dec 7, 2022

This is a known issue around rest actions. They execute transport actions in the current thread instead of the thread that the action is registered for with the transport service in code like:

return channel -> client.execute(GetRollupIndexCapsAction.INSTANCE, request, new RestToXContentListener<>(channel));

This is normally not an issue since the expensive actions that also have a REST endpoint are TransportMasterNodeActions. In the case of RestGetRollupCapsAction and RestGetRollupIndexCapsAction which aren't, this breaks the fix in #89803 if the action is used via the REST layer.

-> I think until we figure out if we want to change the way threading works with the client in the above code, we should manually fork to the management pool in these actions to fix the slow action blocking transport threads.

@original-brownbear original-brownbear added >bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels Dec 7, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 7, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@original-brownbear original-brownbear changed the title RestGetRollupCapsAction and RestGetRollupIndexCapsAction invokes expensive GetRollupCapsAction on transport threads RestGetRollupCapsAction and RestGetRollupIndexCapsAction invoke expensive GetRollupCapsAction on transport threads Jan 3, 2023
@wchaparro
Copy link
Contributor

With the 8.7 release of Elasticsearch, we have made a new downsampling capability associated with the new time series datastreams functionality generally available (GA). This capability was in tech preview in ILM since 8.5. Downsampling provides a method to reduce the footprint of your time series data by storing it at reduced granularity. The downsampling process rolls up documents within a fixed time interval into a single summary document. Each summary document includes statistical representations of the original data: the min, max, sum, value_count, and average for each metric. Data stream time series dimensions are stored unchanged.

Downsampling is superior to rollup because:

  • Downsampled indices are searched through the _search API
  • It is possible to query multiple downsampled indices together with raw data indices
  • The pre-aggregation is based on the metrics and time series definitions in the index mapping so very little configuration is required (i.e. much easier to add new time serieses)
  • Downsampling is managed as an action in ILM
  • It is possible to downsample a downsampled index, and reduce granularity as the index ages
  • The performance of the pre-aggregation process is superior in downsampling, as it builds on the time_series index mode infrastructure

Because of the introduction of this new capability, we are deprecating the rollups functionality, which never left the Tech Preview/Experimental status, in favor of downsampling and thus we are closing this issue. We encourage you to migrate your solution to downsampling and take advantage of the new TSDB functionality.

@wchaparro wchaparro closed this as not planned Won't fix, can't repro, duplicate, stale Jun 30, 2023
@DaveCTurner
Copy link
Contributor

I think we need to keep this open, it's a pretty bad bug (it can have a severe performance impact on unrelated operations and in extreme cases will destabilise the cluster). Deprecating the feature isn't really enough to stop folks from hitting this bug, and we will live with this deprecated feature for a very long time yet. Also the fix should be pretty straightforward, just fork the expensive computation to a better threadpool manually.

@DaveCTurner DaveCTurner reopened this Jul 20, 2023
@DaveCTurner
Copy link
Contributor

DaveCTurner commented Jul 24, 2023

To give a sense of the scale of the problem this bug causes, according to the overview cluster I count several thousand cases where these actions blocked a transport thread for over 5s in the last week:

image

@salvatore-campagna
Copy link
Contributor

salvatore-campagna commented Jul 24, 2023

Isn't this just a matter of replacing this

return channel -> client.execute(GetRollupIndexCapsAction.INSTANCE, request, new RestToXContentListener<>(channel));

with something like this?

return channel -> client.threadPool().executor(ThreadPool.Names.MANAGEMENT).execute...;

(for both)

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Jul 24, 2023

Yeah something like that, plus some calls to org.elasticsearch.transport.Transports#assertNotTransportThread within the action.

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Aug 2, 2023

FWIW we've worked around #97916 in various other places in the transport layer rather than the REST layer as follows:

// TODO replace SAME when removing workaround for https://github.com/elastic/elasticsearch/issues/97916
super(FieldCapabilitiesAction.NAME, transportService, actionFilters, FieldCapabilitiesRequest::new, ThreadPool.Names.SAME);

...

@Override
protected void doExecute(Task task, FieldCapabilitiesRequest request, final ActionListener<FieldCapabilitiesResponse> listener) {
// workaround for https://github.com/elastic/elasticsearch/issues/97916 - TODO remove this when we can
searchCoordinationExecutor.execute(ActionRunnable.wrap(listener, l -> doExecuteForked(task, request, l)));
}
private void doExecuteForked(Task task, FieldCapabilitiesRequest request, final ActionListener<FieldCapabilitiesResponse> listener) {

Doing in the REST layer would also be ok, but this is the established workaround pattern. Either way, please include https://github.com/elastic/elasticsearch/issues/97916 in a nearby comment so we can remove all the workarounds later on.

@csoulios csoulios self-assigned this Aug 8, 2023
csoulios added a commit that referenced this issue Aug 9, 2023
As discussed in #92179, RestGetRollupCapsAction and RestGetRollupIndexCapsAction 
invoke expensive GetRollupCapsAction on transport threads.

This PR modifies the rest actions so that they invoke GetRollupCapsAction in a separate thread

Fixes #92179
Relates to #97916
DaveCTurner added a commit to DaveCTurner/elasticsearch-support-diagnostics that referenced this issue Sep 24, 2023
These APIs are potentially harmful to large clusters prior to 8.10, see
elastic/elasticsearch#92179.
pickypg pushed a commit to elastic/support-diagnostics that referenced this issue Sep 25, 2023
These APIs are potentially harmful to large clusters prior to 8.10, see
elastic/elasticsearch#92179.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants