
Force execution of finish shard bulk request #51957

Merged
merged 5 commits into from
Feb 15, 2020

Conversation

Tim-Brooks
Contributor

Currently the shard bulk request can be rejected by the write threadpool
after a mapping update. This introduces a scenario where the mapping
listener thread will attempt to finish the request and fsync. This
thread can potentially be a transport thread. This commit fixes this
issue by forcing the finish action to happen on the write threadpool.

Fixes #51904.
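The shape of the fix can be illustrated with a small standalone sketch. This is plain JDK concurrency, not the actual Elasticsearch code: the pool, thread name, and `finishRequest` body here are hypothetical stand-ins. The point is that after the mapping update completes, the finish action is forked to the write pool instead of running inline on whatever thread delivered the mapping-update response (potentially a transport thread). In Elasticsearch the forked task is additionally marked force-executed so the write pool cannot reject it again.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ForcedFinishSketch {
    // Hypothetical write pool; the thread is named "write-0" so we can
    // observe which thread the finish action actually ran on.
    static final ExecutorService WRITE_POOL =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "write-0"));

    static final AtomicReference<String> FINISHED_ON = new AtomicReference<>();

    // Stand-in for finishRequest(): in Elasticsearch this path can fsync the
    // translog, so it must never run on a transport thread.
    static void finishRequest() {
        FINISHED_ON.set(Thread.currentThread().getName());
    }

    public static void main(String[] args) throws Exception {
        // Pretend we are on a transport/cluster-state-applier thread after a
        // mapping update: instead of calling finishRequest() inline, fork it
        // to the write pool. (The real fix also marks the task as
        // force-executed so the pool must accept it.)
        WRITE_POOL.execute(ForcedFinishSketch::finishRequest);
        WRITE_POOL.shutdown();
        WRITE_POOL.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("finishRequest ran on: " + FINISHED_ON.get());
    }
}
```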

@Tim-Brooks Tim-Brooks added >bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v8.0.0 v7.7.0 labels Feb 5, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

@Tim-Brooks
Contributor Author

How far should we backport this?

Member

@jasontedor left a comment


I left a comment for consideration.


@Override
protected void doRun() {
finishRequest();
Member


I was thinking that rather than scheduling only finishRequest on the write thread pool, we should execute the entirety of onRejection on the write thread pool (and still force execution, since rejecting execution here would be very bad):

ActionListener.wrap(v -> executor.execute(this), this::onRejection)) == false) {

The reason I think this is that we're still doing work in onRejection that is linear in the number of documents in the bulk request. It seems we'd want to get that off of the networking/cluster-state-applier thread too, given that we're going to fork anyway.
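The suggestion above can be sketched standalone (plain JDK types; the pools and the `onRejection` body are hypothetical stand-ins, and a second unbounded pool simulates force execution, which in Elasticsearch is a property of the submitted task rather than a separate pool): when the write pool rejects the work, the whole rejection handling is forked back onto a write thread instead of running on the caller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ForkedRejectionSketch {
    // Hypothetical bounded write pool: one worker, no queue, so a second
    // submission is rejected while the worker is busy.
    static final ThreadPoolExecutor WRITE_POOL = new ThreadPoolExecutor(
        1, 1, 0, TimeUnit.MILLISECONDS, new SynchronousQueue<>());

    // Stand-in for force execution: a forced task is always accepted.
    static final ExecutorService FORCED_WRITE_POOL =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "write-forced"));

    static final AtomicReference<String> REJECTION_HANDLED_ON = new AtomicReference<>();

    // Rejection handling is linear in the number of documents in the bulk
    // request, so it should not run on a networking thread either.
    static void onRejection() {
        REJECTION_HANDLED_ON.set(Thread.currentThread().getName());
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch block = new CountDownLatch(1);
        // Saturate the write pool.
        WRITE_POOL.execute(() -> {
            try {
                block.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            WRITE_POOL.execute(() -> {}); // rejected: pool saturated
        } catch (RejectedExecutionException e) {
            // Fork the entire rejection handling, not just finishRequest().
            FORCED_WRITE_POOL.execute(ForkedRejectionSketch::onRejection);
        }
        block.countDown();
        FORCED_WRITE_POOL.shutdown();
        FORCED_WRITE_POOL.awaitTermination(5, TimeUnit.SECONDS);
        WRITE_POOL.shutdown();
        System.out.println("onRejection ran on: " + REJECTION_HANDLED_ON.get());
    }
}
```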

Contributor


I wonder if we should instead let onRejection force-push the onRejection handling onto the queue of the requested executor. The current "direct" handling effectively prioritizes the onRejection handling over everything else in the queue, and I think there is really no good reason for that.

Looking at various onRejection handlers, some call listener.onFailure, and verifying that none of those do bad things is tricky.

Also notice that onAfter is called in the caller thread when requests are rejected too, which poses similar issues (not that I found a smoking gun there).

Finally, notice that AbstractRunnable.onRejection by default calls onFailure.

I doubt that we carefully consider that onAfter and onFailure might run in the current thread when using AbstractRunnable; executing the onRejection handling on the target thread pool would fix this, making it easier to reason about.

That said, I think this PR is good and I am not objecting to it going in. Following my suggestion is likely to surface a few additional things to resolve.
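The alternative described above can be sketched as follows (again plain JDK types and hypothetical names; a real force-push onto an Elasticsearch executor's queue would bypass the capacity check, whereas this sketch simply waits for space with `put()`): the rejected handling is appended to the back of the executor's queue, so it runs after the work that was already queued instead of jumping ahead of it by running inline on the caller thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueuedRejectionSketch {
    static final List<String> ORDER = new CopyOnWriteArrayList<>();

    public static void main(String[] args) throws Exception {
        // Hypothetical write pool: one worker, queue capacity 1.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(1));
        CountDownLatch block = new CountDownLatch(1);
        // Keep the single worker busy, then fill the queue.
        pool.execute(() -> {
            try {
                block.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pool.execute(() -> ORDER.add("queued-task"));
        try {
            pool.execute(() -> ORDER.add("bulk-item")); // rejected: queue full
        } catch (RejectedExecutionException e) {
            // Unblock the worker so the queue drains, then append the
            // rejection handling to the BACK of the queue rather than
            // running it inline: it now waits its turn behind the work
            // that was already queued.
            block.countDown();
            pool.getQueue().put(() -> ORDER.add("onRejection"));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("execution order: " + ORDER);
    }
}
```

Because the executor's queue is FIFO, "onRejection" is always observed after "queued-task", which is exactly the de-prioritization being proposed.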

Contributor Author


I made @jasontedor's change.

I think Henning's point, that onRejection should perhaps always be executed on the thread pool, is beyond the scope of this PR? Or at least a larger discussion.

Contributor


@tbrooks8, yes, that is fine, I just found it most natural to put it here. I will open a PR with that change so we can discuss based on that PR instead.

Member

@jasontedor left a comment


I left a comment, but this LGTM.

context, null);
}
finishRequest();
// Force the execution to finish the request
Member


It's probably worth a comment here explaining why we fork to the executor (since it's not obvious from the code).

@jasontedor
Member

How far should we backport this?

I think to the 7.6, 7.x, and master branches.

@Tim-Brooks Tim-Brooks merged commit 0dd1ed8 into elastic:master Feb 15, 2020
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Feb 18, 2020
Tim-Brooks added a commit that referenced this pull request Feb 25, 2020
Tim-Brooks added a commit that referenced this pull request Feb 25, 2020
@Tim-Brooks Tim-Brooks deleted the force_shard_finish branch April 30, 2020 18:27
Labels
>bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v7.7.0 v8.0.0-alpha1
Successfully merging this pull request may close these issues.

Elasticsearch is fsyncing on transport threads
5 participants