
TransportListTaskAction: wait for tasks to finish asynchronously #90977

Merged
merged 20 commits into from
Jan 26, 2023


arteam
Contributor

arteam commented Oct 18, 2022

Instead of synchronously blocking a thread in the management pool, add a listener for removed tasks and call nodeOperation once all matched tasks have been removed. Also schedule a task that bails out after the specified wait timeout if the tasks haven't finished by then.

Fixes #89564, #90988
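The approach described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code from this PR: `AsyncTaskWaiter`, `RemovedListener`, `register`/`unregister` and `waitForTasks` are hypothetical names, and a `ScheduledExecutorService` stands in for the Elasticsearch thread pool that runs the bail-out. The two points it tries to show are subscribing to removals *before* checking which tasks are still running (so no removal is missed), and a single compare-and-set so that either completion or the timeout wins, never both.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-in for a node-level task registry; not the Elasticsearch TaskManager API.
final class AsyncTaskWaiter {
    // Hypothetical callback, invoked after a task has been unregistered.
    interface RemovedListener {
        void onRemoved(long taskId);
    }

    private final Set<Long> running = ConcurrentHashMap.newKeySet();
    private final CopyOnWriteArrayList<RemovedListener> listeners = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void register(long taskId) {
        running.add(taskId);
    }

    void unregister(long taskId) {
        running.remove(taskId);
        for (RemovedListener l : listeners) {
            l.onRemoved(taskId); // waiters are notified on the unregistering thread
        }
    }

    // Runs onDone once every id in ids has been removed, or onTimeout after timeoutMillis.
    // The calling thread never blocks; exactly one of the two callbacks fires.
    void waitForTasks(Set<Long> ids, long timeoutMillis, Runnable onDone, Consumer<Exception> onTimeout) {
        Set<Long> pending = ConcurrentHashMap.newKeySet();
        pending.addAll(ids);
        AtomicBoolean finished = new AtomicBoolean();
        AtomicReference<ScheduledFuture<?>> timer = new AtomicReference<>();

        RemovedListener listener = new RemovedListener() {
            @Override
            public void onRemoved(long taskId) {
                if (pending.remove(taskId) && pending.isEmpty() && finished.compareAndSet(false, true)) {
                    listeners.remove(this);
                    ScheduledFuture<?> t = timer.get();
                    if (t != null) {
                        t.cancel(false); // the bail-out is no longer needed
                    }
                    onDone.run();
                }
            }
        };

        // Subscribe first, then drop tasks that have already finished.
        listeners.add(listener);
        pending.retainAll(running);
        if (pending.isEmpty()) {
            if (finished.compareAndSet(false, true)) {
                listeners.remove(listener);
                onDone.run();
            }
            return;
        }

        // Bail out after the timeout if some tasks are still running.
        timer.set(scheduler.schedule(() -> {
            if (finished.compareAndSet(false, true)) {
                listeners.remove(listener);
                onTimeout.accept(new TimeoutException("timed out waiting for tasks " + pending));
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS));
    }

    void close() {
        scheduler.shutdownNow();
    }
}
```

If the last task is removed after the listener fires but before the timer is stored, the timer is simply left to run; the compare-and-set turns it into a no-op.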

@arteam
Contributor Author

arteam commented Nov 14, 2022

@elasticmachine update branch

kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
arteam marked this pull request as ready for review January 4, 2023 14:36
arteam added the :Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. label Jan 4, 2023
arteam force-pushed the transport-list-task-actions-async branch from f82b7ea to 3e9edef January 4, 2023 14:37
elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jan 4, 2023
@elasticsearchmachine
Collaborator

Hi @arteam, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @arteam, I've updated the changelog YAML for you.

@elasticsearchmachine
Collaborator

Hi @arteam, I've updated the changelog YAML for you.

@arteam
Contributor Author

arteam commented Jan 13, 2023

@elasticmachine update branch

Contributor

kingherc left a comment

Looks good to me so far! I left a couple of comments before approving, to check whether a couple of questions I have (due to my inexperience with this part of the code) are already handled or need further attention.


@Override
public void subscribeForRemovedTasks(RemovedTaskListener removedTaskListener) {
waitForWaitingToStart.countDown();
Contributor

Do we need any additional tests for the new feature added in this PR, or do the existing tests already cover it indirectly?

Contributor

I believe we worked around this by always having 2 MANAGEMENT threads in #90193. I think we would get sufficient coverage by reverting that change, and we should do that here.
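A generic sketch of why a second MANAGEMENT thread can hide a blocking wait, assuming the blocked waiter and the work it waits on share one bounded pool. `PoolStarvationDemo` and its methods are hypothetical, and a plain fixed-size `ExecutorService` stands in for the Elasticsearch MANAGEMENT pool; this illustrates thread-pool starvation in general, not the actual test setup from #90193.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Generic pool-starvation demo; "management" is just the name of a bounded pool here.
final class PoolStarvationDemo {
    // Blocking style: the outer job holds a pool thread while waiting for an inner job
    // that needs a thread from the same pool. Returns true if the wait timed out.
    static boolean blockingWaitTimesOut(int poolSize) throws Exception {
        ExecutorService management = Executors.newFixedThreadPool(poolSize);
        try {
            Future<Boolean> outer = management.submit(() -> {
                Future<?> inner = management.submit(() -> { });
                try {
                    inner.get(200, TimeUnit.MILLISECONDS); // occupies a pool thread while waiting
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            return outer.get();
        } finally {
            management.shutdownNow();
        }
    }

    // Async style: the outer job registers a continuation and returns, freeing its thread.
    static boolean asyncCompletes(int poolSize) throws Exception {
        ExecutorService management = Executors.newFixedThreadPool(poolSize);
        try {
            CompletableFuture<Boolean> result = new CompletableFuture<>();
            management.execute(() ->
                CompletableFuture.runAsync(() -> { }, management)
                    .thenRun(() -> result.complete(true)));
            return result.get(2, TimeUnit.SECONDS);
        } finally {
            management.shutdownNow();
        }
    }
}
```

With one thread the blocking version times out, with two it succeeds, and the async version succeeds even with one thread, which is why reverting the two-thread workaround would exercise the new code path.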

Contributor

DaveCTurner left a comment

Looks good, I left just a couple more nits. I'd like someone else to review too tho.

@arteam
Contributor Author

arteam commented Jan 23, 2023

@elasticmachine update branch

fcofdez self-requested a review January 23, 2023 16:16
@arteam
Contributor Author

arteam commented Jan 23, 2023

@elasticmachine update branch

DaveCTurner dismissed their stale review January 23, 2023 20:38

comments all addressed

Contributor

fcofdez left a comment

LGTM, I left a couple of naming suggestions. I wonder if we could simplify this logic by changing how the subclasses interact with the task collection, etc. But I guess that route has been considered already.

arteam merged commit 6813012 into elastic:main Jan 26, 2023
arteam deleted the transport-list-task-actions-async branch January 26, 2023 10:20
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 1, 2023
In elastic#90977 we made the list tasks API fully async, but failed to notice
that if we waited for a task to complete then we would respond in the
thread context of the last-completing task. This commit fixes the
problem by restoring the context of the list-tasks task before
responding.

Closes elastic#93428
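The fix described in this commit message amounts to capturing the caller's context when the wait starts and restoring it around the response. A minimal plain-Java sketch, assuming a `ThreadLocal<String>` as a stand-in for the Elasticsearch thread context; `ContextDemo` and `preservingContext` are hypothetical names, not the code from the commit.

```java
import java.util.function.Consumer;

// Hypothetical per-thread "context" (e.g. request headers), modeled with a ThreadLocal.
final class ContextDemo {
    static final ThreadLocal<String> CONTEXT = ThreadLocal.withInitial(() -> "none");

    // Wraps a listener so it runs with the context that was active when it was created,
    // regardless of which thread (and which task's context) eventually completes it.
    static <T> Consumer<T> preservingContext(Consumer<T> listener) {
        String captured = CONTEXT.get();
        return value -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);        // respond in the list-tasks caller's context
            try {
                listener.accept(value);
            } finally {
                CONTEXT.set(previous);    // hand the completing thread its own context back
            }
        };
    }
}
```

Without the wrapper, a response sent from the removal callback would see whatever context the last-completing task left on that thread.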
DaveCTurner added a commit that referenced this pull request Feb 2, 2023
In #90977 we made the list tasks API fully async, but failed to notice
that if we waited for a task to complete then we would respond in the
thread context of the last-completing task. This commit fixes the
problem by restoring the context of the list-tasks task before
responding.

Closes #93428
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Feb 2, 2023
arteam added a commit that referenced this pull request Feb 6, 2023
Wait for the requested task asynchronously in a similar fashion to TransportListTaskAction from #90977

See #90977

---------

Co-authored-by: David Turner <david.turner@elastic.co>
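For a single requested task the same idea can be expressed with `CompletableFuture`. A minimal sketch, assuming a hypothetical `SingleTaskWaiter` registry that keeps one future per running task; `orTimeout` (Java 9+) plays the role of the scheduled bail-out. This illustrates the pattern only, not the code in the follow-up commit.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical registry: each running task id maps to a future completed on removal.
final class SingleTaskWaiter {
    private final Map<Long, CompletableFuture<Void>> running = new ConcurrentHashMap<>();

    void register(long taskId) {
        running.put(taskId, new CompletableFuture<>());
    }

    void unregister(long taskId) {
        CompletableFuture<Void> removed = running.remove(taskId);
        if (removed != null) {
            removed.complete(null); // wakes up any waiters without blocking anyone
        }
    }

    // Completes when the task is removed (or immediately if it is not running),
    // or exceptionally with a TimeoutException after timeoutMillis.
    CompletableFuture<Void> waitForTask(long taskId, long timeoutMillis) {
        CompletableFuture<Void> removal = running.get(taskId);
        if (removal == null) {
            return CompletableFuture.completedFuture(null);
        }
        // copy() so the timeout fails only this waiter, not the shared removal future.
        return removal.copy().orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```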
Labels
:Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement Team:Distributed Meta label for distributed team v8.7.0
Development

Successfully merging this pull request may close these issues.

[CI] XPackRestIT test {p0=ml/upgrade_job_snapshot/Test existing but corrupt snapshot} failing
6 participants